AI firm “cut up and destroyed” millions of books

A curiously revealing story at Ars Technica (June 25, 2025):

Anthropic destroyed millions of print books to build its AI models : Company hired Google’s book-scanning chief to cut up and digitize “all the books in the world.”

On Monday, court documents revealed that AI company Anthropic spent millions of dollars physically scanning print books to build Claude, an AI assistant similar to ChatGPT. In the process, the company cut millions of print books from their bindings, scanned them into digital files, and threw away the originals solely for the purpose of training AI—details buried in a copyright ruling on fair use…

… in February 2024, the company hired Tom Turvey, the former head of partnerships for the Google Books book-scanning project, and tasked him with obtaining “all the books in the world.” …

While destructive scanning is a common practice among some book digitizing operations, Anthropic’s approach was somewhat unusual due to its documented massive scale. By contrast, the Google Books project largely used a patented non-destructive camera process to scan millions of books borrowed from libraries and later returned. ….

The article is well worth reading for what it reveals about the insides of the AI world.  The 32-page court judgement is also interesting in itself, as it describes what the AI company did, and why.  Anthropic made a billion dollars this way.

For AI systems (“large language models”) to work, they have to be populated with high quality text.  Unfortunately that all belongs to other people, publishers and the like, who have lawyers.  So one way around this is to buy a physical copy of a book, and then store it inside your computer in digital form.

This trick is perfectly legal, or so a court has just ruled.  Why?  Because the company legally purchased the books, destroyed each copy after use, and kept the digital files internally rather than distributing them.

Buying used physical books sidestepped licensing entirely while providing the high-quality, professionally edited text that AI models need, and destructive scanning was simply the fastest way to digitize millions of volumes. The company spent “many millions of dollars” on this buying and scanning operation, often purchasing used books in bulk. Next, they stripped books from bindings, cut pages to workable dimensions, scanned them as stacks of pages into PDFs with machine-readable text including covers, then discarded all the paper originals.

The court documents don’t indicate that any rare books were destroyed in this process—Anthropic purchased its books in bulk from major retailers—but archivists long ago established other ways to extract information from paper. For example, The Internet Archive pioneered non-destructive book scanning methods that preserve physical volumes while creating digital copies. And earlier this month, OpenAI and Microsoft announced they’re working with Harvard’s libraries to train AI models on nearly 1 million public domain books dating back to the 15th century—fully digitized but preserved to live another day.

While Harvard carefully preserves 600-year-old manuscripts for AI training, somewhere on Earth sits the discarded remains of millions of books that taught Claude how to juice up your résumé.

I think most of us will feel somewhat appalled at this treatment of books.  Clearly the development of AI is straining the US copyright regime.


Happy St Botolph’s Day! English Translation of the Epitome in the Schleswig Breviary

June 17 is the day on which St Botolph is commemorated in the Roman calendar, so Happy St Botolph’s day to you all.

In honour of the day, I thought that I would post an English translation of the abbreviated “Life”, found in the printed Schleswig Breviary of 1512 (Breviarium Slesvicense).  It’s the latest of the late-medieval abbreviations of the “Life”.  I’ve put a Word .docx version at the end.

    *    *    *    *

Epitome of the Life of St Botolph, from the Schleswig Breviary[1]

1.     After the faith of Our Lord Jesus Christ became well-known throughout the world, there was a man worthy in God, named Botolph, descended from the noble lineage of the kings of Scotland, who, when he was pressed to accept the throne after the death of his father,[2] for the love of God not only relinquished the throne, but also his homeland, and journeyed to England. There, he was received with reverence by Edmund, King of England, and not long after, by the command of the same king,[3] he was raised to holy orders.

2.     But when he had stayed with the same king for seven years, he petitioned him to grant him a place where he might more freely serve the Lord.[4] The king assigned him a most beautiful place, surrounded on all sides by the streams of a certain river.  There he built a church to the honour of God, and began through divine grace to become well-known for many miracles.  Now while the man of God was staying there with his disciple, one day a poor man knocked at the door, begging for alms in the name of God.

3.    When the holy Father ordered the disciple to give him something, he replied that he had nothing for all their[5] food, except a single loaf of bread: which he ordered to be divided into four parts, and one of them to be given to the poor man.  Then what?  When three other poor men came, he distributed the three remaining pieces.  When the disciple therefore murmured about this, the holy man said, “Do not be troubled, my son, for God is able to give it all back to us again.”  Hardly had he finished his words, and behold: four little boats loaded with food and drink were being drawn along the aforementioned river, which Almighty God, through His faithful ones, provided for the holy man.

4.    But one day, when he was visited by the aforementioned king, he petitioned for another place to live, because in the first site he was exceedingly pestered by unclean spirits. The king, granting his request, gave him a more suitable place on the River Thames;[6] in which place the man of God built a church in honour of St. Martin.  Then, staying in the same place, he began to raise hens, which an eagle from a nearby forest used to come and carry off. But one day, when it had carried off a cockerel, the man of God rebuked[7] it, and it immediately came and placed the cockerel alive at his feet, and then fell down dead.

5.    After thirteen years had passed in that place, the ancient enemy[8] came in the form of a snake and inflicted a nasty bite on the man of God. Because of this, he again approached the king to give him another place; who led him far from the sea, into a vast wilderness: where, as he proceeded through thorny places, he came to a certain valley, which had a small stream of water; and the man of God said, “This is the place.”[9]  And so in that place given to him by the king, he built two churches, in honour of the apostles Peter and Paul. When these were completed, he went abroad[10] to Rome for the purpose of prayer, to visit the shrines[11] of those same most blessed apostles.

6.    Returning from there and bringing with him many relics of the saints, before entering his own cell, he restored sight to a blind girl through his prayers. King Edmund, hearing of the return of the holy man, met him with great joy, and stayed with him for three days.  After these things, Botolph, the man of God, passed over to the Lord. His disciples honorably committed his body for burial.  Many miracles happen at his tomb, by the grace of our Lord Jesus Christ, to whom be honour and glory forever. Amen.


[1] The Schleswig Breviary is a service book printed in Paris in 1512 at the order of Gottschalk von Ahlefeldt, the last Catholic bishop of Schleswig.  Two copies are held in the Danish Royal Library in Copenhagen.  This text was reprinted in the Acta Sanctorum, with notes by D. Papebroch, which are translated below, prefixed by a, b, c etc.  This translation and the other notes are by Roger Pearse, 2025, improved by comparison with the unpublished translation of D. G. Dalziel, kindly made available to me by Denis Pepper of the Society of St Botolph.

[2] a. It seems that this was Eugenius IV, who died in the year 620; nor was the kingdom offered immediately to Botulph, but only after the princes and people were no longer able to tolerate the crimes of his successor Ferquard: so great that it was decided to throw him into prison, in which he later died, say around the year 624. But when Botulph fled, the administration passed to another of the brothers, Donald, who then reigned after Ferquard’s death until the year 646. (See Wikipedia article on Legendary Kings of Scotland – RP)

[3] b. Or rather, the Christian mother of the still pagan king, who took him as her chaplain, and as an instructor in the pious education of her daughters.

[4] c. In order to obtain this more conveniently, I believe he had first persuaded the Queen to send her daughters to one of the Frankish monasteries.

[5] SB actually has “eorum”; but strangely the AASS copy has “corporis,” which would make this “he had nothing for all the food of the body.”

[6] d. This confirms what I have said, that Edmund ruled in Surrey on the right bank of the Thames, and that it was a part of Southern England. Perhaps also the saint was moved to leave the court because he saw that he was wasting his time in trying to lead the king to faith.

[7] Cf. Mark 4:39.

[8] Satan.

[9] [e] Thus far, that is, up to around the year 644, Botulph had lived as a hermit, when it seemed divinely inspired to him to cross over into Gaul, there to be trained in monastic discipline (though this is here omitted) and to visit various monasteries, especially staying at the one where his spiritual daughters, the sisters of the King, resided, who had taken monastic vows. And so he will first have returned around the year 654, advanced in age and now fitted to establish and promote monastic discipline among the South Angles; and from this point begins that opening part of the earlier “Life,” which alone we approve, as written by a near-contemporary.

[10] [f] I would think that this happened after the year 660, supposing that the saint returned while Edmund was still alive; who (unless the South Angles had different kings from the East Saxons, for which there is no evidence) received as his successor about that year Edelwalch, baptized in 661 (as Alford believes). At that time St. Vitalianus was the Pope of the Roman Church. (This refers to Fr. Michael Alford S.J. (1587-1652), Fides Regia Britannica, sive Annales Ecclesiae Britannicae, Liege (1663).  – RP)

[11] “limina”, lit. “thresholds”, but indicating the tombs and basilicas – Niermeyer, “Mediae Latinitatis Lexicon Minus.”

Downloads:  (Update: I have added in the Latin)


The Schleswig Breviary (Breviarium Slesvicense)

The Duchy of Schleswig is the northernmost district of Germany, and since 1920 has been divided between Denmark and Germany.  In 1510 a man with the interesting name of Gottschalk von Ahlefeldt (1475-1541) became bishop of Schleswig.  The Ahlefeldt family were originally of the Danish nobility, but by this time were settled in Germany.  Ahlefeldt seems to have been a clever and competent man, who set about restoring his bankrupt diocese, even mortgaging part of his own income to satisfy the creditors.  Sadly all his efforts were swept away by the rise of Lutheranism in the 1520s, which offered both moral and financial incentives to the local nobility to convert, and he was the last Catholic bishop.  His biography in Danish records that, shortly before his death, he advised the nobility of Holstein not to “lightly let the old doctrine go.”

Soon after his election, in 1512, he commissioned the creation of new service books for his diocese.  Two of these, a Liber Agendarum, and a Breviarium, were printed in Paris that year.  Two copies of the Breviarium Slesvicense are held in the Danish Royal Library in Copenhagen (KB København, LN 033 8° copy 1, and copy 2), and catalogued on the Hungarian Usuarium liturgical texts site, here and here.

Here’s the title page of the Breviarium Slesvicense, from KB København, LN 033 8° copy 2:

A single page introduction explains why the work was commissioned.

I.e.

Reuerendus in Christo pater et dominus: dominus Godschalcus de Ahleuelde: dei et apostolice sedis gratia episcopus ecclesiae Sleszuicensis.  Attendens in sua diocesi librorum breviariorum paucitatem: et ex hoc clericis iuxta ordinarium dicte diocesis horas canonicas legere debentibus oriri turbationem et defectum. Quibus pastorali cura inederi cupiens hec breviaria sanctorum ordinarium prefate sue ecclesie et diocesis correcta et impressa auctoritate ordinaria approbauit et confirmauit.  Ac omnibus et singulis Christi fidelibus confessis et contritis ex eisdem libris horas canonicas communiter aut diuisim deo omnipotenti per suam diocesim rite quantum poterint persoluentibus totiens quotiens de omnipotentis dei misericordia: ac beatorum petri et pauli apostolorum eius auctoritate confisus quadraginta dies indulgeniarum de iniunctis ipsis et cuilibet ipsorum penitentiis misericorditer in domino relaxavit. Anno domini Mdillensimo quingenesimo duodecimo.

The Reverend Father and Lord in Christ, Lord Godschalk of Ahlefeldt, by the grace of God and the Apostolic See, Bishop of the Church of Schleswig, observing the scarcity of breviary books in his diocese and the resulting confusion and deficiency among the clergy who are obliged to recite the canonical hours according to the ordinate of the said diocese, desiring to provide for these matters with pastoral care, has approved and confirmed, by his ordinary authority, these corrected and printed breviaries of the saints according to the ordinate of his aforesaid church and diocese. Moreover, trusting in the authority of Almighty God and the blessed apostles Peter and Paul, he has mercifully granted in the Lord, to each and every one of Christ’s faithful who, being confessed and contrite, duly recite the canonical hours either together or separately from these books throughout his diocese as best they can, forty days of indulgence from the penances enjoined upon them and upon each of them, as often as they do so. In the year of our Lord one thousand five hundred and twelve. (DeepSeek)

The volume ends with a lengthy colophon.

This tells us who did the work of compiling it:

Expresis venerabilis virorum dominorum et magistrorum Johannis tetens sacre theologie baccalarii formati lectoris ordinarii: ac Andree Frederici prepositi Wyda i dicta ecclesia canonicorum ibidem necnon providi wesseli goltsme des incole husemen. Cura per vigili domini Seszeconis beszeconis presbyteri medullitus prospectu, ac per venerabilis viros et magistros wilhelmum mercator et Thomas Kees civem in urbe Parisiensi.

Which DeepSeek, slightly cleaned up, renders as:

Produced by the venerable men, the lords and masters, Johannes Tetens, Bachelor of Sacred Theology,[1] and ordinary reader; and Andreas Fredericus, provost of Wida and canon of the said church there, as well as the prudent Wessel Goltsme, resident of Husemen. Carefully overseen with deep insight by the vigilant lord Seszeconis Beszeconis, priest, and by the venerable men and masters, Wilhelm Mercator and Thomas Kees, citizens in the city of Paris.

Guilliemus Marchand and Thomas Kees were the printers.  The work was completed on 16 July 1512.

There is a useful table of contents on the page for copy 1 here, and part of the Breviarium is the “sanctoral offices.”  Each office includes an abbreviated life of the saint.

On folio 347 of copy 1, or 344 of copy 2 (page 704 of the PDFs in both cases) begins the office of St Botolph, and the “Life” is over the page, broken up into 6 readings or lectiones.  This “Life” was copied into the Acta Sanctorum, not very accurately, and is assigned the reference number BHL 1430.

But more about this in the next post.

[1] Baccalaureus formatus is apparently an academic rank: see here.

Deciphering a South Arabian script – the Dhofari alphabet

Out in the deserts of southern Arabia, there are a lot of rocks, and a lot of those rocks have inscriptions painted on them, or inscribed into them.  It seems that not all of these scripts are understood.  I came across an article on Academia by Ahmad al-Jallad, of Ohio State University, on the deciphering of one of them, known as the Dhofari alphabet.

It seems that some rock art found at Duqm, east of Dhofar, in South-Central Oman, in 2022-3 proved to include a text, a snake-like series of letters:

This was misidentified by the original discoverers, but Al-Jallad writes:

…the text is clearly in a variant of the Dhofari alphabet, and its glyph shapes more closely correspond to King’s script 1 classification. The text consists of seven units separated by word dividers. None of the glyphs repeat. I, therefore, submit that we are dealing with an abecedary following the South Arabian halḥam order, and that this text provides our first real key into the glyph-phoneme values for the Dhofari script.

It seems that South Semitic languages have a canonical order of letters, just as we have “a – b – c – d…” etc, and so mapping this inscription to this order gives the meaning of each glyph.  Many of the shapes are clearly related to known forms of the letters.
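The idea of reading an abecedary against a known letter order can be sketched in a few lines of Python. This is only an illustration of the principle, not Al-Jallad's actual procedure: the glyph names are hypothetical placeholders, and only the first four letters of the halḥam order (h, l, ḥ, m, from which the order takes its name) are used.

```python
# First four letters of the South Arabian halḥam order (h-l-ḥ-m gives the order its name).
HALHAM_START = ["h", "l", "ḥ", "m"]


def assign_values(glyphs, order):
    """Pair each glyph of an abecedary with the letter at the same position in a known order."""
    # An abecedary lists each letter once, so a repeated glyph means this is not one.
    if len(set(glyphs)) != len(glyphs):
        raise ValueError("an abecedary should not repeat a glyph")
    return dict(zip(glyphs, order))


# Hypothetical placeholder names standing in for the first four signs of the inscription.
glyphs = ["glyph_1", "glyph_2", "glyph_3", "glyph_4"]
mapping = assign_values(glyphs, HALHAM_START)
print(mapping)
```

The check for repeated glyphs mirrors the observation in the quotation that none of the glyphs repeat, which is what suggests an abecedary in the first place.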

Dr Al-Jallad adds:

A primary reason scholars struggled to interpret Dhofari inscriptions as an early form of Modern South Arabian languages was the use of the word bn for ‘son.’ But with the correct understanding of the script, it is clear that the sequence XX should be understood as br and therefore is compatible with the Modern South Arabian Language family.

I don’t suppose that most of us know anything about Arabian language inscriptions, but the discovery is interesting for how the author went about it.


Interesting work on searching Migne for themes at scholarios.graeca.org

I’ve had an email from Evangelos Varthis, telling me about his project at Ionian University.  It’s still very experimental, but there is some very interesting thinking going on here.  Basically he’s making the Greek text of the PG available, in image, and in electronic text, plus a simple way to get an AI translation of it alongside.

Here’s what he says:

I am mainly involved in presenting information about PG Migne and I personally appreciate and understand the value of these texts….

Experimentally, I and others have uploaded a list of patristic texts from various sources, mainly to see how Artificial Intelligence translation can help.

The Greek texts have a decent translation into Greek (I understand Greek and English), although manual editing is required in various places for greater clarity. Here I would say that even human translated material has a degree of ambiguity.

If you have time, visit the following website, i would appreciate any feedback.
https://scholarios.graeca.org/pgworks/

also (select greek text and right click to translate)
https://scholarios.graeca.org/public/pgfront/index.html?vol=1&page=0001

The first link takes you to a list of authors and works.

Clicking on the first of these gives a list of languages, and clicking English gives you this:

However I notice that the AI translation has omitted the title and first sentence, so perhaps a bug there.  All the same, this works fine.

The second link takes you to a presentation of the volumes with parallel transcription, and again an AI translation option.  This is potentially really useful.  Unfortunately there is some work to do here: the only way to change page is to change the URL manually – not a problem – and right-clicking on the text brings up a menu, which, instead of calling the AI translation, prompts for the text to translate!  I’m sure that this did work, but AI can be tricky like that, and changes what response it gives without warning.
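Since changing page means editing the URL by hand, a tiny helper can generate the links. This is a rough sketch in Python based only on the example link quoted above: the function name is my own, and I am assuming that the page number is always zero-padded to four digits, as it is in that link.

```python
from urllib.parse import urlencode

# Base address taken from the second link in the email quoted above.
BASE = "https://scholarios.graeca.org/public/pgfront/index.html"


def pg_page_url(vol: int, page: int) -> str:
    """Build a viewer URL for a given PG volume and page (hypothetical helper)."""
    # Assumption: pages are zero-padded to four digits, as in the example "page=0001".
    query = urlencode({"vol": vol, "page": f"{page:04d}"})
    return f"{BASE}?{query}"


print(pg_page_url(1, 1))
```

Calling `pg_page_url(1, 2)` would then give the address of the next page of volume 1, if the site follows the pattern of the example.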

All the same, this will be a very useful thing to have when they’ve got a bit further down the line with it.  Well done guys!


The First Hymn: Resurrected third-century praise song (P.Oxy 1786)

Via Twitter I learn of a bit of a buzz about a papyrus.  That’s always a good thing in principle – public interest means funding!  Indeed the whole Oxyrhynchus papyri project came about because the public got interested in “new words of Jesus” and a newspaper raised the money to find more.  So what’s this one about?

Via Baptist Press here:

What was left of the hymn, archeologists found 100 years ago in ancient Egyptian ruins on a scrap of tattered papyrus, long buried by desert sand. The discovery was sealed in a climate-controlled vault at Oxford University until John Dickson came along.

Dickson, who joined Wheaton College in 2022 as the inaugural Jean Kvamme Distinguished Professor of Biblical Studies and Public Christianity, began to realize the importance of the papyrus for today’s Christians.

“I’m thinking, why has no one brought this back to life? You know, this is a song from before there were denominations,” he told Baptist Press. “And it’s thoroughly Orthodox Christian theology.”

Archeological dating could certify without a doubt, Dickson said, that the hymn dated to the mid-200s, owing to paleography and “a corn contract on the back” of the papyrus. About a fifth of the words, the beginning lines, were missing, he said, as well as the corresponding tune to the missing lyrics. But the rest, including a tune that would have resonated with pagans of the day, was intact.

What is most notable, Dickson said, is the certainty with which the song presents the Trinity, although it predates by generations the Council of Nicaea, in 325 AD, which scholars say confirmed the Trinity.

But Dickson’s challenge was rebirthing the hymn in tune and lyrics for today’s Christians, while maintaining the high praise of the early Christians…. Chris Tomlin, whom Time Magazine has hailed as “potentially the most often sung artist in the world,” and Ben Fielding of Australia….

The massive collaboration comes together in a song, The First Hymn Project, releasing April 11 worldwide, and the accompanying documentary featuring a cast of scholars streaming April 14 in the U.S. on Wonder. Special documentary showings and concerts are scheduled 7-9 p.m. April 14 at Biola University in La Mirada, Calif., and April 15 from 7-9 p.m. at the Museum of the Bible in Washington, D.C.

And another site here.  The razzmatazz is a little alien to the world of scholarship, but if it brings interest and money to papyrology then only a fool could disapprove.  (Although past experience suggests that papyrology actually does contain a significant number of elitist fools….)

The articles tend to give the impression that this is a fresh discovery. But it is not.

It is in fact P.Oxy 1786, published in 1922 in volume 15 of the Oxyrhynchus Papyri.  It is held in the Sackler Library in Oxford.  There are pictures online at the Oxyrhynchus site here.  There is even a Wikipedia article about it.

Well done, John Dickson.


Kassel University online manuscripts – a fabulous interface!

Well here’s something special! (via this Twitter post)  The image below (online here) is fairly familiar.  It shows the “serpent column” in Constantinople, as it was in the 16th century before the heads broke off.  The column is still there, in the Hippodrome.  It is, in fact, the ancient Greek monument commemorating the battle of Plataea, where the Greek cities defeated the Persians.  On it are inscribed the names of all the cities that sent soldiers.  But this is not what makes this site special.

Kassel 4° Ms. hist. 31 (Türkisches Manierenbuch / A Book of Turkish Customs), image 33 / f15r

The whole manuscript is there! The image is on folio 15r, which is the 33rd image in the manuscript.  The manuscript itself is a 16th century collection of illustrations of Turks in costume, with a few other things like this.  Such collections of pictures exist at other libraries too.

The interface is actually useful, at least on PC.  You get thumbnails, you get IIIF, you get proper references.  It’s really rather marvellous.  Universität Kassel have excelled!  The platform is something called “Orka”, and frankly this is very nice.

The breadcrumbs at the top make it easy to find the collection, select the Latin manuscripts, display a list of shelfmarks.  Whoever designed this actually talked to people who use these sites.

There are some 474 Latin manuscripts dated before 1500, which is very respectable.  And, blessedly, you can display 100 mss at a time, in various orders.

It’s tremendously useful.  It’s now time to note that the Kassel manuscripts are online, and may be accessible and usable.
