Bodleian Library manuscripts can now be downloaded as PDFs!!

I was looking at the online copy of the Bodleian manuscript of Plato, the “Clarkianus” 39 (here), when I discovered something wonderful.  We can now download the whole thing as a PDF!

This is just so amazing!  It also means that any cyber-attack can only do so much damage, if you have offline copies.

Here are the screen grabs of what to do:

1.  Go to the manuscript online:

2.  Click on the “Download” icon and you get this.

3.  Click on the download for the whole item.

Note that if you select a page range, it has to assemble that offline and email you, so it takes longer.

That’s it!  It’s actually the best user interface for downloads that I’ve yet seen.  Nice!

The only downside is resolution.  The download of this manuscript (871 pages) is a pretty massive 800 MB.  If you look at folio 1r, the scholia are a bit fuzzy.  So for these you still need to use the website.  It would be good to have an “ultra-high res, kiss your disk space goodbye” option.  But it’s still a huge step forward.


A new project: “translating key pieces of patristic pseudepigrapha into English” by Nathan Porter

A post on Bluesky by Nathan Porter:

Now online, and coming soon to an airport near you, is the first English translation of the Pseudo-Athanasian work, De Incarnatione et contra Arianos.… So begins my long-term project of translating key pieces of patristic pseudepigrapha into English.

Coming soon: Ps-Basil, Against Eunomius IV and V Ps-Athanasius, Dialogues on the Trinity Ps-Epiphanius, Homily on the Resurrection Anonymous, Life of Amphilochius.

On the Academia page he adds:

This is the first English translation of the Pseudo-Athanasian work De Incarnatione et contra Arianos (PG 26: 984-1028). Though it has received little scholarly attention, it is a work of considerable interest for its novel exegesis of biblical texts and unusual theological formulations. Some have attributed it to Marcellus of Ancyra, though probably erroneously.

The work is CPG 2806.  The edition is that of Montfaucon.  Interestingly there is a Latin version in Florence BML 584, of the 9-10th century; an Armenian version, and a Syriac version in the CSCO series!


So… farewell, Abbyy Finereader, but why did you just commit suicide?

It must be 20 years ago or more that I first stumbled upon the OCR software, Abbyy Finereader.  I was enthralled, and I bought it, with the option for Cyrillic recognition.  At the time the word was that it had originally been developed for the KGB!  It was much better than anything else.

Since that day I have bought every upgrade.  These appeared every couple of years, and always gave you a  bit better OCR.  The user interface was not really improved tho – they tended to mess with it, when it worked really very well.  And currently I am using Abbyy Finereader 15 Pro.  This is a wonderful OCR engine.  In the last couple of versions, the software developers have gone a bit insane, and started forcing you to produce PDF as your output.  But in fact they don’t do PDFs that well!  Never mind – it is still possible to just do straight OCR.  The addition of the Fraktur module is good also.

But … disaster!  I learn today that the idiots and nincompoops at Abbyy have decided to make it available only on a “subscription” model.  You can’t buy it any more.  Instead they will lease it to you for a year, for the same price as a purchase used to be.  At the end of the year, you have to pay again.  And again.

I have never purchased a subscription for any software, and I never will.  This is predatory pricing, and it should be illegal.

We all know that Microsoft have their “Office 365” subscription.  A few years back I was horrified to find that a poor girl living on benefits was paying for a subscription.  She had to count every penny; yet Microsoft was bleeding her each month.  I quickly put a stop to that, I should say.

Last month I discovered that my late mother had also been taken in by this scam, and was paying a monthly sum to Microsoft just to do basic word-processing.

This is classic monopoly abuse.  Create a monopoly, then force people to accept predatory prices that benefit only the monopolist.  Instead of bringing in better software each year, so that people want to buy something better, how much easier it is to just force them to pay again for the same thing?

Microsoft can get away with it, because word-processing is essential, and they have donated heavily to the US political establishment.

But I rather doubt that Abbyy has a monopoly.  All they have done is to ensure that I don’t buy any more upgrades.

All the same, it’s a shame.  Abbyy Finereader really was good.  I always recommended it.

Those who don’t feel like being robbed like this may wish to know that Google Docs does OCR for free, and for an even wider range of languages than Abbyy.


“Scriptor Syrus”, the scholiast on Dionysius bar Salibi: oft-quoted, but from where?

Something that comes around every year at this time is a quotation from a certain “Scriptor Syrus,” supposedly about the origins of Christmas.  Often it is supposed to be 4th century. This is the usual wording.

It was a custom of the pagans to celebrate on the same Dec. 25 the birthday of the sun, at which they kindled lights in token of festivity …Accordingly, when the church authorities perceived that the Christians had a leaning to this festival, they took counsel and resolved that the true Nativity should be solemnized on that day.

There is an excellent post at Andrew McGowan’s blog here about this “quote”, and the many errors and falsehoods involved, and a mention by Tom Holland.  It is, in fact, a marginal note by an unknown Syrian writer (= “scriptor syrus”) in a manuscript of the works of Dionysius bar Salibi, a 12th century Syriac author.

There is a somewhat fuller translation by Ramsay MacMullen, Christianity and Paganism in the Fourth to Eighth Centuries, Yale (1997), p.155:

A twelfth-century Syrian bishop explained,

“The reason, then, why the fathers of the church moved the January 6th celebration [of Epiphany] to December 25th was this, they say: it was the custom of the pagans to celebrate on this same December 25th the birthday of the Sun, and they lit lights then to exalt the day, and invited and admitted the Christians to these rites. When, therefore, the teachers of the church saw that Christians inclined to this custom, figuring out a strategy, they set the celebration of the true Sunrise on this day, and ordered Epiphany to be celebrated on January 6th; and this usage they maintain to the present day along with the lighting of lights.”[8]

p.244, 8.  Dionysius Bar-Salibi, bishop of Amida, whom I quote from the Latin of G. S. Assemani, Bibliotheca orientalis Clementino-Vaticanae 2 (Rome 1721) 164; and compare such other festivals as that of the Natale Petri of February, particularly in Fevrier (1977) 515, who protests against apologetic arguments to insulate the choice of date from any pagan antecedents or competition.

The overt polemical purpose of the modern author needs no discussion. But the reference is a useful entry-point to try to find the actual source.

What work are we talking about?  What manuscript?

Assemani was an Eastern Christian who published a whole series of extracts from eastern authors, in the original language, in his Bibliotheca orientalis Clementino-Vaticanae, with commentary and translation in Latin.  These are now online, and volume 2, page 164 may be found at Google books here.  The text is in two columns.  The original language is given, a text in italics is the translation, and Assemani’s own words are in normal text.

Page 164 from Bibliotheca Orientalis Clementino-Vaticanae, vol. 2 (1721)

Assemani introduces our scholiast thus (Google translate follows):

Hunc tamen Armenorum ritum, quem hic rejicit Bar-Salibaeus, anonymus nescio quis Syrus probare contendit in margine apud eundem Bar-Salibaeum fol. 43. a tergo, his verbis:

However an anonymous Syrian, I don’t know who, tries to prove this Armenian rite, which Bar-Salibaeus here rejects, in the margin in the same Bar-Salibaeus fol. 43. on the back, in these words:

Then follows the Syriac text, and then the Latin translation prepared by Assemani:

Mense Januario natus est Dominus eodem die quo Epiphaniam celebramus, quia veteres uno eodemque die festum Nativitatis & Epiphaniae peragebaret, quoniam eadem die natus & baptizatus est. Quare hodie etiam ab Armenis uno die ambae festivitates celebrantur. Quibus adstipulantur Doctores, qui de utroque festo simul loquuntur. Causam porro, cur a Patribus praedicta solemnitas a die 6. Januarii ad 25. Decembris translata fuit, hanc fuisse ferunt. Solemne erat ethnicis hac ipsa die 25. Decembris festum ortus solis celebrare; ad augendam porro diei celebritatem, ignes accendere solebant: ad quos ritus populum etiam Christianum invitare & admittere consueverant. Quum ergo animadverterent Doctores ad eum morem Christianos propendere, excogitato consilio eo die festum veri Ortus constituerunt; die vero 6. Januarii Epiphaniam celebrari jussere. Hunc itaque morem ad hodiernum usque diem cum ritu accendendi ignis retinuerunt. Et quoniam sol duodecim gradus ascendit Dominus natus est hac die tertiadecima, & sicut S. Ephram docet, Solis justitiae & duodecim Apostolorum ejus mysteria repraesentat. Numerus, inquit S. Doctor, denarius perfectus est. Die decima Martii uterum intravit. Numerus item senarius perfectus est. Die 6. Januarii utramque partem nativitas ejus reconciliavit.

In the month of January, the Lord was born on the same day on which we celebrate the Epiphany, because in the olden days the festival of Nativity and Epiphany was held on the same day, since he was born and baptized on the same day. Therefore, even today, both festivals are celebrated by the Armenians. The Doctors [of the Church] support this, who speak of both festivals at the same time. Furthermore, the reason why the aforesaid solemnity was transferred by the Fathers from the 6th of January to the 25th of December, they say was this. It was traditional for the pagans to celebrate the birth of the sun on this very day, the 25th of December; to further enhance the celebration of the day, they used to light fires: to which rites they were accustomed to invite and admit even Christian people. When, therefore, the Doctors noticed that the Christians were inclined to that custom, they devised a plan and established on that day the feast of the true Resurrection; but on the 6th of January they ordered that the Epiphany be celebrated. So they have kept this custom to this day with the ritual of lighting fires. And since the sun has risen twelve degrees, the Lord was born on this thirteenth day, and as St. Ephraim teaches, he represents the mysteries of the sun of justice and his twelve apostles. The number, says the Holy Doctor, is a perfect denarius. On the tenth of March he entered the womb. The same number is perfect. On the 6th of January his birth reconciled both parties.

I don’t understand the bit about “denarius”; is it a typo for “senarius,” which seems to mean “a multiple of six”?  But it doesn’t matter for our purposes.  Assemani then continues his work by introducing a different extract from fol. 125 concerning Caiaphas, of no relevance here.

So these words, by the anonymous “Syrian writer”, are on folio 43v of the manuscript used by Assemani.

But what is this a manuscript *of*?  What text?

Looking up to page 161, I see that Assemani is quoting material from folio 37v of this manuscript of a work by Dionysius bar Salibi, about the “progenitores” of Christ, from Luke’s gospel:

Quos Lucas refert Christi progenitores, eos ex Africano, Eusebio, Nazianzeno, Sarugensi, Graecisque & Syriacis Codicibus sic enumerat fol.37. a tergo:

He enumerates those whom Luke gives as progenitors of Christ, from Africanus, Eusebius, Nazianzen, [Jacob of] Sarug, from Greek and Syriac manuscripts, on fol. 37v:

He then continues with a passage from folio 161, on the nativity of Christ, before adding the material above from the scholiast.  It’s odd that this jumps about like this.

On pp.157-8, it all becomes clear.  Assemani is giving extracts from the Commentary on the Four Gospels by Dionysius bar Salibi, and he is extracting this material from a Vatican manuscript:

Commentaria in Testamentum Vetus & Novum. Et quidem expositio in quatuor Evangelia exstat in Cod. Syr. Vatic. 11. & in Cod. Syr. Clem. Vat. 16. a fol. 27. usque ad fol. 263. ejusque duo exemplaria in Bibliotheca Colbertina haberi testatur Renaudotius tom. 2. Liturg. Orient. pag. 454.

Commentaries on the Old and New Testaments. And indeed an exposition of the four Gospels exists in Cod. Syr. Vatic. 11, and in Cod. Syr. Clem. Vat. 16, from fol. 27 up to fol. 263. Renaudot testifies (Liturg. Orient. vol. 2, page 454) that two copies of this are held in the Bibliotheca Colbertina [i.e. now in the French National Library].

So… let’s take it further.  A lot of Vatican manuscripts are online.  But when I use the excellent Wiglaf guide to Vatican mss, and look at Vatican. Syr. 11, and Vaticanus Syr. 16, – I don’t think there is a “Clementine” subdivision of Syriac manuscripts – I find that neither has scholia on fol. 43v.  Someone has messed up the numbering of the manuscripts since!  It turns out that Assemani and his son did so, later in life, in the 1750s.  The marvellous website tells me of a concordance by Hyvernat, “Vatican Syriac Mss Old And New Press Marks” (1903), online here.

But this too is useless.  The old “Vat. Syr. 1” became Vat. Syr. 19, online here, but there is still no marginal note on folio 43v.  Hyvernat does not explain the “Clem.” collection at all.

Thankfully Hyvernat tells us about a catalogue composed by Assemani and son, and gives links to text-searchable PDF’s!

Looking at these, if we do a text search for “Salib”, we find that manuscript 156 contains Dionysius bar Salibi.  But… no scholion on fol. 43v.  In fact the manuscript has been divided into two parts, and part 2 is also online here.

The catalogue for Vat. Syr. 156 says the Luke portion begins on fol. 188, which doesn’t sound right.  But at the end it says “see ms 155, fol. 161v”.  And when I look at the catalogue entry for Vat. Syr. 155 – it too contains Dionysius bar Salibi!  The text search had missed it.  Are these two, perhaps, the two manuscripts that Assemani used, now placed side by side?  Hyvernat says look at the start of the catalogue entry, there may be the old shelfmark there.  And…

CLV. Codex in fol. bombycinus, foliis constans 294. Syriacis recentioribus literis exaratus, inter Syriacos Codices, a nobis in Vaticanam Bibliothecam inlatos, olim Decimus sextus: quo continentur:

155.  Folio manuscript on cotton-paper, consisting of 294 leaves, written in modern Syriac letters, one of the Syriac manuscripts brought by us into the Vatican Library, once the Sixteenth: which contains:

So this is indeed the one-time manuscript Vat. Syr. 16!   Hyvernat expresses himself bitterly toward the authors of the catalogue – “of no practical use” –, and, after more than two hours working on this, I too am less than chuffed with them.  The manuscript was never simply “Vat. Syr. 16”; prior to the reorganisation it was, in fact, Vat. Syr. Assemani 16; and the other manuscript, 156, was Vat. Syr. Assemani 46.  Aaargh!

But … viewing Vat. Syr. 155 on folio 43v – there is a long scholion!  We’re there!  It matches!

Vatican Syr. 155, folio 43v – the scholion on Dionysius bar Salibi, Commentary on Luke, discussing the date of Christmas

One last wrinkle.  The catalogue (part 3, p.297) tells us that Luke is on fol.160v onwards.  That is item 23 in this manuscript, which contains various texts.  So what is fol. 43v part of?  Well, item 21 is the commentary on Matthew, starting on folio 32, and continuing to fol. 148v.  Not Luke, as anyone would infer from the original in the Bibliotheca Orientalis, unless they were very careful.

So this passage by “Scriptor Syrus” is, in fact, a scholion by some unknown person, on a passage in the Vatican Syr. 155 copy of Dionysius bar Salibi’s Commentary on Matthew.

It would be most useful to know exactly which passage of Dionysius bar Salibi is so annotated.  But there we must leave this.

Update: 24 Dec. 2023.  A useful comment from Syriacist Grigory Kessel is that Dionysius bar Salibi’s commentary on the gospels was printed in the CSCO series, with a Latin translation; and that the annotation above is against Dionysius’ comments on Matthew 2:1 (“Now when Jesus was born in Bethlehem of Judea in the days of Herod the king, behold, wise men from the East came to Jerusalem, saying,…”), and the relevant passage is here.  I imagine it relates to the paragraph on p.67, l.12 onwards, where 25 December is specified.  Thank you!


Working with Bauer’s 1783 translation of Bar Hebraeus’ “History of the Dynasties”

Following my last post, I’ve started to look at the PDFs of Bauer’s 1783-5 German translation of Bar Hebraeus’ History of the Dynasties.

It must be said that the Fraktur print is not pleasant to deal with.  But it could be very much worse!  I’ve seen much worse.  Here’s the version from Google Books:

And here is the same page from the MDZ library:

I’ve tried running both through Abbyy Finereader 15 Pro.  Curiously the results are better, on the whole, from the higher resolution MDZ version.  I had expected that the bleed-through from the reverse might cause problems – and it may yet!  Even more oddly, the OCR on the “Plain Text” version of Google Books is better still.

But there is a problem with using Google Books in plain text mode.  There is no way to start part way through the book.  You will always be placed at the very start, and you can only navigate by clicking “Next page” or whatever it is.  This is not good news if you have 100 pages to click through before you get to where you want to be.

The opening portion of these world chronicles is always a version of the biblical narrative about the creation, followed by material from the Old Testament, combined with apocryphal material.  I may be alone here, but I have always found these parts of the narratives unreadable.  When I translated Agapius, I started with the time of Jesus, part way through.  I did the same with Eutychius. I only did the opening chapters at the end, after I had translated all the way from Jesus to the end of the book first.  I recall that it felt like wading through glue. I might have given up, except that I had already invested so much time in the project.

Starting in the time of Jesus immediately introduces us to familiar figures.  On page 88 of volume 1, the “Sixth Dynasty” starts, with Alexander the Great.  It ends on page 98 with Cleopatra.  Each section starts with a familiar name, one of the Ptolemies in most cases.

On page 99, dynasty 7 begins, after an introduction, with Augustus.  The dynasty ends on p.139 with Justinian.  Each ruler gets a paragraph, often only a few sentences.

It’s all do-able, clearly.  I’m not sure that I want to get into working on this book seriously, with the St Nicholas project still in mid-air.  But it’s not hard work, which is something!


An adventurer in Arab Christian Studies – Prof. Bartolomeo Pirone

None of the histories of Arabic Christian literature – Agapius, Eutychius, Yahya ibn Said al-Antaki, Al-Makin, Bar Hebraeus – exist in English translation.  This site has made some modest efforts to remedy this, by turning the French translation of Agapius and the Italian translation of Eutychius into English, and posting them online.  Judging from queries received, the effort has been worthwhile, and has drawn attention to both.  It was difficult to obtain a copy of the Italian translation, but eventually I located and purchased one over the web from the Franciscan bookshop in Jerusalem, where it had plainly sat and gathered dust for many years.  The translator was a certain Bartolomeo Pirone, of whom I knew nothing.

Indeed how many of us are that aware of material in Italian?  Even though Google Translate handles Italian very well these days, few of us have any idea what is out there.  Yet there are invaluable translations of otherwise inaccessible patristic material.

A few days ago I became aware of a series of translations into Italian of Arabic Christian literature, the PCAC series.  This includes 30-odd texts from the literature of the Christians in the Near East, such as Theodore Abu Qurrah.  The region was occupied by Islam in the 7th century, and they were obliged to write in Arabic from the 9th century onwards, as the cultural pressure became irresistible.  But it is, at that period, a branch of Byzantine literature, and full of interest.

Much to my surprise, I discovered that the series was edited by none other than the same Dr Bartolomeo Pirone.  Now retired but still active, he was a full professor at the University of Naples L’Orientale, and lectured in Cairo and Beirut.  Judging from a Google search, he has dedicated a portion of his life to making this literature known, in the most obvious way possible: by translating it into the vernacular, and gathering other scholars to do likewise. Indeed I have at this very instant just discovered that he also made a translation of Agapius into Italian![1]  But this does not exhaust his work, which also includes Muslim literature, and the interaction between Christianity and Islam.

Much of his work was published by the Franciscan Province of the Holy Land, known as the “Custody of the Holy Land”.  This in turn explains why a copy of his standalone translation of Eutychius was available in their bookshop in Jerusalem.  There is an article from 2018 at the Franciscan website here, celebrating his 40 years of research.

Prof. Bartolomeo Pirone

I would imagine that very few people in the English-speaking world have ever heard of Dr Pirone and his immensely valuable work on an area of literature known to very few.  But if you are at all interested in Arabic Christian literature, and especially if you – like myself – do not know any Arabic, then you need to know about his work.

[1] Agapio di Gerapoli, Storia universale, Terra Sancta (2013), ISBN 9788862401647.


Getting manuscript reproductions in the UK – important and useful court judgement?

Via Dr Bendor Grosvenor on Twitter, I learn of an interesting court case about “image fees”.  According to Dr. G, this is very good news for manuscript researchers, and historians in general, and also for those who want to download and post online images of out-of-copyright material.  Here’s his thread:

Those of us who’ve had to pay image fees will know the system relies on museums claiming copyright in their photos – irrespective of whether the art they’re photographing is itself in copyright. (In the UK, copyright lasts for 70 years after the death of the artist).  In other words, a painting by John Constable may be long out of copyright, but taking a photo of it creates a new copyright in that photo. By restricting the taking or sharing of other photos, museums force us to use their own photos for publication, and thus charge large sums.

Copyright is the glue which holds the system together, otherwise, we’d be able to either take a photo from the museum’s website, or use a photo someone else has already paid for. The ‘copyright licence’ we buy prevents us from sharing the image for wider re-use.

In the UK, this copyright claim has for long been contentious. For example, under the 2019 EU Copyright Directive (Article 14), it is not possible to claim copyright in a straightforward reproduction of a work of art which is itself out of copyright (older than 70 years).  The relevant bit of Art. 14: “when the term of protection of a work of visual art has expired, any material resulting from an act of reproduction of that work is not subject to copyright or related rights unless the material resulting from that act of reproduction is original in the sense that it is the author’s own intellectual creation.”

In other words, take a straightforward photo of the Constable painting = no new copyright in your photo. But pose something in front of it, add an extra cow in Photoshop = new copyright in your photo.

For many of us, that EU Directive looked like the end to image fees in the UK – but Brexit happened just before ratification was required in member states.

In the UK, museums and image libraries relied on the UK’s Copyright, Designs and Patents Act 1988, which appeared to give copyright to your photo of the Constable simply because of the effort you took in taking it. This was called the ‘sweat of the brow’ concept.  In other words, you did not need to demonstrate any creative effort, or add any personal touch, to claim your copyright. BUT, since 1988, various EU and UK judgements have eroded the ‘sweat of the brow’ concept.

But the situation was still not entirely clear, until now. In an Appeal Court judgement this November (THJ v Sheridan [2023] EWCA Civ 1354). Here’s the full judgement.

(And here (to which I am indebted) is Prof. Eleonora Rosati @eLAWnora  commentary on the judgement.)

Para 16 rules that, for copyright to pertain: ‘What is required is that the author was able to express their creative abilities in the production of the work by making free and creative choices so as to stamp the work created with their personal touch.’

So, taking a straightforward photo does not count, nor does getting the lighting right or other labour of a ‘technical’ kind.

What does this mean for the image fee system which strangles so much art historical scholarship, prevents the public learning about the art they own, and acts as a tax on knowledge? In the UK, it means it’s over.  In fact, because in THJ v Sheridan, the judges said the ‘skill and labour’ test has not been valid *since 2004*, it suggests that all those ‘image licences’ which have been sold relying on copyright have been invalid, and (I suspect?) mis-sold.

Those of us who’ve been campaigning against image fees have been arguing (with hard evidence) that the system doesn’t raise meaningful revenue for museums (and in many cases, costs them money).  But to little avail, as far as museums are concerned. They just carried on charging, insisting they had copyright, which encouraged publishers to insist we kept buying ‘licences’. And now we know that for historic, 2D artworks it’s basically been a scam.

What do we do now? I suppose museums can carry on restricting the availability of decent photos. That’s why Tate’s website only lets us see low-res photos (of the art we own).  But without the glue of copyright, the system must collapse, because there’s nothing to stop images being re-used.  So, if you’re able to take a tolerably good photo of a historic artwork from online for your publication, do so.  Don’t let publishers and journals bully you into buying ‘licences’. Don’t agree to label photos (C) when no copyright exists.  And if you’re a museum director or trustee, think hard about your museum mis-selling licences for the last two decades.

Note that this is clearly downstream of the EU ruling.  This now leaves the USA behind, at least until some public-spirited person clarifies the law there.

The actual court case was about whether a GUI could be copyrighted, so it isn’t really the same thing.  But the case is about “originality” in copyright, and this is what lies behind the claim of museums that a photograph is an “original work” and therefore in copyright. There is discussion of the case on these sites:

UK Court of Appeal rules on copyright in GUIs

Originality in copyright – a review of THJ v Sheridan

Let us hope that the judgement does indeed mean what Dr G. says that it does, and frees up public domain material for the use of us all.  I suspect the foot-dragging will be immense, tho.


More experiments with Amharic and technology

In my last post I found that it was possible to turn a PDF full of images of Amharic text into recognised electronic text using Google Drive, and then get some translation of the results into English using Google Translate.

There were some extremely interesting comments made on the post, which I have been reading.  I have also prepared a PDF of the whole text of the Life of Garima by Yohannes, and run that through the Google Drive process.

Where we started was in trying to read a passage of this text, in which – supposedly – God stopped the sun so that St Garima could copy the bible in one day.  The summary of the work given by Rossini (instead of a proper translation, drat him), indicates that this was on lines 356-60 of his text, which turns out to be the last line of p.161 and the first three of p.162.  Here they are:

The output from the OCR is good, but you still have to compare the characters carefully.  Errors can often be picked up just by dumping the raw scan output into Google Translate, which shows things like numerals.

Here we have a character that is plainly wrong, and coming out as a numeral “4”.  It looks like an “o” with a hat and two dots under.  The two dots under are legs in another copy of Rossini.

I’m guessing that it’s a “ge” character, from looking at the Wikipedia article, but I can’t be sure. The script isn’t an alphabet, but a syllabary, based on syllables.  Each character is a consonant followed by a vowel, which makes for a lot more characters.  There’s a table of the characters on the Wikipedia article, consonants down the left, vowels across the top.  I’ve not really looked at this.
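One way to take some of the eyeball work out of this is to let Unicode do the naming.  What follows is only a minimal sketch, not something I have run against Rossini: it assumes the OCR output is ordinary Unicode text, and it only knows about the main Ethiopic block (U+1200 to U+137F), ignoring the extended blocks.  The function names are my own invention.  The first function lists any character that is not Ethiopic at all, such as a stray Latin “4” or a Roman colon; the second spells out the Unicode name of each character, which for a syllable gives the consonant and vowel.

```python
import unicodedata

# Main Ethiopic Unicode block. Assumption: the supplement and extended
# blocks are ignored in this sketch.
ETHIOPIC_START, ETHIOPIC_END = 0x1200, 0x137F


def suspect_chars(text):
    """Return (position, character, Unicode name) for every character
    that is neither whitespace nor inside the main Ethiopic block."""
    suspects = []
    for pos, ch in enumerate(text):
        if ch.isspace():
            continue
        if not ETHIOPIC_START <= ord(ch) <= ETHIOPIC_END:
            suspects.append((pos, ch, unicodedata.name(ch, "UNKNOWN")))
    return suspects


def describe(text):
    """Return the Unicode name of each non-space character. For a
    syllable the name spells out its consonant and vowel."""
    return [unicodedata.name(ch, "UNKNOWN") for ch in text if not ch.isspace()]
```

Pasting a line of the raw OCR output into `suspect_chars` should show at a glance where a Latin digit or punctuation mark has crept in, and `describe` saves looking each character up in the Wikipedia table.  It cannot tell you whether an Ethiopic character is the *right* Ethiopic character; that still needs a comparison with the page image.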

The Google translate output is also interesting because of the choice of “detected language” – Tigrayan, rather than Amharic.  If you force it to Amharic, you get a lot less meaning.

One awkward part of using Google Drive to do the OCR is that it doesn’t preserve the line breaks.  That makes comparing the lines more awkward.   So you have to manually do this:

፬ ፡ ወኮነ ፡ በአሐቲ ፡ ዕላት ፡ ወነሥአ ፡ መጽሐፈ ፡ ወቀለመ ፡ ወወጠነ፡
ይጽሐፍ ። ወተንሥአ ፡ ለጸሎት በሰርክ ። ወጸሐፉ ፡ ሎቱ : መላእክት ፡ ወንጌ ለ ፡
በ፬ ፡ ሰዓት ፡ ወትርጓሜሁ ። ወመላእክተ ፡ እግዚአብሔር ፡ ወትረ ፡ ይት ለአክዎ ፡
ወእግዚእነሂ ፡ ክርስቶስ ፡ ያንሶሱ ፡ ምስሌሁ ። ወተሰምዐ ፡ ዜናሁ :
ውስተ ፡ ኵሉ ፡ ሀገር ። ጸሎቱ ፡ ወበረከቱ ፡ የሀሉ ፡ ምስሌነ ።

The Wikipedia article mentioned earlier gave me a list of punctuation marks.  There are two sorts of punctuation visible in here.  The colon mark is actually word division, which means that some words above go over two lines.  I’ve chosen not to split words above.  The double colon mark “::” is the full stop.  Interestingly Google Translate gives different results if you remove the spaces!

Going through the electronic text, removing spaces, I notice that sometimes the word-separator isn’t detected by the OCR.  So I added that in.  Sometimes it put a Roman colon instead, so I replaced that.  Finally I split on sentence:

ወጸሐፉ፡ሎቱ፡መላእክት፡ወንጌ ለ፡በ፬፡ሰዓት፡ወትርጓሜሁ።

And run it again and I get this:

But this still is not good enough to do much with.  If we didn’t have an idea what the text said, this would not tell us.

All this fiddling about would certainly get you into contact with the language, and start you on a journey to learning it.  But it’s not good enough a translation for other purposes, although intriguing.
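The manual clean-up described above (swapping Roman colons for the proper word-separator, removing the spaces around the marks, and splitting on the full stop) could probably be scripted.  Here is a rough sketch in Python, on the assumption that every Roman colon in the OCR output is really a misread separator.  The function name is hypothetical.  It deliberately leaves alone any space that has no separator next to it, because such a space may be either a missed separator or a word broken across a line, and only a look at the page can say which.

```python
import re

WORDSPACE = "\u1361"  # the Ethiopic word separator (looks like a colon)
FULL_STOP = "\u1362"  # the Ethiopic full stop (looks like a double colon)


def clean_ocr(text):
    """Tidy raw OCR output and return a list of sentences."""
    # Assumption: a Roman colon is always a misread word separator.
    text = text.replace(":", WORDSPACE)
    # Drop spaces and line breaks on either side of the two marks.
    text = re.sub(rf"\s*([{WORDSPACE}{FULL_STOP}])\s*", r"\1", text)
    # Any whitespace left has no mark beside it: collapse it to a single
    # space and leave it in place to be checked by eye.
    text = re.sub(r"\s+", " ", text).strip()
    # Split into sentences, each keeping its closing full stop.
    return re.findall(rf"[^{FULL_STOP}]+{FULL_STOP}?", text)
```

Each item in the returned list can then be pasted into Google Translate on its own.  Any sentence that still contains a space is one to compare against the printed page before trusting it.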

One suggestion that was made in the comments to the last article was that ChatGPT gave better results.  The output quoted was indeed produced, and was very smooth and seemed to be a series of liturgical prayers.  But… I don’t think that this is actually the content.  These AI tools are really only an improved version of the text prediction tools you get on messaging on a mobile phone.  So it was pumping out garbage.

Anyway I tried it on this passage, and it crashed GPT very effectively!  At the moment I can’t get any reply of any sort, not even to “hello”.

I don’t think that I will do more here.  Clearly the technology is almost, but not quite good enough to be useful.


Is it possible to read editions of Amharic texts? An experiment

In my last post I mentioned how the Life of St Garima in Ethiopian was printed by Rossini, but without a translation.  In fact it has never been translated into any modern language, to my knowledge.  I don’t know any Ethiopian, and I doubt that I ever will.

But we live in an age of wonders, when it comes to unfamiliar languages.

So… is it possible to work with Ethiopian language editions, even if you know no Ethiopian?  What about Google Translate?  Ethiopian is in this heavy unfamiliar script.  Is there OCR for this?  If you can scan Rossini’s edition, can you pop it into Google Translate and get the English?

There are two sorts of Ethiopian out there, I know.  There is Ge`ez, or classical Ethiopian; and there is Amharic, the modern dialect.  Rossini printed his text from a 19th century manuscript.  So it seems likely that this is in Amharic.

A quick Google confirmed it: Google Translate knows Amharic!  A bit of googling found me an Amharic news website online, here.  I’m using Chrome, so all I had to do was right-click anywhere and select “Translate to English” and the whole website was rendered into some sort of English.  And… it worked!!  Yay me!  It’s obviously not 100%, but it’s way better than 0%!

So what about OCR?  I was sad to see that Abbyy Finereader apparently doesn’t support Amharic.  That’s a blow.  It was developed originally to handle Cyrillic, so it certainly has the capability.  But it’s not offered.  Drat.

A bit of googling brought me to a dubious-looking website here, claiming to offer a selection of tools which could do Amharic OCR.  The prose felt a bit machine-generated, so I worried that it was bunk, or worse, a malicious site.  But the first option was… Google Drive.

I never knew this, but it seems that, if you upload a PDF containing an image of text, and then open it in Drive as a Google Docs document, it OCRs the content.

Well, I thought, let’s give it a try.  So I extracted the first page of Rossini’s edition, using Adobe Acrobat Pro 9 – no flashy latest-edition stuff going on here!  Here’s a pic:

Then I uploaded it, and opened it as a Google Docs document.  And… it just treated the Amharic as an image.  Dang!  But I noticed that it did indeed OCR the Italian at the top of the page!

This is supposed to work.  So I thought maybe I should work over the image a bit.  I imported the one-page PDF into Abbyy Finereader 15, and chopped off the Italian at the top, and the critical apparatus at the bottom.  I then used the image editor in Finereader to “whiten the background”.  This can be flaky, but this time it worked fine, and I got a pure white background.   And I got this:

(I’ve just seen the marginal notes, which I need to chop off as well, so I’ll have to go round the loop again)
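
For the curious, the crop-and-whiten step can be imitated in a few lines of Python.  This is only a sketch of the idea, not what Finereader actually does internally (I have no idea about that): it treats the page as a grid of grey values from 0 (black) to 255 (white), keeps only the rows holding the main text, and pushes everything lighter than a cut-off to pure white.  The tiny `page` grid, the function names and the threshold of 160 are all made up for the illustration.

```python
def crop_rows(pixels, top, bottom):
    """Keep rows top..bottom-1, dropping the heading and the apparatus."""
    return [row[:] for row in pixels[top:bottom]]


def whiten_background(pixels, threshold=160):
    """Turn every pixel at or above the threshold into pure white (255)."""
    return [[255 if p >= threshold else p for p in row] for row in pixels]


# A toy 4x3 "page": light values are paper, dark values are ink.
page = [
    [200, 40, 210],  # row 0: Italian heading (to be cropped off)
    [190, 30, 220],  # row 1: main text
    [180, 25, 230],  # row 2: main text
    [205, 50, 215],  # row 3: critical apparatus (to be cropped off)
]

body = whiten_background(crop_rows(page, 1, 3))
```

The ink values are left alone and only the greyish paper is bleached, which is roughly what you want before handing the image to an OCR engine.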

I exported the image as a PNG, and I used Acrobat again to create a PDF from the image.  Then I uploaded the new PDF to Google Drive, and opened it as a Google Docs document.  And… it worked!  Sort of…

በስመ : አብ : ወወልድ ‘ ወመንፈስ ፡ ቅዱስ ፡ ፩ ፡ አምላከ ፡ ላዕሌሁ ፡ ተወ ከልኩ፡ ወቦቱ ፡ አመንኩ ፡ እስከ ፡ ላዓለመ ፡ ዓለም ፡ አሜን ።

ድርሳን ፡ ዘደረሰ ፡ ቅዱስ ፡ ዮሐንስ ፡ ኤጲስ ፡ ቆጶስ ፡ ዘአክሱም o ፡ በእንተ ዕበዩ ፡ ወክብሩ ፡ ለቅዱስ ፡ ይስሓቅ = ወይቤ ፤ ስምዑ ‘ ወልብዉ ፡ ኦአኀውየ 5 ፍቁራንየ ፡ ዘእነግረከሙ ። ርኢኩ ፡ ብእሲተ ፡ እንዘ ፡ ይዘብጥዋ ፡ ዕራቃ ወእንዘ ፡ ይሀርፉ ፡ ላዕሌሃ ፡ ወላዕለ ፡ እግዝእትነ ፡ ማርያም ፡ እንዘ ፡ ይብሉ በእንተ ፡ ወልዳ ፡ ክርስቶስ ፤ እምብእሲት ፡ ኪያሁ : ኢተወልደ ፣ ይብሉ ፡ እላ ፡ ኢየአምኑ ፡ በክርስቶስ = ወኮንኩ ፡ እንዘ ፡ እረውጽ ፡ ወአኀዝኩ እስዐም ፡ ታሕተ ፡ እገሪሃ ፡ ለይእቲ ፡ ብእሲት ፡ እንዘ ፡ ትብል ፤ እወ ▪በዝ ፡ አንቀጽ ፡ ወፅአ ፡ ንጉሠ ፡ ሰማያት ፡ ወምድር ። ወሶበ ፡ ትብል፡ ከሙዝ ፡ ወ

That’s… rather astonishing.  No idea what all that is, but it looks sort of right.  Let’s bear in mind that Rossini printed his edition in 1897.  This is not a modern typeface.  So this is rather good.

Next step was to paste it into Google Translate.  I set it to auto-detect the language, and pasted in the first bit.  And… it worked.  In fact it gave a really useful transcription into Roman letters as well, which makes it a LOT easier to manipulate the text.

OK, I’m cheating slightly.  The first time I uploaded, the translation ended at “Spirit”.  But this is a Google Translate bug – it sometimes omits the remainder of a sentence.  If you split the text with a line feed, you often get the rest.  And that’s what I did.  I worked out by experiment where I needed to be, and then I got the above.
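
The line-feed trick can be automated in a rough way.  This Python sketch starts a new line after each Ethiopic full stop or semicolon, so that every clause reaches Google Translate on a line of its own.  Which marks to break after is my assumption (I found the right places by experiment), and the function name is made up for the illustration.

```python
# Marks after which a line feed is inserted (an assumed choice):
# U+1362 is the Ethiopic full stop, U+1364 the Ethiopic semicolon.
BREAK_AFTER = "\u1362\u1364"


def one_clause_per_line(text):
    """Return the text with a line feed after each full stop or semicolon."""
    lines = []
    current = ""
    for ch in text:
        current += ch
        if ch in BREAK_AFTER:
            # End of a clause: store it and start a fresh line.
            lines.append(current.strip())
            current = ""
    # Keep any trailing text that had no closing mark.
    if current.strip():
        lines.append(current.strip())
    return "\n".join(lines)
```

Paste the result into Google Translate instead of the raw OCR output; if a sentence still gets cut short, it is worth trying a break at a different mark.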

I don’t quite believe the translation of the second sentence either.  I suspect I need to play with this a bit to work out what each word is.

I notice all those colons between every word.  It might help if I actually looked up the script online!

But I think you’ll agree that this is quite marvellous – I, who know absolutely nothing about the language, am getting something useful out!



Why we should use Latin spellings of Greek names

A twitter thread by @EzhmaarSul from June 11, 2023, made some interesting points about the use in English of spellings like “Nikaia” rather than “Nicaea”. Few will have seen it, and I’ve never seen another public discussion of the subject.  So let’s give it a bit more visibility.

It went as follows:

Something I really hate about modern amateur historians (and which will leak into the professional class as these amateurs achieve doctorates) is the mixing of Greek and Latin spellings of Greek names. I’ve fallen prey to the same because of the ubiquity of amateur historians.

It starts with people wanting to use phonetic spellings of closer to the original Greek.

This urge comes from a deranged, nerdish desire to “well actually” people through text. Not as malicious as BCE, but coming from an adjacent place in petty souls.

“It’s Nikaia, not Nicaea!”

Not only does this look ugly and wrong in modern English, which is based on Latin rules of spelling and grammar, but it betrays a certain philistinism.

Greeks don’t use our alphabet! You’re broadcasting to us, “I don’t know how to pronounce this unless I spell it wrong.”

This also screws up scholarship. We have centuries of scholarship referring to Alexius Comnenus and John Palaeologus. Then along comes some redditor-turned-PHD writing about “Alexius Komnenos” and “Ioannes Palaiologos.”

And they invariably f**k it up.

“Oh Theodoros is obviously Theodore, so I’ll call him Theodore Laskaris in my paper… but Ioannes is exotic! I’ll call him Ioannes even though everyone recognizes it’s the Greek version of ‘John.’”

Don’t get me started on “Constantine.”

Just stick with the Latin and Anglophone spellings, you buffoons.

I think the author has a point. It does look hideous.  It does create a barrier.  It makes Greek history look barbarous.

There is a definite tendency among elites to create barriers for others in order to advance themselves, to order others around while feeling smug.  How else did we end up with printed Latin texts where the useful modern separation of consonant and vowel, of “i”/“j” and “u”/“v”, was actually and deliberately abandoned?  So… I rather agree.