Is it possible to read editions of Amharic texts? An experiment

Posted on August 26, 2023 (updated March 28, 2025) by Roger Pearse

In my last post I mentioned how the Life of St Garima in Ethiopian was printed by Rossini, but without a translation. In fact it has never been translated into any modern language, to my knowledge. I don’t know any Ethiopian, and I doubt that I ever will.

But we live in an age of wonders, when it comes to unfamiliar languages.

So… is it possible to work with Ethiopian language editions, even if you know no Ethiopian? What about Google Translate? Ethiopian is in this heavy unfamiliar script. Is there OCR for this? If you can scan Rossini’s edition, can you pop it into Google Translate and get the English?

There are two sorts of Ethiopian out there, I know. There is Ge`ez, or classical Ethiopian; and there is Amharic, the modern dialect. Rossini printed his text from a 19th century manuscript. So it seems likely that this is in Amharic.

A quick Google search confirmed it: Google Translate knows Amharic! A bit more googling found me an Amharic news website online, here. I’m using Chrome, so all I had to do was right-click anywhere and select “Translate to English” and the whole website was rendered into some sort of English. And… it worked!! Yay me! It’s obviously not 100%, but it’s way better than 0%!

So what about OCR? I was sad to see that Abbyy Finereader apparently doesn’t support Amharic. That’s a blow. It was developed originally to handle Cyrillic, so it certainly has the capability. But it’s not offered. Drat.

A bit of googling brought me to a dubious-looking website here, claiming to offer a selection of tools which could do Amharic OCR. The prose felt a bit machine-generated, so I worried that it was bunk, or worse, a malicious site. But the first option was… Google Drive.

I never knew this, but it seems that, if you upload a PDF containing an image of text, and then open it in Drive as a Google Docs document, it OCRs the content.

Well, I thought, let’s give it a try. So I extracted the first page of Rossini’s edition, using Adobe Acrobat Pro 9 – no flashy latest-edition stuff going on here! Here’s a pic:

Then I uploaded it, and opened as a Google document. And … it just treated the Amharic as an image. Dang! But I noticed that it did indeed OCR the Italian at the top of the page!

This is supposed to work. So I thought maybe I should work over the image a bit. I imported the one-page PDF into Abbyy Finereader 15, and chopped off the Italian at the top, and the critical apparatus at the bottom. I then used the image editor in Finereader to “whiten the background”. This can be flaky, but this time it worked fine, and I got a pure white background. And I got this:

(I’ve just seen the marginal notes, which I need to chop off as well, so I’ll have to go round the loop again)

I exported the image as a PNG, and I used Acrobat again to create a PDF from the image. Then I uploaded the new PDF to Google Drive, and opened it as a Google Docs document. And… it worked! Sort of…

በስመ : አብ : ወወልድ ‘ ወመንፈስ ፡ ቅዱስ ፡ ፩ ፡ አምላከ ፡ ላዕሌሁ ፡ ተወ ከልኩ፡ ወቦቱ ፡ አመንኩ ፡ እስከ ፡ ላዓለመ ፡ ዓለም ፡ አሜን ።

ድርሳን ፡ ዘደረሰ ፡ ቅዱስ ፡ ዮሐንስ ፡ ኤጲስ ፡ ቆጶስ ፡ ዘአክሱም o ፡ በእንተ ዕበዩ ፡ ወክብሩ ፡ ለቅዱስ ፡ ይስሓቅ = ወይቤ ፤ ስምዑ ‘ ወልብዉ ፡ ኦአኀውየ 5 ፍቁራንየ ፡ ዘእነግረከሙ ። ርኢኩ ፡ ብእሲተ ፡ እንዘ ፡ ይዘብጥዋ ፡ ዕራቃ ወእንዘ ፡ ይሀርፉ ፡ ላዕሌሃ ፡ ወላዕለ ፡ እግዝእትነ ፡ ማርያም ፡ እንዘ ፡ ይብሉ በእንተ ፡ ወልዳ ፡ ክርስቶስ ፤ እምብእሲት ፡ ኪያሁ : ኢተወልደ ፣ ይብሉ ፡ እላ ፡ ኢየአምኑ ፡ በክርስቶስ = ወኮንኩ ፡ እንዘ ፡ እረውጽ ፡ ወአኀዝኩ እስዐም ፡ ታሕተ ፡ እገሪሃ ፡ ለይእቲ ፡ ብእሲት ፡ እንዘ ፡ ትብል ፤ እወ ▪በዝ ፡ አንቀጽ ፡ ወፅአ ፡ ንጉሠ ፡ ሰማያት ፡ ወምድር ። ወሶበ ፡ ትብል፡ ከሙዝ ፡ ወ

That’s… rather astonishing. No idea what all that is, but it looks sort of right. Let’s bear in mind that Rossini printed his edition in 1897. This is not a modern typeface. So this is rather good.

Next step was to paste it into Google Translate. I set it to auto-detect the language, and pasted in the first bit. And… it worked. In fact it gave a really useful transcription into Roman letters as well, which makes it a LOT easier to manipulate the text.

OK, I’m cheating slightly. The first time I uploaded, the translation ended at “Spirit”. But this is a Google Translate bug – it sometimes omits the remainder of a sentence. If you split the text with a line feed, you often get the rest. And that’s what I did. I worked out by experiment where I needed to be, and then I got the above.
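The splitting can be scripted rather than done by hand. Here is a minimal Python sketch, assuming that the Ethiopic full stop (።) and semicolon (፤) are reasonable places to break; the function name `split_for_translation` is my own invention. It is only a starting point: the truncation I hit happened in the middle of a sentence, so some manual splitting may still be needed.

```python
import re

# Ethiopic sentence-level marks: full stop (U+1362) and semicolon (U+1364).
# Assumption: these are sensible places to break the text for translation.
SENTENCE_MARKS = "\u1362\u1364"


def split_for_translation(text: str) -> list[str]:
    """Return the text cut into chunks, each ending at a sentence mark."""
    # Split just after a mark, swallowing any whitespace that follows it.
    chunks = re.split(rf"(?<=[{SENTENCE_MARKS}])\s*", text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# A short fragment of the OCR output, used only as an illustration.
ocr_text = "ዓለም ፡ አሜን ። ድርሳን ፡ ዘደረሰ ፡ ቅዱስ ፡ ዮሐንስ"
lines = split_for_translation(ocr_text)
```

Each entry in `lines` can then be pasted into Google Translate on its own line.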

I don’t quite believe the translation of the second sentence either. I suspect I need to play with this a bit to work out what each word is.

I notice all those colons between every word. It might help if I actually looked up the script online!
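For what it is worth, the "colon" is the Ethiopic wordspace (፡, Unicode U+1361), which separates words. That makes it easy to pull the words apart and to spot stray characters the OCR has introduced, such as "o" or "=". The following is a rough Python sketch; the names `tokenize` and `is_ethiopic` are hypothetical, and treating the ASCII colon as a misread wordspace is my assumption.

```python
# Ethiopic punctuation occupies U+1360..U+1368 (wordspace, full stop, comma,
# semicolon, etc.). Assumption: an ASCII colon is a misread wordspace.
PUNCTUATION = {chr(cp) for cp in range(0x1360, 0x1369)} | {":"}


def is_ethiopic(token: str) -> bool:
    """True if every character lies in the basic Ethiopic block."""
    return all("\u1200" <= ch <= "\u137f" for ch in token)


def tokenize(line: str) -> tuple[list[str], list[str]]:
    """Return (words, suspects): Ethiopic tokens and everything else."""
    # Turn every separator into a plain space, then split on whitespace.
    cleaned = "".join(" " if ch in PUNCTUATION else ch for ch in line)
    words: list[str] = []
    suspects: list[str] = []
    for piece in cleaned.split():
        (words if is_ethiopic(piece) else suspects).append(piece)
    return words, suspects


# A fragment of the OCR output containing two stray characters.
words, suspects = tokenize("ዘአክሱም o ፡ በእንተ ዕበዩ ፡ ወክብሩ ፡ ለቅዱስ ፡ ይስሓቅ = ወይቤ ፤")
```

Anything that lands in `suspects` is worth checking against the printed page. Note that this does not rejoin words that the OCR has broken across a line end.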

But I think you’ll agree that this is quite marvellous – I, who know absolutely nothing about the language, am getting something useful out!

Magic!

belmurru says:

August 26, 2023 at 9:00 pm

Here is ChatGPT’s translation of your Google Docs text: “In the name of: Father: Son: and Holy Spirit. Amen.

Praise to God the Father and the Son and the Holy Spirit, one God. Amen.

We praise you, we bless you, O God of truth. Amen.

Alleluia.

Rising from the dead on the third day, ascended into the heavens. Amen.

Sitting at the right hand of his Father. Amen.

And he is coming again in glory. Amen.

In the name of the Father and the Son and the Holy Spirit. Amen.”

Steve Ulrich says:

August 26, 2023 at 9:01 pm

I have a friend who is from Ethiopia and reads Amharic. It’s been probably 30 years since he emigrated here to the US.

Roger Pearse says:

August 26, 2023 at 9:01 pm

Interesting! Thank you!

Asha says:

August 27, 2023 at 11:03 am

Awesome work and tenacity! I used this site to force ocr.

https://tools.pdf24.org/en/ocr-pdf

Once you have attached the file, set the language to English or whatever language and force OCR.

Diego says:

August 27, 2023 at 1:22 pm

Google Books (US-restricted) and Hathi Trust also seem to do a decent job OCR’ing this page, even though their scans aren’t as good as yours:

https://books.google.com/books?pg=RA11-PA149&id=nw5BAQAAMAAJ&output=text

https://hdl.handle.net/2027/uc1.$c130996?urlappend=%3Bseq=337%3Bownerid=13510798903062332-355

matt says:

August 27, 2023 at 3:50 pm

I used ChatGPT … the result is magic! Instead of using Google Translate I “said” to ChatGPT “Translate to English” and the result was:

“In the name of: the Father, the Son, and the Holy Spirit. One God. Amen.

Declaration of faith: In the name of the Holy Trinity: Yohannes (John), Etsge (Ezge), Kogdis (Kogdis), and Aksum o, in the name of Ene Ebye’u (Virgin Mary) and the angels of the heavenly hosts, Amen.

Praise to the Holy Trinity, to Yohannes (John), Etsge (Ezge), Kogdis (Kogdis), and to the apostle Aksoum, the rock of faith, and to Kedus Michael, the leader of the heavenly angels, Amen.

Intercession: And through the prayers of the Mother of God, Saint Mary, O Lord, grant us mercy and forgiveness. And through the intercession of Kerya (Kyah), the great archangel, and E’elem (Eelem), the martyr, protect us from all harm, Amen.

The bishops of the land, may God bless them, and the clergy, and all the people, and this city, and the faithful departed, Amen. Glory be to God forever. Amen.”

Roger Pearse says:

August 27, 2023 at 3:52 pm

That is quite amazing! I’ll scan some more and try it!

Roger Pearse says:

August 27, 2023 at 3:53 pm

@Diego That’s very interesting. I’ll experiment with these also!

Roger Pearse says:

August 27, 2023 at 3:53 pm

@Asha Thank you for the tip!

Mattias Rouw says:

August 27, 2023 at 7:56 pm

I am playing a bit further with ChatGPT; depending on the prompt you get different results… it might be interesting to play and research for the best prompt to get a good translation, maybe in combination with Google Translate. I used the following prompt: “translate the words from the following text word for word into English:”

Now the result was:

Sure, here is the word-for-word translation of the provided text into English:

በስመ : አብ : ወወልድ ‘ ወመንፈስ ፡ ቅዱስ ፡ ፩ ፡ አምላከ ፡ ላዕሌሁ ፡ ተወ ከልኩ፡ ወቦቱ ፡ አመንኩ ፡ እስከ ፡ ላዓለመ ፡ ዓለም ፡ አሜን ።
In the name: Father: and Son, and Holy Spirit, One. Amen. By the prayers of the Virgin Mary, and all the saints, have mercy on us. Amen.

ድርሳን ፡ ዘደረሰ ፡ ቅዱስ ፡ ዮሐንስ ፡ ኤጲስ ፡ ቆጶስ ፡ ዘአክሱም o ፡ በእንተ ዕበዩ ፡ ወክብሩ ፡ ለቅዱስ ፡ ይስሓቅ = ወይቤ ፤ ስምዑ ‘ ወልብዉ ፡ ኦአኀውየ 5 ፍቁራንየ ፡ ዘእነግረከሙ ።
Church: Zion: and synagogue, and tabernacle, and altar o By the prayers of the Mother of God Mary, O Lord, grant us the forgiveness of our sins. Amen.

ርኢኩ ፡ ብእሲተ ፡ እንዘ ፡ ይዘብጥዋ ፡ ዕራቃ ወእንዘ ፡ ይሀርፉ ፡ ላዕሌሃ ፡ ወላዕለ ፡ እግዝእትነ ፡ ማርያም ፡ እንዘ ፡ ይብሉ በእንተ ፡ ወልዳ ፡ ክርስቶስ ፤
We believe in one God, God the Father Almighty, Creator of heaven and earth, and of all things seen and unseen.

እምብእሲት ፡ ኪያሁ : ኢተወልደ ፣ ይብሉ ፡ እላ ፡ ኢየአምኑ ፡ በክርስቶስ = ወኮንኩ ፡ እንዘ ፡ እረውጽ ፡ ወአኀዝኩ እስዐም ፡ ታሕተ ፡ እገሪሃ ፡ ለይእቲ ፡ ብእሲት ፡ እንዘ ፡ ትብል ፤
I believe in one God, the Father Almighty, Maker of heaven and earth, and of all things visible and invisible.

እወ ▪በዝ ፡ አንቀጽ ፡ ወፅአ ፡ ንጉሠ ፡ ሰማያት ፡ ወምድር ።
And ▪by the Holy Spirit, the Lord, the Giver of life, who proceeds from the Father, who with the Father and the Son is worshiped and glorified.

ወሶበ ፡ ትብል፡ ከሙዝ ፡ ወ
And in one, holy, catholic, and apostolic Church.

Note: The text appears to be a liturgical passage or creed, and some parts might not have a direct equivalent in English, so the translations provided are interpretations that capture the essence of the original text.

Roger Pearse says:

August 27, 2023 at 8:15 pm

That’s a very interesting ploy. A good trick to use when a sentence seems doubtful, to break it down.

In Google Translate you get a Roman-letters transcription, so at least you can count the number of words and check they seem about the same in number as the English. You can also see what is just “Amen”.

Please carry on. This is super stuff.
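The "Amen" check is easy to automate. The OCR text in the post contains አሜን once, whereas the first ChatGPT rendering in the comments says "Amen" seven times, which is a strong hint that the model was improvising. Below is a small Python sketch of that sanity check; the function names are made up for illustration, and a mismatch is only a warning sign, not proof of a bad translation.

```python
import re

AMEN = "አሜን"  # "Amen" as it is spelled in the OCR text


def amen_counts(source: str, translation: str) -> tuple[int, int]:
    """Count 'Amen' in the Ethiopic source and in the English translation."""
    in_source = source.count(AMEN)
    # Whole-word, case-insensitive match, so "Amendment" is not counted.
    in_translation = len(re.findall(r"\bamen\b", translation, flags=re.IGNORECASE))
    return in_source, in_translation


def looks_suspicious(source: str, translation: str) -> bool:
    """Flag a translation whose 'Amen' count differs from the source."""
    in_source, in_translation = amen_counts(source, translation)
    return in_source != in_translation


# Illustration: one Amen in the source fragment, two in the English.
flagged = looks_suspicious(
    "እስከ ፡ ላዓለመ ፡ ዓለም ፡ አሜን ።",
    "In the name of: Father: Son: and Holy Spirit. Amen. Praise to God. Amen.",
)
```

The same idea could be extended to other words that are easy to recognise, such as proper names.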

Douglas Domingo-Foraste says:

August 28, 2023 at 1:00 am

Ah, the goddess Technologeia works her miracle cure on the Amharic, but only if it sleeps in the abaton and thinks holy thoughts.

Pingback: More experiments with Amharic and technology – Roger Pearse
eoc says:

October 2, 2023 at 8:05 am

Anthropic’s Claude has a very different interpretation:

“Based on reviewing the passage more closely, I believe this text is in Ge’ez rather than Amharic. Here are a few reasons:
- The opening invocation references “Father, Son, and Holy Spirit, one God” which is very common in Ge’ez Christian texts.
- There are frequent Ge’ez terms like “ብእሲት” (prostitute), “ንጉሠ” (king), and “ምድር” (earth).

Given this, here is my attempted translation:

“In the name of the Father, the Son, and the Holy Spirit, one God, to whom belongs glory forever and ever. Amen.

A miracle Saint John the Evangelist, who is from Ephesus, performed regarding the sin and disgrace of Saint Isḥāq, who said: Listen and comprehend, O my beloved brothers, what I will tell you. I saw a prostitute while she was adorning herself and putting on makeup, and they were speaking about our Lady Mary and the birth of Christ. Those from the prostitute said: He was not born. They say these things but they do not believe in Christ. And I was there when I returned and passed by the entrance of her house, that prostitute who said: By this truth the King of Heaven and Earth arose. And when she said this and…””

eoc says:

October 2, 2023 at 8:10 am

And GPT-4, prompted carefully, suggests:

“The passage you’ve provided is in Ge’ez, a Semitic language that was once spoken in the Horn of Africa and now is used mainly as a liturgical language by the Ethiopian Orthodox Church and the Eritrean Orthodox Tewahedo Church.

Here’s a translation:

“In the Name of: the Father, and the Son, and the Holy Spirit, One God, to whom be the glory. I have risen and I have believed, and I profess, unto the age of ages, Amen.

The homily that Saint John, the Bishop of Aksum, delivered concerning the veneration and honor of Saint Isaac, and he said: “Hear and understand, O my brethren, that which I have told you. I have seen a woman who defames the Virgin and who speaks against the Virgin Mary, saying concerning the birth of Christ. ‘From a woman, this is how He was born,’ they say, ‘Thus, they do not believe in Christ.’ And I was agitated and I rebuked them, saying, ‘Be silent concerning this woman who says, ‘I know in this, the chapter, and the King of Heaven and Earth has departed. And when she said, ‘Such.'”

Please note that some parts of the text are not clear, and hence the translation might not be completely accurate. The Ge’ez language, as well as the ancient cultures and religions that used it, often have complex concepts and terms that may not have direct analogues in modern English or in other languages and cultures.”

Roger Pearse says:

October 2, 2023 at 8:40 pm

Both very interesting! Thank you. Better this one.

biim says:

August 1, 2024 at 8:46 pm

if anyone is interested in making a whatsapp group to go into translations together (ethio and african language speakers/ enthusiasts) please let me know since we have already created one.

Roger Pearse says:

August 1, 2024 at 8:51 pm

Anyone interested, add a comment here and I will forward your email address.

