Is it possible to read editions of Amharic texts? An experiment

In my last post I mentioned how the Life of St Garima in Ethiopian was printed by Rossini, but without a translation.  In fact it has never been translated into any modern language, to my knowledge.  I don’t know any Ethiopian, and I doubt that I ever will.

But we live in an age of wonders, when it comes to unfamiliar languages.

So… is it possible to work with Ethiopian language editions, even if you know no Ethiopian?  What about Google Translate?  Ethiopian is in this heavy unfamiliar script.  Is there OCR for this?  If you can scan Rossini’s edition, can you pop it into Google Translate and get the English?

There are two sorts of Ethiopian out there, I know.  There is Ge`ez, or classical Ethiopian; and there is Amharic, the modern dialect.  Rossini printed his text from a 19th century manuscript.  So it seems likely that this is in Amharic.

A quick Google confirmed it: Google Translate knows Amharic!  A bit of googling found me an Amharic news website online, here.  I’m using Chrome, so all I had to do was right-click anywhere and select “Translate to English” and the whole website was rendered into some sort of English.  And… it worked!!  Yay me!  It’s obviously not 100%, but it’s way better than 0%!

So what about OCR?  I was sad to see that Abbyy Finereader apparently doesn’t support Amharic.  That’s a blow.  It was developed originally to handle Cyrillic, so it certainly has the capability.  But it’s not offered.  Drat.

A bit of googling brought me to a dubious-looking website here, claiming to offer a selection of tools which could do Amharic OCR.  The prose felt a bit machine-generated, so I worried that it was bunk, or worse, a malicious site.  But the first option was… Google Drive.

I never knew this, but it seems that, if you upload a PDF containing an image of text, and then open it in Drive as a Google Docs document, it OCRs the content.

Well, I thought, let’s give it a try.  So I extracted the first page of Rossini’s edition, using Adobe Acrobat Pro 9 – no flashy latest-edition stuff going on here!  Here’s a pic:

Then I uploaded it, and opened as a Google document.  And … it just treated the Amharic as an image.  Dang!  But I noticed that it did indeed OCR the Italian at the top of the page!

This is supposed to work.  So I thought maybe I should work over the image a bit.  I imported the one-page PDF into Abbyy Finereader 15, and chopped off the Italian at the top, and the critical apparatus at the bottom.  I then used the image editor in Finereader to “whiten the background”.  This can be flaky, but this time it worked fine, and I got a pure white background.   And I got this:

(I’ve just seen the marginal notes, which I need to chop off as well, so I’ll have to go round the loop again)

I exported the image as a PNG, and I used Acrobat again to create a PDF from the image.  Then I uploaded the new PDF to Google Drive, and opened it as a Google Docs document.  And… it worked!  Sort of…

በስመ : አብ : ወወልድ ‘ ወመንፈስ ፡ ቅዱስ ፡ ፩ ፡ አምላከ ፡ ላዕሌሁ ፡ ተወ ከልኩ፡ ወቦቱ ፡ አመንኩ ፡ እስከ ፡ ላዓለመ ፡ ዓለም ፡ አሜን ።

ድርሳን ፡ ዘደረሰ ፡ ቅዱስ ፡ ዮሐንስ ፡ ኤጲስ ፡ ቆጶስ ፡ ዘአክሱም o ፡ በእንተ ዕበዩ ፡ ወክብሩ ፡ ለቅዱስ ፡ ይስሓቅ = ወይቤ ፤ ስምዑ ‘ ወልብዉ ፡ ኦአኀውየ 5 ፍቁራንየ ፡ ዘእነግረከሙ ። ርኢኩ ፡ ብእሲተ ፡ እንዘ ፡ ይዘብጥዋ ፡ ዕራቃ ወእንዘ ፡ ይሀርፉ ፡ ላዕሌሃ ፡ ወላዕለ ፡ እግዝእትነ ፡ ማርያም ፡ እንዘ ፡ ይብሉ በእንተ ፡ ወልዳ ፡ ክርስቶስ ፤ እምብእሲት ፡ ኪያሁ : ኢተወልደ ፣ ይብሉ ፡ እላ ፡ ኢየአምኑ ፡ በክርስቶስ = ወኮንኩ ፡ እንዘ ፡ እረውጽ ፡ ወአኀዝኩ እስዐም ፡ ታሕተ ፡ እገሪሃ ፡ ለይእቲ ፡ ብእሲት ፡ እንዘ ፡ ትብል ፤ እወ ▪በዝ ፡ አንቀጽ ፡ ወፅአ ፡ ንጉሠ ፡ ሰማያት ፡ ወምድር ። ወሶበ ፡ ትብል፡ ከሙዝ ፡ ወ

That’s… rather astonishing.  No idea what all that is, but it looks sort of right.  Let’s bear in mind that Rossini printed his edition in 1897.  This is not a modern typeface.  So this is rather good.

Next step was to paste it into Google Translate.  I set it to auto-detect the language, and pasted in the first bit.  And… it worked.  In fact it gave a really useful transcription into Roman letters as well, which makes it a LOT easier to manipulate the text.

OK, I’m cheating slightly.  The first time I uploaded, the translation ended at “Spirit”.  But this is a Google Translate bug – it sometimes omits the remainder of a sentence.  If you split the text with a line feed, you often get the rest.  And that’s what I did.  I worked out by experiment where I needed to be, and then I got the above.
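The line-feed workaround could be automated. What follows is only a minimal Python sketch of the idea, not what was actually done for this post: it breaks the OCR output after each Ethiopic full stop (።, U+1362) so that every sentence lands on its own line before being pasted into Google Translate. The name `split_sentences` is made up for illustration, and breaking only at the full stop is an assumption; the best split points may still need to be found by experiment.

```python
# Sketch: put each sentence of the OCR output on its own line.
ETHIOPIC_FULL_STOP = "\u1362"  # the double-colon sentence terminator

def split_sentences(text, break_after=ETHIOPIC_FULL_STOP):
    """Cut text into chunks, ending a chunk after any character in break_after."""
    chunks, current = [], []
    for ch in text:
        current.append(ch)
        if ch in break_after:
            chunk = "".join(current).strip()
            if chunk:
                chunks.append(chunk)
            current = []
    # Keep whatever follows the last terminator (e.g. a page that ends mid-sentence).
    tail = "".join(current).strip()
    if tail:
        chunks.append(tail)
    return chunks

sample = "በስመ ፡ አብ ፡ አሜን \u1362 ድርሳን ፡ ዘደረሰ \u1362 ወሶበ ፡ ትብል"
print("\n".join(split_sentences(sample)))
```

Passing a longer string as `break_after`, for instance one that also contains the Ethiopic semicolon (፤, U+1364), would break the text into shorter pieces if whole sentences still get truncated.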

I don’t quite believe the translation of the second sentence either.  I suspect I need to play with this a bit to work out what each word is.

I notice all those colons between every word.  It might help if I actually looked up the script online!
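For reference, the colon-like character between the words is the Ethiopic wordspace (፡, U+1361), and the doubled form (።, U+1362) is the full stop. The OCR output above mixes the wordspace with the ordinary ASCII colon. Here is a rough Python sketch for tidying that up and listing the words; the helper names are hypothetical, and treating every ASCII colon as a misread wordspace is an assumption that suits this sample but may not hold for other pages.

```python
import re

WORDSPACE = "\u1361"  # Ethiopic wordspace, printed between words

def normalise_wordspaces(text):
    """Replace ASCII colons with the Ethiopic wordspace and even out the spacing."""
    text = text.replace(":", WORDSPACE)  # assumption: ':' is always an OCR misread
    text = re.sub(rf"\s*{WORDSPACE}\s*", f" {WORDSPACE} ", text)
    return text.strip()

def words(text):
    """Split on whitespace, ASCII colons and Ethiopic punctuation (U+1361 to U+1368)."""
    tokens = re.split(r"[\s:\u1361-\u1368]+", text)
    return [t for t in tokens if t]

line = "በስመ : አብ : ወወልድ ፡ ቅዱስ \u1362"
print(normalise_wordspaces(line))
print(len(words(line)), "words")
```

A word count like this can be compared with the number of words in the Roman-letter transcription or in the English, as a crude check that nothing has been dropped.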

But I think you’ll agree that this is quite marvellous – I, who know absolutely nothing about the language, am getting something useful out!

Magic!


16 thoughts on “Is it possible to read editions of Amharic texts? An experiment”

  1. Here is ChatGPT’s translation of your Google Docs text: “”In the name of: Father: Son: and Holy Spirit. Amen.

    Praise to God the Father and the Son and the Holy Spirit, one God. Amen.

    We praise you, we bless you, O God of truth. Amen.

    Alleluia.

    Rising from the dead on the third day, ascended into the heavens. Amen.

    Sitting at the right hand of his Father. Amen.

    And he is coming again in glory. Amen.

    In the name of the Father and the Son and the Holy Spirit. Amen.”

  2. I used ChatGPT … the result is magic! Instead of using Google Translate I “said” to ChatGPT “Translate to English” and the result was:

    “In the name of: the Father, the Son, and the Holy Spirit. One God. Amen.

    Declaration of faith: In the name of the Holy Trinity: Yohannes (John), Etsge (Ezge), Kogdis (Kogdis), and Aksum o, in the name of Ene Ebye’u (Virgin Mary) and the angels of the heavenly hosts, Amen.

    Praise to the Holy Trinity, to Yohannes (John), Etsge (Ezge), Kogdis (Kogdis), and to the apostle Aksoum, the rock of faith, and to Kedus Michael, the leader of the heavenly angels, Amen.

    Intercession: And through the prayers of the Mother of God, Saint Mary, O Lord, grant us mercy and forgiveness. And through the intercession of Kerya (Kyah), the great archangel, and E’elem (Eelem), the martyr, protect us from all harm, Amen.

    The bishops of the land, may God bless them, and the clergy, and all the people, and this city, and the faithful departed, Amen. Glory be to God forever. Amen.”

  3. I am playing a bit further with ChatGPT; depending on the prompt you get different results… it might be interesting to play and research for the best prompt to get a good translation, maybe in combination with Google Translate. I used the following prompt: “translate the words from the following text word for word into English:”

    now the result was:

    Sure, here is the word-for-word translation of the provided text into English:

    በስመ : አብ : ወወልድ ‘ ወመንፈስ ፡ ቅዱስ ፡ ፩ ፡ አምላከ ፡ ላዕሌሁ ፡ ተወ ከልኩ፡ ወቦቱ ፡ አመንኩ ፡ እስከ ፡ ላዓለመ ፡ ዓለም ፡ አሜን ።
    In the name: Father: and Son, and Holy Spirit, One. Amen. By the prayers of the Virgin Mary, and all the saints, have mercy on us. Amen.

    ድርሳን ፡ ዘደረሰ ፡ ቅዱስ ፡ ዮሐንስ ፡ ኤጲስ ፡ ቆጶስ ፡ ዘአክሱም o ፡ በእንተ ዕበዩ ፡ ወክብሩ ፡ ለቅዱስ ፡ ይስሓቅ = ወይቤ ፤ ስምዑ ‘ ወልብዉ ፡ ኦአኀውየ 5 ፍቁራንየ ፡ ዘእነግረከሙ ።
    Church: Zion: and synagogue, and tabernacle, and altar o By the prayers of the Mother of God Mary, O Lord, grant us the forgiveness of our sins. Amen.

    ርኢኩ ፡ ብእሲተ ፡ እንዘ ፡ ይዘብጥዋ ፡ ዕራቃ ወእንዘ ፡ ይሀርፉ ፡ ላዕሌሃ ፡ ወላዕለ ፡ እግዝእትነ ፡ ማርያም ፡ እንዘ ፡ ይብሉ በእንተ ፡ ወልዳ ፡ ክርስቶስ ፤
    We believe in one God, God the Father Almighty, Creator of heaven and earth, and of all things seen and unseen.

    እምብእሲት ፡ ኪያሁ : ኢተወልደ ፣ ይብሉ ፡ እላ ፡ ኢየአምኑ ፡ በክርስቶስ = ወኮንኩ ፡ እንዘ ፡ እረውጽ ፡ ወአኀዝኩ እስዐም ፡ ታሕተ ፡ እገሪሃ ፡ ለይእቲ ፡ ብእሲት ፡ እንዘ ፡ ትብል ፤
    I believe in one God, the Father Almighty, Maker of heaven and earth, and of all things visible and invisible.

    እወ ▪በዝ ፡ አንቀጽ ፡ ወፅአ ፡ ንጉሠ ፡ ሰማያት ፡ ወምድር ።
    And ▪by the Holy Spirit, the Lord, the Giver of life, who proceeds from the Father, who with the Father and the Son is worshiped and glorified.

    ወሶበ ፡ ትብል፡ ከሙዝ ፡ ወ
    And in one, holy, catholic, and apostolic Church.

    Note: The text appears to be a liturgical passage or creed, and some parts might not have a direct equivalent in English, so the translations provided are interpretations that capture the essence of the original text.

  4. That’s a very interesting ploy. A good trick to use when a sentence seems doubtful, to break it down.

    In Google translate you get a Roman-letters transcription, so at least you can count the number of words and check they seem about the same in number as the English. Also to see what is just “Amen”.

    Please carry on. This is super stuff.

  5. Ah, the goddess Technologeia works her miracle cure on the Amharic, but only if it sleeps in the abaton and thinks holy thoughts.

  6. Anthropic’s Claude has a very different interpretation:

    “Based on reviewing the passage more closely, I believe this text is in Ge’ez rather than Amharic. Here are a few reasons:
    -The opening invocation references “Father, Son, and Holy Spirit, one God” which is very common in Ge’ez Christian texts.
    -There are frequent Ge’ez terms like “ብእሲት” (prostitute), “ንጉሠ” (king), and “ምድር” (earth)

    Given this, here is my attempted translation:

    “In the name of the Father, the Son, and the Holy Spirit, one God, to whom belongs glory forever and ever. Amen.

    A miracle Saint John the Evangelist, who is from Ephesus, performed regarding the sin and disgrace of Saint Isḥāq, who said: Listen and comprehend, O my beloved brothers, what I will tell you. I saw a prostitute while she was adorning herself and putting on makeup, and they were speaking about our Lady Mary and the birth of Christ. Those from the prostitute said: He was not born. They say these things but they do not believe in Christ. And I was there when I returned and passed by the entrance of her house, that prostitute who said: By this truth the King of Heaven and Earth arose. And when she said this and…””

  7. And GPT-4, prompted carefully, suggests:

    “The passage you’ve provided is in Ge’ez, a Semitic language that was once spoken in the Horn of Africa and now is used mainly as a liturgical language by the Ethiopian Orthodox Church and the Eritrean Orthodox Tewahedo Church.

    Here’s a translation:

    “In the Name of: the Father, and the Son, and the Holy Spirit, One God, to whom be the glory. I have risen and I have believed, and I profess, unto the age of ages, Amen.

    The homily that Saint John, the Bishop of Aksum, delivered concerning the veneration and honor of Saint Isaac, and he said: “Hear and understand, O my brethren, that which I have told you. I have seen a woman who defames the Virgin and who speaks against the Virgin Mary, saying concerning the birth of Christ. ‘From a woman, this is how He was born,’ they say, ‘Thus, they do not believe in Christ.’ And I was agitated and I rebuked them, saying, ‘Be silent concerning this woman who says, ‘I know in this, the chapter, and the King of Heaven and Earth has departed. And when she said, ‘Such.'”

    Please note that some parts of the text are not clear, and hence the translation might not be completely accurate. The Ge’ez language, as well as the ancient cultures and religions that used it, often have complex concepts and terms that may not have direct analogues in modern English or in other languages and cultures.”
