Writing Greek translation software – searching for meaning

One of the problems with using free online sources  — aside from bumptious Germans claiming ownership of the Word of God — is that the data is never quite in the format you would like. 

I’m still working on my software to help translate ancient Greek into English.

I’ve just found a set of morphologies (lists of Greek words, with the tense, mood, voice, etc.) which omits the part of speech!

Likewise, for my purposes the meaning of each word would best be a single English word; most dictionary entries are waffly, which looks very odd when you set one against each word of the text!
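
For what it's worth, here is a rough sketch of how a waffly definition might be boiled down to one word. It is only a heuristic of my own, not anything taken from an existing tool: the function name `short_gloss` and the rule it applies (drop bracketed asides, take the first sense, strip a leading article or "to") are assumptions, and it will certainly pick the wrong word some of the time.

```python
import re

# Leading words that carry no meaning in a one-word gloss.
_LEADING = re.compile(r"^(?:a|an|the|to)\s+", re.IGNORECASE)


def short_gloss(definition: str) -> str:
    """Return a crude one-word gloss from a longer dictionary definition."""
    # Throw away parenthesised asides such as "(properly)".
    text = re.sub(r"\([^)]*\)", " ", definition)
    # Senses are usually separated by commas, semicolons or colons.
    for piece in re.split(r"[;,:]", text):
        piece = _LEADING.sub("", piece.strip())
        if piece:
            return piece.split()[0]
    return ""


if __name__ == "__main__":
    print(short_gloss("a word, speech; an account (of something)"))
    print(short_gloss("(properly) to loosen, untie"))
```

A hand-checked list of glosses would obviously be better; this is just a way of getting a first draft to correct.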

May 2009 Bloodsucker Award – the German Bible Society

I am pleased to announce a winner for the Bloodsucker Award this month — the German Bible Society! 

Their successful entry was their emails demanding that various open-source projects which use the 10-year-old morphologised Greek New Testament be abandoned, on the grounds that they “own” the text of the Greek New Testament.

When I announced this award, I described the criterion as follows:

I will award it, ad hoc, to institutions in receipt of state funding which in order to make money violate their primary directive; to make books available and promote learning.

I don’t know whether the GBS receives state money, although in Germany religious bodies often do.  But it does enjoy charitable status in order to promote learning and study of the scriptures, and so falls within the general area — abuse of public funding in order to make money instead of doing its job.

More on “copyright” of the Greek New Testament

Still quite angry about the actions of the German Bible Society in claiming copyright of the work of the apostles.  I’ve been looking around the web for comment. 

The best comment I have seen is that the text can only be copyright if the scholars who produced it did their work badly.  Their intention was NOT to create an “original creative work”!

If the German Bible Society believes that it is not issuing the work of the apostles, but of Mr. Aland — to the extent that it is an original, creative work — and that no-one else has the work of the apostles, then I would like to see them say so!

But the most interesting comment was by Stan Gundry of Zondervan, here.

I am not a copyright attorney myself, but I have had lengthy phone conversations with a lawyer who is credited with being the best in the USA. Here’s the deal, at least according to USA copyright law. Ancient texts such as those we are dealing with in the OT (Hebrew/Aramaic) and NT (Greek) are in the public domain and are not protected by copyright. In fact (and this is controversial), even the critical texts as reconstructed by textual critics cannot be protected by enforceable copyrights. The textual critical apparatus has a somewhat better claim to copyright, but to the extent that such an apparatus is a catalog of information, my sources tell me that any claim to an enforceable copyright is weakened. “Sweat equity” in the recreation of ancient texts is not sufficient to establish copyright. It takes sweat equity to create a phone book, but you cannot copyright a phone book. This is not something that the United Bible Society or the German Bible Society wants to hear or agrees to, this is what our lawyer consultants have told us.

Peter Kirk has two posts full of common sense on this also.  Among other things he points out that the Germans have not actually issued take-down demands, and we shouldn’t act as if they have until they do. 


Linking electronic Greek words to their English meanings

Ancient Greek is tough for computers, and computer programmers, to work with.  Firstly it’s a dead language, secondly it’s a non-Roman script, and thirdly no-one knows Greek anyway (although a lot of people pretend).

What we need are tools on our computers.  These are appearing, but very slowly.  The problem is the non-availability of data.

Except that data does exist.  For some years the Perseus site has had a very nice electronic edition of Liddell and Scott, and a tool wherein you can put in any Greek word and it will spit out the meaning and the standardised form.  The latter is known as the ‘lemma’, presumably to keep people from understanding. 

Perseus have now made their data available in the Perseus Hopper, which can be downloaded for non-commercial purposes.  Liddell and Scott is in a big XML file. 

Peter Heslin of Durham University has grasped the implications.  Version 3.1 of his Diogenes tool includes this XML file, and another file containing all the possible forms of all the words in the Greek language, their lemma, the part of speech (noun, verb, etc), tense, mood, singular or plural (etc), and most importantly the line number of the full description in the XML file.  This means that you can look up any word and get a full description, so long as it’s in L&S.  The code is in Perl, and is supplied.  Perl tends to be impenetrable, but this is a relatively well-written example.  So if you want to create your own dictionary program, here are the materials.
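
To make the idea concrete, here is a minimal sketch in Python of the kind of lookup that such a file of forms makes possible. I must stress that the tab-separated layout, the field order, the sample rows and the line numbers are all invented for illustration; I have not reproduced the actual format of the Diogenes files, and the names `Analysis`, `build_index` and `lookup` are my own.

```python
import unicodedata
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str       # dictionary headword
    pos: str         # part of speech
    parse: str       # tense, mood, number, etc.
    entry_line: int  # line of the full entry in the lexicon XML (invented here)


def nfc(text: str) -> str:
    """Normalise Greek so composed and decomposed accents compare equal."""
    return unicodedata.normalize("NFC", text)


def build_index(lines):
    """Map each inflected form to every analysis listed for it."""
    index = {}
    for raw in lines:
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue
        form, lemma, pos, parse, entry_line = raw.split("\t")
        index.setdefault(nfc(form), []).append(
            Analysis(nfc(lemma), pos, parse, int(entry_line))
        )
    return index


def lookup(index, word):
    """Return all analyses for a word, or an empty list if it is unknown."""
    return index.get(nfc(word), [])


# Hypothetical sample rows: form, lemma, part of speech, parse, entry line.
SAMPLE_LINES = [
    "# form\tlemma\tpos\tparse\tentry_line",
    "λόγου\tλόγος\tnoun\tgen sg masc\t1234",
    "ἔλυον\tλύω\tverb\timperf ind act 1st sg\t5678",
    "ἔλυον\tλύω\tverb\timperf ind act 3rd pl\t5678",
]

if __name__ == "__main__":
    idx = build_index(SAMPLE_LINES)
    for analysis in lookup(idx, "ἔλυον"):
        print(analysis)
```

Note that one form can have more than one analysis, which is why each form maps to a list.  The Unicode normalisation step matters because the same accented letter can be encoded in more than one way, and two files from different sources will not necessarily agree.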

But what about post-classical Greek?  Well, there’s the New Testament.  A list of all the words, in order, with part of speech, lemma, etc, was created long ago by James Tauber as MorphGNT.  The site is down at the moment, but the 1 MB text file does exist.

Now this is fine, but useless on its own.  It doesn’t contain the English meaning.  But… Ulrik Sandborg-Petersen has digitised Strong’s dictionary and created an XML file of it.  This contains the Greek lemma, for all words in the New Testament, plus the English meaning and other bits of info of no present concern.  You can see on his site what the data is, by tapping in his demo example.

MorphGNT also contains the lemma.  So this means that if we join the two together, we get all the possible forms of a word in MorphGNT, and the lemma for them; and the lemma plus the meaning in Strong’s.  Effectively, we now have a dictionary of NT Greek, forms, base form, and meanings.  All we have to do is program it.
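
The join described above might be sketched like this in Python. The row layout, the part-of-speech codes and the glosses are toy data of my own, not copied from MorphGNT or from the Strong’s XML, and the function name `join_forms_to_meanings` is hypothetical; parsing the two real files is left out entirely.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalise Greek so the two sources agree on how accents are encoded."""
    return unicodedata.normalize("NFC", text)


def join_forms_to_meanings(morph_rows, glosses):
    """Build form -> [(lemma, part of speech, meaning)] by joining on the lemma.

    morph_rows: iterable of (form, part_of_speech, lemma), one per word of text.
    glosses:    mapping of lemma -> English meaning.
    """
    by_lemma = {nfc(lemma): meaning for lemma, meaning in glosses.items()}
    dictionary = {}
    for form, pos, lemma in morph_rows:
        lemma = nfc(lemma)
        record = (lemma, pos, by_lemma.get(lemma))  # None if no meaning found
        entries = dictionary.setdefault(nfc(form), [])
        if record not in entries:  # the same form recurs many times in a text
            entries.append(record)
    return dictionary


# Toy data for illustration only.
MORPH_ROWS = [
    ("ἐν", "P", "ἐν"),
    ("λόγος", "N", "λόγος"),
    ("λόγου", "N", "λόγος"),
    ("ἦν", "V", "εἰμί"),
    ("λόγος", "N", "λόγος"),  # repeated occurrence
]
GLOSSES = {"λόγος": "word", "εἰμί": "to be"}

if __name__ == "__main__":
    for form, entries in join_forms_to_meanings(MORPH_ROWS, GLOSSES).items():
        print(form, entries)
```

A lemma with no meaning comes back as `None` rather than being dropped, so the gaps in the dictionary stay visible.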

What about other, non-classical Greek literature?  Somewhere around is a Septuagint in electronic form, with lemmas.  This can be referenced either against the meaning in Strong’s, or that in Liddell and Scott.  How many words appear in neither?  I don’t know, but it would be interesting to find out.  Mostly names, I would guess.
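
Counting the words that appear in neither dictionary is a simple set operation once the lemmas are in hand. Here is a sketch in Python; the three little lists are made up and say nothing about what Strong’s or Liddell and Scott actually contain, and `missing_lemmas` is just a name I have chosen.

```python
import unicodedata
from collections import Counter


def nfc(text: str) -> str:
    """Normalise Greek so lemmas from different sources compare equal."""
    return unicodedata.normalize("NFC", text)


def missing_lemmas(text_lemmas, *dictionaries):
    """Count occurrences of lemmas in the text that no dictionary contains."""
    known = set()
    for dictionary in dictionaries:
        known.update(nfc(lemma) for lemma in dictionary)
    return Counter(
        nfc(lemma) for lemma in text_lemmas if nfc(lemma) not in known
    )


# Invented toy data: the lemma of each word in a text, and two headword lists.
TEXT_LEMMAS = ["θεός", "Χοδολλογομόρ", "λόγος", "θεός", "Χοδολλογομόρ"]
STRONGS_LEMMAS = {"θεός", "λόγος"}
LSJ_LEMMAS = {"λόγος"}

if __name__ == "__main__":
    missing = missing_lemmas(TEXT_LEMMAS, STRONGS_LEMMAS, LSJ_LEMMAS)
    # Most frequent unknown lemmas first.
    for lemma, count in missing.most_common():
        print(count, lemma)
```

Sorting by frequency shows at once whether the gaps really are mostly names, or whether some common word is missing.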

Every lemmatized Greek text can now be a source of data to this process of creating as large an electronic Greek dictionary as we like.  And, of course, we need more dictionaries of lemmas-plus-English-meaning.  What others could be done, I wonder? 

I’ve just looked for “lemmatized Greek text” in Google and, among many interesting results, I have found the Lexis site, which claims to be able to help produce lemmatized Greek texts.  It runs on Mac, and I haven’t tried it; but it works with the TLG.  Likewise Hypotyposeis talks about lemmatized searches in TLG.  I think Josephus must be available somewhere in lemmatized form, but where?

What I’m not finding is much Patristic Greek, tho.  What we need, clearly, is G.W.H.Lampe’s Patristic Greek Lexicon in XML.  This was published in 1961, so will be in copyright until all of us are dead.  But… couldn’t someone license an electronic version for non-commercial use?   It’s much too expensive for me to buy just at the moment (although a pirate PDF of the page images does exist, I see; apparently pp.1202-3 are missing, tho).

There is much that I don’t know still, tho.  Interesting to see that there is a blog called Coding Humanist.  Is there anyone out there interested in this stuff too?