Ancient Greek is tough for computers, and for computer programmers, to work with. Firstly, it's a dead language; secondly, it's written in a non-Roman script; and thirdly, no-one knows Greek anyway (although a lot of people pretend).
What we need are tools on our computers. These are appearing, but very slowly. The problem is the lack of data.
Except that data does exist. For some years the Perseus site has had a very nice electronic edition of Liddell and Scott, and a tool into which you can type any Greek word and have it spit out the meaning and the standardised form. The latter is known as the 'lemma', presumably to keep people from understanding.
Perseus have now made their data available in the Perseus Hopper, which can be downloaded for non-commercial purposes. Liddell and Scott is in a big XML file.
Peter Heslin of Durham University has grasped the implications. Version 3.1 of his Diogenes tool includes this XML file, plus another file containing all the possible forms of all the words in the Greek language, together with their lemma, the part of speech (noun, verb, etc.), tense, mood, number (singular or plural), and, most importantly, the line number of the full description in the XML file. This means that you can look up any word and get a full description, so long as it's in L&S. The code is in Perl, and is supplied. Perl tends to be impenetrable, but this is a relatively well-written example. So if you want to create your own dictionary program, here are the materials.
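To make the idea concrete, here is a minimal sketch (in Python rather than Perl, for brevity) of how a lookup against such a forms file might work. The tab-separated layout and the sample rows below are entirely hypothetical; the real Diogenes data files differ in detail, so treat this as an illustration of the principle, not of the actual format.

```python
# Hypothetical analyses data: surface form, lemma, parse, and the
# line number of the full entry in the L&S XML file. Invented rows
# for illustration only; not the real Diogenes file format.
ANALYSES = """\
logos\tlogos\tnoun sg masc nom\t102345
logou\tlogos\tnoun sg masc gen\t102345
elusa\tluo\tverb 1st sg aor ind act\t98765
"""

def build_index(text):
    """Map each surface form to a list of (lemma, parse, L&S line no.)."""
    index = {}
    for line in text.splitlines():
        form, lemma, parse, ls_line = line.split("\t")
        index.setdefault(form, []).append((lemma, parse, int(ls_line)))
    return index

index = build_index(ANALYSES)

# Look up an inflected form and recover its lemma, its parse, and
# where to find the full dictionary entry in the XML file.
for lemma, parse, ls_line in index["logou"]:
    print(lemma, parse, ls_line)
```

The line number is the key design point: rather than duplicating the dictionary text for every form, the forms file just points into the big L&S XML file, which a real dictionary program would then read and display.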
But what about post-classical Greek? Well, there's the New Testament. A list of all the words, in order, with part of speech, lemma, etc., was created long ago by James Tauber as MorphGNT. The site is down at the moment, but the 1 MB text file does exist.
Now this is fine, but on its own it's useless: it doesn't contain the English meaning. But… Ulrik Sandborg-Petersen has digitised Strong's dictionary and created an XML file of it. This contains the Greek lemma for every word in the New Testament, plus the English meaning and other bits of info of no present concern. You can see on his site what the data looks like by trying his demo example.
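Pulling lemma-to-meaning pairs out of an XML dictionary of this kind is only a few lines of code. The element and attribute names in this sketch are invented for illustration; the real file on Sandborg-Petersen's site is structured differently, so check it before adapting this.

```python
import xml.etree.ElementTree as ET

# A made-up fragment standing in for a Strong's-style XML dictionary.
# The tag and attribute names are hypothetical, not the real schema.
SAMPLE = """\
<dictionary>
  <entry strongs="3056" lemma="logos">word, speech, account</entry>
  <entry strongs="3089" lemma="luo">to loose, untie, release</entry>
</dictionary>
"""

root = ET.fromstring(SAMPLE)

# Build the mapping we actually want: lemma -> English meaning.
meanings = {e.get("lemma"): e.text for e in root.iter("entry")}
print(meanings["logos"])
```

Once the file is reduced to a simple lemma-to-meaning mapping like this, it can be joined against anything else that carries lemmas.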
MorphGNT also contains the lemma. So if we join the two together, we get all the possible forms of a word from MorphGNT, with the lemma for each; and the lemma plus the meaning from Strong's. Effectively, we now have a dictionary of NT Greek: forms, base form, and meanings. All we have to do is program it.
What about other, non-classical Greek literature? Somewhere out there is a Septuagint in electronic form, with lemmas. This can be referenced either against the meanings in Strong's, or those in Liddell and Scott. How many words appear in neither? I don't know, but it would be interesting to find out. Mostly names, I would guess.
Every lemmatized Greek text can now be a source of data for this process of creating as large an electronic Greek dictionary as we like. And, of course, we need more dictionaries of lemmas-plus-English-meaning. What others could be done, I wonder?
I’ve just looked for “lemmatized Greek text” in Google and, among many interesting results, I have found the Lexis site, which claims to be able to help produce lemmatized Greek texts. It runs on the Mac, and I haven’t tried it; but it works with the TLG. Likewise Hypotyposeis talks about lemmatized searches in the TLG. I think Josephus must be available somewhere in lemmatized form; but where?
What I’m not finding is much Patristic Greek, though. What we need, clearly, is G. W. H. Lampe’s Patristic Greek Lexicon in XML. This was published in 1961, so it will be in copyright until all of us are dead. But… couldn’t someone license an electronic version for non-commercial use? It’s much too expensive for me to buy just at the moment (although a pirate PDF of the page images does exist, I see; apparently pp. 1202-3 are missing, though).
There is still much that I don’t know, though. It is interesting to see that there is a blog called Coding Humanist. Is there anyone else out there interested in this stuff?