Linking electronic Greek words to their English meanings

Ancient Greek is tough for computers, and computer programmers, to work with.  Firstly it’s a dead language, secondly it’s a non-Roman script, and thirdly no-one knows Greek anyway (although a lot of people pretend).

What we need are tools on our computers.  These are appearing, but very slowly.  The problem is the non-availability of data.

Except that data does exist.  For some years the Perseus site has had a very nice electronic edition of Liddell and Scott, and a tool wherein you can put in any Greek word and it will spit out the meaning and the standardised form.  The latter is known as the ‘lemma’, presumably to keep people from understanding. 

Perseus have now made their data available in the Perseus Hopper, which can be downloaded for non-commercial purposes.  Liddell and Scott is in a big XML file. 

Peter Heslin of Durham University has grasped the implications.  Version 3.1 of his Diogenes tool includes this XML file, and another file containing all the possible forms of all the words in the Greek language, their lemma, the part of speech (noun, verb, etc), tense, mood, singular or plural (etc), and most importantly the line number of the full description in the XML file.  This means that you can look up any word, and get a full description, so long as it’s in L&S.  The code is in Perl, and is supplied.  Perl tends to be impenetrable, but this is a relatively well-written example.  So if you want to create your own dictionary program, here are the materials.
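To make the idea concrete, here is a rough sketch in Python (not Perl, and not Diogenes' actual code or file layout) of a forms file pointing into a lexicon by line number. The two sample tables, the `describe` function and the `nfc` helper are all invented for illustration; the real files will certainly look different.

```python
import unicodedata


def nfc(text):
    """Normalise Greek to one Unicode form so accented letters compare equal."""
    return unicodedata.normalize("NFC", text)


# Invented stand-in for the lexicon file: one entry per line.
LEXICON_LINES = [
    '<entry key="λόγος">word, speech, reckoning</entry>',
    '<entry key="λύω">loosen, release</entry>',
]

# Invented stand-in for the forms file:
# inflected form -> (lemma, parsing, 1-based line number in the lexicon).
FORMS = {
    nfc("λόγον"): ("λόγος", "noun, accusative singular", 1),
    nfc("ἔλυσα"): ("λύω", "verb, aorist indicative active, 1st singular", 2),
}


def describe(form):
    """Return (lemma, parsing, lexicon entry) for a form, or None if unknown."""
    hit = FORMS.get(nfc(form))
    if hit is None:
        return None
    lemma, parsing, line_no = hit
    return lemma, parsing, LEXICON_LINES[line_no - 1]


print(describe("λόγον"))
```

The normalisation step is there because the same accented Greek letter can be encoded in more than one way, and two files from different sources may well disagree.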

But what about post-classical Greek?  Well, there’s the New Testament.  A list of all the words, in order, with part of speech, lemma, etc, was created long ago by James Tauber as MorphGNT.  The site is down at the moment, but the 1Mb text file does exist.

Now this is fine, but useless.  It doesn’t contain the English meaning.  But… Ulrik Sandborg-Petersen has digitised Strong’s dictionary and created an XML file of it.  This contains the Greek Lemma, for all words in the New Testament, plus the English meaning and other bits of info of no present concern.  You can see on his site what the data is, by tapping in his demo example.

MorphGNT also contains the lemma.  So this means that if we join the two together, we get all the possible forms of a word in MorphGNT, and the lemma for them; and the lemma plus the meaning in Strong’s.  Effectively, we now have a dictionary of NT Greek, forms, base form, and meanings.  All we have to do is program it.
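The join itself is not much code.  Here is a minimal sketch in Python, assuming the MorphGNT-style list has been reduced to (form, part of speech, lemma) rows and the Strong's-style file to a lemma-to-meaning table.  The sample rows, the glosses and the function names are all made up for the example; neither file really looks like this.

```python
import unicodedata
from collections import defaultdict


def nfc(text):
    """Normalise Greek to one Unicode form so accented letters compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical rows from a lemmatized word list: (form, part of speech, lemma).
MORPH_ROWS = [
    ("λόγος", "noun", "λόγος"),
    ("λόγον", "noun", "λόγος"),
    ("ἦν", "verb", "εἰμί"),
    ("ἐστιν", "verb", "εἰμί"),
]

# Hypothetical lemma -> English meaning table from a dictionary file.
GLOSSES = {
    "λόγος": "word",
    "εἰμί": "to be",
}


def build_dictionary(rows, glosses):
    """Join forms to meanings through the lemma that both sources share."""
    glosses = {nfc(lemma): meaning for lemma, meaning in glosses.items()}
    table = defaultdict(list)
    for form, pos, lemma in rows:
        lemma = nfc(lemma)
        entry = (lemma, pos, glosses.get(lemma))  # meaning is None if not found
        if entry not in table[nfc(form)]:
            table[nfc(form)].append(entry)
    return dict(table)


DICTIONARY = build_dictionary(MORPH_ROWS, GLOSSES)


def look_up(form):
    """Return every (lemma, part of speech, meaning) recorded for a form."""
    return DICTIONARY.get(nfc(form), [])


print(look_up("λόγον"))
```

Each form maps to a list rather than a single entry, since the same spelling can belong to more than one lemma.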

What about other, non-classical Greek literature?  Somewhere around is a Septuagint in electronic form, with lemmas.  This can be referenced either against the meaning in Strong’s, or that in Liddell and Scott.  How many words appear in neither?  — I don’t know, but it would be interesting to know.  Mostly names, I would guess. 
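Answering the "how many appear in neither" question is a set difference once the lemmas are in hand.  A small sketch in Python, with invented sample data standing in for the Septuagint lemmas and the two dictionaries (the gaps shown say nothing about what the real dictionaries contain):

```python
import unicodedata


def nfc(text):
    """Normalise Greek to one Unicode form so accented letters compare equal."""
    return unicodedata.normalize("NFC", text)


def missing_lemmas(text_lemmas, *dictionaries):
    """Return the set of lemmas from a text that no dictionary contains."""
    known = set()
    for dictionary in dictionaries:
        known.update(nfc(lemma) for lemma in dictionary)
    return {nfc(lemma) for lemma in text_lemmas} - known


# Invented sample data, for illustration only.
TEXT_LEMMAS = ["θεός", "λόγος", "Μελχισέδεκ", "Ἀβραάμ", "θεός"]
STRONGS_SAMPLE = {"θεός": "God"}
LSJ_SAMPLE = {"λόγος": "word", "θεός": "god"}

MISSING = missing_lemmas(TEXT_LEMMAS, STRONGS_SAMPLE, LSJ_SAMPLE)
print(len(MISSING), "lemmas found in neither dictionary")
```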

Every lemmatized Greek text can now be a source of data to this process of creating as large an electronic Greek dictionary as we like.  And, of course, we need more dictionaries of lemmas-plus-English-meaning.  What others could be done, I wonder? 

I’ve just looked for “lemmatized Greek text” in Google and, among many interesting results, I have found the Lexis site, which claims to be able to help produce lemmatized Greek texts.  It runs on Mac, and I haven’t tried it; but it works with the TLG.  Likewise Hypotyposeis talks about lemmatized searches in TLG.  I think Josephus must be available somewhere in lemmatized form, but where?

What I’m not finding is much Patristic Greek, tho.  What we need, clearly, is G.W.H.Lampe’s Patristic Greek Lexicon in XML.  This was published in 1961, so will be in copyright until all of us are dead.  But… couldn’t someone license an electronic version for non-commercial use?   It’s much too expensive for me to buy just at the moment (although a pirate PDF of the page images does exist, I see; apparently pp.1202-3 are missing, tho).

There is much that I don’t know still, tho.  Interesting to see that there is a blog called Coding Humanist.  Is there anyone out there interested in this stuff too?


11 thoughts on “Linking electronic Greek words to their English meanings”

  1. I’m very interested in this sort of thing, as a matter of fact many of these sorts of things are what I do in my job at Logos Bible Software.

    While glosses associated with lemma (dictionary) forms are helpful, they are also very delicate. Direct links to dictionaries are better; diglot presentation is pretty good too when there is an available translation because it reinforces in-context use and in-context translation. The substitute-gloss-for-lemma approach sounds great, but language is not isomorphic. I mean, when is λαμβανω “take”, and when is it “receive”? A gloss association doesn’t help very much in such cases. This is why one can get many laughs by roundtripping things using Google Translate.

    The bottom line is that *some* knowledge of Greek is necessary. There are/will be debates as to how much that “some” really is; but some sort of knowledge is necessary to work with the original language texts. I think that for the ‘Biblical’ languages and text that bar is dropping, at least for those using software designed to help cross the bridge from English to Greek/Hebrew/Aramaic.

    Note Josephus is available in TLG as I recall (though Philo isn’t, to my knowledge). When reading it, one can click and get parsing/lemma info, though I don’t know if such data can be used in searches (single corpus or cross-corpus).

  2. Thank you for these notes. I didn’t know you worked for Logos — interesting in itself. What sort of things do you do?

    You’re absolutely right about the limitations of substituting a meaning for a lemma. But it’s still better than staring at a heap of funny-looking text! One would also need to change “to be” into “he was” etc, using the part of speech, tense, etc — which must be such a common task that I wonder whether code to do that exists out there already.

    The need is for more lemma->meaning dictionaries in XML form, I think. Most of the coding efforts are really for specialists only. But the arrival of “Liddell and Scott” and “Strongs” means that we can start doing things for normal people.

    I presume the lemmatized text of Josephus must be in the Perseus Hopper download, then, if it’s available for browsing on Perseus.

    Has Logos considered licensing any of this content from Perseus, do you know? Do they charge much, if so?

  3. Interesting; but… I couldn’t understand your post at all! I read it twice and was no wiser. (Sorry if that sounds rude; it isn’t meant to be!) Probably I just don’t understand whatever it is you are referring to.

    Could you have another go? I’m all for proposals on translation tools.

    I found your blog before I read this, while searching for stuff on Lampe, and have written another post about that.

    Time is something that gets rarer as we get older. At age 20, the time from Christmas to Christmas seems forever. A month is an age. At age 40, Christmas to Christmas feels about the same time as a month did when I was 20. Apparently it gets worse as you get older. That’s why old people never do anything; once they’ve got up, had breakfast, and coughed a couple of times, the day is over.

  4. The tools being discussed here would greatly help (and hasten!) translation efforts. Unfortunately the most I can program is my alarm clock, so I wish I could help out more. That said I’ll jump at anything that helps someone to translate Greek.

  5. Google Translate isn’t perfect, but it’s faster than looking up all the words yourself.

    If you don’t have any knowledge of the language, it’s not very useful and you can’t really trust it, except in the most general terms. But if you do know the language, even the basics, you can get the gist, figure out translation errors, and then look up the words you don’t know. (And sometimes the easiest way to do that is to put the words through the search engine and look for a dictionary or a context clue. Not that you’re likely to find much Internet Greek slang in the Fathers.)

    So I’m a big fan of glosses, though obviously in the long run it’s faster just to be able to read the stuff at normal seepd.

  6. ‘speed’.

    Anyway, I forgot to add that if you get this new, cool program for Greek up and working, I will probably use it to produce horrendous public domain translations of the least-known works of deservedly obscure authors. Mwahaha!

    Also, I finally finished reading my audiobook of Possidius last night. Thanks again for making its existence known and its content more accessible!

  7. Google translate seems to do a reasonable job of Italian. I don’t know how good it is on Greek; must go and have a look. But yes, the idea is that people should have more access to texts by being able to fiddle with the original language, however badly!

    Glad to hear about Possidius.
