This post at Vitruvian Design is very timely to a man trying to write some Greek->English translation software. I can’t comment on it from behind this firewall, so will comment here.
I am delighted to see someone else interested in getting a master list of Greek words and morphologies for the first thousand years. I must look into the project that the post refers to. The problem, surely, will be patristic Greek; and the answer would be to turn G. W. H. Lampe’s Patristic Lexicon into an XML file, in the same way that Perseus have done for Liddell and Scott. Someone would have to argue with Oxford, who own the copyright; but for non-commercial use, I expect a license could be negotiated. Lampe is out of print anyway.
I think that I know why Liddell and Scott give weird accusatives as an extra entry. The book is designed for manual use, and someone who comes across an odd word is liable to look it up in the form in which he found it, rather than under a base form that he may not know. But such things are unnecessary in a digital file, I agree.
Not all of the files mentioned in the post are known to me. I know that an XML file of L&S exists in the Perseus Hopper, and also in the Diogenes download. But I’m not clear where to find the “invaluable list” by Peter Heslin resulting from running the Perseus morphologiser over the TLG disk E. A morphology file greek.morph.xml is part of the Perseus Hopper download.
The issue of mismatches between this and L&S is quite interesting. I’d like to follow this more.
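To make the idea of a mismatch concrete, here is a minimal sketch of the sort of check I have in mind: given the lemmas that a morphology list points to, and the headwords in a lexicon, report every lemma that has no headword. Everything here is an assumption of mine. The function names are hypothetical, and I have deliberately left out the loading of greek.morph.xml and the L&S file, because I don't want to guess at their schemas; the inputs are plain Python pairs and strings.

```python
import unicodedata


def normalize(word):
    """Normalize to NFC and lower-case so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", word).lower()


def find_mismatches(analyses, headwords):
    """Return lemmas used by the morphology list but absent from the lexicon.

    analyses:  iterable of (inflected_form, lemma) pairs (hypothetical layout)
    headwords: iterable of lexicon headwords
    The result maps each missing lemma to the inflected forms that cite it.
    """
    known = {normalize(h) for h in headwords}
    missing = {}
    for form, lemma in analyses:
        key = normalize(lemma)
        if key not in known:
            missing.setdefault(key, []).append(form)
    return missing
```

The normalization step matters more than it looks: the same accented Greek letter can be stored precomposed or as a base letter plus combining marks, and two files that disagree on this will show thousands of spurious mismatches.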
But one obvious omission is the New Testament. The morphology list in MorphGNT is also available, and English meanings can be had from the XML file of Strong’s dictionary. These too need integrating into the project, I would suggest.
All this work is enormously valuable. The project is also trying to establish something shockingly fundamental: a list of extant Greek literature!
I’m not sure how I feel about this. I agree that the task should be undertaken (indeed it’s appallingly hard to find out these things, as I discovered when I wanted a list of manuscript traditions), but it seems a digression from the main IT-related task. They’ve decided to start with poets; again, a minority taste. I can’t help feeling that this task should be spun off.
The post also introduces me to Epidoc, of which I know little, in the context of converting to and from Unicode. If some way to do this reliably exists, I want it! More details here. This is the ‘transcoder’. I downloaded it from SourceForge and was slightly surprised to find that it was written in Java, and came without source code. Java is certainly better than Perl. But in the Microsoft world, being in Java is not really ideal. That said, what is? Some sort of DLL, which will become unusable in a few years as Microsoft change the specifications?
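For what it's worth, the core of such a conversion is not large. The following is a minimal sketch, assuming the non-Unicode side is Beta Code, the ASCII encoding used by the TLG and Perseus. It is emphatically not the Epidoc transcoder's method, which I haven't seen; the function name `beta_to_unicode` is my own, and the sketch ignores digamma, punctuation codes and the other extras of the full Beta Code specification. It only handles letters, breathings, accents, iota subscript, diaeresis, capitals and final sigma.

```python
import unicodedata

# Beta Code letter -> Greek letter (digamma and other extras are omitted).
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritic -> Unicode combining mark.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(text):
    """Convert a Beta Code string to precomposed (NFC) Unicode Greek."""
    out = []
    pending = ""     # marks typed between '*' and the capital they belong to
    capital = False
    chars = text.lower()  # accept both upper-case and lower-case Beta Code
    for i, ch in enumerate(chars):
        if ch == "*":
            capital = True
        elif ch in MARKS:
            if capital:
                pending += MARKS[ch]
            else:
                out.append(MARKS[ch])
        elif ch in LETTERS:
            nxt = chars[i + 1] if i + 1 < len(chars) else " "
            word_continues = nxt in LETTERS or nxt in "*'"
            if ch == "s" and not capital and not word_continues:
                greek = "ς"  # final sigma at the end of a word
            else:
                greek = LETTERS[ch]
            if capital:
                greek = greek.upper()
            out.append(greek + pending)
            pending, capital = "", False
        else:
            out.append(ch)  # spaces, punctuation, anything unrecognised
    # Combine each base letter with its marks into a single code point.
    return unicodedata.normalize("NFC", "".join(out))
```

Called as `beta_to_unicode("a)/nqrwpos")` this gives ἄνθρωπος. The trick is to emit base letters followed by combining marks and let Unicode normalization do the composing, rather than keeping a table of every accented letter. Going the other way, from Unicode back to Beta Code, would start from the decomposed (NFD) form and reverse the two tables.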
All in all, a super post!