More on QuickGreek

I’m still stuck at home with a temporarily dodgy leg, so I’ve been looking again at QuickGreek.  This is a bit of software to help people like me, who know Latin, deal with polytonic Ancient Greek text. 

The idea is that you paste a chunk of Unicode Greek into one window and hit Ctrl-T.


It reads through the Greek, splitting it up into short bits (i.e. when there is a comma or colon or whatever).  For each bit it parses the individual words, looks up the meaning and displays something underneath the word.
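A minimal sketch of that split-and-gloss step, in Python. This is not the QuickGreek code; the punctuation list, the function names and the tiny `GLOSSES` table are all assumptions made up for illustration.

```python
import re
import unicodedata

# Characters assumed to end a short section: comma, full stop, colon,
# semicolon / Greek question mark (U+037E), raised dot (U+00B7, U+0387).
BREAK_CHARS = ",.:;\u037e\u00b7\u0387"
_CLS = re.escape(BREAK_CHARS)
SECTION = re.compile(rf"[^{_CLS}]+[{_CLS}]?")

# Hypothetical glossary, for illustration only.
GLOSSES = {"ἐν": "in", "ἀρχῇ": "beginning", "ἦν": "was", "ὁ": "the", "λόγος": "word"}


def nfc(text):
    """Normalise so that equivalent encodings of a letter compare equal."""
    return unicodedata.normalize("NFC", text)


LOOKUP = {nfc(k): v for k, v in GLOSSES.items()}


def split_sections(text):
    """Cut the text into short sections, each ending at a punctuation mark."""
    parts = (m.group().strip() for m in SECTION.finditer(text))
    return [p for p in parts if p]


def gloss_section(section):
    """Return one gloss per word, with '?' for words not in the glossary."""
    words = section.rstrip(BREAK_CHARS).split()
    return " ".join(LOOKUP.get(nfc(w).lower(), "?") for w in words)


if __name__ == "__main__":
    for section in split_sections("ἐν ἀρχῇ ἦν ὁ λόγος, καὶ ὁ λόγος ἦν πρὸς τὸν θεόν."):
        print(section)
        print("    " + gloss_section(section))
```

Each section is printed with its line of glosses underneath, which is the interleaving described here. A real version would need a morphological parser in front of the lookup, since most words in a text are inflected forms rather than dictionary headwords.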

The sections and the meanings are interleaved like this:


Listing the meanings one after another does not make a sentence, but it’s a start on producing your own.

You then hover the mouse over the Greek word you wish to inspect, and you get a morphology in the bottom left — nominative singular etc — and whatever information I have about the word in the bottom right.

In this way you can build up a translation of short sections, even if you don’t know much Greek at all.  Which is sort of the idea.

I’ve done a little more on the thing today, and I’m quite pleased with what I’ve got so far.  It needs more work in every area.  The problem is that I can never devote very long to it at any one time, and it takes a while to get back into it.

I might make a version available for download for people to play with.  I think it’s reached the point of serving some purpose.  But now I need to play around with texts that have wrong or no accentuation.


An algorithm for matching ancient Greek despite the accents?

I need to do some more work on my translation helper for ancient Greek.  But I have a major problem to overcome.  It’s all to do with accents and breathings.

These foreigners, they just don’t understand the idea of letters.  Instead they insist on trying to stick things above the letters — extra dots, and quotes going left and right, little circumflexes and what have you.  (In the case of the Greeks they foolishly decided to use different letters as well, but that’s another issue).

If you have a dictionary of Latin words, you can bang in “amo” and you have a reasonable chance.  But if you have a dictionary of Greek, the word will have these funny accent things on it.  And people get them wrong, which means that a word isn’t recognised.

Unfortunately sometimes the accents are the only thing that distinguishes two different words.  Most of the time they don’t make a bit of difference.

What you want, obviously, is to search first for a perfect match.  Then you want the system to search for a partial match — all the letters correct, and as many of the accents, breathings, etc as possible.

Anyone got any ideas on how to do that? 

I thought about holding a table of all the words in the dictionary, minus their accents; then taking the word that I am trying to look up, stripping off its accents, and doing a search.  That does work, but gives me way too many matches.  I need to prune down the matches, by whatever accents I have, bearing in mind that some of them may be wrong.
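Here is one way the strip-then-prune idea could be sketched in Python, using the standard `unicodedata` module. The scoring rule (marks that agree on the same letter, minus marks that disagree) is only an assumption about what "as many of the accents as possible" might mean, and the headword list is invented.

```python
import unicodedata
from collections import defaultdict


def decompose(word):
    """Split each letter into base letter plus separate combining marks."""
    return unicodedata.normalize("NFD", word)


def strip_marks(word):
    """Letters only, lower-cased: the key for the accent-free table."""
    bare = "".join(c for c in decompose(word) if not unicodedata.combining(c))
    return bare.lower()


def marks_by_letter(word):
    """One set of combining marks per base letter, in order."""
    result = []
    for ch in decompose(word):
        if unicodedata.combining(ch):
            if result:
                result[-1].add(ch)
        else:
            result.append(set())
    return result


def score(query, candidate):
    """Marks that agree, minus marks present on one side only."""
    pairs = zip(marks_by_letter(query), marks_by_letter(candidate))
    return sum(len(q & c) - len(q ^ c) for q, c in pairs)


def build_index(headwords):
    index = defaultdict(list)
    for hw in headwords:
        index[strip_marks(hw)].append(hw)
    return index


def lookup(query, index):
    """Exact match if there is one, else all same-letter words, best first."""
    candidates = index.get(strip_marks(query), [])
    exact = [c for c in candidates if decompose(c) == decompose(query)]
    if exact:
        return exact
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)


# Invented sample: four words that differ only in accent and breathing.
HEADWORDS = ["ἤ", "ἦ", "ἡ", "ἥ", "λόγος"]
INDEX = build_index(HEADWORDS)

if __name__ == "__main__":
    print(lookup("λογος", INDEX))   # no accents at all
    print(lookup("ή", INDEX))       # acute but no breathing
```

The ranking does not decide anything by itself; it only puts the likeliest candidates first, so ties still have to be shown to the user or resolved from context. Note that this treats the iota subscript as a mark too, which may or may not be wanted.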

Ideas, anyone?


More on Greek translator

One advantage of translating that fragment from Euthymius Zigabenus a couple of days ago is that it made me look again at my Greek->English translator.  It doesn’t give you a good “translation”; but it did give the tools for any Latinist to get the idea.  So I’m resuming work on it for a bit.  Let’s see where it goes.


Unicode Greek font and vowel length

I didn’t realise that doing Ancient Greek on computers was still a problem, but I found out otherwise today.  We all remember a myriad of incompatible fonts, and partial support for obscure characters; and like most people I imagined that Unicode had taken our problems away.  Hah!

Unicode character 0304 is the “combining macron”.  What that means, to you and me, is the horizontal line above a long vowel.  Character 0306 is the “combining breve”, the little bow above a short vowel.  The “combining” bit means that if you type one after an “A” in a word processor, the display will place it above the preceding letter.  Both symbols are required to display dictionary material correctly, of course.  Poetry needs this stuff.
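A small Python illustration of what "combining" means at the level of code points, using only the standard `unicodedata` module. The variable and function names are mine.

```python
import unicodedata

MACRON = "\u0304"   # COMBINING MACRON
BREVE = "\u0306"    # COMBINING BREVE
ALPHA = "\u03b1"    # GREEK SMALL LETTER ALPHA

long_alpha = ALPHA + MACRON    # two code points, drawn as one glyph
short_alpha = ALPHA + BREVE


def describe(text):
    """List the official Unicode name of every code point in the string."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    for sample in (long_alpha, short_alpha):
        print(sample, len(sample), describe(sample))
    # Plain alpha with a macron also exists as a single precomposed character.
    composed = unicodedata.normalize("NFC", long_alpha)
    print(composed, len(composed), describe(composed))
```

Whether either form displays properly still depends on the font. The precomposed characters only cover a bare vowel with a macron or breve, so a long vowel that also carries an accent has to rely on the combining characters.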

Today I find that neither character is supported in quite a range of fonts.  Palatino Linotype, found on every PC, doesn’t support either.  Arial Unicode MS supports both, but of course most people don’t have it (or has that changed?).  The links I give above give you lists of supporting fonts, mostly conspicuous for not being present on most PCs.

This is a bit silly.  Come on, chaps, I thought this was sorted out years ago.

I wonder if I can remember where I met a Microsoft font chap, and suggest to him that Palatino be extended to include these?

An interesting list of fonts tested by the TLG people is here.


Why do we write accents on our ancient Greek?

The most obvious omission to strike the eye [in his book] is the disappearance of accents.  We are indebted to D. F. Hudson’s Teach Yourself New Testament Greek for pioneering this revolution.  The accentual tradition is so deeply rooted in the minds of classical scholars and of reputable publishers that the sight of a naked unaccented text seems almost indecent.  Yet from the point of view of academic integrity, the case against their use is overwhelming.  The oldest literary texts regularly using accents of any sort date from the first century B.C.  The early uncial manuscripts of the New Testament had no accents at all.  The accentual system now in use dates only from the ninth century A.D. 

It is not suggested that the modern editor should slavishly copy first-century practices.  By all means let us use every possible device that will make the text easier and pleasanter to read; but the accentual system is emphatically not such a device.  Accurate accentuation is in fact difficult.  Most good scholars will admit that they sometimes have to look their accents up.  To learn them properly consumes a great deal of time and effort with no corresponding reward in the understanding of the language.  When ingrained prejudice has been overcome, the clear unaccented text becomes very pleasant to the eye. 

In Hellenistic Greek the value of accents is confined to the distinguishing of pairs of words otherwise the same.  In this whole book it means only four groups of words; EI) and EI=); the indefinite and interrogative pronouns; parts of the article and the relative pronoun; and parts of the present and future indicative active of liquid verbs.  I have adopted the practice of retaining the circumflex in MENW=, -EI=S, -EI=, -OU=SIN and in EI=); of always using a grave accent for the relatives (\H, (\O, O(\I, and A(\I, and an acute for the first syllable of the interrogative pronoun (TI/S, TI/NA, etc.).  These forms are then at once self-explanatory, and the complications of enclitics are avoided.  All other accents have been omitted.

I should dearly love to take the reform one stage further, by the omission of the useless smooth breathing.  Judging by the criterion of antiquity, breathings have no right to inclusion.   Judged by the criterion of utility, ) should be used as an indication of elision or crasis, and nothing else, and the rough breathing would then stand out clearly as the equivalent of h.  The fear that examinees might be penalised for the omission of the smooth breathing has alone deterred me from trying to effect this reform.  I should like to know if other examiners would support this proposal. — J. W. Wenham, Elements of New Testament Greek, pp. vii-viii.

As someone fairly new to Greek, I don’t quite know how to look at this.  If the accents really are largely useless, why have them?  But is it as simple as this?

At the moment I’m working on software to automatically look up Greek words.  In the inscription we were looking at yesterday, the words mostly are found in the dictionaries, including Ares; but not “Aphrodite”.  I don’t really believe that the goddess isn’t in the dictionary.  Rather, I suspect that some faulty accentuation means that X\ is not equalling X, or the like.  Most bits of code that I have seen for use with ancient Greek involve reams of code to try to overcome this sort of thing; all more or less inept.

Perhaps when I am searching for a word, I should first strip off all its accents, and all smooth breathings except one at the end of a word — e.g. A)LL) would become ALL) — and search using that?  Would I get a load of spurious matches?
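The rule just proposed is easy to try out. Below is a rough Python sketch, assuming the words are in Beta Code, where `/`, `\` and `=` are the accents, `)` is the smooth breathing and `(` the rough breathing. The function name `search_key` is made up.

```python
import re


def search_key(beta_word):
    """Drop accents and every smooth breathing except one at the very end."""
    key = re.sub(r"[/\\=]", "", beta_word)   # acute, grave, circumflex
    key = re.sub(r"\)(?!$)", "", key)        # smooth breathing, unless final
    return key


if __name__ == "__main__":
    for word in ["A)LL)", "A)FRODI/TH", "O(\\I", "TI/S"]:
        print(word, "->", search_key(word))
```

Rough breathings are kept, as the proposal implies. One wrinkle: a short word whose breathing happens to fall on its last letter, such as EI), keeps its final `)` under this rule, because in Beta Code the breathing and the elision mark are the same character.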

And why do we have this complicated thing, if it is such a burden?  Is perhaps the accentuation thing just a bit of snobbery?  A way to keep the hoi polloi out?  No doubt there is snobbery around, as in all things to do with men and their deeds.  But is that all there is?  Or is there more to it than this?


Diogenes limitations

I’ve been looking at Peter Heslin’s Diogenes tool, which is really quite extraordinary.  It does things that I do not need, but frankly it’s a marvel, particularly when you realise that he worked out so much of the content himself.

One limitation seems to be that the parsing information for a word does not indicate whether it is a noun, a verb, a participle, or whatever.  It does tell you whether it is singular or plural, masculine or feminine etc; but not whether it is a noun or an adjective.  This is a singular omission, and, for a newcomer, a somewhat frustrating one.

Does anyone have any ideas how this information might be calculated?
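One possible heuristic, sketched in Python, is to guess the part of speech from which features a parse carries. This is only a guess at an approach, not anything Diogenes does, and the feature names and sample data are hypothetical. The rule is: tense plus case suggests a participle, tense alone a verb, case alone a noun or adjective, and neither an indeclinable word. A lemma seen in more than one gender is more likely an adjective or pronoun than a noun.

```python
from collections import defaultdict


def genders_by_lemma(analyses):
    """Collect the genders seen per lemma among case-bearing, tenseless forms."""
    seen = defaultdict(set)
    for a in analyses:
        if "case" in a and "tense" not in a and "gender" in a:
            seen[a["lemma"]].add(a["gender"])
    return seen


def guess_part_of_speech(analysis, seen_genders):
    """Guess a coarse part of speech from the features that are present."""
    has_tense = "tense" in analysis
    has_case = "case" in analysis
    if has_tense and has_case:
        return "participle"
    if has_tense:
        return "infinitive" if analysis.get("mood") == "infinitive" else "verb"
    if has_case:
        if len(seen_genders.get(analysis["lemma"], ())) > 1:
            return "adjective or pronoun"
        return "noun"
    return "indeclinable"


# Invented sample analyses, in a made-up dictionary format.
ANALYSES = [
    {"lemma": "λόγος", "case": "nom", "number": "sg", "gender": "masc"},
    {"lemma": "λόγος", "case": "gen", "number": "sg", "gender": "masc"},
    {"lemma": "ἀγαθός", "case": "nom", "number": "sg", "gender": "masc"},
    {"lemma": "ἀγαθός", "case": "nom", "number": "sg", "gender": "fem"},
    {"lemma": "λύω", "tense": "pres", "mood": "ind", "person": "3", "number": "sg"},
    {"lemma": "λύω", "tense": "pres", "case": "nom", "number": "sg", "gender": "masc"},
    {"lemma": "καί"},
]

if __name__ == "__main__":
    genders = genders_by_lemma(ANALYSES)
    for a in ANALYSES:
        print(a["lemma"], guess_part_of_speech(a, genders))
```

It will get some things wrong: nouns of common gender look like adjectives, and genderless pronouns look like nouns. It can only narrow the field, not settle it.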


Writing Greek translation software – searching for meaning

One of the problems with using free online sources  — aside from bumptious Germans claiming ownership of the Word of God — is that the data is never quite in the format you would like. 

I’m still working on my software to help translate ancient Greek into English.

I’ve just found a set of morphologies — lists of Greek words, with the tense, mood, voice, etc — which omits to include the part of speech! 

Likewise, for my purposes each meaning would ideally be a single English word; most dictionary entries are waffly, which looks very odd when you put one against each word!
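A crude way to boil a long entry down to something short enough to sit under a word is sketched below in Python. It assumes, and this is only an assumption, that the first sense before any comma or semicolon is the most useful one. The sample definitions are invented.

```python
import re


def short_gloss(definition):
    """Keep only the first sense, without any parenthetical remarks."""
    no_parens = re.sub(r"\([^)]*\)", "", definition)
    first = re.split(r"[,;]", no_parens, maxsplit=1)[0]
    return " ".join(first.split())   # tidy up leftover spaces


if __name__ == "__main__":
    print(short_gloss("word, speech; reason (esp. in philosophy)"))
    print(short_gloss("to love (of persons), be fond of"))
```

Real dictionary data is messier than this, so the output would still need checking by eye.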


LXX text marked up with part of speech, etc

I was hunting around the web for a morphologised Septuagint text — one with the word, the part of speech (noun, verb, etc) and other details, plus the headword or lemma.  I remember doing this search a few years ago, so I know it exists.  This time I was less lucky.  In general there seemed to be less data available online, not more.

I can’t imagine the labour involved in taking each word of the Greek Old Testament, working out all these details, and creating a text file of it all.  It seems enormous to me.  But… to do it, and then let it just disappear, as if unimportant?  That seems even less believable, if anything.  Whatever is going on?

Somewhere there is a great database of morphologised French.  I can find webpages that refer to it; but the download site is gone.  This was state-funded; yet it too has gone.

Why does this happen?


More notes on QuickGreek

I’m continuing work on a piece of software to help me translate from ancient Greek to English.  One problem has been how long it takes to start up.  When it takes 10-20 seconds on startup, just to load the various dictionaries, you quickly weary of it.  If you only want to look up two words, you find it very annoying.  Since I am running it repeatedly, to test things, I’ve got very weary of it.

I’ve now got the load time down to a couple of seconds.  This is still rather longer than I like, but obviously a lot better.  The downside of this is that it takes marginally longer to parse each word.   I tend to work with only a few hundred words at a time, so this is not too onerous.  The processing time for my test text (of 386 words) is about a second, which is acceptable if not wonderful.
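The post does not say how the load time was cut. One common way to get this kind of trade-off is to do only the cheap part of loading at startup and put off parsing each entry until it is first looked up. The Python sketch below shows that general idea; the class name and the tab-separated line format are invented, and it is not a description of what QuickGreek does.

```python
class LazyLexicon:
    """Keep entries as raw text at startup; parse one only when it is needed."""

    def __init__(self, raw_lines):
        # Cheap startup work: split off the headword, leave the rest unparsed.
        self._raw = {}
        for line in raw_lines:
            key, _, rest = line.partition("\t")
            self._raw[key] = rest
        self._parsed = {}   # cache of entries already parsed

    def lookup(self, word):
        """Return the list of meanings, or None if the word is unknown."""
        if word not in self._parsed:
            rest = self._raw.get(word)
            # The per-word cost is paid here, on first use only.
            self._parsed[word] = None if rest is None else rest.split(";")
        return self._parsed[word]


if __name__ == "__main__":
    lexicon = LazyLexicon(["λόγος\tword;speech", "καί\tand;also"])
    print(lexicon.lookup("λόγος"))
    print(lexicon.lookup("ξένος"))
```

The first lookup of each word is slightly slower and repeats are served from the cache, which would fit a program that starts faster but takes marginally longer per word.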