Notes on unicode editing in Coptic

Here are a couple of notes on how I’m editing Unicode Coptic in Microsoft Word 2007.

I’m using Wazu Japan’s Comprehensive Unicode Test Page for Coptic a lot.  This allows me to identify characters and unicode character sets.

I find I can enter any character in Word by typing its four-character hexadecimal code and hitting Alt-X.  So if I type 0307 after a Coptic character and hit Alt-X, I get a diacritical dot above the character.  Wazu’s page tells me what the codes are!  What I have actually done is to record a macro, so I move to the character and hit Alt-1, which runs a macro that types 0307 and hits Alt-X.  It saves keystrokes.
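For anyone who wants to see what Alt-X is doing with those codes, here is a minimal Python sketch.  It is not anything Word provides; the function name char_from_hex is my own invention, and I have picked U+2C81 (Coptic small letter alfa) purely as an example base letter.

```python
import unicodedata

def char_from_hex(code):
    """Turn a four-digit hex code such as "0307" into the character it names."""
    return chr(int(code, 16))

# U+2C81 is COPTIC SMALL LETTER ALFA, used here only as an example base letter.
# Appending U+0307 puts a combining dot above it, as typing 0307 + Alt-X does.
dotted = char_from_hex("2C81") + char_from_hex("0307")

# Show the official Unicode names of the two code points that make up the result.
for ch in dotted:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The point to notice is that the dotted letter is two code points, a base letter followed by a combining mark, not one character.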

OK; I’ve manually replaced Unicode accents (code 0300) with dots on a couple of fragments, and I’m getting fed up.  Can I do a global replace?  I think so.  This Microsoft page (I had to use the Google cache version, as Microsoft tried to divert me to some useless registration process) seems to explain how.  You can search for any Unicode character using this:

 ^Unnnn where nnnn is the character code

Let’s try it: ^U0300 in the Find box… and it doesn’t work.  ^U is not allowed.  I try ^u, lower case, and that is allowed but finds nothing.  Rats.  It seems I am not the first to discover this.  Not merely must it be lower-case; it must be decimal, not the hexadecimal (base-16) codes supplied by charmap or the Wazu page. 

OK, let’s try.  A hex converter is here.  Hex 0300 is decimal 0768, it seems.  Let’s try ^u0768.  And … nope.  That doesn’t work either.
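The conversion itself is easy to check without an online converter.  A small Python sketch, with function names of my own choosing, that turns the hex codes from charmap or the Wazu page into the zero-padded decimal form and back:

```python
def hex_to_decimal_code(hex_code, width=4):
    """Convert a hex code string such as "0300" to a zero-padded decimal string."""
    return f"{int(hex_code, 16):0{width}d}"

def decimal_to_hex_code(decimal_code, width=4):
    """Convert a decimal code string such as "0768" back to zero-padded hex."""
    return f"{int(decimal_code):0{width}X}"

# The combining grave accent and the combining dot above, as used in these notes.
for code in ("0300", "0307"):
    print(code, "hex =", hex_to_decimal_code(code), "decimal")
```

This only confirms the arithmetic; it does nothing to make Word’s Find box accept the result.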

Boy, this wastes a lot of time!  Thanks, Microsoft.

UPDATE: Persistence pays off.  Well, I have a workaround.  Word’s Find and Replace cannot match a Unicode combining character like a dot or an accent on its own.  But … you can replace the character and the dot together.  I have just copied an e+accent into Find What (it looks like garbage when it arrives – but no matter) and copied an e+dot into Replace, and it worked.  It replaced 462 instances, indeed.  So… I can do a lot of these that way.
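If the text can be got out of Word as plain Unicode text (an assumption; I have not tried round-tripping a real document this way), the same swap can be done for every base letter at once.  A minimal Python sketch, where swap_mark is a name I have made up:

```python
import unicodedata

GRAVE = "\u0300"      # COMBINING GRAVE ACCENT, the accent to get rid of
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, the dot wanted instead

def swap_mark(text, old=GRAVE, new=DOT_ABOVE):
    """Replace one combining mark with another, whatever letter it sits on."""
    # Decompose first, so that precomposed letters such as U+00E8 (e with grave)
    # are split into base letter + combining mark and get caught as well.
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.replace(old, new)

# Example: Coptic small letter alfa (U+2C81) carrying a grave accent.
sample = "\u2c81\u0300"
print([f"U+{ord(ch):04X}" for ch in swap_mark(sample)])
```

Note that the result is left in decomposed form; whether that matters depends on the font and the program displaying it.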

Still annoyed that Word doesn’t deal with it properly, tho.

More on Coptic unicode fonts

A few minutes ago I wrote about Alphabetum, the commercial Coptic font which uses the Bohairic typeface, and the way in which this limited people working with Coptic.  This led me to think about the idea of commissioning a free font. Of course really this is something that a grant body should make happen. 

A hunt around the web revealed that Keft, the free Coptic Unicode font with the Sahidic typeface, was designed by Michael Everson of evertype.com.  It seems that it was commissioned by the International Association of Coptic Studies, whose website is rather out of date and does not say so.  I wonder what it cost?  It seems that Stephen Emmel was responsible, and it sounds like a long and arduous process was involved!

Both these fonts support Unicode 5.1, which matters for things like dots over letters (diacritics).  Few of the other free fonts do.

I do wonder a bit about Coptic studies.  Syriac studies is pretty free-wheeling, everyone is friendly, everyone wants to encourage people, and everyone just pitches in.  In Coptic studies there seems to be a lot of stuffiness, a lot of “I’m far too important to reply” and general crustiness.  I got that feeling again reading the stuff about Keft.  Maybe that’s why I’ve never paid any attention to Coptic.

More on the Alphabetum font

My copy of the Alphabetum font has arrived.  Unfortunately the email that supplied it added some extra conditions on use, not disclosed at time of purchase.  I bought the license that allows use in books, you see, for the Eusebius project.

First he wants purchasers who use it in a book to acknowledge the use of the font.  That’s just advertising, of course, and doesn’t really matter.

Much more serious is that he also wants a free copy of any book using the font.  Drat the man.  That’s an extra charge to use it for the purpose for which I bought it, and for which he advertised it.  In fact that must be illegal, I would have thought.  I’ve written to tell him so politely.  After all, I doubt he wants to annoy people. 

What all this brings home, tho, is how fortunate Syriac users are in having the Meltho unicode fonts.  Meltho are absolutely free, and indeed one of them even comes with Windows.  We all owe George Kiraz such a debt of gratitude for this.

By contrast Coptic users are crippled by lack of availability of a family of good quality unicode fonts, and are obliged to scurry around for whatever happens to exist.  Many of the fonts don’t handle dots and overscores very well — although Alphabetum does handle them exactly. 

A further problem is that you can’t pass around a Word document with material in Alphabetum; the recipient won’t be able to read it, unless they have a copy of the font.  You find yourself tangled up in a mess of problems that obstruct and hamper, for tiny amounts of money.

If I knew Coptic, I might fix all this by commissioning a font designer to make one.  But since I don’t know the alphabet, it’s out of the question.

I’m generally impressed with Alphabetum.  If you need a Bohairic Coptic font in Unicode, it will do the job.

Alphabetum – a more “Bohairic” Coptic font? Plus notes on Coptic

I’ve had complaints from my translator that the Keft unicode font for Coptic isn’t that “Bohairic” in appearance.  Well, I could pass a Bohairic book in the street and not recognise one!  But I do recognise a difference in letter forms between Keft and what is used by De Lagarde in his 19th century printed text.

Quite by accident I have come across the Alphabetum font.  It’s not free, but not expensive.  Here’s a bitmap comparing the fonts; top one is De Lagarde; the middle one is Alphabetum; bottom one is Keft. 

Three Coptic Fonts; De Lagarde, Alphabetum and Keft

The Keft font is apparently a “Sahidic” Coptic font.  The New Athena Unicode font is of the same type.

There’s some stuff on entering Coptic unicode here.  It looks as if I’m going to need to do it.  And I have just found these links by Christian Askeland, which look good.  These led me here, to some more fonts, of which only Arial Coptic seemed like De Lagarde, and the diacriticals didn’t seem right.  And this in turn gave this test page.

One difference I can see between De Lagarde and Alphabetum is the diacriticals.  It’s not that easy to find out about these, I find.  I wonder if the difference is important?

I need to find a basic grammar that is good on these things.

UPDATE: I have also found a Wikipedia test page for Coptic in Unicode 5.1, which lists a number of fonts as well-supported, although it is still vague on typefaces.  Quivira is listed, and is a VERY nice font; but Sahidic again.  Analecta is another new one to me.

Killing the dipsticks of the world

It’s funny how the world can suddenly become a hostile place!  I thought people might be amused by the litany of improbable problems that has prevented me from doing something simple this evening.

I got an email today from one of the people I’m working with, saying that they couldn’t work out how to install a Unicode font, so could I print some stuff for them and send it out to them.  They don’t want advice, I find.  I don’t have much choice, but I cursed when I read this; I’m tired and have much to do.

  • I get home after a very long day, dog-tired, fire up the Windows 7 laptop, plug in the Canon i560 inkjet, a couple of years old, and … it won’t install. 
  • I search out the drivers disk… it says it isn’t compatible.
  • I hunt around the web — it’s nearly impossible to find ANY driver.  I find pages saying Canon won’t support Win7 for this product.  I download the XP driver.  It refuses to install.
  • Fine, I boot up the Vista machine, and it locks up.  I look for my XP machine… and then think, hang on, why am I bothering with all this pain?  Let’s just use my laser printer.
  • I plug in the laser, print 3 pages and … the toner light comes on and it too refuses to print.
  • At that point I give in.  I am NOT going to attempt to change toner while stumbling tired.

But I won’t be buying a Canon ever again.

Sorry everyone — if you’re waiting for something from me, I am too frazzled to do it this evening!

Thinking about fonts to use for a book

Professional publishers do not print using Microsoft’s “Times New Roman” font.  Instead commercial fonts are used.  I don’t know much about these, but I’ve been looking around the web.

A font called “Bembo” seems widely used.  Unfortunately the character map does not include polytonic Greek.  I don’t expect these fonts to include Syriac, but that much is a minimum.

Another font is Adobe’s Minion Pro, which does seem to include polytonic Greek.  This is my current candidate.  Apparently it comes free with Adobe InDesign CS3, or can be purchased separately.

I’ve also been looking at the text itself.  It needs to be kerned, which it seems can be done in Microsoft Word.  It also needs hyphenation, because justified text usually gets areas of whitespace in the middle of the line unless you do this.

What else?  Well, lots, probably.  I just wish I could find a useful guide to this, rather than working it all out by trial and error.

Thinking about typesetting

The two translations that I have commissioned are both very nearly complete.  In fact I hunger for the day when they will be entirely complete — which will probably be in a month or two.  It is remarkable how long it has all taken.

Then I need to create a book form of them both, so that I can sell copies to libraries.  This will ensure availability in that community, and perhaps recover some of the commissioning costs.

The unwary start with Microsoft Word, create a PDF and send it to a print-on-demand site like Lulu.com.  Then they wonder why it doesn’t look right.

Part of the reason is typesetting.  By default Word does not kern text — that is, move letters like AVA together so that there isn’t a big gap between them.  It can be turned on, under font formatting.

Likewise book publishers do not rely on Times New Roman, but use professional fonts like Bembo and Baskerville.

I am profoundly conscious that this is a specialised area, which I have no real desire to learn.  Surely it should be possible to hire in the skill at a reasonable price?

I’ve found a forum here of people offering their services; I suspect that many of them have limited professional skills.  Someone who did seem to know what he was doing did write to me last year, but never replied to my last email.  I must pester him again!

Stopped by a PC

The Dell I mentioned a week ago turned out to be a turkey.  It was a Dell Studio 15, but it vibrated so strongly that my desk shook, and also had a headache-inducing mid-tone howl.  It’s going back, naturally.  Today I went out and got whatever was for sale in local shops, which turned out to be a Sony Vaio.  The Sony actually seems very nice.

Both it and my old Vista machine are now chained together, moving 250GB of data across.  After that, I need to install lots of software, probably on Monday, I would guess, as I don’t use a PC on Sundays.  Meanwhile I’ve found an even older laptop on which I am typing this.

I apologise if any correspondence goes unanswered until I am back up and running properly.

The Vista machine (a Dell Inspiron 1720) started having problems.  Worse yet, I was out of disk space for all the PDFs of books that I need and want.  The new machine has twice the disk space.  It also has an “eSATA” port; apparently that will allow an external hard disk to run at a reasonable speed, whereas USB hard disks are hopelessly slow.  If so, I can put my PDF collection on such a drive, and not need quite so much on the local hard disk.

An algorithm for matching ancient Greek despite the accents?

I need to do some more work on my translation helper for ancient Greek.  But I have a major problem to overcome.  It’s all to do with accents and breathings.

These foreigners, they just don’t understand the idea of letters.  Instead they insist on trying to stick things above the letters — extra dots, and quotes going left and right, little circumflexes and what have you.  (In the case of the Greeks they foolishly decided to use different letters as well, but that’s another issue).

If you have a dictionary of Latin words, you can bang in “amo” and you have a reasonable chance.  But if you have a dictionary of Greek, the word will have these funny accent things on it.  And people get them wrong, which means that a word isn’t recognised.

Unfortunately sometimes the accents are the only thing that distinguishes two different words.  Most of the time they don’t make a bit of difference.

What you want, obviously, is to search first for a perfect match.  Then you want the system to search for a partial match — all the letters correct, and as many of the accents, breathings, etc as possible.

Anyone got any ideas on how to do that? 

I thought about holding a table of all the words in the dictionary, minus their accents; then taking the word that I am trying to look up, stripping off its accents, and doing a search.  That does work, but gives me way too many matches.  I need to prune down the matches, by whatever accents I have, bearing in mind that some of them may be wrong.
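To make the idea concrete, here is one possible sketch in Python.  It is only an illustration under my own assumptions, not a finished answer: the function names are invented, the scoring rule (one point for each accent or breathing that agrees, minus one for each that disagrees) is just a first guess, and it ignores details such as final sigma.

```python
import unicodedata
from collections import defaultdict

def decompose(word):
    """Split a word into (base letter, set of combining marks) pairs."""
    pairs = []
    for ch in unicodedata.normalize("NFD", word).lower():
        if unicodedata.combining(ch):
            if pairs:  # attach the mark to the letter it follows
                base, marks = pairs[-1]
                pairs[-1] = (base, marks | {ch})
        else:
            pairs.append((ch, frozenset()))
    return pairs

def strip_marks(word):
    """The word minus all accents and breathings; used as the lookup key."""
    return "".join(base for base, _ in decompose(word))

def build_index(words):
    """Table of dictionary words keyed by their accent-free form."""
    index = defaultdict(list)
    for word in words:
        index[strip_marks(word)].append(word)
    return index

def score(query, candidate):
    """Marks that agree count +1, marks present on only one side count -1."""
    total = 0
    for (_, q), (_, c) in zip(decompose(query), decompose(candidate)):
        total += len(q & c) - len(q ^ c)
    return total

def lookup(query, index):
    """All words with the same letters, best accent match first."""
    candidates = index.get(strip_marks(query), [])
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)

# Three words spelled with a single eta, differing only in their marks:
# smooth breathing + acute, rough breathing, smooth breathing + circumflex.
words = ["\u03b7\u0313\u0301", "\u03b7\u0314", "\u03b7\u0313\u0342"]
index = build_index(words)
# A query with smooth breathing + grave: right breathing, wrong accent.
print(lookup("\u03b7\u0313\u0300", index))
```

With this scoring a perfect match always comes first, because every other candidate loses at least one point for a mark that differs.  Candidates that tie are left in dictionary order, which is where a human still has to choose.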

Ideas, anyone?