OCR with macrons and other funny letters in FineReader

I’m scanning Brockelmann’s Geschichte der arabischen Litteratur.  It’s mostly in German, of course; but the Arabic is transliterated using a wide variety of odd Unicode characters.  There is the letter “a” with a macron over it (a horizontal line), “sh” written as “s” with a little hat on it, and so forth.  These don’t occur in modern German, so when the OCR is set to German they get weeded out.

But you can fix this in FineReader.  You just define a new language, based on German.  I called mine “German with Arabic”.  When you do, you specify which Unicode characters the language contains.  So all I had to do was scroll down through the Unicode characters, find the funnies that Brockelmann had used, and add them in.
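Working out which funnies to add can be done by eye, but if you have a bit of already-typed transliteration to hand, a few lines of Python will list them for you.  This is only a sketch, not anything FineReader provides: it assumes a “plain German” alphabet of A–Z plus the umlauts and ß, and the sample string is purely illustrative.  It prints each extra letter with its code point and Unicode name, which makes it easier to spot the right entries in the character list.

```python
import unicodedata

# Assumption: what a plain German alphabet already covers
# (ASCII letters plus the umlauts and eszett).
GERMAN_LETTERS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "äöüÄÖÜß"
)

def extra_letters(text):
    """Return sorted (char, code point, Unicode name) tuples for every
    letter in `text` that is not in GERMAN_LETTERS."""
    # Compose "a" + combining macron into the single character "ā",
    # so each accented letter is reported as one character.
    text = unicodedata.normalize("NFC", text)
    found = {ch for ch in text if ch.isalpha() and ch not in GERMAN_LETTERS}
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in sorted(found)
    ]

if __name__ == "__main__":
    # Illustrative transliterated sample, not taken from Brockelmann.
    sample = "Abū ʿAbdallāh aš-Šāfiʿī"
    for ch, cp, name in extra_letters(sample):
        print(ch, cp, name)
```

For the sample above it reports ā, ī, Š, š, ū and the ʿ sign, each with its code point.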

And if you don’t get them all the first time, you can edit the language: select it, open its properties, and add the next few in.  And … it works.  It really does.

FineReader is really amazing OCR software.  And I learned all this from the help file.  Look under “alphabet” in the search.