OCR with macrons and other funny letters in Finereader

I’m scanning Brockelmann’s Geschichte der arabischen Litteratur.  It’s mostly in German, of course; but the Arabic is transliterated using a wide variety of odd Unicode characters.  There is the letter “a” with a macron over it (a horizontal line), and “sh” written as “s” with a little hat on it, and so forth.  These don’t occur in modern German, so the OCR weeds them out.

But you can deal with this in Finereader.  You just define a new language, based on German.  I called mine “German with Arabic”.  And when you do, you specify which Unicode characters the language contains.  So all I had to do was scroll down through the Unicode characters, find the funnies that Brockelmann had used, and add them in.

And, if you don’t get them all first time, you can edit the language, select it, get the properties, and add the next few in.  And … it works.  It really does.
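If you have a bit of correctly typed sample text to hand, a short script can tell you which characters you will need to add.  This is only a rough sketch in Python and has nothing to do with Finereader itself: the names GERMAN_LETTERS, extra_letters and describe are my own inventions, and the list of “German” letters is my assumption about what the stock German alphabet covers.

```python
import unicodedata

# Assumption: the letters a stock German alphabet already covers.
GERMAN_LETTERS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "äöüÄÖÜß"
)


def extra_letters(text):
    """Return, sorted by code point, the letters in text outside GERMAN_LETTERS."""
    # NFC turns "a" + combining macron into the single precomposed letter.
    normalized = unicodedata.normalize("NFC", text)
    return sorted(
        {ch for ch in normalized if ch.isalpha() and ch not in GERMAN_LETTERS}
    )


def describe(ch):
    """Format one character as code point, glyph and Unicode name."""
    return f"U+{ord(ch):04X} {ch} {unicodedata.name(ch, 'UNKNOWN')}"


# Hypothetical sample line in Brockelmann-style transliteration.
sample = "Kitāb al-ḥayawān, aš-šams"
for ch in extra_letters(sample):
    print(describe(ch))
```

The code points and names it prints are what you then look for when scrolling through the character list in the language properties.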

Finereader is really amazing OCR software.  And I learned all this from the help file.  Look under “alphabet” in the search.

3 thoughts on “OCR with macrons and other funny letters in Finereader”

  1. How do I write those letters, and where can I copy them from?
    Please, I want to know so badly.

  2. The way I do it is to open Charmap in Windows. Pick a font like Times New Roman, or Tahoma, or Palatino Linotype, and scroll down. You’ll find all these characters in there, and you can copy them from that.

    It’s very fiddly, tho. There’s probably a better way.

  3. I’m doing this for Finereader 12 and Italian today. I note that, after you define your new language, Finereader says “Invalid language” in the box at the top of the screen. But this is fixed easily if you exit and re-enter Finereader.

    For some reason this works better for italics than for normal text. Hmmm…
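
On the question above about where to copy these letters from: if Charmap is too fiddly, one possible alternative is to let Python look the characters up by their official Unicode names and print them, ready to copy.  This is only a sketch; the NAMES list and the chars_from_names helper are made up for illustration, and you would put in the names of whichever letters your own book uses.

```python
import unicodedata

# Illustrative selection only: a few letters common in Arabic transliteration.
NAMES = [
    "LATIN SMALL LETTER A WITH MACRON",
    "LATIN SMALL LETTER I WITH MACRON",
    "LATIN SMALL LETTER U WITH MACRON",
    "LATIN SMALL LETTER S WITH CARON",
    "LATIN SMALL LETTER H WITH DOT BELOW",
]


def chars_from_names(names):
    """Join the characters that carry the given Unicode names into one string."""
    # unicodedata.lookup raises KeyError if a name is misspelled.
    return "".join(unicodedata.lookup(name) for name in names)


print(chars_from_names(NAMES))
```

The printed string can be copied from the terminal, provided the terminal font has those glyphs.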
