Finereader 15 includes Fraktur OCR! Finally!

Excellent news this afternoon.  It seems that the new version of Abbyy Finereader, version 15 (which for some reason they have renamed Finereader PDF 15), incorporates their Fraktur recognition engine for the first time.

And it works!  I tried it out on some 19th century German text.

The raw output is pretty darned good, with no editing at all!

This has been an awfully long time coming.  Back in 2003 a “European Union” (i.e. German) project commissioned Russian software firm Abbyy to adapt their excellent OCR engine to handle Fraktur.  They did so, and the results were good.  But then somehow it all went wrong.  Instead of the engine being added to Finereader, which we were all buying, they created a standalone version purely for Fraktur, at a price that only universities could afford.  The result is that for 17 years we have been denied the use of something paid for by taxpayers.  But no longer.

The addition feels a bit bodged in.  You turn on Fraktur recognition by selecting one of 6 languages.  Instead of the language being “German (Fraktur)”, it is “Old German”, so you don’t see it in the list of languages next to “German”.  But once you know, it’s fine.  That’s all you have to do; just select “Old German”.

Myself, I can barely read texts printed in Fraktur, and German is not my best language anyway.  But with the help of this, and dear old Google Translate, we can see what these authors have to say!


OCR with macrons and other funny letters in Finereader

I’m scanning Brockelmann’s Geschichte der arabischen Litteratur.  It’s mostly in German, of course; but the Arabic is transliterated using a wide variety of odd Unicode characters.  There is the letter “a” with a macron over it (a horizontal line), and “sh” written as “s” with a little hat on it, and so forth.  These don’t occur in modern German, so when you OCR the page as German they get weeded out.

But you can handle these in Finereader.  You just define a new language, based on German.  I called mine “German with Arabic”.  And when you do, you specify which Unicode characters the language contains.  So all I had to do was scroll down through the Unicode characters, find the funnies that Brockelmann had used, and add them in.

And, if you don’t get them all first time, you can edit the language, select it, get the properties, and add the next few in.  And … it works.  It really does.
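If you would rather not hunt for the funnies by eye, a small script can list them for you, with their code points and official Unicode names, so that they are easier to find when scrolling through the character list.  What follows is only a rough sketch in Python and has nothing to do with Finereader itself.  The function name, the set of “ordinary German” letters, and the sample string are all my own assumptions for illustration; the sample is not taken from Brockelmann.

```python
import unicodedata

# Assumption: the letters that a plain German alphabet already covers.
GERMAN_LETTERS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "äöüÄÖÜß"
)


def extra_letters(text):
    """Return (character, code point, Unicode name) for each letter
    in `text` that is not in GERMAN_LETTERS, sorted by code point."""
    # Normalise to NFC so that "a" followed by a combining macron
    # is treated as the single character "ā".
    text = unicodedata.normalize("NFC", text)
    found = {ch for ch in text if ch.isalpha() and ch not in GERMAN_LETTERS}
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in sorted(found)
    ]


# Hypothetical sample of transliterated Arabic names.
sample = "Abū Šāh und Ibn Ḥaǧar"

for ch, code, name in extra_letters(sample):
    print(ch, code, name)
```

Run it over a page or two of corrected text and you get a checklist of characters to add to the custom language.  Anything it misses can be added later, as described above.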

Finereader is really amazing OCR software.  And I learned all this from the help file.  Look under “alphabet” in the search.