I’ve been working for a while on getting Theodoret’s commentary on Romans onto the web. I OCR’d it in Abbyy Finereader 11, and I finished proofing the OCR there before Easter.
Today I tried exporting the text to HTML. It has rather a lot of italics in it, so imagine my fury when I discovered that exporting “formatted” text had lost all the italics! A bit of experimentation revealed that the same happened when saving “formatted” text as .RTF. Only saving “exact text” retained the italics. And you don’t want all the crud that comes with that.
I imagine that it’s just a bug; but it is a frustrating one. I really do not want to reitalicise some 100 pages.
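One possible way around this is to save as “exact text” after all and strip out the crud programmatically, keeping only the paragraphs and the italics. What follows is only a rough sketch in Python, not something tested against real Finereader output. It assumes the export marks italics either with `<i>`/`<em>` tags or with an inline `font-style: italic` declaration, which is a guess about how Finereader writes its HTML. The names `ItalicsKeeper` and `strip_to_italics` are made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Tags passed through unchanged (attributes are dropped). Everything else
# is discarded, but the text inside it is kept.
KEEP = {"p", "i", "em", "sup"}


class ItalicsKeeper(HTMLParser):
    """Hypothetical filter: keep paragraphs and italics, drop all other markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of cleaned HTML
        self.stack = []  # (original tag, tag we emitted or None)
        self.skip = 0    # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self.skip += 1
            return
        # Assumption: italics may be expressed as an inline style.
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        if tag in KEEP:
            emit = tag
        elif "font-style:italic" in style:
            emit = "i"
        else:
            emit = None
        self.stack.append((tag, emit))
        if emit:
            self.out.append(f"<{emit}>")

    def handle_endtag(self, tag):
        if tag in ("style", "script"):
            self.skip = max(0, self.skip - 1)
            return
        if not any(opened == tag for opened, _ in self.stack):
            return  # stray closing tag, ignore it
        # Close everything opened since the matching start tag.
        while self.stack:
            opened, emit = self.stack.pop()
            if emit:
                self.out.append(f"</{emit}>")
            if opened == tag:
                break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))


def strip_to_italics(html_text):
    """Return html_text reduced to bare paragraphs, italics and text."""
    parser = ItalicsKeeper()
    parser.feed(html_text)
    parser.close()  # flush any trailing text
    return "".join(parser.out)
```

Calling `strip_to_italics()` on the text of the exported file would then give bare paragraphs with the italics intact. If the export turned out to use CSS classes for italics rather than inline styles, the style check would need adjusting to match.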
Another annoyance was that Finereader now tries to detect footnotes and impose its own numbering on them. In Word this is fine, as inserting and renumbering footnotes is trivial. In HTML, however, it simply creates work that has to be undone.
Finereader does excellent OCR. But I wish they would spend some time getting the product user-tested, really I do.