From my diary

I pulled up the OCR project for the Book of Asaph the Physician in Finereader 11 this lunchtime.  It’s a 6th-century Jewish medical text, which apparently contains interesting quotes from classical writers.

Readers may remember (I can hardly remember myself) that I was experimenting with deskewing the pages, increasing the brightness, etc., in order to improve the OCR.

Pretty much the last thing that I did was to open the PDF and import it into FR11, without doing any work on the images.  I ran the OCR anyway, just to see what the raw result would look like.

The raw result is certainly better than some of the rubbish that I have had to clean up in the past.  But it is far from clean.  I think deskewing and the other image corrections would be the answer.  However, there are 250 pages to do, one at a time.  It might be a gentle task for some quiet moment.


From my diary

This afternoon I sat down with Origen, Homilies on Ezekiel 8–10 (and Jerome’s preface), and compared our translation with the 2010 ACW one.  The object of the exercise was to locate any serious differences in understanding, and to allow us to revise the translation if the ACW version suggested an improvement.  I am pleased to say that I think all the differences so far are in our favour.  There is one obscure section where I am not convinced that we are right, but we’ll see.  I’ve passed this material over to the translator for review.  I still have homilies 11–14 to do, but I think I have done as much as I can today.  It is hard work!

This evening I’ve been playing with Abbyy Finereader 11, using the PDFs of the unpublished translation of the Book of Asaph the Physician, discovered by Douglas Galbi at the US National Library of Medicine.  I don’t know a sausage about this text, I should say at once, so it’s a voyage of discovery here.  I’m not committed to OCR’ing it either!  But it’s a convenient vehicle for experimentation.

Now in the past I found that Finereader 11 wouldn’t work with my Finereader 10 projects, so I ignored it.  But starting afresh, I’m discovering some interesting and useful new facilities.

The photos of Asaph are all rather skewed.  This is inevitable when photographing books, unless you can press the pages flat against a sheet of glass.

But in Finereader 11, I find that some new tools have been added to the image editor.  There’s a very nice facility to correct “trapezium” distortion, where a page photographed at an angle comes out wider at one end than the other, and it works well.  Even better is the line straightener.  There is also a brightness/contrast control.  If the type on the far side of the paper shows through, you can lose it by increasing the brightness.

The image files for Asaph are pretty bulky, so things are slow.  But I was able to turn a page that was skewed to blazes back into something straight.  Skewed pages require intervention on pretty much every line, which slows OCR to a crawl, but Finereader 11 can now straighten them first.  I’d like the facility to apply the same deskew to a batch of images, rather than one by one, though.

Something Abbyy could usefully do is allow us to change the background colour of the OCR window.  The greenish images result in a greenish background in the text window, for some reason, and this is very unpleasant and impossible to remove.

One pleasing thing that I see has at last arrived: an “insert symbol” facility.  Long overdue and very welcome it is too!