Michael the Syrian part 3 – progress report

I’ve now scanned images of all the pages (around 600) of this monstrously heavy volume (my forearms will never be the same), using Abbyy Finereader 8 to control the scanner.  I scanned in black-and-white at 400 dpi, which works best for OCR.

I’ve gone through the batch, turning alternate pages the right way up.  I’m now importing it into Finereader 9, which has better OCR and produces smaller PDFs.
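For anyone doing the same job outside Finereader, the alternate-page fix can be sketched in a few lines of Python. This is only an illustration, not what Finereader does internally: the function name `rotation_plan` and the `first_upside_down` parameter are made up for this example, and it assumes that exactly every second page came out upside down.

```python
def rotation_plan(page_count, first_upside_down=2):
    """Return a dict mapping each 1-based page number to a rotation in degrees.

    Assumption (not from the scan itself): every second page, starting at
    `first_upside_down` (1 or 2), was scanned upside down and needs 180 degrees.
    """
    if first_upside_down not in (1, 2):
        raise ValueError("first_upside_down must be 1 or 2")
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return {
        # Pages at an even distance from the first flipped page are flipped too.
        page: 180 if (page - first_upside_down) % 2 == 0 else 0
        for page in range(1, page_count + 1)
    }


if __name__ == "__main__":
    # Show the plan for a short six-page batch.
    for page, degrees in rotation_plan(6).items():
        print(page, degrees)
```

The plan is kept separate from the image handling on purpose, so it can be checked by eye before any files are touched; the rotation itself would then be applied with whatever imaging tool is to hand.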

UPDATE (16:30): I’ve created a searchable PDF, which is about 33 MB.  Now starting to upload it to Archive.org.  This can be slow and frustrating, and will probably take all evening.  I’ve also exported the text as .htm and .doc, which I’ll probably place there as well.  I haven’t proofed any of the OCR output, but FR9 gives rather better results than FR8, which is what the automatic processes at Archive.org use.

UPDATE (16:36): Good grief.  It uploaded first time.  It’s here: http://www.archive.org/details/MichelLeSyrien3 .  I’d better add the other formats, then (if it will let me).  It’s not in the searches yet, though.

UPDATE (16:39): Hmm.  The interface for uploads of extra files has changed.  Somewhat better than it was.  Still very slow, it seems, and not that intuitive.  You can tell it was tested by someone local to the server, and not someone far away from it.
