More Michael the Syrian

A crisp sunny morning, a free afternoon at home, and an email arrives telling me that volume 2 of Michael the Syrian is available for collection at my local library.  Sometimes it all just comes together.  I wonder how much of it I can scan today?

UPDATE: (Early Afternoon) I’d forgotten how HEAVY the volumes are.  The physical labour of picking them up, turning the page, placing it on the scanner, turning it round, etc., is pretty exhausting.  The paper is yellow-ish, which makes for speckling when scanned.  70 pages so far, tho.  The speckling seems to affect the margins most.

It’s an interesting question, whether to trim the margins or not.  Why bulk out the file with speckled white-space? 

UPDATE: (3pm) 123 pages. Groan.  One page had a bit of foxing, which came out as black splotches in black/white scanning.  So I did that page in colour.

UPDATE: (5pm) I’m aiming for 200 pages.  I’m on page 190 now, although I had to stop when the plumber arrived.  Once I reach 200 I can have dinner!  I’m somewhere in the reign of Justinian; I saw the name Belisarius a moment ago.

Uploading to Archive.org

Like most people, I have become used to searching Google Books and Archive.org for out-of-copyright scholarly texts.  These are an enormous blessing to us all, since books normally hidden in university rare-book rooms can be downloaded as PDFs.

I’ve become aware that it is possible to upload books to Archive.org, and have uploaded a couple of items which I have, and which were not in the archive. 

Of course the first step is to scan the book.  For this I use ABBYY FineReader 8.0, which drives a Plustek OpticBook 3600 scanner at 400 dpi.  This creates images of the pages, and all the pages in the book can be saved as a single PDF file from FineReader.  For optical character recognition I use FineReader 9.0, which is much more accurate than FineReader 8 (although, curiously, it can only drive the scanner at 300 dpi or 600 dpi).

It is necessary to create an account on Archive.org in order to upload.  You then get an ‘Upload’ button, which you can use to upload a PDF.  This works fine.  To add extra file formats, follow the instructions in the FAQ: edit the item, use the item manager, check out the item (no download is involved in the checkout), and then use an FTP interface to add more files.  I was unable to get this to work in Internet Explorer 7 or Firefox 3, but the CuteFTP programme worked fine once I disabled secure FTP and used simple FTP.
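
For anyone who would rather script the FTP step than use a graphical client, here is a minimal sketch in Python using the standard ftplib module over plain (non-secure) FTP.  The host name, login details and directory are all placeholders which I have made up for illustration; the real values are whatever the Archive.org FAQ and the item manager give you after checkout.

```python
from ftplib import FTP
from pathlib import Path


def upload_files(ftp, paths):
    """Send each local file over an open FTP connection in binary mode.

    Returns the list of remote file names that were stored.
    """
    stored = []
    for path in map(Path, paths):
        with path.open("rb") as fh:
            # STOR writes the file under its own name in the current directory.
            ftp.storbinary(f"STOR {path.name}", fh)
        stored.append(path.name)
    return stored


def connect_and_upload(host, user, password, directory, paths):
    """Log in with plain FTP, change to the item directory and upload.

    Every argument is supplied by the caller; nothing here is specific
    to Archive.org, and the directory layout is an assumption.
    """
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(directory)
        return upload_files(ftp, paths)
```

A call would look something like connect_and_upload("ftp.example.org", "me@example.org", "secret", "my-item", ["book.txt", "book.doc", "book.html"]), with every one of those values replaced by your own.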

I added to each item a text file output, a Word document with all the formatting, and an HTML file with simple formatting only. 

I would like to encourage readers to look at their shelves and consider which texts might usefully be uploaded.  Every item printed before 1st January 1923 is out of copyright in the USA and so can go up.  Copyright laws in the EU and UK require knowledge of the biography of the author, as copyright there absurdly expires 70 years after the death of the author.  But union catalogues of research material like COPAC these days often give the birth and death dates of authors, making it possible to determine the status of a work.
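
The two rules of thumb above can be written down as a short check.  This is only a sketch of the simplified rules as I have stated them, not legal advice: the function names are my own, and it ignores complications such as translators, multiple authors and anonymous works.  One detail I have assumed is that the 70-year term runs to the end of the calendar year, so a work becomes free on 1st January of the year after the 70th anniversary of the author's death.

```python
from datetime import date
from typing import Optional

US_CUTOFF = date(1923, 1, 1)   # the US cut-off date quoted in the post
LIFE_PLUS_YEARS = 70           # the EU/UK term quoted in the post


def out_of_copyright_us(published: date) -> bool:
    """True if the item was printed before the US cut-off date."""
    return published < US_CUTOFF


def out_of_copyright_eu(death_year: Optional[int], current_year: int) -> Optional[bool]:
    """Apply the life-plus-70 rule.

    Returns None when the author's death year is unknown, since the
    status cannot then be decided from the catalogue entry alone.
    """
    if death_year is None:
        return None
    # Protection is assumed to last until 31 December of death_year + 70.
    return current_year > death_year + LIFE_PLUS_YEARS


# Hypothetical example: a book printed in 1901 by an author who died in 1948.
print(out_of_copyright_us(date(1901, 6, 1)))   # printed before the cut-off
print(out_of_copyright_eu(1948, 2008))         # 2008 is within 1948 + 70
```

The None result is deliberate: where the catalogue does not give a death date, the honest answer is "unknown" rather than a guess either way.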