Digitizing your own library, and how to build your own book scanner

The existence of Google Books is causing some interesting ripples. Some people are now wondering whether they really need all those books in paper form. From Ancient History Ramblings I learn of this interesting article in the Chronicle of Higher Education, Digitizing the personal library:

Books take up space. That’s a problem for any librarian tasked with finding room on overcrowded shelves. It’s also a problem for a book-loving scholar who lives in a small New York City apartment with a toddler and more than 3,000 books. Under those conditions, something’s got to give. Chances are good it won’t be the toddler.

Alexander Halavais, an associate professor of communications at Quinnipiac University, found a partial solution to his city dweller’s no-space-for-books dilemma: Slice and scan. A digital file takes up a lot less room than a codex book does.

In a post on his blog, A Thaumaturgical Compendium, Mr. Halavais described what he had done to some 800 of his books so far: “First I cut the boards off, and then slice the bindings. I have tried a table saw, but a cheap stack cutter works better. Then I feed [the pages] into my little page-fed scanner, OCR them (imperfectly) using Acrobat, and back them up to a small networked attached storage device.” (See before-and-after pictures, above.) Many of the scanned books he also stores as image files. …

Read the whole article. It contains much of interest. Alex Halavais is using a Fujitsu ScanSnap, although he doesn’t say which model. I use one myself, and the speed is definitely a selling point, as is the PDF output.

The comments on the article are also interesting.  Some worry about whether this is allowed under copyright, although since they aren’t wealthy publishers, and probably never make any money from copyright, you have to wonder why they are rushing to defend someone else’s profit stream.  But comment 27 is perhaps the most relevant:

I hope after all of the effort and expense put into this project there is a plan in place for preserving the digital files. Digital files are unstable and subject to corruption. It would be unfortunate if the drives on the networked storage device failed and Professor Halavais lost not only his printed books but the digital surrogates as well. With books on the shelf you can be assured that when you open them in 20 years the words are still the same words, without active management of the digital files this simply isn’t true in the digital world.

When I talk about digital preservation to people I often help people understand the issues by referencing things like eight track tapes, zip discs, floppy discs, Wordstar, etc.

This issue is a very real one, and I don’t know what the answer is.  I myself had to throw away some old backup tapes from years gone by, being unable to persuade the old tape drive to read them.  Both drive and tapes went into the skip.
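One partial mitigation, sketched here under assumptions rather than offered as a complete answer, is to record a checksum for every scanned file and re-check it from time to time, so that silent corruption is noticed while a good backup copy still exists. The Python below is a minimal illustration only; the function names, the restriction to PDF files, and the choice of SHA-256 are illustrative assumptions, not anything described in the article or its comments.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each PDF's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*.pdf"))
    }


def find_problems(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files that have gone missing or whose contents have changed."""
    problems = []
    for name, expected in manifest.items():
        path = folder / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"changed: {name}")
    return problems
```

The idea would be to call build_manifest once after scanning, keep the result somewhere safe, and run find_problems against each copy of the library every so often. A checksum only detects damage; it cannot repair it, so this is useful only alongside more than one backup, and it does nothing about a file format or a storage medium becoming unreadable.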

The comments also link to a forum of people engaged in designing and building their own book scanners. I have not read through it all, but it is quite clear that it is not difficult to do. This is what you do with books too large for an A4 scanner.

Do we want to slice up our books?  I certainly do not.  But I do have quite a lot of academic books which I could really use better in PDF.  It’s interesting.  But scanning a book without cutting it up is very slow indeed.

The world, once again, is changing.

7 thoughts on “Digitizing your own library, and how to build your own book scanner”

  1. Your example prompted me to get a Plustek OpticBook 3600, and I find it a great improvement over the Canon LiDE 100 that I had been using.

    My wife and I had accumulated an estimated 4000 books, for which we really don’t have storage space, so I’ve been scanning them and donating them to the local public library.

    I put everything into PDFs and back them up to off-site storage via Carbonite. If the day comes that PDFs aren’t sufficient, no doubt there will be plenty of notice, and I’ll transfer them over to a new format.

  2. There was rather a complete suite of software that came with the machine. I don’t recall the name right now, but it is much more robust than the Canon software. Each machine produces collections of images (I’ve set it for JPEG), and the software then collates those into PDFs; the Canon software can only do about 30 pages at a go, which leads to a lot of tedious assembly (tiresome on the Mac, damned near impossible on the PC). I’ve done up to 400 pages with the OpticBook software so far with no problems.

    Carbonite: http://www.carbonite.com/en/default.aspx. There are other possible ‘cloud’ solutions, such as Dropbox and Microsoft’s SkyDrive.

    I suppose I ought to OCR them into e-books, but I just haven’t had the time.

  3. If you do recall the name, do add it.

    I’ve used ABBYY FineReader in the past for collecting images, but the firm is engaged in wrecking the product, and that element of it just does not work that well any more, and does not export the images properly. So I’m on the lookout for another.

    I use Adobe Acrobat to open PDFs and make them searchable. Wish it wasn’t so expensive.

  4. It’s “Presto! PageManager 7.10”. ABBYY FineReader is, if I recall correctly, one of the programs that comes with it, although it requires page-at-a-time processing (as if I had time for that).

    The Preview utility that comes with the Macintosh operating system allows quite extensive manipulations of PDF files, although I don’t recall whether making the text searchable is among them.
