Google books lets me down badly

I’ve just had a very bad experience, because I relied rather uncritically on a volume that I found on Google books.  It’s a warning, and I doubt I shall forget it in a hurry.

I have someone out in the Middle East transcribing the Arabic from Erpenius’ 1625 edition of the 13th-century Coptic historian, al-Makin.  Of course I got a copy from Google books and sent it off, and thought no more about it.

The text is 300 pages.  It turns out that various pages are missing, and others appear out of order or several times.  Of course the transcriber was chosen for their Arabic skills and, although they’ve done their best, they have been utterly confused by this.  Worse yet, they live in a region where internet access is poor, so downloads are very slow.

I have had to spend the entire evening working on the Erpenius PDF in Adobe Acrobat 9 Pro; indicating, page by page, whether the page should be included or not; marking up individual pages with red crossings out; inserting missing pages from another copy.

I’ve had to do this so that the transcriber can go through their transcription, in the order of the original defective PDF, and find the material in the right places.

It’s a hideous job.

Moral: never rely on a Google books PDF.  Take the time and just go through it and collate it.  It will take 15 minutes at most, and it will save you a world of frustration.


11 thoughts on “Google books lets me down badly”

  1. Did you find any pages obscured by a picture of someone’s thumb? They do that a lot too.

  2. That, indeed, is a frustrating experience.
    Sometimes, the volumes can be in a “bad state”. I once came across a volume with the Greek text of Vettius Valens, only to discover that many pages were missing, sometimes had been replaced by a different page, and page order was not linearly incrementing… Page 24 was all over the book…

  3. Sure, this sometimes happens. I have also had, for Greek texts, books digitised from the last to the first page… not very nice.
    But, though Google-Books is not perfect (and indeed, it is not), it is one of the best ways to get a text for someone who doesn’t live next to a library. But you are right, let’s take time to check our copies…

  4. I think we are all grateful for Google books. Without it, merely to obtain a copy of Erpenius would be a slow and expensive business. Rare books rooms used to supply *terrible* quality photocopies.

  5. It makes me feel terrible. The few times I had to make rare books copies when I worked at Rare Books and Archives at my university, they were as beautiful and clear as I could make them. Doing the job right the first time saves time.

    I realize that you can’t expect my level of obsessiveness from everyone (especially for work/study or minimum wage); and I realize that the Google Books scanning machines move fast and have their pitfalls; but still, it’s just not right to do a job that’s only good on certain pages.

  6. Well it’s a trade-off. Do we do a small number of things perfectly or a large number of things not so well and accept that most of the time it will not matter? Google has always done the latter.

  7. As a former librarian, my fear with Google books and the like is this:
    1. There may be several hundred or even a few thousand copies of a book in various libraries and private ownership
    2. The book may be significant in its content but not an especially collectable book for its appearance.
    3. A copy is digitised and made available either freely or much more cheaply than the original copies
    4. Libraries, to save on storage costs and increase access to students who research and write their essays the night before they are due, get rid of their print copies and link to the online copy [Note: libraries are under pressure from institutional management to cut costs and increase access to their fee paying students – it is all about money]
    5. The print copies lose value and have minimal 2nd hand value – when they become available 2nd hand dealers have no interest in stocking a book that people will only download off the net
    6. Print copies over the next few decades are tossed, only a few are kept and these are in a handful of libraries in USA and Europe
    7. Somebody like yourself in 20-50 years from now discovers the problem with the digitised copy – but too late. Unless they live near one of those handful of libraries they won’t find a copy in any library nearby, and it will be impossible to buy a 2nd hand copy

    Another fear – what happens when most copies have been tossed – and Google etc. decide to charge for access with the same steep prices that some journal publishers charge for access to journals?

    Another fear – Google may grow in size, wealth, and power to either force out or buy out other digitised collections, creating a monopoly over access – a situation perfect to charge for access

  8. Indeed. On the other hand, in our days, when many texts are digitized as htm-text, often corrected and even modified, and this htm-text is the first we can find, we should recognize that the “picture-page pdf” (on Google books or elsewhere) is a kind of chartophylax: there we can check what the originally edited text was…
    I shall add that when a library burns (the most recent being the bookstore-library of Father Ibrahim Serruj, in Tripoli, Lebanon), books, even the last known copy of a very rare edition, are destroyed. If the book has been digitized, and copies spread all around the world, the loss is not absolute.

  9. We’re clearly at a time of change, akin to the time when the roll gave way to the codex, or the manuscript to the printed book. The question is how we avoid the losses that occurred at both previous gateways.
