More on the ancient Greek and Latin at Google

A few days ago I posted a link to 500 ancient Greek and Latin texts at Google. What I had not realised was that this list was not just a bunch of pointers but a new set of scans, made at high resolution specifically to aid OCR. A reader has emailed me a link to an article on the Inside Google Books blog (itself new to me). After an introduction, it states:

I’m pleased to announce that Google Books is now assisting this work by sharing high-resolution digital scans of over 500 volumes of Ancient Greek and Latin, dating from the sixteenth through nineteenth centuries. (Of course, downloadable versions of over a million volumes in all fields are available from books.google.com, in a more compressed form.) Jon Orwant and I created this collection using a list of several thousand important Classics volumes identified by our collaborators Professor Gregory Crane and Alison Babeu of Tufts University. We are analyzing additional volumes and expect to be able to release more high-resolution scans in the future.

These scans will aid the development of accurate OCR (Optical Character Recognition) algorithms for Ancient Greek, and provide the basis for electronic versions of important editions of these Classics texts; but perhaps their greatest value will be for the development of new methods in this emerging field. We’re honored that Professor Crane called this donation “a major contribution to what scholars can do.”

It also mentions something equally interesting:

… scholars around the world can now consult a high-resolution digital scan of Venetus A, one of the best manuscripts of the Iliad, at the Center for Hellenic Studies.

Mind you, when I followed the link I found that someone at the website has decided to block visitors using Internet Explorer. That's strange, but a minor point. The great thing is to get the manuscript online.

Among the manuscripts of the Iliad, one of the oldest and most important is the manuscript in the Biblioteca Marciana, shelfmark gr. 822. In the editions it is given the reference letter (siglum) "A". It is not merely a very important copy, beautifully written, nor merely one of the oldest witnesses apart from the very extensive papyrus fragments. It also contains the ancient scholia to the text, which originate in the text-critical school at the Museum in Alexandria around 150 BC. I have yet to manage to see any of the pages, thanks to the quirk above, but this can only be a very good thing indeed!

4 Responses to “More on the ancient Greek and Latin at Google”


  1. Ryan

    For the Venetus A, they have linked directly to the "manuscript browser," which uses Google Maps to provide a somewhat easier interface to the 40-megapixel images (as well as toggling through the editions/images available, jumping to a specific passage in the Iliad, etc.). I'm not sure what the bug with IE is. You can, however, go to the project page at http://chs.harvard.edu/wa/pageR?tn=ArticleWrapper&bdc=12&mn=1560 and click on "directory listing of all images" to get a very plain interface where you can download the entire set (including multiple JPEG resolutions). This is, in fact, encouraged, and Chris Blackwell has been a great advocate of it in negotiating Creative Commons licensing for all of the images. There is also a mirror of the images here: http://amphoreus.hpcc.uh.edu/

    Right now we have high-res images up for "A" (natural light, some ultraviolet, limited infrared, as well as most of the images from the 1901 "Comparetti" facsimile edition and Villoison's 1788 printed edition), "B" (natural, some UV), and another MS in the Marciana, gr. 458 (= 841), referred to as "U4" in Allen.

    This is actually quite timely, as in a week's time we will begin multispectral imaging of two Homeric MSS at El Escorial, Υ.I.1 (291 = E3 Allen) and Ω.I.12 (509 = E4 Allen). This means that for every page we will have UV and IR in addition to multiple wavelengths of visible light, allowing us to capture an incredible amount of information about every page. E3 in particular has extensive fire, water, and mould damage.

    Chris is blogging about the project here: http://nobleswineherd.blogspot.com/. Right now they're actually in Lichfield, imaging a previously unknown Wycliffe Bible, as well as the famous St. Chad Gospels.

  2. Roger Pearse

    This sounds very good indeed! I can't see a sausage from here (I'm behind a corporate firewall using IE6), but I will look when I can. I like the idea of adapting the Google Books approach.

    I approve entirely of making the raw images available. I know people worry about this, and about widespread copying for commercial purposes, but the sad truth is that only people like us *care* about such things. I made the images of the 5 Tertullian mss I photographed available that way, and offered a CD to anyone who wanted one. No-one ever asked for a copy, and no copies of the pages ever appeared elsewhere online (although I would have been pleased if they had). It just isn't a concern in 99% of cases, sadly.

  3. Maureen

    Iliad ms T-shirts, shields, and armor.

    Tertullian mss would have to be pallium cloaks. :)

  4. Roger Pearse

    Ouch! :)