A new use for the parallel Latin translations in the Patrologia Graeca

Now that Google Translate handles Latin very effectively, it occurs to me that we can also use this to read a great deal of patristic Greek.  For as we all know, the Greek fathers were all translated into Latin at the Renaissance and after, and were nearly always printed with a parallel Latin translation, right the way down to the 19th century.

The obvious example of this is Migne’s Patrologia Graeca, our standard reference collection of texts.  It’s never been worth transcribing the Latin side.  But maybe now it is, just as a reading aid for those of us without fluent Greek?

This isn’t a new situation, in a way.  Indeed the reason why all these Latin translations exist at all is that knowledge of Greek was always rarer than fluency in Latin.  The translations are not always reliable; but something is better than nothing.

On the other hand it won’t be all that easy to OCR the Latin of Migne…

An excerpt from PG volume 78, column 226, a letter of Isidore of Pelusium in the Migne edition.

The low quality of Migne’s printing is something that we have all struggled with.

But there are workarounds.  The last time that I needed to OCR the Latin of Migne, I went and found on Google Books the edition that he was reprinting.  This, needless to say, was far better printed, and produced far fewer errors in FineReader 15.

So it is possible, and it’s worth bearing in mind if we need to work with a large patristic text for which no modern translation exists.  Spend some time creating an electronic text of the Latin translation, and push it through Google Translate!
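Raw OCR output usually needs a little tidying before it is fit to paste into a translator.  What follows is only a minimal sketch in Python, not a tested workflow: the function name, the list of ligatures, and the sample text are all my own inventions for illustration.  It rejoins words hyphenated across line breaks, expands the æ and œ ligatures, and unwraps the lines of each paragraph.

```python
import re

# Ligatures common in older Latin printing, expanded to plain letters.
# (An assumed, incomplete list; extend it for the edition in hand.)
LIGATURES = {"æ": "ae", "œ": "oe", "Æ": "Ae", "Œ": "Oe"}


def clean_ocr_latin(raw: str) -> str:
    """Tidy OCR'd Latin text before sending it to a machine translator."""
    # Rejoin words split by a hyphen at the end of a line.
    # Caveat: a genuine hyphen that happens to end a line is lost too.
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", raw)

    # Expand ligatures so that every word is spelled with plain letters.
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)

    # Blank lines separate paragraphs; within a paragraph, join the
    # wrapped lines and collapse runs of whitespace to single spaces.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = (" ".join(p.split()) for p in paragraphs)
    return "\n\n".join(p for p in cleaned if p)


# Invented sample text, not a quotation from Migne.
sample = "Isidorus mona-\ncho   salutem.\n\nQuæ cœlestia sunt."
print(clean_ocr_latin(sample))
```

The cleaned text still has to be proofread against the page, of course; this only removes the mechanical noise.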

Update (5 Aug 2023): Note that it is actually possible to copy the OCR’d text from Google Books itself, for both the Greek and Latin sides in the PG.  Go to the page in question.  Hit the cut-and-paste icon so it goes dark grey, then drag a rectangle over the area that you want to copy the text from. As you release the mouse, a dialogue will pop up, and the text is in the top box. It looks as if it’s monotonic for Greek. The results are quite respectable.
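If you want to check whether a piece of copied Greek really is monotonic, a few lines of Python will do it.  This is a rough sketch of my own, using only the standard `unicodedata` module; the function names and the choice of which marks count as "polytonic" are assumptions made for illustration.  It decomposes each letter and looks for breathings, graves, circumflexes and iota subscripts, none of which occur in monotonic text.

```python
import unicodedata

# Combining marks used in polytonic Greek but not in monotonic Greek.
# (Monotonic keeps only the acute/tonos, U+0301, and the diaeresis.)
POLYTONIC_MARKS = {
    "\u0300": "grave accent",
    "\u0313": "smooth breathing",
    "\u0314": "rough breathing",
    "\u0342": "circumflex",
    "\u0345": "iota subscript",
}


def polytonic_marks(text: str) -> set[str]:
    """Return the names of the polytonic marks found in the text."""
    # NFD splits each accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return {POLYTONIC_MARKS[ch] for ch in decomposed if ch in POLYTONIC_MARKS}


def is_monotonic(text: str) -> bool:
    """True if the text contains no polytonic marks at all."""
    return not polytonic_marks(text)


print(is_monotonic("εν αρχή ην ο λόγος"))
print(sorted(polytonic_marks("ἐν ἀρχῇ ἦν ὁ λόγος")))
```

Paste in a line copied from the Google Books dialogue in place of the samples; if `is_monotonic` comes back true, the breathings and circumflexes have been lost in the OCR.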


Help wanted by Perseus with metadata for Patrologia Graeca

The Perseus project are working on the Patrologia Graeca and Patrologia Latina.  I’m not entirely certain what they are hoping to produce as output, but it looks as if they are OCRing the volumes, as best they can, and producing lists of what texts are contained, on what pages/column numbers, what footnotes, introductions, etc.  They also need help with proofreading.

It might be a fun thing to get involved in, if you have some time (which I don’t myself).  Although how you contact them I don’t know (for, curiously, they do not say).

Via here, and slightly reformatted:

Help sought with Metadata for the Open Patrologia Graeca Online

http://tinyurl.com/p39fx3f  [draft — January 19, 2015]
Gregory Crane (Perseus Project and the Open Philology Project, The University of Leipzig and Tufts University)

We are looking for help in preparing metadata for the Patrologia Graeca (PG) component of what we are calling the Open Migne Project; an attempt to make the most useful possible transcripts of the full Patrologia Graeca and Patrologia Latina freely available.

Help can consist of proofreading, additional tagging, and checking the volume/column references to the actual PG.

In particular, we would welcome seeing this data converted into a dynamic index to online copies of the PG in Archive.org, the HathiTrust, Google Books, or Europeana.

For now, we make the working XML metadata document available on an as-is basis.

They’ve been attacking the OCR in an interesting way:

Nick White … trained and ran the Tesseract OCR engine and Bruce Robertson [ran] … the OCRopus OCR engine on scans of multiple copies of each volume of the Patrologia Graeca.

The resulting OCR [outputs] contain … a very very high percentage of the correct readings [allowing] very useful searching, as well as text mining…

This is all very well; but of course you need to be able to label each text, so that you can find things.  This means indexing the texts and tagging them.  There is already an index, created by Cavallera in 1912.  So…

To support this larger effort, we are working on Metadata for the collection.

We have OCRd and begun editing the core index at columns 13-114 of Cavallera’s 1912 index to the PG ([link] here).

A working TEI XML transcription, which has begun capturing the data within the print source, is available for inspection here.

I must confess a small bit of pride here: for I had long forgotten that I uploaded that PDF of Cavallera to the web.  But this is the beauty of the web – each contribution makes another contribution possible.
