Digitising ancient texts – the future that did not happen

This morning I saw the following announcement:

We’re really proud to announce that EpiDoc XML versions of all 99 volumes of the monumental Corpus Scriptorum Ecclesiasticorum Latinorum (CSEL) are now being added to the Open Greek and Latin Project’s GitHub repository!

What it means, for non-techno junkies, is that someone has scanned the 99 volumes of the CSEL, turned them into text, encoded that within the XML format, and uploaded them to a standard open-access repository.  The point of the XML is to preserve the footnotes and other weird formatting.  It will take some kind of viewer to make this useful.
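For the curious, a very rough idea of what "some kind of viewer" has to do can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not anything supplied by the Open Greek and Latin Project: it assumes the files are ordinary TEI XML (EpiDoc is a TEI customisation) in the standard TEI namespace, that the running text sits inside a <body> element, and that footnotes are <note> elements.  None of this has been checked against the CSEL files themselves, and the sample document and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; assumed (not verified) to be what the CSEL files use.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented toy document, NOT taken from the CSEL repository.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <div><p>Arma virumque<note>an editor's footnote</note> cano.</p></div>
  </body></text>
</TEI>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def collect(element, parts, skip):
    """Gather text under element, leaving out any child whose name is in skip."""
    if element.text:
        parts.append(element.text)
    for child in element:
        if local_name(child.tag) not in skip:
            collect(child, parts, skip)
        # Text following a child belongs to the parent, so keep it either way.
        if child.tail:
            parts.append(child.tail)


def plain_text(xml_source, skip=("note",)):
    """Return the text of the TEI <body> with whitespace collapsed."""
    root = ET.fromstring(xml_source)
    body = root.find(f".//{{{TEI_NS}}}body")
    if body is None:  # fall back to the whole document
        body = root
    parts = []
    collect(body, parts, skip)
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    print(plain_text(SAMPLE))
```

To try it on a downloaded file, something like plain_text(open(path, "rb").read()) should do, where path is whatever you saved the file as.  Passing skip=() keeps the footnotes inline instead of dropping them.  A real viewer would need to do far more than this (page breaks, apparatus, Greek), which is rather the point.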

In a way this is good news.  Only half the CSEL has been online, in page images scanned by Google and Archive.org and others.

And yet … haven’t we been here before?

How is this different, in any important way, from what I was doing back in 1998?  I was taking printed Latin texts (by Tertullian), and creating an electronic text.  Mine was in HTML, rather than XML.  I didn’t always bother with apparatus – but then, there was only one of me doing it.

But essentially … isn’t this the same activity?

I was inspired by Harry Plantinga of the CCEL.  Even earlier than me – was it in 1995? – he had got Logos to digitise the 38 volumes of the Ante-Nicene Fathers, footnotes and all, and posted them online in text files.

Back then, we knew that the future was bright.  We knew that in ten years’ time, there would be a sea of texts online.

So what happened?  Because, unless I miss my calculation, it’s now sixteen years later.  And we’re only now getting something like this done, in much the same way as a solitary individual – myself – was doing it all those years ago.

The classical texts have mainly been the work of Bill Thayer at Lacus Curtius.  He’s been hacking away all these years.  Why isn’t his work long superseded?

The patristic texts have mainly been me.  Again, why hasn’t my site been overtaken by massive digitisation efforts?

What’s changed in the interval?  Yes, Google Books has scanned millions of books as page images.  That has been great.  Microsoft started to do the same and then abandoned it.  Not so great.  Archive.org has flown the flag in its place, on a much lower budget – well done, but not what we anticipated.  Publishers have, on the whole, been mainly concerned to ensure that Google Books would only educate Americans and people not living in Europe.  And nobody has cared.

In many ways the world is a very different place from what it was in 1998, sixteen years ago.  And yet, as we learn today, most of the ambitions of people like myself, like Harry, like Bill, and indeed others who have laboured in the same fields[1], have not been fulfilled.

Which is a bit sobering, really.

We are getting, gradually, the mass digitisations of manuscripts.  But … I was doing this back in 2000.  Undoubtedly I was ahead of my time, and I gave up after doing a handful.  But … with all the technical advances, surely in fourteen years we should be further down the line?

In other ways we are losing ground.  James Tauber created the electronic Greek New Testament in the MorphGNT text file, lemmatized and ready for processing by anybody.  The German Bible Society threatened litigation, on the basis that the Greek New Testament belongs to THEM, and not to some funny blokes named Matthew, Mark, Luke and John, and offline it went.  Nothing replaced it.  Nobody cared.

What I take from this is that we really must not simply assume that stuff will come online any time soon.  It isn’t happening.  There are any number of initiatives, and all these are welcome.  We’re in a much better place, in some ways.  And yet … compared to the progress of technology, the content has hardly moved forward.

Will the classical internet ever truly come to be?  Or the patristic internet?  In our lifetimes?

[1] A list would be invidious – I’m just pulling a couple of names here, without disrespect to others.

9 Responses to “Digitising ancient texts – the future that did not happen”


  1. Dan King

    There are many of us that care, we’re just not in positions to do much about it…
    You should perhaps have mentioned the TLG online – I know it’s not free, but neither is it prohibitively expensive like anything Brill puts out.

  2. GFranzini

    Hi Roger, just a quick one to let you know that I added a link to your list of CSEL volumes to the blogpost you quoted and reworded it slightly as I later realised it implied we’re going to publish the copyrighted volumes as well, which is not the case.

  3. Roger Pearse

    Ah, thank you. I was surprised, I admit.

  4. jk

    James Tauber probably made the mistake of including UBS variants. They cracked down on the Westcott-Hort text with UBS variants a few years ago. But it’s the UBS variants that are the problem.

    You can still download Stephanus, Scrivener, WH, Tischendorf’s 8th, and Robinson-Pierpont 2000 Majority Text (which they released to public domain) from http://unbound.biola.edu/ in an easy-to-parse text format. (All without accents and diacritics, however.)

    There is also https://sites.google.com/a/wmail.fi/greeknt/home/greeknt where the 2005 Robinson-Pierpont (also released to public domain) can be downloaded along with several other texts. Here you can get the 2005 with such markup that any programmer can easily convert it to UTF-8 with accents and diacritics.

  5. Livius Nieuwsbrief / april | Mainzer Beobachter

    […] conclusions about the first twenty years of the internet: when it comes to digitising texts, the promises have not been fulfilled (and so thinks your […]

  6. Aantekeningen bij de Bijbel · Livius Nieuwsbrief / april

    […] conclusions about the first twenty years of the internet: when it comes to digitising texts, the promises have not been fulfilled (and so thinks your […]

  7. Dan King

    Can we get any help with how to use these XML files, for those of us who are non-techies? And I was looking for CSEL 3 but it wasn’t in the list.

  8. Roger Pearse

    Good question. I haven’t looked at it. It would have to be processed somehow. The XML schema would be needed.

    I think it is not, in fact, all the CSEL … only those already online at Archive.org.

  9. Greta Franzini

    @Dan King: we upload the public domain volumes as soon as we receive them from our data entry company. CSEL 3 will come soon.