Back in 2010, I wrote about the problem of Greek text formatted in pre-Unicode fonts. Yes, it was a problem even then. But the problem is still with us, because we have editions of Greek texts, embedded in older PDFs, which are not in Unicode.
A PDF from Byzantinische Zeitschrift 98 (2006) contains the following bit of text:
But when I copy the Greek text from line 1, I get this:
MetÛvra, Monã ^AgÝaw TriÀdow
Yes, the PDF has embedded an old pre-Unicode font. Looking at the file properties, I see which one it is:
It was “Times-TenGreek”. Whatever that was.
A lot of googling turns up little information, plus a vast number of fake sites offering “downloads” that turn out to be some random font. Eventually I stumble across this site by Luc Devroye:
Times Ten (Adobe, 1988-1990) was the house font used by the Frankfurter Allgemeine Zeitung. It has an Adobe version with Greek, called Times Ten Greek Upright (1988-1991). The full family can be found here.
But sadly Times Ten Greek Upright is not there, although other Times Ten fonts are:
Looking at the file name patterns, I conclude that “TimesTenGreek-Upright.otf” or something like that might be the file name. Searching for this gives a PDF at https://www.fonts.at/pdf/LT_Originals_OT_Edition_3.pdf, which lists four files.
4584. Times® Ten Greek Upright TimesTenGreek-Upright.otf
4585. Times® Ten Greek Inclined TimesTenGreek-Inclined.otf
4586. Times® Ten Greek Bold TimesTenGreek-Bold.otf
4587. Times® Ten Greek Bold Inclined TimesTenGreek-BdInclined.otf
So these did exist. But that’s all I can find. No source from which to obtain them, and no indication of what the encoding was. Basically we have an electronic text but no way to use it.
This will become a significant problem, if it is not already so. What can be done about it?
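One partial avenue is to reconstruct the encoding by hand. The printed line reads Μετέωρα, Μονὴ Ἁγίας Τριάδος, so the copied string can be lined up against it character by character. Here is a minimal Python sketch of that idea. The names SAMPLE_MAP and legacy_to_unicode are my own, and the table is an assumption inferred from this one sample only: it covers just the letters that occur in it, and the rest of the Times Ten Greek encoding remains unknown.

```python
import unicodedata

# Hypothetical, partial table: inferred only from the single sample line
# in this post, NOT from any documentation of the Times Ten Greek encoding.
SAMPLE_MAP = {
    "^A": "\u1f09",      # ^A -> capital alpha with rough breathing
    "M": "\u039c",       # capital mu
    "T": "\u03a4",       # capital tau
    "a": "\u03b1",       # alpha
    "d": "\u03b4",       # delta
    "e": "\u03b5",       # epsilon
    "g": "\u03b3",       # gamma
    "i": "\u03b9",       # iota
    "n": "\u03bd",       # nu
    "o": "\u03bf",       # omicron
    "r": "\u03c1",       # rho
    "t": "\u03c4",       # tau
    "v": "\u03c9",       # omega
    "w": "\u03c2",       # final sigma (only seen at word end in the sample)
    "\u00db": "\u03ad",  # U-circumflex -> epsilon with acute
    "\u00e3": "\u1f74",  # a-tilde -> eta with grave
    "\u00dd": "\u03af",  # Y-acute -> iota with acute
    "\u00c0": "\u03ac",  # A-grave -> alpha with acute
}


def legacy_to_unicode(text):
    """Convert text copied from the PDF, leaving unknown characters as-is."""
    # Try longer keys first so that "^A" wins over any single-character key.
    keys = sorted(SAMPLE_MAP, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(SAMPLE_MAP[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # not in the table: pass through untouched
            i += 1
    return unicodedata.normalize("NFC", "".join(out))


print(legacy_to_unicode("Met\u00dbvra, Mon\u00e3 ^Ag\u00ddaw Tri\u00c0dow"))
```

Every extra line of text that can be read off the page and compared with its copied form would add entries to the table, but without the font or its documentation this stays guesswork for any character not yet seen.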



Dear Roger, I happened to have a similar problem, which I could easily solve by forcing an OCR run over the PDF file. Afterwards I had no problem copying the text. The biggest issue arises when you have a multilingual text and need to choose only one dictionary for the OCR. In your sample text, for instance, if you choose the Greek dictionary, you would get something like:
Ρἰὰ Ο ΠΊΘΠΟ ΔΠΊρΡἰ6 4 [θίΐθγο αἱ Ργοοορίο ἰ Ξοριοηίί Πιον] σΟα οὶ ΔΠΟΟΓᾺῚ ἰΠΘ ΒΟ 551:
1. Μετέωρα, Μονὴ Ἁγίας Τριάδος 95, ἃ. 1778: ΕΓ, 138–152ν.
2. Μετέωρα, Μονὴ Ἁγίας Τριάδος 110, ἃ. 1804: ΕΡῚ 11δν-12θν.
3. ΜεποΖία, ΒΙΡΙοΐθοα Μαγοίαπα, οι. 521 (ς9}]. 316), 5. ΧΠῚ: ΓΕ 109ν–111].
As you can see, only the Greek text is properly rendered, while the sentence in Italian is useless.
Hope it helps anyway!
In the federal courts in the United States, documents that are filed electronically must be filed in the PDF/A format, which is a document format that embeds the fonts in the document. That solves the problem of the document’s legibility but it doesn’t ensure that it’s possible to copy text from the document.
The advent of Unicode arguably solves the latter problem: no matter what font you use, the character code is always the code for alpha+smooth+acute.
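One caveat, as a small Python sketch using only the standard unicodedata module: Unicode can store that letter either as a single precomposed code point or as a plain alpha followed by combining marks, so two copies of the same text may need normalizing before they compare as equal. The helper name same_text is just for illustration.

```python
import unicodedata

# Alpha with smooth breathing and acute, stored two different ways.
precomposed = "\u1f04"            # one code point
decomposed = "\u03b1\u0313\u0301"  # alpha + combining comma above + combining acute


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(unicodedata.name(precomposed))  # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA
print(precomposed == decomposed)      # False: the raw code points differ
print(same_text(precomposed, decomposed))  # True: equal once normalized
```

So the code no longer depends on the font, but a search or comparison may still want to normalize first.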