Archive for the 'Scanning' Category
September 6th, 2010 by Roger Pearse
Ten years ago I was still scanning material for the Tertullian Project. One thing that I started to do was acquire foreign-language translations. In a way this was a mistake; it was quite hard to scan and proof these, and really those who speak that language group will be far better at it. So after a while I stopped. But I had by then acquired a fair collection of Tertullian translations.
These have languished in a pile of books ever since. Nor are they of value. When I took a whole load of books to sell in Oxford, the dealer wouldn’t even look at the Italian translations. These I ended up giving to Oxfam there, in the faint hope that they might find a reader.
One translation that I bought, in January 2001, was Tertullian: Udvalgte Skriften (=Selected Works). This was a small collection of works by Tertullian in Norwegian translation, published in 1887. It’s about small paperback size, and some 260 pages long. Unfortunately when it arrived I found that it was in the ‘gothic’ font (or ‘Fraktur’) favoured in Germany up to WW2 and then deep-sixed by an edict from Hitler himself (or so I am told). That meant that I couldn’t even OCR it. OCR for Fraktur was developed eventually, in collaboration with Abbyy, the owners of Finereader, but then stitched up so that no-one could have access to it.
I found the book again a couple of weeks ago, when I pulled all my academic books out of the cupboard and piled them on the side. I felt morally obliged to create a digital copy, and today I’ve done so. It’s just a PDF full of page images, but at least it exists. So … if you speak Norwegian, and can read text in Fraktur, enjoy!
The PDF is now online at Archive.org, here.
Would anyone like the book itself? It’s unbound, and coming apart a bit, but everything is there. It cost me around 110 Norwegian kroner. It’s yours for $10 by Paypal, plus whatever postage costs to wherever you are. If not, I think I know a Norwegian scholar who would probably give it a home.
The book is volume 15 in a series. The volumes were listed inside the back cover. I can’t even read the letters but these seem to be the texts.
1. Two of Cyprian’s works.
2. A Tertullian work – maybe the Apologeticum?
3. A work by Augustine.
4. Clemens Romanus, 1st letter.
5. Cyril of Jerusalem, Catechetical lectures
6. Cyprian, another two works (one about Donatus)
7. Justin Martyr, Apology.
8. Augustine again.
9. Augustine, Enchiridion.
10. Selected works of Chrysostom.
11. Ignatius and Polycarp, letters and martyrium.
12. Minucius Felix, Octavius.
13. Augustine. Something about Donatism.
14. Athenagoras, Tatian, Letter to Diognetus.
15. Tertullian, Selected Works.
I see the word “subscriptionen” so I suspect there were more. But who would know?
Is there a norseman in the house?
December 17th, 2009 by Roger Pearse
For many years I have used Abbyy Finereader as my OCR software. Version 10 is now out, and I have just bought an upgrade.
Mind you, I have retained copies of FR8 and FR9 on my disk, installed and ready to use. FR9 was quite an improvement in OCR terms on FR8, and has better PDF handling, but the user interface is a lot harder to use. It fights you. I’ve never got used to its quirks. In particular it decided that it wouldn’t allow me to scan images at 400 dpi on my Plustek Opticbook 3600 — which FR8 did — and since I prefer to scan at that resolution, I had to retain FR8. It’s also better for image cropping.
So … FR10. I’ve just installed it, which was painless. It asks if I want to start some screengrab software every time I start my PC — I uncheck this. I open it up for the first time, and it wants me to register – that too is painless.
Then I get a screen with a big red window of “helpful” options — with no way to close it. I uncheck “display on startup” and it still won’t go. I’m forced to close the application, and restart. Not really that good a start.
Next I open an existing FR9 project. I’d started work on Censorinus, so I use that. I select the folder; and then it asks me to save it somewhere else. Yes, OK, we never had to do that in FR5, FR6, FR7, FR8 or FR9. Why change it? So I waste some disk space and create folder censorinus_fr10. I suppose newcomers will find it useful. And it opens the project OK. Hmmm. Now what?
I click on a page, and it doesn’t seem to include any of the OCR’d text. I select ‘Read’ and it OCR’s it. But … where is the text I was working on? A look shows that FR10 has kindly deleted all my recognised text. It’s kept the blocks on the screen, and that’s it. B*****ds!! Now we know why they insisted on keeping the old directory — boy would they be lynched if they hadn’t! This is bad. This is really, really bad. Who wants to restart a whole project?
OK, well I look through a few pages rather hopelessly, and I see one where the image needs editing. So … what do we have? Well, we have the FR9 style: “Let’s hide all the tools boys! Hee hee!” I had to customise mine to get an eraser on it. How do I do that now?
Well, I can’t say. If I choose Page|Edit page image, I get a rubbish image editor, with no tools, on which I can crop. This is the FR9 approach, way inferior to the FR8 one. It looks as if they still haven’t got rid of that idiot who ruined the interface. I erase a bit of rubbish on the image … it takes ages. The page flashes as I do. Awful!
OK, I see it. You choose View|Toolbars|Quick Access bar. This puts an extra bar at the top, under the file menu. Then you do View|Toolbars|Customize. Choose categories “Image”, and you are looking at that toolbar. Now go down the icons on the left, and insert them where you want them on that toolbar. I add erase and a few others, and suddenly I can clean up the image as I want to. I can zoom the image (although only to 200%, unlike before – another degradation in service), and I can get rid of the image of some long dead student’s pencil on the page.
I’m dispirited, tho. I’m having to work at this, just to do simple OCR tasks.
OK. Let’s OCR that page. Right-click, read and … off it goes. I get two windows, image and text. Luckily the “Quick Access Bar” also allows me to minimise the image! And I click on the text at one point, where it’s duff, and … hang on, where’s the zoom at the bottom? Ah, it’s still at the bottom; just not displayed by default. (Why?!) One click on it, and it appears.
The OCR quality appears about the same, or possibly a little better. We’ll see.
Overall verdict? Wish they’d shoot the interface designer.
UPDATE: another glitch. While working on Censorinus, I had to do a global replace of “aera” to “era”. This I did, but they’ve made a subtle change. After the replace, I used to just hit Esc to get rid of the search/replace dialog box. Now it doesn’t work. And why? Because each time you do a replace, they shift the focus to the document, meaning you have to click the dialog box to get back to where you were.
This is unbelievably infuriating, and will make for much more work in using the product. All those extra clicks during a long search/replace…
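For anyone who exports the text and does this kind of clean-up outside Finereader, here is a minimal sketch of a whole-word replace in Python. It is only an illustration, not what Finereader does internally; the function name `replace_whole_word` and the sample sentence are my own inventions.

```python
import re


def replace_whole_word(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only where it stands as a whole word.

    Matching ignores case; if the matched word starts with a capital,
    the replacement is capitalised as well.
    """
    pattern = re.compile(r"\b" + re.escape(old) + r"\b", re.IGNORECASE)

    def _swap(match: re.Match) -> str:
        word = match.group(0)
        return new.capitalize() if word[0].isupper() else new

    return pattern.sub(_swap, text)


# Hypothetical sample text, just to show the behaviour.
sample = "Aera of Augustus; the aera began, but caerae is untouched."
print(replace_whole_word(sample, "aera", "era"))
```

The word-boundary markers (`\b`) are what stop the replace from chewing up longer words that merely contain "aera".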
August 5th, 2009 by Roger Pearse
It’s hot and humid here; so much so, that I can’t think straight. So I’ve been looking at the piles of photocopied articles and running them through my scanner and throwing away the photocopy. That’s a mindless activity I can do.
Not sure I’m quite there yet, tho. The PDF’s are OK, but they aren’t OCR’d. The scanner software has OCR, but it’s not good enough. Nor is the built-in OCR in Acrobat. The best still seems to be Finereader 9; but the PDF’s don’t go through FR9 unchanged. The images can look strange.
Not sure what to do about that. But I am gradually freeing up storage space.
July 28th, 2009 by Roger Pearse
I’ve now scanned in images of all the pages (around 600) of this monstrously heavy volume — my forearms will never be the same — using Abbyy Finereader 8 to control the scanner. I scanned in black-and-white at 400 dpi, which is the best for OCR.
I’ve gone through the batch, turning alternate pages the right way up. I’m now importing it into Finereader 9, which has better OCR and produces smaller PDF’s.
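I did the turning by hand in Finereader, but the idea is simple enough to sketch. The toy example below assumes each page is just a grid of 0/1 pixels (a stand-in for a real scanned image) and that every second page came out upside down; the names `rotate_180` and `fix_alternate_pages` are hypothetical, not part of any scanning software.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixels; a stand-in for a scanned page


def rotate_180(page: Bitmap) -> Bitmap:
    """Return the page turned upside down (row order and each row reversed)."""
    return [row[::-1] for row in page[::-1]]


def fix_alternate_pages(pages: List[Bitmap], first_upside_down: int = 1) -> List[Bitmap]:
    """Rotate every second page, starting at index `first_upside_down` (0 or 1)."""
    return [
        rotate_180(page) if i % 2 == first_upside_down % 2 else page
        for i, page in enumerate(pages)
    ]


# Three tiny one-row "pages"; only the middle one gets turned round.
pages = [[[1, 0]], [[1, 0]], [[1, 0]]]
print(fix_alternate_pages(pages))
```

With real page images you would swap the toy `rotate_180` for whatever rotate function your image library provides; the alternate-page logic stays the same.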
UPDATE (16:30): I’ve created a searchable PDF, which is about 33Mb. Now starting to upload it to Archive.org. This can be slow and frustrating, and will probably take all evening. I’ve also exported the text as .htm and .doc, which I’ll probably place there also. I haven’t proofed any of the OCR output, but FR9 gives rather better results than FR8, which is what the automatic processes at Archive.org use.
UPDATE (16:36): Good grief. It uploaded first time. It’s here: http://www.archive.org/details/MichelLeSyrien3 I’d better add the other formats, then (if it will let me). It’s not in the searches yet, tho.
UPDATE (16:39): Hmm. The interface for uploads of extra files has changed. Somewhat better than it was. Still very slow, it seems, and not that intuitive. You can tell it was tested by someone local to the server, and not someone far away from it.
July 27th, 2009 by Roger Pearse
And boy is it hard work! Just lifting and turning the heavy volume itself is tiring. Just scanned p. 165. I find that I have to play games with myself, to avoid giving up. So at the moment I’m saying, “only a couple more to 170; you can pause there.” When I get to 170, of course I have 171 open. So I tend to just scan the extra page — just turn the book and lower it on the scanner. Then, “well, may as well do a couple more.” And so on.
We tend to take for granted how all those books on Google and Archive.org got scanned. But it was hard, slow, back-breaking work. When we grumble about missing pages, perhaps we should think of some low-paid person, very tired.
P. 173 done. Maybe I’ll just do as far as 180…
UPDATE. p.269. Wonder if I can get to 300 tonight?
UPDATE2. p.361. But I’m missing One Tree Hill! Still, when the pages are turning and the pain-level is low, you have to keep rolling.
July 24th, 2009 by Roger Pearse
I scanned volume 1 and volume 2 of the French translation of the Chronicle of Michael the Syrian, the big 12th century Syriac Chronicle and placed them on Archive.org. I learned today that after a very long wait, volume 3 has appeared at the local library via ILL. I shall go and get it tomorrow, and fire up my scanner.
July 17th, 2009 by Roger Pearse
Ninth century Byzantine chronicler Theophanes is the earliest Greek source to give a biography of Mohammed, or so I have been told. I referenced yesterday the relevant pages in the Bonn edition. But an English translation does exist, made by minor sci-fi author Harry Turtledove, although this only starts in 602 AD. This was published in 1982 so will be offline and in copyright long after I am dead, which is a pity.
Every time I find myself having to seek out an offline source, it’s a pain. I’ll only want the book for five minutes; but to get it will involve a lot of labour and time, or some money. This can’t be an unusual experience, and indicates why academic offline publishing must be doomed. It is so pointless.
Another translation was made by Cyril Mango for Oxford University Press, in 1997, which starts in 284AD. It translated the De Boor text, and calls the Turtledove version “highly inaccurate”, which is pretty steep language. Apparently it took Mango 15 years to do. Yet the Turtledove translation is still being sold. I wonder how many copies it sells? Would the publisher sell the copyright? How much for?
I find that I have access to a DJVU version of Mango, and — bless them — that Abbyy Finereader will open it so I can scan the portion about Mohammed (on page 464). The chunk is not that long. In the meantime I’m reading Mango’s introduction.
Theophanes Confessor (d. 822) uses and continues the better known chronicle of George Syncellus. He was aristocratic in manner, addicted to sport when young, handsome and even portly in appearance. He was easy-going, a generous host, and even as a monk was not averse to taking the waters at a fashionable spa. He does not seem to have travelled much, staying in the Constantinople-Bithynia area. He openly says that he did not have a proper education, and learned his work as a scribe as part of his monastic obligation.
Where Theophanes’ chronicle differs from many is that he had access to a Syro-Palestinian source which informed him about Eastern events. He thus includes the Moslem rulers in his lists. No other Byzantine chronicler was so well equipped, nor so interested in this material, which Theophanes uses extensively. Like George Syncellus, he uses the Anno Mundi chronology and his work is a descendant of that of Eusebius of Caesarea; indeed the last such.
I will add Theophanes on Mohammed here when my OCR job finishes!
UPDATE: Here it is, translated by Cyril Mango:
In this year died Mouamed, the leader and false prophet of the Saracens, after appointing his kinsman Aboubacharos (to his chieftainship). At the same time his repute spread abroad and everyone was frightened. At the beginning of his advent the misguided Jews thought he was the Messiah who is awaited by them, so that some of their leaders joined him and accepted his religion while forsaking that of Moses, who saw God. Those who did so were ten in number, and they remained with him until his murder. But when they saw him eating camel meat, they realized that he was not the one they thought him to be, and were at a loss what to do; being afraid to abjure his religion, those wretched men taught him illicit things directed against us, Christians, and remained with him.
I consider it necessary to give an account of this man’s origin. He was descended from a very widespread tribe, that of Ishmael, son of Abraham; for Nizaros, descendant of Ishmael, is recognized as the father of them all. He begot two sons, Moudaros and Rabias. Moudaros begot Kourasos, Kaisos, Themimes, Asados, and others unknown. All of them dwelt in the Midianite desert and kept cattle, themselves living in tents. There are also those farther away who are not of their tribe, but of that of Iektan, the so-called Amanites, that is Homerites. And some of them traded on their camels. Being destitute and an orphan, the aforesaid Mouamed decided to enter the service of a rich woman who was a relative of his, called Chadiga, as a hired worker with a view to trading by camel in Egypt and Palestine. Little by little he became bolder and ingratiated himself with that woman, who was a widow, took her as a wife, and gained possession of her camels and her substance. Whenever he came to Palestine he consorted with Jews and Christians and sought from them certain scriptural matters. He was also afflicted with epilepsy. When his wife became aware of this, she was greatly distressed, inasmuch as she, a noblewoman, had married a man such as he, who was not only poor, but also an epileptic. He tried deceitfully to placate her by saying, ‘I keep seeing a vision of a certain angel called Gabriel, and being unable to bear his sight, I faint and fall down.’ Now, she had a certain monk living there, a friend of hers (who had been exiled for his depraved doctrine), and she related everything to him, including the angel’s name. Wishing to satisfy her, he said to her, ‘He has spoken the truth, for this is the angel who is sent to all the prophets.’ When she had heard the words of the false monk, she was the first to believe in Mouamed and proclaimed to other women of her tribe that he was a prophet.
Thus, the report spread from women to men, and first to Aboubacharos, whom he left as his successor. This heresy prevailed in the region of Ethribos, in the last resort by war: at first secretly, for ten years, and by war another ten, and openly nine. He taught his subjects that he who kills an enemy or is killed by an enemy goes to Paradise; and he said that this paradise was one of carnal eating and drinking and intercourse with women, and had a river of wine, honey, and milk, and that the women were not like the ones down here, but different ones, and that the intercourse was long-lasting and the pleasure continuous; and other things full of profligacy and stupidity; also that men should feel sympathy for one another and help those who are wronged.
 Muhammad died in 632.
 … Muhammad, of course, was not murdered. Besides, the sequence of thought appears to require something like ‘until they had seen him taking food’. The reading phaghs is not appropriate unless it can mean the act of eating rather than ‘food’, the latter given by Du Cange, Gloss., s.vv. phage, phagh. Dr R. Hoyland has drawn our attention to Chr. 819, 7, which says of Muhammad, primus fecit sacrificium, et comedendum imposuit Arabibus, praeter eorum morem. The eating of camel is forbidden in Deut. 14: 7. The story of the rabbis, of whom only two embraced Islam sincerely, whereas the others pretended to do so, is found in the Sira of Ibn Ishaq (d. 768), trans. A. Guillaume, The Life of Muhammad (London, 1955), 239 ff., 246 ff.
 These names correspond to Nizar, Mudar, Rabi`a, Quraish, Qais, Tamim, and Asad. Discussion by L. I. Conrad, ByzF 15 (1990), 11 ff. Longer genealogy in Chr. 1234, 187-8. …
 … The legend of a Christian monk, variously called Sergius, Bahira, or Nastur, who was either the teacher of Muhammad or recognized him as a prophet, enjoyed a wide currency. See S. Gero in Syrie colloque, 47-58.
 The durations given here, although presumably derived from an Arab source, do not agree with the Muslim tradition. See L. I. Conrad, ByzF 15 (1990), 18 ff.
July 10th, 2009 by Roger Pearse
I’ve just come across this French site, http://remacle.org/. It contains a simply enormous amount of French translations, often with parallel original text. Partly the site is a portal; but much is actually at the site itself. It seems to be the work of a collective, although lots of stuff is by Marc Szwajcer, and on the site itself. The Armenian history Agathangelos is there. Agapius is there — I wish I’d known, for I had to scan this myself for my own English translation. A work by Severus Sebokht on the Astrolabe is there. Letters of Jerome are there.
Among the gems are the poems of Claudian, and those of Sidonius Apollinaris, including his panegyric for the emperor Majorian, and his panegyric on his ineffectual successor, Anthemius. Firmicus Maternus is there. So is a lot of Photius.
“But what is this to me?” I hear you cry, “I don’t speak French.” But Google translate is really very good for French. So you really can make use of this, even so.
Stephen C. Carlson’s blog Hypotyposeis is not updated as often as it might be, so I only look in infrequently. But I owe this tip to him. Thank you!
July 3rd, 2009 by Roger Pearse
We all know that textbooks are often best in searchable PDF form. But yesterday I came across a case where they were not. I wanted a French grammar, so that I can brush up on stuff for Agapius. I found a bootleg PDF, thereby saving myself $25. But… I found that what I wanted to do was read the thing in bed, just a bit at a time, skip pages, and generally absorb interesting stuff by osmosis. I needed a book, in short.
So, yes, I went out and bought one. It was the only way!
Worth remembering, when we talk about the death of the book. Only some books will die.
June 23rd, 2009 by Roger Pearse
I’ve just discovered http://www.wilbourhall.org/index.html. This site deals with Mathematics, and Mathematical Astronomy in the works of ancient writers. It does so by getting hold of whatever texts exist and fixing the errors in the Google scans and so forth. If you want the complete works of Hero of Alexandria, they’re here. Archimedes, Ptolemy… likewise. Arabic writers? They too. The author, Joe Leichter, writes:
I hope to make available public domain materials that are essential for the study of ancient and early modern mathematics and mathematical astronomy. Google, for example, has done some things to achieve this through its books.google.com project. However, like most other efforts at digitally copying non digital materials, “mistakes were made”. For example, Google currently has several (all incomplete) versions of Teubner’s edition of Euclid available for download. Most of these unfortunately contain page after page that are illegible, missing, out of order or otherwise unusable.
The man is a hero. Ancient scientific works are a horrendously neglected part of the ancient world, because they require skills and interest in both the humanities and the sciences. Still more neglected are the Byzantine writers on this subject.
All this from a blog that I had not seen before, opuculuk by Nick Nicholas, reporting on a search that he did on the works of Chioniades. (Nick works for the TLG, and was working on their lemmatizer, when he started to come across chunks of untranslated Arabic in the scientific works of Chioniades. Mr. C., a 12th century writer, had been taking lessons from some Persian, so had got a whole load of jargon for his pains!)