From my diary

Today has been dedicated to life’s little chores.  But there is a little news.

Last night I did some more OCR on Ibn Abi Usaibia.  We passed the page 320 mark.

I’m still negotiating to translate Methodius, De Lepra from the mixture of German and Greek that Bonwetsch printed in GCS 27.  The price is quite a bit higher than I wanted, but I’ll find the money for this one and treat it as an experiment.

I’ve also done a bit of work on the GCS page.  A correspondent has pointed out that there is a way to download the volumes from the Polish library as a single .djvu file, rather than a zip file of myriads of pages.  This works (not in IE8 tho — I got it working in Firefox 5), and I’m redownloading some of the files now.

ALDL demanding copies they aren’t entitled to?

A letter arrives today, addressed to my company, Chieftain Publishing, from the “Agency for the Legal Deposit Libraries”. 

In the UK there is a duty on publishers to supply a free copy to each of 6 libraries: The British Library, Oxford, Cambridge, National Library of Scotland, ditto of Wales, and Trinity College Dublin.  It is a much resented provision among publishers, especially publishers of expensive limited-run books.  It’s not that useful a provision, when you consider that none of these libraries will make copies available by inter-library loan to people like you and me. 

The letter demands 5 copies of the Eusebius book paperback, under the Legal Deposit Libraries Act 2003. 

But they’ve already had 5 copies of the hardback, and indeed I have an acknowledgement of receipt.  So I’m rather baffled.  Can they really be entitled to yet more copies of a book which is basically the same?

But the web is a wonderful thing.  The Act itself is here, and laid out — thankfully — in a very readable form.  And what do I see at the top?

Duty to deposit
1.Deposit of publications
2.New and alternative editions
3.Enforcement

“New and alternative editions” sounds relevant, so I open it up.  And I find…

(1)This Act does not apply to a work which is substantially the same as one already published in the same medium in the United Kingdom.

So I have written back and queried their request.  After all, whatever would the libraries whom they represent do with TWO copies of the book?  I shall await a response with interest.

But I wonder how many publishers just sent copies regardless?  And what happens to them?

From my diary

I’m continuing to scan the History of physicians by Ibn Abi Usaibia.  I’ve done another 40 pages lately, which takes us up to 288.  But we’re still only about a third of the way through.

I’ve had a possible bid at PeoplePerHour.com to translate Methodius De lepra; the first bid was too high, but we’re much closer now.  It’s still more than I ever wanted to pay, but I’m willing to give it a go, so long as the quality is there.

The sales figures for November for Amazon for Eusebius: Gospel Problems and Solutions have arrived.  14 copies were sold through Amazon.com and Amazon.co.uk.  My company also sold a few copies directly.  It seems that sales are gradually increasing each month, which is good.  There’s quite a delay between the orders being placed and money reaching me — at least 3 months — but the money for June and July has now arrived.  Of course those months were early days and the sales were fairly small numbers, but at least some money is now coming in.  It’s fairly clear, though, that by 5th April the project will still be in debit, but perhaps not by as much as I had feared. 

So, if you need a Christmas present for the patristic scholar in your life, why not buy him a copy at Amazon.com or Amazon.co.uk?  (sorry to bang the drum a little: I’m not that good at this marketing stuff, but even I know that Christmas is coming).

A correspondent has been pointing out various errata and corrigenda in the GCS page that I’ve set up.  I must look some more at this.

If the sun ever comes out — it’s been dull here for five days now — I shall do a trip up to Cambridge University Library.

Converting DjVu into PDF

The volumes of the GCS at the Kaiser Wilhelm Library in Posen in Poland are in .DjVu format, which is rather inconvenient.  So today I have been looking at whether it is possible to convert them to PDF.  I’ve had some success, I must say.

I obtained a copy of IrfanView from the web.  You need the basic .exe download, but also the plugins, because one of these makes it possible to work with .djvu files.

Once I had installed this, I opened the index.djvu for one of the GCS volumes.  This in fact opened all the files, as it does in the DjVu reader.  I then followed the instructions here:

1) With “IrFanView” go to “File->Print” or ‘Ctrl+p’

2) On the window select “Printer: Adobe PDF”, hit “Printer setup” for the paper size you want, etc…., in the middle of that window says “Print size” select “Best fit to page(aspect ratio)”

3) On the right side of that window you will see the Preview, under preview is “Multiple images” select “Print all pages”

4) When you’re finished hit “Print” and is going to ask you the name of file you want to save it.

And that’s it!! after severals minutes (i think hours, depending on how many images the DJVU file has) you’re going to have a PDF file with the info you want!

But it doesn’t take hours.  However I did run into a glitch: I got this error:

%%[ ProductName: Distiller ]%%
%%[Page: 1]%%
%%[Page: 2]%%
%%[Page: 3]%%
%%[Page: 4]%%
%%[Page: 5]%%
%%[Page: 6]%%
%%[Page: 7]%%
%%[Page: 8]%%
%%[Page: 9]%%
%%[Page: 10]%%
%%[Page: 11]%%
%%[Page: 12]%%
%%[Page: 13]%%
%%[Page: 14]%%
%%[Page: 15]%%
%%[Page: 16]%%
%%[Page: 17]%%
%%[ Error: invalidfileaccess; OffendingCommand: showpage ]%%
%%[ Flushing: rest of job (to end-of-file) will be ignored ]%%
%%[ Warning: PostScript error. No PDF file produced. ] %%

A bit of hunting around revealed an answer:

… the issue appears to be with my Kaspersky Anti-Virus software. By setting a check mark against most of the exclusions in the Kaspersky application control for Acrobat Distiller everything now seems to be working OK.

I.e. in Settings … Application Control … Applications … (long pause when you hit that button!) … ADOBE SYSTEMS, then right-click on Acrobat Distiller, Application Rules … Exclusions, and check everything except “Do not scan network traffic”.

This worked; and IrfanView ran through 500+ pages and created a perfectly good PDF, some 500MB in size.

The only downside is that I ended up with a white margin on the right and bottom, where the image was padded out to A4 (or whatever).  Nothing I could do would change that.  Probably I haven’t got the settings quite right.
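For anyone who would rather avoid the printer-driver route altogether, here is a minimal sketch of a different approach: calling the `ddjvu` command-line tool from the free DjVuLibre package, which can write PDF directly. This is not the IrfanView method described above, and I have not tried it on the GCS volumes. The function names are made up for illustration, and it assumes that `ddjvu` is installed and on the PATH, and that the pages belonging to an index.djvu sit in the same folder.

```python
import shutil
import subprocess
from pathlib import Path


def build_ddjvu_command(source, target, quality=None):
    """Return the ddjvu argument list for a DjVu-to-PDF conversion."""
    command = ["ddjvu", "-format=pdf"]
    if quality is not None:
        # Optional JPEG quality for the page images inside the PDF.
        command.append(f"-quality={quality}")
    command += [str(source), str(target)]
    return command


def convert_djvu_to_pdf(source, target=None, quality=None):
    """Convert one DjVu document to PDF and return the path of the PDF."""
    source = Path(source)
    # Default output name: same file name with a .pdf extension.
    target = Path(target) if target else source.with_suffix(".pdf")
    if shutil.which("ddjvu") is None:
        raise RuntimeError("ddjvu not found on PATH; install DjVuLibre first")
    # check=True raises CalledProcessError if ddjvu reports a failure.
    subprocess.run(build_ddjvu_command(source, target, quality), check=True)
    return target
```

Calling `convert_djvu_to_pdf("index.djvu")` would then write `index.pdf` next to the original. Whether the result comes out free of the white margins is something I would have to test.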

From my diary

Thankfully my PC decided that it would boot second time around.  Windows is quite an unstable platform these days, I find.

A correspondent writes that there is now OCR software available which can recognise Arabic.  It’s sold by Novodynamics of Michigan and called “Verus”.  Sadly it is ridiculously expensive — $1300 for the “standard edition” and they don’t dare print a price for the “professional edition”. 

An extraordinarily advanced OCR solution, VERUS™ Professional provides the most innovative Middle Eastern language and Asian optical character recognition in the world. VERUS™ Middle East Professional recognizes Arabic, Persian (Farsi, Dari), Pashto, Urdu, including embedded English and French. It also recognizes the Hebrew language, including embedded English. VERUS™ Asia Professional provides support for both Simplified and Traditional Chinese, Korean and Russian languages, including embedded English. Both products automatically detect and clean degraded and skewed documents, automatically identify a page’s primary language, and recognize a page’s fonts without manual intervention. VERUS’™ intuitive user interface allows users to quickly review and edit recognized text.

http://www.novodynamics.com/verus_pro.htm

I would imagine that it should be possible to adapt this software to recognise Syriac, if the manufacturer would agree.

From my diary

Some update on Saturday is now preventing my PC from starting (I’m typing this on my backup PC).  Oh joy.  And even the automatic recovery won’t start…

Eusebius supplementa update

The Latin preface of De Lagarde’s Coptic gospel catena was translated into English for the project, but not included in the book.  The translation has now been uploaded to the Supplementa page in PDF and Word .doc format.

Nuance Omnipage 18

This morning I got hold of Nuance Omnipage 18 standard edition.  The box was very light: mostly air, a CDROM, and a cheeky bit of cheaply printed paper announcing that they included no manuals at all, in order to save the planet.  Humph.

The footprint is quite small, and I copied the CDROM to my hard disk before installation.  Curiously the disk packet had two numbers both labelled as “serial number”.

The installation was unfamiliar.  As I always do, I clicked on the “select options” and found that it wanted to install some voice-related stuff.  I unchecked that.  Then I went ahead and did the install.  At one point it announced that it was going to install something called “CloudConnector”, without giving me the chance to decline.  But I hit cancel, and the rest of the install went fine.  It then popped up a box asking me to register — this opened a web page with a rather shoddy page collecting details.  Every page gave an “invalid certificate” error in IE, which is sloppy.  And then it asked if I wanted to activate, which I did.  So far, so good.

I then opened OP.  It popped up some “friendly” menu, which I removed.  Then I looked at the main screen, and decided to open a PDF and work on it in OP.  It took a little while to work out that I needed “Process … Workflows … PDF or Scanned Image to Omnipage document”.  Somehow I think “File … Open” would be rather more normal!  Once you’ve selected this, you click on a button on the tool bar to start processing.  It prompted for a PDF, which I had created myself from some digital photos of Ibn Abi Usaibia, and it promptly objected “non-supported image size” to each page and refused to open it!  Silly programme: I don’t care what the image size is, I want to get some OCR of the pages! 

OK, let’s see if I can work around this.  I select instead “Camera image to Omnipage document” and select a bunch of the same images that I had before I put them in a PDF.  This time it decides to cooperate.  It reads the images, rotates them to portrait mode (correctly).  Then it pops up some kind of dictionary thing, which is annoying.  I hit “close” and the windows cursor starts spinning.  It doesn’t seem to be doing anything, but it’s just sitting there.  Hum.

After a while I get bored, and close the program down.  At least it dies gracefully, prompting me to save my work.  I reopen it, and reopen my project.  Then I click the “Text editor” tab.  It looks as if it recognised page 1 OK, despite being typescript.  No errors, anyway.  My first encounter with OCR quality is good.

But … I can only see EITHER the image, or the recognised text, not both at the same time.  Hum.  It ought to be possible to do this.  After a bit of hunting, I find “Window … Classic view” which gives me side-by-side.  But I go back to “flexible view”, because I have just discovered that, if I click on the text window, the line of text from the image appears in a hover box above the line.

Now this is really rather convenient.  Mind you, when the lines are slanted — as is often the case — I wonder how it would do?

I hit Alt-Down, and nothing happens.  Of course, this is not Finereader.  A bit of hunting and the Edit menu informs me that Ctrl-PgDn is next page.  F4 is next suspect character.  I never used this in Finereader, but here using it with the hover box really works.  My text here has quite a few vowels with overscores.  None of these are recognised by default, but at least I can see them!

So far, not too bad!  Better, indeed, than I had feared.

Now I need to start adding custom characters.  I want to define my own “language” for recognition, based on English but with all the funny characters that I need in this document to represent long vowels.  “Tools … Options” seems to give me choices.  On the process tab I see a box saying “Open PDF as images”.  It’s unchecked by default; I’ll check it now, and see if I can open that PDF.  Looks as if you have to save your settings; I save mine to the same directory where I stored the install CDROM.  Then I do “File … New”, and … still can’t open my PDF.  Oh well.

Back to the OPD project from the digital images.  Can I define some extra characters?  Well you can; but it all looks rather weedy compared to Finereader’s options.  Let’s try these: āīōūšŠ.  I get them from charmap, pointing at the Alphabetum Unicode font; but any reasonably full unicode font such as Ms Arial Unicode or Titus Cyberbit Basic would do.  Then “Tools… Options … OCR … Additional characters” and I just paste them into the box.  The “…” button next to that box leads to some weedy, underspecified lookup, which really needs to be more like Charmap.  But do these characters get picked up?
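If hunting through charmap is a chore, the code points of the extra characters can be listed with a few lines of Python. This is only a convenience sketch using the standard `unicodedata` module; the function name `describe` is my own, and nothing here talks to Omnipage.

```python
import unicodedata

# The extra characters pasted into the "Additional characters" box.
EXTRA_CHARS = "āīōūšŠ"


def describe(chars):
    """Return (character, code point, Unicode name) for each character."""
    rows = []
    # NFC makes sure each letter is a single precomposed character.
    for ch in unicodedata.normalize("NFC", chars):
        rows.append((ch, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return rows


for ch, code_point, name in describe(EXTRA_CHARS):
    print(ch, code_point, name)
```

The first line printed is `ā U+0101 LATIN SMALL LETTER A WITH MACRON`; the hex number is what you would type into a character picker to find the same glyph in any Unicode font.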

Now I want to re-recognise.  I click on the thumbnail for page 1 and … the menu gives me no option.  Hum.  Wonder what to do. 

In fact I’ve spent some time now trying to work out how to kick off a limited re-read.  No luck yet.  Surely this should be simple and obvious?  Eventually I work out that you select the thumbnails of the pages you want, and hit the toolbar button and that kicks it off.

So how does it do?  Well, it recognises the overscore a.  None of the other characters are picked up.  That’s not so good as Finereader. 

Also the more skewed the page is, the less well OP handles it (understandable), and the less easy it is to fix.  OP rather presumes that the recognition is near perfect, and has only limited fixing to do.  In such a situation, indeed, OP will be quicker to do a job than Finereader.  And I notice that a ribbon with characters to paste is across the top of the text window; nice touch.  This motivates me to go back and explore again.  I haven’t worked out how to set MY characters in that ribbon.  But when I went into the weedy charmap substitute, there was a similar ribbon at the top, and right-clicking on it allowed you to add more character sets, which increased the number of characters; and by clicking on them, to add them to the ribbon.  How you remove them from the ribbon I don’t know.  It is, in truth, a badly designed feature.  And the OCR still doesn’t recognise what I need.

I’ve had enough for now and closed it down.  Is it any good?  Almost certainly.  It’s less good for weird characters.  But it undoubtedly will see service.

UPDATE: Have just discovered, on starting Word 2010, that Nuance have seen fit to mess with the menus in this (without asking me).  Drat them!

From my diary

I’ve been poking around the web, trying to find out how we identify a particular image of a goddess as “Isis”.  No doubt the answer is some examples of an ancient statue with the goddess’ name on the bottom.  But I’ve had no luck so far in finding an example.

In the process I came across something interesting.  I did a search in the PHI Greek epigraphy database here (ignore the corpora filling most of the page; the search is right at the bottom).  The interface is not that friendly, but a search on “isidi” and hitting enter gave back a shoal of inscriptions; some 535 of them.  (Unfortunately there seems to be no way to specify this as a whole word match, so you get substrings of other words).
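One way around the missing whole-word option would be to copy the results into a text file and filter them afterwards. The following is a rough sketch only, assuming the hits are available as plain lines of Greek text; the function names are my own invention, and the second sample line is made up to show a substring hit rather than taken from the database. It strips accents, breathings and editorial square brackets before matching, so that a word broken as “Ἴσι]δι” still counts.

```python
import re
import unicodedata


def fold(text):
    """Drop editorial square brackets, accents and breathings; lower-case."""
    text = text.replace("[", "").replace("]", "")
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.casefold()


def whole_word_hits(lines, word):
    """Return the lines in which `word` occurs as a complete word."""
    # The lookarounds reject matches with a letter on either side.
    pattern = re.compile(r"(?<!\w)" + re.escape(fold(word)) + r"(?!\w)")
    return [line for line in lines if pattern.search(fold(line))]


SAMPLE = [
    "Σαράπιδι, Ἴσιδι, Ἀννούβιδι {Ἀνούβιδι}, Ἀντιβοΐδης Δικαίου",
    "Ἰσιδίωνος",  # illustrative substring hit, not from the database
]

for line in whole_word_hits(SAMPLE, "Ἴσιδι"):
    print(line)
```

Only the first sample line is printed: the second contains the same letters, but as part of a longer word.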

What was interesting, once I scrolled past the first few matches, was that the vast majority of them included “Sarapi” as well; fewer, but still a good many also add “Anubi” and sometimes “Harpokrati”.  Here’s an example:

Σαράπιδι, Ἴσιδι, Ἀννούβιδι {Ἀνούβιδι}, Ἀντιβοΐδης Δικαίου

or this, from Delos, 94-3 BC (ID 2039, PH 64483 — not sure how I should reference these inscriptions):

Δίκαιος Δικαίου Ἰωνίδ[ης, ἱερεὺς γενόμενος Σαράπιδος, ὑπὲρ τοῦ δήμου τοῦ Ἀθηναίων καὶ τοῦ δή]μου το[ῦ Ῥωμαίων καὶ βασι]λέως Μιθ[ρ]αδάτου Εὐπάτορος Διονύσου καὶ τοῦ ἑαυτοῦ πατρὸς Δ[ικαίου τοῦ — — — — Ἰωνίδου καὶ τῆς μητρὸς — — — Σαράπιδι, Ἴσι]δι, Ἀνούβιδ[ι, Ἁρποκράτει καὶ] μελαν[η]φόροις καὶ θεραπευταῖς, ἐπὶ ἐπιμελητοῦ τῆς νήσου Ἀρόπου [τοῦ patr. dem., ἱερέως δὲ nom. patr. Παι]ανιέως καὶ τῶν [ἐπὶ τὰ ἱερὰ nom., patr. Ἁλ]αιέως [καὶ nom. patr, dem. ζακορεύοντος? — — —]ρος.

Δίκαιος Δικαίου Ἰωνίδ[ης ὑπὲρ τοῦ δή]μου το[ῦ Ἀθηναίων καὶ βασι]λέως Μιθραδάτου Εὐπάτορος Διονύσου καὶ τοῦ ἑαυτοῦ πατρὸς Δ[ικαίου, Σαράπιδι, Ἴσι]δι, Ἀνούβιδ[ι, Ἁρποκράτει καὶ] μελαν[η]φόροις καὶ θεραπευταῖς, ἐπὶ ἐπιμελητοῦ τῆς νήσου Ἀρόπου [dem., ἱερέως nom. Παι]ανιέως καὶ τῶν [ἐπὶ τὰ ἱερὰ nom., dem. καὶ nom. Ἁλ]αιέως [ζακορεύοντος? — — —]ρος.

Anyone care to give us a translation of this?  I note the name of king Mithradates Eupator Dionysus, and mention of the Romans and Athenians.

People sometimes refer to a triad of Isis; but what comes across is that Harpocrates is rather marginal.

Translating Methodius

I thought that I would have a go at getting a piece by Methodius into English.  I’ve placed an advertisement on www.peopleperhour.com (not appeared yet, tho). 

The Old Slavonic text of Methodius has never been published.  Rather reluctantly, therefore, I think we must work from the German translation of it, which is interspersed with Greek from the extant Greek fragments.  So I’ve advertised for a native English speaker with good German and good Greek.

It will be interesting to see if I get any takers, and if so, whether any are at a reasonable price.

The PDF of the piece is here: Methodius_de_lepra_gcs_27.  It’s about 24 pages, ca. 5,000 words.

UPDATE: The advert is here.  I’ve already had two bids: one from a “native Greek speaker” who evidently couldn’t read the advert, which asked for a “native English speaker”, and who also emailed me asking for the complete Greek text; and one from someone in Bulgaria, with a Bulgarian name, offering translation from German but no indication of being a native English speaker or ability with Greek.  Both have been declined, needless to say.
