Using Greek Transcoder

I’ve been converting a load of Greek text into Unicode using Greek Transcoder, with much success. But I ran across a glitch: depending on the option chosen, the accents can all end up off to one side!

The option responsible is “Use composing characters”.

I checked that, and I should not have done; it caused off-centre accents.  But what on earth does it mean? 

A hunt around the web reveals that you can encode all those accents in one of two ways. Firstly, you can use a single character that has the letter and all its accents built in. Alternatively, you type ‘alpha’ followed by the accents as separate characters, and the browser or editor should render them together as one letter with the accents on top. The former is called “using precomposed characters”; the latter “using composing characters” (Unicode’s own term is “combining characters”). The latter does not work very well, because many applications don’t support it properly. This TLG PDF says more.
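To make the distinction concrete, here is a small Python sketch using the standard unicodedata module (my choice of tool; it has nothing to do with Greek Transcoder itself). It takes one fully accented letter, ᾄ (alpha with smooth breathing, acute and iota subscript), and shows it both ways: Unicode normalization form NFD splits it into a base letter plus combining marks, and NFC puts them back together as the single precomposed character.

```python
import unicodedata


def describe(s: str) -> list[str]:
    """Return the Unicode name of each code point in s."""
    return [unicodedata.name(ch) for ch in s]


# One precomposed code point: alpha with smooth breathing, acute and iota subscript
precomposed = "\u1f84"

# NFD decomposes it into a base letter followed by combining marks
combining = unicodedata.normalize("NFD", precomposed)

print(len(precomposed), describe(precomposed))
print(len(combining), describe(combining))

# NFC recombines the sequence into the single precomposed character
print(unicodedata.normalize("NFC", combining) == precomposed)
```

The same NFC call is one possible way to repair text that has already been saved with composing characters, assuming each letter-plus-accents combination in it has a precomposed equivalent (for standard polytonic Greek they generally do).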

I’ve also noticed that on XP the WINWORD executable tends to hang around in memory after you exit Word. If you copy the .dot files for Greek Transcoder into the Application Data\Microsoft\Word\Startup directory, they are only picked up when the WINWORD executable starts, so in this case they get ignored. I’ve had to terminate it manually to get the utility to appear.
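Copying the templates into place can be scripted. Below is a minimal Python sketch, assuming the Greek Transcoder .dot files have been unpacked into one folder (the SOURCE_DIR path is hypothetical) and that Word looks in %APPDATA%\Microsoft\Word\Startup, the per-user Application Data location mentioned above. It does not deal with the lingering WINWORD process; that still has to be ended by hand before Word will pick the files up.

```python
import os
import shutil
from pathlib import Path

# Hypothetical folder holding the unpacked Greek Transcoder templates
SOURCE_DIR = r"C:\Downloads\GreekTranscoder"


def word_startup_dir(appdata: str) -> Path:
    """Word's per-user Startup folder under the Application Data root."""
    return Path(appdata) / "Microsoft" / "Word" / "Startup"


def install_templates(source_dir: str, appdata: str) -> list[str]:
    """Copy every .dot file in source_dir into Word's Startup folder.

    Word only scans this folder when WINWORD.EXE starts, so any copy
    still running in the background must be ended first.
    """
    dest = word_startup_dir(appdata)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for template in sorted(Path(source_dir).glob("*.dot")):
        shutil.copy2(template, dest / template.name)
        copied.append(template.name)
    return copied


if __name__ == "__main__":
    appdata = os.environ.get("APPDATA")  # set by Windows for each user
    if appdata and Path(SOURCE_DIR).is_dir():
        print(install_templates(SOURCE_DIR, appdata))
```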

Once it’s loaded, the buttons appear in Word.

Typing in unicode Greek

I’ve just come across this site, which lets you type in ASCII Beta Code (A)\ and so on) and converts what you type into Unicode Greek on the fly. It’s fast, neat and effective.
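The site does its conversion in JavaScript; purely as an illustration of the idea, here is a minimal Python sketch of my own (not the site’s code). It assumes the standard TLG Beta Code letters and the seven common diacritic marks, handles lowercase only (capitals, marked with * in Beta Code, are not supported), passes everything else through unchanged, and relies on Unicode NFC normalization to fold each letter and its combining marks into one precomposed character.

```python
import re
import unicodedata

# Lowercase Beta Code letters (TLG convention) and their Greek equivalents
LETTERS = str.maketrans(
    "abgdezhqiklmncoprstufxyw",
    "αβγδεζηθικλμνξοπρστυφχψω",
)

# Beta Code diacritics mapped to Unicode combining characters
MARKS = str.maketrans({
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
})


def beta_to_unicode(beta: str) -> str:
    """Convert lowercase Beta Code (e.g. 'lo/gos') to precomposed Unicode Greek."""
    greek = beta.lower().translate(LETTERS).translate(MARKS)
    # A sigma not followed by another Greek letter is word-final
    greek = re.sub(r"σ(?![α-ω])", "ς", greek)
    # Fold each letter and its combining marks into one precomposed character
    return unicodedata.normalize("NFC", greek)


print(beta_to_unicode("lo/gos"))
print(beta_to_unicode("A)\\"))
```

Re-running a function like this after every keystroke is essentially what converting “on the fly” amounts to.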

Better yet, it’s all done in JavaScript, which means that if you save the .htm page locally, your local copy will work too!

Greek.ttf – the curse of pre-unicode Greek fonts

Once I had signed an agreement with Éditions du Cerf to use their Greek text of Eusebius’ Gospel Problems and Solutions, I asked for, and received, a copy of the text in electronic form. This turned out to be a Word file, with an attached font: greek.ttf.

How I cursed that file name! It was clear that this was not a Unicode font, so to use the file I was going to have to convert the text to Unicode. It would help a lot if I knew which font it was! I hunted around for that file name, and found (as you might expect) several candidates, no two of them the same.

This evening I had a stroke of luck. I was preparing to write a program that would open the font and display all its characters, so that I could see what was what. But in Vista, when you open a font file, you get a Properties option, and under Details there was information!
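The details that Vista displays come from the font’s own ‘name’ table, so a program can read them directly without drawing any glyphs. Here is a minimal Python sketch using only the standard struct module; it assumes a plain single-font .ttf file (not a .ttc collection) and picks out a handful of the standard name IDs, such as copyright, version and designer.

```python
import struct
import sys

# Name IDs defined for the TrueType/OpenType 'name' table
NAME_IDS = {0: "copyright", 1: "family", 2: "subfamily", 4: "full name",
            5: "version", 8: "manufacturer", 9: "designer"}


def read_font_names(data: bytes) -> dict[str, str]:
    """Return selected strings from the 'name' table of a .ttf file's bytes."""
    # Offset table: sfnt version (uint32), then number of tables (uint16)
    _version, num_tables = struct.unpack_from(">IH", data, 0)
    name_offset = None
    for i in range(num_tables):
        # Table records start at byte 12 and are 16 bytes each
        tag, _checksum, offset, _length = struct.unpack_from(">4sIII", data, 12 + 16 * i)
        if tag == b"name":
            name_offset = offset
            break
    if name_offset is None:
        raise ValueError("no 'name' table found")

    # Name table header: format, record count, offset to string storage
    _fmt, count, storage = struct.unpack_from(">3H", data, name_offset)
    names: dict[str, str] = {}
    for i in range(count):
        platform, _enc, _lang, name_id, length, offset = struct.unpack_from(
            ">6H", data, name_offset + 6 + 12 * i)
        if name_id not in NAME_IDS:
            continue
        start = name_offset + storage + offset
        raw = data[start:start + length]
        # Unicode (0) and Windows (3) platform strings are UTF-16BE;
        # treat anything else (normally Macintosh, 1) as Mac Roman
        codec = "utf-16-be" if platform in (0, 3) else "mac_roman"
        # Keep the first string found for each name ID
        names.setdefault(NAME_IDS[name_id], raw.decode(codec, errors="replace"))
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as font_file:
        for key, value in read_font_names(font_file.read()).items():
            print(f"{key}: {value}")
```

Run as a script with the font path as its argument, this would print whichever of those fields the font actually contains.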

This was gold! The names of the authors, Peter J. Gentry and Andre…, a version number, 1.0, and a date, 1993. A Google search turned up a page of old fonts by Eric Pement. There it was:

Ancient Greek (57 KB). GREEK.TTF, Greek, ver 1.000, © 1993 by Peter J. Gentry and Andrew M. Fountain. Requires this keyboard utility: KeyMan32 (381 KB)

A search on the authors’ names reveals that they were also the authors of WinGreek. I wonder whether this font is perhaps an early version of that, with the same keyboard mapping? If so, I am in great good luck, for WinGreek is widely known.

Installing the font creates “Greek regular” in my fonts directory. This TLG WinGreek test page confirms that it uses exactly the WinGreek mapping.

The next stage is to find a converter utility, and GreekTranscoder seems to fit the bill! The commercial Antioch program can also import the text, as can this utility. I’ll have to see whether it works, but I feel very pleased with myself to have got so far!

UPDATE: GreekTranscoder worked brilliantly! You have to copy the .dot files to the Word ~\Startup folder, and make sure that no copy of WINWORD is running silently in the background, but it then converted everything with just one error. The Jiffycomp utility did not do as well, and lost all the formatting (italics etc.). I have made a donation to GreekTranscoder.
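The formatting point is the crux of any such converter: only the text set in the old font should be re-encoded, and each run of text must keep its italics and so on. Here is a minimal Python sketch of that idea, not GreekTranscoder’s actual code (which runs inside Word from its .dot templates). The character table is a hypothetical, letters-only subset that I am assuming for illustration; the real WinGreek table (accents, breathings, final sigma and the less obvious letters) would have to be taken from the TLG test page. “Unicode Greek” is likewise a placeholder font name.

```python
from dataclasses import dataclass, replace

# ASSUMPTION: an illustrative, letters-only subset of a WinGreek-style layout.
# The real table (accents, breathings, final sigma and the less obvious
# letters) must be checked against a WinGreek test page before use.
LEGACY_TO_UNICODE = str.maketrans("abgdeiklmnoprstuw", "αβγδεικλμνοπρστυω")


@dataclass(frozen=True)
class Run:
    """A stretch of text with uniform formatting, as in a Word document."""
    text: str
    font: str
    italic: bool = False


def convert_runs(runs: list[Run], legacy_font: str = "Greek",
                 unicode_font: str = "Unicode Greek") -> list[Run]:
    """Re-encode only the runs set in the legacy font, keeping their formatting.

    "Unicode Greek" is a placeholder for whatever Unicode font you choose.
    """
    converted = []
    for run in runs:
        if run.font == legacy_font:
            run = replace(run, text=run.text.translate(LEGACY_TO_UNICODE),
                          font=unicode_font)
        converted.append(run)
    return converted
```

Runs in any other font, such as the English commentary, pass through untouched; a converter that flattens everything to plain text first is what loses the italics.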

Curious QuarkXPress

I have been experimenting with the trial download of the desktop publishing package QuarkXPress. What a curious thing it is! For instance, I have been quite unable to import a Word .doc file containing footnotes and end up with footnotes. This (surely elementary) ambition has cost me an hour or so of my life.

Off to try Adobe InDesign.  Just getting the download is rather horrible — I hope the program is better!

Classical Text Editor – useful?

I was wondering about how to turn the .doc files for the Eusebius and Origen books into something printable, with properly kerned text, etc.  An email suggested that I might like to look at the Classical Text Editor.  So I pulled down the demo and had a play.

Unfortunately, all you are presented with on start-up is a blank screen. This is not very helpful. I tried importing a Word document, and it did import; but it wasn’t at all clear what the benefit was once I had done so. Possibly the printed output is better, but in the demo version every inch of it is disfigured with a logo indicating that this is an unregistered copy, so I couldn’t be sure.

In short, I found it baffling. The help suggested using templates, but none seemed to be supplied by default. Like most people, I edit in Word. What does this tool give me?

Probably it is a good tool.  But without a guide, it’s useless.  I would imagine that most people using this have been shown how to use it by someone else.  That must limit the take-up.  I couldn’t find anything useful online.

Possible outages ahead

I don’t mind posting my thoughts online. Why not?  But I do not want to post sensitive personal information online. Unfortunately doing the former without doing the latter is getting harder. More and more companies are taking — read “stealing” — freely available personal data and posting it on their own sites in order to produce new services, and thereby make money. That means that data passes out of your control.

A few days ago I had a run-in with someone rather unpleasant online who started threatening me. This led me to consider just how much information a malicious person might be able to locate about me.  Could someone find my home address, telephone number and so on?   I’ve always been cautious; but the answer is “too much”.

I’ll be moving the roger-pearse.com domain to another registrar, because the existing one, Network Solutions, demands money in return for not broadcasting a lot of personal information about me online. I’ve resorted to turning the WHOIS entry into garbage, but this is not ideal.

During the changeover it is possible that the site may become inaccessible for a while. Don’t be alarmed if this happens; it will get fixed!

The joy of windows

I must be careful about taking a volume of Quasten to bed.  I did so last night, and saw a couple of untranslated works (or rather, remains of them), and decided to blog about them.  So I got out of bed, went to my PC and … it wouldn’t boot.  Nothing I could do would persuade Windows Vista to let me in.  It starts OK, then it gets to the ‘green bar’ going to and fro, and it sits there.  Attempts to get it to repair itself have failed, I can’t get any restore points up… it’s sitting on the side with its little legs pointed stiffly to the sky.

Since I didn’t do anything to it, I would guess that a silent-but-deadly “update” from Microsoft has trashed some critical files. Fortunately I’ve managed to boot it in safe mode, and I am copying 200+ GB of files off it to an external drive. So I won’t lose work; just days of my life.

So don’t expect much response from me while I’m sorting this out!

(I’m typing this from an old machine which fortunately can still connect to the web).

UPDATE: I managed to get my PC working again.

My first act was to try to start the PC in ‘safe’ mode. This means pressing F8 repeatedly while the machine is first starting up; this gives you a boot menu, and ‘Safe Mode’ is the top option. (I found ‘Safe Mode with Networking’ didn’t work.)

Once it had started, I connected my external USB hard disk and copied all my data onto it. This was some 200 GB, so it took 5 hours. But I felt a lot safer once I had that, because I might have to reimage the hard disk and lose all my data; and of course you never know whether it will even boot into safe mode next time.
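For scale, 200 GB in 5 hours works out at about 11 MB/s (200,000 MB ÷ 18,000 s). With a copy that long, it is worth checking that it actually succeeded before relying on it. Here is a minimal Python sketch of a copy-then-verify backup; the function names are my own, and note that re-reading a file straight after writing it may be served from the operating system’s cache, so this checks the copy as the OS sees it rather than guaranteeing the disk surface is sound.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_and_verify(source: str, dest: str) -> list[str]:
    """Copy every file under source to dest, then compare hashes of each pair.

    Returns the relative paths of any files whose copies do not match.
    """
    src_root, dst_root = Path(source), Path(dest)
    mismatches = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies contents plus timestamps
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(str(rel))
    return mismatches
```

An empty list back means every file’s copy matched the original byte for byte.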

When Vista starts in safe mode, a help menu appears on the right. One of its links leads to the option to restore from a saved restore point. I clicked this, and was surprised that nothing happened. It takes Vista anything up to 5 minutes to show the list of available restore points, and during that time you get no feedback at all, nothing to show that anything is happening (how user-friendly!).

I chose a suitable time to roll back to, and hit that. It asked to reboot.

Now here’s the catch. Before, the PC had simply sat staring at the green bar sweeping to and fro immediately after boot. But you have to watch the hard disk light. The green bar will do its stuff for anything up to 30 seconds; so long as the hard disk light is active, let it. The screen then goes black, you wait another 10 seconds, a Windows icon appears and, very soon, your desktop.

Then, when the recovery has completed, a box appears on the screen telling you so. If you don’t see this, you didn’t achieve the roll-back.

Well, it’s now 3:30pm. That was a day of my life, gone. Thank you Microsoft.

Abbyy Finereader 10 upgrade now out

For many years I have used Abbyy Finereader as my OCR software.  Version 10 is now out, and I have just bought an upgrade.

Mind you, I have kept FR8 and FR9 on my disk, installed and ready to use. FR9 was quite an improvement on FR8 in OCR terms, and has better PDF handling, but its user interface is a lot harder to use. It fights you, and I’ve never got used to its quirks. In particular, it won’t let me scan images at 400 dpi on my Plustek OpticBook 3600, which FR8 did, and since I prefer to scan at that resolution, I had to keep FR8. FR8 is also better for image cropping.

So … FR10.  I’ve just installed it, which was painless.  It asks if I want to start some screengrab software every time I start my PC — I uncheck this.  I open it up for the first time, and it wants me to register – that too is painless. 

Then I get a screen with a big red window of “helpful” options — with no way to close it.  I uncheck “display on startup” and it still won’t go.  I’m forced to close the application, and restart.  Not really that good a start.

Next I open an existing FR9 project.  I’d started work on Censorinus, so I use that.  I select the folder; and then it asks me to save it somewhere else.  Yes, OK, we never had to do that in FR5, FR6, FR7, FR8 or FR9.  Why change it?  So I waste some disk space and create folder censorinus_fr10.  I suppose newcomers will find it useful.  And it opens the project OK.  Hmmm. Now what? 

I click on a page, and it doesn’t seem to include any of the OCR’d text.  I select ‘Read’ and it OCR’s it.  But … where is the text I was working on?  A look shows that FR10 has kindly deleted all my recognised text.  It’s kept the blocks on the screen, and that’s it.  B*****ds!!  Now we know why they insisted on keeping the old directory — boy would they be lynched if they hadn’t!  This is bad.  This is really, really bad.  Who wants to restart a whole project?

OK, well I look through a few pages rather hopelessly, and I see one where the image needs editing.  So … what do we have?  Well, we have the FR9 style: “Let’s hide all the tools boys! Hee hee!”  I had to customise mine to get an eraser on it.  How do I do that now?

Well, I can’t say. If I choose Page|Edit page image, I get a rubbish image editor with no tools, in which I can crop. This is the FR9 approach, way inferior to the FR8 one. It looks as if they still haven’t got rid of that idiot who ruined the interface. I erase a bit of rubbish on the image … it takes ages, and the page flashes as I do so. Awful!

OK, I see it. You choose View|Toolbars|Quick Access bar, which puts an extra bar at the top, under the file menu. Then you do View|Toolbars|Customize. Choose the category “Image”, and you are looking at that toolbar. Now go down the icons on the left, and insert them where you want them on that toolbar. I add erase and a few others, and suddenly I can clean up the image as I want to. I can zoom the image (although only to 200%, unlike before: another degradation in service), and I can get rid of the marks of some long-dead student’s pencil on the page.

I’m dispirited, tho.  I’m having to work at this, just to do simple OCR tasks.

OK.  Let’s OCR that page.  Right-click, read and … off it goes.  I get two windows, image and text.  Luckily the “Quick Access Bar” also allows me to minimise the image!  And I click on the text at one point, where it’s duff, and … hang on, where’s the zoom at the bottom?  Ah, it’s still at the bottom; just not displayed by default.  (Why?!)  One click on it, and it appears.

The OCR quality appears about the same, or possibly a little better.  We’ll see.

Overall verdict?  Wish they’d shoot the interface designer. 

UPDATE: another glitch.  While working on Censorinus, I had to do a global replace of “aera” to “era”.  This I did, but they’ve made a subtle change.  After the replace, I used to just hit Esc to get rid of the search/replace dialog box.  Now it doesn’t work.  And why?  Because each time you do a replace, they shift the focus to the document, meaning you have to click the dialog box to get back to where you were. 
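Once the text has been exported to plain text, a global replace like this can also be done outside FineReader. Here is a minimal Python sketch; the whole-word matching and case preservation are my own choices (so that “Aera” at the start of a sentence becomes “Era”, and a longer word that merely contains “aera” is left alone), not FineReader’s behaviour.

```python
import re


def replace_word(text: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of old with new, keeping capitalisation.

    Word boundaries stop the pattern matching inside longer words;
    IGNORECASE also catches 'Aera' and 'AERA'.
    """
    pattern = re.compile(rf"\b{re.escape(old)}\b", re.IGNORECASE)

    def fix_case(match: re.Match) -> str:
        found = match.group(0)
        if found.isupper():
            return new.upper()
        if found[0].isupper():
            return new[0].upper() + new[1:]
        return new

    return pattern.sub(fix_case, text)


print(replace_word("Aera began. The aera ended, aerated.", "aera", "era"))
```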

This is unbelievably infuriating, and will make for much more work in using the product.  All those extra clicks during a long search/replace…

Housekeeping journal articles; from my diary 2

It’s hot and humid here; so much so, that I can’t think straight.  So I’ve been looking at the piles of photocopied articles and running them through my scanner and throwing away the photocopy.  That’s a mindless activity I can do.

Not sure I’m quite there yet, tho. The PDFs are OK, but they aren’t OCR’d. The scanner software has OCR, but it’s not good enough; nor is the built-in OCR in Acrobat. The best still seems to be Finereader 9, but PDFs don’t go through FR9 unchanged: the images can come out looking strange.

Not sure what to do about that.  But I am gradually freeing up storage space.

Michael the Syrian part 3 – progress report

I’ve now scanned in images of all the pages (around 600) of this monstrously heavy volume — my forearms will never be the same — using Abbyy Finereader 8 to control the scanner.  I scanned in black-and-white at 400 dpi, which is the best for OCR.

I’ve gone through the batch, turning alternate pages the right way up. I’m now importing it into Finereader 9, which has better OCR and produces smaller PDFs.

UPDATE (16:30): I’ve created a searchable PDF, which is about 33 MB. Now starting to upload it to Archive.org. This can be slow and frustrating, and will probably take all evening. I’ve also exported the text as .htm and .doc, which I’ll probably place there as well. I haven’t proofed any of the OCR output, but FR9 gives rather better results than FR8, which is what the automatic processes at Archive.org use.

UPDATE (16:36): Good grief.  It uploaded first time.  It’s here: http://www.archive.org/details/MichelLeSyrien3  I’d better add the other formats, then (if it will let me).  It’s not in the searches yet, tho.

UPDATE (16:39): Hmm.  The interface for uploads of extra files has changed.  Somewhat better than it was.  Still very slow, it seems, and not that intuitive.  You can tell it was tested by someone local to the server, and not someone far away from it.