This morning I got hold of Nuance Omnipage 18 standard edition. The box was very light: mostly air, a CDROM, and a cheeky bit of cheaply printed paper announcing that they included no manuals at all, in order to save the planet. Humph.
The footprint is quite small, and I copied the CDROM to my hard disk before installation. Curiously, the disk packet had two numbers, both labelled “serial number”.
The installation was unfamiliar. As I always do, I clicked on “select options”, and found that it wanted to install some voice-related stuff. I unchecked that and went ahead with the install. At one point it announced that it was going to install something called “CloudConnector”, without giving me the chance to decline. I hit cancel anyway, and the rest of the install went fine. It then popped up a box asking me to register, which opened a web page with a rather shoddy form collecting details. Every page gave an “invalid certificate” error in IE, which is sloppy. Then it asked if I wanted to activate, which I did. So far, so good.
I then opened OP. It popped up some “friendly” menu, which I closed. Then I looked at the main screen, and decided to open a PDF and work on it in OP. It took a little while to work out that I needed “Process … Workflows … PDF or Scanned Image to Omnipage document”. Somehow I think “File … Open” would be rather more normal! Once you’ve selected this, you click a button on the toolbar to start processing. It prompted for a PDF, which I had created myself from some digital photos of Ibn Abi Usaibia, and promptly objected “non-supported image size” to each page and refused to open it! Silly programme: I don’t care what the image size is, I want some OCR of the pages!
OK, let’s see if I can work around it. Instead I select “Camera image to Omnipage document” and choose a batch of the same images that I had put into the PDF. This time it decides to cooperate. It reads the images and rotates them to portrait mode (correctly). Then it pops up some kind of dictionary thing, which is annoying. I hit “close” and the Windows cursor starts spinning. It doesn’t seem to be doing anything; it’s just sitting there. Hum.
After a while I get bored and close the program down. At least it dies gracefully, prompting me to save my work. I reopen it, and reopen my project. Then I click the “Text editor” tab. It looks as if it recognised page 1 OK, even though the page is typescript. No errors, anyway. My first encounter with the OCR quality is good.
But … I can only see EITHER the image, or the recognised text, not both at the same time. Hum. It ought to be possible to do this. After a bit of hunting, I find “Window … Classic view” which gives me side-by-side. But I go back to “flexible view”, because I have just discovered that, if I click on the text window, the line of text from the image appears in a hover box above the line.
Now this is really rather convenient. Mind you, the lines are often slanted, and I wonder how it would cope with those.
I hit Alt-Down, and nothing happens. Of course, this is not Finereader. A bit of hunting, and the Edit menu informs me that Ctrl-PgDn is next page and F4 is next suspect character. I never used this in Finereader, but here, combined with the hover box, it really works. My text has quite a few vowels with overscores. None of these are recognised by default, but at least I can see them!
So far, not too bad! Better, indeed, than I had feared.
Now I need to start adding custom characters. I want to define my own “language” for recognition, based on English but with all the funny characters that I need in this document to represent long vowels. “Tools … Options” seems to give me choices. On the Process tab I see a box saying “Open PDF as images”. It’s unchecked by default; I’ll check it now and see if I can open that PDF. It looks as if you have to save your settings; I save mine to the same directory where I stored the copy of the install CDROM. Then I do “File … New”, and … still can’t open my PDF. Oh well.
Back to the OPD project made from the digital images. Can I define some extra characters? Well, you can; but it all looks rather weedy compared to Finereader’s options. Let’s try these: āīōūšŠ. I get them from Charmap, pointing at the Alphabetum Unicode font; but any reasonably full Unicode font, such as Arial Unicode MS or Titus Cyberbit Basic, would do. Then “Tools … Options … OCR … Additional characters”, and I just paste them into the box. The “…” button next to that box leads to some weedy, underspecified lookup, which really needs to be more like Charmap. But do these characters get picked up?
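For anyone without Charmap to hand, the same string of characters (a, i, o and u with macrons, plus lower- and upper-case s with caron) can be built from their official Unicode names. This is only a sketch using Python’s standard `unicodedata` module; it simply produces a string you can copy and paste into the “Additional characters” box, and it assumes nothing about how Omnipage itself handles the characters.

```python
import unicodedata

# Official Unicode names of the extra characters wanted for recognition:
# macron (overscore) vowels for long vowels, plus s with caron.
NAMES = [
    "LATIN SMALL LETTER A WITH MACRON",
    "LATIN SMALL LETTER I WITH MACRON",
    "LATIN SMALL LETTER O WITH MACRON",
    "LATIN SMALL LETTER U WITH MACRON",
    "LATIN SMALL LETTER S WITH CARON",
    "LATIN CAPITAL LETTER S WITH CARON",
]


def extra_characters(names):
    """Return one string of characters, each looked up by its Unicode name."""
    return "".join(unicodedata.lookup(n) for n in names)


if __name__ == "__main__":
    chars = extra_characters(NAMES)
    print(chars)  # the string to paste into the box
    # List each character with its code point, as Charmap would show it.
    for c in chars:
        print(f"U+{ord(c):04X}  {c}  {unicodedata.name(c)}")
```

The code points (U+0101, U+012B and so on) printed by the loop are also what you would type into Charmap’s “Go to Unicode” box if you prefer to stay with Charmap.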
Now I want to re-recognise. I click on the thumbnail for page 1 and … the menu gives me no option. Hum. Wonder what to do.
In fact I’ve spent some time now trying to work out how to kick off a limited re-read. No luck yet. Surely this should be simple and obvious? Eventually I work out that you select the thumbnails of the pages you want, and hit the toolbar button and that kicks it off.
So how does it do? Well, it recognises the a with overscore (ā). None of the other characters are picked up. That’s not as good as Finereader.
Also, the more skewed the page is, the less well OP handles it (understandable), and the harder it is to fix. OP rather presumes that recognition is near perfect and that there is only limited fixing to do. In that situation, indeed, OP will be quicker for the job than Finereader. And I notice that a ribbon of characters to paste runs across the top of the text window, which is a nice touch. This motivates me to go back and explore again. I haven’t worked out how to put MY characters in that ribbon. But when I went into the weedy Charmap substitute, there was a similar ribbon at the top; right-clicking on it let me add more character sets, which increased the number of characters available, and clicking on those characters added them to the ribbon. How you remove them from the ribbon I don’t know. It is, in truth, a badly designed feature. And the OCR still doesn’t recognise what I need.
I’ve had enough for now and closed it down. Is it any good? Almost certainly. It’s less good for weird characters. But it undoubtedly will see service.
UPDATE: Have just discovered, on starting Word 2010, that Nuance have seen fit to mess with its menus (without asking me). Drat them!