Re: Maintenance Guides

Posted: Thu Sep 18, 2014 11:44 am
by BillTwoTriumphs
johnconradlee wrote: Someone with better OCR software might do better.
No - not really. While it's not my day job, this is something I am very involved in with my "other" job (typesetting, OCR, image scanning - all sorts of (freelance) prepress work for both traditional book and eBook/ePub publishers).

OCR is only really any good for short documents. In real life it never gets any better than 90-95% letter accuracy, which sounds OK but actually isn't when you realise that, on average, one in every 10 to 20 letters will have been mis-recognised.
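To put rough numbers on that, here is a small Python sketch of the arithmetic. The 2,000-character page size is just an illustrative figure, and the per-word estimate assumes each letter is mis-read independently of its neighbours, which real OCR errors won't strictly obey. The function names are made up for this example.

```python
def expected_errors(num_chars: int, accuracy: float) -> float:
    """Average number of mis-recognised characters in num_chars letters."""
    return num_chars * (1.0 - accuracy)


def prob_word_correct(word_len: int, accuracy: float) -> float:
    """Chance that every letter of a word is read correctly.

    Assumption: letter errors are independent of each other.
    """
    return accuracy ** word_len


if __name__ == "__main__":
    page_chars = 2000  # illustrative page size, not a measured figure
    for acc in (0.90, 0.95):
        errs = expected_errors(page_chars, acc)
        word_ok = prob_word_correct(5, acc)
        print(f"{acc:.0%} accuracy: about {errs:.0f} wrong letters per page, "
              f"5-letter word fully correct {word_ok:.0%} of the time")
```

So even at the top of that range, a page of text would need something like a hundred individual corrections.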

You can (and we do) use it in some instances where publishers need editable copy of long out-of-print books but, as Mike says, you need to do a lot of post-OCR cleaning up, which can be done to a large extent by script-based search and replace routines. It's actually quicker and much more accurate to get a proper, trained typist to rekey the text from scratch.
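For anyone wondering what a script-based search and replace routine looks like, this is a minimal Python sketch of the idea. The rules below are hypothetical examples of common OCR slips, not the actual rules used on any real job; a proper rule set is built up per book and is usually much longer.

```python
import re

# Hypothetical clean-up rules, applied in order: (pattern, replacement).
RULES = [
    # Rejoin words that were hyphenated across a line break.
    (re.compile(r"(\w)-\n(\w)"), r"\1\2"),
    # "the" commonly mis-read as "tbe".
    (re.compile(r"\btbe\b"), "the"),
    # Digit 1 sitting between lowercase letters is almost always an "l".
    (re.compile(r"(?<=[a-z])1(?=[a-z])"), "l"),
    # Digit 0 sitting between lowercase letters is almost always an "o".
    (re.compile(r"(?<=[a-z])0(?=[a-z])"), "o"),
    # Collapse runs of spaces or tabs left over from column layouts.
    (re.compile(r"[ \t]{2,}"), " "),
]


def clean_ocr(text: str) -> str:
    """Apply each search-and-replace rule to the OCR output in turn."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    raw = "Check tbe va1ve  clearance and the carb-\nurettor."
    print(clean_ocr(raw))
```

Rules like these only catch the predictable mistakes, so the result still has to be proofread against the paper original.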

John, earlier you said:
The problem is (at least according to Andy Roberts) that although they were typed in MS Word originally (in the early 90s) we don't have a complete "as printed" version correctly formatted which we can pdf.
Do you mean that no-one has these electronic files any more (and hence we only have paper copies now)? If the files exist in any sort of electronic format - no matter how old - I would be able to get them updated and converted into a suitable format for PDFing very quickly.

Cheers,