LADY ON THE WEB

the virtual journal of Celia Gray

Friday, December 20, 2013

 

Working with OCR: A Mark of Character

Lately, Miss Underwood and I have been reviewing and organizing past transcripts, and performing various administrative duties pertaining to the show. At our time of life, it is apropos to do such cataloguing work, as one digests what has gone before and prepares for, shall we say, Act 3. (The intermission was quite long, lasting, as it did, roughly five years!)

Specifically, we used OCR (Optical Character Recognition) to convert scanned text into digital text files. This was very useful, as neither Miss Underwood nor I wished to retype dozens of pages of old transcripts. (They hadn't been saved as digital files in the first place, but printed out "on the fly," during those moments of compressed haste familiar to all those who labor backstage.)

We have not spent a great deal of time reviewing competing OCR programs, but there is usually some measure of "amusement factor" in the process, as one views the results of the computer's manful struggle to interpret text scrawled with handwritten addenda. Fortunately, there was not enough of this to cause any real problems, and we finished this task within a few days.

In other news, we have logged 209 guests to date. We have been checking our transcript files to see what (if anything) has gone untargeted or unprinted. And we "tweaked" the template for this blog to make it a bit easier to read.