Convert a typewritten document to text
23-07-2018 4:45 PM
I am looking for some software that can convert an old document written on a typewriter to a text file you can edit. Typed documents have some unique features:
- There is only one font
- If a character is slightly wrong or misaligned, every instance of that character is wrong or misaligned in the same way.
So you might think this would be an easy task for OCR (Optical Character Recognition) software, and that you would get even better results if the software let you teach it how to recognise individual letters. But no. You end up with something ridiculously over-complicated, with copious errors, in a whole range of different fonts, and peppered with other features that could not be achieved using a typewriter. And I have not found any OCR software that is prepared to learn.
Any suggestions?
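To make the "teach it" idea concrete, here is a minimal Python sketch of what I mean. It is not how any real OCR package works; the class, the function names and the tiny 3x3 bitmaps are all made up for illustration. It just stores each shape the user has labelled, flaws included, and picks the closest stored shape for anything new.

```python
# Hypothetical sketch: glyphs are tiny bitmaps written as tuples of strings,
# '#' for ink and '.' for paper. None of these names come from a real OCR package.

def hamming(a, b):
    """Count the pixels that differ between two same-sized glyph bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


class GlyphLearner:
    def __init__(self):
        self.templates = {}  # bitmap -> character taught by the user

    def teach(self, bitmap, char):
        """Remember that this exact shape (flaws included) means `char`."""
        self.templates[bitmap] = char

    def recognise(self, bitmap):
        """Return the character of the closest taught shape, or None if nothing was taught."""
        if not self.templates:
            return None
        best = min(self.templates, key=lambda t: hamming(t, bitmap))
        return self.templates[best]


# A 3x3 "l" whose typebar always leaves a stray mark at lower right, and a clean "i".
L_SKEWED = ("#..", "#..", "#.#")
I_CLEAN = (".#.", "...", ".#.")

learner = GlyphLearner()
learner.teach(L_SKEWED, "l")
learner.teach(I_CLEAN, "i")

# The same flawed "l" with one pixel of scanner noise is still closest to "l".
noisy_l = ("#..", "##.", "#.#")
print(learner.recognise(noisy_l))  # prints: l
```

Because a typewriter repeats the same flaw every time, one taught example per character should go a long way, which is why I expected this to be easy.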
Re: Convert a typewritten document to text
23-07-2018 5:04 PM
I think all you can do is try different OCR software.
I have not used any for years, so someone else may be able to suggest a good, up-to-date application.
It depends on the size of the document, but in my experience OCR is never 100% accurate, so you will have to go through the output correcting errors manually.
I suppose using good software means there are fewer of them to deal with.
Re: Convert a typewritten document to text
23-07-2018 6:27 PM
You should be able to correct font variances by simply opening the document in a word processor, highlighting the complete document and selecting your preferred font.
With respect to OCR accuracy, have you tried enlarging the typed text by scanning a portrait page and printing it again in landscape? It will need two pages to hold the complete portrait page, but it might give the OCR a bit more detail to go at.
Failing that, you may just have to suffer the pain of using a word processor, or seek professional help. I guess whether that's practicable depends on the size of the document.
Moderator and Customer
Re: Convert a typewritten document to text
23-07-2018 6:28 PM
How big (number of pages?) is the document? Is it "plain typing" as in a manuscript/novel, or is it tabulated etc?
You may find it cheaper to employ a self-employed typist than to squander oodles of money on various OCR packages that you may never need to use again.
Re: Convert a typewritten document to text
23-07-2018 8:30 PM - edited 23-07-2018 8:35 PM
And I have not found any OCR software that is prepared to learn. Any suggestions?
Have you tried Adobe Acrobat? I'm not sure whether it can learn, but OmniPage can be taught. Both have time-limited trial versions.
Re: Convert a typewritten document to text
24-07-2018 8:24 AM
So far I have tried Iris OCR and, since my original post, ABBYY FineReader. The ABBYY software produced a much better result. You can tell it the input is typewritten, and it then produces output that is all in one font with a basic, simple layout. But it has no learning capability that I have found so far. I'll look into the other suggestions.
Re: Convert a typewritten document to text
25-07-2018 5:28 PM
OmniPage Pro is definitely the one to use. See http://supportcontent.nuance.com/omnipage/18/doc/OP18Guide.pdf
Re: Convert a typewritten document to text
26-07-2018 11:44 PM
I had a go with OmniPage but was disappointed with the result. Some of the resulting text was laid out in document format, but random sections were placed in text boxes. It's another example of software trying to be too clever and thereby giving a stupid result. Typed and printed documents typically have a very simple layout, and the last thing you want is OCR software that renders this more complicated than the original. If there is a "Don't put text in boxes" setting or a "Don't create random lines" option, then I failed to find it.
Re: Convert a typewritten document to text
27-07-2018 10:27 AM
If you go to Options, you can set the layout to be the simplest possible.
Re: Convert a typewritten document to text
28-07-2018 9:03 AM
I was trying to deal with a document that had a double-line margin around, and quite close to, the text. I wanted to ignore this, but the Nuance software became fixated on it, seeing it either as characters or rendering it as bits of vertical line, which appeared even in .rtf output. I would have got a better result if I had printed out the pages, cut off these margins with a pair of scissors and re-scanned them. So for me, OmniPage is in the "stupid result by trying to be too clever" category.
Re: Convert a typewritten document to text
28-07-2018 9:40 AM - edited 28-07-2018 9:41 AM
Acrobat has a cropping feature and a redaction feature, so you could scan your document using Acrobat, crop out the lines or redact them, then save the result as a PDF. OmniPage can open a PDF and do the OCR.
Re: Convert a typewritten document to text
28-07-2018 10:57 AM
Yes, I see what @daveplus means.
A basic cropping feature will do what you want; many programs have one.
It is more hassle than not needing to crop at all, but less than doing it your manual way.
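For anyone who would rather script it, cropping is about the simplest image operation there is. Below is a rough Python sketch on a toy "page" made of text rows ('#' for ink, '.' for paper). The `crop` function and the margin widths are my own illustration, not a feature of Acrobat or OmniPage; a real scan would need an image library and margins measured from the actual page.

```python
def crop(page, top, bottom, left, right):
    """Return the page with the given number of pixel rows/columns trimmed off each edge."""
    height = len(page)
    width = len(page[0]) if page else 0
    return [row[left:width - right] for row in page[top:height - bottom]]


# Toy scan: a ruled border one pixel in from the edge, with "text" in the middle.
page = [
    ".........",
    ".#######.",
    ".#.....#.",
    ".#.#.#.#.",
    ".#.....#.",
    ".#######.",
    ".........",
]

# Trim two pixels from every side so the border never reaches the OCR step.
for row in crop(page, top=2, bottom=2, left=2, right=2):
    print(row)
# prints:
# .....
# .#.#.
# .....
```

The same margins can be reused for every page if the border sits in the same place on each scan.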
Re: Convert a typewritten document to text
29-07-2018 2:51 AM
Why do you need this document as text, anyway?
A print of an image is just as good as a typewritten document.
"In The Beginning Was The Word, And The Word Was Aardvark."
Re: Convert a typewritten document to text
29-07-2018 7:55 AM - edited 29-07-2018 7:56 AM
@VileReynard It seems you have jumped in with both feet... The first line of the original post gives a clue:
I am looking for some software that can convert an old document written on a typewriter to a text file you can edit.
Re: Convert a typewritten document to text
29-07-2018 12:14 PM
Fair enough...
How about pre-processing the image scans?
For example, increase the contrast so you have just black and white, with no greys, then remove speckles (dirt etc.) before asking the OCR to do its thing.
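A rough idea of those two steps in Python, on a toy grid of grey levels (0 = black, 255 = white). The function names, the threshold of 128 and the "no inked neighbours" rule for a speckle are my own assumptions for illustration; an image editor or a proper imaging library would do this on a real scan, and the threshold would need tuning to the paper and ribbon.

```python
def binarise(gray, threshold=128):
    """Map 0-255 grey levels to 1 (ink) or 0 (paper); darker than threshold counts as ink."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def despeckle(bw):
    """Clear any ink pixel that has no ink among its eight neighbours."""
    h = len(bw)
    w = len(bw[0]) if bw else 0
    out = [row[:] for row in bw]  # copy, so neighbour counts use the original image
    for y in range(h):
        for x in range(w):
            if not bw[y][x]:
                continue
            neighbours = sum(
                bw[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out


scan = [
    [250, 240, 245, 250, 30],   # lone dark pixel at far right: a speck of dirt
    [245, 40, 200, 250, 250],   # 200 and 180 are light greys and become paper
    [250, 35, 180, 245, 250],   # the two dark pixels in column 1 form a stroke
    [250, 250, 250, 250, 250],
]
clean = despeckle(binarise(scan))
for row in clean:
    print("".join("#" if px else "." for px in row))
# prints:
# .....
# .#...
# .#...
# .....
```

Note that a rule this crude would also eat genuinely isolated marks such as a faint full stop, so check the result before trusting it.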