LNCtips.com: 2 Steps to Solve an OCR Problem
When I work with electronic medical records (EMR), I love it when they've been OCR'd. OCR stands for Optical Character Recognition, a process that converts scanned EMR in PDF format into text that can be searched and edited. I use OCR'd EMR to copy and paste big chunks of text from PDF documents into Microsoft Word, which saves me typing time. The OCR process isn't perfect, though. In fact, it creates a problem every time I copy and paste that drives me crazy. Fortunately, I can solve that OCR problem in just two steps.
The problem is this: when I copy a large amount of PDF text and paste it into a Microsoft Word document, the formatting is wrong. The font is always too large or too small, and the line breaks (carriage returns) always fall in the wrong places. For example, here's the text I want to place into my document:
And here's how it looks after pasting it into my document:
You can see that the font in the History paragraph is different and the line breaks are wrong.
Step 1 is to get rid of the line breaks. To do so, I highlight the text of the History paragraph and press Ctrl and H at the same time, which brings up the Find and Replace dialog box. In the Find what box, I type a caret (^), the symbol above the 6 on the keyboard, followed by the letter "p" (without the quotes); ^p is Word's code for a paragraph mark. In the Replace with box, I enter one space and nothing else. Here's what it looks like:
Then I click Replace All. (If Word asks whether to search the remainder of the document, I click No so that the other paragraphs keep their breaks.) Here's what it then looks like:
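For readers who clean up pasted text outside of Word, the effect of Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the Word technique itself: it assumes the pasted paragraph arrives as a plain string in which each unwanted break is a newline character (Word's ^p paragraph mark has no direct plain-text equivalent), and the function name `join_broken_lines` is hypothetical. Like Replace All with ^p and a single space, it turns every break into one space; as an extra step that Word does not perform, it also collapses any double spaces left behind.

```python
def join_broken_lines(text: str) -> str:
    """Hypothetical helper: turn a paragraph with stray line breaks into one line.

    Roughly mirrors Word's Find "^p" / Replace " " on a selected paragraph.
    """
    # Normalize Windows (\r\n) and old Mac (\r) line endings to \n first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Replace each break with a single space, as Replace All does.
    joined = normalized.replace("\n", " ")
    # Extra step not in the Word technique: collapse runs of whitespace
    # (e.g. a line that already ended in a space) and trim the ends.
    return " ".join(joined.split())


if __name__ == "__main__":
    # Made-up sample text, not taken from any real record.
    pasted = "Patient reports low back\npain since the fall and\ndenies numbness."
    print(join_broken_lines(pasted))
```

Because it replaces every break in the string it is given, it should only be run on the single paragraph being repaired, just as the Word steps start by highlighting only the History paragraph.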
Now that the line breaks are fixed, Step 2 is to make the fonts consistent. To do that, I press Ctrl and A at the same time, which selects all the text in the document. Then I choose the font and font size. In the document below, I've selected 12-point Times New Roman. Here's what the finished document looks like:
When I need to inform the attorney of a treater's exact documentation, copying and pasting text from PDF to Word is a big time saver. (I'll point out any spelling errors I see, too, such as the word "and" in the fifth line of the History paragraph, which I believe should be "any.") Using this two-step technique lets me transfer large amounts of documentation into my reports quickly and easily.