Preparing Legal Documents for Optical Character Recognition

24 March 2017
 Categories: Technology, Blog


Scanners are frequently paired today with Optical Character Recognition (OCR). OCR technology is designed to "read" paper documents, turning them into searchable, plain-text computer files. Scanning is especially prevalent in the legal industry, where OCR is used to make discovery documents (documents pertinent to a case) easier to categorize and read. This work demands a high level of fidelity, however, and better results generally come from advance preparation.

Separating OCR Documents

Not all documents are suitable for OCR. Handwritten documents and documents set in unusual fonts (such as script fonts) are usually not readable. The first step in preparing documents for scanning should always be to separate the documents that OCR can read from those it cannot. Ideal documents are ones that have been printed directly from a computer in a common font.
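The sorting rule above can be written down as a small checklist. The following Python sketch is only an illustration: the PaperDocument record, its field names, and the "common" and "script" labels are hypothetical and would be filled in by whoever handles intake, not detected automatically.

```python
from dataclasses import dataclass


@dataclass
class PaperDocument:
    """Hypothetical intake record, filled in by the person sorting the paper."""
    name: str
    handwritten: bool
    font_style: str  # illustrative labels: "common" or "script"


def is_ocr_candidate(doc):
    """Return True for printed documents set in a common font."""
    return not doc.handwritten and doc.font_style == "common"


def separate(docs):
    """Split documents into (ocr_ready, manual_handling), preserving order."""
    ocr_ready = [d for d in docs if is_ocr_candidate(d)]
    manual_handling = [d for d in docs if not is_ocr_candidate(d)]
    return ocr_ready, manual_handling


batch = [
    PaperDocument("lease agreement", handwritten=False, font_style="common"),
    PaperDocument("witness note", handwritten=True, font_style="common"),
    PaperDocument("wedding invitation", handwritten=False, font_style="script"),
]

ocr_ready, manual_handling = separate(batch)
for doc in ocr_ready:
    print("Scan with OCR:", doc.name)
for doc in manual_handling:
    print("Handle manually:", doc.name)
```

In this sample batch only the lease agreement goes to the OCR pile; the handwritten note and the script-font invitation are set aside.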

Recovering Damaged Documents

Damaged documents cannot be fed automatically into a scanner, and tears in a document will disrupt the OCR process. Damaged documents should be placed on backing paper and slid into a plastic sleeve. They can then be carefully scanned on the scanner bed. However, the time this takes may not be worthwhile compared to manually coding the documents.

Using the Right Scanner Settings

Scanners need to be set to "text" mode in order to produce clear text. Contrast should be set to high, and scanning should be done in black and white. These settings prepare the digital file for reading via OCR. Using the OCR built into the scanner itself is possible but usually discouraged, as the on-board OCR software of most scanners tends to be outdated.
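To show what a high-contrast, black-and-white setting does to a page, here is a minimal Python sketch of simple thresholding. It assumes the page is already available as rows of grayscale values from 0 (black) to 255 (white); the binarize function and the cutoff of 128 are illustrative choices, and a real scanner may use a different method internally.

```python
def binarize(gray_rows, threshold=128):
    """Map each grayscale value (0-255) to pure black (0) or pure white (255)."""
    return [
        [255 if value >= threshold else 0 for value in row]
        for row in gray_rows
    ]


# A tiny 2x3 "page": light paper with a few darker ink pixels.
sample = [
    [250, 240, 30],
    [245, 90, 20],
]

cleaned = binarize(sample)
print(cleaned)  # [[255, 255, 0], [255, 0, 0]]
```

Every pixel ends up either fully black or fully white, which removes the gray shading that can make character edges ambiguous.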

Uploading the Correct Files

In general, scanned files should be in either PDF or TIFF format. These are the file types most likely to be read by an OCR program. JPEGs and PNGs may not have the quality and resolution that optical character recognition requires.
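A quick check before uploading can catch files in the wrong format. The Python sketch below sorts file names by extension according to the guidance above; the classify_scan function, its three labels, and the sample file names are hypothetical, and the check looks only at the extension, not at the file's actual contents.

```python
from pathlib import Path

# Extensions grouped according to the guidance above.
PREFERRED = {".pdf", ".tif", ".tiff"}
DISCOURAGED = {".jpg", ".jpeg", ".png"}


def classify_scan(filename):
    """Label a file as 'preferred', 'discouraged', or 'unknown' by its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in PREFERRED:
        return "preferred"
    if suffix in DISCOURAGED:
        return "discouraged"
    return "unknown"


for name in ["exhibit_12.TIFF", "deposition.jpg", "notes.docx"]:
    print(name, "->", classify_scan(name))
```

Files labeled "discouraged" or "unknown" can then be rescanned or converted before they are sent to the OCR program.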

Even when everything is done properly, the OCR output may contain some glitches. The scans are usually read manually to confirm that the documents have been coded correctly and that there are no issues. However, when documents are properly prepared, scanning and OCR analysis can make the process of legal discovery far faster and easier. Once a process has been established, it will be easier to achieve consistent results.