Optical character recognition (OCR)
Tuesday, 21 November 2006
Document processing systems that use optical character recognition (OCR) devices to scan and store the contents of documents are widely used. Document image processing is a crucial part of office automation, which automates the tasks of processing, filing and retrieving documents to increase workplace productivity. Document image processing begins with the OCR phase, in which a computer and an optical imaging system scan a paper document to acquire optical image data, convert the optical image data into electrical image data, and process the electrical image data to determine the content of the document. OCR is widely used to extract text from printed or handwritten documents and forms. Typically, a document is scanned on an optical scanner to produce bit-mapped image data, and an OCR software application then processes the image data and extracts the text from it.

There are two basic methods used for OCR: matrix matching and feature extraction. Matrix matching compares what the OCR device sees as a character against a library of character matrices, or templates. When an image matches one of these prescribed templates within a given level of accuracy, the OCR application assigns that image the corresponding American Standard Code for Information Interchange (ASCII) symbol. Feature extraction, also known as intelligent character recognition (ICR), is OCR without strict matching to prescribed templates. Instead, the application looks for general features such as open areas, closed shapes, diagonal lines and line intersections. The results of ICR applications vary with the amount of computing intelligence that the device applies.
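The matrix matching approach described above can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not how any particular OCR product works: the 5x3 binary templates, the `match_score` and `recognize` helpers, and the 0.85 acceptance threshold are all hypothetical choices made for the example.

```python
# Hypothetical 5x3 binary templates ("1" = ink, "0" = background).
# A real template library would be far larger and higher resolution.
TEMPLATES = {
    "I": ["111",
          "010",
          "010",
          "010",
          "111"],
    "L": ["100",
          "100",
          "100",
          "100",
          "111"],
    "T": ["111",
          "010",
          "010",
          "010",
          "010"],
}


def match_score(glyph, template):
    """Return the fraction of pixels on which glyph and template agree."""
    total = 0
    agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            agree += (g == t)
    return agree / total


def recognize(glyph, threshold=0.85):
    """Return the best-matching character, or None if no template is close enough.

    The threshold plays the role of the "given level of accuracy";
    its value here is an assumption chosen for the example.
    """
    best_char, best_score = None, 0.0
    for char, template in TEMPLATES.items():
        score = match_score(glyph, template)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= threshold else None


# A scanned "T" with one noisy pixel in the middle row.
noisy_t = ["111",
           "010",
           "110",
           "010",
           "010"]
print(recognize(noisy_t))  # prints: T
```

The noisy glyph agrees with the "T" template on 14 of 15 pixels, so it clears the threshold and is assigned the character "T". A glyph that resembles no template closely enough is rejected rather than forced into the nearest class.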
A character recognition device achieves a very high recognition rate when the characters are written clearly within the regions of a sheet or document that are designated for writing. OCR systems can segment the electrical image data into blocks, lines and words once they have recognized these features. An OCR program can differentiate between text objects and non-text objects (such as the background) in an image based on the intensity differences between them. OCR systems employ various strategies for isolating small portions of the image as connected components, segmenting each connected component into one or several character images, and recognizing each such image as representing a specific character. An OCR device is generally only capable of recognizing characters printed in a font that it has been trained to recognize.

Devices such as an optical scanner or an optical reader may be used to input data into a data processing system for analysis or processing. Pattern recognition classifiers are used to sort scanned characters into a number of output classes. During operation, the system receives an input image associated with one of a plurality of classes, and the relationship of the image to each class is analyzed via a classification technique based upon the training parameters. In general, the operation of an OCR-based reading system involves placing text on a scanner and obtaining a pixel bitmap of the page to be read, converting that image to text using an OCR program on the personal computer to which the scanner is attached, and generating speech output of the interpreted text using a text-to-speech software program.
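The separation of text from background by intensity, followed by isolation of connected components, can be sketched as follows. This is only an illustration under assumptions: the fixed threshold of 128, the use of 4-connectivity, and the `binarize` and `connected_components` helpers are hypothetical choices, and production systems typically use adaptive thresholds and more elaborate segmentation.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Mark pixels darker than the threshold as text (True)."""
    return [[pixel < threshold for pixel in row] for row in gray]


def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected text regions."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited text pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# A tiny grayscale "page": 255 is white background, low values are dark ink.
page = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,  20, 255, 255,  30,  30, 255],
    [255,  20, 255, 255,  30, 255, 255],
    [255,  20, 255, 255,  30,  30, 255],
    [255, 255, 255, 255, 255, 255, 255],
]
boxes = connected_components(binarize(page))
print(boxes)  # prints: [(1, 1, 3, 1), (1, 4, 3, 5)]
```

Each bounding box marks a candidate character image that a later stage could pass to a classifier. Touching or broken characters would need further splitting or merging, which this sketch does not attempt.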
Automated scanning and processing with OCR systems improves document processing efficiency by automating the previously slow, labor-intensive and costly procedure of manually processing the information contained on forms. Document processors employing OCR devices have been widely used to facilitate the processing of pre-formatted business forms and documents. OCR techniques are used in automated mail processing systems that scan or read, sort, handle and distribute mail, in order to accommodate and process an ever-increasing range of individually diverse mail products, pieces, articles, or units. OCR is also commonly used in the currency processing field for lifting the serial number or code from processed notes. For example, a high-speed currency processing machine can identify specific notes by lifting each note's serial code using a camera device and then recording the serial code against the processed note.

