OCR Systems: How Do They Work?

Updated: Nov 30, 2023

OCR (optical character recognition)

OCR (optical character recognition) is the recognition of printed or written text characters by a computer.

This involves scanning the text character by character, analyzing the scanned image, and then translating each character image into a character code, such as ASCII, commonly used in data processing.

In OCR processing, the scanned-in image or bitmap is analyzed for light and dark areas in order to identify each alphabetic letter or numeric digit.
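The light-and-dark analysis can be illustrated with a simple fixed threshold. This is only a minimal sketch: the function name `binarize`, the threshold of 128, and the tiny sample image are assumptions made for illustration, and real OCR systems generally use more adaptive methods.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 1 (dark, ink) or 0 (light, paper)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Hypothetical 3x3 grayscale scan: a dark vertical stroke on a light page.
gray = [
    [250, 30, 245],
    [240, 25, 250],
    [235, 20, 255],
]

binary = binarize(gray)

# Print the result with '#' for dark pixels and '.' for light ones.
for row in binary:
    print("".join("#" if bit else "." for bit in row))
```

Once the image is reduced to dark and light pixels, the shapes formed by the dark pixels can be compared against known character shapes.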

When a character is recognized, it is converted into an ASCII code. Special circuit boards and computer chips designed expressly for OCR are used to speed up the recognition process.
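One simple way to picture the recognition step is template matching: compare the scanned glyph with a stored pattern for each known character, pick the closest one, and emit its ASCII code. The sketch below assumes a made-up three-letter template set and a 3x5 glyph size; the names `TEMPLATES`, `mismatch`, and `recognize` are hypothetical, and production OCR engines use far more sophisticated classifiers.

```python
# Hypothetical 3x5 templates: '#' is a dark pixel, '.' is a light pixel.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def mismatch(a, b):
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def recognize(glyph):
    """Return the best-matching character and its ASCII code."""
    best = min(TEMPLATES, key=lambda ch: mismatch(glyph, TEMPLATES[ch]))
    return best, ord(best)


# A scanned 'T' with one stray dark pixel in the fourth row.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]

print(recognize(noisy_t))
```

Choosing the template with the fewest differing pixels lets the match tolerate a small amount of scanning noise.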

OCR is being used by libraries to digitize and preserve their holdings. OCR is also used to process checks and credit card slips and sort the mail. Billions of magazines and letters are sorted every day by OCR machines, considerably speeding up mail delivery.

Some Common FAQs

Where do we use OCR?

OCR software scans documents containing text and converts them into documents that can be edited. For recognition to work well, however, the text should be clear, and handwritten text may not always be recognized by the software.

How does OCR software work?

Optical character recognition, or OCR, is a method of converting a scanned image into text. When a page is scanned, it is typically stored as a bitmap file, often in TIFF format. When the image is displayed on the screen, we can read it. But to the computer, it is just a series of black and white dots.
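To make the "series of black and white dots" concrete, here is a small sketch of a one-bit-per-pixel bitmap. The 5x7 pattern for the letter A and the helper name `render` are illustrative assumptions, not part of any particular file format.

```python
# Hypothetical 5-pixel-wide bitmap of the letter 'A'; each integer is one row,
# and each bit is one pixel (1 = black dot, 0 = white dot).
ROWS = [
    0b01110,
    0b10001,
    0b10001,
    0b11111,
    0b10001,
    0b10001,
    0b10001,
]


def render(rows, width=5):
    """Turn each row of bits into a string of '#' (black) and '.' (white)."""
    return [
        format(r, f"0{width}b").replace("1", "#").replace("0", ".")
        for r in rows
    ]


for line in render(ROWS):
    print(line)
```

A person sees the letter A in the printed pattern, while the program only holds seven integers. Bridging that gap is the job of OCR.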

What is an OCR scanner used for?

Optical Character Recognition, or OCR, is a technology that enables you to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera into editable and searchable data.

What is Magnetic Ink Character Recognition?

Magnetic Ink Character Recognition (MICR) is a character recognition system that uses special ink and characters. When a document that contains this ink needs to be read, it passes through a machine, which magnetizes the ink and then translates the magnetic information into characters. MICR technology is used by banks.

What is Tesseract OCR?

Optical Character Recognition, or OCR, is the process of electronically extracting text from images and reusing it in a variety of ways such as document editing, free-text searches, or compression. Tesseract is an open source OCR engine, originally developed at Hewlett-Packard and later sponsored by Google.
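As a rough sketch, Tesseract can be driven from Python through its command-line program. The example assumes the `tesseract` executable is installed and on the PATH; the helper names `build_tesseract_command` and `run_ocr` and the file name `scan.png` are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the command line: 'stdout' as output base prints the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_ocr(image_path, lang="eng"):
    """Run Tesseract on an image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout


# Show the command that would be run for a hypothetical scan.
print(" ".join(build_tesseract_command("scan.png")))
```

Calling `run_ocr("scan.png")` would then return whatever text Tesseract recognizes in that image, provided the file exists and the English language data is installed.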


