Abstract
Optical character recognition (OCR) is the technology used to recognize printed or handwritten text within digital images of physical documents. It is most commonly used to turn hard-copy legal or historical documents into PDFs. An OCR system analyzes an image and identifies dark areas as characters to be recognized, which enables digital archiving, editing, and document searching with common programs such as Microsoft Word or Google Docs. Characters are recognized with one of two algorithms: pattern recognition, in which the OCR program is fed examples of text in various fonts and formats and compares them against the characters in a document, or feature detection, in which the OCR program applies rules about the features of letters and numbers, such as lines, curves, and angles. For instance, the letter A might be recognized as two diagonal lines connected by a horizontal line across the middle. OCR is also used for archiving newspapers and phone books, depositing checks through mobile apps, automating tollbooth collection, indexing print materials for search engines, managing legal documents, sorting mail, and digitizing documents to be read aloud for the visually impaired. Prior to OCR technology, paper documents could only be converted manually, with someone typing them out one by one. OCR saves a huge amount of time, reduces transcription errors, and minimizes effort.
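
As a rough illustration of the pattern recognition approach described above, the following Python sketch compares a small character bitmap against stored examples and picks the closest one. The 5x5 bitmaps and the names TEMPLATES, similarity, and recognize are hypothetical and chosen only for this example; practical OCR systems use far larger example sets and more robust matching.

```python
# Hypothetical 5x5 binary templates ('#' = dark pixel, '.' = light pixel).
# These stand in for the stored example characters an OCR program compares against.
TEMPLATES = {
    "A": [
        "..#..",
        ".#.#.",
        "#####",
        "#...#",
        "#...#",
    ],
    "L": [
        "#....",
        "#....",
        "#....",
        "#....",
        "#####",
    ],
    "T": [
        "#####",
        "..#..",
        "..#..",
        "..#..",
        "..#..",
    ],
}


def similarity(glyph, template):
    """Return the fraction of pixels on which two equally sized bitmaps agree."""
    total = 0
    agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for glyph_pixel, template_pixel in zip(glyph_row, template_row):
            total += 1
            agree += glyph_pixel == template_pixel
    return agree / total


def recognize(glyph, templates=TEMPLATES):
    """Return the label of the stored template most similar to the glyph."""
    return max(templates, key=lambda label: similarity(glyph, templates[label]))


# A slightly degraded "A": one dark pixel of the crossbar is missing.
noisy_a = [
    "..#..",
    ".#.#.",
    "##.##",
    "#...#",
    "#...#",
]

print(recognize(noisy_a))  # A
```

The degraded glyph still matches the stored A on 24 of its 25 pixels, so it is recognized despite the missing pixel. A feature detection approach would instead test for strokes such as the two diagonals and the crossbar, rather than comparing pixels directly.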