Optical character recognition

A text recognition approach applied to printed material: taking a scan or photo of typeset or typewritten text and converting it into editable, searchable characters. It is the oldest and most familiar form of the task, and for clean printed documents in well-supported languages it is now mature and reliable.

The caveats are practical. Accuracy drops on poor scans, on unusual fonts, and on diacritics or characters the system was not trained on. This makes OCR for low-resource languages and for non-Latin or community-specific orthographies much harder than for, say, printed English.

Supported by the National Science Foundation Award 2542375.