frontmatter

title: Optical character recognition
aliases: [OCR, optical character recognition]
tags: [term]
category: AI & NLP
summary: The specific name for text recognition on printed material — converting a scan of typeset or typed text into editable, searchable characters.

Optical character recognition

A text recognition approach applied to printed material: taking a scan or photo of typeset or typewritten text and converting it into editable, searchable characters. It is the oldest and most familiar form of the task, and for clean printed documents in well-supported languages it is now mature and reliable.

The caveats are practical. Accuracy drops on poor scans, unusual fonts, or diacritics and characters the system was not trained on, which makes OCR for low-resource languages and non-Latin or community-specific orthographies much harder than for, say, printed English.
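One way to make "accuracy drops" concrete is character error rate (CER): the edit distance between the OCR output and a hand-checked reference, divided by the reference length. The sketch below is illustrative only. The entry does not name a metric, and the function names `edit_distance` and `character_error_rate` are hypothetical, as are the sample strings. It assumes both texts are normalized to Unicode NFC so that an accented letter counts as one character.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the OCR output
                curr[j - 1] + 1,     # extra character in the OCR output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """CER = edit distance / reference length, after NFC normalization."""
    # Assumption: NFC, so precomposed and combining forms compare equal.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", ocr_output)
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical outputs from a system that drops diacritics.
print(character_error_rate("café", "cafe"))              # 0.25
print(character_error_rate("Tiếng Việt", "Tieng Viet"))  # 0.2
```

In both samples the only errors are the lost diacritics, yet they account for a fifth to a quarter of the characters, which is why orthographies that lean heavily on diacritics are hit hardest when a system has not been trained on them.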

Created Jun 19, 2026 · Updated Jun 19, 2026