Vision OCR LLM

Compact, OCR-specialized vision-language model engineered for state-of-the-art grounded OCR in production document workflows.

The Vision OCR LLM is an enterprise-grade, OCR-specialized vision-language model engineered for state-of-the-art grounded OCR in production document workflows. It is the right model when text recognition and text location both matter: medical de-identification, form-field extraction, compliance redaction, document anonymization, and any pipeline that needs to act on a specific word at a specific position on a specific page.

Unlike traditional OCR solutions that only return text, the model emits text along with precise word-level bounding-box coordinates in a single inference pass, with no two-stage detection-then-recognition pipeline to maintain. It achieves state-of-the-art results across every major OCR benchmark.

Key capabilities and Ideal Use Cases

- OCR and document understanding for PDFs, images, forms, and scanned documents
- Medical de-identification (PHI redaction with precise coordinates)
- Form-field extraction (mapping values to specific page regions)
- Compliance auditing (which text was flagged, and where on the page)
- Document anonymization (region-level masking and blurring)
- Multilingual document processing, table and formula recognition, handwritten text

In independent benchmark evaluations covering leading OCR and vision-language models, John Snow Labs Vision OCR LLM achieved the highest ranking among self-hosted models and outperformed multiple well-known open-source and commercial alternatives on structured document extraction tasks.
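To illustrate why word-level coordinates matter for use cases such as PHI redaction, here is a minimal Python sketch of turning grounded OCR output into redaction regions. The `Word` record, its `(x0, y0, x1, y1)` pixel box layout, and the `redaction_boxes` helper are assumptions made for illustration; this listing does not specify the model's actual output schema.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels; assumed layout


@dataclass
class Word:
    """Hypothetical record for one recognized word and its bounding box."""
    text: str
    box: Box


def redaction_boxes(words: List[Word],
                    is_sensitive: Callable[[str], bool],
                    pad: int = 2) -> List[Box]:
    """Return a padded box for every word that is_sensitive flags."""
    boxes = []
    for w in words:
        if is_sensitive(w.text):
            x0, y0, x1, y1 = w.box
            # Pad slightly so the mask covers glyph edges; clamp at the page origin.
            boxes.append((max(0, x0 - pad), max(0, y0 - pad), x1 + pad, y1 + pad))
    return boxes


# Made-up sample data standing in for one line of OCR output.
words = [
    Word("Patient:", (10, 10, 80, 30)),
    Word("Jane", (90, 10, 130, 30)),
    Word("Doe", (135, 10, 170, 30)),
]
names = {"jane", "doe"}
boxes = redaction_boxes(words, lambda t: t.lower() in names)
```

Because each flagged word carries its own box, the masking step needs no second detection pass: the returned regions can be handed directly to whatever image library draws the redaction.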
The model is specifically designed for organizations that require accurate document intelligence while maintaining security, compliance, and operational control.

Performance

- 860 on OCRBench (state-of-the-art for models under 3B parameters)
- 94.10 overall on OmniDocBench, with 0.042 text edit distance, 94.73 formula, 91.81 table
- 85.21 on Wild-OmniDocBench (degraded scans with folds and lighting changes)
- 91.03 on DocML multilingual document parsing across 14 non-English, non-Chinese languages
- 92.29 cards, 92.53 receipts, 92.87 video subtitles on information extraction
- 0.9574 Table TEDS, 0.9706 Formula CDM (English)
- 0.039 BBox CER on FUNSD: #1 of 15 models in the JSL Vision Benchmark Series
- 4.7x lower CER than Tesseract 5.5, 6.1x lower than EasyOCR on the same FUNSD benchmark
- 100% parse rate: valid bounding-box output produced for every page

Built for organizations that require security, control, and high-quality structured outputs, the Vision OCR LLM enables enterprises to unlock value from document repositories while reducing operational costs and accelerating automation initiatives.
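For readers unfamiliar with the CER figures quoted above, character error rate is commonly defined as the character-level edit distance between the recognized text and the reference text, divided by the reference length. The Python sketch below follows that common definition; how the JSL Vision Benchmark Series computes its BBox CER variant is not described in this listing, so the code should not be read as the benchmark's exact procedure.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, using one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, an OCR output of `invo1ce` against the reference `invoice` has one substitution over seven characters, so lower values mean fewer character-level mistakes.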

by John Snow Labs Inc
