Tesseract

by ATH Infosystems

Version 5.5.2 + Free Support on Ubuntu 24.04

Tesseract is an open-source Optical Character Recognition (OCR) engine originally developed by Hewlett-Packard, later sponsored by Google, and now maintained by its open-source community. It is widely used to extract text from images and scanned documents, and it supports over 100 languages through trained data models.

Features of Tesseract:

  • Tesseract supports OCR for a wide range of languages, including the ability to add custom traineddata for specialized text recognition.
  • It uses LSTM-based neural network models, which improve accuracy on complex or distorted printed text; results on handwritten text are generally much weaker.
  • Tesseract reads common image formats such as PNG, JPG, and TIFF, and can write results as plain text or searchable PDF. PDF files must be converted to images before recognition.
  • It can be used directly via command line or integrated with programming languages like Python (pytesseract) and Java.
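
As an illustration of the Python integration mentioned in the list above, the following is a minimal sketch rather than a definitive implementation. It assumes the pytesseract and Pillow packages are installed separately (for example with pip) and that the English language data is available. The helper names `tesseract_config` and `ocr_image` are hypothetical and chosen only for this example.

```python
from pathlib import Path


def tesseract_config(psm: int = 3, oem: int = 1) -> str:
    """Build the extra command-line flags that pytesseract forwards to tesseract.

    --psm selects the page segmentation mode (3 = fully automatic, no OSD).
    --oem selects the OCR engine mode (1 = LSTM engine only).
    """
    return f"--psm {psm} --oem {oem}"


def ocr_image(image_path: str, lang: str = "eng", psm: int = 3) -> str:
    """Return the text recognized in one image file (hypothetical helper)."""
    # Imported lazily so the module still loads where the packages are absent.
    import pytesseract
    from PIL import Image

    # Assumption: the binary lives at the path shown in this listing's usage
    # commands. If it is not there, pytesseract falls back to the PATH lookup.
    binary = Path("/usr/local/bin/tesseract")
    if binary.exists():
        pytesseract.pytesseract.tesseract_cmd = str(binary)

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(
            img, lang=lang, config=tesseract_config(psm)
        )
```

Calling `ocr_image("sample.png")` would return the recognized text as a string, where `sample.png` is a placeholder for any image on disk. Passing a different `lang` value requires the matching traineddata file to be installed.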

Tesseract Usage:

$ sudo su
$ sudo apt update
$ /usr/local/bin/tesseract -v
$ cat /tmp/output.txt    # check that the test output was written
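
The commands above only confirm that the binary is installed. To run OCR on an actual image, the command line takes an input image and an output base name, to which it appends `.txt`. The sketch below wraps that call from Python using only the standard library. The function names and the default output base `/tmp/output` are assumptions made for this example, and the image path is a placeholder.

```python
import shutil
import subprocess
from pathlib import Path

# Binary path taken from the usage commands in this listing.
TESSERACT = "/usr/local/bin/tesseract"


def build_ocr_command(image, output_base, lang="eng", searchable_pdf=False):
    """Return the argument list for a single tesseract run.

    Layout: tesseract <image> <output_base> -l <lang> [pdf]
    The trailing "pdf" config asks for a searchable PDF instead of plain text.
    """
    cmd = [TESSERACT, str(image), str(output_base), "-l", lang]
    if searchable_pdf:
        cmd.append("pdf")
    return cmd


def run_ocr(image, output_base="/tmp/output", lang="eng"):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which(TESSERACT) is None:
        raise FileNotFoundError(f"tesseract not found at {TESSERACT}")
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_ocr_command(image, output_base, lang), check=True)
    # tesseract appends ".txt" to the output base name.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

With these defaults, `run_ocr("scan.png")` would write `/tmp/output.txt` and return its contents. Building the argument list in a separate function keeps the command easy to inspect or log before anything is executed.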
    
Disclaimer: Tesseract OCR is distributed under the Apache License 2.0. It is provided free of charge and without any warranty, express or implied. Users are responsible for ensuring compliance with licensing terms and for any outcomes resulting from the usage of this software. The developers and contributors hold no liability for damages, data loss, or issues arising from its use.