Program Name: FineReader XIX Version: 1.0
Omnifont OCR for Fraktur and old European language recognition
Program Description:
FineReader XIX
FineReader XIX is a special version of the award-winning FineReader optical character recognition (OCR) software for
recognising "fraktur" or "black letter" texts from the period between 1800 and 1938. It is designed to
convert scans of old documents, books, and papers into text for the purpose of digital archiving and publishing.
FineReader XIX is the omnifont OCR for Fraktur, giving users a solution for scanning and converting old documents with
minimal training and dictionary work. This was achieved by combining dedicated recognition technology with
linguistic study.
OCR systems work by analysing a text image and making a hypothesis about which letter or word an image represents. The
hypotheses are analysed in context and verified by use of sophisticated OCR dictionaries made up of Language Models (LMs).
Language Models are computer databases that describe the vocabulary of a language. The problem is that modern OCR systems do
not have LMs for older text fonts and older text spellings. The solution for Fraktur text recognition was achieved through
the development of OCR dictionaries specifically for this time period. Special language models were created for five
European languages.
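The verification step described above can be illustrated with a minimal Python sketch. It is not ABBYY's implementation: the function name `pick_word`, the tiny `PERIOD_LEXICON` set, the confidence values, and the fixed dictionary bonus are all assumptions made for illustration. It only shows the general idea of preferring an image hypothesis that a period dictionary confirms.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for a historic language model: a handful of
# pre-1901 German spellings (modern: Teil, Tür, sein, Tat).
PERIOD_LEXICON: Set[str] = {"theil", "thür", "seyn", "that"}


def pick_word(hypotheses: List[Tuple[str, float]],
              lexicon: Set[str],
              bonus: float = 0.3) -> str:
    """Return the hypothesis with the highest score.

    Each hypothesis is (word, image_confidence). A word found in the
    lexicon gets a fixed bonus, so a dictionary-confirmed reading can
    beat a slightly more confident but unknown one.
    """
    def score(item: Tuple[str, float]) -> float:
        word, confidence = item
        return confidence + (bonus if word.lower() in lexicon else 0.0)

    return max(hypotheses, key=score)[0]


# Assumed classifier output for one word image: the "e" was misread as
# "c" in the top candidate, but the dictionary confirms "Theil".
candidates = [("Thcil", 0.55), ("Theil", 0.50), ("Tbeil", 0.20)]
print(pick_word(candidates, PERIOD_LEXICON))
```

With a modern dictionary that lacks "Theil", none of the candidates would be confirmed and the misread "Thcil" would win on image confidence alone, which is the gap the period-specific dictionaries are meant to close.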
The Fraktur language models were created with the help of ATAPY Software. Through the development process, 10 different
dictionaries and more than 105 books published between 1808 and 1930 were analysed. Linguists reviewed word stock,
identified words that have phased out through the evolution of the languages, and identified the correct paradigm
assignments for synchronising the language models with the appropriate grammar usage for the time period. More than 500,000
word entries were manually compared with existing FineReader dictionaries. Grammatical paradigms and word evolutions were
reviewed to add 159 historic grammar paradigms that were missing from the contemporary language models. Language models were
then compiled and tested on a control group of testing documents featuring old text.
To recognise the Fraktur style fonts, ABBYY created special classifiers, or alphabets, capable of recognising the Fraktur
symbols. As part of this effort, the ABBYY development teams collected a symbol image base averaging 2,500 samples per
symbol, created a new alphabet pattern, and assembled a sample test base representing 31,000 pages of text
from different sources. Using the sample text, the recognition engine was "fine-tuned" to work with the subtle
features of the Fraktur alphabet (such as the ligatures, or connected letters). The new alphabet was then added to the
FineReader system and interface and tested extensively.
FineReader XIX was developed with the needs of universities and research centers in mind. The product was developed in
cooperation with the worldwide METAe Project. METAe is a consortium of libraries and digitisation companies from across
Europe who are working together to create the METAe Engine, a software package specifically designed for organising the
workflow for archiving and converting historical materials such as books, journals, magazines and newspapers. FineReader
XIX will provide a key component for archiving some of Europe's most priceless historical documents. Partners in the METAe
project include: the University of Innsbruck (Austria), the University of Florence (Italy), the Bibliothèque nationale de
France, the National Library of Norway, the Friedrich-Ebert-Foundation (Germany), CCS Compact Computer Systeme (Germany), and Cornell University Library (USA).
FineReader XIX support for Fraktur includes:
- Languages: German, English, French, Italian, and Spanish
- Fonts: Fraktur, Schwabacher, and a majority of Textura (Gothic) fonts
System Requirements for FineReader XIX:
- IBM compatible PC with Pentium processor (200 MHz)
- Microsoft Windows 98/Me/NT 4.0 with Service Pack 6/2000/XP/Server 2003
- 32 MB RAM (Windows 98/Me); 64 MB (Windows NT 4.0/2000/XP/Server 2003); additional 16 MB RAM required for each additional
processor in a multi-processor system
- 230 MB available hard disk space for typical installation; 70 MB for program operation
- CD-ROM drive
- SVGA graphics
- Keyboard, mouse or other input device
- Free USB port for hardware key
- 100% TWAIN-compatible scanner, digital camera, or fax modem
- Microsoft Internet Explorer 4.0 or higher (Microsoft Internet Explorer 5.01 is included in the delivery package)
Supported input formats
- BMP: black and white, grey, color
- PCX, DCX: black and white, grey, color
- JPEG: grey, color
- JPEG 2000, Part 1: grey, color
- PNG: black and white, grey, color
- TIFF: black and white, grey, color, multi-image. Methods of compression: Unpacked, CCITT Group 3, CCITT Group 3 FAX (2D),
CCITT Group 4, PackBits, JPEG, ZIP
- PDF
Supported output formats
- Microsoft Word XP, 2000, 97, 95
- RTF
- TXT
- Unicode Text
- Microsoft Excel XP, 2000, 97, 95
- HTML 3.2/4.0
- Unicode HTML 3.2/4.0
- DBF
- CSV
- PDF 3.0/4.0