OCR accuracy benchmarks by document type, image quality requirements, DPI thresholds, and how Zera OCR handles degraded scans across bank statements, invoices, checks, and financial statements.
Accuracy is measured as field-level extraction match against human-reviewed ground truth on 10,000 documents per document type. A field counts as accurate only when the extracted value exactly matches the correct value; there is no partial credit.
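To make the scoring rule concrete, the sketch below computes field-level exact-match accuracy over a set of documents. The data layout (one dict of field name to value per document) and the function name `field_accuracy` are assumptions for illustration, not Zera's evaluation harness.

```python
def field_accuracy(extracted: list[dict[str, str]], truth: list[dict[str, str]]) -> float:
    """Share of ground-truth fields whose extracted value matches exactly.

    Each list holds one dict per document (field name -> value). A field
    missing from the extraction counts as wrong, and a near-miss such as a
    single wrong digit earns no partial credit.
    """
    if len(extracted) != len(truth):
        raise ValueError("extracted and truth must cover the same documents")
    correct = 0
    total = 0
    for got, expected in zip(extracted, truth):
        for field, value in expected.items():
            total += 1
            if got.get(field) == value:
                correct += 1
    return correct / total if total else 0.0


# Two documents with five ground-truth fields; one amount has a misread digit.
truth = [
    {"date": "2025-01-31", "amount": "1,250.00", "payee": "Acme Ltd"},
    {"date": "2025-02-28", "amount": "980.40"},
]
extracted = [
    {"date": "2025-01-31", "amount": "1,250.00", "payee": "Acme Ltd"},
    {"date": "2025-02-28", "amount": "988.40"},
]
print(field_accuracy(extracted, truth))  # 0.8
```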
| Document Type | Digital PDF Accuracy | Scanned PDF Accuracy | Image (JPG/PNG) Accuracy |
|---|---|---|---|
| Bank Statements | 99.8% | 96.2% | 95.4% |
| Invoices | 99.6% | 95.8% | 94.1% |
| Checks (MICR fields) | 99.1% | 97.3% | 93.7% |
| Financial Statements | 99.4% | 94.6% | 93.2% |
| Fax-quality scans (<150 DPI) | N/A | 87.3% | 82.1% |
* All scanned document benchmarks at 200 DPI. Fax-quality benchmark at 120 DPI. Data from Q1 2025 internal accuracy study.
Zera OCR performs best with clean, high-resolution inputs. The table below lists the minimum and recommended value for each input parameter.
| Parameter | Minimum | Recommended | Notes |
|---|---|---|---|
| Resolution (DPI) | 150 DPI | 200–300 DPI | Below 150 DPI: accuracy drops sharply, especially for MICR and small print |
| Color mode | Grayscale | Grayscale or color | Black-and-white bitonal acceptable for standard print; color helps logo/header detection |
| File size (per page) | No minimum | 50KB–3MB/page | Files >10MB/page are auto-compressed before OCR |
| Page rotation | Any angle | 0° (upright) | Auto-deskew corrects up to ±15° rotation; extreme angles reduce accuracy |
| Contrast ratio | Low acceptable | High contrast | Low contrast (fax quality) processed but with lower field accuracy |
| JPG compression | Quality 60+ | Quality 85+ | Heavy compression artifacts degrade character edges; use PNG for critical docs |
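These thresholds can be checked before upload. The following is a hypothetical client-side pre-flight check: `PageInfo` and `preflight` are illustrative names rather than part of any Zera API, and the sketch assumes each page's DPI, size, skew, and JPG quality are already known and that 1MB means 1024 × 1024 bytes.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the input requirements table.
MIN_DPI = 150
RECOMMENDED_MIN_DPI = 200
MIN_JPEG_QUALITY = 60
RECOMMENDED_JPEG_QUALITY = 85
AUTO_COMPRESS_BYTES = 10 * 1024 * 1024  # pages over 10MB are auto-compressed
MAX_AUTO_DESKEW_DEG = 15.0


@dataclass
class PageInfo:
    """Hypothetical per-page metadata gathered before upload."""
    dpi: int
    size_bytes: int
    skew_deg: float = 0.0
    jpeg_quality: Optional[int] = None  # None for PDF or PNG input


def preflight(page: PageInfo) -> list[str]:
    """Return warnings for one page; an empty list means it meets the recommendations."""
    warnings = []
    if page.dpi < MIN_DPI:
        warnings.append(f"{page.dpi} DPI is below the {MIN_DPI} DPI minimum")
    elif page.dpi < RECOMMENDED_MIN_DPI:
        warnings.append(f"{page.dpi} DPI is below the recommended {RECOMMENDED_MIN_DPI} DPI")
    if page.jpeg_quality is not None:
        if page.jpeg_quality < MIN_JPEG_QUALITY:
            warnings.append(f"JPG quality {page.jpeg_quality} is below the minimum of {MIN_JPEG_QUALITY}")
        elif page.jpeg_quality < RECOMMENDED_JPEG_QUALITY:
            warnings.append(f"JPG quality {page.jpeg_quality} is below the recommended {RECOMMENDED_JPEG_QUALITY}")
    if page.size_bytes > AUTO_COMPRESS_BYTES:
        warnings.append("page is over 10MB and will be auto-compressed before OCR")
    if abs(page.skew_deg) > MAX_AUTO_DESKEW_DEG:
        warnings.append("skew exceeds the ±15° auto-deskew range; rotate the page manually")
    return warnings


# A 120 DPI JPG saved at quality 70 triggers two warnings.
for warning in preflight(PageInfo(dpi=120, size_bytes=400_000, jpeg_quality=70)):
    print(warning)
```

Warnings rather than hard failures mirror the table: every parameter has an accepted minimum, and only the recommended values are enforced as advice.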
Zera OCR processes scanned PDFs, images, and fax-quality documents in under 30 seconds.
Zera OCR accepts all common document formats used in accounting workflows. No pre-processing or format conversion is required by the user.
Digital PDFs with embedded text are processed by direct text extraction, so no OCR is needed. This is the fastest processing path, typically under 5 seconds per page, with benchmark accuracy of 99.1–99.8% depending on document type.
Scanned PDFs (pages that contain images rather than embedded text) trigger Zera OCR. Each page is rasterized, deskewed, and processed field by field. Benchmark accuracy is 94.6–97.3% at 200 DPI.
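The split between direct extraction and OCR can be pictured as a per-page routing decision. This sketch assumes routing happens page by page and treats any page with a non-empty text layer as digital; both are simplifications, and `PdfPage` and `route_pages` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PdfPage:
    """Hypothetical page record: its number and any embedded text layer."""
    number: int
    text_layer: str  # empty or whitespace-only for a pure image scan


def route_pages(pages: list[PdfPage]) -> dict[int, str]:
    """Send pages with embedded text to direct extraction and the rest to OCR."""
    return {
        page.number: "direct" if page.text_layer.strip() else "ocr"
        for page in pages
    }


pages = [PdfPage(1, "Statement period: 01/01/2025 - 01/31/2025"), PdfPage(2, "")]
print(route_pages(pages))  # {1: 'direct', 2: 'ocr'}
```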
JPG images can be uploaded directly. Zera OCR reads the EXIF orientation metadata and auto-rotates each image before processing. A JPG quality of 85+ is recommended to prevent errors from compression artifacts.
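Auto-rotation from EXIF metadata can be illustrated with the standard EXIF Orientation tag. The sketch below handles only the four rotation-only values (1, 3, 6, 8) on a plain 2-D grid of pixel values; it is not Zera's implementation, and real code would more likely call an imaging library such as Pillow's `ImageOps.exif_transpose`.

```python
# EXIF Orientation values that only rotate (no mirroring), mapped to the
# clockwise rotation needed to display the image upright.
EXIF_CLOCKWISE_ROTATION = {1: 0, 3: 180, 6: 90, 8: 270}


def rotate_clockwise(pixels: list[list[int]]) -> list[list[int]]:
    """Rotate a 2-D grid of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def apply_exif_orientation(pixels: list[list[int]], orientation: int) -> list[list[int]]:
    """Return the grid rotated upright according to its EXIF orientation value."""
    if orientation not in EXIF_CLOCKWISE_ROTATION:
        # Values 2, 4, 5 and 7 also mirror the image; omitted from this sketch.
        raise ValueError(f"unsupported or mirrored orientation: {orientation}")
    for _ in range(EXIF_CLOCKWISE_ROTATION[orientation] // 90):
        pixels = rotate_clockwise(pixels)
    return pixels


grid = [[1, 2],
        [3, 4]]
print(apply_exif_orientation(grid, 6))  # [[3, 1], [4, 2]]
```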
PNG is lossless, which makes it ideal for screenshots and digital-native documents. Transparent regions are flattened to white before OCR, and there is no quality loss from compression.
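Flattening transparency to white is ordinary alpha compositing against a white background: each channel becomes `alpha * colour + (1 - alpha) * 255`. A per-pixel sketch, assuming 8-bit RGBA input; `flatten_on_white` is an illustrative name, and a real pipeline would process whole images at once.

```python
def flatten_on_white(rgba: tuple[int, int, int, int]) -> tuple[int, ...]:
    """Composite one 8-bit RGBA pixel over a white background.

    Fully transparent pixels become white; fully opaque pixels keep their colour.
    """
    r, g, b, a = rgba
    alpha = a / 255
    return tuple(round(alpha * c + (1 - alpha) * 255) for c in (r, g, b))


print(flatten_on_white((0, 0, 0, 0)))    # (255, 255, 255)
print(flatten_on_white((0, 0, 0, 255)))  # (0, 0, 0)
```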
For password-protected PDFs, the password provided at upload time is used only to decrypt the file for that session. It is never stored, logged, or transmitted beyond the decryption step.
Multi-page PDFs are processed page by page, and the extracted pages are stitched into a single transaction dataset. There is no page-count limit on upload.
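Page-by-page processing followed by stitching might look like the sketch below. It assumes each page yields a list of transaction rows and that pages can finish extraction in any order (for example, if they run in parallel, which the source does not state). `Transaction` and `stitch_pages` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Transaction:
    """Hypothetical transaction row extracted from one page."""
    page: int
    date: str
    description: str
    amount: Decimal  # credits positive, debits negative


def stitch_pages(results: dict[int, list[Transaction]]) -> list[Transaction]:
    """Merge per-page results into one dataset, ordered by page number.

    Page numbers are sorted first; rows keep their original order within each page.
    """
    dataset: list[Transaction] = []
    for page_number in sorted(results):
        dataset.extend(results[page_number])
    return dataset


results = {
    2: [Transaction(2, "2025-01-20", "Service fee", Decimal("-5.00"))],
    1: [
        Transaction(1, "2025-01-02", "Deposit", Decimal("100.00")),
        Transaction(1, "2025-01-05", "Rent", Decimal("-50.00")),
    ],
}
for row in stitch_pages(results):
    print(row.page, row.date, row.description, row.amount)
```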
Zera OCR does not simply run raw character recognition. It applies a pipeline built specifically for financial documents, designed to maximize accuracy on the fields that matter most to accountants.
| Step | Process | Purpose |
|---|---|---|
| 1. Ingestion | Format detection, password decryption, rasterization | Normalize input to image form for OCR |
| 2. Pre-processing | Deskew, denoising, contrast normalization | Maximize character edge clarity |
| 3. Layout Analysis | Region detection: header, table body, footer, MICR zone | Apply appropriate OCR model per region |
| 4. Field OCR | Character recognition per field with confidence scoring | Extract and score each individual field |
| 5. Arithmetic Validation | Running totals, balance checks, line item sums verified | Catch transposition errors and OCR misreads |
| 6. Output Assembly | Fields merged with extraction schema, low-confidence flags applied | Produce clean, structured output for export |
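For bank statements, the balance check in step 5 can be sketched as follows: each printed running balance should equal the previous balance plus the transaction amount. This is an illustration, not Zera's validator. The row layout `(amount, printed_balance)` and the sign convention are assumptions, and `Decimal` keeps currency comparisons exact.

```python
from decimal import Decimal


def check_running_balance(opening: Decimal, rows: list[tuple[Decimal, Decimal]]) -> list[int]:
    """Return the indexes of rows whose printed balance fails the arithmetic.

    Each row is (amount, printed_balance), credits positive and debits negative.
    Each row is checked against the previous *printed* balance rather than a
    cumulative sum, so one misread amount flags only its own row.
    """
    mismatches = []
    previous = opening
    for i, (amount, printed_balance) in enumerate(rows):
        if previous + amount != printed_balance:
            mismatches.append(i)
        previous = printed_balance
    return mismatches


rows = [
    (Decimal("-200.00"), Decimal("800.00")),
    (Decimal("50.00"), Decimal("850.00")),
    (Decimal("-30.00"), Decimal("802.00")),  # 820.00 transposed to 802.00
]
print(check_running_balance(Decimal("1000.00"), rows))  # [2]
```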
Not all documents arrive in ideal condition. Zera OCR applies fallback strategies for degraded inputs rather than failing silently.
Low-resolution scans (below 150 DPI) are processed with a warning flag. Accuracy degrades to 82–87%, and fields with a confidence score below 70 are highlighted for manual review in the export.
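The review flag amounts to a threshold on each field's confidence score. A minimal sketch, assuming scores on a 0 to 100 scale; `ExtractedField` and `flag_low_confidence` are hypothetical names. The extracted value is kept either way, so a reviewer can correct it instead of re-keying it.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 70  # fields scoring below this are flagged for manual review


@dataclass
class ExtractedField:
    """Hypothetical extracted field with a 0-100 confidence score."""
    name: str
    value: str
    confidence: float
    needs_review: bool = False


def flag_low_confidence(fields: list[ExtractedField],
                        threshold: float = REVIEW_THRESHOLD) -> list[ExtractedField]:
    """Mark fields below the threshold; the extracted value is always kept."""
    for f in fields:
        f.needs_review = f.confidence < threshold
    return fields


fields = flag_low_confidence([
    ExtractedField("amount", "1,250.00", 92.5),
    ExtractedField("date", "2O25-01-31", 64.0),  # letter O misread for zero
])
for f in fields:
    print(f.name, f.value, "REVIEW" if f.needs_review else "ok")
```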
For skewed or rotated pages, auto-deskew corrects up to ±15°. Pages beyond this range are flagged, and users can rotate them manually in the document preview tool before reprocessing.
Low-contrast and faded pages receive contrast normalization. In extreme cases (thermally faded or water-damaged pages), a "manual review required" flag is applied to the affected pages.
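Min-max contrast stretching is one common form of contrast normalization. The source does not say which method Zera uses, so treat this as an illustration only. The `min_range` cutoff used to flag a page as too faded is a hypothetical value, not a documented Zera threshold.

```python
def stretch_contrast(gray: list[int], min_range: int = 32) -> tuple[list[int], bool]:
    """Min-max stretch 8-bit grayscale values to cover the full 0-255 range.

    Returns the stretched pixels and a review flag that is True when the
    original spread of values is narrower than `min_range` (a hypothetical
    cutoff for "too faded to trust").
    """
    lo, hi = min(gray), max(gray)
    if hi == lo:
        # A completely flat region has no contrast left to recover.
        return list(gray), True
    scale = 255 / (hi - lo)
    stretched = [round((p - lo) * scale) for p in gray]
    return stretched, (hi - lo) < min_range


# Faded print: ink at 100, paper at 120, only 20 grey levels apart.
print(stretch_contrast([100, 110, 120]))  # ([0, 128, 255], True)
```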
If extracted totals don't match the sum of the line items, the discrepancy is reported as an extraction warning. The raw OCR values are still exported so users can locate the error.
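The warn-but-keep behavior can be sketched for a simple invoice. This assumes the stated total is a plain sum of line items (no separate tax, shipping, or discount lines), which real invoices often violate. `reconcile_invoice` is a hypothetical name.

```python
from decimal import Decimal


def reconcile_invoice(line_items: list[Decimal], stated_total: Decimal) -> dict:
    """Compare an invoice's stated total with the sum of its line items.

    Raw values are always returned unchanged; a mismatch only adds a warning.
    """
    computed = sum(line_items, Decimal("0"))
    warnings = []
    if computed != stated_total:
        warnings.append(
            f"line items sum to {computed}, but the extracted total is {stated_total}"
        )
    return {"line_items": line_items, "stated_total": stated_total, "warnings": warnings}


# "25.00" misread as "52.00": a warning is raised, and both raw values survive.
report = reconcile_invoice([Decimal("19.99"), Decimal("5.01")], Decimal("52.00"))
print(report["warnings"])
```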
Zera OCR requires a minimum of 150 DPI for acceptable accuracy. 200–300 DPI is recommended for optimal results. Below 150 DPI, character recognition degrades significantly, particularly for fine print like routing numbers on checks.
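DPI is the number of image pixels per inch of the physical page, so you can estimate an upload's effective resolution from its pixel dimensions and paper size. A sketch assuming the physical page size is known (US Letter, 8.5 × 11 in, in the example). `effective_dpi` is an illustrative helper, not a Zera function.

```python
def effective_dpi(pixel_width: int, pixel_height: int,
                  page_width_in: float, page_height_in: float) -> float:
    """Pixels per inch of the physical page.

    Takes the lower of the horizontal and vertical values as a conservative
    estimate, in case the image was resampled unevenly.
    """
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


# US Letter (8.5 x 11 in) scans.
print(effective_dpi(2550, 3300, 8.5, 11))  # 300.0
print(effective_dpi(1020, 1320, 8.5, 11))  # 120.0, fax quality and below the 150 DPI minimum
```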
Zera OCR is optimized for printed financial documents. Handwritten amounts on checks are supported with 88% accuracy. Fully handwritten invoices or statements are not supported — the document must be printed or typed.
Yes. Zera Books can process password-protected PDFs. Users provide the password at upload time and it is used only for that session — it is never stored or logged.
Low-confidence OCR fields are flagged in the output with a yellow highlight. The extracted value is included but marked as uncertain. Users can review and correct these fields before exporting to QuickBooks or Xero.
Zera OCR extracts data from scanned PDFs and images with 93–97% field-level accuracy at 200 DPI, and from fax-quality documents with 82–87% accuracy, with no templates and no configuration.