Training Corpus
The Zera AI engine was built on the premise that a model trained on enough real financial documents would outperform any rule-based system — including one specifically tuned for a particular bank's format. The training corpus reflects this approach:
Zera AI Training Data
The 2.8 million bank statements in the training set cover hundreds of institutions — North American banks (Chase, Bank of America, Wells Fargo, TD, RBC, Scotiabank, CIBC, BMO), UK banks, Australian banks, and many regional credit unions and community banks. Format diversity in the training set is the primary reason the model handles previously unseen layouts.
The 847 million transactions in the training set cover the full range of transaction description styles — from terse ATM references to multi-line ACH entries with originator information. This volume is what enables reliable description standardization across institutions.
Accuracy Methodology
Accuracy figures without methodology are marketing noise. Here's what the numbers actually measure:
| Metric | Value | What It Measures |
|---|---|---|
| Field-level accuracy (digital PDF) | 99.6% | Each individual extracted value compared against the source document. Measured across a held-out test set of 50,000 digital PDFs. |
| OCR accuracy (scanned PDF) | 95%+ | Character-level accuracy on scanned document images via Zera OCR. Higher on clean 300 DPI scans; lower bound at 150 DPI. |
| Date field accuracy | 99.8% | Date values extracted and normalized correctly. Higher than the overall figure because dates are validated against the statement period. |
| Amount field accuracy | 99.9% | Numeric amounts. Highest accuracy field due to arithmetic cross-checking (opening balance + transactions = closing balance). |
| Description accuracy | 98.7% | Text descriptions captured correctly. Lower than numeric fields due to truncation and bank-specific formatting variations. |
What 99.6% means in practice: A 100-transaction statement with 400 field values has an average of 1.6 errors at 99.6% field accuracy. In most cases, those errors are in description fields (minor truncations), not amount or date fields where errors would affect reconciliation.
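The arithmetic behind that figure can be checked directly. The following is a minimal sketch, assuming four extracted fields per transaction (which gives the 400 values in the example) and errors spread evenly across fields; the function name is illustrative only.

```python
def expected_field_errors(transactions: int, fields_per_transaction: int, accuracy: float) -> float:
    """Expected number of wrong field values, assuming errors are spread evenly across fields."""
    total_fields = transactions * fields_per_transaction
    return total_fields * (1.0 - accuracy)


errors = expected_field_errors(transactions=100, fields_per_transaction=4, accuracy=0.996)
print(round(errors, 2))  # 1.6
```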
Dynamic Format Adaptation — How It Works
Template tools map specific PDF regions to specific fields. When a bank shifts columns, adds a new field, or changes the page header structure, the template breaks. Zera AI takes a different approach:
Header Detection
The model identifies column headers by semantic understanding — recognizing "Date", "Description", "Withdrawals", "Deposits", "Balance" and their equivalents across languages and abbreviations.
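To make the input and output of this step concrete, the sketch below maps header variants onto canonical field names using a hand-written synonym table. This is a deliberately simple stand-in, not the model: Zera AI is described as recognizing headers semantically, whereas `HEADER_SYNONYMS`, `canonical_header`, and the listed variants are all assumptions made for illustration.

```python
from typing import List, Optional

# Hypothetical synonym table; a semantic model would not rely on a fixed list like this.
HEADER_SYNONYMS = {
    "date": {"date", "posting date", "trans date", "fecha"},
    "description": {"description", "details", "particulars", "desc"},
    "withdrawal": {"withdrawals", "debit", "debits", "paid out", "dr"},
    "deposit": {"deposits", "credit", "credits", "paid in", "cr"},
    "balance": {"balance", "running balance", "bal"},
}


def canonical_header(raw: str) -> Optional[str]:
    """Normalize case, periods, and spacing, then look the header up in the table."""
    key = " ".join(raw.lower().replace(".", "").split())
    for field, variants in HEADER_SYNONYMS.items():
        if key in variants:
            return field
    return None  # unknown header; left for a smarter recognizer


def canonical_headers(cells: List[str]) -> List[Optional[str]]:
    return [canonical_header(cell) for cell in cells]


print(canonical_headers(["Trans. Date", "Particulars", "Paid Out", "Paid In", "Bal."]))
# ['date', 'description', 'withdrawal', 'deposit', 'balance']
```

A lookup table like this fails on any variant it has not seen, which is the limitation that semantic detection is meant to avoid.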
Transaction Row Identification
Rows with date-amount-description patterns are identified as transactions even when interspersed with running totals, subtotals, or bank-added notes that aren't transactions.
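A rough way to picture this filtering is a pattern match that keeps only lines beginning with a date and ending with an amount. The sketch below uses a regular expression as a stand-in for the model's row recognition; the date formats, amount format, and the names `ROW_PATTERN` and `find_transactions` are assumptions, not details from Zera AI.

```python
import re

# Assumed shape: a leading date (03/14/2024 or 2024-03-14), free text, then a trailing amount.
ROW_PATTERN = re.compile(
    r"^\s*(?P<date>\d{1,2}/\d{1,2}/\d{2,4}|\d{4}-\d{2}-\d{2})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>-?\$?\d[\d,]*\.\d{2})\s*$"
)


def find_transactions(lines):
    """Return date/description/amount dicts for lines that look like transactions."""
    rows = []
    for line in lines:
        match = ROW_PATTERN.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


sample = [
    "03/14/2024  ATM WITHDRAWAL 0042  -200.00",
    "Subtotal for period  1,250.00",           # no leading date, so skipped
    "03/15/2024  ACH DEPOSIT ACME PAYROLL  2,500.00",
    "Note: overdraft protection is enabled",   # bank-added note, skipped
]

for row in find_transactions(sample):
    print(row["date"], "|", row["description"], "|", row["amount"])
```

Only the two dated lines are returned; the subtotal and the note are dropped because they lack a leading date.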
Format Change Resilience
When a bank updates its PDF format — changing column order, adding a new fee column, or modifying the header structure — the model identifies the new structure without configuration.
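One reason position-based templates break is that they read "column 3" rather than "the Amount column". The sketch below illustrates the alternative in its simplest form: keying each cell by its header so that reordered or added columns do not change the extracted values. It assumes the headers have already been recognized, and `rows_to_records` is a hypothetical helper, not part of Zera AI.

```python
def rows_to_records(header, rows):
    """Key each cell by its column header so column order does not matter."""
    names = [h.strip().lower() for h in header]
    return [dict(zip(names, row)) for row in rows]


# The same transaction before and after a hypothetical format change
# (columns reordered, a new Fee column added).
old_layout = rows_to_records(
    ["Date", "Description", "Amount"],
    [["2024-03-14", "ATM WITHDRAWAL", "-200.00"]],
)
new_layout = rows_to_records(
    ["Date", "Amount", "Fee", "Description"],
    [["2024-03-14", "-200.00", "0.00", "ATM WITHDRAWAL"]],
)

for field in ("date", "description", "amount"):
    print(field, old_layout[0][field] == new_layout[0][field])
```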
Arithmetic Validation
Extracted data is validated against arithmetic relationships: opening balance + credits - debits = closing balance. Discrepancies indicate potential extraction errors and are flagged.
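A balance check of this kind can be sketched in a few lines. The example assumes the usual deposit-account convention, where credits (deposits) increase the balance and debits (withdrawals) decrease it, and it uses `Decimal` to avoid floating-point rounding on currency. The function name and sample figures are illustrative.

```python
from decimal import Decimal


def balance_discrepancy(opening, closing, credits, debits):
    """Return closing minus the expected closing balance; zero means the statement reconciles."""
    expected = opening + sum(credits, Decimal("0")) - sum(debits, Decimal("0"))
    return closing - expected


opening = Decimal("1000.00")
closing = Decimal("3254.90")
credits = [Decimal("2500.00")]
debits = [Decimal("200.00"), Decimal("45.10")]

print(balance_discrepancy(opening, closing, credits, debits))  # 0.00

# If one debit is misread as 45.70, the check no longer balances and the statement is flagged.
misread = [Decimal("200.00"), Decimal("45.70")]
print(balance_discrepancy(opening, closing, credits, misread))  # 0.60
```

A nonzero result shows that something was misread, but not which field; locating the faulty row needs further checks, such as comparing per-row running balances where the statement prints them.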
AI Categorization — Transaction Classification
Beyond extraction, Zera AI classifies every transaction against your QuickBooks or Xero chart of accounts. This is a separate model layer trained specifically on accounting categorization patterns.
The categorization model uses transaction description, amount, and counterparty (payee) as inputs. It outputs a suggested category and a confidence score. High-confidence suggestions are auto-accepted; low-confidence items are flagged for review. The model learns from corrections over time. Full technical details are on the AI categorization reference page.
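The accept-or-review split can be pictured as a simple threshold on the confidence score. The sketch below is an assumption-laden illustration: the 0.90 cutoff, the `Suggestion` type, and the sample categories are invented for the example and are not documented Zera AI values.

```python
from dataclasses import dataclass

AUTO_ACCEPT_THRESHOLD = 0.90  # assumed cutoff for illustration, not a documented value


@dataclass
class Suggestion:
    description: str
    category: str
    confidence: float


def route(suggestions, threshold=AUTO_ACCEPT_THRESHOLD):
    """Split suggestions into auto-accepted items and items flagged for review."""
    accepted = [s for s in suggestions if s.confidence >= threshold]
    review = [s for s in suggestions if s.confidence < threshold]
    return accepted, review


suggestions = [
    Suggestion("ACME PAYROLL", "Payroll Expenses", 0.97),
    Suggestion("POS 4471 MISC", "Office Supplies", 0.52),
]
accepted, review = route(suggestions)
print([s.description for s in accepted])  # ['ACME PAYROLL']
print([s.description for s in review])    # ['POS 4471 MISC']
```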
Frequently Asked Questions
What data was Zera AI trained on?
3.2 million financial documents (2.8 million bank statements from hundreds of institutions and 420,000 invoices), plus 847 million transactions. The training data spans US, Canadian, UK, and Australian financial institutions.
Why doesn't Zera AI require template setup?
It identifies document structure dynamically using semantic header detection and transaction row recognition — not fixed field maps. It adapts to format changes automatically without configuration.
What does 99.6% accuracy mean?
Field-level accuracy on digital PDFs — each extracted value measured against the source. Amount fields achieve 99.9% due to arithmetic validation. See the accuracy methodology table above for breakdown by field type.
How does Zera AI compare to tools like Docsumo or Klippa?
Template-based tools like Docsumo and Klippa require template training for each new format. Zera AI requires zero template setup. See the full comparison at zerabooks.com/alternatives.