Client works with 6 vendors. Each sends invoices in a completely different format.
First thought: "I'll need 6 separate workflows."
Then realized: Why?
Built one universal workflow. Handles all 6 formats. Plus works for new vendors without code changes.
THE SECRET:
Schema-based extraction doesn't care about format. Extracts semantically, not positionally.
Vendor A: Invoice total bottom right
Vendor B: Invoice total top left
Vendor C: Invoice total in table middle
Vendor D: Labeled "Amount Due"
Vendor E: Labeled "Balance"
Vendor F: Different language
Same schema finds "total amount" regardless of position, label, or language.
THE SCHEMA:
{
  "vendor_name": "string",
  "invoice_number": "string",
  "invoice_date": "date",
  "total_amount": "number",
  "line_items": [{
    "description": "string",
    "quantity": "number",
    "amount": "number"
  }]
}
This one schema extracts correctly from all 6 vendor formats, because the extraction step matches fields by meaning rather than by layout.
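A minimal sketch in Python of what a validation pass over one extracted record could look like. This is an illustration, not the workflow's actual Validate step: the function name `validate_invoice`, the ISO date format, and the non-negative total rule are all assumptions.

```python
from datetime import date

# Expected Python types for each top-level schema field.
# Assumption: the extractor returns dates as ISO 8601 strings ("2024-03-15").
REQUIRED_FIELDS = {
    "vendor_name": str,
    "invoice_number": str,
    "invoice_date": str,
    "total_amount": (int, float),
    "line_items": list,
}

LINE_ITEM_FIELDS = {
    "description": str,
    "quantity": (int, float),
    "amount": (int, float),
}


def _check_fields(record: dict, spec: dict, prefix: str = "") -> list[str]:
    """Check presence and type of every field in spec."""
    problems = []
    for field, expected in spec.items():
        value = record.get(field)
        if value is None:
            problems.append(f"missing field: {prefix}{field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {prefix}{field}")
    return problems


def validate_invoice(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = _check_fields(record, REQUIRED_FIELDS)
    if problems:
        return problems

    try:
        date.fromisoformat(record["invoice_date"])
    except ValueError:
        problems.append("invoice_date is not an ISO date")

    # Assumed business rule: an invoice total is never negative.
    if record["total_amount"] < 0:
        problems.append("total_amount is negative")

    for i, item in enumerate(record["line_items"]):
        if not isinstance(item, dict):
            problems.append(f"line_items[{i}] is not an object")
            continue
        problems.extend(_check_fields(item, LINE_ITEM_FIELDS, f"line_items[{i}]."))
    return problems
```

The same check runs for every vendor, since it only looks at the extracted fields and never at the original layout.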
THE WORKFLOW:
Gmail → Parse Document → Extract with schema → Validate → Switch on confidence → QuickBooks → Slack
One workflow. Six vendors. Zero maintenance when they change templates.
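The "Switch on confidence" step can be sketched roughly like this in Python. Everything specific here is an assumption for illustration: that the extraction step reports a confidence score between 0 and 1, the 0.9 threshold, and the route names `quickbooks` and `manual_review`.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionResult:
    """Hypothetical container for one extracted invoice."""
    record: dict
    confidence: float  # assumed to be in the range 0.0 to 1.0
    problems: list[str] = field(default_factory=list)  # validation failures, if any


def route_invoice(result: ExtractionResult, threshold: float = 0.9) -> str:
    """Decide where an extracted invoice goes next.

    High-confidence, problem-free records go straight to accounting;
    everything else is held for a human to check.
    """
    if result.problems or result.confidence < threshold:
        return "manual_review"
    return "quickbooks"
```

Usage: `route_invoice(ExtractionResult(record={}, confidence=0.95))` returns `"quickbooks"`, while the same record at confidence 0.5 returns `"manual_review"`. The threshold is the one knob worth tuning per client.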
THE RESULTS:
Month 1: 6 vendors, ~180 invoices
Month 3: Added 2 new vendors, workflow handled them automatically, ~280 invoices
Month 6: Now 11 vendors, workflow unchanged, ~450 invoices
Zero code changes for 5 new vendors.
TRADITIONAL APPROACH:
Would need 11 separate parsers to maintain. One template change from any vendor = emergency fix needed.
SCHEMA APPROACH:
One universal extraction. Vendor changes template? Still extracts correctly. Add new vendor? Already works.
CLIENT REACTION:
"Wait, when we add new vendors, we don't need to update anything?"
Exactly. That's semantic extraction.
THE LESSON:
Stop building format-specific parsers. Build semantic extractors. One workflow serves all formats. Scales to new vendors without code changes.
Modern document processing understands WHAT you want extracted, not WHERE it appears.
How many vendor-specific parsers are you maintaining right now?