We built Docs4U AI to eliminate the hours Indian exporters, importers, manufacturers, and CA firms spend manually typing document data — one Shipping Bill, one BOE, one challan at a time.
Every Indian exporter knows the routine: ICEGATE generates a Shipping Bill PDF, and someone in the team spends 45 minutes manually entering IEC codes, port codes, HS codes, and FOB values into Excel — row by row, document by document.
Textile manufacturers do the same with Grey Despatch Challans. CA firms do it with tax invoices. CHA firms do it with Bills of Entry. The same tedious, error-prone process, happening thousands of times a day across India's trade and compliance ecosystem.
Docs4U AI was built to change that. Purpose-trained for India's trade document formats — ICEGATE, GST, DGFT — it extracts every field accurately in under 15 seconds.
Extraction for every supported document type is purpose-built for the specific format and field requirements of India's trade and tax ecosystem.
The principles behind every feature we build.
A wrong IEC or GSTIN in an export register causes real compliance problems. We prioritise extraction accuracy above everything — including processing time.
We don't build generic document tools. Every extraction model is trained on the specific format, field names, and compliance requirements of Indian trade and GST documents.
Your Shipping Bills and invoices contain confidential business information. All files are stored in your private account, encrypted in transit, and never shared or used to train models.
Industry-leading AI models combined with India-specific domain knowledge and validation logic.
Our primary extraction engine reads document structure contextually, understanding that "FOB VALUE" on one Shipping Bill and "F.O.B. AMOUNT" on another are the same field. It handles multi-page, multi-item documents natively, without rigid templates.
Streaming Extraction

For scanned PDFs (mobile camera photos of printed documents, low-resolution fax scans, old ICEGATE printouts), our vision pipeline reads the image directly without traditional OCR pre-processing, handling complex table layouts accurately.
PDF + Image Support

GSTIN (15-char), IEC (10-char), CIN (21-char), PAN (10-char): extracted values are validated against format rules and corrected for common OCR errors (O→0, I→1). The GSTIN state code is cross-checked against the company address to catch transposition errors.
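The validation step described above can be sketched roughly as follows. This is a minimal illustration, not the product's actual logic: the function names, the position template, and the small state-code table are all assumptions made for the example, and it checks only the GSTIN and PAN layout (no checksum, no IEC or CIN). The sample GSTIN is a commonly used placeholder value.

```python
import re

# Format rules: GSTIN = 2-digit state code + 10-char PAN + entity char + "Z" + check char.
GSTIN_RE = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")
PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")

# Position templates (assumed helper): D = digit expected, A = letter, * = either.
GSTIN_TEMPLATE = "DDAAAAADDDDA*A*"
PAN_TEMPLATE = "AAAAADDDDA"

# The two OCR confusions named in the text, applied only where a digit is expected.
OCR_DIGIT_FIXES = {"O": "0", "I": "1"}

# Illustrative subset of GST state codes; a real table would cover every state.
STATE_CODES = {"07": "Delhi", "24": "Gujarat", "27": "Maharashtra",
               "29": "Karnataka", "33": "Tamil Nadu"}


def fix_ocr_digits(value, template):
    """Replace O/I with 0/1 in positions where the template expects a digit."""
    value = value.strip().upper()
    if len(value) != len(template):
        return value  # wrong length: leave it for the format check to reject
    return "".join(
        OCR_DIGIT_FIXES.get(ch, ch) if slot == "D" else ch
        for ch, slot in zip(value, template)
    )


def clean_gstin(raw):
    """Return a corrected GSTIN, or None if it still fails the format rule."""
    candidate = fix_ocr_digits(raw, GSTIN_TEMPLATE)
    return candidate if GSTIN_RE.fullmatch(candidate) else None


def clean_pan(raw):
    """Return a corrected PAN, or None if it still fails the format rule."""
    candidate = fix_ocr_digits(raw, PAN_TEMPLATE)
    return candidate if PAN_RE.fullmatch(candidate) else None


def state_matches_address(gstin, address):
    """Check that the state named by the first two GSTIN digits appears in the address."""
    state = STATE_CODES.get(gstin[:2])
    return state is not None and state.lower() in address.lower()
```

For example, `clean_gstin("27AAPFUO939F1ZV")` repairs the letter O in the numeric part, and `state_matches_address` would flag that same GSTIN if the address on the document were in Gujarat rather than Maharashtra. Restricting the correction to digit positions avoids rewriting a legitimate letter O or I elsewhere in the identifier.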
GSTIN · IEC · CIN · PAN

Output isn't just dumped into cells. IEC, GSTIN, and Challan numbers are stored as text (no leading-zero loss), dates are normalised to DD-MMM-YYYY, columns are auto-fitted, and header rows are frozen, ready for direct import into your ERP or Tally.
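The row preparation behind this kind of output might look like the sketch below. It uses only the standard library and stops short of writing the workbook. The column names, the list of accepted input date layouts, and the title-case month abbreviation are assumptions for illustration, not the product's actual schema.

```python
from datetime import datetime

# Input layouts this sketch tries, in order (day-first, as is usual in India).
# This list is an assumption; real documents may need more layouts.
KNOWN_DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y", "%Y-%m-%d")

# Fixed month names so the output does not depend on the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Hypothetical column names whose values must stay strings.
TEXT_COLUMNS = {"iec", "gstin", "challan_no"}


def normalise_date(raw):
    """Convert a recognised date string to DD-MMM-YYYY; return it unchanged otherwise."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
        return f"{parsed.day:02d}-{MONTHS[parsed.month - 1]}-{parsed.year:04d}"
    return raw


def prepare_row(row):
    """Return a copy of an extracted row that is safe to write to a spreadsheet."""
    prepared = {}
    for column, value in row.items():
        if column in TEXT_COLUMNS:
            # Keep identifiers as strings; converting to int would drop leading zeros.
            prepared[column] = str(value).strip()
        elif column.endswith("_date"):
            prepared[column] = normalise_date(str(value))
        else:
            prepared[column] = value
    return prepared
```

When the prepared rows are written with openpyxl, setting a cell's `number_format` to `"@"` marks it as text and assigning `"A2"` to the worksheet's `freeze_panes` freezes the header row; both are standard openpyxl attributes, though how the product applies them is not stated here.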
openpyxl · Auto-Format

Upload 100 PDFs at once. Processing runs in a Celery task queue, so you're not waiting for each document. Come back to a single consolidated Excel download when the batch completes.
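The fan-out and consolidate pattern behind batch processing can be illustrated without a broker. The sketch below deliberately swaps in a standard-library thread pool as a stand-in for Celery workers so that it runs on its own. `extract_one` and `process_batch` are hypothetical names, and the real task definitions are not shown in this document.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_one(filename):
    """Hypothetical placeholder for the per-document extraction call."""
    return {"file": filename}


def process_batch(filenames, extract=extract_one, max_workers=4):
    """Extract every file concurrently and return (rows, failures).

    rows keeps the upload order; failures maps filename -> error message,
    so one unreadable PDF does not sink the whole batch.
    """
    results = {}
    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Fan out: one job per uploaded document.
        futures = {pool.submit(extract, name): name for name in filenames}
        # Fan in: collect results as each job finishes, in any order.
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                failures[name] = str(exc)
    # Consolidate in upload order, ready to be written to a single workbook.
    rows = [results[name] for name in filenames if name in results]
    return rows, failures
```

Collecting failures separately, rather than raising on the first error, matches the idea of returning one consolidated download for the documents that did process.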
Celery · Redis

Document-level extraction results are cached in Redis (1-hour TTL). Column re-selection reuses the cached data, with no further AI calls. JSON repair logic recovers truncated model responses, so a cut-off response does not become a failed extraction.
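A rough sketch of how caching and JSON repair fit together is shown below. It is an illustration under stated assumptions: `TTLCache` is an in-memory stand-in for Redis, `repair_truncated_json` is a simple standard-library approach rather than the repair library the product uses, and `call_model` is a hypothetical callable that returns the model's raw JSON text.

```python
import json
import time

CACHE_TTL_SECONDS = 3600  # the 1-hour TTL mentioned above


class TTLCache:
    """In-memory stand-in for a Redis key with an expiry; not the production cache."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return value


def closers_for(text):
    """Return the brackets needed to balance text, or None if it ends inside a string."""
    stack, in_string, escaped = [], False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    return None if in_string else "".join(reversed(stack))


def repair_truncated_json(text):
    """Parse text, trimming a cut-off tail and closing open brackets if needed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Walk back from the end until some prefix parses once its brackets are closed.
    for end in range(len(text), 0, -1):
        prefix = text[:end].rstrip().rstrip(",")
        closers = closers_for(prefix)
        if closers is None:
            continue
        try:
            return json.loads(prefix + closers)
        except json.JSONDecodeError:
            continue
    raise ValueError("response could not be repaired")


def extract_fields(doc_id, call_model, cache):
    """Return all fields for a document, calling the model only on a cache miss."""
    cached = cache.get(doc_id)
    if cached is not None:
        return cached
    fields = repair_truncated_json(call_model(doc_id))
    cache.set(doc_id, fields)
    return fields


def select_columns(doc_id, columns, call_model, cache):
    """Column re-selection: filter the cached fields instead of re-running extraction."""
    fields = extract_fields(doc_id, call_model, cache)
    return {name: fields.get(name) for name in columns}
```

One caveat with this style of repair: a number cut off mid-value (for example `1200.5` truncated to `12`) still parses, so the last field of a repaired response is worth flagging for review rather than trusting blindly.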
Redis · json-repair

Process your first 3 documents free: no credit card, no setup. See the accuracy for yourself.