OpenDataLoader PDF: Dual-Mode Processing Pipeline

How local deterministic extraction and AI hybrid mode work together

Pipeline diagram (labels in the order shown):

Stage labels: Triage, PDF Input, Merge

- PDF File (any type)
- PDF Parser (Java, deterministic)
- Layout Analysis (XY-Cut++)
- Safety Filters
- Page Triage (simple vs complex)
- OCR (80+ languages)
- Table AI (0.93 TEDS)
- Enrichment (formulas, charts)
- Merge Results
- Output: JSON (bbox), Markdown, HTML, PDF (annotated)
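
The routing implied by the Page Triage and Merge Results components can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OpenDataLoader PDF API: the names `Page`, `triage`, `local_extract`, `hybrid_extract`, and `process` are hypothetical, the rule that marks a page as complex (scanned, contains a table, or contains a formula) is a guess based on the OCR, Table AI, and Enrichment labels, and the bounding box is a placeholder value.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page record; field names are assumptions for this sketch."""
    number: int
    text: str                 # text recovered by the deterministic parser
    is_scanned: bool = False  # would need OCR
    has_table: bool = False   # would need table recognition
    has_formula: bool = False # would need enrichment


def triage(page: Page) -> str:
    """Label a page 'simple' or 'complex' (assumed rule, not the real one)."""
    if page.is_scanned or page.has_table or page.has_formula:
        return "complex"
    return "simple"


def local_extract(page: Page) -> dict:
    # Stand-in for the deterministic parser and layout analysis output.
    # The bbox is a placeholder (US Letter page in points), not a real result.
    return {"page": page.number, "source": "local",
            "text": page.text, "bbox": [0, 0, 612, 792]}


def hybrid_extract(page: Page) -> dict:
    # Stand-in for the OCR / table / enrichment stages; no model is called.
    result = local_extract(page)
    result["source"] = "hybrid"
    return result


def process(pages: list[Page], hybrid: bool = False) -> list[dict]:
    """Route each page, then merge the results back into page order."""
    results = []
    for page in pages:
        if hybrid and triage(page) == "complex":
            results.append(hybrid_extract(page))
        else:
            results.append(local_extract(page))
    return sorted(results, key=lambda r: r["page"])


pages = [Page(2, "", is_scanned=True), Page(1, "Introduction")]
merged = process(pages, hybrid=True)
```

In this sketch, local-only mode sends every page through `local_extract`, while hybrid mode diverts only the pages that `triage` labels complex, so simple pages never leave the deterministic path. Sorting by page number in `process` plays the role of the merge step.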