OpenDataLoader PDF: Dual-Mode Processing Pipeline
How deterministic local extraction and the AI-assisted hybrid mode work together
Processing modes:
Local Mode: 0.05 s/page
Hybrid Mode: 0.90 accuracy
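The diagram shows a Page Triage step that separates simple pages from complex ones, and a Merge step that combines results before output. The sketch below treats hybrid mode as a per-page routing decision followed by a merge by page number. Everything in it is hypothetical: the `Page` fields, the `is_complex` rule, and the `local`/`backend` callables stand in for the Java engine and the AI backend. It illustrates the idea only and is not OpenDataLoader PDF's API or its actual triage heuristic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Page:
    number: int       # 1-based page number
    text_chars: int   # characters in the PDF text layer (0 suggests a scanned page)
    table_count: int  # tables detected by local layout analysis (hypothetical signal)


def is_complex(page: Page) -> bool:
    # Hypothetical triage rule: pages without a text layer need OCR,
    # and pages containing tables benefit from table AI.
    return page.text_chars == 0 or page.table_count > 0


def process(pages: List[Page], hybrid: bool,
            local: Callable[[Page], str],
            backend: Callable[[Page], str]) -> List[str]:
    # Local mode never routes to the backend; hybrid mode sends only complex pages.
    to_backend = [p for p in pages if hybrid and is_complex(p)]
    to_local = [p for p in pages if not (hybrid and is_complex(p))]
    results: Dict[int, str] = {p.number: local(p) for p in to_local}
    # The backend batch is handled separately, as a remote service might be.
    results.update({p.number: backend(p) for p in to_backend})
    # Merge: reassemble per-page results in original page order.
    return [results[n] for n in sorted(results)]
```

With pages `[Page(1, 1200, 0), Page(2, 0, 0), Page(3, 800, 2)]`, hybrid mode sends pages 2 and 3 to the backend while local mode keeps all three on the local path. Merging by page number keeps the output order stable no matter which path finishes first.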
Pipeline stages:
PDF Input: PDF File (any type)
Local Java Engine: PDF Parser (Java, deterministic), Layout Analysis (XY-Cut++), Safety Filters
Triage: Page Triage (simple vs. complex pages)
Hybrid AI Backend: OCR (80+ languages), Table AI (0.93 TEDS), Enrichment (formulas, charts)
Merge: Merge Results
Output: JSON (with bounding boxes), Markdown, HTML, PDF (annotated)