Olmocr
Olmocr is an open-source toolkit that linearizes PDFs and other image-based documents into clean, readable plain text or Markdown. It is designed for building LLM training datasets, and it handles complex formatting, equations, and tables.