Pragmatic_segmenter
Pragmatic Segmenter is a rule-based sentence boundary detection gem that works out-of-the-box across many languages. It does not use machine learning and requires no training data.
About
Pragmatic Segmenter is a rule-based sentence boundary detection gem that segments text into sentences across multiple languages without machine learning or training data. It aims to be a "real-world" segmenter that performs well even when the format and domain of the input text are unknown. Language support is a priority, going beyond English-centric solutions, and the gem includes text cleaning and preprocessing capabilities. It is opinionated: it was developed specifically for segmenting texts to create translation memories, so it treats ambiguous sentence boundaries conservatively to keep segments coherent. The project also publishes the "Golden Rules," a set of tests for evaluating segmenter accuracy on edge cases, which are available for download.
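To make "rule-based" and "conservative" concrete, here is a minimal toy sketch in Ruby. It is not the gem's implementation or its rule set: the `ABBREVIATIONS` list and the `naive_segment` helper are hypothetical names invented for this illustration. The sketch splits only after terminal punctuation, skips a short list of known abbreviations, and declines to split when the next word does not start with a capital letter.

```ruby
# Toy illustration only: NOT the rules used by Pragmatic Segmenter.
# ABBREVIATIONS and naive_segment are hypothetical names for this sketch.
ABBREVIATIONS = %w[dr mr mrs ms prof etc vs].freeze

def naive_segment(text)
  sentences = []
  current = String.new
  tokens = text.split # split on runs of whitespace

  tokens.each_with_index do |token, i|
    current << " " unless current.empty?
    current << token

    # Rule 1: a boundary candidate must end in ., ! or ? (optionally followed by a closing quote/bracket).
    next unless token.match?(/[.!?]["')\]]*\z/)

    # Rule 2: a period after a known abbreviation is not a boundary.
    word = token.downcase.gsub(/[^a-z]/, "")
    next if token.end_with?(".") && ABBREVIATIONS.include?(word)

    # Rule 3 (conservative): when in doubt, do not split. Require the next
    # token to start with an uppercase letter, or to be absent.
    following = tokens[i + 1]
    next if following && following !~ /\A["'(\[]?[A-Z]/

    sentences << current
    current = String.new
  end

  sentences << current unless current.empty?
  sentences
end

puts naive_segment("Dr. Smith arrived at 5 p.m. today. He was late!")
```

The example input is kept as two segments: the periods in "Dr." and "p.m." are not treated as boundaries. For the gem itself, the project README documents usage along the lines of `PragmaticSegmenter::Segmenter.new(text: text, language: 'en').segment`; check the README for the current interface.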
Pricing & Plans
Open Source
Free