RedPajama-Data
RedPajama-Data is an open-source Data & Analytics tool that provides code for preparing large datasets for training large language models. It includes tools for data processing, quality signal computation, and deduplication.