togethercomputer/RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Stars: 4,924
Language: Python