togethercomputer/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Stars: 4,924
Language: Python