huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Stars: 2,912
Language: Python