Ready-made tokenizer library for working with GPT and tiktoken