Apache Nutch is an extensible and scalable web crawler.