Extract meaningful text content from the HTML of a web page
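
One possible way to approach this task is sketched below, using only Python's standard-library `html.parser`. It is a minimal illustration rather than a definitive method: the names `TextExtractor`, `extract_text`, and `SKIP_TAGS` are hypothetical, and treating `script`, `style`, `noscript`, and `template` as non-content is an assumption about what "meaningful" means here.

```python
from html.parser import HTMLParser

# Assumption for this sketch: text inside these tags is not meaningful content.
SKIP_TAGS = {"script", "style", "noscript", "template"}


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # cleaned text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace and drop empty fragments.
            text = " ".join(data.split())
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    """Return the text of `html`, one line per text fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

Calling `extract_text(page_source)` on a page's HTML string returns its text with one fragment per line. Real pages usually need further filtering (navigation, footers, cookie banners), which this sketch does not attempt.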