apache/tika

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

Stars: 3,590
Language: Java