apache/tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Stars: 3,590
Language: Java