Apache Tika
Certified

This sub-group of plugins contains tasks that use Apache Tika. The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types, such as PPT, XLS, and PDF.
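To illustrate what "detects" means here, the following is a minimal sketch of signature-based ("magic bytes") type detection. It is a simplified stand-in written for this page, not Tika's API or implementation. The function name `detect_by_magic`, the `SIGNATURES` table and the `x-ole2-container` label are hypothetical. Real detection in Tika also uses file-name hints and, for container formats, looks inside the container. The sketch shows why that extra step is needed: legacy PPT and XLS files both start with the same OLE2 compound-document header, so their leading bytes alone cannot tell them apart.

```python
# Hypothetical, tiny subset of leading-byte signatures (not Tika's real table).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    # OLE2 compound document header, shared by legacy .ppt, .xls and .doc files.
    # "x-ole2-container" is a hypothetical label chosen for this sketch.
    (bytes.fromhex("D0CF11E0A1B11AE1"), "application/x-ole2-container"),
    # ZIP local file header; OOXML files (.pptx, .xlsx) are ZIP containers.
    (b"PK\x03\x04", "application/zip"),
]


def detect_by_magic(data: bytes) -> str:
    """Guess a MIME type from the first bytes of `data` (illustrative only)."""
    for signature, mime in SIGNATURES:
        if data.startswith(signature):
            return mime
    # No known signature matched: fall back to a generic binary type.
    return "application/octet-stream"


if __name__ == "__main__":
    print(detect_by_magic(b"%PDF-1.7\n"))  # application/pdf
    print(detect_by_magic(b"hello"))       # application/octet-stream
```

Because a .ppt and an .xls map to the same result here, a real detector has to open the container and inspect its internal streams before it can name the specific format.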