Hello,

I'm trying to index email files with Solr (4.7.2). The files have the extension .eml (message/rfc822).

The mail body is indexed correctly, but attachments are only indexed if they are .txt files. If the attachments are .pdf or .docx files, they are not indexed.

I checked the extracted text by calling:

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&extractFormat=text" -F "myfile=@Test1.eml"

The returned text does not contain the content of the attachments unless they are .txt files.

It is not a problem with the Apache Tika library being unable to process attachments, because running the standalone Tika app on my .eml files:

    java -jar tika-app-1.4.jar -t Test1.eml

correctly displays the attachments' text.

Could it be a problem with how Solr calls Tika? Is there something to modify in the default configuration?

Thanks for any help ;)

Olivier