Hi Yavar,

I would stick with Erik's post: http://lucidworks.com/blog/indexing-with-solrj/
Ahmet

On Wednesday, March 4, 2015 12:05 PM, Yavar Husain <yavarhus...@gmail.com> wrote:

What is the best pattern to index the following kind of data:

HarryPotter.PDF
HarryPotter.txt
Avengers.Docx
Avengers.txt

For each of the above files, the metadata lies in the text file that has the same name as the rich document (as can be seen above).

(1) The brute-force method that I can think of is to extract the text from the rich document and the metadata from the associated txt file, combine them into an XML document, and send it to Solr for indexing.

(2) Another option I can think of is to use SolrJ to programmatically read the PDF and the txt file and send them to Solr. If so, is it possible to send the PDF directly to Solr without first extracting its text in my SolrJ program?

Is there something better that I can do quickly? I know that if I had only rich documents, I would have used the Tika-Solr integration/request handlers to do the job.

Any help would be appreciated.

Thanks,
Yavar
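For option (2) in the question, the one step that neither Tika nor Solr's extracting handler does for you is pairing each rich document with its same-named .txt metadata file. Below is a minimal plain-Java sketch of that step (no SolrJ needed for it), under stated assumptions: any file not ending in .txt is treated as a rich document, a document and its metadata file sit in the same directory, and each metadata file holds one "key: value" pair per line (an assumed format; the question does not say). The class RichDocPairing and all of its method names are hypothetical. The "literal." prefix is the form Solr's /update/extract handler uses for extra field values.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: pairs rich documents with same-named .txt metadata files. */
public class RichDocPairing {

    /** File name without its last extension, e.g. "HarryPotter.PDF" -> "HarryPotter". */
    public static String baseName(Path p) {
        String name = p.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /** Lower-cased last extension, e.g. "Avengers.Docx" -> "docx"; "" if there is none. */
    public static String extension(Path p) {
        String name = p.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    /**
     * Maps each rich document (assumed: any non-.txt file) to the .txt file with the
     * same base name in the same directory. Documents without a metadata file are skipped.
     */
    public static Map<Path, Path> pairFiles(List<Path> files) {
        Map<Path, Path> txtByStem = new HashMap<>();
        for (Path f : files) {
            if (extension(f).equals("txt")) {
                // Key is directory + base name, so HarryPotter.txt is stored under "HarryPotter".
                txtByStem.put(f.resolveSibling(baseName(f)), f);
            }
        }
        Map<Path, Path> pairs = new LinkedHashMap<>();
        for (Path f : files) {
            if (!extension(f).equals("txt")) {
                Path meta = txtByStem.get(f.resolveSibling(baseName(f)));
                if (meta != null) {
                    pairs.put(f, meta);
                }
            }
        }
        return pairs;
    }

    /** Parses assumed "key: value" lines; blank lines and lines without a colon are ignored. */
    public static Map<String, String> parseMetadata(List<String> lines) {
        Map<String, String> meta = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue;
            }
            String key = line.substring(0, colon).trim();
            if (!key.isEmpty()) {
                meta.put(key, line.substring(colon + 1).trim());
            }
        }
        return meta;
    }

    /** Reads a metadata file, assuming UTF-8 encoding. */
    public static Map<String, String> readMetadata(Path txt) throws IOException {
        return parseMetadata(Files.readAllLines(txt, StandardCharsets.UTF_8));
    }

    /** Prefixes each key with "literal.", as /update/extract expects for extra field values. */
    public static Map<String, String> literalParams(Map<String, String> metadata) {
        Map<String, String> params = new LinkedHashMap<>();
        metadata.forEach((k, v) -> params.put("literal." + k, v));
        return params;
    }

    public static void main(String[] args) {
        List<Path> files = List.of(Path.of("HarryPotter.PDF"), Path.of("HarryPotter.txt"),
                Path.of("Avengers.Docx"), Path.of("Avengers.txt"));
        // Prints "HarryPotter.PDF -> HarryPotter.txt" then "Avengers.Docx -> Avengers.txt".
        pairFiles(files).forEach((doc, meta) -> System.out.println(doc + " -> " + meta));
    }
}
```

Each pair could then be sent to Solr without extracting text in the client: SolrJ's ContentStreamUpdateRequest pointed at /update/extract takes the rich file via addFile(File, String contentType), and each entry from literalParams can be added with setParam, so Tika runs inside Solr. This assumes the metadata keys match field names in your schema and that a unique key (for example literal.id) is supplied for every document.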