Re: Pattern for extracting text from a rich document and an associated metadata file

2015-03-04 Thread Ahmet Arslan
Hi Yavar,

I would stick with Erik's post: http://lucidworks.com/blog/indexing-with-solrj/

Ahmet

On Wednesday, March 4, 2015 12:05 PM, Yavar Husain wrote:
> What is the best pattern to index the following kind of data: HarryPotter.PDF HarryPotter.txt Avengers.Docx Avengers.txt For each of

Pattern for extracting text from a rich document and an associated metadata file

2015-03-04 Thread Yavar Husain
What is the best pattern to index the following kind of data:

HarryPotter.PDF
HarryPotter.txt
Avengers.Docx
Avengers.txt

For each of the above files, the metadata lies in the text file that has the same name as the rich document (as can be seen above).

(1) Now the brute force method that I can think o
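
The naming convention in the question (a rich document and a .txt file sharing one base name) could be sketched roughly as below. This is only an illustration of the pairing step, not the approach from Erik's post: the function name pair_documents and the set of "rich" extensions are assumptions made for the example, and the actual text extraction and indexing are left out.

```python
from pathlib import Path

# Hypothetical set of extensions treated as rich documents (an assumption
# for this sketch; the thread only mentions PDF and Docx).
RICH_EXTENSIONS = {".pdf", ".docx"}


def pair_documents(filenames):
    """Return (rich_document, metadata_file) pairs that share a base name.

    Matching is case-insensitive on both the base name and the extension.
    Rich documents without a matching .txt file are skipped.
    """
    rich = {}
    meta = {}
    for name in filenames:
        path = Path(name)
        key = path.stem.lower()
        ext = path.suffix.lower()
        if ext in RICH_EXTENSIONS:
            rich[key] = name
        elif ext == ".txt":
            meta[key] = name
    # Keep only the documents that have a metadata companion.
    return [(rich[key], meta[key]) for key in sorted(rich) if key in meta]


pairs = pair_documents(
    ["HarryPotter.PDF", "HarryPotter.txt", "Avengers.Docx", "Avengers.txt"]
)
for document, metadata in pairs:
    print(document, "<->", metadata)
```

Each pair could then be turned into a single index document, with the fields read from the .txt file and the body text extracted from the rich document.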