Re: tika integration exception and other related queries

2011-06-09 Thread Naveen Gupta
Hi Gary, We are doing something similar, but we are not creating an XML doc; instead we leave Tika to extract the content and rely on dynamic fields. We are not storing the text either, though I am not sure whether that will remain the case in future. What about attachments from Microsoft 7 and later? Is

Re: tika integration exception and other related queries

2011-06-09 Thread Gary Taylor
Naveen, Not sure our requirement matches yours, but one of the things we index is a "comment" item that can have one or more files attached to it. To index the whole thing as a single Solr document we create a zipfile containing a file with the comment details in it and any additional attach
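
The bundling step Gary describes (one zip holding a comment-details file plus the attachments) could look roughly like the sketch below. It is only an illustration under assumptions: the class name `CommentBundle`, the entry name `comment.txt`, and the `attachments/` prefix are hypothetical and do not come from the thread, and the step of posting the archive to Solr is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not taken from the thread.
class CommentBundle {

    // Packs the comment text and every attachment into one in-memory zip archive.
    static byte[] build(String commentDetails, Map<String, byte[]> attachments) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            // Assumed entry name for the comment details.
            zip.putNextEntry(new ZipEntry("comment.txt"));
            zip.write(commentDetails.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();

            // One entry per attachment, keyed by its file name.
            for (Map.Entry<String, byte[]> attachment : attachments.entrySet()) {
                zip.putNextEntry(new ZipEntry("attachments/" + attachment.getKey()));
                zip.write(attachment.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the archive is complete.
        return buffer.toByteArray();
    }

    // Lists entry names in archive order, useful for checking the bundle.
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

The resulting byte array would then be sent to Solr as a single file, so the comment and its attachments end up in one Solr document.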

Re: tika integration exception and other related queries

2011-06-08 Thread Naveen Gupta
Hi Gary, It started working. I did not test Zip files, but it works fine for RAR files. The only thing I wanted to do is index the metadata (text mapped to content) without storing the data. Also, in the search results, I want to filter things ... and it started working fine .

Re: tika integration exception and other related queries

2011-06-08 Thread Gary Taylor
Naveen, For indexing Zip files with Tika, take a look at the following thread: http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html I got it to work with the 3.1 source and a couple of patches. Hope this helps. Regards, Gary. On 08/

tika integration exception and other related queries

2011-06-07 Thread Naveen Gupta
Hi, Can somebody answer this ... 3. Can somebody give me an idea of how to index a zip file? 1. While sending docx, we are getting the following error: java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z" at java.lang.NumberFormatException.forInputString(
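
The input string in that error is an ISO 8601 timestamp, which suggests that a date value is reaching something that expects a number. The stack trace is cut off, so which parser or field is involved is not known. The sketch below only reproduces the general failure mode; `Long.parseLong` is an assumed stand-in for whatever numeric parse actually ran, and the class name `DateValueCheck` is hypothetical.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical demo class, not taken from the thread.
class DateValueCheck {

    // True if the value can be read as a long; a timestamp cannot.
    static boolean parsesAsLong(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            // For a timestamp the message reads: For input string: "<value>"
            return false;
        }
    }

    // True if the value is a valid ISO 8601 instant such as 2011-01-27T07:18:00Z.
    static boolean parsesAsInstant(String value) {
        try {
            Instant.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "2011-01-27T07:18:00Z";
        System.out.println("as long:    " + parsesAsLong(value));
        System.out.println("as instant: " + parsesAsInstant(value));
    }
}
```

If this is what is happening, a likely place to look is the type of the field that receives the document's date metadata, since a numeric field type would reject a value in this format.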