Re: tika integration exception and other related queries

Gary Taylor Thu, 09 Jun 2011 06:14:05 -0700

Naveen,

Not sure our requirement matches yours, but one of the things we index is a "comment" item that can have one or more files attached to it. To index the whole thing as a single Solr document we create a zipfile containing a file with the comment details in it and any additional attached files. This is submitted to Solr as a TEXT field in an XML doc, along with other meta-data fields from the comment. In our schema the TEXT field is indexed but not stored, so when we search and get a match back it doesn't contain all of the contents from the attached files etc., only the stored fields in our schema. Admittedly, the user can therefore get back a "comment" match with no indication as to WHERE the match occurred (i.e. was it in the meta-data or the contents of the attached files), but at the moment we're only interested in getting appropriate matches, not explaining where the match is.
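
A minimal sketch of the packaging step described above, in Python purely for illustration (the thread does not say what language or tooling was actually used). The function name build_comment_zip, the entry names comment.txt and attachments/, and the sample data are all hypothetical. It only shows bundling the comment details and any attached files into a single zipfile; how that zipfile is then submitted to Solr is not covered here.

```python
import io
import zipfile


def build_comment_zip(comment_text, attachments):
    """Bundle a comment and its attached files into one in-memory zipfile.

    comment_text: the comment details as a string.
    attachments:  dict mapping a file name to that file's raw bytes.
    Returns the zipfile contents as bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical entry name for the file holding the comment details.
        archive.writestr("comment.txt", comment_text)
        # Each attached file goes in alongside it, under a hypothetical prefix.
        for name, data in attachments.items():
            archive.writestr("attachments/" + name, data)
    return buffer.getvalue()


if __name__ == "__main__":
    payload = build_comment_zip(
        "Please review the attached spec.",
        {"spec.txt": b"Draft specification text"},
    )
    # List what ended up in the archive.
    print(zipfile.ZipFile(io.BytesIO(payload)).namelist())
```

Because the whole bundle becomes one indexed document, a match anywhere in it (comment details or any attachment) surfaces the same "comment", which is the trade-off mentioned above.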


Hope that helps.

Kind regards,
Gary.



On 09/06/2011 03:00, Naveen Gupta wrote:

Hi Gary

It started working. I have not tested Zip files yet, but it works fine for
rar files.

The only thing I wanted to do was index the metadata (text mapped to
content) but not store the data. I also wanted to filter what comes back in
the search results, and that is working fine now. I don't want to show the
extracted content to the end user, since the way the information is
extracted is not very helpful to them. We could apply a few analyzers and
filters to remove the unnecessary tags, but the information still would not
be of much help. I'd like your opinion: what did you do to filter out the
content, or are you showing the extracted content to the end user?

Even if we do show the text to the end user, how can I limit the number of
characters returned in the search results? Is there a feature that does
this, something along the lines of a snippet?

Thanks
Naveen

On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor <g...@inovem.com> wrote:

Naveen,

For indexing Zip files with Tika, take a look at the following thread:


http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html

I got it to work with the 3.1 source and a couple of patches.

Hope this helps.

Regards,
Gary.



On 08/06/2011 04:12, Naveen Gupta wrote:

Hi, can somebody answer this?

3. can somebody tell me an idea how to do indexing for a zip file ?

1. while sending docx, we are getting following error.
