RE: Indexing a (File attached to a document)

Allison, Timothy B. Thu, 12 May 2016 10:19:49 -0700

If I understand the question correctly...

I'm assuming you are indexing rich documents (PDF/DOC/MSG, etc.) with DIH's Tika 
handler (TikaEntityProcessor), and that some of those documents have attachments.

If that's the case, all of the content of the embedded docs _should_ [0] be 
extracted, but the content of the main document and of all its embedded 
documents is then concatenated into one big string.

If you want to handle attachments with greater precision, your best bet is to use 
SolrJ [1] together with Tika's RecursiveParserWrapper [2].  For each input file, 
that wrapper returns a list of Metadata objects: one for the container document 
and one for each attachment.
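To make the container/attachment split concrete, here is a minimal, self-contained Java sketch of the step that follows parsing. It deliberately does not call Tika or SolrJ: it assumes the per-file results have already been turned into a list of plain maps, with the container first and then one entry per attachment (in real code the list would hold Tika Metadata objects). The class name `AttachmentDocs`, the field names `content` and `children`, and the `!attachment-N` id scheme are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: shapes one parsed file into a parent document plus
// one child document per attachment, instead of one concatenated string.
public class AttachmentDocs {

    // Hypothetical key under which child documents are stored in this sketch.
    static final String CHILDREN_KEY = "children";

    /**
     * @param id     id for the container (parent) document
     * @param parsed per-document fields; ASSUMES index 0 is the container and
     *               every later entry is one attachment
     */
    static Map<String, Object> toParentWithChildren(String id, List<Map<String, String>> parsed) {
        if (parsed.isEmpty()) {
            throw new IllegalArgumentException("expected at least the container entry");
        }
        // Copy the container's fields into the parent document.
        Map<String, Object> parent = new LinkedHashMap<>(parsed.get(0));
        parent.put("id", id);

        // Each attachment keeps its own fields and gets a derived, unique id.
        List<Map<String, Object>> children = new ArrayList<>();
        for (int i = 1; i < parsed.size(); i++) {
            Map<String, Object> child = new LinkedHashMap<>(parsed.get(i));
            child.put("id", id + "!attachment-" + i);
            children.add(child);
        }
        parent.put(CHILDREN_KEY, children);
        return parent;
    }
}
```

For example, an email whose list has two entries (the message body, then one PDF attachment) yields a parent with id `msg-1` and a single child with id `msg-1!attachment-1`. With SolrJ, the same shape could be built as a `SolrInputDocument` whose attachments are added with `addChildDocument`, so each attachment stays a separate document linked to its parent rather than being merged into one text field.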

So, if I'm right and you'd like this handled within Solr's DIH, see [3].

[0] https://issues.apache.org/jira/browse/SOLR-7189
[1] https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
[2] http://stackoverflow.com/questions/36950382/how-to-extract-content-from-pst-file-using-apache-tika
[3] https://issues.apache.org/jira/browse/SOLR-7229

-----Original Message-----
From: Reth RM [mailto:reth.ik...@gmail.com] 
Sent: Thursday, May 12, 2016 12:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing a (File attached to a document)

Could you please let us know which crawler you are using to fetch data from the 
document and its attachment?

On Thu, May 12, 2016 at 3:26 PM, Solr User <sowmya741...@gmail.com> wrote:

> Hi
>
> If I index a document in Solr that has a file attached to it, can I 
> view the data in that attachment when querying that particular 
> document? Please help me with this.
>
>
> Thanks & Regards
> Vidya Nadella
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-a-File-attached-to-a-document-tp4276334.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
