If I understand the question correctly, you are indexing rich documents (PDF/DOC/MSG, etc.) with DIH's Tika handler, and some of those documents have attachments.
If that's the case, all of the content of embedded docs _should_ [0] be extracted, but then all of that content across the main document and the embedded documents is concatenated into one big string.

If you want to handle attachments with greater precision, the best bet is using SolrJ [1] in combination with Tika's RecursiveParserWrapper [2]. That wrapper returns a list of Metadata objects for each input file. The list contains one Metadata object for each "document" (one for the container and one for each attachment).

So, if I'm right, and you'd like this as part of Solr's DIH, see [3].

[0] https://issues.apache.org/jira/browse/SOLR-7189
[1] https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
[2] http://stackoverflow.com/questions/36950382/how-to-extract-content-from-pst-file-using-apache-tika
[3] https://issues.apache.org/jira/browse/SOLR-7229

-----Original Message-----
From: Reth RM [mailto:reth.ik...@gmail.com]
Sent: Thursday, May 12, 2016 12:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing a (File attached to a document)

Could you please let us know which crawler are you using to fetch data from document and its attachment?

On Thu, May 12, 2016 at 3:26 PM, Solr User <sowmya741...@gmail.com> wrote:

> Hi
>
> If I index a document with a file attachment attached to it in solr,
> can I visualise data of that attached file attachment also while
> querying that particular document? Please help me on this
>
>
> Thanks & Regards
> Vidya Nadella
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-a-File-attached-to-a-document-tp4276334.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
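P.S. To make the "one Metadata object per document" idea from my reply concrete, here is a minimal sketch in plain Java. It does not call Tika or SolrJ at all: ordinary Maps stand in for the Metadata objects that RecursiveParserWrapper returns, and the output Maps stand in for the documents you would send through SolrJ. The class name, the "content" key, and the field names (id, parent_id, is_attachment, text, meta_*) are all made up for illustration, and the sketch assumes the container comes first in the list. Treat it as a picture of the data shape, not as the actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch: plain Maps model the per-document Metadata objects
// (container first, then one entry per attachment). No Tika/SolrJ calls.
final class AttachmentDocs {

    // Hypothetical key under which each map holds the extracted text.
    static final String CONTENT_KEY = "content";

    // Builds one index document per entry instead of one concatenated string.
    // All field names here are illustrative assumptions, not a Solr schema.
    static List<Map<String, Object>> toIndexDocs(
            String containerId, List<Map<String, String>> perDocMetadata) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 0; i < perDocMetadata.size(); i++) {
            Map<String, String> meta = perDocMetadata.get(i);
            boolean isContainer = (i == 0); // assumption: container is first
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", isContainer ? containerId : containerId + "#" + i);
            if (!isContainer) {
                // Link each attachment back to the document that carried it.
                doc.put("parent_id", containerId);
            }
            doc.put("is_attachment", !isContainer);
            doc.put("text", meta.getOrDefault(CONTENT_KEY, ""));
            // Carry the remaining metadata over under a prefixed field name.
            for (Map.Entry<String, String> e : meta.entrySet()) {
                if (!CONTENT_KEY.equals(e.getKey())) {
                    doc.put("meta_" + e.getKey(), e.getValue());
                }
            }
            docs.add(doc);
        }
        return docs;
    }
}
```

With the attachments indexed as separate documents that point back at their container, a query can match the attachment text on its own and still lead you to the parent document, which is not possible once everything has been concatenated into one string.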