Thanks for the reply; I will look into it further. Currently I am able to retrieve the normal metadata of the email, but not the metadata of the attachments, which are part of the contents of the EML file and look something like this:

--000000000000d8b77b057d59ca19--
--000000000000d8b77e057d59ca1b
Content-Type: application/pdf; name="file1.pdf"
Content-Disposition: attachment; filename="file1.pdf"
Content-Transfer-Encoding: base64
Content-ID: <f_jpurtpnk0>
X-Attachment-Id: f_jpurtpnk0

Regards,
Edwin

On Sat, 3 Aug 2019 at 05:38, Tim Allison <talli...@apache.org> wrote:

> I'd strongly recommend rolling your own ingest code. See Erick's
> superb: https://lucidworks.com/post/indexing-with-solrj/
>
> You can easily get attachments via the RecursiveParserWrapper, e.g.
>
> https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java#L351
>
> This will return a list of Metadata objects; the first one will be the
> main/container, each other entry will be an attachment. Let us know
> if you have any questions/surprises. There are a couple of todos for
> .eml...
>
> On Fri, Aug 2, 2019 at 3:43 AM Jan Høydahl <jan....@cominvent.com> wrote:
> >
> > Try the Apache Tika mailing list.
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> > > On 2 Aug 2019 at 05:01, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Does anyone know if this can be done on the Solr side?
> > > Or does it have to be done on the Tika side?
> > >
> > > Regards,
> > > Edwin
> > >
> > > On Thu, 1 Aug 2019 at 09:38, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> I would like to check: is there any way we can detect the number of
> > >> attachments and their names during indexing of EML files in Solr, and
> > >> index that information into Solr?
> > >>
> > >> Currently, Solr is able to use Tika and Tesseract OCR to extract the
> > >> contents of the attachments. However, I could not find the information
> > >> about the number of attachments in the EML file and what their
> > >> filenames are.
> > >>
> > >> I am using Solr 7.6.0 in production, and am also trying out the new
> > >> Solr 8.2.0.
> > >>
> > >> Regards,
> > >> Edwin
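
As a cross-check on what Tika returns, the attachment count and filenames can also be read straight from the EML file with a plain MIME parser. The following is only a minimal sketch, not Tika or Solr code: it uses Python's standard email package, and SAMPLE_EML and list_attachment_names are hypothetical names made up for illustration. It assumes attachments carry a "Content-Disposition: attachment" header, as in the MIME part shown above; inline parts without that header would be missed.

```python
from email import message_from_string, policy

# Hypothetical, cut-down EML modelled on the MIME part quoted in this thread.
SAMPLE_EML = """\
From: a@example.com
To: b@example.com
Subject: test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUND"

--BOUND
Content-Type: text/plain; charset="UTF-8"

Hello

--BOUND
Content-Type: application/pdf; name="file1.pdf"
Content-Disposition: attachment; filename="file1.pdf"
Content-Transfer-Encoding: base64
Content-ID: <f_jpurtpnk0>
X-Attachment-Id: f_jpurtpnk0

JVBERi0xLjQK
--BOUND--
"""


def list_attachment_names(eml_text):
    """Return the filenames of all MIME parts marked as attachments."""
    msg = message_from_string(eml_text, policy=policy.default)
    names = []
    # walk() visits the container and every nested part, depth first.
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            names.append(part.get_filename())
    return names


if __name__ == "__main__":
    names = list_attachment_names(SAMPLE_EML)
    print(len(names), names)
```

The resulting count and names could then be added as extra fields to the Solr document in custom ingest code; the field names would be up to the schema.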