Hi Tim,

Regarding the list of Metadata objects that is returned, is the code
supposed to include the number of attachments in a particular email
and/or the names of the attachments?
For example, if there are 3 attachments in the email, we should be able to
see immediately from the Metadata that there are attachments, and there are
3 of them.
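Tim's reply further down says that RecursiveParserWrapper returns the container (the email itself) as the first Metadata entry and each attachment as a later entry. If that holds, the count and the names can be derived from the list directly. The sketch below is only an illustration under assumptions. To stay self-contained, it models each Tika `Metadata` object as a plain `Map<String, String>`. It also assumes that each attachment's filename is stored under Tika's `resourceName` key. With real Tika objects you would call `metadata.get(...)` on each entry instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AttachmentSummary {

    // Assumption: the attachment's file name is stored under Tika's "resourceName" key.
    static final String RESOURCE_NAME = "resourceName";

    /** Every entry after the first (the container email) is counted as an attachment. */
    static int attachmentCount(List<Map<String, String>> metadataList) {
        return Math.max(0, metadataList.size() - 1);
    }

    /** Collects attachment names in order, skipping entries that carry no name. */
    static List<String> attachmentNames(List<Map<String, String>> metadataList) {
        List<String> names = new ArrayList<>();
        // Index 0 is the email itself, so start at 1.
        for (int i = 1; i < metadataList.size(); i++) {
            String name = metadataList.get(i).get(RESOURCE_NAME);
            if (name != null) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical list shaped like the output for an email with 3 attachments.
        List<Map<String, String>> metadataList = List.of(
            Map.of("subject", "Quarterly report"),
            Map.of(RESOURCE_NAME, "file1.pdf"),
            Map.of(RESOURCE_NAME, "file2.pdf"),
            Map.of(RESOURCE_NAME, "file3.docx"));

        System.out.println(attachmentCount(metadataList)); // 3
        System.out.println(attachmentNames(metadataList)); // [file1.pdf, file2.pdf, file3.docx]
    }
}
```

RecursiveParserWrapper recurses into embedded documents. Files inside an attachment, such as the contents of a zip, therefore get their own entries. As a result, `size() - 1` counts all embedded documents, not only the email's direct attachments.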

Thank you.

Regards,
Edwin

On Sat, 3 Aug 2019 at 07:19, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Thanks for the reply; I will look into it further.
>
> Currently I am able to retrieve the regular Metadata of the email, but not
> the Metadata of the attachments, which are embedded in the EML file as
> MIME parts like this:
>
> --000000000000d8b77b057d59ca19--
>
> --000000000000d8b77e057d59ca1b
> Content-Type: application/pdf; name="file1.pdf"
> Content-Disposition: attachment; filename="file1.pdf"
> Content-Transfer-Encoding: base64
> Content-ID: <f_jpurtpnk0>
> X-Attachment-Id: f_jpurtpnk0
>
> Regards,
> Edwin
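For a quick sanity check on a raw EML file like the excerpt above, attachment filenames can also be read straight from the `Content-Disposition` headers. This is a rough sketch that uses only the JDK's regex support. It ignores folded headers, RFC 2231/2047-encoded names, unquoted filenames and `inline` parts, so it is not a substitute for Tika's MIME parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmlAttachmentScan {

    // Matches e.g.: Content-Disposition: attachment; filename="file1.pdf"
    // Case-insensitive; handles only an unfolded header line with a quoted filename.
    private static final Pattern ATTACHMENT = Pattern.compile(
        "^Content-Disposition:\\s*attachment;.*?filename=\"([^\"]+)\"",
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    /** Returns the filenames of parts marked as attachments, in document order. */
    static List<String> attachmentNames(String rawEml) {
        List<String> names = new ArrayList<>();
        Matcher m = ATTACHMENT.matcher(rawEml);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // The MIME part headers from the excerpt above.
        String part = String.join("\n",
            "--000000000000d8b77e057d59ca1b",
            "Content-Type: application/pdf; name=\"file1.pdf\"",
            "Content-Disposition: attachment; filename=\"file1.pdf\"",
            "Content-Transfer-Encoding: base64",
            "");
        List<String> names = attachmentNames(part);
        System.out.println(names.size() + " " + names); // 1 [file1.pdf]
    }
}
```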
>
> On Sat, 3 Aug 2019 at 05:38, Tim Allison <talli...@apache.org> wrote:
>
>> I'd strongly recommend rolling your own ingest code. See Erick's
>> superb post: https://lucidworks.com/post/indexing-with-solrj/
>>
>> You can easily get attachments via the RecursiveParserWrapper, e.g.
>>
>> https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java#L351
>>
>> This will return a list of Metadata objects: the first one is the
>> main document (the container), and each subsequent entry is an
>> attachment. Let us know if you have any questions or surprises. There
>> are a couple of todos for .eml...
>>
>> On Fri, Aug 2, 2019 at 3:43 AM Jan Høydahl <jan....@cominvent.com> wrote:
>> >
>> > Try the Apache Tika mailing list.
>> >
>> > --
>> > Jan Høydahl, search solution architect
>> > Cominvent AS - www.cominvent.com
>> >
>> > > On 2 Aug 2019, at 05:01, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>> > >
>> > > Hi,
>> > >
>> > > Does anyone know if this can be done on the Solr side?
>> > > Or does it have to be done on the Tika side?
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> > > On Thu, 1 Aug 2019 at 09:38, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> I would like to check: is there any way to detect the number of
>> > >> attachments and their names when indexing EML files in Solr, and to
>> > >> index that information into Solr?
>> > >>
>> > >> Currently, Solr is able to use Tika and Tesseract OCR to extract the
>> > >> contents of the attachments. However, I could not find any information
>> > >> about the number of attachments in the EML file or their filenames.
>> > >>
>> > >> I am using Solr 7.6.0 in production, and am also trying out the new
>> > >> Solr 8.2.0.
>> > >>
>> > >> Regards,
>> > >> Edwin
>> > >>
>> >
>>
>
