Sorry, attachments are not supported here :(

Anyway, I believe the misunderstanding lies in what you mean by "image indexing": AFAIK, Tika indexes only a) the textual content of a given resource and b) its metadata.
So

- for a JPG file (or, in general, an image) you will get only its metadata
- for a compressed archive, the Commons Compress API will decompress the archive and, once that is done, each file within the archive will be associated with a proper parser. So here it actually depends on the types of the files you have in your archive.
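
To make the archive case concrete, here is a rough sketch of the idea, not Tika's actual code. It uses only java.util.zip from the JDK (Tika itself relies on Commons Compress), and the describe() helper is a hypothetical stand-in for parser selection that looks at the file extension only, whereas Tika detects types more carefully. The sample archive and its entry names are made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ArchiveWalkSketch {

    /** Hypothetical stand-in for parser selection, based on the extension only. */
    public static String describe(String entryName) {
        String lower = entryName.toLowerCase();
        if (lower.endsWith(".txt") || lower.endsWith(".pdf") || lower.endsWith(".doc")) {
            return "text + metadata";
        }
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg") || lower.endsWith(".png")) {
            return "metadata only";
        }
        return "unknown type";
    }

    /** Walks a ZIP stream and records, per entry, what its parser would contribute. */
    public static Map<String, String> walk(InputStream zipBytes) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(zipBytes)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    result.put(entry.getName(), describe(entry.getName()));
                }
            }
        }
        return result;
    }

    /** Builds a small in-memory ZIP so the sketch needs no files on disk. */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"notes.txt", "photo.jpg"}) {
                zout.putNextEntry(new ZipEntry(name));
                zout.write("placeholder".getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Print one line per archive entry with the outcome of the stand-in parser choice.
        walk(new ByteArrayInputStream(sampleZip()))
            .forEach((name, outcome) -> System.out.println(name + " -> " + outcome));
    }
}
```

The point of the sketch is only that the archive itself contributes no text: what ends up indexed depends on each entry, and an image entry inside the archive still yields metadata only.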

Best,
Andrea



Is that close to what you were thinking?

On 04/15/2015 05:16 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:
Thanks Andrea. I can see that Tika 1.5 supports both compressed (ZIP) and image (JPG) formats. If that's the case, why could SolrCell not index the .zip and .jpg documents? Am I missing something here? No error is thrown in the overall process and the Java program completes successfully. But when I query the Solr UI, only 8 files are indexed.

Attached is a simple screenshot of the file types I am trying to index.

Thanks & Regards
Vijay

On 15 April 2015 at 15:27, Andrea Gazzarini <a.gazzar...@gmail.com> wrote:

    Hi Vijay,
    here you can find all the formats supported by Tika, which is
    used internally by SolrCell:

     * https://tika.apache.org/1.4/formats.html
     * https://tika.apache.org/1.5/formats.html
     * https://tika.apache.org/1.6/formats.html
     * https://tika.apache.org/1.7/formats.html

    Best,
    Andrea




    On 04/15/2015 04:20 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:

        Hi,

        I am trying to index various binary file types into Solr. However,
        some file types seem to be ignored and are not getting indexed,
        though the metadata is being extracted successfully for all the
        types.

        Specifically, zip files and jpg files are not getting indexed,
        whereas pdf and MS Office documents are. Hence I am wondering
        whether there is a defined list of indexable file types.

        Moreover, I am just wondering why Solr could not index the jpg and
        zip documents when it was able to extract the metadata from those
        files?

        The code snippet is as below:

        contentStreamUpdateReq.addFile(file, fileType);
        contentStreamUpdateReq.setParam("literal.id", literalId);
        contentStreamUpdateReq.setParam("uprefix", "attr_");
        contentStreamUpdateReq.setParam("fmap.content", "content");
        contentStreamUpdateReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solrServer.request(contentStreamUpdateReq);

        Thanks & Regards
        Vijay



