Thanks Andrea. For the image and zip files, not even the metadata is
available. To explain further: I indexed 10 files in total, including one
.jpg file and one .zip file.

After the indexing process completes, no information about either of
these files appears in the Solr query UI when I give *.* as the query
parameter. Not even metadata is displayed. In fact, in the response
*numFound* shows only 8 documents, which are the ones other than the zip
and jpg files.
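
Since, per Andrea's explanation quoted below, each file inside an archive is
handed to a parser chosen by its type, it may help to first check what the
.zip actually contains. The following is only a minimal sketch using the JDK's
java.util.zip; the class name ZipEntryTypes and its methods are hypothetical
and are not part of Solr, SolrCell or Tika.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper (not part of Solr or Tika): summarises entry types inside a ZIP. */
public class ZipEntryTypes {

    /** Returns a map from lower-cased file extension to the number of entries having it. */
    public static Map<String, Integer> countByExtension(Path zipPath) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directory entries carry no content to parse
                }
                counts.merge(extensionOf(entry.getName()), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Text after the last dot of the file name, or "(none)" if the name has no dot. */
    static String extensionOf(String entryName) {
        String fileName = entryName.substring(entryName.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "(none)" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZipEntryTypes <archive.zip>");
            return;
        }
        // Print one line per extension, e.g. the count of pdf entries in the archive.
        countByExtension(Paths.get(args[0]))
                .forEach((ext, n) -> System.out.println(ext + ": " + n));
    }
}
```

If the archive turns out to hold only types without extractable text, little
or no textual content would be expected for it, though that alone would not
explain a missing document.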

Thanks & Regards
Vijay


On 15 April 2015 at 16:29, Andrea Gazzarini <a.gazzar...@gmail.com> wrote:

> Sorry, attachments are not supported here :(
>
> Anyway, I believe the misunderstanding lies in what you take "image
> indexing" to mean: actually, AFAIK, Tika indexes only a) the textual
> content of a given resource and b) its metadata.
> So
>
> - for a JPG file (or in general, an image) you will get only its metadata
> - for a compressed archive, the Commons Compress API will decompress the
> archive and, once that is done, each file within the archive will be
> handed to the appropriate parser. So what actually gets indexed depends on
> the file types you have in your archive.
>
> Best,
> Andrea
>
>
>
> Is that close to what you were thinking?
>
> On 04/15/2015 05:16 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:
>
>> Thanks Andrea. I can see that Tika 1.5 supports both compressed (ZIP) and
>> image (JPG) formats. If that's the case, why could SolrCell not index the
>> .zip and .jpg documents? Am I missing something here? No error is
>> thrown in the overall process and the Java program completes successfully.
>> But when I query the Solr UI, only 8 files are indexed.
>>
>> Attached is a simple screenshot of the file types I am trying to index.
>>
>> Thanks & Regards
>> Vijay
>>
>> On 15 April 2015 at 15:27, Andrea Gazzarini <a.gazzar...@gmail.com> wrote:
>>
>>     Hi Vijay,
>>     here you can find all the formats supported by Tika, which is
>>     used internally by SolrCell:
>>
>>      * https://tika.apache.org/1.4/formats.html
>>      * https://tika.apache.org/1.5/formats.html
>>      * https://tika.apache.org/1.6/formats.html
>>      * https://tika.apache.org/1.7/formats.html
>>
>>     Best,
>>     Andrea
>>
>>
>>
>>
>>     On 04/15/2015 04:20 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:
>>
>>         Hi,
>>
>>         I am trying to index various binary file types into Solr. However,
>>         some file types seem to be ignored and are not getting indexed,
>>         though the metadata is being extracted successfully for all the
>>         types.
>>
>>         Specifically, zip files and jpg files are not getting indexed,
>>         whereas pdf and MS Office documents are. Hence I am wondering
>>         whether there is a defined list of indexable file types.
>>
>>         Moreover, I am wondering why Solr could not index the jpg and zip
>>         documents when it was able to extract the metadata from those
>>         files?
>>
>>         The code snippet is as below:
>>
>>         contentStreamUpdateReq.addFile(file, fileType);
>>         contentStreamUpdateReq.setParam("literal.id", literalId);
>>         contentStreamUpdateReq.setParam("uprefix", "attr_");
>>         contentStreamUpdateReq.setParam("fmap.content", "content");
>>         contentStreamUpdateReq.setAction(AbstractUpdateRequest.ACTION.COMMIT,
>>             true, true);
>>         solrServer.request(contentStreamUpdateReq);
>>
>>         Thanks & Regards
>>         Vijay
>>
>>
>>
>>
>> The contents of this e-mail are confidential and for the exclusive use of
>> the intended recipient. If you receive this e-mail in error please delete
>> it from your system immediately and notify us either by e-mail or
>> telephone. You should not copy, forward or otherwise disclose the content
>> of the e-mail. The views expressed in this communication may not
>> necessarily be the view held by WHISHWORKS.
>>
>
>
