Sorry, attachments are not supported here :(
Anyway, I believe the misunderstanding lies in what you mean by
"image indexing": AFAIK, Tika indexes only a) the textual content of a
given resource and b) its metadata.
So
- for a JPG file (or in general, an image) you will get only its metadata
- for a compressed archive, the Commons Compress API will decompress the
archive and, once that is done, each file within the archive will be
handed to the appropriate parser. So here it actually depends on the
file types you have in your archive.
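To make the archive case concrete, here is a minimal sketch of the idea. It uses only the JDK's java.util.zip, not Tika or Commons Compress, so it is an analogy rather than what SolrCell actually runs. The class ArchiveWalkSketch, the method handlerFor and its extension-based rules are hypothetical stand-ins for Tika's parser selection, which is more elaborate in reality.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ArchiveWalkSketch {

    // Hypothetical stand-in for parser selection (an assumption, not Tika's logic):
    // text-like entries yield content, images yield metadata only.
    static String handlerFor(String name) {
        String lower = name.toLowerCase();
        if (lower.endsWith(".txt") || lower.endsWith(".csv")) {
            return "text";
        }
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg") || lower.endsWith(".png")) {
            return "metadata-only";
        }
        return "unknown";
    }

    // Walks every entry of a ZIP archive and records which handler it would get,
    // plus the extracted text for entries handled as "text".
    static List<String> describeEntries(byte[] zipBytes) throws IOException {
        List<String> out = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                String handler = handlerFor(entry.getName());
                String text = "";
                if (handler.equals("text")) {
                    // Read the current entry to its end (read returns -1 at entry end).
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    int n;
                    while ((n = zin.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    text = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                }
                out.add(entry.getName() + " -> " + handler + " [" + text + "]");
            }
        }
        return out;
    }

    // Builds a small in-memory archive with one text file and one fake JPG.
    static byte[] buildSampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry("notes.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("photo.jpg"));
            zout.write(new byte[] {(byte) 0xFF, (byte) 0xD8});
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (String line : describeEntries(buildSampleZip())) {
            System.out.println(line);
        }
    }
}
```

In this sketch the text file contributes searchable content while the JPG contributes none, which mirrors why an archive full of images would give you little or nothing to search on.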
Best,
Andrea
Is that close to what you were thinking?
On 04/15/2015 05:16 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:
Thanks Andrea. I can see that Tika 1.5 supports both compressed (ZIP)
and image (JPG) formats. If that's the case, why could SolrCell not
index the .zip and .jpg documents? Am I missing something here? No
error is thrown in the overall process and the Java program completes
successfully. But when I query the Solr UI, only 8 files are indexed.
Attached is a simple screenshot of the file types I am trying to index.
Thanks & Regards
Vijay
On 15 April 2015 at 15:27, Andrea Gazzarini <a.gazzar...@gmail.com> wrote:
Hi Vijay,
here you can find all the formats supported by Tika, which
SolrCell uses internally:
* https://tika.apache.org/1.4/formats.html
* https://tika.apache.org/1.5/formats.html
* https://tika.apache.org/1.6/formats.html
* https://tika.apache.org/1.7/formats.html
Best,
Andrea
On 04/15/2015 04:20 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:
Hi,
I am trying to index various binary file types into Solr. However,
some file types seem to be ignored and are not getting indexed, though
the metadata is being extracted successfully for all the types.
Specifically, zip files and jpg files are not getting indexed, whereas
pdf and MS Office documents are. Hence I am wondering whether there is
a defined list of indexable file types.
Moreover, I am wondering why Solr could not index the jpg and zip
documents when it was able to extract the metadata from those files.
The code snippet is as below:
contentStreamUpdateReq.addFile(file, fileType);
contentStreamUpdateReq.setParam("literal.id", literalId);
contentStreamUpdateReq.setParam("uprefix", "attr_");
contentStreamUpdateReq.setParam("fmap.content", "content");
contentStreamUpdateReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
solrServer.request(contentStreamUpdateReq);
Thanks & Regards
Vijay