Thanks Andrea. For image files and zip files, not even metadata is available. To explain further, I have indexed a total of 10 files, which include one .jpg file and one .zip file.
After the indexing process is complete, no information about either of these files is present in the Solr query UI when I give *.* as the query parameters. Not even metadata is displayed. In fact, in the response, *numFound* shows only 8 documents, which are the ones other than the zip and jpg files.

Thanks & Regards
Vijay

On 15 April 2015 at 16:29, Andrea Gazzarini <a.gazzar...@gmail.com> wrote:

> Sorry, attachments are not supported here :(
>
> Anyway, I believe the misunderstanding lies in what you take
> "image indexing" to mean: actually, AFAIK, Tika indexes only a) the
> textual content of a given resource b) its metadata.
> So
>
> - for a JPG file (or in general, an image) you will get only its metadata
> - for a compressed archive, Commons Compress API will decompress the
> archive and, once that is done, each file within the archive will be
> associated with a proper parser. So here it actually depends on the
> files (types) you have in your archive.
>
> Best,
> Andrea
>
> Is that close to what you were thinking?
>
> On 04/15/2015 05:16 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:
>
>> Thanks Andrea. I can see that Tika 1.5 supports both compressed (ZIP) and
>> image (JPG) formats. If that's the case, why could SolrCell not index the
>> .zip and .jpg documents? Am I missing something here? No error is
>> thrown in the overall process and the Java program completes successfully.
>> But when I query the Solr UI, only 8 files are indexed.
>>
>> Attached is a simple screenshot of the file types I am trying to index.
>>
>> Thanks & Regards
>> Vijay
>>
>> On 15 April 2015 at 15:27, Andrea Gazzarini <a.gazzar...@gmail.com> wrote:
>>
>>     Hi Vijay,
>>     here you can find all the formats supported by Tika, which is
>>     used internally by SolrCell:
>>
>>       * https://tika.apache.org/1.4/formats.html
>>       * https://tika.apache.org/1.5/formats.html
>>       * https://tika.apache.org/1.6/formats.html
>>       * https://tika.apache.org/1.7/formats.html
>>
>>     Best,
>>     Andrea
>>
>>     On 04/15/2015 04:20 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:
>>
>>         Hi,
>>
>>         I am trying to index various binary file types into Solr.
>>         However, some file types seem to be ignored and are not getting
>>         indexed, though the metadata is being extracted successfully
>>         for all the types.
>>
>>         Specifically, zip files and jpg files are not getting indexed,
>>         whereas pdf and MS Office documents are getting indexed. Hence
>>         I am wondering whether there is a defined list of indexable
>>         file types.
>>
>>         Moreover, I am wondering why Solr could not index the jpg and
>>         zip documents when it was able to extract the metadata from
>>         those files.
>>
>>         The code snippet is as below:
>>
>>         contentStreamUpdateReq.addFile(file, fileType);
>>         contentStreamUpdateReq.setParam("literal.id", literalId);
>>         contentStreamUpdateReq.setParam("uprefix", "attr_");
>>         contentStreamUpdateReq.setParam("fmap.content", "content");
>>         contentStreamUpdateReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>         solrServer.request(contentStreamUpdateReq);
>>
>>         Thanks & Regards
>>         Vijay
>>
>> The contents of this e-mail are confidential and for the exclusive use of
>> the intended recipient. If you receive this e-mail in error please delete
>> it from your system immediately and notify us either by e-mail or
>> telephone. You should not copy, forward or otherwise disclose the content
>> of the e-mail.
>> The views expressed in this communication may not
>> necessarily be the view held by WHISHWORKS.
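
Andrea's point about archives (each file inside the ZIP is handed to its own parser, so the outcome depends on what the archive contains) can be checked on the client side before a file is sent to SolrCell. The following is only a minimal sketch and not what Tika or SolrCell actually runs: it uses the JDK's java.util.zip instead of Commons Compress, and the class name ZipEntryLister, the helper sampleZip and the entry names report.txt and photo.jpg are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of Solr, SolrJ or Tika.
class ZipEntryLister {

    // Returns the names of all non-directory entries, in archive order.
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    // Builds a small in-memory archive so the sketch runs without files on disk.
    // The entry names and contents are invented sample data.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("report.txt"));
            zip.write("plain text".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("photo.jpg"));
            zip.write(new byte[] {(byte) 0xFF, (byte) 0xD8});
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Print one entry name per line.
        for (String name : listEntries(new ByteArrayInputStream(sampleZip()))) {
            System.out.println(name);
        }
    }
}
```

For a real archive, a FileInputStream for the file can be passed to listEntries in place of the in-memory sample. Going by Andrea's description, an archive that lists only images would be expected to yield metadata but little or no text content.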