Hi,
solrconfig.xml (especially if you didn't touch it) should be good. What
about the schema? Are you using the one that comes with the download
bundle, too?
I don't see the stack trace. Did you forget to paste it?
Best,
Andrea
On 04/14/2015 06:06 PM, Vijaya Narayana Reddy Bhoomi Reddy wrote:
Hi,
Here are the solrconfig.xml and the error log from the Solr logs for your
reference. As mentioned earlier, I didn't make any changes to
solrconfig.xml; I am using the out-of-the-box file that came with the
default installation.
Please let me know your thoughts on why these issues are occurring.
Thanks & Regards
Vijay
Vijay Bhoomireddy, Big Data Architect
1000 Great West Road, Brentford, London, TW8 9DW
T: +44 20 3475 7980
M: +44 7481 298 360
W: www.whishworks.com
On 14 April 2015 at 15:57, Vijaya Narayana Reddy Bhoomi Reddy
<vijaya.bhoomire...@whishworks.com> wrote:
Hi,
I am trying to index PDF and Microsoft Office files (.doc, .docx,
.ppt, .pptx, .xls, and .xlsx) into Solr, and I am facing the
following issues. Please let me know what is going
wrong with the indexing process.
I am using Solr 4.10.2 with the default example server
configuration that comes with the Solr distribution.
PDF files - Indexing itself completes without errors, and when I query with
*:* in the Solr query console, the metadata is displayed
properly. However, the PDF content field is empty. This
happens for every PDF file I have tried, including
proprietary files and PDF eBooks. Whatever the PDF file,
its content is not displayed.
MS Office files - For some Office files, everything works perfectly
and the extracted content is visible in the query console.
For others, however, I see the error message below during the
indexing process.
Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.microsoft.OfficeParser
I am using SolrJ to index the documents and below is the code
snippet related to indexing. Please let me know where the issue is
occurring.
static String solrServerURL = "http://localhost:8983/solr";
static SolrServer solrServer = new HttpSolrServer(solrServerURL);
// Send the file to the extracting request handler
static ContentStreamUpdateRequest indexingReq =
new ContentStreamUpdateRequest("/update/extract");
indexingReq.addFile(file, fileType);
indexingReq.setParam("literal.id", literalId);
// Prefix fields that are not defined in the schema with attr_
indexingReq.setParam("uprefix", "attr_");
// Map the extracted "content" field to the schema's "content" field
indexingReq.setParam("fmap.content", "content");
indexingReq.setParam("literal.fileurl", fileURL);
// Commit right away: waitFlush = true, waitSearcher = true
indexingReq.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
solrServer.request(indexingReq);
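The snippet above does not show how the fileType argument to addFile is chosen. Assuming it is meant to be the MIME content type of the file, one possible way to derive it from the file extension is sketched below. The class name OfficeContentTypes and the method forFileName are hypothetical and not part of SolrJ; the MIME strings are the standard registered types for the file formats listed in the question. This is only an illustration, not a diagnosis of the errors reported here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of SolrJ): maps a file extension to a MIME type.
class OfficeContentTypes {
    // Fallback for extensions that are not in the table.
    static final String FALLBACK = "application/octet-stream";

    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("pdf", "application/pdf");
        TYPES.put("doc", "application/msword");
        TYPES.put("docx",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        TYPES.put("xls", "application/vnd.ms-excel");
        TYPES.put("xlsx",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        TYPES.put("ppt", "application/vnd.ms-powerpoint");
        TYPES.put("pptx",
            "application/vnd.openxmlformats-officedocument.presentationml.presentation");
    }

    // Returns the MIME type for the file name, or FALLBACK if the
    // extension is missing or unknown. The lookup is case-insensitive.
    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return FALLBACK;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String type = TYPES.get(ext);
        return type != null ? type : FALLBACK;
    }

    public static void main(String[] args) {
        // Print the content type that would be passed as the second argument of addFile.
        System.out.println(forFileName("report.DOCX"));
    }
}
```

With such a helper, the call in the snippet could read indexingReq.addFile(file, OfficeContentTypes.forFileName(file.getName())), assuming file is a java.io.File.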
Thanks & Regards
Vijay