On 3/22/2015 5:04 PM, Majisha Parambath wrote:
> As part of an assignment, we initially crawled and collected NSF and
> NASA Polar Datasets using Nutch. We used the nutch dump command to dump
> out the segments that were created as part of the crawl.
> Now we have to index this data into Solr. I am using java -jar post.jar
> filename to post to solr however after the execution I do not see my
> file indexed and checking the log I found exceptions which I am
> attaching with this mail.

Here's the first part of your exception:

org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0x0 (at char #10, byte #-1)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)

Solr is expecting UTF-8 characters, but the data you are sending is in another character set and includes characters outside the normal ASCII range. The error message (note XMLLoader in the stack trace) indicates that it is XML data.

If you know what character set the data actually uses, you can declare it in the XML itself (the encoding attribute of the XML declaration), and the XML libraries that Solr uses can probably convert it to UTF-8 automatically.

http://www.w3schools.com/xml/xml_encoding.asp

Thanks,
Shawn
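
An alternative to declaring the encoding is to convert the file to UTF-8 before posting it. The following is a minimal sketch of that conversion, not something taken from the original setup: the function name transcode_xml_to_utf8 and the file names are hypothetical, and src_encoding must be the character set the dump really uses, which you have to find out separately.

```python
import re

# Matches the encoding attribute of an XML declaration at the very start
# of the text, e.g. <?xml version="1.0" encoding="UTF-16"?>
_DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')


def transcode_xml_to_utf8(src_path, dst_path, src_encoding):
    """Re-encode an XML file from src_encoding to UTF-8.

    src_encoding is an assumption supplied by the caller; this function
    does not try to detect the real encoding of the file.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()

    # Drop a leading byte order mark if the codec left one in the text.
    text = text.lstrip("\ufeff")

    # If the file declares its old encoding, update the declaration so it
    # agrees with the bytes that are about to be written.
    text = _DECL_ENCODING.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)

    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical names): python to_utf8.py dump.xml dump-utf8.xml utf-16
    if len(sys.argv) == 4:
        transcode_xml_to_utf8(sys.argv[1], sys.argv[2], sys.argv[3])
```

The converted file can then be posted the same way as before (java -jar post.jar filename). If the wrong source encoding is passed in, Python will either raise a decoding error or silently produce garbage, so it is worth inspecting the output before indexing it.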