Hello Majisha,
Nutch's Solr indexing plugin has support for stripping non-UTF-8 character
code points from the input, but it does so only on the content field, if I
remember correctly.
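
To illustrate what this kind of stripping means, here is a minimal sketch in Java. It is not the Nutch implementation: the class name CodepointStripper and the method strip are hypothetical, and the exact set of removed code points (Unicode noncharacters, unpaired surrogates, and control characters other than tab, newline and carriage return) is an assumption. It works on an already decoded String, so it does not repair byte sequences that fail to decode as UTF-8.

```java
// Hypothetical helper for illustration only; not taken from Nutch.
final class CodepointStripper {
    private CodepointStripper() {}

    /** Returns a copy of input without code points that commonly break XML/JSON indexing. */
    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);

            // Unicode noncharacters: U+FDD0..U+FDEF and the last two code points of every plane.
            boolean nonCharacter = (cp & 0xFFFE) == 0xFFFE || (cp >= 0xFDD0 && cp <= 0xFDEF);
            // C0 control characters, except tab, line feed and carriage return.
            boolean control = cp < 0x20 && cp != '\t' && cp != '\n' && cp != '\r';
            // codePointAt returns the surrogate itself when it is not part of a valid pair.
            boolean unpairedSurrogate = cp >= 0xD800 && cp <= 0xDFFF;

            if (!nonCharacter && !control && !unpairedSurrogate) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

A field value would be passed through CodepointStripper.strip(value) before being added to the document that is sent to Solr.
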
However, that stripping method was not built with the invalid middle byte
exception in mind, and I have not seen
On 3/22/2015 5:04 PM, Majisha Parambath wrote:
> As part of an assignment, we initially crawled and collected NSF and
> NASA Polar Datasets using Nutch. We used the nutch dump command to dump
> out the segments that were created as part of the crawl.
> Now we have to index this data into Solr. I a
Hello,
As part of an assignment, we initially crawled and collected NSF and NASA
Polar Datasets using Nutch. We used the nutch dump command to dump out the
segments that were created as part of the crawl.
Now we have to index this data into Solr. I am using java -jar post.jar
filename to post to