On 2/23/2013 12:27 AM, Raja Kulasekaran wrote:
Hi,
I got the exception "Invalid Date String" when I ran the crawl against
web pages.
Each site uses its own date format, and as developers we have no
control over it. Instead of throwing an exception, it should convert the
date into a Solr-compatible format.
Can you suggest how I can overcome this?
Solr doesn't do any crawling, so you must be using another piece of
software that talks to Solr, most likely Nutch. Date conversion will
have to be handled in the program that feeds the data to Solr.
Questions about Nutch will get the best support on the Nutch user mailing
list.
http://nutch.apache.org/mailing_lists.html
I did some searching and found something that says you'd have to write
some code:
http://stackoverflow.com/questions/10445095/nutch-solr-formatting-date-from-web-page-metadata-into-correct-solr-format
Note that if you can get a date into a Java Date object (java.util.Date),
I'm fairly sure that you can get the properly formatted string for Solr
with this Java code, where dateObject is the Date in question:
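import java.text.SimpleDateFormat;
import java.util.TimeZone;
// Solr expects a numeric month (MM), e.g. 2013-02-23T07:27:00Z.
SimpleDateFormat formatUTC = new SimpleDateFormat
("yyyy-MM-dd'T'HH:mm:ss'Z'");
formatUTC.setTimeZone(TimeZone.getTimeZone("UTC"));
return formatUTC.format(dateObject);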
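Getting from a raw page string to that Date object is the harder part. As a rough sketch only, you could try a list of known input patterns in turn and format the first match in UTC with a numeric month (yyyy-MM-dd), which is the form Solr accepts. The class name, the method names, and the pattern list are all made up for illustration, and where this would hook into Nutch is not shown:

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper; not part of Nutch or Solr.
final class SolrDateSketch {

    // Assumed input patterns. Replace these with the formats your pages really use.
    private static final String[] CANDIDATE_PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ssZ",         // 2013-02-23T00:27:00-0700
        "EEE, dd MMM yyyy HH:mm:ss zzz",  // Sat, 23 Feb 2013 00:27:00 GMT
        "yyyy-MM-dd HH:mm:ss",            // 2013-02-23 00:27:00 (treated as UTC)
        "yyyy-MM-dd"                      // 2013-02-23 (midnight UTC)
    };

    // Formats a Date as a UTC timestamp string in the form Solr expects.
    static String toSolrDate(Date date) {
        SimpleDateFormat formatUTC =
            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        formatUTC.setTimeZone(TimeZone.getTimeZone("UTC"));
        return formatUTC.format(date);
    }

    // Returns the Solr-formatted date, or null if no candidate pattern matches.
    static String normalize(String raw) {
        String text = raw.trim();
        for (String pattern : CANDIDATE_PATTERNS) {
            SimpleDateFormat parser = new SimpleDateFormat(pattern, Locale.ENGLISH);
            // Inputs without an explicit zone are assumed to be UTC.
            parser.setTimeZone(TimeZone.getTimeZone("UTC"));
            parser.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date parsed = parser.parse(text, pos);
            // Accept only if the whole string was consumed, not just a prefix.
            if (parsed != null && pos.getIndex() == text.length()) {
                return toSolrDate(parsed);
            }
        }
        return null;
    }
}
```

Returning null for an unrecognized string lets the caller decide whether to skip the field or log the value, rather than sending Solr something it will reject.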
Further searching has turned up a possible plugin for handling dates,
but it does say that it has no support for full timestamps. This plugin
has not been added to any released Nutch version, but the source code is
attached to the issue:
https://issues.apache.org/jira/browse/NUTCH-1406
Thanks,
Shawn