On 2/23/2013 12:27 AM, Raja Kulasekaran wrote:
Hi,

I got the exception "Invalid Date String" when I ran the crawl against
web pages.

Each site uses its own date format, and as developers we have no
control over it. Instead of throwing an exception, it should convert the
date into a Solr-compatible format.

Can you suggest how I can overcome that?

Solr doesn't do any crawling, so you must be using another piece of software that talks to Solr, most likely Nutch. Date conversion will have to be handled in the program that feeds the data to Solr. Questions for Nutch will find the best support on the Nutch user mailing list.

http://nutch.apache.org/mailing_lists.html

I did some searching and found something that says you'd have to write some code:

http://stackoverflow.com/questions/10445095/nutch-solr-formatting-date-from-web-page-metadata-into-correct-solr-format

Note that if you can get a date into a Java Date object, I'm fairly sure that you can get the properly formatted string for Solr with this Java code, where dateObject is the Date in question:

// MM is the numeric month; MMM would produce a month name such as "Feb", which Solr rejects.
SimpleDateFormat formatUTC = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
formatUTC.setTimeZone(TimeZone.getTimeZone("UTC"));
return formatUTC.format(dateObject);
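
Getting the Date object in the first place is the harder part, since each site uses its own format. Here is a rough sketch of one way to do it: try a list of known input patterns in turn and convert the first match to a UTC string for Solr. The class name SolrDateSketch, the method toSolrDate, and the pattern list are all made up for illustration; none of this is part of Nutch or Solr, and the patterns would need to be adjusted for the sites actually being crawled. Inputs with no time zone are assumed to be UTC here, which may not be right for your data.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class SolrDateSketch {
    // Hypothetical input patterns, ordered from most to least specific,
    // because SimpleDateFormat.parse() accepts a string whose prefix matches.
    private static final String[] CANDIDATE_PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ssXXX",       // e.g. 2013-02-23T00:27:00+05:30
        "EEE, dd MMM yyyy HH:mm:ss zzz",  // e.g. Sat, 23 Feb 2013 00:27:00 GMT
        "yyyy-MM-dd HH:mm:ss",            // e.g. 2013-02-23 00:27:00
        "yyyy-MM-dd"                      // e.g. 2013-02-23
    };

    // Returns the date formatted for Solr, or null if no pattern matches.
    static String toSolrDate(String raw) {
        if (raw == null) {
            return null;
        }
        TimeZone utc = TimeZone.getTimeZone("UTC");
        for (String pattern : CANDIDATE_PATTERNS) {
            SimpleDateFormat parser = new SimpleDateFormat(pattern, Locale.ENGLISH);
            parser.setTimeZone(utc);   // assumption: zone-less input is UTC
            parser.setLenient(false);  // reject impossible dates such as month 13
            try {
                Date parsed = parser.parse(raw.trim());
                SimpleDateFormat formatUTC =
                    new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH);
                formatUTC.setTimeZone(utc);
                return formatUTC.format(parsed);
            } catch (ParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("2013-02-23T00:27:00+05:30"));
        System.out.println(toSolrDate("2013-02-23"));
        System.out.println(toSolrDate("not a date"));
    }
}
```

Returning null for unrecognized input lets the calling code decide whether to skip the field or the whole document, rather than failing the entire crawl with an exception. A new SimpleDateFormat is created on every call because that class is not thread-safe.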

Further searching has turned up a possible plugin for handling dates, but it does say that it has no support for full timestamps. This plugin has not been added to any released Nutch version, but the source code is attached to the issue:

https://issues.apache.org/jira/browse/NUTCH-1406

Thanks,
Shawn
