Here's the problem, at the end of the DIH file:
       <field column="pubdate"      xpath="/rss/channel/item/pubDate" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
     </entity>

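This says "parse this timestamp into a Java Date object using this
date-time spec". The spec you have describes the UTC timestamp format
that Solr itself uses (2010-12-06T23:31:38Z, for example), but your
feeds deliver RSS pubDate values like "Mon, 06 Dec 2010 23:31:38 +0000".
You need to change this date-format string to match the format of
your incoming timestamps. The pattern syntax is defined by the JDK
SimpleDateFormat class; its documentation and innumerable tutorials
are online.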
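A quick way to check a candidate pattern before re-running the import is
to try it in plain Java. This is a sketch that assumes every feed uses
RFC 822 style dates like the one in Adam's error message; the class and
method names are made up for illustration. It first reproduces the
failure with the original pattern, then parses with
"EEE, dd MMM yyyy HH:mm:ss Z" and prints the result in Solr's UTC form.
Note HH (hour 0-23) rather than hh (hour 1-12). If the pattern works
here, it is a reasonable candidate for the dateTimeFormat attribute,
assuming the Solr JVM runs with an English default locale, since the day
and month names are locale-sensitive.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for testing date patterns outside of DIH.
class RssDateCheck {
    // RFC 822 style RSS <pubDate>, e.g. "Mon, 06 Dec 2010 23:31:38 +0000".
    // Assumption: all incoming feeds use this layout.
    static final String RSS_PATTERN = "EEE, dd MMM yyyy HH:mm:ss Z";

    // Solr's canonical UTC form. HH is the 0-23 hour; hh would be 1-12.
    static final String SOLR_PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    static Date parseRss(String value) throws ParseException {
        // Locale.ENGLISH so "Mon" and "Dec" parse whatever the JVM default is
        SimpleDateFormat in = new SimpleDateFormat(RSS_PATTERN, Locale.ENGLISH);
        return in.parse(value);
    }

    static String toSolr(Date date) {
        SimpleDateFormat out = new SimpleDateFormat(SOLR_PATTERN, Locale.ENGLISH);
        // The literal 'Z' in the pattern only makes sense if we print in UTC
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(date);
    }

    public static void main(String[] args) throws ParseException {
        String pubDate = "Mon, 06 Dec 2010 23:31:38 +0000";

        // The pattern from the DIH config rejects this value, like the import log
        try {
            new SimpleDateFormat("yyyy-MM-dd'T'hh:mm:ss'Z'", Locale.ENGLISH).parse(pubDate);
        } catch (ParseException e) {
            System.out.println("original pattern: " + e.getMessage());
        }

        // The RSS-shaped pattern parses it; print it the way Solr expects
        System.out.println("parsed: " + toSolr(parseRss(pubDate)));
    }
}
```

Running it, the first line should echo the same "Unparseable date" message
as the import log, and the second should print
"parsed: 2010-12-06T23:31:38Z".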

Cheers,

Lance Norskog

On Sat, Dec 11, 2010 at 4:10 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Dates in Solr have a very specific format, see:
> http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
>
> Best
> Erick
>
> On Sat, Dec 11, 2010 at 6:32 PM, Adam Estrada <estrada.adam.gro...@gmail.com> wrote:
>
>> All,
>>
>> I am ingesting a lot of RSS feeds as part of my application and I keep
>> getting the same error.
>>
>> WARNING: Could not parse a Date field
>> java.text.ParseException: Unparseable date: "Mon, 06 Dec 2010 23:31:38 +0000"
>>        at java.text.DateFormat.parse(Unknown Source)
>>        at org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89)
>>        at org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69)
>>        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195)
>>        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
>>        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
>>        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>>        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
>>        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>>        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
>> Dec 11, 2010 6:25:47 PM org.apache.solr.handler.dataimport.DocBuilder finish
>> INFO: Import completed successfully
>> Dec 11, 2010 6:25:47 PM org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
>>
>> Are there any tips or tricks to getting standard RSS <update> fields to
>> import correctly?
>>
>> Here is an example entity from my DIH config XML file:
>>
>>      <entity name="CBS"
>>        pk="link"
>>        datasource="filedatasource"
>>        url="http://feeds.cbsnews.com/CBSNewsMain?format=xml"
>>        processor="XPathEntityProcessor"
>>        forEach="/rss/channel | /rss/channel/item"
>>        transformer="DateFormatTransformer,HTMLStripTransformer">
>>        <field column="source"       xpath="/rss/channel/title" commonField="true" />
>>        <field column="source-link"  xpath="/rss/channel/link" commonField="true" />
>>        <field column="subject"      xpath="/rss/channel/description" commonField="true" />
>>        <field column="title"        xpath="/rss/channel/item/title" />
>>        <field column="link"         xpath="/rss/channel/item/link" />
>>        <field column="description"  xpath="/rss/channel/item/description" stripHTML="true" />
>>        <field column="creator"      xpath="/rss/channel/item/creator" />
>>        <field column="item-subject" xpath="/rss/channel/item/subject" />
>>        <field column="author"       xpath="/rss/channel/item/author" />
>>        <field column="comments"     xpath="/rss/channel/item/comments" />
>>        <field column="pubdate"      xpath="/rss/channel/item/pubDate" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
>>      </entity>
>>
>> Any tips on this would be really appreciated, as I need to query based on
>> the date the article was published.
>>
>> Thanks,
>> Adam
>>
>



-- 
Lance Norskog
goks...@gmail.com
