Here's the problem, at the end of the DIH file:

    <field column="pubdate" xpath="/rss/channel/item/pubDate"
           dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
    </entity>

This says "parse this timestamp into a Java Date object using this date-time spec". The string shown is the UTC timestamp format that Solr itself reads, not the format of your feed. You need to change this date-format string to match your incoming timestamps, which look like "Mon, 06 Dec 2010 23:31:38 +0000"; a pattern along the lines of EEE, dd MMM yyyy HH:mm:ss Z should fit. The pattern syntax is that of the JDK's SimpleDateFormat class, and innumerable tutorials for it are online.

Cheers,

Lance Norskog

On Sat, Dec 11, 2010 at 4:10 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Dates in Solr have a very specific format, see:
> http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
>
> Best
> Erick
>
> On Sat, Dec 11, 2010 at 6:32 PM, Adam Estrada <estrada.adam.gro...@gmail.com> wrote:
>
>> All,
>>
>> I am ingesting a lot of RSS feeds as part of my application and I keep
>> getting the same error.
>>
>> WARNING: Could not parse a Date field
>> java.text.ParseException: Unparseable date: "Mon, 06 Dec 2010 23:31:38 +0000"
>>     at java.text.DateFormat.parse(Unknown Source)
>>     at org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89)
>>     at org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
>> Dec 11, 2010 6:25:47 PM org.apache.solr.handler.dataimport.DocBuilder finish
>> INFO: Import completed successfully
>> Dec 11, 2010 6:25:47 PM org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
>>
>> Are there any tips or tricks to getting standard RSS <update> fields to
>> import correctly?
>>
>> An example for a DIH config XML file is as follows:
>>
>> <entity name="CBS"
>>         pk="link"
>>         datasource="filedatasource"
>>         url="http://feeds.cbsnews.com/CBSNewsMain?format=xml"
>>         processor="XPathEntityProcessor"
>>         forEach="/rss/channel | /rss/channel/item"
>>         transformer="DateFormatTransformer,HTMLStripTransformer">
>>   <field column="source" xpath="/rss/channel/title" commonField="true" />
>>   <field column="source-link" xpath="/rss/channel/link" commonField="true" />
>>   <field column="subject" xpath="/rss/channel/description" commonField="true" />
>>   <field column="title" xpath="/rss/channel/item/title" />
>>   <field column="link" xpath="/rss/channel/item/link" />
>>   <field column="description" xpath="/rss/channel/item/description" stripHTML="true" />
>>   <field column="creator" xpath="/rss/channel/item/creator" />
>>   <field column="item-subject" xpath="/rss/channel/item/subject" />
>>   <field column="author" xpath="/rss/channel/item/author" />
>>   <field column="comments" xpath="/rss/channel/item/comments" />
>>   <field column="pubdate" xpath="/rss/channel/item/pubDate"
>>          dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
>> </entity>
>>
>> Any tips on this would be really appreciated as I need to query based on
>> the date the article was published.
>>
>> Thanks,
>> Adam
>>
>

--
Lance Norskog
goks...@gmail.com
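
To check a candidate pattern before putting it into the DIH config, something like the following standalone sketch may help. It assumes the feed's dates all look like the one in the warning ("Mon, 06 Dec 2010 23:31:38 +0000") and that the pattern is interpreted with SimpleDateFormat rules. The class name RssDateCheck and its helper methods are made up for illustration and are not part of Solr.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, for illustration only; not part of Solr or DIH.
public class RssDateCheck {
    // Pattern copied from the pubdate field in the posted DIH config.
    static final String CONFIG_PATTERN = "yyyy-MM-dd'T'hh:mm:ss'Z'";
    // Candidate pattern for RSS dates like "Mon, 06 Dec 2010 23:31:38 +0000" (an assumption).
    static final String RSS_PATTERN = "EEE, dd MMM yyyy HH:mm:ss Z";

    // Returns true if the text can be parsed with the given pattern.
    static boolean parses(String pattern, String text) {
        try {
            // Locale.ENGLISH so "Mon" and "Dec" parse whatever the JVM default locale is.
            new SimpleDateFormat(pattern, Locale.ENGLISH).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    // Parses an RSS date and renders it in the UTC form shown in the config pattern.
    static String rssToSolrUtc(String text) {
        try {
            Date parsed = new SimpleDateFormat(RSS_PATTERN, Locale.ENGLISH).parse(text);
            SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH);
            out.setTimeZone(TimeZone.getTimeZone("UTC"));
            return out.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not an RSS date: " + text, e);
        }
    }

    public static void main(String[] args) {
        String sample = "Mon, 06 Dec 2010 23:31:38 +0000";
        System.out.println(parses(CONFIG_PATTERN, sample)); // false
        System.out.println(parses(RSS_PATTERN, sample));    // true
        System.out.println(rssToSolrUtc(sample));           // 2010-12-06T23:31:38Z
    }
}
```

If the candidate pattern parses your samples, the corresponding change in the config would be dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" on the pubdate field. Feeds that use a different date layout would need their own pattern.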