On Fri, Sep 25, 2009 at 6:48 PM, Brahim Abdesslam <
brahim.abdess...@maecia.com> wrote:

> Hello everybody,
>
> we are using Solr to index some RSS feeds for a news aggregator application.
>
> We are having difficulties with the publication date of each item because
> each site uses a homemade date format.
> What we need is the exact amount of time elapsed between the date of
> publication and now.
>
>
The fact is that the RSS example is just that, an example. It was never
meant for production use and it does not handle the variety of date formats
found in the wild. If you want to index RSS feeds, it is best to use an RSS
parser to extract the values. You can use the PlainTextEntityProcessor
to get the raw RSS feed and write a custom transformer which uses an RSS
parsing library such as Rome to extract the various fields.
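
Whichever parser extracts the fields, the publication date string still has to be turned into an instant before its age can be computed. A minimal sketch, assuming Java 10 or later: this is not Rome and not a DataImportHandler Transformer, only plain JDK java.time code for the date step such a transformer might perform. The class name FeedDates, the two "homemade" patterns, and the choice to treat dates without an offset as UTC are hypothetical assumptions; replace them with the formats your sites actually publish.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: normalizes feed dates written in several formats. */
public class FeedDates {

    // Formats that carry their own offset: RFC 1123 (standard RSS 2.0 pubDate)
    // and ISO-8601 with offset (Atom-style dates).
    private static final List<DateTimeFormatter> WITH_OFFSET = List.of(
            DateTimeFormatter.RFC_1123_DATE_TIME,
            DateTimeFormatter.ISO_OFFSET_DATE_TIME);

    // Hypothetical "homemade" formats without an offset; assumed to be UTC.
    private static final List<DateTimeFormatter> WITHOUT_OFFSET = List.of(
            DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm", Locale.ROOT),
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ROOT));

    /** Tries each known format in order; throws if none matches. */
    public static Instant parse(String raw) {
        String s = raw.trim();
        for (DateTimeFormatter f : WITH_OFFSET) {
            try {
                return OffsetDateTime.parse(s, f).toInstant();
            } catch (DateTimeParseException ignored) {
                // not this format; try the next one
            }
        }
        for (DateTimeFormatter f : WITHOUT_OFFSET) {
            try {
                return LocalDateTime.parse(s, f).toInstant(ZoneOffset.UTC);
            } catch (DateTimeParseException ignored) {
                // not this format; try the next one
            }
        }
        throw new IllegalArgumentException("Unrecognized date: " + raw);
    }

    /** Time elapsed between publication and the given reference instant. */
    public static Duration age(String raw, Instant now) {
        return Duration.between(parse(raw), now);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2009-09-25T18:48:00Z");
        System.out.println(age("Fri, 25 Sep 2009 14:58:21 GMT", now)); // PT3H49M39S
        System.out.println(age("25/09/2009 12:00", now));              // PT6H48M
    }
}
```

Trying the formats in order and returning the first match keeps each site's quirks in one list, and an unrecognized date fails loudly instead of being indexed as the wrong time.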


> So we decided to use a timestamp that stores the index time for each item.
>
> The problems are:
>
>   * when I do a full-import&clean=false, the index is always cleaned.
>   * when I do a simple import, nothing seems to be done.
>

== snip ==


>
> - Tests :
>
> => command=full-import&clean=false
>
> 25-Sep-2009 14:58:21 org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> 25-Sep-2009 14:58:21 org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
> status=0 QTime=6
>

See the above parameters. There is only one parameter: command=full-import.
There is no clean=false in there, so I'm guessing the clean parameter never
made it to Solr. Can you check again?
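
Since a full-import cleans the index by default when clean is absent, the request has to carry both parameters. A minimal sketch, assuming Java 10 or later, that builds the request URL so both parameters are explicitly present; the class name ImportRequest and the host and port are hypothetical, while the /solr/dataimport path follows the webapp and path shown in the log above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds a DataImportHandler request URL. */
public class ImportRequest {

    /** Appends each parameter, URL-encoded, as name=value pairs joined by '&'. */
    public static URI build(String handlerUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return URI.create(handlerUrl + "?" + query);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "full-import");
        params.put("clean", "false");
        // Assumed host and port; adjust to your setup.
        URI uri = build("http://localhost:8983/solr/dataimport", params);
        System.out.println(uri);
        // prints http://localhost:8983/solr/dataimport?command=full-import&clean=false
    }
}
```

If the request is sent from a shell instead, quote the whole URL, for example curl 'http://localhost:8983/solr/dataimport?command=full-import&clean=false'. An unquoted & makes the shell run everything before it as a background job, so Solr receives only command=full-import, which would match the log above.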

-- 
Regards,
Shalin Shekhar Mangar.
