We shred the RSS into individual items, then create Solr XML documents to
insert.  Solr is an easy choice for us over straight Lucene since it adds
the server infrastructure that we would otherwise mostly be writing
ourselves - caching, data types, master/slave replication.
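
In case it's useful, here's roughly what that shredding step looks
like.  This is a minimal sketch, not our production code - it assumes
an RSS 2.0 feed, an old-style single-core Solr update URL, and made-up
field names (id, title_t, body_t, link_s) that you'd swap for whatever
your schema defines:

require 'rss'
require 'cgi'
require 'net/http'
require 'uri'
require 'digest/md5'

# Assumed single-core handler; multicore would be /solr/<core>/update.
SOLR_UPDATE_URL = URI('http://localhost:8983/solr/update')

def solr_field(name, value)
  %(<field name="#{name}">#{CGI.escapeHTML(value.to_s)}</field>)
end

# Shred one RSS document into a Solr <add> payload, one <doc> per item.
def rss_to_solr_xml(rss_text)
  feed = RSS::Parser.parse(rss_text, false)  # false = skip strict validation
  docs = feed.items.map do |item|
    fields = [
      # stable key derived from the item link (our real keys differ)
      solr_field('id',      Digest::MD5.hexdigest(item.link.to_s)),
      solr_field('title_t', item.title),
      solr_field('body_t',  item.description),
      solr_field('link_s',  item.link),
    ]
    "<doc>#{fields.join}</doc>"
  end
  "<add>#{docs.join}</add>"
end

# Post the batch, then commit.  A real indexer batches commits
# instead of committing per feed.
def index_feed(rss_text)
  Net::HTTP.post(SOLR_UPDATE_URL, rss_to_solr_xml(rss_text),
                 'Content-Type' => 'text/xml')
  Net::HTTP.post(SOLR_UPDATE_URL, '<commit/>', 'Content-Type' => 'text/xml')
end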

We looked at Nutch too - but that was before my time.

Jim



John Martyniak wrote:
> 
> Thank you, that is good information, as that is the direction I am
> leaning.
> 
> So when you fetch the content from RSS, does that get rendered to an  
> XML document that Solr indexes?
> 
> Also, what were a couple of the decision points for using Solr as
> opposed to Nutch, or even straight Lucene?
> 
> -John
> 
> 
> 
> On Oct 22, 2008, at 11:22 AM, Jim Murphy wrote:
> 
>>
>> We index RSS content using our own home-grown distributed spiders -
>> not using Nutch.  We use Ruby processes to do the feed fetching and
>> XML shredding, and Amazon SQS to queue up work packets to insert
>> into our Solr cluster.
>>
>> Sorry I can't be of more help.
>>
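
P.S. For completeness, the SQS hand-off I mentioned - shredders
enqueue work packets, inserters drain them into the Solr cluster -
looks roughly like this.  Again a sketch, not our code: we used an
earlier Ruby SQS library, so this shows today's aws-sdk-sqs gem, and
the queue name and packet shape are invented for illustration:

require 'aws-sdk-sqs'
require 'json'

sqs = Aws::SQS::Client.new(region: 'us-east-1')
queue_url = sqs.get_queue_url(queue_name: 'solr-insert-work').queue_url

# Producer side: a shredder enqueues one work packet per feed batch.
packet = { feed_url: 'http://example.com/rss',
           solr_add_xml: '<add>...</add>' }
sqs.send_message(queue_url: queue_url, message_body: JSON.generate(packet))

# Consumer side: an inserter drains packets and posts them to Solr.
resp = sqs.receive_message(queue_url: queue_url,
                           max_number_of_messages: 10,
                           wait_time_seconds: 20)  # long poll
resp.messages.each do |msg|
  work = JSON.parse(msg.body)
  # ... POST work['solr_add_xml'] to the Solr update handler ...
  sqs.delete_message(queue_url: queue_url,
                     receipt_handle: msg.receipt_handle)
end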
