We shred the RSS into individual items, then create Solr XML documents to insert. Solr was an easy choice for us over straight Lucene, since it adds the server infrastructure we would otherwise mostly be writing ourselves - caching, data types, master/slave replication.
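The shredding step can be sketched roughly like this in Ruby (our feed-processing language). This is a minimal illustration, not our actual code: the field names (`id`, `title_t`, `body_t`) and the RSS-to-field mapping are hypothetical and would depend on your Solr schema.

```ruby
require 'rexml/document'

# Shred an RSS feed string into a Solr <add> XML document.
# Field names below are illustrative, not a real schema.
def rss_to_solr_xml(rss_string)
  rss  = REXML::Document.new(rss_string)
  add  = REXML::Document.new
  root = add.add_element('add')

  # One Solr <doc> per RSS <item>
  rss.elements.each('rss/channel/item') do |item|
    doc = root.add_element('doc')
    { 'id'      => 'guid',         # hypothetical unique key field
      'title_t' => 'title',
      'body_t'  => 'description' }.each do |solr_field, rss_tag|
      value = item.elements[rss_tag]
      next unless value
      field      = doc.add_element('field', 'name' => solr_field)
      field.text = value.text
    end
  end
  add.to_s
end
```

The resulting `<add><doc>...</doc></add>` string is what gets POSTed to Solr's update handler.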
We looked at Nutch too - but that was before my time.

Jim


John Martyniak-3 wrote:
>
> Thank you, that is good information, as that is kind of the way that I am
> leaning.
>
> So when you fetch the content from RSS, does that get rendered to an
> XML document that Solr indexes?
>
> Also, what were a couple of decision points for using Solr as opposed
> to using Nutch, or even straight Lucene?
>
> -John
>
>
> On Oct 22, 2008, at 11:22 AM, Jim Murphy wrote:
>
>>
>> We index RSS content using our own home-grown distributed spiders - not
>> using Nutch. We use Ruby processes to do the feed fetching and XML
>> shredding, and Amazon SQS to queue up work packets to insert into our
>> Solr cluster.
>>
>> Sorry I can't be of more help.
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20113143.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>

--
View this message in context: http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20114697.html
Sent from the Solr - User mailing list archive at Nabble.com.