Thanks for the fast reply. Wow, this seems like a very active community. I have a few more questions in that case:

1) If Solr is going to be file-based, is it then preferable to run multiple
Solr instances with shards? How can I determine what capacity one Solr
instance can cope with?

2) I am presuming there are already tokenizers for hypertext and XML in Solr,
so that it can extract the right information?

3) I also need to get the 'author' information out for things like blogs. I
guess there's no universal way of doing it, and I have to have someone
manually go through the documents and feed the Solr index with the author
information?

When you mention 'write a loader script...', do you mean I should incorporate
the date checking in the loader script? Does Solr have no internal way of
checking the timestamp in a document and updating?

Thanks,
Nayeem

2009/4/24 Eric Pugh <ep...@opensourceconnections.com>

> It seems like you have three components to your system:
>
> 1) Data indexing from multiple sources
> 2) Search for specific words in documents
> 3) Preserve rating and search term.
>
> I think that Solr comes into play on #1 and #2. You can index content in
> any number of approaches, either via the new DataImportHandler architecture,
> or the more traditional write a loader script that puts the documents in
> Solr. You can store in Solr when a document was indexed, and use that to
> check against the original documents to see if they changed. Check a last
> published tag on an RSS feed, or the last updated time on a physical file.
> This is a very common use case for Solr.
>
> For #2, you could have users issue queries, and make them "favorites",
> storing them in the DB. Assuming they like the results they mark the
> documents with the ratings, which you could store in Solr, but I would put
> in a DB. Easier to manage User A says 1, User B says 0.
>
> Then for the UI, just issue the search based on queries stored in the DB,
> and match the id's up with the ranking in the DB. Simple!
>
> As far as the last part, Solr works best in filesystem, that is part of why
> it is so fast, no clunky SQL.
> There are scripts for backing up and
> restoring indexes that you can use, check the wiki
> http://wiki.apache.org/solr/SolrOperationsTools.
>
> Eric
>
> On Apr 24, 2009, at 6:18 AM, Developer In London wrote:
>
>> Hi All,
>>
>> I am new to the whole Solr/Lucene community. But I think this might be the
>> solution to what I am looking to do. I would appreciate any feedback on how
>> I can go about doing this with Solr:
>>
>> I am looking to make a system where -
>> a) mainly lots of different blog sites, web journals, articles are indexed
>> on a regular basis. Data that has already been indexed needs to be
>> revisited to see if there are any changes.
>> b) The end users have very fixed search terms, eg 'Lloyds TSB' and
>> 'Corporate Banking'. All the documents that are found matching this are
>> presented to a human to analyse.
>> c) Once the human analyses the document he gives it a rating of 1, 0 or -1.
>> This rating needs to be saved somewhere and be linked with the specific
>> document and also with the search term (eg 'Lloyds TSB' & 'Corporate
>> Banking' in this case).
>> d) End users can then see these documents with the ratings next to them.
>>
>> What would be the best approach to this?
>>
>> Should I set up a different database to save the rating and relevant
>> mappings, or is there any way to put it in to Solr?
>>
>> My 2nd question is, can Solr Index be saved in a database in any way?
>> What's the backup and recovery method on Solr?
>>
>> Thanks in advance.
>>
>> Nayeem
>
> -----------------------------------------------------
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com
> Free/Busy: http://tinyurl.com/eric-cal

--
cashflowclublondon.co.uk
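
P.S. To make the loader-script question concrete, here is a rough sketch of the date checking as I understand it from the quoted reply: compare each file's last-updated time with the time it was last indexed, and only re-feed the ones that changed. This is only an illustration under my own assumptions. The function names and the `indexed_times` mapping are hypothetical, and in a real loader the indexed-at time would presumably be read back from a date field stored in Solr rather than from a Python dict.

```python
import os
from datetime import datetime, timezone


def file_last_modified(path):
    """Return the file's last-modified time as an aware UTC datetime."""
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)


def needs_reindex(source_modified, indexed_at):
    """True if the source was never indexed, or changed after it was indexed."""
    return indexed_at is None or source_modified > indexed_at


def select_changed(paths, indexed_times):
    """Return the paths whose files are newer than their recorded index time.

    indexed_times maps path -> datetime recorded when the document was indexed
    (hypothetical stand-in for a stored date field looked up in Solr).
    """
    return [
        p for p in paths
        if needs_reindex(file_last_modified(p), indexed_times.get(p))
    ]
```

The loader would then send only the paths returned by `select_changed` to Solr and record a new indexed-at time for each. For RSS feeds the same comparison should work with the feed's last-published date in place of the file time.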