Thanks for the fast reply. Wow, this seems like a very active community.

I have a few more questions in that case:

1) If Solr is going to be file-based, is it then preferable to run multiple
Solr instances with shards? How can I determine what capacity a single Solr
instance can cope with?

2) I am presuming there are already tokenizers for hypertext and XML in Solr
so that it can extract the right information?

3) I also need to get the 'author' information out for things like blogs. I
guess there's no universal way of doing it, and I would have to have someone
manually go through the documents and feed the Solr index with the author
information?
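
To make questions 2 and 3 concrete, here is a rough sketch of what I imagine
doing on my side before a document reaches Solr, in case nothing built in
covers it. It uses only Python's standard html.parser. The names
PageExtractor and extract_page are made up for illustration, and treating
<meta name="author"> as the author source is my own assumption: many blogs
will not have it, in which case the result is None and a human would have to
fill it in.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and a <meta name="author"> value."""

    SKIPPED_TAGS = {"script", "style"}  # their contents are not searchable prose

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []
        self.author = None  # stays None when the page declares no author

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "meta" and self.author is None:
            attr = dict(attrs)
            # Assumption: the author, if present, is in <meta name="author">.
            if (attr.get("name") or "").lower() == "author":
                self.author = (attr.get("content") or "").strip() or None

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_page(html):
    """Return (plain_text, author_or_None) for one HTML document."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks), parser.author
```

A page with no author tag would come back with None as the second value,
which is where I expect the manual step to come in.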

When you mention 'write a loader script...', do you mean I should
incorporate the date checking in the loader script? Solr has no internal way
of checking the timestamp in a document and updating?
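
To check my understanding of the date checking, I picture the loader doing
something like the sketch below: compare the source's last-modified time with
the time stored at indexing, and only build a new Solr <add> message when the
source is newer. The field names id, text and indexed_at are assumptions about
a schema I have not written yet, and the function names are only illustrative.
Actually sending the XML to Solr's update handler is left out.

```python
import os
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def needs_reindex(source_modified, indexed_at):
    """True if the document was never indexed or changed after it was indexed."""
    return indexed_at is None or source_modified > indexed_at


def file_modified(path):
    """Last-modified time of a physical file, as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)


def solr_add_xml(doc_id, text, indexed_at):
    """Build a Solr XML <add> message; field names are assumed, not from a real schema."""
    # Solr date fields expect UTC in the form 1995-12-31T23:59:59Z.
    stamp = indexed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'
        f'<field name="text">{escape(text)}</field>'
        f'<field name="indexed_at">{stamp}</field>'
        "</doc></add>"
    )
```

For an RSS feed, I assume the last published date would take the place of
file_modified.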



Thanks,

Nayeem

2009/4/24 Eric Pugh <ep...@opensourceconnections.com>

> It seems like you have three components to your system:
>
> 1) Data indexing from multiple sources
>
> 2) Search for specific words in documents
>
> 3) Preserve rating and search term.
>
> I think that Solr comes into play on #1 and #2.  You can index content in
> any number of ways, either via the new DataImportHandler architecture, or
> the more traditional approach of writing a loader script that puts the
> documents in Solr.  You can store in Solr when a document was indexed, and
> use that to check against the original documents to see if they changed.
> Check a last published tag on an RSS feed, or the last updated time on a
> physical file.  This is a very common use case for Solr.
>
> For #2, you could have users issue queries, and make them "favorites",
> storing them in the DB.  Assuming they like the results, they mark the
> documents with the ratings, which you could store in Solr, but I would put
> them in a DB.  It is easier to manage "User A says 1, User B says 0" there.
>
> Then for the UI, just issue the search based on queries stored in the DB,
> and match the ids up with the ranking in the DB.  Simple!
>
> As far as the last part, Solr works best on the filesystem; that is part of
> why it is so fast: no clunky SQL.  There are scripts for backing up and
> restoring indexes that you can use; check the wiki:
> http://wiki.apache.org/solr/SolrOperationsTools.
>
> Eric
>
>
>
>
> On Apr 24, 2009, at 6:18 AM, Developer In London wrote:
>
>> Hi All,
>>
>> I am new to the whole Solr/Lucene community. But I think this might be the
>> solution to what I am looking to do. I would appreciate any feedback on how
>> I can go about doing this with Solr:
>>
>> I am looking to make a system where -
>> a) mainly lots of different blog sites, web journals, articles are indexed
>> on a regular basis. Data that has already been indexed needs to be revisited
>> to see if there are any changes.
>> b) The end users have very fixed search terms, e.g. 'Lloyds TSB' and
>> 'Corporate Banking'. All the documents that are found matching these are
>> presented to a human to analyse.
>> c) Once the human analyses the document he gives it a rating of 1, 0 or -1.
>> This rating needs to be saved somewhere and be linked with the specific
>> document and also with the search term (e.g. 'Lloyds TSB' & 'Corporate
>> Banking' in this case).
>> d) End users can then see these documents with the ratings next to them.
>>
>> What would be the best approach to this?
>>
>> Should I set up a different database to save the rating and relevant
>> mappings, or is there any way to put it in to Solr?
>>
>> My 2nd question is, can the Solr index be saved in a database in any way?
>> What's the backup and recovery method on Solr?
>>
>> Thanks in advance.
>>
>> Nayeem
>>
>
> -----------------------------------------------------
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com
> Free/Busy: http://tinyurl.com/eric-cal
>
>
>
>
>
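
P.S. To check I have understood the suggestion of keeping ratings in a DB
rather than in Solr, this is roughly the shape I have in mind, using SQLite
purely as a stand-in. The table and function names are my own invention, and I
have left out a per-user column for brevity; "User A says 1, User B says 0"
would need one added to the key.

```python
import sqlite3


def open_ratings(path=":memory:"):
    """Open (or create) a hypothetical ratings store keyed by search term and document id."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ratings (
               search_term TEXT NOT NULL,
               doc_id      TEXT NOT NULL,
               rating      INTEGER NOT NULL CHECK (rating IN (-1, 0, 1)),
               PRIMARY KEY (search_term, doc_id)
           )"""
    )
    return conn


def save_rating(conn, search_term, doc_id, rating):
    """Insert a rating, replacing any earlier rating for the same term and document."""
    conn.execute(
        "INSERT OR REPLACE INTO ratings (search_term, doc_id, rating) VALUES (?, ?, ?)",
        (search_term, doc_id, rating),
    )
    conn.commit()


def attach_ratings(conn, search_term, doc_ids):
    """Pair each document id from a search result with its rating, or None if unrated."""
    rows = conn.execute(
        "SELECT doc_id, rating FROM ratings WHERE search_term = ?", (search_term,)
    )
    by_id = dict(rows)
    return [(doc_id, by_id.get(doc_id)) for doc_id in doc_ids]
```

The doc_ids passed to attach_ratings would be whatever ids Solr returns for
the stored query, so the UI only has to join the two lists.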


-- 
cashflowclublondon.co.uk

                      ("`-''-/").___..--''"`-._
                       `6_ 6  )   `-.  (     ).`-.__.`)
                       (_Y_.)'  ._   )  `._ `. ``-..-'
                     _..`--'_..-_/  /--'_.' ,'
                    (il),-''  (li),'  ((!.-'
.
