I've used a similar strategy for Simpy.com, though with raw Lucene rather than Solr. The crucial piece is to periodically close inactive user indices and thus free the memory they hold. Are you doing the same with your per-user Solr cores and still running into memory issues?
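To make that concrete, here is a rough sketch of the idea in Java with current Lucene APIs -- not Simpy's actual code. An access-ordered LinkedHashMap closes the least-recently-used reader once a cap is hit; the cap, the /indexes/<userId> path layout, and the class name are placeholders:

import java.io.IOException;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

/** Keeps at most MAX_OPEN per-user readers open; idle ones get closed. */
public class UserIndexCache {
  private static final int MAX_OPEN = 500; // illustrative; tune to your heap

  private final Map<String, DirectoryReader> readers =
      new LinkedHashMap<String, DirectoryReader>(16, 0.75f, true) { // access order
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, DirectoryReader> eldest) {
          if (size() <= MAX_OPEN) {
            return false;
          }
          try {
            eldest.getValue().close(); // frees the heap the idle index was holding
          } catch (IOException ignored) {
          }
          return true;
        }
      };

  /** Opens (or reuses) the reader for one user's index directory. */
  public synchronized IndexSearcher searcherFor(String userId) throws IOException {
    DirectoryReader reader = readers.get(userId);
    if (reader == null) {
      reader = DirectoryReader.open(FSDirectory.open(Paths.get("/indexes", userId)));
      readers.put(userId, reader);
    }
    return new IndexSearcher(reader);
  }
}

In production you would also reference-count the readers (incRef/decRef, or use SearcherManager) so a reader is never closed while a search is still running against it.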
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Mark Ferguson <mark.a.fergu...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Friday, February 20, 2009 1:14:15 AM
> Subject: Near real-time search of user data
>
> Hi,
>
> I am trying to come up with a strategy for a Solr setup in which a user's
> indexed data can be almost immediately available to them for search. My
> current strategy (which is starting to cause problems) is as follows:
>
> - Each user has their own personal index (core), which gets committed
>   after each update.
> - There is a main index, which is basically an aggregate of all user
>   indexes. This index gets committed every 5 minutes or so.
>
> In this way, I can search a user's personal index to get real-time
> results and concatenate the world results from the main index, which
> aren't as important to be immediate.
>
> This multicore strategy worked well in test scenarios, but as the user
> indexes get larger it is starting to fall apart as I run into memory
> issues maintaining so many cores. It's not realistic to dedicate a new
> machine to every 5K-10K users, but that is what it looks like it will
> take to keep the multicore strategy going.
>
> So I am hoping that someone will be able to provide some tips on how to
> accomplish what I am looking for. One option is to simply send a commit
> to the main index every couple of seconds, but I was hoping someone with
> experience could shed some light on whether this is viable before I
> attempt that route (i.e. can commits be sent that frequently on a large
> index?). The indexes are distributed, but they could still be in the
> 2-100GB range.
>
> Thanks very much for any suggestions!
>
> Mark
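P.S. On the Solr side, "closing" an idle core maps to CoreAdmin UNLOAD (the index files stay on disk; CREATE brings the core back when the user returns). A rough SolrJ sketch of a reaper that does this -- the URL, idle threshold, and core-naming scheme are illustrative, not taken from your setup:

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

/** Periodically unloads per-user cores that haven't been touched recently. */
public class IdleCoreReaper {
  private static final long IDLE_MS = 30 * 60 * 1000L; // 30 minutes, illustrative

  private final SolrClient admin =
      new HttpSolrClient.Builder("http://localhost:8983/solr").build();
  private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();

  /** Call this on every search or update against a user's core. */
  public void touch(String coreName) {
    lastUsed.put(coreName, System.currentTimeMillis());
  }

  /** Run from a scheduled task; unloads cores idle longer than IDLE_MS. */
  public void reap() throws SolrServerException, IOException {
    long now = System.currentTimeMillis();
    for (Map.Entry<String, Long> e : lastUsed.entrySet()) {
      if (now - e.getValue() > IDLE_MS) {
        // UNLOAD closes the core's searcher and frees its heap; the index
        // files remain on disk and the core can be re-created on demand.
        CoreAdminRequest.unloadCore(e.getKey(), admin);
        lastUsed.remove(e.getKey());
      }
    }
  }
}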