Hi,

I am trying to come up with a strategy for a Solr setup in which a user's
indexed data becomes searchable for them almost immediately. My current
strategy (which is starting to cause problems) is as follows:

  - each user has their own personal index (core), which is committed
after each update
  - there is a main index, which is basically an aggregate of all the user
indexes. This index is committed every 5 minutes or so.
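For concreteness, the per-user half of the setup above could look roughly like this sketch. It only builds the classic Solr XML update bodies with the standard library; the core name, field names, and URL in the comments are hypothetical, not part of the original setup:

```python
from xml.sax.saxutils import escape

def add_doc_xml(fields):
    """Build a classic Solr XML <add> body for a single document."""
    field_xml = "".join(
        '<field name="%s">%s</field>' % (escape(name), escape(str(value)))
        for name, value in fields.items()
    )
    return "<add><doc>%s</doc></add>" % field_xml

def commit_xml():
    """Body POSTed to /solr/<core>/update to make pending changes searchable."""
    return "<commit/>"

# Hypothetical per-user flow: index a document into that user's core,
# then commit immediately so it is searchable in the personal index.
body = add_doc_xml({"id": "doc-42", "owner": "user123", "text": "hello"})
# POST body to http://localhost:8983/solr/user123/update (URL is an
# assumption), then POST commit_xml() to the same endpoint.
```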

This way, I can search a user's personal index to get real-time results,
and append the "world" results from the main index, where immediacy
matters less.
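The merge step described above can be sketched as follows. The result shape and the "id" key are assumptions (Solr returns documents keyed by whatever unique field the schema defines); the idea is just to prefer the fresher personal copy when a document appears in both result sets:

```python
def merge_results(personal_docs, world_docs, key="id"):
    """Concatenate real-time personal hits with main-index hits,
    keeping the personal copy when the same document shows up in both."""
    seen = set()
    merged = []
    for doc in list(personal_docs) + list(world_docs):
        if doc[key] in seen:
            continue  # the main-index copy may lag; keep the personal one
        seen.add(doc[key])
        merged.append(doc)
    return merged
```

For example, if the personal index already has an updated copy of document "a" that the main index hasn't committed yet, the personal version wins and the world results only contribute documents not seen so far.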

This multicore strategy worked well in test scenarios, but as the user
indexes grow it is starting to fall apart: I run into memory problems
maintaining that many cores. Dedicating a new machine to every 5K-10K
users isn't realistic, yet that seems to be what it would take to keep
the multicore strategy going.

So I am hoping someone can offer some tips on how to accomplish what I am
looking for. One option is simply to send a commit to the main index every
couple of seconds, but I was hoping someone with experience could say
whether that is viable before I attempt it (i.e., can commits be issued
that frequently against a large index?). The indexes are distributed, but
individual indexes could still be in the 2-100GB range.
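If frequent commits on the main index do turn out to be viable, the "commit every couple seconds" loop is simple to sketch. Here the actual HTTP call is injected as a function so the loop itself is self-contained; in practice commit_fn would POST "<commit/>" to the main index's /update endpoint (that URL, and the 2-second default, are assumptions):

```python
import threading

class PeriodicCommitter:
    """Calls commit_fn every `interval` seconds until stopped."""

    def __init__(self, commit_fn, interval=2.0):
        self.commit_fn = commit_fn
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout (time to commit)
        # and True once stop() has set the event.
        while not self._stop.wait(self.interval):
            self.commit_fn()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Whether Solr can actually absorb a commit every two seconds on a 2-100GB index is exactly the open question here; this only shows how the client side of that option might be wired up.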

Thanks very much for any suggestions!

Mark
