Re: Near real-time search of user data

Mark Ferguson Thu, 19 Feb 2009 21:10:50 -0800

Thanks Noble and Otis for your suggestions.

After reading more messages on the mailing list relating to this problem, I
decided to implement one suggestion, which was to keep an archive index and a
smaller delta index containing only recent updates, then do a distributed
search across them. The delta index is small, so it can handle rapid commits
(every 1-2 seconds). This setup works well for my architecture because it is
easy to keep track of recent changes in the database, send those to the
archive index every hour or so, and then clear out the delta.
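
A minimal sketch of what the distributed query across the two indexes might
look like, assuming both are exposed as cores on one Solr instance. The host,
port, core names (archive, delta) and the user_id field are placeholders, not
taken from the actual setup; the only Solr feature relied on is the `shards`
request parameter used for distributed search.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses: host:port/path with no "http://" prefix,
# which is the form the Solr `shards` parameter expects.
ARCHIVE_SHARD = "localhost:8983/solr/archive"
DELTA_SHARD = "localhost:8983/solr/delta"


def build_distributed_query(base_url, shards, query, rows=10):
    """Return a select URL that asks Solr to search every listed shard."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated shard list; Solr merges the per-shard results.
        "shards": ",".join(shards),
    }
    return base_url + "/select?" + urlencode(params)


# Any core can coordinate the request; here the archive core does.
url = build_distributed_query(
    "http://" + ARCHIVE_SHARD,
    [ARCHIVE_SHARD, DELTA_SHARD],
    "user_id:42",
)
print(url)
```

One thing to watch with this layout: as far as I know, distributed search
expects unique keys to be unique across shards, so a document that has been
updated in the delta but still exists in the archive may need special
handling until the hourly merge runs.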


I really like your ideas about closing inactive indexes when using a
multicore setup; having too many indexes open was definitely the issue
plaguing me. Thanks for your great ideas and the time you take on this
project!
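
For anyone finding this thread later, the idea of capping the number of open
indexes and closing the least recently used one can be sketched roughly as
below. This is only an illustration of the bookkeeping, not the SOLR-880
patch and not a Solr API: LruCoreRegistry, open_core and close_core are
hypothetical names, and the two callbacks stand in for whatever actually
loads and unloads a core.

```python
from collections import OrderedDict


class LruCoreRegistry:
    """Keeps at most `max_open` cores open, closing the least recently used."""

    def __init__(self, max_open, open_core, close_core):
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self.max_open = max_open
        self._open_core = open_core    # hypothetical: name -> handle
        self._close_core = close_core  # hypothetical: (name, handle) -> None
        self._cores = OrderedDict()    # least recently used entry comes first

    def get(self, name):
        """Return the handle for `name`, opening the core if it is closed."""
        if name in self._cores:
            # Mark as most recently used.
            self._cores.move_to_end(name)
            return self._cores[name]
        # Make room before opening a new core.
        while len(self._cores) >= self.max_open:
            old_name, old_handle = self._cores.popitem(last=False)
            self._close_core(old_name, old_handle)
        handle = self._open_core(name)
        self._cores[name] = handle
        return handle

    def open_names(self):
        """Names of currently open cores, least recently used first."""
        return list(self._cores)


closed = []
registry = LruCoreRegistry(
    max_open=2,
    open_core=lambda name: "handle-" + name,
    close_core=lambda name, handle: closed.append(name),
)
for user in ["alice", "bob", "alice", "carol"]:
    registry.get(user)
print(registry.open_names(), closed)
```

A real version would also need locking if requests arrive concurrently, and
would have to make sure a core is not closed while a request is still using
it.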

Mark



On Thu, Feb 19, 2009 at 9:31 PM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.p...@gmail.com> wrote:

> We have a similar use case and I have raised an issue for it
> (SOLR-880).
> Currently we are using an internal patch and we hope to submit one soon.
>
> We also use an LRU-based automatic loading/unloading feature. If a
> request comes in for a core that is 'STOPPED', the core is 'STARTED'
> and the request is served.
>
> We keep an upper limit on the number of cores kept loaded, and if
> the limit is crossed, the least recently used core is 'STOPPED'.
>
> --Noble
>
>
> On Fri, Feb 20, 2009 at 8:53 AM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
> >
> > I've used a similar strategy for Simpy.com, but with raw Lucene and not
> > Solr.  The crucial piece is to close (inactive) user indices periodically
> > and thus free the memory.  Are you doing the same with your per-user Solr
> > cores and still running into memory issues?
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > ----- Original Message ----
> >> From: Mark Ferguson <mark.a.fergu...@gmail.com>
> >> To: solr-user@lucene.apache.org
> >> Sent: Friday, February 20, 2009 1:14:15 AM
> >> Subject: Near real-time search of user data
> >>
> >> Hi,
> >>
> >> I am trying to come up with a strategy for a solr setup in which a user's
> >> indexed data can be nearly immediately available to them for search. My
> >> current strategy (which is starting to cause problems) is as follows:
> >>
> >>   - each user has their own personal index (core), which gets committed
> >> after each update
> >>   - there is a main index which is basically an aggregate of all user
> >> indexes. This index gets committed every 5 minutes or so.
> >>
> >> In this way, I can search a user's personal index to get real-time results,
> >> and concatenate the world results from the main index, which aren't as
> >> important to be immediate.
> >>
> >> This multicore strategy worked well in test scenarios but as the user
> >> indexes get larger it is starting to fall apart as I run into memory issues
> >> in maintaining too many cores. It's not realistic to dedicate a new machine
> >> to every 5K-10K users and I think this is what I will have to do to maintain
> >> the multicore strategy.
> >>
> >> So I am hoping that someone will be able to provide some tips on how to
> >> accomplish what I am looking for. One option is to simply send a commit to
> >> the main index every couple seconds, but I was hoping someone with
> >> experience could shed some light on whether this is a viable option before I
> >> attempt that route (i.e. can commits be sent that frequently on a large
> >> index?). The indexes are distributed but they could still be in the 2-100GB
> >> range.
> >>
> >> Thanks very much for any suggestions!
> >>
> >> Mark
> >
> >
>
>
>
> --
> --Noble Paul
>
