Re: High add/delete rate and index fragmentation

Lance Norskog Thu, 03 Dec 2009 16:07:45 -0800

#1: Yes, compared to relational DBs, Solr/Lucene in general is biased
towards slow indexing and fast queries. Lucene automatically merges
segments to keep fragmentation down, and the rate of merging can be
controlled.
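
As a rough illustration of how a merge setting trades indexing work
against segment count, here is a toy model in Python. It is not Lucene's
merge policy code; it only assumes the simplified rule that once
merge_factor segments pile up at one level they are merged into a single
segment at the next level. The names flush, segment_count and
merge_factor are made up for this sketch.

```python
from typing import Dict


def flush(levels: Dict[int, int], merge_factor: int, level: int = 0) -> None:
    """Add one segment at `level`; merge upward when that level fills up."""
    levels[level] = levels.get(level, 0) + 1
    if levels[level] == merge_factor:
        # merge_factor segments at this level collapse into one larger
        # segment at the next level (simplified assumption, see lead-in).
        levels[level] = 0
        flush(levels, merge_factor, level + 1)


def segment_count(flushes: int, merge_factor: int) -> int:
    """Number of segments left after `flushes` flushes in this toy model."""
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels: Dict[int, int] = {}
    for _ in range(flushes):
        flush(levels, merge_factor)
    return sum(levels.values())
```

In this model a smaller merge factor leaves fewer segments to search but
merges more often while indexing; a larger one does the opposite.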


#2: The standard architecture is with a master that only does indexing
and one or more slaves that only handle queries. The slaves poll the
master for index updates regularly. Solr 1.4 has a built-in system for
this.
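
The polling idea can be sketched generically in Python. This is not
Solr's replication code: get_master_version and pull_index are
hypothetical callables standing in for "ask the master which index
version it has" and "copy the new index files", and the version numbers
are plain integers chosen for the example.

```python
import time
from typing import Callable


def poll_master(get_master_version: Callable[[], int],
                pull_index: Callable[[int], None],
                local_version: int,
                polls: int,
                interval_seconds: float = 60.0,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll the master `polls` times; pull whenever its version is newer.

    Returns the slave's index version after the last poll.
    """
    for i in range(polls):
        master_version = get_master_version()
        if master_version > local_version:
            # The master has a newer index: fetch it and remember the version.
            pull_index(master_version)
            local_version = master_version
        if i < polls - 1:
            sleep(interval_seconds)  # wait before asking again
    return local_version
```

The slave keeps serving queries from its current copy between polls, so
the polling interval bounds how stale its results can be.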

#3: The standard architecture puts the query servers behind a load
balancer. It's the load balancer's job to watch for query servers
coming on and off line.
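
A minimal sketch of that job, in Python: rotate over the query servers
and skip any that fail a health probe. RoundRobinBalancer and is_healthy
are hypothetical names for this example; a real load balancer would
probe each server over the network rather than call a local function.

```python
from typing import Callable, Iterable, List


class RoundRobinBalancer:
    """Toy load balancer: rotates over servers that pass a health check."""

    def __init__(self, servers: Iterable[str],
                 is_healthy: Callable[[str], bool]):
        self.servers: List[str] = list(servers)
        self.is_healthy = is_healthy  # hypothetical probe, e.g. an HTTP ping
        self._next = 0

    def pick(self) -> str:
        """Return the next healthy server, skipping any that fail the probe."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if self.is_healthy(server):
                return server
        raise RuntimeError("no healthy query servers available")
```

Because the probe runs on every pick, a server that comes back online
rejoins the rotation without any extra bookkeeping.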

An alternate architecture has multiple servers which do both indexing
and queries in the same index. This provides the shortest "pipeline"
time from receiving the data to making it available for search.

On Wed, Dec 2, 2009 at 2:43 PM, Jason Rutherglen
<jason.rutherg...@gmail.com> wrote:
> Rodrigo,
>
> It sounds like you're asking about near realtime search support,
> though I'm not sure. So here are a few ideas.
>
> #1 How often do you need to be able to search on the latest
> updates (as opposed to updates from, let's say, 10 minutes ago)?
>
> To topic #2, Solr provides master slave replication. The
> optimize would happen on the master and the new index files
> replicated to the slave(s).
>
> #3 is a mixed bag at this point, and there is no official
> solution, yet. Shell scripts, and a load balancer could kind of
> work. Check out SOLR-1277 or SOLR-1395 for progress along these
> lines.
>
> Jason
> On Wed, Dec 2, 2009 at 11:53 AM, Rodrigo De Castro <rodr...@sacaluta.com> 
> wrote:
>> We are considering Solr to store events which will be added and deleted from
>> the index at a very high rate. Solr will be used, in this case, to find the
>> right event we need to process (since they may have several attributes and
>> we may search the best match based on the query attributes). Our
>> understanding is that the common use cases are those wherein the read rate
>> is much higher than writes, and deletes are not as frequent, so we are not
>> sure Solr handles our use case very well or if it is the right fit. Given
>> that, I have a few questions:
>>
>> 1 - How does Solr/Lucene degrade with the fragmentation? That would probably
>> determine the rate at which we would need to optimize the index. I presume
>> that it depends on the rate of insertions and deletions, but would you have
>> any benchmark on this degradation? Or, in general, what has been your
>> experience with this use case?
>>
>> 2 - Optimizing seems to be a very expensive process. While optimizing the
>> index, how much does search performance degrade? In this case, having a huge
>> degradation would not allow us to optimize unless we switch to another copy
>> of the index while optimize is running.
>>
>> 3 - In terms of high availability, what has been your experience detecting
>> failure of master and having a slave taking over?
>>
>> Thanks,
>> Rodrigo
>>
>



-- 
Lance Norskog
goks...@gmail.com
