Re: Possibilities of (near) real time search with solr

Peter Sturge Tue, 16 Nov 2010 01:57:22 -0800

Hi Peter,

First off, many thanks for putting together the NRT Wiki page!

This may have changed recently, but the NRT stuff (e.g. per-segment
commits etc.) is for the latest Solr 4 trunk only.
If your setup uses the 3.x Solr code branch, then there's a bit of work
to do to move to the new version.
Some of this is due to the new 3.x Lucene, which has a lot of cool new
stuff in it but also deprecates a lot of old stuff,
so existing SolrJ clients and custom server-side code/configuration
will need to take this into account.
We've not had the time to do this, so that's about as far as I can go
on that one for now.

We have had some very good success with distributed/shard searching,
i.e. 'new' data arrives in a relatively small index and so can remain
fast, whilst distributed shards hold 'older' data and so can keep
their caches warm (i.e. very few/no commits). This works particularly
well for summary data (facets, filter queries etc. that sit in
caches).
Be careful about merging, as all involved cores will pause for the
merging period. It really needs to be done out-of-hours or, better still,
offline (i.e. replicate the cores, then merge, then bring them live).
The trickiest bit about the above is defining when data is deemed to
be 'old' and then moving that data in an efficient manner to a
read-only shard. Using SolrJ can help in this regard as it can offload
some of the administration from the server(s).
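A minimal sketch of the layout above, assuming plain HTTP requests rather than any particular SolrJ class: a distributed query that lists both the small 'recent' core and a read-only archive shard in Solr's `shards` parameter, a date-math range query for picking out 'old' documents, and a CoreAdmin `mergeindexes` request for merging index directories into an offline copy of a core. The host names, core names, index paths and the `created_at` date field are hypothetical placeholders; the code only builds the request strings, which you would then send with your HTTP client or SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Sketch only: builds request strings for a "small recent core + read-only
// archive shards" layout. Hosts, cores, paths and "created_at" are hypothetical.
final class ShardLayoutSketch {
    private ShardLayoutSketch() {}

    // URL-encode a single parameter value.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Distributed search: send the query to one core and list every shard
    // (recent + archive) as host:port/path in the "shards" parameter.
    static String distributedQueryUrl(String coreUrl, List<String> shards, String q) {
        return coreUrl + "/select?q=" + enc(q)
                + "&shards=" + enc(String.join(",", shards));
    }

    // Range query matching docs older than the given number of hours. Rounding
    // NOW down to the hour keeps the query string stable within each hour.
    static String olderThanQuery(String dateField, int hours) {
        return dateField + ":[* TO NOW/HOUR-" + hours + "HOURS]";
    }

    // CoreAdmin request merging the given index directories into targetCore.
    // Intended for an offline copy; the target still needs a commit afterwards.
    static String mergeIndexesUrl(String solrUrl, String targetCore, List<String> indexDirs) {
        StringBuilder sb = new StringBuilder(solrUrl)
                .append("/admin/cores?action=mergeindexes&core=")
                .append(enc(targetCore));
        for (String dir : indexDirs) {
            sb.append("&indexDir=").append(enc(dir));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(distributedQueryUrl("http://localhost:8983/solr/recent",
                Arrays.asList("localhost:8983/solr/recent", "archive1:8983/solr/archive"),
                "text:solr"));
        // Candidate docs to copy to the archive shard, then delete from 'recent'.
        System.out.println(olderThanQuery("created_at", 1));
        System.out.println(mergeIndexesUrl("http://localhost:8983/solr", "archive-copy",
                Arrays.asList("/data/solr/recent/data/index")));
    }
}
```

Querying through a single entry point with an explicit `shards` list keeps clients unaware of where the cutoff between 'new' and 'old' currently sits; only the mover job needs to know the date field and cutoff.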

Thanks,
Peter

On Mon, Nov 15, 2010 at 8:06 PM, Peter Karich <peat...@yahoo.de> wrote:
> Hi,
>
> I want my indexed docs (tweets) to be searchable relatively fast: 1 to 10 sec,
> or even 30 sec, would be OK.
>
> At the moment I am using the read-only core scenario described here (point
> 5)* with a commit frequency of 180 seconds, which was fine until a few days
> ago. (I am using Solr 1.4.1.)
> Now commits take too long (40-80 s) and are too CPU-heavy because the index
> has grown too large (>7 GB).
>
> I thought about some possible solutions:
> 1. using the Solr NRT patches**
> 2. using shards (+ multicore), where I feed into a relatively small core and
> merge the cores later (every hour or so) to reduce their number
> 3. It would also be nice if someone could explain whether there are any
> benefits to using Solr 4.0, and what they are ...
>
> The problem with 1. is that I haven't found a guide on how to apply all the
> patches. Or is NRT not possible with Solr at the moment? Does anybody have a
> link for me?
>
> Then I looked into solution 2. It seems to me that the CPU and
> administration overhead of sharding can be quite high. Any hints (I am using
> SolrJ)? E.g. I need to include the date facet patch.
>
> Or how would you solve this?
>
> Regards,
> Peter.
>
> *
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201009.mbox/%3caanlktincgekjlbxe_bsaahlct_hlr_kwuxm5zxovt...@mail.gmail.com%3e
>
> **
> https://issues.apache.org/jira/browse/SOLR-1606
>
>
> --
> http://jetwick.com twitter search prototype
>
