Re: exceeded limit of maxWarmingSearchers ERROR

Nagendra Nagarajayya Tue, 16 Aug 2011 18:28:54 -0700

Naveen:

See below:

*NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a
document to become searchable*. Any document that you add through update
becomes  immediately searchable. So no need to commit from within your
update client code.  Since there is no commit, the cache does not have to be
cleared or the old searchers closed or  new searchers opened, and warmed
(error that you are facing).



The link you mentioned is clearly what we wanted. But the real question is
that you wrote "RA does need a commit for a document to become searchable"
(please take a look at the bold sentence).

Yes, as said earlier, you do not need a commit. A document becomes searchable as soon as you add it. Below is an example of adding a document with curl (this is from the wiki at http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x):


curl "http://localhost:8983/solr/update/csv?stream.file=/tmp/x1.csv&encapsulator=%1f"

There is no commit included. The contents of the document become immediately searchable.

In the future, for higher loads, can it work with Master/Slave (replication)
etc. to scale and perform better? If yes, we would like to go for NRT, and
the performance described in the article is acceptable. We were expecting
the same real-time performance for a single user.

There are no changes to the Master/Slave (replication) process. So whatever you have configured currently will work as before, and if you enable replication later, it should still work as it does without NRT.

What about multiple users: should we wait for 1-2 secs before calling the
curl request to make SOLR perform better? Or will it handle multiple
requests internally (multithreaded etc.)?

Again, for updating documents you do not have to change your current process or code. Everything remains the same, except that if you were including a commit, you no longer include it in your update statements. There is no change to the existing update process, so internally it will not queue or multi-thread updates. It works as in existing Solr functionality; there are no changes to the existing setup.

Regarding performing better, in the wiki paper every update through curl adds (streams) 500 documents, so you could take this approach. (This was something that I chose randomly to test the performance, but it seems to be good.)
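
The batching approach described above (about 500 documents per update, with no commit) could be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `batches` and `update_url` and the batch file paths are hypothetical and not part of Solr or RankingAlgorithm; the endpoint and the `stream.file` and `encapsulator` parameters are the ones from the curl example.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example in this thread.
SOLR_CSV_UPDATE = "http://localhost:8983/solr/update/csv"


def batches(rows, size=500):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def update_url(csv_path):
    """Build the streaming update URL for one batch file.

    No commit parameter is added; only stream.file and encapsulator are sent.
    """
    params = {"stream.file": csv_path, "encapsulator": "\x1f"}
    return SOLR_CSV_UPDATE + "?" + urlencode(params)


if __name__ == "__main__":
    rows = ["doc-%d" % i for i in range(1200)]  # stand-in for real CSV lines
    for n, batch in enumerate(batches(rows)):
        # Hypothetical file name: each batch would be written to this path
        # before the update request is issued.
        print(len(batch), update_url("/tmp/batch_%d.csv" % n))
```

The sketch only builds the requests; sending them (for example with curl, as above) is left unchanged.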

What doc size (10,000 docs) would allow the JVM to perform better? Have you
done any kind of benchmarking in terms of multi-threaded and multi-user
loads for NRT, and also JVM tuning in terms of SOLR server performance? Any
kind of performance analysis would help us decide quickly whether to switch
over to NRT.

The performance discussed in the wiki paper uses the MBArtists index. The MBArtists index is the index used as one of the examples in the book Solr 1.4 Enterprise Search Server. You can download and build this index if you have the book, or you can download the contents from musicbrainz.org. Each doc may be about 100 bytes and has about 7 fields. With wikipedia's XML dump, commenting out the skipdoc field (include redirects) in the dataconfig.xml [ dataimport handler ], the update performance is about 15000 docs / sec (100 million docs); with skipdoc enabled (does not skip redirects), the performance is about 1350 docs / sec [ time spent mostly converting/validating XML rather than on the actual update ] (about 11 million docs). Documents in wikipedia can be quite big, with an average size of at least about 2500-5000 bytes or more.
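
To put these rates side by side with the figure from the original question in this thread (15,000 docs in almost 3 minutes), the arithmetic can be sketched as below. The function names are hypothetical, and treating each run as indexing at a constant rate is an assumption made only for illustration.

```python
def docs_per_second(doc_count, seconds):
    """Average indexing rate for `doc_count` documents indexed in `seconds`."""
    return doc_count / seconds


def hours_to_index(doc_count, rate):
    """Wall-clock hours to index `doc_count` documents at `rate` docs/sec."""
    return doc_count / rate / 3600


if __name__ == "__main__":
    # Rate implied by "almost 3 mins for 15,000 docs" in the original question.
    print(docs_per_second(15000, 3 * 60))
    # Wikipedia dump figures quoted above, assuming a constant rate throughout.
    print(hours_to_index(100_000_000, 15000))
    print(hours_to_index(11_000_000, 1350))
```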

I would suggest that you download NRT with Apache Solr 3.3 and RankingAlgorithm and give it a try to get a feel for it, as this would be the best way to see how your config works with it.

Questions in terms of switching over to NRT:


1. Should we upgrade to SOLR 4.x?

2. Any benchmarking (10,000 docs/sec)? The question here is more specific to
the details of an individual doc (fields, number of fields, field size,
parameters affecting performance with or without faceting).


Please see the MBArtists index as discussed above.

3. What about multiple users ?

A user in real time might have a large doc size of 0.1 million. How do we
break this up and analyze which approach is better (though it is our task to
do)? Still, any kind of breakdown will help us. Imagine a user inbox.

You may be able to stream the documents in sets, as in the example in the wiki. The example streams 500 documents at a time. The wiki paper has an example of a document that was used. You could copy/paste that to try it out.

4. JVM tuning and performance result based on Multithreaded environment.

5. Machine Details (RAM, CPU, and settings from SOLR perspective).

Default Solr settings with the shipped jetty container. The startup script used is available when you download Solr 3.3 with RankingAlgorithm. It has mx set to 2 GB and uses the default collector with parallel collection enabled for the young generation. The system is an x86_64 Linux (2.6 kernel), 2 core (2.5 GHz), and uses internal disks for indexing.

My suggestion would be to download a version of Solr 3.3 with RankingAlgorithm and give it a try to see if any changes are needed from your existing setup.


Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

Hoping that you are getting my point. We want to benchmark the performance.
If you can involve me in your group, that would be great.

Thanks
Naveen



2011/8/15 Nagendra Nagarajayya <nnagaraja...@transaxtions.com>

Bill:

I did look at Mark's performance tests. They look very interesting.

Here is the Apache Solr 3.3 with RankingAlgorithm NRT performance:
http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x


Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



On 8/14/2011 7:47 PM, Bill Bell wrote:

I understand.

Have you looked at Mark's patch? From his performance tests, it looks
pretty good.

When would RA work better?

Bill


On 8/14/11 8:40 PM, "Nagendra Nagarajayya" <nnagaraja...@transaxtions.com> wrote:

  Bill:

The technical details of the NRT implementation in Apache Solr with
RankingAlgorithm (SOLR-RA) are available here:

http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf

(There are some changes for Solr 3.x, but for the most part it is as above.)

Support for 4.0 trunk should happen sometime soon.

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org





On 8/14/2011 7:11 PM, Bill Bell wrote:

OK,

I'll ask the elephant in the room...

What is the difference between the new UpdateHandler from Mark and the
SOLR-RA?

The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk?

Pros/Cons?


On 8/14/11 8:10 PM, "Nagendra Nagarajayya" <nnagaraja...@transaxtions.com> wrote:

  Naveen:

NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a
document to become searchable. Any document that you add through update
becomes  immediately searchable. So no need to commit from within your
update client code.  Since there is no commit, the cache does not have
to be cleared or the old searchers closed or  new searchers opened, and
warmed (error that you are facing).

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



On 8/14/2011 10:37 AM, Naveen Gupta wrote:

Hi Mark/Erick/Nagendra,

I was not very confident about NRT at that point in time, when we started
the project almost 1 year ago. I will definitely try NRT and see the
performance.

The current requirement was working fine while we were using commitWithin 10
millisecs in the XMLDocument which we were posting to SOLR.

But because of that, we were getting very poor performance (almost 3 mins
for 15,000 docs) per user. There are many parallel users committing to our
SOLR.

So we removed the commitWithin, and performance was much, much better.

But then we are getting this maxWarmingSearchers error, because we are
committing separately as a curl request once the entire doc is submitted for
indexing.

The question here is: what is the difference between commitWithin and commit
(apart from the fact that commit takes memory, processing, and additional
hardware usage)?

The reason we want it to be visible as soon as possible is that we apply
many business rules on top of the results (older indexes as well as new
ones) and apply different filters.

Up to 5 mins is fine for us, but beyond that we need to think about other
optimizations.

We will definitely try NRT. But please tell me what other options we can
apply in order to optimize.
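
For the stock Solr route (without NRT), one option that fits the 5-minute tolerance mentioned above is to keep `commitWithin` on the posted XML but set it close to the acceptable delay rather than 10 ms, so that many adds can share one commit instead of each triggering its own. A minimal sketch in Python, under stated assumptions: the helper name `add_message`, the 300000 ms default, and the field names are hypothetical and chosen for illustration; `commitWithin` (in milliseconds) is the attribute of the XML `<add>` message.

```python
import xml.etree.ElementTree as ET


def add_message(docs, commit_within_ms=300000):
    """Serialize `docs` (a list of field dicts) as a Solr XML <add> message.

    The default of 300000 ms (5 minutes) is an assumption based on the
    tolerance stated in this thread, not a recommended setting.
    """
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Field names are placeholders for illustration only.
    print(add_message([{"id": "1", "subject": "hello"}]))
```

With this approach no separate commit request is sent after each user's batch, which is the step that was running into the maxWarmingSearchers limit.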

Thanks
Naveen


On Sun, Aug 14, 2011 at 9:42 PM, Erick Erickson <erickerick...@gmail.com> wrote:

  Ah, thanks, Mark... I must have been looking at the wrong JIRAs.

Erick

On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller <markrmil...@gmail.com> wrote:

On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:

You either have to go to near real time (NRT), which is under
development, but not committed to trunk yet

NRT support is committed to trunk.

- Mark Miller
lucidimagination.com
