Re: Threads in Solr

Chris Hostetter Sat, 01 Mar 2008 20:39:10 -0800

: I'm running my tests on a server with 4 dual-core CPUs. I was expecting 
: good improvements from the multithreaded solution, but it runs 10 times 
: slower. Here is how I run those threads; I think I'm doing 
: something wrong, please advise:


As i said, i'm not much of a threads expert, but the one piece of advice i 
do remember from someone else is that Thread instantiation is expensive, 
and it's better to use Executor pools.
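
To make the Executor pool advice concrete, here is a minimal sketch in plain Java. It is not taken from the original handler: the class name PoolSketch, the work() method and the pool size are all hypothetical stand-ins. It only shows the general pattern of submitting tasks to a fixed pool of reusable threads instead of creating a new Thread for each unit of work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolSketch {
    // Hypothetical stand-in for one unit of work (e.g. one facet computation).
    static long work(int n) {
        return (long) n * n;
    }

    // Runs work(0) .. work(tasks - 1) on a fixed pool of reusable threads
    // and returns the sum of the results.
    static long sumWithPool(int poolSize, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int n = i;
                Callable<Long> task = () -> work(n);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown(); // stop accepting tasks; lets the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(sumWithPool(4, 10));
    }
}
```

In a long-running server you would normally create the pool once and share it across requests rather than building one per call as this sketch does.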

Independent of why the multithreaded version is slower than the single 
threaded version, however, a couple of things jump out at me...

1) there's no reason i can think of to use SolrQueryParser, the facet 
values are already the indexed form so you can just make TermQueries 
directly

2) you are calling getDocListAndSet even though you only need the DocSet 
part to give to SimpleFacets ... just using getDocSet should be faster.

3) the code as written re-executes the main query using a "List<Query> 
filters" that is unique for each thread because it adds one new filter ... 
ultimately all you care about is the docset, so instead of reexecuting 
that main query over and over, you can just find the DocSet for that 
single new filter, and compute the intersection with mainSet (which has 
already factored in the other filters) and give that to SimpleFacets. 
something like this (single threaded) should work...

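  ... // you already have: mainSet, facetFields, shortFld, shortFldName
  for (int i = 0; i < shortFld.size(); i++) { 
    // facet values are already in indexed form, so build the TermQuery
    // directly (TermQuery takes an org.apache.lucene.index.Term)
    Query q = new TermQuery(new Term(shortFldName, shortFld.getName(i)));
    // DocSet for this one filter, intersected with mainSet
    DocSet d = s.getDocSet(q, mainSet);
    NamedList tmp = new SimpleFacets(s, d, p).getFacetCounts();
    facetFields.add(shrtName, tmp.get("facet_fields"));
  }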

...as long as you don't ask Solr to compute the DocList for all of those 
permutations (since you don't need them anyway) everything should either 
already be in the filterCache, or be a set intersection and should be 
crazy freaking fast.
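
To illustrate why a set intersection is so cheap, here is a small sketch using java.util.BitSet as a stand-in for a DocSet. This is not Solr's DocSet implementation, and the class name IntersectSketch and the document ids are made up; it only shows that intersecting an already-computed filter set with the main set is a bitwise AND, with no query execution involved.

```java
import java.util.BitSet;

class IntersectSketch {
    // Returns the ids present in both sets without modifying either input.
    static BitSet intersect(BitSet mainSet, BitSet filterSet) {
        BitSet result = (BitSet) mainSet.clone();
        result.and(filterSet); // bitwise AND keeps only ids set in both
        return result;
    }

    // Builds a BitSet with the given (hypothetical) document ids set.
    static BitSet of(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet mainSet = of(1, 3, 5, 8, 13);     // docs matching the main query
        BitSet filterSet = of(3, 4, 5, 13, 21);  // docs matching one extra filter
        System.out.println(intersect(mainSet, filterSet));
    }
}
```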

In fact: i'm wondering if the slowdown in performance you were seeing was 
because the parallel execution was causing cache evictions ... without 
changing any code, if you start up your solr port, hit your custom request 
handler one time, and then look at the stats page, do you see a non-zero 
value for "evictions" in any of the caches? ... is the number higher or 
lower when you do the same test with your multi-threaded version?  

having caches that are too small might be the full explanation of why the 
threaded version is slower, but like i said: you should be able to get a 
lot of speed ups just by ditching the DocList method.
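
As a rough illustration of how a cache that is too small turns into evictions, here is a toy LRU cache that counts them. It is a sketch built on java.util.LinkedHashMap, not Solr's cache code, and the names EvictionSketch, CountingLruCache and evictionsAfter are hypothetical. It cycles through a fixed number of distinct filter keys a few times and reports how many entries were thrown out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EvictionSketch {
    // Toy LRU cache that counts how many entries it has evicted.
    static class CountingLruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;
        private long evictions = 0;

        CountingLruCache(int maxSize) {
            super(16, 0.75f, true); // true = iterate in access order (LRU)
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            if (size() > maxSize) {
                evictions++;
                return true; // drop the least recently used entry
            }
            return false;
        }
    }

    // Looks up each of distinctKeys keys, passes times over, inserting on a
    // miss, and returns the total number of evictions.
    static long evictionsAfter(int cacheSize, int distinctKeys, int passes) {
        CountingLruCache<Integer, String> cache = new CountingLruCache<>(cacheSize);
        for (int p = 0; p < passes; p++) {
            for (int k = 0; k < distinctKeys; k++) {
                if (cache.get(k) == null) {
                    cache.put(k, "docset-" + k); // miss: recompute and cache
                }
            }
        }
        return cache.evictions;
    }

    public static void main(String[] args) {
        System.out.println(evictionsAfter(2, 3, 2)); // cache smaller than key set
        System.out.println(evictionsAfter(3, 3, 2)); // cache fits all keys
    }
}
```

When the cache holds fewer entries than the number of distinct filters being cycled through, every lookup after the first pass misses and evicts something else, which is the pattern a non-zero "evictions" count on the stats page would point to.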



-Hoss
