AW: Threads in Solr

Hausherr, Jens Tue, 26 Feb 2008 08:39:26 -0800

It has been some time since I last worked with the Lucene index directly, but 
AFAIK the Lucene index is not thread-safe by default, which means it is 
probably wrapped in some synchronization layer.


Concerning the bad performance, I can only guess at some items to examine:

1) Every thread performs a complete query. 
2) If a single query takes time "t" to perform, then "n" threads will take 
(at most) "n*t" in total.
3) If your threads hit some synchronized method, they are likely to queue at 
the synchronization barrier, which might lead to an execution time of "n*t".
4) The join statement at the end of your code snippet ensures that your request 
handler continues only after all threads have completed.
5) Vectors are synchronized - it might not be necessary to use a Vector for 
storing your threads (as far as the code snippet is concerned at least - I see 
no concurrent access to the threads here) 
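
To illustrate points 3 to 5, here is a minimal sketch in plain Java, not Solr 
code. The class SynchronizedBarrierDemo, its query() method and the 20 ms sleep 
are hypothetical stand-ins for a shared searcher and a query of duration "t". 
It records how many threads are ever inside the synchronized method at the same 
moment. Because the method is synchronized, that number should stay at 1 however 
many threads are started, so the total time grows towards "n*t".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shared, lock-guarded resource; not a Solr or Lucene class.
class SynchronizedBarrierDemo {
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();

    // Only one thread at a time can execute this method on a given instance.
    synchronized void query() throws InterruptedException {
        int now = inside.incrementAndGet();
        maxInside.accumulateAndGet(now, Math::max);
        Thread.sleep(20); // pretend the query takes time t
        inside.decrementAndGet();
    }

    // Starts n threads that all call query() on one shared instance and returns
    // the highest number of threads observed inside query() at the same moment.
    static int maxConcurrency(int n) {
        SynchronizedBarrierDemo shared = new SynchronizedBarrierDemo();
        // A plain ArrayList is enough: only the calling thread touches this list.
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    shared.query();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // continue only after this worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.maxInside.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int max = maxConcurrency(8);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("max threads inside query(): " + max + ", elapsed ms: " + ms);
    }
}
```

Whether anything in the real request path is guarded like this is exactly the 
open question. A profiler or a thread dump taken during the slow run would show 
whether the workers are blocked waiting on a monitor.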

Personally, I think that to profit from parallelization it would be necessary 
to segment the index and perform disjoint queries - I do not know whether Solr 
or Lucene already supports this feature...
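
As a rough sketch of what I mean by disjoint work, again in plain Java and not 
using any Solr or Lucene API: the hypothetical SegmentedCount class below splits 
a list of strings (standing in for documents) into non-overlapping slices, lets 
one pool thread scan each slice, and adds up the partial counts. The names and 
the substring match are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of disjoint work units; not a Solr or Lucene class.
class SegmentedCount {
    // Counts the entries of docs that contain term, splitting docs into
    // `segments` disjoint slices that are scanned by separate pool threads.
    static int countMatches(List<String> docs, String term, int segments) {
        ExecutorService pool = Executors.newFixedThreadPool(segments);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            int size = docs.size();
            for (int i = 0; i < segments; i++) {
                // [from, to) ranges that cover the list and never overlap.
                int from = (int) ((long) size * i / segments);
                int to = (int) ((long) size * (i + 1) / segments);
                List<String> slice = docs.subList(from, to);
                Callable<Integer> task = () -> {
                    int hits = 0;
                    for (String doc : slice) {
                        if (doc.contains(term)) {
                            hits++;
                        }
                    }
                    return hits;
                };
                parts.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get(); // waits for that slice to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("solr facet", "lucene index", "solr thread", "jetty");
        System.out.println(countMatches(docs, "solr", 2));
    }
}
```

Each worker returns its own partial result and the caller merges them, so the 
workers do not write to any shared structure while they run.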

/Jens

-----Original Message-----
From: Evgeniy Strokin [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 26 February 2008 16:57
To: solr-user@lucene.apache.org
Subject: Re: Threads in Solr

I'm running my tests on a server with 4 dual-core CPUs. I was expecting a good 
improvement from the multithreaded solution, but it is 10 times slower. 
Here is how I run those threads. I think I'm doing something wrong, please 
advise:
 
------------------------------------------
............. code truncated .............
 
public class MultiFacetRequestHandler extends StandardRequestHandler {

    protected NamedList getFacetInfo(SolrQueryRequest req,
                                     SolrQueryResponse rsp,
                                     DocSet mainSet) {
        SimpleFacets f = new SimpleFacets(req.getSearcher(),
                mainSet,
                req.getParams());
        NamedList facetInfo = f.getFacetCounts();
        ////////////////// This is custom code for multi facets
        SolrParams p = req.getParams();
        String fl = p.get(SolrParams.FL);
        int flags = 0;
        if (fl != null)
            flags |= SolrPluginUtils.setReturnFields(fl, rsp);
        Query query = QueryParsing.parseQuery(p.required().get(SolrParams.Q),
                p.get(SolrParams.DF), p, req.getSchema());
        try {
                NamedList facetFields = (NamedList) facetInfo.get("facet_fields");
                if (facetFields.size() == 2) {
                    String shortFldName = facetFields.getName(0);
                    NamedList shortFld = (NamedList) facetFields.getVal(0);
                    NamedList longFld = (NamedList) facetFields.getVal(1);
                    if (shortFld.size() > longFld.size()) {
                        shortFld = longFld;
                        shortFldName = facetFields.getName(1);
                    }
                    List<Query> filters = SolrPluginUtils.parseFilterQueries(req);
                    if (filters == null) filters = new LinkedList<Query>();
                    SolrIndexSearcher s = req.getSearcher();
                    Vector<Thread> threads = new Vector<Thread>();
                    Thread thread;
                    for (int i = 0; i < shortFld.size(); i++) {
                        SolrQueryParser qp = new SolrQueryParser(s.getSchema(), null);
                        Query q = qp.parse(shortFldName + ":\"" + shortFld.getName(i) + "\"");
                        List<Query> fltrs=new LinkedList<Query>();
                        fltrs.addAll(filters);
                        fltrs.add(q);
                        thread = new Thread(makeRunnable(s, query, fltrs, flags, p,
                                shortFld.getName(i), facetFields));
                        threads.add(thread);
                        thread.start();
                    }
                    for (Thread thread1 : threads) {
                        thread1.join();
                    }
                }
        } catch (Exception e) {
            SolrException.logOnce(SolrCore.log, "Exception in multi faceting", e);
        }
///////////////////////////////////////////////
        return facetInfo;
    }
 
    public Runnable makeRunnable(final SolrIndexSearcher s, final Query query,
                                 final List<Query> filters, final int flags,
                                 final SolrParams p, final String shrtName,
                                 final NamedList facetFields) {
        return new Runnable() {
            public void run() {
                try{
                    DocListAndSet matrixRes =
                            s.getDocListAndSet(query, filters, null, 0, 0, flags);
                    NamedList matr =
                            new SimpleFacets(s, matrixRes.docSet, p).getFacetCounts();
                    facetFields.add(shrtName, matr.get("facet_fields"));
                }catch (Exception e){
                     SolrException.logOnce(SolrCore.log, "Exception in multi faceting", e);
                }
            }
        };
    }
............. code truncated .............
}
 

 



----- Original Message ----
From: Chris Hostetter <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, February 26, 2008 2:55:36 AM
Subject: Re: Threads in Solr

: Yes, I am computing the same DocSet. Could that be the problem? Is there any 
: way to solve it?
: In general, in each thread I run the same query and add a different filter 
: query. 

it's not necessarily a problem, it's just that you may not get much benefit 
from parallelization if all of the worker threads are doing the same work 
simultaneously.

but like i said:  without knowing exactly what your threading code looks like, 
it's hard to guess what might be wrong (and even if i was looking right at your 
multithreaded code, it wouldn't necessarily be obvious to me, my 
multi-threading knowledge is mediocre) and it's still not clear if you are 
testing on hardware that can actually take advantage of parallelization.
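
One quick way to check the hardware question from inside the JVM is sketched 
below. Runtime.availableProcessors() is a standard Java call. The CoreCheck 
class and its usefulThreads helper are hypothetical names used only here, and 
capping the worker count at the processor count is a rule of thumb for 
CPU-bound work, not something Solr does for you.

```java
// Hypothetical helper; not part of Solr.
class CoreCheck {
    // Caps the requested worker count at the number of processors the JVM
    // reports, and never returns less than 1.
    static int usefulThreads(int requested) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(requested, cores));
    }

    public static void main(String[] args) {
        System.out.println("processors visible to the JVM: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("threads worth starting for 50 facet values: "
                + usefulThreads(50));
    }
}
```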


-Hoss
