Re: Solr 4.1 over Websphere errors
Thank you. This sure is a lot to chew on.
Re: Can't find solr.xml
Nabeel, I just want to say that though this post is very old, out of everything on the internet about this error, your suggestion of moving out of /home//solr into /opt/solr was the one that worked for me too. Thank you!

Anria
&fq degrades qtime in a 20million doc collection
hi all,

I have a really fun question to ask. I'm sitting here looking at what is by far the beefiest box I've ever seen in my life: 256GB of RAM, terabytes of disk space, the works, on a properly partitioned Linux server. Yet what we are seeing goes against all the intuition I've built up in the Solr world:

1. The collection has 20-30 million docs.
2. q=*&fq=someField:SomeVal --> takes 2.5 seconds
3. q=someField:SomeVal --> 300ms
4. As numFound -> infinity, qtime -> infinity.

Have any of you encountered such a thing, that an fq degrades query time by so much? It's pure Solr 5.3.1: ZK + Tomcat 8 + 1 shard in Solr, JDK 8u60, all running on this same box. We have already tested different autoCommit strategies and different values for heap size, starting at 16GB, then 32GB, 64GB, 128GB. The only place we saw a 100ms improvement was between -Xmx=32GB and -Xmx=64GB.

Thanks
Anria
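P.S. For anyone who wants to reproduce the comparison, these are the two request shapes (field name and value are placeholders), both with rows=0 so document retrieval stays out of the picture:

    /select?q=*&fq=someField:SomeVal&rows=0&debug=timing
    /select?q=someField:SomeVal&rows=0&debug=timing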
Re: &fq degrades qtime in a 20million doc collection
hi Shawn

Thanks for the quick answer. As for the q=*, we also saw similar results in our testing when doing things like

    q=somefield:qval&fq=otherfield:fqval

which makes a pure Lucene query. I simplified things somewhat, since our results were always that as numFound got large, the query time degraded as soon as we added any &fq into the mix. We also saw similar results for queries like

    q=query stuff&defType=edismax&df=afield&qf=afield bfield cfield

So the query structure was not what created the 3-7 second query time; it was always a correlation between whether there is an &fq in the query and what the numFound is. We've run numerous load tests bringing in a good query with fq values in the "newSearcher", caches on, caches off ... this same phenomenon persisted.

As for Tomcat, it's an easy enough test to run it in Jetty. We will sure try that! For GC we've had default and G1 setups.

Thanks for giving us something to think about
Anria
Re: &fq degrades qtime in a 20million doc collection
hi all,

We did try q=queryA AND queryB vs q=queryA&fq=queryB. For all tests we commented out caching, and we reloaded the core between queries to be ultra sure we were getting good comps on time. We have so many unique fq values and such frequent commits that the caches are always invalidated, so our tests for the most part were with the caches all commented out.

Further, we have seen some gains in using an autoCommit strategy of 10 or 15 seconds, but still the first queries are horrible. Also, we tried to set some of these &fq in the "newSearcher" so that it warms at least the OS caches once before registering the searcher as available. (A sketch of both pieces follows below.)

The index size is around 121GB, so it's just outside of modest size, but not yet in the unacceptable range. The docs are all modest in content: small pieces of content, mostly strings; think metadata of PDF files for the most part. Not even the OCR content of them, just great well-defined metadata.

[quote]
How much memory are you giving the JVM? Are you autowarming? Are you indexing while this is going on, and if so, what are your commit parameters? If you add &debug=true to your query, one of the returned sections ...
[/quote]

We tried with several sizes of heap; the gains were minimal, and above that there was no gain. If we use autowarming in either the filterCache or a newSearcher query, the query takes too long; then several newSearcher instances get created and we start seeing "maximum newSearcher exceeded" errors.

It's by using &debug=true and &debug=timing that we isolated this: the Query time took the longest. Sometimes Prepare takes a little time too. Forget it if we add a facet; that adds another 500+ ms at the low end ... Very perplexing and fun challenge.

Thanks Toke for that info on the heap size pointers, we will dial down the heap size.

Anria
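P.S. For concreteness, the solrconfig.xml pieces we've been varying look roughly like this (values and field names illustrative, not our production settings):

    <!-- hard commit every 15s without opening a searcher on each commit -->
    <autoCommit>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <!-- warm each new searcher with a representative fq before it goes live -->
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="fq">someField:SomeVal</str>
        </lst>
      </arr>
    </listener>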
Re: Can we create multiple cluster in single Zookeeper instance
hi Mugeesh

It's best to use Zookeeper as it was intended: install or run 3 of them, independent of any Solr, then point Solr to the Zookeeper cluster. You can have just 1, but then if anything happens to that 1 single node of Zookeeper, all of your Solr will be dead until you can properly revive it from a backup. If it's terribly corrupted ...

Hope this helps
Anria
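P.S. For illustration, with an external three-node ensemble already running (host names are placeholders), pointing a Solr 5.x node at it is just:

    bin/solr start -c -z "zk1:2181,zk2:2181,zk3:2181"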
Re: &fq degrades qtime in a 20million doc collection
Here are some actual examples, if it helps.

wt=json&q=*:*&indent=on&fq=SolrDocumentType:"invalidValue"&fl=timestamp&rows=0&start=0&debug=timing

{
  "responseHeader": {
    "status": 0,
    "QTime": 590,
    "params": {
      "q": "*:*",
      "debug": "timing",
      "indent": "on",
      "fl": "timestamp",
      "start": "0",
      "fq": "SolrDocumentType:\"invalidValue\"",
      "rows": "0",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 22916435,
    "start": 0,
    "docs": []
  },
  "debug": {
    "timing": {
      "time": 590,
      "prepare": {
        "time": 0,
        "query": { "time": 0 },
        "facet": { "time": 0 },
        "mlt": { "time": 0 },
        "highlight": { "time": 0 },
        "stats": { "time": 0 },
        "debug": { "time": 0 }
      },
      "process": {
        "time": 590,
        "query": { "time": 590 },
        "facet": { "time": 0 },
        "mlt": { "time": 0 },
        "highlight": { "time": 0 },
        "stats": { "time": 0 },
        "debug": { "time": 0 }
      }
    }
  }
}

Now we wipe out all caches, and put the filter in q.

wt=json&q=SolrDocumentType:"invalidValue"&indent=on&fl=timestamp&rows=0&start=0&debug=timing

{
  "responseHeader": {
    "status": 0,
    "QTime": 266,
    "params": {
      "q": "SolrDocumentType:\"invalidValue\"",
      "debug": "timing",
      "indent": "on",
      "fl": "timestamp",
      "start": "0",
      "rows": "0",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 22916435,
    "start": 0,
    "docs": []
  },
  "debug": {
    "timing": {
      "time": 266,
      "prepare": {
        "time": 0,
        "query": { "time": 0 },
        "facet": { "time": 0 },
        "mlt": { "time": 0 },
        "highlight": { "time": 0 },
        "stats": { "time": 0 },
        "debug": { "time": 0 }
      },
      "process": {
        "time": 266,
        "query": { "time": 266 },
        "facet": { "time": 0 },
        "mlt": { "time": 0 },
        "highlight": { "time": 0 },
        "stats": { "time": 0 },
        "debug": { "time": 0 }
      }
    }
  }
}
Re: &fq degrades qtime in a 20million doc collection
Here is a stack trace from when we put an &fq in the autowarming, or in the "newSearcher", to warm up the collection after a commit.

2016-01-12 19:00:13,216 [http-nio-19082-exec-25 vaultThreadId:http-STAGE-30518-14 vaultSessionId:1E53A095AD22704 vaultNodeId:nodeId:node-2 vaultInstanceId:2228 vaultUserId:9802] INFO org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
2016-01-12 19:00:13,217 [http-nio-19082-exec-25 vaultThreadId:http-STAGE-30518-14 vaultSessionId:1E53A095AD22704 vaultNodeId:nodeId:node-2 vaultInstanceId:2228 vaultUserId:9802] WARN org.apache.solr.core.SolrCore - [instance_2228] Error opening new searcher. exceeded limit of maxWarmingSearchers=10, try again later.
2016-01-12 19:00:13,217 [http-nio-19082-exec-25 vaultThreadId:http-STAGE-30518-14 vaultSessionId:1E53A095AD22704 vaultNodeId:nodeId:node-2 vaultInstanceId:2228 vaultUserId:9802] INFO org.apache.solr.update.processor.LogUpdateProcessor - [instance_2228] webapp=/solr path=/update params={waitSearcher=true&commit=true&softCommit=true&wt=javabin&version=2} {} 0 0
2016-01-12 19:00:13,217 [http-nio-19082-exec-25 vaultThreadId:http-STAGE-30518-14 vaultSessionId:1E53A095AD22704 vaultNodeId:nodeId:node-2 vaultInstanceId:2228 vaultUserId:9802] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=10, try again later.
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1759)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:609)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
    at veeva.ecm.common.interfaces.web.SolrDispatchOverride.doFilter(SolrDispatchOverride.java:44)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1096)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
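For context, the limit in that trace is configured in the <query> section of solrconfig.xml; a minimal sketch of the knobs involved (the error shows ours raised to 10; the shipped default is 2):

    <query>
      <!-- fail commits that would open more concurrent warming searchers than this -->
      <maxWarmingSearchers>10</maxWarmingSearchers>
      <useColdSearcher>false</useColdSearcher>
    </query>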
Re: &fq degrades qtime in a 20million doc collection
hi Shawn

Thanks for your comprehensive answers, I really appreciate it. Just for clarity, the numbers I posted here were from tests where we isolated only a single fq and a q. Those do have good times, even though it's almost 600ms. Once we are in application mode, and other fq's and facets etc. are added, query times get as bad as 7 seconds (which I personally observed).

But you did give us a lot to work with, especially I think in the arena of commit strategies and cache usage. We'll do some more tests with different strategies in this area.

Thanks
Anria
Re: &fq degrades qtime in a 20million doc collection
Thanks Toke for this. It gave us a ton to think about, and it really supports the notion of several smaller indexes over one very large one, where we can distribute a few JVM processes of smaller size each rather than have one massive one that is, according to this, less efficient.

Toke Eskildsen wrote
> I would guess the 100 ms improvement was due to a factor not related to
> heap size. With the exception of a situation where the heap is nearly
> full, increasing Xmx will not improve Solr performance significantly.
>
> Quick note: Never set Xmx in the range 32GB-40GB (40GB is approximate):
> At the 32GB point, the JVM switches to larger pointers, which means that
> effective heap space is _smaller_ for Xmx=33GB than it is for Xmx=31GB:
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
>
> - Toke Eskildsen, State and University Library, Denmark
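P.S. For anyone who wants to verify the pointer switch on their own 64-bit HotSpot JVM, the standard diagnostic flag makes it visible (heap sizes here just chosen to straddle the 32GB boundary):

    java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
    java -Xmx33g -XX:+PrintFlagsFinal -version | grep UseCompressedOops

The first should report UseCompressedOops = true, the second false.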
Re: &fq degrades qtime in a 20million doc collection
hi Yonik

We definitely didn't overlook that q=* being a wildcard scan; we just had so many systemic problems to focus on that I neglected to thank Shawn for that particular piece of useful information. I must admit I seriously never knew this. Ever since q=* was allowed, I was so happy that it never occurred to me to investigate its details. Now I know :)

Combining all the information from everybody here really brought home where our shortcomings were:

1. Yes, the q=* was quickly replaced by q=*:* everywhere - a quick win.
2. Caching strategies are being reformed.
3. We're looking into making smaller shards / cores, since we do require super frequent commits; on smaller bitsets the commit times should be way less, and we can use the smaller heap sizes to stay optimized in that realm.

One last question though, please. Schema investigations: the &fq are frequently on multivalued string fields, and we believe that may also be slowing the &fq down even more, but we were wondering why. When we run &fq on single-valued fields it's faster than on the multi-valued fields, even when the multi-valued fields frequently have only a single value in them. (See the sketch below for the kind of definitions in question.)

Thanks again for everybody's help and pointers and hints; you kept us busy with changing our mindset on a lot of things here.

Regards
Anria
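P.S. The field definitions in question are shaped like this (names are placeholders for our real schema):

    <field name="status"      type="string" indexed="true" stored="true" />
    <field name="statusMulti" type="string" indexed="true" stored="true" multiValued="true" />

Both get filtered the same way, e.g. fq=statusMulti:Approved, yet the multivalued one is consistently slower even when it holds only one value.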
Re: custom field tutorial
You seem to know what you want the words to map to, so index the map: have one field for the word and one field for the mapped value, and at query time search the words and return the mapped field. If it is comma-separated, so be it; split it up in your code post-search. (A minimal sketch of the idea follows below.) Otherwise, same as Wunder: in my many years in search this is an odd request.

Anria

Sent from my Samsung smartphone on AT&T

-------- Original message --------
Subject: Re: custom field tutorial
From: Walter Underwood
To: solr-user@lucene.apache.org
CC:

What are you trying to do? This seems really odd. I've been working in search for fifteen years and I've never heard this request.

You could always return all the fields to the client and ignore the ones you don't want.

wunder

On Jun 7, 2013, at 8:24 PM, geeky2 wrote:

> can someone point me to a "custom field" tutorial.
>
> i checked the wiki and this list - but still a little hazy on how i would do
> this.
>
> essentially - when the user issues a query, i want my class to interrogate a
> string field (containing several codes - example boo, baz, bar)
>
> and return a single integer field that maps to the string field (containing
> the code).
>
> example:
>
> boo=1
> baz=2
> bar=3
>
> thx
> mark
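To make the index-the-map suggestion concrete, a minimal sketch using Solr's XML update format (field names hypothetical; they would need matching string/int entries in the schema):

    <add>
      <doc>
        <field name="code">boo</field>
        <field name="code_id">1</field>
      </doc>
      <doc>
        <field name="code">baz</field>
        <field name="code_id">2</field>
      </doc>
      <doc>
        <field name="code">bar</field>
        <field name="code_id">3</field>
      </doc>
    </add>

A query like q=code:boo&fl=code_id then returns the mapped integer directly, with no custom field type involved.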