Edismax should, should not, exact match operators
On Google a user can query using operators like "+" or "-" and can quote the desired term to get an exact match. Does something like this come by default with the edismax parser? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Edismax-should-should-not-exact-match-operators-tp4140967.html Sent from the Solr - User mailing list archive at Nabble.com.
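For reference, the edismax parser does honor Google-style "+", "-", and quoted-phrase operators out of the box. A minimal sketch of what such a request's parameters might look like (the field name "title" and the example terms are hypothetical):

```python
from urllib.parse import urlencode

# Hypothetical edismax request: + means must match, - means must not match,
# and double quotes ask for an exact phrase.
params = {
    "defType": "edismax",
    "qf": "title",
    "q": '+"blue hat" -cheap',
}

query_string = urlencode(params)  # ready to append to /select?
```

The operators are part of the query syntax itself, so no extra configuration should be needed beyond selecting the edismax parser.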
Schema editing in SolrCloud
I have a SolrCloud setup using Solr 4.6 with several config sets and multiple collections, some sharing the same config set. I would now like to update the schema inside a config set, adding a new field. 1. Can I do this by directly downloading the schema file and re-uploading it after editing, or do I have to download the whole config set and re-upload it after editing only the schema file? 2. At https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities I saw that it is possible to put data into a Zookeeper file, but how can one specify which config set that file should be uploaded into? Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Schema-editing-in-SolrCloud-tp4141423.html
Re: Inconsistent Behavior of Solr Cloud
Are all the replicas up? Did you check whether there is enough space on the disk? How are you running the queries? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Inconsistent-Behavior-of-Solr-Cloud-tp4141593p4141605.html
Re: Concurent indexing
Here are some of Solr's last words (the log content before it stopped accepting updates); maybe someone can help me interpret it. http://pastebin.com/mv7fH62H -- View this message in context: http://lucene.472066.n3.nabble.com/Concurent-indexing-tp4095409p4095642.html
Re: Debugging update request
Thanks, Erick! The version is 4.4.0. I'm posting 100k-doc batches every 30-40 sec from each indexing client, and sometimes two or more clients post within a very small timeframe. That's when I think the deadlock happens. I'll try to replicate the problem and check the thread dump. -- View this message in context: http://lucene.472066.n3.nabble.com/Debugging-update-request-tp4095619p4095821.html
Re: Debugging update request
I got the trace from jstack. I found references to "semaphore", but I'm not sure if this is what you meant. Here's the trace: http://pastebin.com/15QKAz7U -- View this message in context: http://lucene.472066.n3.nabble.com/Debugging-update-request-tp4095619p4095847.html
SolrCloud Query Balancing
I have set up a SolrCloud system with 3 shards and replicationFactor=3 on 3 machines, along with 3 Zookeeper instances. My web application sends queries to Solr specifying the hostname of one of the machines, so that machine always gets the request while the others just serve as an aid. I would therefore like to set up a load balancer to fix that, balancing the queries across all machines, and maybe do the same for indexing. Would this be good practice? Any recommended tools for doing that? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Query-Balancing-tp4095854.html
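For illustration, the core idea (send each request to a different node so no single host carries all the traffic) can be sketched as a simple round-robin over the cluster's nodes. The host names below are hypothetical, and in production a dedicated balancer such as HAProxy or an AWS ELB in front of Solr would do the same job:

```python
from itertools import cycle

# Hypothetical SolrCloud nodes; in SolrCloud any node can serve any query
# and routes sub-requests internally, so spreading requests evens out the load.
_hosts = cycle([
    "http://solr1:8983/solr",
    "http://solr2:8983/solr",
    "http://solr3:8983/solr",
])

def next_base_url():
    """Pick the next node in round-robin order for the next request."""
    return next(_hosts)
```

The same rotation works for indexing requests, since any node forwards updates to the correct shard leader.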
Re: SolrCloud Query Balancing
Thanks! I've read a little bit about that, but my app is PHP-based so I'm afraid I can't use that. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Query-Balancing-tp4095854p4095857.html
Re: Error when i want to create a CORE
Assuming that you are using the Admin UI: the instanceDir must already exist (in your case, index1). Inside it there should be a conf/ directory holding the configuration files. In the config field, insert only the file name (like "solrconfig.xml"), which should be found in the conf/ directory. -- View this message in context: http://lucene.472066.n3.nabble.com/Error-when-i-want-to-create-a-CORE-tp4095894p4095900.html
Re: SolrCloud Query Balancing
Thanks! Could you provide some examples or details of the configuration you use? I think this solution would suit me as well. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Query-Balancing-tp4095854p4095910.html
Change config set for a collection
The question was also asked some 10 months ago in http://lucene.472066.n3.nabble.com/SolrCloud-4-1-change-config-set-for-a-collection-td4037456.html, and the answer then was negative, but here it goes again; maybe now it's different. Is it possible to change the config set of a collection to another one (stored in Zookeeper) using the Collections API? If not, is it possible to do it using zkCli? Also, how can somebody check which config set a collection is using? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Change-config-set-for-a-collection-tp4096032.html
Re: Change config set for a collection
Thank you, Shawn! "linkconfig" - that's exactly what I was looking for! -- View this message in context: http://lucene.472066.n3.nabble.com/Change-config-set-for-a-collection-tp4096032p4096134.html
RE: Change config set for a collection
Thanks, Garth! Yes, indeed, I know that issue. I had set up my SolrCloud using 4.5.0 and then encountered this problem, so I rolled back to 4.4.0. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Change-config-set-for-a-collection-tp4096032p4096136.html
Solr timeout after reboot
I have a SolrCloud environment with 4 shards, each with a leader and a replica. The index size is about 70M docs and 60Gb, running with Jetty + Zookeeper on 2 EC2 instances, each with 4 CPUs and 15G RAM. I'm using SolrMeter for stress testing. If I restart Jetty and then use SolrMeter to bomb an instance with queries at a rate of 3000 queries per minute, that Solr instance somehow times out and I need to restart it again. If instead of 3000 qpm I start slowly with 200 for a minute or two, then 1800, then 3000, everything is good. I assume this happens because Solr is not warmed up. What settings could I tweak so that Solr doesn't time out anymore when getting many requests? Is there a way to limit how many requests it can serve? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408.html
Re: Solr timeout after reboot
Thank you, Otis! I've integrated SPM on my Solr instances and now I have access to monitoring data. Could you give me some hints on which metrics I should watch? Below are my query configs: 1024 true 20 100 active:true false 10. Is there anything I could tweak here? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096780.html
Re: Solr timeout after reboot
Hmm, no, I haven't... What would be the effect of that? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096809.html
Re: Solr timeout after reboot
I'm using the m3.xlarge server with 15G RAM, but my index size is over 100G, so I guess running the above command would eat up all available memory. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096827.html
Solr 4.6.0 latest build
Hi! I'm currently using Solr 4.4.0 and I'm having quite some trouble with * SOLR-5216: Document updates to SolrCloud can cause a distributed deadlock. (Mark Miller) which should be fixed in 4.6.0. Where could I get Solr 4.6.0 from? I want to run some tests regarding this fix. Thank you! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-6-0-latest-build-tp4096960.html
Re: Solr 4.6.0 latest build
Thanks, Chris & Rafal! So the problem actually persists in 4.6. I'll watch this issue then and cheer for Mark's fix. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-6-0-latest-build-tp4096960p4096992.html
RE: fq with { or } in Solr 4.3.1
For filtering categories I'm using something like this: fq=category:(cat1 OR cat2 OR cat3) - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/fq-with-or-in-Solr-4-3-1-tp4097170p4097183.html
Changing indexed property on a field from false to true
Given a field defined with indexed="false" stored="true" multiValued="false" />, changed to indexed="true" stored="true" multiValued="false" />: once the above is done and the collection reloaded, is there a way I can build the index on that field without reindexing everything? Thank you! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Changing-indexed-property-on-a-field-from-false-to-true-tp4097213.html
Re: Changing indexed property on a field from false to true
I've made a test based on your suggestion. Using the example in 4.5.0 I set the title field to indexed=false, indexed a couple of docs (1 BigApple, 2 SmallApple) and queried fq=title:BigApple. No docs were returned, of course. Then I modified the schema, setting indexed=true for the title field, and restarted Solr. Following that I posted a document update: 1 BigApple. Afterwards I ran the same query fq=title:BigApple and the document was returned. So at first look an atomic update can do the trick, unless I was doing something wrong. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Changing-indexed-property-on-a-field-from-false-to-true-tp4097213p4097233.html
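A note on why the atomic-update trick can work: a "set" atomic update makes Solr rebuild the whole document from its stored fields and re-index it, which is what lets the new indexed=true take effect for that document. This only works when all fields are stored. A sketch of such an update payload (id and value taken from the test above):

```python
import json

# Atomic update: {"set": value} rewrites the field; Solr re-reads the rest of
# the document from stored fields and re-indexes it under the current schema.
doc = {"id": "1", "title": {"set": "BigApple"}}
payload = json.dumps([doc])  # body for a POST to /update?commit=true
```

The downside is that every document still has to be touched this way, so it only beats a full reindex when the source data is no longer available.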
SolrCloud: optimizing a core triggers optimizations of all cores in that collection?
Hi! I have a SolrCloud setup on two servers: 3 shards, replicationFactor=2. Today I triggered the optimization on core *shard2_replica2*, which only contained 3M docs and 2.7G. The sizes of the other shards were shard3=2.7G and shard1=48G (the routing is implicit, but after some update deadlocks and restarts the shard range in Zookeeper got null, and everything since then has apparently been indexed to shard1). So, half an hour after I triggered the optimization via the Admin UI, I noticed that used space was increasing a lot on *both servers* for cores *shard1_replica1 and shard1_replica2*. It was now 67G and increasing. In the end, about 40 minutes after the start of the operation, shard1 was done optimizing on both servers, leaving shard1_replica1 and shard1_replica2 at about 33G. Any idea what is happening, and why the core on which I wanted the optimization to happen got no optimization while another shard got optimized instead, on both servers? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimizations-of-all-cores-in-that-collection-tp4097499.html
Re: Normalized data during indexing ?
Maybe this can help you: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Normalized-data-during-indexing-tp4097750p4097752.html
Re: SolrCloud: optimizing a core triggers optimizations of all cores in that collection?
Thanks, Erick! I will try specifying the distrib parameter. As for why I am optimizing: well, I do lots of deletes by id and by query, and after a while about 30% of maxDocs are deletedDocs. On a 50G index that means about 15G of space, which I am trying to free by doing the optimization. "it's usually better NOT to optimize" - could you provide some more details on this? Thank you! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-tp4097499p4097828.html
Re: SolrCloud: optimizing a core triggers optimizations of all cores in that collection?
Thanks @Mark & @Erick! Should I create a JIRA issue for this? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-tp4097499p4098020.html
Re: Optimal interval for soft commit
How do you add the documents to the index: one by one, or in batches of n? When do you do your commits? Because 8k docs per day is not a lot. Depending on the above, committing with softCommit=true might also be a solution. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Optimal-interval-for-soft-commit-tp4098016p4098022.html
Re: One of all shard stopping, all shards stop
When one of your shards dies, your index becomes incomplete. By default querying is distributed (across all shards - &distrib=true), and if one of them (shard X) is down, you get an error stating that there are "no servers hosting shard X". If the other shards are still up, you can query them directly using "&distrib=false", but the result set will only contain documents from that shard. So you would have to query every active shard individually and then merge the results yourself. If I'm wrong, please correct me. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/One-of-all-shard-stopping-all-shards-stop-tp4098015p4098024.html
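If you do fall back to per-shard queries with &distrib=false, merging the partial result sets yourself is straightforward as long as each shard's list is already sorted by score. A sketch (the ids and scores are made up):

```python
import heapq

def merge_results(*shard_lists, rows=10):
    """Merge per-shard result lists (each sorted by descending score)
    into one global ranking, keeping the top `rows` documents."""
    merged = heapq.merge(*shard_lists, key=lambda doc: -doc["score"])
    return list(merged)[:rows]

shard_a = [{"id": "1", "score": 9.2}, {"id": "4", "score": 3.1}]
shard_b = [{"id": "2", "score": 7.5}, {"id": "3", "score": 0.4}]
```

Note that scores from different shards are only roughly comparable, since each shard computes them against its own term statistics.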
Re: Solr For
You're describing two different entities: Job and Employee. Since they are clearly different in every way, you will need two different cores with two different schemas. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-For-tp4097928p4098025.html
Re: Newbie to Solr
Put "*:*" in the q field. Then check the facet checkbox (look lower, close to the Execute button) and in facet.field insert "Name". This should do the trick. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876p4098031.html
Re: Newbie to Solr
I don't see the mentioned attachment. Try using http://snag.gy/ to provide it. As for where to find it, the default is http://localhost:8983/solr/collection1/query - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876p4098041.html
Phrase query combined with term query for maximum accuracy
For maximum search accuracy on my SolrCloud system I was thinking of combining phrase search with term search in the following way: search term: john doe; search fields: title, description (a match in the title is more relevant than one in the description). What I want to achieve is the following document ranking: 1. an exact match for "john doe" in the title 2. an exact match for "john doe" in the description 3. a match of "john" OR "doe" in the title 4. a match of "john" OR "doe" in the description. What I've got so far, using the edismax parser: q.op=OR&q=title:"john doe"^100 OR description:"john doe"^50 OR title:john doe^30 OR description:john doe^10 Would the above query give me what I want, or is there a better way to do it? Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Phrase-query-combined-with-term-query-for-maximum-accuracy-tp4098215.html
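One detail worth double-checking in a query like this: in `title:john doe^30` only `john` is bound to the title field, while `doe` (and its boost) falls through to the default field, so the term clauses should be grouped with parentheses. A sketch of assembling the intended query (field names and boosts taken from the post):

```python
def ranked_query(terms, phrase_boosts, term_boosts):
    """Build a boosted OR query where phrase clauses outrank per-term clauses.

    Parentheses keep both terms bound to their field, e.g. title:(john doe)^30,
    instead of letting the second term leak to the default field."""
    clauses = [f'{field}:"{terms}"^{boost}' for field, boost in phrase_boosts.items()]
    clauses += [f'{field}:({terms})^{boost}' for field, boost in term_boosts.items()]
    return " OR ".join(clauses)

q = ranked_query("john doe",
                 {"title": 100, "description": 50},
                 {"title": 30, "description": 10})
```

The boost spread (100/50/30/10) is only illustrative; the relative ordering of the clauses is what produces the desired ranking tiers.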
SolrCloud batch updates
I'm currently using a SolrCloud setup and I index my data using a couple of in-house indexing clients. The clients process some files and post JSON messages containing the added documents, in batches. Initially my batch size was 100k docs and the post request took about 20-30 secs. I switched to 10k batches and now the updates are much faster, but also more numerous. My commit settings are: autocommit every 45s / 100k docs, with openSearcher=false; softAutoCommit every 3 minutes. I'm trying to figure out which is preferable: bigger batches less often, or smaller batches more often? And why? Which background operations take place after posting docs? At which point does replication kick in: after commit or after update? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-batch-updates-tp4098463.html
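For reference, the commit settings described above correspond to a solrconfig.xml fragment along these lines (a sketch; the numbers mirror the post, not a recommendation):

```xml
<autoCommit>
  <maxTime>45000</maxTime>           <!-- 45 s -->
  <maxDocs>100000</maxDocs>          <!-- 100k docs -->
  <openSearcher>false</openSearcher> <!-- flush to disk without opening a new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>180000</maxTime>          <!-- every 3 min, new docs become searchable -->
</autoSoftCommit>
```

With openSearcher=false, the hard commit only flushes the transaction log and index files; visibility of new documents is governed entirely by the soft commit interval.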
Re: Phrase query combined with term query for maximum accuracy
Thanks, Jack! I tried it and I get some really funny behaviour. I have two collections with the same solrconfig.xml and the same schema definition, except for the type of some fields, which in collection_DE are customized for the German language and in collection_US for English. The fields "title" and "text" have the corresponding type (text_de in collection_DE and text_en in collection_US). Now, when I run this query:

/solr/collection_US/select/?q=title:"blue hat"^100 OR text:"blue hat"^50 OR title:(blue hat)^30 OR text:(blue hat)^10&fq=active:true&start=0&rows=40&sort=score+desc&fl=*,score&country=US

I get this error:

org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request: [http://xxx:8983/solr/collection_US_shard2_replica1, http://xxx:8983/solr/collection_US_shard2_replica2]
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
	...
Caused by: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request: [http://xxx:8983/solr/collection_US_shard2_replica1, http://xxx:8983/solr/collection_US_shard2_replica2]
	at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:333)
	at org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:214)
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:158)
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
	...
Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://xxx:8983/solr/collection_US_shard2_replica2 returned non ok status:500, message:Server Error
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
	...
Re: Phrase query combined with term query for maximum accuracy
One more thing I just noticed: if for collection_US I try to search for title:"blue hat"^100 OR text:"blue hat"^50 I get the same error, but if I search for title:"blue hat"^100 OR text:"bluehat"^50 it works fine. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Phrase-query-combined-with-term-query-for-maximum-accuracy-tp4098215p4098599.html
Query OR operator triggers weird exception
I ran a set of queries using the Admin UI and some of them trigger a weird error:

"error": { "msg": "org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:... "code": 500 }

Here's the pattern, using the edismax parser:

title:"blue hat" OR text:"blue hat" --> error above
title:"blue hat" OR text:"bluehat" --> OK
title:"blue hat" OR text:(blue hat) --> OK
title:(blue hat) OR text:(blue hat) --> OK

Any idea what is wrong here? Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Query-OR-operator-triggers-weird-exception-tp4098605.html
Re: Query OR operator triggers weird exception
Thanks, Jack! Some more info: I dug around a bit and tried the problem query, undistributed, on each shard. shard2_replica1 and shard2_replica2 throw this error:

"responseHeader": {
  "status": 500,
  "QTime": 2,
  "params": {
    "lowercaseOperators": "true",
    "indent": "true",
    "q": "title:\"red shoes\" OR text:\"red shoe\"",
    "distrib": "false",
    "stopwords": "true",
    "wt": "json",
    "defType": "edismax"}},
"error": {
  "trace": "java.lang.ArrayIndexOutOfBoundsException\n",
  "code": 500}}

On the other shards the query works fine. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Query-OR-operator-triggers-weird-exception-tp4098605p4098607.html
Re: Query OR operator triggers weird exception
I also narrowed my problem down to the text field. The simple query title:"red shoes" works, but text:"red shoes" does not. Could you expand a little on how my schema could have omitted position information? I'm not really sure what you mean by that. Thank you! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Query-OR-operator-triggers-weird-exception-tp4098605p4098609.html
Re: Query OR operator triggers weird exception
After restarting my servers, this was the first error I got when trying to make the same query:

{
  "responseHeader": {
    "status": 500,
    "QTime": 336,
    "params": {
      "lowercaseOperators": "true",
      "indent": "true",
      "q": "text:\"blue cat\"",
      "distrib": "false",
      "stopwords": "true",
      "wt": "json",
      "defType": "edismax"}},
  "error": {
    "msg": "-103",
    "trace": "java.lang.ArrayIndexOutOfBoundsException: -103
	at org.apache.lucene.codecs.lucene41.ForUtil.skipBlock(ForUtil.java:219)
	at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:958)
	at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:988)
	at org.apache.lucene.search.ExactPhraseScorer.phraseFreq(ExactPhraseScorer.java:213)
	at org.apache.lucene.search.ExactPhraseScorer.nextDoc(ExactPhraseScorer.java:134)
	at org.apache.lucene.search.Scorer.score(Scorer.java:64)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:624)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1494)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:474)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:434)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
	...",
    "code": 500}}

- Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Query-OR-operator-triggers-weird-exception-tp4098605p4098614.html
Replication after re adding nodes to cluster (sleeping replicas)
I have a SolrCloud cluster holding 4 collections, each with 3 shards and replicationFactor=2. They all live on 2 machines, and I am currently using this setup for testing. However, I would like to connect this test setup to our live application, just for benchmarking and evaluating whether it can handle the high qpm. I am also planning to set up a new machine and add new nodes manually, one more replica for each shard on the new machine, in case the first two have problems handling the load. But what I would like to do after I set up the new nodes is shut down the new machine and only put it back in the cluster if it's needed. Thus, getting to the title of this mail: after re-adding the 3rd machine to the cluster, will the replicas be automatically synced with the leader, or do I need to trigger this manually somehow? Is there a better idea for having these sleeping replicas? I bet lots of people have faced this problem, so a best practice must be out there. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Replication-after-re-adding-nodes-to-cluster-sleeping-replicas-tp4098764.html
SolrCloud different machine sizes
I've set up my SolrCloud on AWS and I'm currently using 2 average machines. I'm planning to add one more, bigger machine (by bigger I mean double the RAM). If they all work in a cluster, with the search being distributed, will the smaller machines limit the performance the bigger machine could offer? (They have less memory, so less cache, thus more disk reads on those machines, and therefore bigger query times.) Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-different-machine-sizes-tp4099138.html
Performance of "rows" and "start" parameters
I saw that some time ago there was a JIRA ticket discussing this, but I still found no relevant information on how to deal with it. When working with a big number of docs (e.g. 70M in my case), I'm using start=0&rows=30 in my requests. For the first request the query time is OK, the next one is visibly slower, the third even slower, and so on until I get some huge query times of up to 140 secs after a few hundred requests. My tests were done with SolrMeter at a rate of 1000 qpm. The same thing happens at 100 qpm, though. Is there a best practice for this situation, or maybe an explanation of why the query time increases from request to request? Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Performance-of-rows-and-start-parameters-tp4099194.html
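Part of the cost here is structural: in a distributed query, each shard returns its top (start + rows) candidates and the coordinating node merges them, so the per-request work grows with the requested window. A rough sketch of that merge step (the data is made up):

```python
import heapq

def coordinator_top(shard_results, start, rows):
    """Each shard contributes its top (start + rows) docs; the coordinator
    merges them and keeps only the requested page. Larger rows or start
    values mean more candidates shipped and merged per request."""
    window = start + rows
    candidates = []
    for shard in shard_results:
        candidates.extend(shard[:window])   # a shard never sends more than the window
    ranked = heapq.nlargest(window, candidates, key=lambda d: d["score"])
    return ranked[start:start + rows]
```

This explains why cost rises with rows, though not by itself why repeated identical requests get slower; that points more toward cache or GC behavior.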
Re: SolrCloud different machine sizes
Thank you, Erick! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-different-machine-sizes-tp4099138p4099195.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: The first search is slow
The first time you run a query it's always slower, because it reads data from disk. After the first query, caches are built and stored in RAM, so the second run of that query will hit the caches and be noticeably faster. To speed up the first query, play around with your firstSearcher and newSearcher queries (see the example solrconfig.xml for more details). The following link should provide some more info on caching and what you can do to improve the performance of your first query: http://wiki.apache.org/solr/SolrCaching If your hardware and setup allow it, you can try a trick, pulling the contents of your index into RAM directly: cat ...//data/index/* >/dev/null - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/The-first-search-is-slow-tp4099316p4099347.html Sent from the Solr - User mailing list archive at Nabble.com.
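For reference, a firstSearcher warming entry in solrconfig.xml looks roughly like the sketch below (the query text is only an illustrative placeholder - use queries typical for your application):

```xml
<!-- Run these queries when the very first searcher is created, so its
     caches are populated before real traffic arrives. The query below
     is an illustrative placeholder. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some typical query</str>
      <str name="sort">score desc</str>
    </lst>
  </arr>
</listener>
```

The same structure with event="newSearcher" warms searchers opened after commits.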
Re: Performance of "rows" and "start" parameters
Thank you! I suspect that maybe my box was too small. I'm upgrading my machines to more CPU & RAM, so let's see how it goes from there. Would limiting the number of returned fields to a smaller value make any improvement? The behaviour I noticed was: at start=0&rows=10, avg qtime after 200 queries was about 15ms; at start=0&rows=20, avg qtime after 200 queries was about 20ms; at start=0&rows=30, avg qtime after 200 queries was about 250ms and slowly increasing; at start=0&rows=50, avg qtime after 200 queries was about 1400ms and increasing really fast. Tests were made using SolrMeter, with a set of keywords, each request specifying start=0&rows=N (N being one of the values above). So, no deep paging, always requesting the first N results, sorted by score. I will try this scenario again on the bigger boxes and come back. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Performance-of-rows-and-start-parameters-tp4099194p4099370.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud statistics
Solr's query handler statistics are pretty neat: avg time per request, avg requests in the last 5/15 min and so on. But when using SolrCloud's distributed search, each core gets multiple requests, making it hard to check what the actual query time is (the time from when a leader gets the query request until the result set is actually served back to the client). Is there a way to monitor this (or maybe a tool)? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-statistics-tp4099378.html Sent from the Solr - User mailing list archive at Nabble.com.
Creating a replica by copying index
Is it possible to create a replica of a shard (collection1_shard1_replica1) in SolrCloud by copying the collection1_shard1_replica1/ directory to the new server, updating core.properties and restarting Solr on that machine? Would this be faster than using the Core API to create a new core and specifying the collection and shard? Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Creating-a-replica-by-copying-index-tp4099600.html Sent from the Solr - User mailing list archive at Nabble.com.
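For reference, the core.properties I would be editing on the new server looks roughly like this (values are illustrative; I'm assuming name, collection and shard are the relevant keys here):

```properties
# Illustrative values - adjust to the target collection/shard.
name=collection1_shard1_replica2
collection=collection1
shard=shard1
```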
Merging shards and replicating changes in SolrCloud
Here's the background of this topic: I have set up a collection with 4 shards, replicationFactor=2, on two machines. I started to index documents, but after hitting some update deadlocks and restarting servers, my shards' ranges in the ZK state got nulled (I'm using implicit routing). Indexing continued without me noticing and all new documents were indexed into shard1, creating huge disproportions with shards 2, 3 and 4. Of course, I want to fix this and get my index into 4 evenly distributed shards. What I'm thinking of doing is: 1. on machine 1, merge shards 2, 3, 4 into shard1 using http://wiki.apache.org/solr/MergingSolrIndexes (at this point what happens to the replica of shard1 on machine 2? will SolrCloud try to replicate shard1 from machine 1?) 2. on machine 2, unload the shard 1, 2, 3, 4 cores 3. on machine 1, split shard1 into shard1_0 and shard1_1, then split shard1_0 and 1_1 again, getting 4 equal shards 1_0_0, 1_0_1, 1_1_0, 1_1_1 (will the shard ranges for the newborns be correct if in the beginning shard1's range was "null"?) 4. on machine 1, unload shard1 5. rename shards 1_0_0, 1_0_1, 1_1_0, 1_1_1 to 1, 2, 3, 4. 6. replicate shards 1, 2, 3, 4 to machine 2 Do you see any problems with this scenario? Anything that could be done in a more efficient way? Thank you - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Merging-shards-and-replicating-changes-in-SolrCloud-tp407.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr timeout after reboot
Thank you, Peter! Last weekend I was up until 4am trying to understand why Solr was starting so slowly, when I had given it enough memory to fit the entire index. And then I remembered your trick used on the m3.xlarge machines, tried it and it worked like a charm! Thank you again! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4100254.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Merging shards and replicating changes in SolrCloud
Thanks for the comments Shalin, I ended up doing just that, reindexing from the ground up. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Merging-shards-and-replicating-changes-in-SolrCloud-tp407p4100255.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding a server to an existing SOLR cloud cluster
From my understanding, if your already existing cluster satisfies your collection (live nodes >= number of shards * replication factor) there wouldn't be any need for creating additional replicas on the new server, unless you directly ask for them after startup. I usually just add the machine to the cluster and then manually create the replicas I need. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-a-server-to-an-existing-SOLR-cloud-cluster-tp4100275p4100313.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Replicate Solr Cloud
You'll have to provide some more details on your problem. What do you mean by "location A and B": 2 different machines? By default, SolrCloud shards can have replicas, which can be hosted on different machines. That offers you redundancy: if one of your machines dies, your search system will still be up as long as the other machine(s) are up and running. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Replicate-Solr-Cloud-tp4100410p4100434.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solrcloud - forward update to a shard failed
Do you commit from the two indexing clients, or do you have autocommit set to maxDocs = 1000? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/solrcloud-forward-update-to-a-shard-failed-tp4100608p4100633.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solrcloud - forward update to a shard failed
I did something like that too, and I was getting some nasty problems when one of my clients would try to commit before a commit issued by another one had finished. Might be the same problem for you. Try not doing explicit commits from the indexing clients and instead set the autocommit to 1000 docs or whichever value fits you best. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/solrcloud-forward-update-to-a-shard-failed-tp4100608p4100670.html Sent from the Solr - User mailing list archive at Nabble.com.
Optimizing cores in SolrCloud
A few weeks ago optimization in SolrCloud was discussed in this thread: http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-td4097499.html#a4098020 That thread covered distributed optimization inside a collection. My use case requires manually running optimizations every week or so, because I delete by query often, the deleted docs count gets huge, and the only way to regain that space is by optimizing. Since I have a pretty steady high load, I can't do it overnight, so I was thinking of doing it one core at a time -> meaning optimizing shard1_replica1, then shard1_replica2 and so on, using curl 'http://localhost:8983/solr/collection1_shard1_replica1/update?optimize=true&distrib=false' My question is how this would reflect on the performance of the system. All queries routed to that shard replica would be very slow, I assume. Would there be any problems if one replica is optimized and another is not? Has anybody tried something like this? Any tips or stories? Thank you! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Optimizing-cores-in-SolrCloud-tp4100871.html Sent from the Solr - User mailing list archive at Nabble.com.
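A rough sketch of the per-core loop I have in mind (host and core names are placeholders; the DRY_RUN guard just prints the commands instead of running them):

```shell
#!/bin/sh
# Optimize one core at a time, with distrib=false so the optimize
# is not propagated to the other replicas of the shard.
# Host and core names below are placeholders for illustration.
DRY_RUN=1
for core in collection1_shard1_replica1 collection1_shard1_replica2; do
  url="http://localhost:8983/solr/${core}/update?optimize=true&distrib=false"
  if [ "$DRY_RUN" = "1" ]; then
    echo "curl '$url'"
  else
    curl "$url"   # the request returns when the optimize is done on this core
  fi
done
```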
Re: Optimizing cores in SolrCloud
Thanks Erick! That's a really interesting idea, I'll try it! Another question would be: when does the merging actually happen? Is it triggered or conditioned by something? Currently I have a core with ~13M maxDocs and ~3M deleted docs, and although I see a lot of merges in SPM, the deleted documents aren't really going anywhere. For merging I have the example settings, haven't changed them. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Optimizing-cores-in-SolrCloud-tp4100871p4100936.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SolrCloud question
Hi, The Collections API provides some more options that will prove very useful to you: /admin/collections?action=CREATE&name=name&numShards=number&replicationFactor=number&maxShardsPerNode=number&createNodeSet=nodelist&collection.configName=configname Have a look at: https://cwiki.apache.org/confluence/display/solr/Collections+API Regarding your observations: 1. Completely normal, that's standard naming. 2. When you created the collection you did not specify a configuration, so the new collection will use the conf already stored in ZK. If you have more than one, I'm not sure which one will be picked as the default. 3. You should be able to create replicas by adding new cores on the other machines and specifying the collection name and shard id. The data will then be replicated automatically to the new node. If you already tried that and got errors/problems while doing it, provide some more details. As far as I know you should be able to move/replace the index data, as long as the source collection has the same config as the target collection. Afterwards you'll have to reload your core / restart the Solr instance - not sure which one will do it - most likely the latter. But it will be easier if you use the method described at point 3 above. Please someone correct me if I'm wrong. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-question-tp4101266p4101675.html Sent from the Solr - User mailing list archive at Nabble.com.
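For point 3, the CoreAdmin call would look roughly like this sketch (host, core, collection and shard names are placeholders; the curl is left commented out):

```shell
#!/bin/sh
# Create a new core on the target node, tied to an existing collection
# and shard; SolrCloud then replicates the data to it automatically.
# All names below are illustrative placeholders.
host="http://newnode:8983"
name="collection1_shard1_replica3"
url="${host}/solr/admin/cores?action=CREATE&name=${name}&collection=collection1&shard=shard1"
echo "curl '$url'"
# curl "$url"   # uncomment to actually issue the request
```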
Re: Multiple data/index.YYYYMMDD.... dirs == bug?
I encountered this problem often when I restarted a Solr instance more than once before replication had finished. I would then have multiple timestamped directories plus the index directory. However, index.properties points to the active index directory. The moment replication succeeds, the temp dir is renamed "index" and index.properties is gone. As for the situation where the index directory is missing, I'm not sure about that; maybe it happens when the replica is too old and an old-school full replication is done. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-data-index-MMDD-dirs-bug-tp4102163p4102168.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to work with remote solr savely?
Use HTTP basic authentication, set up in your servlet container (Jetty/Tomcat). That should work fine if you are *not* using SolrCloud. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102613.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to work with remote solr savely?
http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication Maybe you could achieve write/read access limitation by setting up path-based authentication: the update handler "/solr/core/update" would be protected by authentication, with credentials only known to you. But then of course, your indexing client will need to authenticate in order to add docs to Solr. Your select handler "/solr/core/select" could then be open, or protected by HTTP auth with credentials known to the developers. That's the first idea that comes to mind - I haven't tested it. If you do, let us know how it went. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102618.html Sent from the Solr - User mailing list archive at Nabble.com.
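In the servlet container this would boil down to a security-constraint in web.xml, something like the sketch below (untested; the role name and URL pattern are illustrative, and the container would still need the matching users/roles configured):

```xml
<!-- Protect the update handler: only authenticated users in the
     "indexer" role (an illustrative role name) may post documents. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr updates</web-resource-name>
    <url-pattern>/core/update</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>indexer</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>
```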
Re: Solr Autowarmed queries on jvm crash
As Shawn stated above, when you start up Solr there will be no such thing as caches or old searchers. If you want to warm up, you can only rely on firstSearcher and newSearcher queries. /"What would happen to the autowarmed queries , cache , old searcher now ?"/ They're all gone. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Autowarmed-queries-on-jvm-crash-tp4103451p4103466.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Persist solr cache
Caches are only valid as long as the index searcher is valid. So, if you make a commit that opens a new searcher, the caches will be invalidated. However, you can configure your caches so that the new searcher keeps a certain number of cache entries from the previous one (autowarmCount). That's the only cache "persistence" Solr can offer. After a restart or a crash you can't reuse caches. Why do you need to persist caches in case of a crash? What's your usage scenario? Do you have problems with performance after startup? You can read more at http://wiki.apache.org/solr/SolrCaching#Overview - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Persist-solr-cache-tp4103463p4103469.html Sent from the Solr - User mailing list archive at Nabble.com.
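For reference, autowarming is configured per cache in solrconfig.xml; a sketch (the sizes here are illustrative, not recommendations):

```xml
<!-- When a new searcher opens, the most recently used entries (up to
     autowarmCount) are regenerated against it before it serves traffic. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="32"/>
```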
Re: Persist solr cache
You could just add the queries you have set up in your batch script to the firstSearcher queries. That way, you wouldn't need to run the script every time you restart Solr. As for crash protection and immediate action, that's outside the scope of the Solr mailing list. You could set up a watchdog that restarts Solr if it crashes, or something like that. Or you could use SolrCloud with replicas on multiple machines. That would remove the SPOF from your system. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Persist-solr-cache-tp4103463p4103487.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr as a service for multiple projects in the same environment
Hi, There's nothing unusual in what you are trying to do; this scenario is very common. To answer your questions: > 1. as I understand I can separate the configs of each collection in > zookeeper. is it correct? Yes, that's correct. You'll have to upload your configs to ZK and use the Collections API to create your collections. > 2. are there any solr operations that can be performed on collection A and somehow affect collection B? No, I can't think of any cross-collection operation. Here you can find a list of collection-related operations: https://cwiki.apache.org/confluence/display/solr/Collections+API > 3. is the solr cache separated for each collection? Yes, separate and configurable in solrconfig.xml for each collection. > 4. I assume that I'll encounter a problem with the os cache, when the different indices will compete on the same memory, right? how severe is this issue? Hardware can be a bottleneck. If all your collections will face the same load, you should try to give Solr a RAM amount equal to the index size (all indexes). > 5. any other advice on building such an architecture? does the maintenance overhead of maintaining multiple clusters in production really overwhelm the problems and risks of using the same cluster for multiple systems? I was in the same situation as you, and putting everything in multiple collections in just one cluster made sense for me: it's easier to manage and has no obvious downside. As for the "risks of using the same cluster for multiple systems", they are pretty much the same in both scenarios. Only that with multiple clusters you'll have many more machines to manage. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/solr-as-a-service-for-multiple-projects-in-the-same-environment-tp4103523p4103537.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to programatically unload a shard from a single server to horizontally scale on SolrCloud
Use the Core API, which provides the "UNLOAD" operation. Just unload the cores you don't need and they'll be automatically removed from SolrCloud. You can also specify options like "deleteDataDir" or "deleteIndex" to clean up the disk space, or you can do it in your script. http://wiki.apache.org/solr/CoreAdmin#UNLOAD - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-programatically-unload-a-shard-from-a-single-server-to-horizontally-scale-on-SolrCloud-tp4105343p4105344.html Sent from the Solr - User mailing list archive at Nabble.com.
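A sketch of such an unload call (host and core name are placeholders; the curl is left commented out):

```shell
#!/bin/sh
# Unload a core and remove its data directory so the disk space is
# reclaimed. Host and core name are illustrative placeholders.
core="collection1_shard2_replica1"
url="http://localhost:8983/solr/admin/cores?action=UNLOAD&core=${core}&deleteDataDir=true"
echo "curl '$url'"
# curl "$url"   # uncomment to actually unload the core
```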
Re: JVM crashed when start solr
What are your Solr startup parameters (Java options)? You can assign more memory to the JVM by specifying -Xmx10g or whichever value works for you. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/JVM-crashed-when-start-solr-tp4105702p4105705.html Sent from the Solr - User mailing list archive at Nabble.com.
Updating shard range in Zookeeper
Hi, Somehow my Zookeeper clusterstate has gotten messed up, and after a restart of both Zookeeper instances and my Solr instances, in one of my collections, for one shard, the "range" is now null. Everything else is fine, but I can't index documents now because I get an error: No active slice servicing hash code 2c7ade4d in DocCollection. The router of my collection is compositeId. If I look at the other collections' ranges I can guess that the missing range should be "0-3fff". Any idea how I can update it? (tools, procedures) Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Updating-shard-range-in-Zookeeper-tp4106138.html Sent from the Solr - User mailing list archive at Nabble.com.
Cloud graph gone after manually editing clusterstate.json
Hi, Today I changed my ZK config, removing one instance from the quorum, and then restarted all ZKs and all Solr instances. After this operation I noticed that one of the shards in one collection was missing its range ("range":null). The router for that collection was compositeId. So, I proceeded to add the missing range manually by editing clusterstate.json: $ zkCli.sh -server zk1:9983 get /clusterstate.json > clusterstate.json I did my edits, and then: $ zkCli.sh -server zk1:9983 set /clusterstate.json "`cate clusterstate.json`" Everything fine, I checked in the Admin UI - the clusterstate.json was updated, but now when I try to see the graph view or radial graph I can't see anything, just white space. Any idea why? Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Cloud-graph-gone-after-manually-editing-clusterstate-json-tp4106142.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Cloud graph gone after manually editing clusterstate.json
Thanks for the reply Tim, Yes, that was just a typo, I used "cat" not "cate". As for the checks, everything looks fine; my edits were: 1. updating the shard range 2. removing the header, which looked like log information, as below: * removed header start here* Connecting to solr3:9983 2013-12-11 16:15:05,372 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT 2013-12-11 16:15:05,376 [myid:] - INFO [main:Environment@100] - Client environment:host.name=solr3.internal 2013-12-11 16:15:05,377 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_25 2013-12-11 16:15:05,377 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation 2013-12-11 16:15:05,378 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/java-7-openjdk-amd64/jre 2013-12-11 16:15:05,378 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/opt/zookeeper/bin/../build/classes:/opt/zookeeper/bin/../build/lib/*.jar:/opt/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/z$ 2013-12-11 16:15:05,378 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/jni:/lib:/usr/lib 2013-12-11 16:15:05,379 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp 2013-12-11 16:15:05,379 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler= 2013-12-11 16:15:05,380 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux 2013-12-11 16:15:05,380 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64 2013-12-11 16:15:05,381 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.2.0-4-amd64 2013-12-11 16:15:05,381 [myid:] - INFO [main:Environment@100] - Client environment:user.name=solr 2013-12-11 16:15:05,382 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/solr 2013-12-11 16:15:05,382 
[myid:] - INFO [main:Environment@100] - Client environment:user.dir=/opt/zookeeper 2013-12-11 16:15:05,384 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=solr3:9983 sessionTimeout=3 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@58a5f543 2013-12-11 16:15:05,412 [myid:] - INFO [main-SendThread(solr3.productdb.internal:9983):ClientCnxn$SendThread@966] - Opening socket connection to server solr3.internal/10.33.182.78:9983. Will not attempt to authenticate $ 2013-12-11 16:15:05,419 [myid:] - INFO [main-SendThread(solr3.productdb.internal:9983):ClientCnxn$SendThread@849] - Socket connection established to solr3.internal/10.33.182.78:9983, initiating session 2013-12-11 16:15:05,427 [myid:] - INFO [main-SendThread(solr3.productdb.internal:9983):ClientCnxn$SendThread@1207] - Session establishment complete on server solr3.internal/10.33.182.78:9983, sessionid = 0x142e187355000$ WATCHER:: WatchedEvent state:SyncConnected type:None path:null *<< i removed the above until here* { "offers_collection_GB":{ "shards":{ "shard1":{ "range":"8000-bfff", "state":"active", "replicas":{ .. and so on Could this be the problem? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Cloud-graph-gone-after-manually-editing-clusterstate-json-tp4106142p4106161.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Cloud graph gone after manually editing clusterstate.json
I had a look, but all looks fine there too: [Wed Dec 11 2013 17:04:41 GMT+0100 (CET)] runRoute get #/~cloud GET tpl/cloud.html?_=1386777881244 200 OK 57ms GET /solr/zookeeper?wt=json&_=1386777881308 200 OK 509ms GET /solr/zookeeper?wt=json&path=%2Flive_nodes&_=1386777881822 200 OK 62ms GET /solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json&_=1386777881886 200 OK 84ms - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Cloud-graph-gone-after-manually-editing-clusterstate-json-tp4106142p4106172.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Cloud graph gone after manually editing clusterstate.json
Hi guys, thanks for the replies! The JSON was valid, I validated it, and the only diff between the files was my edit. But actually, it got fixed by itself - when I got to work today, everything was working as it should. Maybe it was something on my machine or browser; I can't put a finger on the problem. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Cloud-graph-gone-after-manually-editing-clusterstate-json-tp4106142p4106350.html Sent from the Solr - User mailing list archive at Nabble.com.
Metrics in monitoring SolrCloud
Hi, I'm trying to add SolrCloud to our internal monitoring tools and I wonder if anybody else has proceeded in this direction and could maybe provide some tips. I would want to be able to get from SolrCloud: 1. The status of each collection - meaning can it serve queries or not. 2. Average query time per collection 3. Number of requests per second/min for each collection Would I need to implement some Solr plugins for this, or does the information already exist? Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Metrics-in-monitoring-SolrCloud-tp4106384.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: combining cores into a collection
Hi David, "They are loaded with a lot of data so avoiding a reload is of the utmost importance." Well, reloading a core won't cause any data loss. Is 100% availability during the process what you need? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/combining-cores-into-a-collection-tp4109090p4109101.html Sent from the Solr - User mailing list archive at Nabble.com.
Boosting documents at index time, based on payloads
Hi, I'm not really sure how/if payloads work (I tried out Rafal Kuc's payload example in the Apache Solr 4 Cookbook and it did not do what I was expecting - see below what I was expecting it to do, and please correct me if I was looking for the wrong droid). What I am trying to achieve is similar to the payload principle: give a certain term a boost value at index time. At query time, if searched by that term, that boost value should influence the scoring, docs with bigger boost values being preferred over the ones with smaller boost values. Can this be achieved using payloads? I expect so, but then how should this behaviour be implemented - the basic recipe failed to work, so I'm a little confused. Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Boosting-documents-at-index-time-based-on-payloads-tp4110661.html Sent from the Solr - User mailing list archive at Nabble.com.
Simple payloads example not working
Hi, I'm trying to test payloads in Solr. Using Solr 4.6.0 and the example configuration, I posted 3 docs to Solr (fields: id, title, a payloads field and a text field):

doc 1: title "Doc one" - payloads "testone|100 testtwo|30 testthree|5" - text "I testone, you testtwo, they testthree"
doc 2: title "Doc two" - payloads "testone|30 testtwo|200 testthree|5" - text "I testone, you testtwo, they testthree"
doc 3: title "Doc three" - payloads "testone|5 testtwo|100 testthree|300" - text "I testone, you testtwo, they testthree"

Then in the Admin UI I queried: http://localhost:8983/solr/collection1/select?q=text%3Atestone&wt=json&indent=true The result was: { "responseHeader":{ "status":0, "QTime":0, "params":{ "indent":"true", "q":"text:testone", "wt":"json"}}, "response":{"numFound":3,"start":0,"docs":[ { "id":"2", "title":["Doc two"], "payloads":"testone|30 testtwo|200 testthree|5", "_version_":1457102453306556416}, { "id":"3", "title":["Doc three"], "payloads":"testone|5 testtwo|100 testthree|300", "_version_":1457102453657829376}, { "id":"1", "title":["Doc one"], "payloads":"testone|100 testtwo|30 testthree|5", "_version_":1457102486106013696}] }} So, doc two has the biggest score although I gave doc one the biggest payload for the term "testone". Am I missing something here or is there a bug? Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Simple-payloads-example-not-working-tp4110998.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Simple payloads example not working
Actually, I just checked the debugQuery output: they all have the same score: "explain": { "1": "\n0.24276763 = (MATCH) weight(text:testone in 0) [DefaultSimilarity], result of:\n 0.24276763 = fieldWeight in 0, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 0.7768564 = idf(docFreq=4, maxDocs=4)\n0.3125 = fieldNorm(doc=0)\n", "2": "\n0.24276763 = (MATCH) weight(text:testone in 1) [DefaultSimilarity], result of:\n 0.24276763 = fieldWeight in 1, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 0.7768564 = idf(docFreq=4, maxDocs=4)\n0.3125 = fieldNorm(doc=1)\n", "3": "\n0.24276763 = (MATCH) weight(text:testone in 2) [DefaultSimilarity], result of:\n 0.24276763 = fieldWeight in 2, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 0.7768564 = idf(docFreq=4, maxDocs=4)\n0.3125 = fieldNorm(doc=2)\n" }, No payload seems to be considered in the score calculation - do I need to use a special query handler? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Simple-payloads-example-not-working-tp4110998p4110999.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Simple payloads example not working
Thanks iorixxx, Actually I've just tried it and I hit a small wall: the tutorial doesn't look up to date with the codebase. When implementing my custom similarity class I should be using PayloadHelper, but the following happens: in PayloadHelper: public static final float decodeFloat(byte[] bytes, int offset) in DefaultSimilarity: public float scorePayload(int doc, int start, int end, BytesRef payload) So it's BytesRef vs byte[]. How should I proceed in this scenario? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Simple-payloads-example-not-working-tp4110998p4111040.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Simple payloads example not working
Thanks, that indeed fixed the problem. Now I've created a custom Similarity class and used it in schema.xml. The problem now is that for all docs the calculated payload score is the same:

public class CustomSolrSimilarity extends DefaultSimilarity {
    @Override
    public float scorePayload(int doc, int start, int end, BytesRef payload) {
        if (payload != null) {
            Float pscore = PayloadHelper.decodeFloat(payload.bytes);
            System.out.println("payload is: " + payload.toString()
                    + " with score: " + Float.toString(pscore));
            return pscore;
        }
        return 1.0f;
    }
}

Output log:
payload is: [41 26 66 66] with score: 10.4
payload is: [41 f0 0 0] with score: 10.4
payload is: [42 4a cc cd] with score: 10.4
payload is: [42 c6 0 0] with score: 10.4
payload is: [41 26 66 66] with score: 10.4
payload is: [41 f0 0 0] with score: 10.4
payload is: [42 4a cc cd] with score: 10.4
payload is: [42 c6 0 0] with score: 10.4

Any idea why it is always the same? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Simple-payloads-example-not-working-tp4110998p4111045.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Simple payloads example not working
Thanks Eric, I did create a custom query parser, which seems to work just fine. My only problem now is the one above, with all docs having the same score for some reason. See the query parser below:

import org.apache.commons.lang.StringUtils;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

/**
 * Parser plugin to parse payload queries.
 */
public class PayloadQParserPlugin extends QParserPlugin {

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
      SolrParams params, SolrQueryRequest req) {
    return new PayloadQParser(qstr, localParams, params, req);
  }

  @Override
  public void init(NamedList args) {
  }
}

class PayloadQParser extends QParser {

  public PayloadQParser(String qstr, SolrParams localParams, SolrParams params,
      SolrQueryRequest req) {
    super(qstr, localParams, params, req);
  }

  @Override
  public Query parse() throws SyntaxError {
    BooleanQuery q = new BooleanQuery();
    // The query string is a space-separated list of field:term pairs;
    // a leading "+" on the field name makes that clause mandatory.
    String[] nvps = StringUtils.split(qstr, " ");
    for (int i = 0; i < nvps.length; i++) {
      String[] nv = StringUtils.split(nvps[i], ":");
      if (nv[0].startsWith("+")) {
        q.add(new PayloadTermQuery(new Term(nv[0].substring(1), nv[1]),
            new AveragePayloadFunction(), false), Occur.MUST);
      } else {
        q.add(new PayloadTermQuery(new Term(nv[0], nv[1]),
            new AveragePayloadFunction(), false), Occur.SHOULD);
      }
    }
    return q;
  }
}

- Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Simple-payloads-example-not-working-tp4110998p4111050.html Sent from the Solr - User mailing list archive at Nabble.com.
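For anyone following along: the plugin above still has to be registered in solrconfig.xml and used as the default query type of a handler before queries will go through it. A minimal sketch, assuming the class is on Solr's classpath and reusing the /pds-search handler name that appears in the follow-up posts (the parser name "payload" is an arbitrary choice here):

```
<!-- Register the custom parser plugin under an arbitrary name. -->
<queryParser name="payload" class="PayloadQParserPlugin"/>

<!-- A search handler whose default query type is the payload parser. -->
<requestHandler name="/pds-search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">payload</str>
  </lst>
</requestHandler>
```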
RE: Simple payloads example not working
Correction: I observed a pattern: the returned score is the same for all docs and equals the payload of the term in the first doc:

http://localhost:8983/solr/collection1/pds-search?q=payloads:testone&wt=json&indent=true&debugQuery=true

--->
"explain":{
  "1":"\n15.4 = (MATCH) btq(includeSpanScore=false), result of:\n 15.4 = AveragePayloadFunction.docScore()\n",
  "2":"\n15.4 = (MATCH) btq(includeSpanScore=false), result of:\n 15.4 = AveragePayloadFunction.docScore()\n",
  "3":"\n15.4 = (MATCH) btq(includeSpanScore=false), result of:\n 15.4 = AveragePayloadFunction.docScore()\n",
  "4":"\n15.4 = (MATCH) btq(includeSpanScore=false), result of:\n 15.4 = AveragePayloadFunction.docScore()\n"},

http://localhost:8983/solr/collection1/pds-search?q=payloads:testthree&wt=json&indent=true&debugQuery=true

"explain":{
  "1":"\n5.0 = (MATCH) btq(includeSpanScore=false), result of:\n 5.0 = AveragePayloadFunction.docScore()\n",
  "2":"\n5.0 = (MATCH) btq(includeSpanScore=false), result of:\n 5.0 = AveragePayloadFunction.docScore()\n",
  "3":"\n5.0 = (MATCH) btq(includeSpanScore=false), result of:\n 5.0 = AveragePayloadFunction.docScore()\n",
  "4":"\n5.0 = (MATCH) btq(includeSpanScore=false), result of:\n 5.0 = AveragePayloadFunction.docScore()\n"},

Any clue why this is happening?

- Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Simple-payloads-example-not-working-tp4110998p4111060.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Simple payloads example not working
Investigating, it looks like the payload.bytes property is where the problem is. payload.toString() outputs correct values, but the .bytes property seems to behave a little strangely:

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.search.similarities.DefaultSimilarity;
import org.apache.lucene.util.BytesRef;

public class CustomSimilarity extends DefaultSimilarity {

  @Override
  public float scorePayload(int doc, int start, int end, BytesRef payload) {
    if (payload != null) {
      Float pscore = PayloadHelper.decodeFloat(payload.bytes);
      System.out.println("payload : " + payload.toString()
          + ", payload bytes: " + payload.bytes.toString()
          + ", decoded value is " + pscore);
      return pscore;
    }
    return 1.0f;
  }
}

On the query http://localhost:8983/solr/collection1/pds-search?q=payloads:testone&wt=json&indent=true&debugQuery=true it outputs:

payload : [41 26 66 66], payload bytes: [B@149c678, decoded value is 10.4
payload : [41 f0 0 0], payload bytes: [B@149c678, decoded value is 10.4
payload : [42 4a cc cd], payload bytes: [B@149c678, decoded value is 10.4
payload : [42 c6 0 0], payload bytes: [B@149c678, decoded value is 10.4
payload : [41 26 66 66], payload bytes: [B@850fb7, decoded value is 10.4
payload : [41 f0 0 0], payload bytes: [B@1cad357, decoded value is 10.4
payload : [42 4a cc cd], payload bytes: [B@f922cf, decoded value is 10.4
payload : [42 c6 0 0], payload bytes: [B@5c4dc4, decoded value is 10.4

Something doesn't seem right here. Any idea why this behaviour occurs? Is anyone using payloads with Solr 4.6.0?

- Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Simple-payloads-example-not-working-tp4110998p4111214.html Sent from the Solr - User mailing list archive at Nabble.com.
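Identical decoded values over visibly different payload byte patterns usually mean the decode is always reading from index 0 of a shared backing array, while the BytesRef actually points into that array at an offset. A minimal stdlib-only sketch of the effect, mimicking PayloadHelper's big-endian float layout (the shared-buffer setup here is an illustration, not Lucene's actual internals; 10.4 and 30.0 are the first two payload values from the log above, i.e. 41 26 66 66 and 41 f0 00 00):

```java
import java.nio.ByteBuffer;

public class PayloadOffsetDemo {

  // Big-endian float decode starting at the given offset, the same
  // layout that PayloadHelper's float encoding uses.
  static float decodeFloat(byte[] bytes, int offset) {
    return ByteBuffer.wrap(bytes, offset, 4).getFloat();
  }

  public static void main(String[] args) {
    // Two payloads packed into one shared backing array, the way a
    // reused buffer would hold them during scoring.
    byte[] buf = ByteBuffer.allocate(8)
        .putFloat(10.4f)   // bytes 41 26 66 66 at offset 0
        .putFloat(30.0f)   // bytes 41 f0 00 00 at offset 4
        .array();

    // Ignoring the offset always decodes the first value...
    System.out.println(decodeFloat(buf, 0)); // 10.4
    System.out.println(decodeFloat(buf, 0)); // 10.4 again

    // ...while honoring the offset recovers the actual payload.
    System.out.println(decodeFloat(buf, 4)); // 30.0
  }
}
```

If that is what is happening here, switching the similarity to the two-argument overload, PayloadHelper.decodeFloat(payload.bytes, payload.offset), should restore distinct per-document values.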
RE: Simple payloads example not working
Yes, it's float. The scenario is simple to replicate: the default solr-4.6.0 example with a custom Similarity class (the one above) and a custom query parser (again, listed above). I posted the docs in XML format (also listed above) using the exampledocs/post.sh utility. Indeed it looks weird, and I can't explain it. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Simple-payloads-example-not-working-tp4110998p4111219.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Simple payloads example not working
Hi Markus, Do you have any examples or tutorials of your payloads-in-a-custom-filter implementation? I really want to get payloads working, one way or another. Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Simple-payloads-example-not-working-tp4110998p4111244.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Simple payloads example not working
Hi Ahmet, Yes, I did, and I also tried various scenarios with the same outcome. I used the stock example with minimal customization (custom similarity and query parser). - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Simple-payloads-example-not-working-tp4110998p4111324.html Sent from the Solr - User mailing list archive at Nabble.com.
Sorting docs by Hamming distance
Hi, Did anybody try to add sorting to Solr based on the Hamming distance on a certain field? http://en.wikipedia.org/wiki/Hamming_distance E.g. given a document doc1 with a field doc_hash:"12345678" and doc2 with doc_hash:"12345699", when searching for doc_hash:"123456780" the sort order should be doc1, doc2. What would be the best way to achieve this kind of behaviour? Writing a plugin or maybe a custom function query? Thanks! - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-docs-by-Hamming-distance-tp4157600.html Sent from the Solr - User mailing list archive at Nabble.com.
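A custom function query is probably the lighter-weight route: a ValueSourceParser plugin can expose a per-document distance value that Solr then sorts on directly via a function-query sort. The plugin boilerplate is omitted here; the core per-document computation would be a sketch like the following (the handling of unequal lengths is an assumption, since Hamming distance is strictly defined only for equal-length strings; the example values come from the question above):

```java
public class HammingSort {

  // Hamming-style distance between two hash strings: count positions
  // that differ; each extra character in the longer string counts as 1.
  static int distance(String a, String b) {
    int min = Math.min(a.length(), b.length());
    int d = Math.abs(a.length() - b.length());
    for (int i = 0; i < min; i++) {
      if (a.charAt(i) != b.charAt(i)) d++;
    }
    return d;
  }

  public static void main(String[] args) {
    String query = "123456780";
    // doc1 sorts before doc2, as desired:
    System.out.println(distance(query, "12345678")); // 1 (length difference only)
    System.out.println(distance(query, "12345699")); // 3 (two chars differ + length)
  }
}
```

If the hashes are stored as numbers rather than strings, the bit-level distance is a single call: Long.bitCount(a ^ b).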