Re: Rerank queries and grouping
Hi Joel,

Thanks for your reply. Yes, I considered Collapse and Expand [1]; the problem is that I'll deploy it on a multishard instance and I want to retrieve the top N groups. I think that Collapse and Expand could have two downsides:

i) It won't guarantee the retrieval of N groups. I could mitigate this by retrieving a larger number of documents, but I would prefer to avoid that.
ii) It won't guarantee the best document per group: a shard A could have high-scoring documents in a group G1 and also hold the top-scoring document D for the group G2. Since each shard returns only its top documents, I could lose D as head of the group G2 if another shard returns documents in G2 with a lower score.

Cheers,
Diego

[1] https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results

On Thu, Jul 16, 2015 at 2:01 AM, Joel Bernstein wrote:
> As you've seen RankQueries won't currently have any effect on Grouping queries.
>
> A RankQuery can be combined with Collapse and Expand though. You may want to review Collapse and Expand and see if it meets your use case.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Jul 15, 2015 at 2:36 PM, Diego Ceccarelli <diego.ceccare...@gmail.com> wrote:
>
> > Hi Everyone,
> >
> > I need to use a RankQuery within a grouping [1].
> > I did some experiments with RerankQuery [2] and Solr 4.10.2 and it seems that if you group on a field, the reranking query is completely ignored (on the cloud, and on a single instance).
> > I would expect to see the results in each group reranked using the RerankQuery.
> >
> > I had a look at the grouping code and documentation and, if I understood correctly, the grouping works in two steps:
> >
> > 1) first the top groups are retrieved
> > 2) top documents for each group in the top groups are retrieved.
> >
> > I thought that the collector generated by a RankQuery could be injected in 2), i.e., for each group set a rerank collector... but I'm not sure if this solution is feasible since the collectors are set in Lucene (AbstractSecondPassGroupingCollector) and a RankQuery is defined in Solr...
> >
> > Any suggestion?
> >
> > Thanks,
> > Diego
> >
> > [1] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> > [2] https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking
Issue with using createNodeSet in Solr Cloud
Hello There,

I am trying to use the createNodeSet parameter when creating a new collection, but I'm getting an error when doing so. More specifically, I have four Solr instances running locally in separate JVMs (127.0.0.1:8983, 127.0.0.1:8984, 127.0.0.1:8985, 127.0.0.1:8986) and a standalone Zookeeper instance which all Solr instances point to. The four Solr instances have no collections added to them and are all up and running (I can access the admin page in all of them).

Now, I want to create a collection in only two of these four instances (127.0.0.1:8983, 127.0.0.1:8984), but when I hit one instance with the following URL:

http://localhost:8983/solr/admin/collections?action=CREATE&name=collection_A&numShards=1&replicationFactor=2&maxShardsPerNode=1&createNodeSet=127.0.0.1:8983_solr,127.0.0.1:8984_solr&collection.configName=collection_A

I am getting the following response:

400 3503
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Cannot create collection collection_A. No live Solr-instances among Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984 _solr
Cannot create collection collection_A. No live Solr-instances among Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984 _solr
400
Cannot create collection collection_A. No live Solr-instances among Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984 _solr
400

The instances are definitely up and running (at least the admin console can be accessed, as mentioned) and if I remove the createNodeSet parameter the collection is created as expected. Am I missing something obvious or is this a bug? The exact Solr version I'm using is 4.9.1.

Any pointers would be much appreciated.

Thanks,
Savvas
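One thing that may be worth checking (an assumption, not a confirmed diagnosis of the error above) is whether the names passed in createNodeSet match the node names registered under /live_nodes in ZooKeeper character for character, since a node can register under a different host name than the one used in the browser. A minimal Python sketch of that comparison follows; the live_nodes values are hypothetical and would have to be copied from the Cloud tree view of the admin UI.

```python
def missing_from_live_nodes(create_node_set, live_nodes):
    """Return the createNodeSet entries that are not present, verbatim, in live_nodes."""
    requested = [name.strip() for name in create_node_set.split(",") if name.strip()]
    live = set(live_nodes)
    return [name for name in requested if name not in live]


if __name__ == "__main__":
    # Hypothetical node names; replace with the children of /live_nodes in ZooKeeper.
    live_nodes = ["192.168.1.10:8983_solr", "192.168.1.10:8984_solr"]
    requested = "127.0.0.1:8983_solr,127.0.0.1:8984_solr"
    for name in missing_from_live_nodes(requested, live_nodes):
        print("not a live node name:", name)
```

If the function returns an empty list, the names match and the cause lies elsewhere.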
Using Facet API to get histograms for two keywords?
Hi lovely Solr masters!

I am using Facet API to get histograms for two keywords. For each keyword, the histogram counts the number of documents with the keyword in every hour.

An example list of documents:

{ q_s : "keyword1", when_dt : "2015-05-27T15:13:00.000Z" }
{ q_s : "keyword2", when_dt : "2015-05-27T16:17:00.000Z" }
{ q_s : "keyword2", when_dt : "2015-05-27T16:18:00.000Z" }
{ q_s : "keyword1", when_dt : "2015-05-27T16:20:00.000Z" }

An example of the output histogram using facets:

"keyword1" : { "2015-05-27T15:00:00.000Z" : "1", "2015-05-27T16:00:00.000Z" : "1" }
"keyword2" : { "2015-05-27T15:00:00.000Z" : "0", "2015-05-27T16:00:00.000Z" : "2" }

My question is: what is the best practice for running the query once for N keywords using the JSON Facet API?

The following command, which uses the JSON Request API, successfully returns the expected results above, but the facet.date part of the JSON request is hard for programmers to read.

curl -d @- http://localhost:8983/solr/queries/select
{
  "params": {
    "wt": "json",
    "indent": true,
    "_": 1436772757584,
    "q": "*:*",
    "rows": 0,
    "fq": [
      "{!tag=fq0}q_s:keyword1",
      "{!tag=fq1}q_s:keyword2",
      "when_dt:[2015-05-27T15:00:00.000Z TO 2015-05-28T10:19:04.000Z]"
    ],
    "facet": true,
    "facet.date": [
      "{!ex=fq1 key=keyword1 facet.date.start=2015-05-27T15:00:00.000Z facet.date.end=2015-05-28T10:19:04.000Z facet.date.gap=+1HOURS facet.date.sort=when_dt}when_dt",
      "{!ex=fq0 key=keyword2 facet.date.start=2015-05-27T15:00:00.000Z facet.date.end=2015-05-28T10:19:04.000Z facet.date.gap=+1HOURS facet.date.sort=when_dt}when_dt"
    ]
  }
}

To make the complicated part simpler, I decided to use the JSON Facet API as shown below, but it does not return any facet results.

curl -d @- http://localhost:8983/solr/queries/select
{
  "params": {
    "wt": "json",
    "indent": true,
    "_": 1436772757584,
    "q": "*:*",
    "rows": 0,
    "fq": [
      "{!tag=fq0}q_s:keyword1",
      "{!tag=fq1}q_s:keyword2",
      "when_dt:[2015-05-27T15:00:00.000Z TO 2015-05-28T10:19:04.000Z]"
    ]
  },
  "facet": {
    "keyword1": {
      "range": {
        "excludeTags": ["fq1"],
        "field": "when_dt",
        "start": "2015-05-27T15:00:00.000Z",
        "end": "2015-05-28T10:19:04.000Z",
        "gap": "+1HOURS",
        "sort": "when_dt"
      }
    },
    "keyword2": {
      "range": {
        "excludeTags": ["fq0"],
        "field": "when_dt",
        "start": "2015-05-27T15:00:00.000Z",
        "end": "2015-05-28T10:19:04.000Z",
        "gap": "+1HOURS",
        "sort": "when_dt"
      }
    }
  }
}

Response:

{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "json":"{ \"params\": {\"wt\": \"json\",\"indent\": true, \"_\": 1436772757584,\"q\": \"*:*\",\"rows\": 0,\"fq\": [ \"{!tag=fq0}q_s:keyworkd1\", \"{!tag=fq1}q_s:keyword2\", \"when_dt:[2015-05-27T15:00:00.000Z TO 2015-05-28T10:19:04.000Z]\" ] }, \"facet\": {\"keyword1\": { \"range\": { \"excludeTags\": [\"fq1\"],\"field\": \"when_dt\", \"start\": \"2015-05-27T15:00:00.000Z\", \"end\": \"2015-05-28T10:19:04.000Z\", \"gap\": \"+1HOURS\", \"sort\": \"when_dt\" } },\"keyword2\": { \"range\": { \"excludeTags\": [\"fq0\"],\"field\": \"when_dt\", \"start\": \"2015-05-27T15:00:00.000Z\", \"end\": \"2015-05-28T10:19:04.000Z\", \"gap\": \"+1HOURS\", \"sort\": \"when_dt\" } } }}"}},
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "facets":{
    "count":0}}

Any idea about this? Thanks in advance. Solr rocks!

- Kangmo
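One possible way to keep the request readable for N keywords is to generate it from a list. The Python sketch below is only an illustration under assumptions: it swaps the tagged-filter approach of the request above for one query facet per keyword with a nested range facet, it assumes the Solr version in use supports the "type": "query" and "type": "range" facet forms, and it has not been run against a Solr instance. The function name and the "per_hour" label are hypothetical.

```python
import json


def build_facet_request(keywords, field, start, end, gap="+1HOURS"):
    """Build a JSON request body with one hourly range facet per keyword."""
    facets = {}
    for keyword in keywords:
        facets[keyword] = {
            "type": "query",
            "q": "q_s:" + keyword,  # restrict this bucket to one keyword
            "facet": {
                "per_hour": {  # hypothetical label for the nested histogram
                    "type": "range",
                    "field": field,
                    "start": start,
                    "end": end,
                    "gap": gap,
                }
            },
        }
    return {
        "params": {
            "wt": "json",
            "q": "*:*",
            "rows": 0,
            "fq": ["%s:[%s TO %s]" % (field, start, end)],
        },
        "facet": facets,
    }


if __name__ == "__main__":
    body = build_facet_request(
        ["keyword1", "keyword2"],
        "when_dt",
        "2015-05-27T15:00:00.000Z",
        "2015-05-28T10:19:04.000Z",
    )
    print(json.dumps(body, indent=2))
```

The printed body could be piped to the same curl command used above. Because no keyword filter is applied to the main query here, no tag exclusion is needed.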
Programmatically find out if node is overseer
Hello - I need to run a thread on a single instance of a cloud, so I need to find out if the current node is the overseer. I know we can already programmatically find out if this replica is the leader of a shard via isLeader(). I have looked everywhere but I cannot find an isOverseer. I did find the election stuff but I am unsure if that is what I need to use. Any thoughts? Thanks! Markus
Setup cloud collection
Hi, I'm new to Solr! I downloaded version 5.2 and modified the solr file so it allows me to create a 5 node cluster: > 5 shards and replication factor 3 < Now I see that one node is marked as leader for 3 shards. So my question is: how can 1 node serve requests for 3 shards? Wouldn't that be an uneven distribution of load? Regards
Multiple boost queries on a specific field
Hello,

I'm trying to use boost queries for the first time and I need some help. Let's assume my documents have a provider field, which is populated by a string, i.e. A, B, C, D, E.

I'd like to assign weights to providers: A is ^2.0, B is ^1.5 and the others are 1.0.

So, if I run the following query:
?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0
my first results have provider A.

Let's try another one:
?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:B^1.5
My first results have provider B. Good!

But that's not exactly what I'm looking for (otherwise I'd just have made a filterQuery). What I want is a multiple boost query. So I tried:
?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:(A^2.0 B^1.5)
Then my first results have provider B. It's not logical.

I tried another syntax:
?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0&bq=provider:B^1.5
But nothing changes.

Can you help me?

My second problem is that I would like to give some weight to the newer documents, but in a range of a month. That means, if a document named B1 with provider B (weight 1.5) is newer than a document named A1 with provider A (weight 2.0), I want B1 to rank above A1 only if the difference between their dates is at least 1 month. Do you know how to do this?

Since my boost logic depends on the user navigation, I have to do this at query time only.

Thanks for your help,
Ben

-- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-boost-queries-on-a-specific-field-tp4217678.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Setup cloud collection
If you’ve set numShards to 5, then your indexes are split evenly across all 5 shards and they should all be considered leaders in charge of updating the replicas with new information. Could it be the case that 1 of your shards has 3 replicas and is the leader for that specific shard? What specifically is indicating that one node is marked as leader for 3 shards? Thanks, Esther Quansah > On Jul 16, 2015, at 7:51 AM, SolrUser2015 wrote: > > Hi, I'm new to solr! > > So downloaded version 5.2 and modified the solr file so it allows me to > create a 5 node cluster: > >> 5 shards and replication factor 3 < > > Now I see that one node is marked as leader for 3 shards. > > So my question is, how can 1 node serve requests for 3 shards, wouldn't that > be uneven distribution of load? > > Regards
Re: Setup cloud collection
I'm looking at the cloud graph in the admin UI. The black dots with green indicates same node as leader for three shards out of five. Regards > On 16 jul 2015, at 14:31, Esther-Melaine Quansah > wrote: > > If you’ve set numShards to 5, then your indexes are split evenly across all 5 > shards and they should all be considered leaders in charge of updating the > replicas with new information. Could it be the case that 1 of your shards has > 3 replicas and is the leader for that specific shard? What specifically is > indicating that one node is marked as leader for 3 shards? > > Thanks, > > Esther Quansah > >> On Jul 16, 2015, at 7:51 AM, SolrUser2015 wrote: >> >> Hi, I'm new to solr! >> >> So downloaded version 5.2 and modified the solr file so it allows me to >> create a 5 node cluster: >> >>> 5 shards and replication factor 3 < >> >> Now I see that one node is marked as leader for 3 shards. >> >> So my question is, how can 1 node serve requests for 3 shards, wouldn't that >> be uneven distribution of load? >> >> Regards >
Re: Setup cloud collection
On 7/16/2015 5:51 AM, SolrUser2015 wrote: > Hi, I'm new to solr! > > So downloaded version 5.2 and modified the solr file so it allows me to > create a 5 node cluster: > >> 5 shards and replication factor 3 < > > Now I see that one node is marked as leader for 3 shards. > > So my question is, how can 1 node serve requests for 3 shards, wouldn't that > be uneven distribution of load? SolrCloud will distribute individual queries to different replicas, so over time the entire cloud will be used. The leader role shouldn't affect queries, that role is mostly there for indexing and fault handling. If you are really concerned about this, you can assign preferred leaders and then ask Solr to reshuffle them. I have never used this functionality. Here's the documentation on it: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders Thanks, Shawn
SolrCloud 5.2.1 - collection creation error
I'm installing SolrCloud 5.2.1 on 4 Ubuntu 14.04 machines with 3 external zookeepers. I've installed the solr machines using Ansible following the "Taking Solr to Production" steps.

1. Download 5.2.1
2. Extract installation script
3. Run installation script

Then I stop solr and make my configuration changes to the solr.in.sh file (adding zookeepers) and log4j.properties (recommended changes). Restart solr and everything looks good.

The problem I have is that I can't create a collection. I create the collection folder in /var/solr/data and tried both the bin script and API but get the error below. I've tried 5.2.0 also and both Java 7 and 8 with the same result.

500 47
java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream classdesc serialVersionUID = 3123208377723774018, local class serialVersionUID = 3945300637328478755
org.apache.solr.common.SolrException: java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream classdesc serialVersionUID = 3123208377723774018, local class serialVersionUID = 3945300637328478755
  at org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:62)
  at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:228)
  at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:168)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
  at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:646)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:417)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.eclipse.jetty.server.Server.handle(Server.java:497)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
  at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream classdesc serialVersionUID = 3123208377723774018, local class serialVersionUID = 3945300637328478755
  at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
  at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
  at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
  at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
  at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
  at org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:60)
  ... 27 more
500
Re: Setup cloud collection
Thanks Shawn, but don't want to build something in front of Solr cloud to help Solr assign leader role to distribute load of indexing. Instead of doing this manual step (rebalance leaders) maybe one host should not take the leader role of multiple shards for same collection if the number of live nodes are equal to number of shards. But assuming that when you say it will happen "over time", Maybe I'll continue indexing and see that leaders will be rebalanced soon. Regards > On 16 Jul 2015, at 14:57, Shawn Heisey wrote: > >> On 7/16/2015 5:51 AM, SolrUser2015 wrote: >> Hi, I'm new to solr! >> >> So downloaded version 5.2 and modified the solr file so it allows me to >> create a 5 node cluster: >> >>> 5 shards and replication factor 3 < >> >> Now I see that one node is marked as leader for 3 shards. >> >> So my question is, how can 1 node serve requests for 3 shards, wouldn't that >> be uneven distribution of load? > > SolrCloud will distribute individual queries to different replicas, so > over time the entire cloud will be used. The leader role shouldn't > affect queries, that role is mostly there for indexing and fault handling. > > If you are really concerned about this, you can assign preferred leaders > and then ask Solr to reshuffle them. I have never used this > functionality. Here's the documentation on it: > > https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders > > Thanks, > Shawn >
Re: Setup cloud collection
On 7/16/2015 7:47 AM, solr.user.1...@gmail.com wrote: > Thanks Shawn, but don't want to build something in front of Solr cloud to > help Solr assign leader role to distribute load of indexing. > > Instead of doing this manual step (rebalance leaders) maybe one host should > not take the leader role of multiple shards for same collection if the number > of live nodes are equal to number of shards. > > But assuming that when you say it will happen "over time", Maybe I'll > continue indexing and see that leaders will be rebalanced soon. Unless you have a fairly major event (like Solr restarting or an operation taking longer than zkClientTimeout) your leaders will never change. It's a semi-permanent role. When a qualifying event happens, SolrCloud does an election process to determine the leader, but elections do not happen unless you force them with a REBALANCELEADERS action or one of several errors occurs. You don't have to build anything in front of Solr. You simply have to assign a preferred leader for each shard, an action that can be done with an HTTP call in a browser. I don't think we have anything in the admin UI to assign preferred leaders ... I will look into it and open an issue if necessary. The thing that I'm saying will happen over time is that all replicas will be used for queries. If you send a thousand queries, you'll find that they will be divided fairly evenly among all replicas. The fact that you have one node as leader for three of your shards is not very much of a big deal, but if you really want to change it, you can do so with the preferred leader feature. Thanks, Shawn
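The preferred-leader approach described above boils down to a handful of HTTP calls. The Python sketch below only builds the URLs, assuming the Collections API actions ADDREPLICAPROP (with the preferredLeader property) and REBALANCELEADERS behave as described in the reference guide; the base URL, collection name, and replica names are hypothetical and must be taken from the actual cluster state.

```python
from urllib.parse import urlencode


def preferred_leader_urls(base_url, collection, shard_to_replica):
    """Return one ADDREPLICAPROP URL per shard, followed by a REBALANCELEADERS URL."""
    urls = []
    for shard, replica in sorted(shard_to_replica.items()):
        params = {
            "action": "ADDREPLICAPROP",
            "collection": collection,
            "shard": shard,
            "replica": replica,
            "property": "preferredLeader",
            "property.value": "true",
        }
        urls.append(base_url + "/admin/collections?" + urlencode(params))
    rebalance = {"action": "REBALANCELEADERS", "collection": collection}
    urls.append(base_url + "/admin/collections?" + urlencode(rebalance))
    return urls


if __name__ == "__main__":
    # Hypothetical names; read the real replica names from the cluster state.
    for url in preferred_leader_urls(
        "http://localhost:8983/solr",
        "mycollection",
        {"shard1": "core_node1", "shard2": "core_node5"},
    ):
        print(url)
```

Each printed URL can be pasted into a browser, in order; nothing needs to sit in front of Solr.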
Re: Setup cloud collection
Thank you, very good explanation. Regards > On 16 Jul 2015, at 17:12, Shawn Heisey wrote: > >> On 7/16/2015 7:47 AM, solr.user.1...@gmail.com wrote: >> Thanks Shawn, but don't want to build something in front of Solr cloud to >> help Solr assign leader role to distribute load of indexing. >> >> Instead of doing this manual step (rebalance leaders) maybe one host should >> not take the leader role of multiple shards for same collection if the >> number of live nodes are equal to number of shards. >> >> But assuming that when you say it will happen "over time", Maybe I'll >> continue indexing and see that leaders will be rebalanced soon. > > Unless you have a fairly major event (like Solr restarting or an > operation taking longer than zkClientTimeout) your leaders will never > change. It's a semi-permanent role. When a qualifying event happens, > SolrCloud does an election process to determine the leader, but > elections do not happen unless you force them with a REBALANCELEADERS > action or one of several errors occurs. > > You don't have to build anything in front of Solr. You simply have to > assign a preferred leader for each shard, an action that can be done > with an HTTP call in a browser. I don't think we have anything in the > admin UI to assign preferred leaders ... I will look into it and open an > issue if necessary. > > The thing that I'm saying will happen over time is that all replicas > will be used for queries. If you send a thousand queries, you'll find > that they will be divided fairly evenly among all replicas. The fact > that you have one node as leader for three of your shards is not very > much of a big deal, but if you really want to change it, you can do so > with the preferred leader feature. > > Thanks, > Shawn >
What does replicationFactor really do?
Hi, In 5.1, we are creating a collection using the Collections API with an initial replicationFactor of X. This value is then stored in the state.json file for that collection. If I try to issue ADDREPLICA on this cluster, it throws an error saying that there are no live nodes for additional replicas. If I connect a new solr node to zookeeper and issue an ADDREPLICA call, the replica is created and no errors are thrown, but replicationFactor remains at X in the state.json file. Why? What does replicationFactor really mean? It seems like it's being honored in some cases and ignored in others. Thanks for any help you can provide. Cheers, Jim
Re: What does replicationFactor really do?
On 7/16/2015 10:46 AM, Jim.Musil wrote: > In 5.1, we are creating a collection using the Collections API with an > initial replicationFactor of X. This value is then stored in the state.json > file for that collection. > > If I try to issue ADDREPLICA on this cluster, it throws an error saying that > there are no live nodes for additional replicas. > > If I connect a new solr node to zookeeper and issue an ADDREPLICA call, the > replica is created and no errors are thrown, but replicationFactor remains at > X in the state.json file. > > Why? What does replicationFactor really mean? It seems like it's being > honored in some cases and ignored in others. I believe what I'm saying below is correct. Hopefully someone with more knowledge will speak up if I'm wrong. If you're not using a shared filesystem (which I think right now is only HDFS), then the only time replicationFactor is used is at collection creation time. It won't affect anything that happens later. If you ARE using HDFS, then there is a feature called autoAddReplicas which will detect when your replica count is below replicationFactor and automatically add more replicas until you're back in compliance. I know almost nothing about this feature. Here is the issue where it was added and the page in the reference guide where the feature is mentioned: https://issues.apache.org/jira/browse/SOLR-5656 https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS Thanks, Shawn
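To see the behaviour described above, where replicationFactor is recorded at creation time and not updated afterwards, the stored value can be compared with the replicas actually listed per shard. This Python sketch assumes a state.json layout of collection, then "shards", then "replicas", with "replicationFactor" stored as a string; the sample data is made up for illustration.

```python
def replica_report(state, collection):
    """Map each shard to (actual replica count, replicationFactor recorded at creation)."""
    coll = state[collection]
    recorded = int(coll["replicationFactor"])
    return {
        shard: (len(info.get("replicas", {})), recorded)
        for shard, info in coll["shards"].items()
    }


if __name__ == "__main__":
    # Made-up state: created with replicationFactor=2, then one ADDREPLICA on shard1.
    state = {
        "mycollection": {
            "replicationFactor": "2",
            "shards": {
                "shard1": {"replicas": {"core_node1": {}, "core_node2": {}, "core_node3": {}}},
                "shard2": {"replicas": {"core_node4": {}, "core_node5": {}}},
            },
        }
    }
    for shard, (actual, recorded) in sorted(replica_report(state, "mycollection").items()):
        print(shard, "replicas:", actual, "replicationFactor:", recorded)
```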
serious JSON Facet bug
To anyone using the JSON Facet API in released Solr versions: I discovered a serious memory leak while doing performance benchmarks (see http://yonik.com/facet_performance/ for some of the early results). Assuming you're in the evaluation / development phase of your project, I'd recommend using a recent developer snapshot for now: https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/ The fix (and performance improvements) will also be in the next Solr release (5.3) of course. -Yonik
Re: Setup cloud collection
Piling on to Shawn's comments. Leadership is a very misunderstood role when people start using SolrCloud, and it often gets conflated with the old "master" role in master/slave. There is, indeed, a small additional bit of processing that goes on on the leader node that's not done on replicas. But the REBALANCELEADERS code was put in place to handle situations where 100+ leaders happened to be on the _same_ node. It took many tens of leaders being on a node for the additional work imposed by being a leader to be noticed in a very demanding environment. Indexing is done _both_ on the leader and the replicas, so the workload for indexing isn't substantially different. And, as Shawn says, querying is done on all replicas by a software load balancer, although you can reasonably put a HW load balancer in front of the whole thing too. So by and large you can completely ignore leaders that aren't evenly distributed. The additional load isn't worth the headache of trying to control it. And it will change as you bounce Solr servers: leadership is assigned to the node that contains the first replica of a shard to come up. Best, Erick On Thu, Jul 16, 2015 at 8:23 AM, wrote: > Thank you, very good explanation. > > Regards > >> On 16 Jul 2015, at 17:12, Shawn Heisey wrote: >> >>> On 7/16/2015 7:47 AM, solr.user.1...@gmail.com wrote: >>> Thanks Shawn, but don't want to build something in front of Solr cloud to >>> help Solr assign leader role to distribute load of indexing. >>> >>> Instead of doing this manual step (rebalance leaders) maybe one host should >>> not take the leader role of multiple shards for same collection if the >>> number of live nodes are equal to number of shards. >>> >>> But assuming that when you say it will happen "over time", Maybe I'll >>> continue indexing and see that leaders will be rebalanced soon. 
>> >> Unless you have a fairly major event (like Solr restarting or an >> operation taking longer than zkClientTimeout) your leaders will never >> change. It's a semi-permanent role. When a qualifying event happens, >> SolrCloud does an election process to determine the leader, but >> elections do not happen unless you force them with a REBALANCELEADERS >> action or one of several errors occurs. >> >> You don't have to build anything in front of Solr. You simply have to >> assign a preferred leader for each shard, an action that can be done >> with an HTTP call in a browser. I don't think we have anything in the >> admin UI to assign preferred leaders ... I will look into it and open an >> issue if necessary. >> >> The thing that I'm saying will happen over time is that all replicas >> will be used for queries. If you send a thousand queries, you'll find >> that they will be divided fairly evenly among all replicas. The fact >> that you have one node as leader for three of your shards is not very >> much of a big deal, but if you really want to change it, you can do so >> with the preferred leader feature. >> >> Thanks, >> Shawn >>
Re: SolrCloud 5.2.1 - collection creation error
It looks at a glance like you're in "Jar hell" and have one or more jar files from "somewhere else" in your classpath, possibly a jar file from an older Solr or one of the libraries. Best, Erick On Thu, Jul 16, 2015 at 6:17 AM, Aaron Gibbons wrote: > I'm installing SolrCloud 5.2.1 on 4 Ubuntu 14.04 machines with 3 external > zookeepers. I've installed the solr machines using Ansible following the > "Taking Solr to Production" steps. > >1. Download 5.2.1 >2. Extract installation script >3. Run installation script > > Then I stop solr and make my configuration changes to the solr.in.sh file > (adding zookeepers) and log4j.properties (recommended changes). Restart > solr and everything looks good. > > The problem I have is that I can't create a collection. I create the > collection folder in /var/solr/data and tried both the bin script and API > but get the error below. I've tried 5.2.0 also and both Java 7 and 8 with > the same result. > > 50047java.io.InvalidClassException: > org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream > classdesc serialVersionUID = 3123208377723774018, local class > serialVersionUID = 3945300637328478755org.apache.solr.common.SolrException: > java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse; > local class incompatible: stream classdesc serialVersionUID = > 3123208377723774018, local class serialVersionUID = 3945300637328478755 at > org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:62) > at > org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:228) > at > org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:168) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) > at > org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:646) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:417) at > 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) > at org.eclipse.jetty.server.Server.handle(Server.java:497) at > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) > at > org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) > at java.lang.Thread.run(Thread.java:745) Caused by: > java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse; > local class incompatible: stream classdesc serialVersionUID = > 
3123208377723774018, local class serialVersionUID = 3945300637328478755 at > java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617) at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at > java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at > java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at > java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at > org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:60) > ... 27 more 500
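A quick way to look for the kind of mixed-version jars suspected above is to list the jar files on each node and flag any artifact that appears with more than one version. The Python sketch below is a generic illustration, not a Solr tool; it assumes jar files are named artifact-version.jar, and the sample file names are hypothetical.

```python
import re
from collections import defaultdict

# Assumes names like "solr-solrj-5.2.1.jar": artifact, a dash, then a version starting with a digit.
JAR_RE = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def conflicting_jars(jar_names):
    """Return {artifact: sorted versions} for artifacts seen with more than one version."""
    versions = defaultdict(set)
    for name in jar_names:
        match = JAR_RE.match(name)
        if match:
            versions[match.group("artifact")].add(match.group("version"))
    return {artifact: sorted(v) for artifact, v in versions.items() if len(v) > 1}


if __name__ == "__main__":
    # Hypothetical listing; in practice collect the names with os.walk over the install and lib dirs.
    found = ["solr-solrj-5.2.1.jar", "solr-core-5.2.1.jar", "solr-solrj-4.10.2.jar"]
    print(conflicting_jars(found))
```

The same check would need to be run on every node, since the serialVersionUID mismatch can also come from two nodes running different Solr versions.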
Re: Programmatically find out if node is overseer
Look at the overseer election ephemeral nodes in ZK; the first one in line is the current overseer.

Best,
Erick

On Thu, Jul 16, 2015 at 3:42 AM, Markus Jelsma wrote:
> Hello - I need to run a thread on a single instance of a cloud, so I need to
> find out if the current node is the overseer. I know we can already
> programmatically find out if this replica is the leader of a shard via
> isLeader(). I have looked everywhere but I cannot find an isOverseer. I did
> find the election stuff but I am unsure if that is what I need to use.
>
> Any thoughts?
>
> Thanks!
> Markus
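The "first one in line" rule can be sketched roughly as follows. This is only an illustration under assumptions: it presumes the election children (under something like /overseer_elect/election in ZK, which should be checked against your own cluster) are named `<session-id>-<node-name>-n_<sequence>`, and it leaves the actual listing of those children to whatever ZooKeeper client you already use. The helper names are hypothetical.

```python
import re

# Assumed child naming: "<zk-session-id>-<node-name>-n_<sequence>".
# Verify this against the real znodes before relying on it.
_SEQ_SUFFIX = re.compile(r"-n_(\d+)$")


def election_sequence(child):
    """Return the numeric sequence suffix of an election child name."""
    match = _SEQ_SUFFIX.search(child)
    if match is None:
        raise ValueError("unexpected election node name: %r" % child)
    return int(match.group(1))


def current_overseer(children):
    """Return the node name of the child with the lowest sequence number."""
    first = min(children, key=election_sequence)
    without_seq = _SEQ_SUFFIX.sub("", first)
    # Drop the leading session id; the node name itself may contain hyphens.
    _session_id, _, node_name = without_seq.partition("-")
    return node_name


def is_first_in_line(children, my_node_name):
    """True if my_node_name owns the lowest-sequence election node."""
    return bool(children) and current_overseer(children) == my_node_name
```

Since the election nodes are ephemeral, the answer can change whenever the overseer's session ends, so a long-running thread would need to re-check rather than decide once at startup.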
Re: Multiple boost queries on a specific field
Why are you using q.alt? That uses quite different query-parsing logic that, I believe, bypasses the dismax stuff. Just use q=*:*. Note that *:* also short-circuits most of the scoring, since there's nothing to score there, so try with real terms in q.

As to your second question, see https://wiki.apache.org/solr/FunctionQuery#Date_Boosting for a way to make more recent documents bubble to the top. It doesn't quite do what you're asking, but it might be "close enough".

Best,
Erick

On Thu, Jul 16, 2015 at 4:55 AM, bengates wrote:
> Hello,
>
> I'm trying to use boost queries for the first time and I need some help.
> Let's assume my documents have a provider field, which is populated by a
> string, e.g. A, B, C, D, E.
>
> I'd like to assign weights to providers: A is ^2.0, B is ^1.5 and the
> others are 1.0.
>
> So, if I run the following query:
> ?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0
> my first results have provider A.
>
> Let's try another one:
> ?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:B^1.5
> My first results have provider B. Good!
>
> But that's not exactly what I'm looking for (otherwise I'd just have made a
> filter query).
> What I want is a multiple boost query. So I tried:
> ?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:(A^2.0 B^1.5)
> Then my first results have provider B. That's not logical.
>
> I tried another syntax:
> ?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0&bq=provider:B^1.5
> But nothing changes.
>
> Can you help me?
>
> My second problem is that I would like to give some weight to the newer
> documents, but in a range of a month.
> That means, if a document named B1 with provider B (weight 1.5) is newer
> than a document named A1 with provider A (weight 2.0), I want B1 to rank
> above A1 only if the difference between their dates is at least 1
> month. Do you know how to do this?
>
> Since my boost logic depends on the user navigation, I have to do this
> only at query time.
>
> Thanks for your help,
> Ben
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multiple-boost-queries-on-a-specific-field-tp4217678.html
> Sent from the Solr - User mailing list archive at Nabble.com.
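For what it's worth, here is a small sketch of how the request parameters discussed above could be assembled at query time: real terms in q, one bq per provider, and optionally the recip(ms(NOW,...)) date function from the wiki page passed as bf. The function name `build_boost_query` and the date field name are hypothetical, and this only builds the query string; it says nothing about how the resulting scores will compare.

```python
from urllib.parse import urlencode


def build_boost_query(user_query, provider_weights, date_field=None):
    """Build a dismax query string with one bq parameter per provider."""
    params = [
        ("q", user_query),
        ("defType", "dismax"),
        ("wt", "json"),
    ]
    for provider, weight in provider_weights.items():
        # Repeating bq keeps each provider boost as a separate clause.
        params.append(("bq", "provider:%s^%s" % (provider, weight)))
    if date_field is not None:
        # Date-boosting function as given on the FunctionQuery wiki page;
        # date_field is an assumed field name, not one from the thread.
        params.append(("bf", "recip(ms(NOW,%s),3.16e-11,1,1)" % date_field))
    return urlencode(params)
```

Usage would be along the lines of `build_boost_query("solr", {"A": 2.0, "B": 1.5}, date_field="published")`, with the result appended to the select handler URL. Because bq clauses add to the score rather than multiply it, it is worth checking the effect with debugQuery=true before settling on weights.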
Re: Issue with using createNodeSet in Solr Cloud
There were a couple of cases where the "no live servers" error was being returned when the actual problem was something completely different. Does the Solr log show something more useful? And are you sure you have a configset named collection_A?

This works fine for me (admittedly on 5.x), and I'm quite sure there are bunches of automated tests that would be failing otherwise, so I suspect it's just a misleading error being returned.

Best,
Erick

On Thu, Jul 16, 2015 at 2:22 AM, Savvas Andreas Moysidis wrote:
> Hello There,
>
> I am trying to use the createNodeSet parameter when creating a new
> collection but I'm getting an error when doing so.
>
> More specifically, I have four Solr instances running locally in separate
> JVMs (127.0.0.1:8983, 127.0.0.1:8984, 127.0.0.1:8985, 127.0.0.1:8986) and a
> standalone Zookeeper instance which all Solr instances point to. The four
> Solr instances have no collections added to them and are all up and running
> (I can access the admin page in all of them).
>
> Now, I want to create a collection on only two of these four instances
> (127.0.0.1:8983, 127.0.0.1:8984), but when I hit one instance with the
> following URL:
>
> http://localhost:8983/solr/admin/collections?action=CREATE&name=collection_A&numShards=1&replicationFactor=2&maxShardsPerNode=1&createNodeSet=127.0.0.1:8983_solr,127.0.0.1:8984_solr&collection.configName=collection_A
>
> I am getting the following response:
>
> 400
> 3503
>
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Cannot create collection collection_A. No live Solr-instances among
> Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984_solr
>
> Cannot create collection collection_A. No live Solr-instances among
> Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984_solr
> 400
>
> Cannot create collection collection_A. No live Solr-instances among
> Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984_solr
> 400
>
> The instances are definitely up and running (at least the admin console can
> be accessed as mentioned) and if I remove the createNodeSet parameter the
> collection is created as expected.
>
> Am I missing something obvious or is this a bug?
>
> The exact Solr version I'm using is 4.9.1.
>
> Any pointers would be much appreciated.
>
> Thanks,
> Savvas
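One thing that may be worth ruling out (this is a guess, not something the thread confirms) is that the names passed in createNodeSet do not exactly match the node names the instances registered with ZooKeeper, for example if they registered under a different host address. A minimal sketch of that comparison, assuming you have already obtained the list of live node names by some other means; `unmatched_nodes` is a hypothetical helper:

```python
def unmatched_nodes(create_node_set, live_nodes):
    """Return the createNodeSet entries that are not among the live nodes."""
    requested = [name.strip() for name in create_node_set.split(",")]
    requested = [name for name in requested if name]
    live = set(live_nodes)
    # Comparison is exact: host, port and "_solr" context must all match.
    return [name for name in requested if name not in live]
```

An empty result would suggest the names are fine and the message is indeed misleading, in which case the Solr log is the next place to look.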
Re: Programmatically find out if node is overseer
An easier way (IMO), and a more 'official' one, is to use the CLUSTERSTATUS (https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18) or OVERSEERSTATUS (https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api17) API.

OVERSEERSTATUS returns a 'leader' item which says who the overseer is, at least as far as I understand. I'm not sure what is returned in case there are multiple nodes with the overseer role. CLUSTERSTATUS returns an 'overseer' item with all nodes that have the overseer role assigned. I usually use that API to query the status of my Solr cluster.

Shai

On Fri, Jul 17, 2015 at 3:55 AM, Erick Erickson wrote:
> look at the overseer election ephemeral node in ZK, the first one in
> line is the current overseer.
>
> Best,
> Erick
>
> On Thu, Jul 16, 2015 at 3:42 AM, Markus Jelsma wrote:
> > Hello - i need to run a thread on a single instance of a cloud so need
> > to find out if current node is the overseer. I know we can already
> > programmatically find out if this replica is the leader of a shard via
> > isLeader(). I have looked everywhere but i cannot find an isOverseer. I did
> > find the election stuff but i am unsure if that is what i need to use.
> >
> > Any thoughts?
> >
> > Thanks!
> > Markus
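A rough sketch of the OVERSEERSTATUS approach, assuming the JSON response carries a top-level 'leader' entry holding the overseer's node name as described above. The function name is hypothetical and the response body is taken as a string, so fetching it (for example from /solr/admin/collections?action=OVERSEERSTATUS&wt=json) is left to your HTTP client.

```python
import json


def overseer_leader_is(overseerstatus_body, my_node_name):
    """True if the 'leader' entry of the response names this node."""
    status = json.loads(overseerstatus_body)
    # 'leader' is assumed to be a top-level key; a missing key means "no".
    return status.get("leader") == my_node_name
```

The node name to compare against has to be in the same form Solr uses for it (host:port_context), so it is probably safest to read it from the running node rather than hard-code it.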