Setting routerField/shardKey on specific collection?
Hi, I'm using Solr 4.6 and trying to specify a router.field (shard key) on a specific collection so that all documents with the same value in the specified field end up on the same shard. However, I can't find an example of how to do this via solr.xml. I see that this ticket https://issues.apache.org/jira/browse/SOLR-5017 mentions a routeField property. Should solr.xml contain the following?
routerField="consolidationGroupId" />
Any help would be greatly appreciated! I've been yak shaving all afternoon reading various Jira tickets and wikis trying to get this to work :-)
Best wishes, Daniel
-- *Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk <http://www.tai-dev.co.uk/>* daniel.bry...@tai-dev.co.uk <mailto:daniel.bry...@tai-dev.co.uk> | +44 (0) 7799406399 | Twitter: @taidevcouk <https://twitter.com/taidevcouk>
Re: Setting routerField/shardKey on specific collection?
Many thanks, Timothy. I tried this today but ran into issues getting the new collection to persist (so that I could search for the parameter). It's good to have this confirmed as a viable approach, though, and I'll persevere with it tomorrow. If I figure it out I'll reply with the details. Thanks again, Daniel

On 04/12/2013 17:41, Tim Potter wrote:
Hi Daniel, I'm not sure how this would apply to an existing collection (in your case collection1). Try using the Collections API to create a new collection and pass the router.field parameter. Grepping over the code, the parameter is named router.field (not routerField or routeField). Cheers, Timothy Potter Sr. Software Engineer, LucidWorks www.lucidworks.com
________
From: Daniel Bryant
Sent: Wednesday, December 04, 2013 9:40 AM
To: solr-user@lucene.apache.org
Subject: Setting routerField/shardKey on specific collection?
[...]
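A minimal sketch of Tim's suggestion, assuming a Solr release that supports router.field on collection creation (SOLR-5017) listening on localhost:8983. It only builds the Collections API CREATE URL; the collection name, shard count and config name used below are hypothetical placeholders. `router.name=compositeId` is normally the default router, but it is spelled out here for clarity.

```shell
#!/usr/bin/env bash
# Sketch: build a Collections API CREATE request that sets router.field.
# Assumption: SOLR_BASE (host/port) is hypothetical; override it as needed.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# create_collection_url NAME NUM_SHARDS CONFIG_NAME ROUTER_FIELD
# Prints the URL; documents sharing a ROUTER_FIELD value should hash to the
# same shard instead of being routed on the uniqueKey.
create_collection_url() {
  local name="$1" num_shards="$2" config="$3" router_field="$4"
  printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&collection.configName=%s&router.name=compositeId&router.field=%s\n' \
    "$SOLR_BASE" "$name" "$num_shards" "$config" "$router_field"
}

# Example (not executed here; "offerings2" is a hypothetical name):
#   curl -s "$(create_collection_url offerings2 2 myconf consolidationGroupId)"
```

Building the URL in a function keeps it easy to inspect before sending; the `curl` call in the final comment is what would actually create the collection.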
Searching for document by id in a sharded environment
Hi, I'm in the process of migrating an application that queries Solr to use a new sharded SolrCloud, and as part of this I'm adding the shard key to the document id when we index documents (we're using grouping, so we need to ensure that grouped documents end up on the same shard), e.g.
156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475
I'm having a problem with my application when searching by id with SolrJ CloudSolrServer: the exclamation point is interpreted as a boolean negation, and the matching document is not returned in the search results. I just wanted to check whether the only way to make this work is to escape the exclamation point (i.e. prefix it with a backslash, or enclose the id in quotes). We're keen to avoid this, as it would require lots of modifications throughout the code of a series of applications that interact with Solr. If anyone has any better suggestions on how to achieve this it would be very much appreciated! Best wishes, Daniel
Re: Searching for document by id in a sharded environment
Thanks for your replies, Ahmet and Joel! We have now determined that the exclamation point wasn't the issue: our query actually contained too many boolean clauses (more than the default limit of 1024). Apologies for any confusion this may have caused. The issue went around my team like the childhood game of telephone, and the "too many boolean clauses" error was thought to have been caused by the "!", when in fact some other code had been committed that didn't batch large delete-by-id queries. This caused us to start looking for a solution to a problem that didn't exist :-) Thanks again for the quick responses! Best wishes, Daniel

On 09/12/2013 12:21, Ahmet Arslan wrote:
Hi Daniel, TermQueryParser comes in handy when you don't want to escape.
q = {!term f=id}156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475

On Monday, December 9, 2013 2:14 PM, Daniel Bryant wrote:
[...]
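Since the real culprit turned out to be unbatched delete-by-id queries exceeding the 1024 boolean clause limit, here is a rough sketch of batching them. The helper names are hypothetical; it assumes ids contain no double quotes or backslashes, and it quotes each id so the `!` composite-id separator is not parsed as a negation.

```shell
#!/usr/bin/env bash
# delete_queries MAX_CLAUSES ID...
# Prints one delete-by-query string per line, of the form id:("a" OR "b"),
# with at most MAX_CLAUSES ids per line so no single query exceeds the
# boolean clause limit. Assumes ids contain no '"' or '\' characters.
delete_queries() {
  local max="$1"; shift
  local -a batch=()
  local id
  for id in "$@"; do
    batch+=("\"$id\"")                 # quote each id; '!' is literal inside quotes
    if [ "${#batch[@]}" -ge "$max" ]; then
      _emit_batch "${batch[@]}"
      batch=()
    fi
  done
  # Flush the final, partially filled batch.
  if [ "${#batch[@]}" -gt 0 ]; then
    _emit_batch "${batch[@]}"
  fi
}

# _emit_batch QUOTED_ID... : join with " OR " and wrap as id:(...)
_emit_batch() {
  local joined
  joined="$(printf '%s OR ' "$@")"
  printf 'id:(%s)\n' "${joined% OR }"  # drop the trailing " OR "
}
```

Each output line could then be posted as a `<delete><query>id:(...)</query></delete>` body to the core's /update handler. For a single-document lookup, Ahmet's `{!term f=id}` form avoids escaping altogether.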
Query results in "no servers hosting shard: " with single sharded SolrCloud (with embedded ZK)
Hi, I'm getting the error 'msg: "no servers hosting shard: " ' when trying to search on a freshly created SolrCloud instance with an embedded ZooKeeper and a single shard. My solr.xml is as follows: shard="1"/> All the directories referenced in solr.xml are present under the solr directory. I'm starting SolrCloud with the following command:
java -Dbootstrap_confdir=./solr/offerings/conf -Dcollection.configName=myconf -DzkRun -DnumShards=1 -jar start.jar
Everything initialises fine, and I can see all of the schemas correctly via the admin console, but as soon as I execute a query I get the above error. I'm assuming I'm not telling Solr correctly that it is the only shard, but I am passing that as a JVM argument (-DnumShards=1) at startup. Any thoughts would be most appreciated! Best wishes, Daniel
Re: Query results in "no servers hosting shard: " with single sharded SolrCloud (with embedded ZK)
Ah! That's solved it: there were multiple missing (inactive) shards shown in the Cloud panel. This is bizarre (as I'm specifying numShards=1 in the JVM options), but deleting the zoo_data folder under my solr directory and then restarting SolrCloud resulted in queries returning correct results. Many thanks for the very helpful pointer, Furkan! This has no doubt saved me many hours of continued pondering and frustration. Best wishes, Daniel

On 10/12/2013 21:06, Furkan KAMACI wrote:
Hi Daniel, could you open the Solr admin page and check it? If there is no error message, click on the Cloud link in the left panel and check the status of your node. Thanks, Furkan KAMACI

On Tuesday, 10 December 2013, Daniel Bryant wrote:
[...]
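The numShards setting is only used when a collection is first registered in ZooKeeper, so stale cluster state left in zoo_data from earlier runs is a likely explanation for the extra shards. Below is a sketch of the reset Daniel describes, assuming the directory layout and flags from his startup command; the function name and the DRY_RUN switch are hypothetical. It deletes the embedded ZooKeeper's data, so it is only suitable for a disposable development node.

```shell
#!/usr/bin/env bash
# reset_zk_and_start SOLR_DIR
# Removes the embedded ZooKeeper data under SOLR_DIR, then starts Solr with
# the flags from the thread. Assumes start.jar is in the current directory.
# Set DRY_RUN=1 to print the java command instead of running it.
reset_zk_and_start() {
  local solr_dir="${1:?usage: reset_zk_and_start SOLR_DIR}"
  # With -DzkRun, ZooKeeper state (including old shard layout) lives here.
  rm -rf -- "$solr_dir/zoo_data"
  local -a cmd=(java "-Dbootstrap_confdir=$solr_dir/offerings/conf"
                -Dcollection.configName=myconf -DzkRun -DnumShards=1
                -jar start.jar)
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"   # show what would be run
  else
    "${cmd[@]}"
  fi
}

# Example (run from the directory containing start.jar):
#   reset_zk_and_start ./solr
```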
Best way to copy data from SolrCloud to standalone Solr?
Hi all, I have a production SolrCloud server which has multiple sharded indexes, and I need to copy all of the indexes to a (non-cloud) Solr server in our QA environment. Can I ask for advice on the best way to do this, please? I've searched the web and found solr2solr (https://github.com/dbashford/solr2solr), but the author states that it is best suited to small indexes, and ours are rather large at ~20 GB each. I've also looked at replication, but can't find a definitive reference on how this should be done between SolrCloud and standalone Solr. Any guidance is very much appreciated. Best wishes, Daniel
Re: Best way to copy data from SolrCloud to standalone Solr?
Hi Shawn, Michael,
Many thanks for your responses. We're going to try the replication/backup command, as we think it's a "two birds with one stone" approach that will not only allow us to copy the indexes, but also help with backups in SolrCloud. Thanks again to you both! Best wishes, Daniel

On 17/02/2014 20:25, Michael Della Bitta wrote:
I do know for certain that the backup command on a cloud core still works. We have a script like this running on a cron to snapshot indexes:
curl -s 'http://localhost:8080/solr/#{core}/replication?command=backup&numberToKeep=4&location=/tmp'
(not really using /tmp for this, parameters changed to protect the guilty) The admin handler for replication doesn't seem to be there, but the actual API seems to work normally.
Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. "The Science of Influence Marketing" 18 East 41st Street New York, NY 10017 t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions<https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts> w: appinions.com <http://www.appinions.com/>

On Mon, Feb 17, 2014 at 2:02 PM, Shawn Heisey wrote:
On 2/17/2014 8:32 AM, Daniel Bryant wrote:
[...]
If the master index isn't changing at the time of the copy, and you're on a non-Windows platform, you should be able to copy the index directory directly. On a Windows platform, whether you can copy the index while Solr is using it would depend on how Solr/Lucene opens the files. A typical Windows file open will prevent anything else from opening them, and I do not know whether Lucene is smarter than that. SolrCloud requires the replication handler to be enabled on all configs, but during normal operation it does not actually use replication. This is a confusing thing for some users. I *think* you can configure the replication handler on slave cores with a non-cloud config that point at the master cores, and it should replicate the main Lucene index, but not the config files. I have no idea whether things will work right if you configure other master options like replicateAfter and config files, and I also don't know if those options might cause problems for SolrCloud itself. Those options shouldn't be necessary for just getting the data into a dev environment, though. Thanks, Shawn
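A sketch of the two approaches discussed above, assuming hypothetical production and QA base URLs and core names. The first helper reproduces Michael's backup call; the second asks a QA core to pull a one-off copy of a production shard core's index through the replication handler's `fetchindex` command with a `masterUrl` override. Each production shard core would map to its own QA core. The masterUrl format used here (core URL without a `/replication` suffix) is an assumption worth checking against your Solr version.

```shell
#!/usr/bin/env bash
# Hypothetical base URLs; override via the environment.
PROD_BASE="${PROD_BASE:-http://prod-solr:8983/solr}"
QA_BASE="${QA_BASE:-http://qa-solr:8983/solr}"

# backup_url CORE LOCATION
# Replication-handler backup of one production core, keeping 4 snapshots.
backup_url() {
  printf '%s/%s/replication?command=backup&numberToKeep=4&location=%s\n' \
    "$PROD_BASE" "$1" "$2"
}

# fetchindex_url PROD_CORE QA_CORE
# Asks QA_CORE to pull PROD_CORE's index once, via a masterUrl override.
fetchindex_url() {
  printf '%s/%s/replication?command=fetchindex&masterUrl=%s/%s\n' \
    "$QA_BASE" "$2" "$PROD_BASE" "$1"
}

# Example (not executed here; core names are hypothetical):
#   curl -s "$(fetchindex_url collection1_shard1_replica1 shard1_copy)"
```

Keeping URL construction separate from the `curl` call makes it easy to loop over every shard core in a cron script, as Michael does for backups.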
Advice for performance issues with group.facet=true
Hi everyone, I'm seeing very poor performance when grouping (field collapsing) with group.facet=true on a large result set.
- I have an index with 2 million documents, and I query with five facet fields (each with 30+ groups)
- With group.facet=false the query can take 2000 ms on the first run, but no more than 250 ms on subsequent runs
- With group.facet=true it takes 18000 ms on average on the first run, and the same on all subsequent runs (suggesting to me that a cache is not being used)
I've checked the Solr Jira and several others are experiencing the same issue: https://issues.apache.org/jira/browse/SOLR-4763 Could anyone offer any advice or suggestions, please? This is becoming a blocking issue for us, and I'm very curious whether this will be fixed in the near future. Best wishes, Daniel
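To reproduce the first-run versus subsequent-run comparison, one could time the same grouped, faceted query with group.facet toggled. This sketch only builds the select URL; the base URL, core, group field and facet field names are hypothetical, and `rows=10` is an arbitrary choice.

```shell
#!/usr/bin/env bash
# grouped_facet_query BASE_URL CORE GROUP_FIELD GROUP_FACET FACET_FIELD...
# GROUP_FACET is "true" or "false"; one facet.field param per FACET_FIELD.
grouped_facet_query() {
  local base="$1" core="$2" group_field="$3" group_facet="$4"
  shift 4
  local url="$base/$core/select?q=*:*&rows=10&group=true&group.field=$group_field&group.facet=$group_facet&facet=true"
  local f
  for f in "$@"; do
    url="$url&facet.field=$f"
  done
  printf '%s\n' "$url"
}

# Example (not executed here): run each variant twice and compare timings.
#   time curl -s -o /dev/null "$(grouped_facet_query http://localhost:8983/solr collection1 groupId true facetA facetB)"
```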
Re: Advice for performance issues with group.facet=true
Many thanks for your response, Otis. I had feared as much, but it's good to have it confirmed. Best wishes, Daniel

On 03/07/2013 17:05, Otis Gospodnetic wrote:
Hi, I think nobody in the community is focused on field collapsing/grouping, so I suspect there won't be a fix until somebody gets a strong-enough itch, or a business requires it so much that it decides it pays to invest in the contribution. Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm

On Wed, Jul 3, 2013 at 5:54 AM, Daniel Bryant wrote:
[...]