Setting routerField/shardKey on specific collection?

2013-12-04 Thread Daniel Bryant

Hi,

I'm using Solr 4.6 and trying to specify a router.field (shard key) on a 
specific collection so that all documents with the same value in the 
specified field end up on the same shard.


However, I can't find an example of how to do this via solr.xml. In this 
ticket, https://issues.apache.org/jira/browse/SOLR-5017, there is a 
mention of a routeField property.


Should the solr.xml contain the following?


routerField="consolidationGroupId" />



Any help would be greatly appreciated. I've been yak shaving all 
afternoon reading various Jira tickets and wikis trying to get this to 
work :-)


Best wishes,

Daniel


--
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk 
<http://www.tai-dev.co.uk/>*
daniel.bry...@tai-dev.co.uk <mailto:daniel.bry...@tai-dev.co.uk>  |  +44 
(0) 7799406399  |  Twitter: @taidevcouk <https://twitter.com/taidevcouk>


Re: Setting routerField/shardKey on specific collection?

2013-12-04 Thread Daniel Bryant
Many thanks Timothy, I tried this today but ran into issues getting the 
new collection to persist (so that I could search for the parameter). 
It's good to have this confirmed as a viable approach though, and I'll 
persevere with this tomorrow.


If I figure it out I'll reply with the details.

Thanks again,

Daniel


On 04/12/2013 17:41, Tim Potter wrote:

Hi Daniel,

I'm not sure how this would apply to an existing collection (in your case 
collection1). Try using the collections API to create a new collection and pass 
the router.field parameter. Grep'ing over the code, the parameter is named: 
router.field (not routerField or routeField).

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com
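
As a rough illustration of the suggestion above, here is a minimal sketch that builds a Collections API CREATE request carrying the router.field parameter. The host and port, the collection name "offerings", the shard count and the config name are placeholders chosen for the example, not values confirmed in this thread.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, router_field, config_name=None):
    """Build a Collections API CREATE URL that sets router.field."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        # The parameter is router.field, not routerField or routeField.
        "router.field": router_field,
    }
    if config_name is not None:
        params["collection.configName"] = config_name
    return "%s/admin/collections?%s" % (base_url.rstrip("/"), urlencode(params))


# All argument values below are hypothetical placeholders.
url = create_collection_url(
    "http://localhost:8983/solr", "offerings", 2, "consolidationGroupId", "myconf"
)
print(url)
```

The resulting URL can be requested with curl or a browser against a running SolrCloud node.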




Searching for document by id in a sharded environment

2013-12-09 Thread Daniel Bryant

Hi,

I'm in the process of migrating an application that queries Solr to use 
a new sharded SolrCloud, and as part of this I'm adding the shard key to 
the document id when we index documents (as we're using grouping and we 
need to ensure that grouped documents end up on the same shard) e.g.


156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475

I'm having a problem with my application when searching by id with SolrJ 
CloudSolrServer - the exclamation point is misinterpreted as a boolean 
negation, and the matching document is not returned in the search results.


I just wanted to check if the only way to make this work would be to 
escape the exclamation point (i.e. prefix it with a backslash, or enclose 
the id within quotes). We're keen to avoid this, as it would require lots 
of modifications throughout the code in a series of applications that 
interact with Solr.
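
If escaping does turn out to be necessary, it could be kept in a single helper rather than spread through the calling code. The sketch below is loosely modelled on the character list handled by SolrJ's ClientUtils.escapeQueryChars; the Python function name is illustrative, and the exact set of special characters is an assumption about the query parser in use.

```python
# Characters assumed to be special to the Lucene/Solr query parser.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(value):
    """Prefix every special or whitespace character with a backslash."""
    out = []
    for ch in value:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


doc_id = "156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475"
print("id:" + escape_query_chars(doc_id))
```

Note that the hyphens in the id are escaped as well as the exclamation point, since both are treated as operators in this character set.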


If anyone has any better suggestions on how to achieve this it would be 
very much appreciated!


Best wishes,

Daniel




Re: Searching for document by id in a sharded environment

2013-12-10 Thread Daniel Bryant

Thanks for your replies Ahmet and Joel!

We have now determined that the exclamation point wasn't the issue, and 
our query actually contained too many boolean clauses (more than the 
default limit of 1024).


Apologies for any confusion this may have caused - the issue went around 
my team like the childhood game of telephone, and the initial problem of 
the "too many boolean expressions" was thought to have appeared due to 
the "!", when in fact some other code had been committed which didn't 
batch large delete by id queries. This caused us to start looking for a 
solution to a problem that didn't exist :-)
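
For anyone hitting the same limit, batching a large delete-by-id could look something like the sketch below. The batch size of 1000 is an assumption chosen to stay under the default of 1024 clauses mentioned above, and the helper names are illustrative; sending each query to Solr is left to whatever client is in use.

```python
def batches(ids, size=1000):
    """Yield consecutive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def delete_query(id_batch, field="id"):
    """Build one delete query for a batch, quoting each id so that
    characters such as "!" and "-" are not parsed as operators."""
    quoted = [
        '"%s"' % doc_id.replace("\\", "\\\\").replace('"', '\\"')
        for doc_id in id_batch
    ]
    return "%s:(%s)" % (field, " OR ".join(quoted))


# Hypothetical ids, only to show the batching behaviour.
all_ids = ["doc-%d" % i for i in range(2500)]
queries = [delete_query(batch) for batch in batches(all_ids)]
print(len(queries))
```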


Thanks again for the quick response!

Best wishes,

Daniel



On 09/12/2013 12:21, Ahmet Arslan wrote:

Hi Daniel,

TermQueryParser comes in handy when you don't want to escape.

q={!term f=id}156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475
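
A minimal sketch of how a client might assemble that query: the value after the {!term} local params is taken as a raw term, so the id is not escaped for the query parser, but the whole q parameter still needs URL-encoding when sent over HTTP. The helper name is illustrative.

```python
from urllib.parse import urlencode


def term_query(field, value):
    """Wrap a raw value in the term query parser's local-params syntax."""
    return "{!term f=%s}%s" % (field, value)


doc_id = "156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475"
q = term_query("id", doc_id)

# URL-encode the parameter for use in a /select request.
query_string = urlencode({"q": q})
print(query_string)
```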






Query results in "no servers hosting shard: " with single sharded SolrCloud (with embedded ZK)

2013-12-10 Thread Daniel Bryant

Hi,

I'm getting the error 'msg: "no servers hosting shard: " ' when trying 
to search on a freshly created SolrCloud instance with an embedded 
ZooKeeper and a single shard.


My solr.xml is as follows:


  


shard="1"/>


  


All the directories referenced in solr.xml are present under the solr 
directory.


I'm starting my SolrCloud with the following command:

java -Dbootstrap_confdir=./solr/offerings/conf \
  -Dcollection.configName=myconf -DzkRun -DnumShards=1 -jar start.jar


Everything initialises fine, and I can see all of the schemas correctly 
via the admin console, but as soon as I execute a query I get the above 
error. I assume I'm not telling Solr correctly that it is the only 
shard, even though I am passing numShards=1 as a JVM argument at startup.


Any thoughts would be most appreciated!

Best wishes,

Daniel





Re: Query results in "no servers hosting shard: " with single sharded SolrCloud (with embedded ZK)

2013-12-10 Thread Daniel Bryant
Ah! That's solved it - there were multiple missing (inactive) shards 
shown in the Cloud panel. This is bizarre (as I'm specifying numShards=1 
in the JVM options), but deleting the zoo_data folder under the solr 
directory and then restarting SolrCloud resulted in queries returning 
correct values.


Many thanks for the very helpful pointer Furkan! This has no doubt saved 
me many hours of continued pondering and frustration.


Best wishes,

Daniel



On 10/12/2013 21:06, Furkan KAMACI wrote:

Hi Daniel;

Could you open the Solr admin page and check it? If there is no error
message, click on the Cloud link in the left panel and check the status of
your node.

Thanks;
Furkan KAMACI




Best way to copy data from SolrCloud to standalone Solr?

2014-02-17 Thread Daniel Bryant

Hi all,

I have a production SolrCloud server which has multiple sharded indexes, 
and I need to copy all of the indexes to a (non-cloud) Solr server 
within our QA environment.


Can I ask for advice on the best way to do this please?

I've searched the web and found solr2solr 
(https://github.com/dbashford/solr2solr), but the author states that 
this is best for small indexes, and ours are rather large at ~20 GB each. 
I've also looked at replication, but can't find a definitive reference on 
how this should be done between SolrCloud and Solr.


Any guidance is very much appreciated.

Best wishes,

Daniel





Re: Best way to copy data from SolrCloud to standalone Solr?

2014-02-18 Thread Daniel Bryant

Hi Shawn, Michael,

Many thanks for your responses - we're going to try the 
replication/backup command, as we think this is a 'two birds with 
one stone' approach which will not only allow us to copy the indexes, 
but also help with backups in SolrCloud.


Thanks again to you both!

Best wishes,

Daniel



On 17/02/2014 20:25, Michael Della Bitta wrote:

I do know for certain that the backup command on a cloud core still works.
We have a script like this running on a cron to snapshot indexes:

curl -s 'http://localhost:8080/solr/#{core}/replication?command=backup&numberToKeep=4&location=/tmp'

(not really using /tmp for this, parameters changed to protect the guilty)

The admin handler for replication doesn't seem to be there, but the actual
API seems to work normally.
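
The same request could be assembled per core along these lines. This is only a sketch: the core name "offerings" is a placeholder, and the base URL and /tmp location simply mirror the curl example above rather than a recommended setup.

```python
from urllib.parse import urlencode


def backup_url(base_url, core, location, number_to_keep):
    """Build the replication handler URL that triggers an index backup."""
    params = {
        "command": "backup",
        "numberToKeep": number_to_keep,
        "location": location,
    }
    return "%s/%s/replication?%s" % (base_url.rstrip("/"), core, urlencode(params))


# "offerings" is a hypothetical core name.
url = backup_url("http://localhost:8080/solr", "offerings", "/tmp", 4)
print(url)
```

urlencode percent-encodes the slash in the location value, which the server decodes back to /tmp.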

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

"The Science of Influence Marketing"

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions<https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Mon, Feb 17, 2014 at 2:02 PM, Shawn Heisey  wrote:


If the master index isn't changing at the time of the copy, and you're
on a non-Windows platform, you should be able to copy the index
directory directly.  On a Windows platform, whether you can copy the
index while Solr is using it would depend on how Solr/Lucene opens the
files.  A typical Windows file open will prevent anything else from
opening them, and I do not know whether Lucene is smarter than that.

SolrCloud requires the replication handler to be enabled on all configs,
but during normal operation, it does not actually use replication.  This
is a confusing thing for some users.

I *think* you can configure the replication handler on slave cores with
a non-cloud config that points at the master cores, and it should
replicate the main Lucene index, but not the config files.  I have no
idea whether things will work right if you configure other master
options like replicateAfter and config files, and I also don't know if
those options might cause problems for SolrCloud itself.  Those options
shouldn't be necessary for just getting the data into a dev environment,
though.

Thanks,
Shawn






Advice for performance issues with group.facet=true

2013-07-03 Thread Daniel Bryant

Hi everyone,

I'm seeing very bad performance when grouping (field collapsing) using 
group.facet=true with a large result set.


- I have an index with 2 million documents, and I query with five facet 
fields (each with 30+ groups)
- If I set group.facet=false the query can take 2000ms on first run, but 
no more than 250ms on subsequent execution
- If I set group.facet=true it takes on average 18000ms on the first 
run, and the same time on all subsequent runs (suggesting to me that a 
cache is not being used)
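
To make the comparison above reproducible, the two variants of the request can be generated from one helper so that only group.facet differs. A minimal sketch follows; the field names ("groupId", "brand", "colour") and the match-all query are hypothetical stand-ins for the real schema and query.

```python
from urllib.parse import urlencode


def grouped_facet_params(group_field, facet_fields, group_facet):
    """Build the query string for a grouped, faceted request."""
    params = [
        ("q", "*:*"),
        ("group", "true"),
        ("group.field", group_field),
        ("group.facet", "true" if group_facet else "false"),
        ("facet", "true"),
    ]
    # facet.field is repeated once per facet field.
    params += [("facet.field", name) for name in facet_fields]
    return urlencode(params)


# Hypothetical field names.
with_group_facet = grouped_facet_params("groupId", ["brand", "colour"], True)
without_group_facet = grouped_facet_params("groupId", ["brand", "colour"], False)
print(with_group_facet)
print(without_group_facet)
```

Running each query string against /select several times and comparing the QTime values in the responses would show whether repeat executions get any faster.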


I've checked the Solr Jira and several others are experiencing the same 
issue:


https://issues.apache.org/jira/browse/SOLR-4763

Could anyone offer any advice or suggestions please? This is becoming a 
blocking issue for us, and I'd like to know whether it is likely to be 
fixed in the near future.


Best wishes,

Daniel



Re: Advice for performance issues with group.facet=true

2013-07-04 Thread Daniel Bryant
Many thanks for your response Otis - I had feared as much, but it's good 
to have it confirmed.


Best wishes,

Daniel


On 03/07/2013 17:05, Otis Gospodnetic wrote:

Hi,

I think nobody in the community is focused on field
collapsing/grouping, so I suspect there won't be a fix until somebody
gets a strong-enough itch, or a business needs it so much that it
decides it pays to invest in the contribution.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



