Re: solr 4.4 splitshard query

2018-12-05 Thread Kelly, Frank
Whenever I hit a problem with SPLITSHARD it's usually because I run out of disk, as effectively you're doubling the disk space used by the shard. However, for large indexes (and 40GB is pretty large) take a look at https://issues.apache.org/jira/browse/SOLR-5324 If that's the problem one possible w
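
The disk-space point in this message can be checked before a split is attempted. The following is only a minimal sketch: the function names and the `headroom` factor are illustrative assumptions, not Solr settings, and the index size has to be supplied by the caller.

```python
import shutil

GB = 1024 ** 3

def free_bytes(path):
    """Free space on the filesystem that holds path."""
    return shutil.disk_usage(path).free

def has_room_for_split(index_bytes, available_bytes, headroom=1.0):
    """True if the free space can hold one more copy of the index.

    Assumption: a split temporarily needs roughly the shard's current
    size again (the 'doubling' described in the message). headroom is
    a hypothetical safety multiplier.
    """
    return available_bytes >= index_bytes * headroom
```

For the 40GB index mentioned in the thread, `has_room_for_split(40 * GB, free_bytes("/path/to/index"))` would be the check, where the path is a placeholder for the actual data directory.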

Re: Thoughts on scaling strategy for Solr deployed on AWS EC2 instances - Scale up / out and which instance type?

2018-05-21 Thread Kelly, Frank
ing SolrCloud, the usual scaling approach if you're >index-heavy is to add more shards, and since you're CPU bound they'd >have to be on new AWS instances. Or, if you're running multiple >replicas on each instance, move some of the replicas to new instances. >A

Thoughts on scaling strategy for Solr deployed on AWS EC2 instances - Scale up / out and which instance type?

2018-05-21 Thread Kelly, Frank
Using Solr 5.3.1 - index We have an indexing heavy workload (we do more indexing than searching) and for those searches we do perform we have very few cache hits (25% of our index is in memory and the hit rate is < 0.1%) We are currently using r3.xlarge (memory optimized instances as we origina

Re: Largest number of indexed documents used by Solr

2018-04-05 Thread Kelly, Frank
For us we have ~ 350M documents stored using r3.xlarge nodes with 8GB Heap and about 31GB of RAM We are using Solr 5.3.1 in a SolrCloud setup (3 collections, each with 3 shards and 3 replicas). For us lots of RAM is not as important as CPU (as the EBS disk we run on top of is quite fast a

Re: SolrCloud: How best to do backups?

2018-02-13 Thread Kelly, Frank
Hmmm... > >Can you (fairly quickly) reproduce this AWS environment (including the >indexes)? Or does it require that several week process to provision new >Solr boxes...? > >What happens now if one of those ec2 instances gets into trouble? Do you >have autoscaling groups set up?

SolrCloud: How best to do backups?

2018-02-08 Thread Kelly, Frank
We have a large SolrCloud deployment on AWS (350m documents spread across 3 collections, each with 3 shards and 3 replicas) Running on 3 x r3.xlarge’s with the data stored on EBS drives with Provisioned IOPS Currently it’s handling 38m requests per day My question is how best should we back up

Re: SolrCloud 5.3.1 "IndexWriter is closed"

2017-09-12 Thread Kelly, Frank
The schema change doesn't seem to be making any difference - just the act of a reload whilst handling live traffic. The reload takes about 30 seconds and soon after (within a few seconds) we start to see IndexWriter closed exceptions -Frank Frank Kelly Principal Software Engineer Identity Prof

Re: SolrCloud 5.3.1 "IndexWriter is closed"

2017-09-12 Thread Kelly, Frank
No - these are new terms for new documents we will be adding later so no need to reindex old documents. Frank Frank Kelly Principal Software Engineer Identity Profile Team (SCBE, Traces, CDA) HERE 5 Wayside Rd, Burlington, MA 01803, USA 42° 29' 7" N 71° 11' 32" W

SolrCloud 5.3.1 "IndexWriter is closed"

2017-09-11 Thread Kelly, Frank
Just wondering if anyone has seen this before and might understand why this is happening When we deploy a new schema.xml adding some new search terms we get the dreaded “IndexWriter is closed” exception and the only solution we have found to date is a Solr restart :-( Environment: * Solr

Re: Does {!child} query support nested Queries ("v=")

2017-03-06 Thread Kelly, Frank
ps://issues.apache.org/jira/browse/SOLR-5772 We always create, update and delete nested objects together On 3/2/17, 3:42 PM, "Mikhail Khludnev" wrote: >Hello, Frank! > >The closest equivalent would be q=+type:userAccount +givenName:test* >And make sure please that it's pa

Solr 5.3.1: child query must only match non-parent docs

2017-03-02 Thread Kelly, Frank
Our customers are running this query where they have a filter on the parent objects (givenName, familyName etc) and then request the child objects ({!parent which etc) q=+(givenName:(+UserSearchControllerUTFN +1180460672*) familyName:(+UserSearchControllerUTFN +1180460672*)) +{!parent which="t

Does {!child} query support nested Queries ("v=")

2017-03-02 Thread Kelly, Frank
This is Solr Cloud 5.3.1 I have a query like the following q={!child of="type:userAccount" v="givenName:test*"} Intent: Show me all children of the type:userAccount where userAccount.givenName:test* If I run the query multiple times I get a very different numFound: 186,560 to 187,412

Re: Distributed Search: Wrong count?

2017-03-01 Thread Kelly, Frank
Quick extra clarification – the documents in question we are searching for are child documents we are searching directly (no parent/child in the query) -Frank From: Frank J Kelly mailto:frank.ke...@here.com>> Reply-To: "solr-user@lucene.apache.org" mailto:solr-

Distributed Search: Wrong count?

2017-03-01 Thread Kelly, Frank
Environment: SolrCloud 5.3 Collection has 12.3m docs split across 3 shards and 3 replicas In the query below I get one document ID returned but a numFound of 365 { "responseHeader":{ "status":0, "QTime":47, "params":{ "q":"haUserId: AND haAccountType:google AND type:use

Re: Copying SolrCloud collections (Replication? Backup/Restore?)

2017-02-10 Thread Kelly, Frank
bring up Solr on region 2 and verify >it's as you expect. >5> Use the Collections API to ADDREPLICA in region 2 to build out your >collection. the ADDREPLICA will automatically copy the index from the >leader. > >Best, >Erick > >On Fri, Feb 10, 2017 at 10:12 AM, K

Copying SolrCloud collections (Replication? Backup/Restore?)

2017-02-10 Thread Kelly, Frank
Hello, We have 100M+ documents across 2 collections and need to reindex the entirety of the Collections as we need to turn on “docValues”:true on a number of fields (see previous emails from this week :-] ). Unfortunately we have 4 AWS regions each with their own SolrCloud cluster each with

Re: Solr Heap Dump: Any suggestions on what to look for?

2017-02-10 Thread Kelly, Frank
To clarify: 'we put "docValues"="true" on the schema' should have said 'we put "docValues"="true" on the id field only' -Frank On 2/10/17, 10:27 AM, "Kelly, Frank" wrote: >Thanks Shawn, > >Yeah think we have identified root cause thanks to some of the sugges

Re: Solr Heap Dump: Any suggestions on what to look for?

2017-02-10 Thread Kelly, Frank
.com/company/heremaps> <https://www.instagram.com/here/> On 2/9/17, 11:00 AM, "Shawn Heisey" wrote: >On 2/9/2017 6:19 AM, Kelly, Frank wrote: >> Got a heap dump on an Out of Memory error. >> Analyzing the dump now in Visual VM >> >> Seein

Re: Solr Heap Dump: Any suggestions on what to look for?

2017-02-09 Thread Kelly, Frank
Thanks for the fast reply. I think we're going to focus on using doc values. You also said "facet on fewer fields" - how does one do that? Thanks! -Frank Frank Kelly Principal Software Engineer HERE 5 Wayside Rd, Burlington, MA 01803, USA 42° 29' 7" N 71° 11' 32" W

Solr Heap Dump: Any suggestions on what to look for?

2017-02-09 Thread Kelly, Frank
Got a heap dump on an Out of Memory error. Analyzing the dump now in Visual VM Seeing a lot of byte[] arrays (77% of our 8GB Heap) in * TreeMap$Entry * FieldCacheImpl$SortedDocValues We’re considering switching over to DocValues but would rather be definitive about the root cause before we

Solr 5.3.1: Collection reload results in IndexWriter is closed exception

2017-02-07 Thread Kelly, Frank
Just wondering if anyone has seen this before and might understand why this is happening Environment: Solr 5.3.1 in Solr Cloud (3 shards each with 3 replicas across 3 EC2 VMs) 100m documents (20+ GB index) Previously we’ve done reloads of a collection after changing solrconfig.xml without any i

Re: Lucene Merge Thread: skip too large

2017-01-18 Thread Kelly, Frank
> <https://www.linkedin.com/company/heremaps> <https://www.instagram.com/here/> On 1/18/17, 10:23 AM, "Shawn Heisey" wrote: >On 1/18/2017 6:51 AM, Kelly, Frank wrote: >> We're investigating a strange spike in Heap memory usage in our >> Production Solr. >>

Lucene Merge Thread: skip too large

2017-01-18 Thread Kelly, Frank
Hello, We're investigating a strange spike in Heap memory usage in our Production Solr. Heap is stable for days ~ 1.6GB and then suddenly spikes to 3.9 GB and we get an OOM. Our app server behavior using Solr appears to be unchanged (no new schema updates, no additional indexing or searching we co

Re: SolrCloud: ClusterState says we are the leader but locally we don't think so

2017-01-17 Thread Kelly, Frank
" wrote: >Try bouncing the overseer for your cluster. > >On Jan 17, 2017 12:01 PM, "Kelly, Frank" wrote: > >> Solr Version: 5.3.1 >> >> Configuration: 3 shards, 3 replicas each >> >> After running out of heap memory recently (cause unknown) we've b

SolrCloud: ClusterState says we are the leader but locally we don't think so

2017-01-17 Thread Kelly, Frank
Solr Version: 5.3.1 Configuration: 3 shards, 3 replicas each After running out of heap memory recently (cause unknown) we've been successfully restarting nodes to recover. Finally we did one restart and one of the nodes now says the following 2017-01-17 16:57:16.835 ERROR (qtp1395089624-17) [c:

Re: SolrCloud: Failure to recover on restart following OutOfMemoryError

2016-07-29 Thread Kelly, Frank
Hi Folks, I didn't hear back from anyone on this and just wanted to ping again to see if anyone had any thoughts. FWIW we are using SolrCloud (Solr 5.3.1) with a separate dedicated ZooKeeper ensemble. -Frank From: Frank J Kelly mailto:frank.ke...@here.com>> Reply-To: "solr-user@lucene.ap

SolrCloud: Failure to recover on restart following OutOfMemoryError

2016-07-27 Thread Kelly, Frank
Hi All, We have a SolrCloud cluster with 3 Virtual Machines, assigning 4GB to the Java Heap. Recently we added a number of collections to the machine going from around 80 collections (each with 3 shards x 3 replicas) to 150 collections We've hit Heap errors. That wasn't the surprise, the surpr

SolrCloud: Adding a very large collection to a pre-existing cluster

2016-06-21 Thread Kelly, Frank
We have about 200 million documents (~70 GB) we need to keep indexed across 3 collections. Currently 2 of the 3 collections are already indexed (roughly 90m docs). We'd like to create the remaining collection (about 100 m documents) but minimizing the performance impact on the existing collecti

Re: Solr Max Query length

2016-04-23 Thread Kelly, Frank
se response = solr.query(q, METHOD.POST); > >Let's wait for others response. > > > > >On Fri, Apr 22, 2016 at 8:51 PM, Kelly, Frank >wrote: > >> I am using the SolrJ library - does it have a way to specify one variant >> (POST) over the other (GET)

Re: Solr Max Query length

2016-04-22 Thread Kelly, Frank
I am using the SolrJ library - does it have a way to specify one variant (POST) over the other (GET)? -Frank On 4/22/16, 11:13 AM, "Reth RM" wrote: >Are you using get instead of post? > >https://dzone.com/articles/solr-select-query-get-vs-post > > > >On Fri,

Solr Max Query length

2016-04-22 Thread Kelly, Frank
I used SolrJ and wrote a test to confirm that the max query length supported by Solr (by default) was 8192 in Solr 5.3.1 Based on the default Jetty settings jetty.xml: The test would not work however until I had used a max size of 4096 (so the query passes at 4095 and returns a RemoteSolr
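
The GET-versus-POST question in this thread comes down to how long the request URL gets. As a rough sketch only: the 4096 threshold below is taken from the number observed in the message, not from any documented Solr or Jetty constant, and `choose_method` is a hypothetical helper rather than part of SolrJ.

```python
from urllib.parse import urlencode

# Assumption: illustrative threshold based on the behaviour reported above.
MAX_GET_URL_CHARS = 4096

def choose_method(select_url, params, limit=MAX_GET_URL_CHARS):
    """Return (method, url, body) for a query request.

    Short queries go in the URL as a GET; longer ones are sent as a
    form-encoded POST body so the URL itself stays short.
    """
    query = urlencode(params)
    full_url = select_url + "?" + query
    if len(full_url) < limit:
        return "GET", full_url, None
    return "POST", select_url, query
```

In SolrJ itself the equivalent choice is the `METHOD.POST` argument quoted in the replies to this thread.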

Re: solcloud on production

2016-04-08 Thread Kelly, Frank
I have found that ZooKeeper is the weak link for SolrCloud especially if deployed on a Cloud environment (AWS). https://issues.apache.org/jira/browse/SOLR-3274 https://issues.apache.org/jira/browse/SOLR-8868 And others see https://issues.apache.org/jira/browse/ZOOKEEPER-2112?jql=project%20%3D%

RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Kelly, Frank
Just wondering if my observation of SolrCloud behavior after ZooKeeper loses a quorum is normal or to-be-expected Version of Solr: 5.3.1 Version of ZooKeeper: 3.4.7 Using SolrCloud with external ZooKeeper Deployed on AWS Our Solr cluster has 3 nodes Our ZooKeeper ensemble consists of three no

Re: RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Kelly, Frank
ering whether this might be the bug of SOLR-8326, which is fixed >in Solr 5.4 > >That's my guess as a user who ran into the bug myself. > >-Original Message- >From: Kelly, Frank [mailto:frank.ke...@here.com] >Sent: Wednesday, March 16, 2016 3:09 PM >To: solr

Re: RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Kelly, Frank
Analytics Team (SCBE/HAC/CDA) HERE 5 Wayside Rd, Burlington, MA 01803, USA 42° 29' 7" N 71° 11' 32" W <http://360.here.com/> <https://twitter.com/here> <https://www.facebook.com/here><https://linkedin.com/company/heremaps> <https://www.instagram.co

SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-15 Thread Kelly, Frank
Just wondering if my observation of SolrCloud behavior after ZooKeeper loses a quorum is normal or to-be-expected Version of Solr: 5.3.1 Version of ZooKeeper: 3.4.7 Using SolrCloud with external ZooKeeper Deployed on AWS Our ZooKeeper ensemble consists of three nodes with the same config e.g. $

SolrCloud behavior when a ZooKeeper node goes down

2016-02-08 Thread Kelly, Frank
We are running a small SolrCloud instance on AWS Solr : Version 5.3.1 ZooKeeper: Version 3.4.6 3 x ZooKeeper nodes (with higher limits and timeouts due to being on AWS) 3 x Solr Nodes (8 GB of memory each - 2 collections with 3 shards for each collection) Let's call the ZooKeeper nodes A, B and
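
For readers following the quorum threads in this listing, the arithmetic behind "losing a quorum" can be written down briefly. This is a sketch of the general majority rule ZooKeeper uses, with hypothetical function names; it says nothing about how Solr itself reacts.

```python
def quorum_size(ensemble_size):
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1

def has_quorum(ensemble_size, live_nodes):
    """True if enough ensemble members are up to keep serving."""
    return live_nodes >= quorum_size(ensemble_size)
```

With the three-node ensemble (A, B and C) described here, one node can go down and the remaining two still form a quorum; losing a second node does not.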

Solr 5.3.1 ArrayIndexOutOfBoundsException while running a query

2016-01-11 Thread Kelly, Frank
Using Solr 5.3.1 in Solr Cloud mode deployed on AWS (each Solr instance has -Xmx 1024m and the server has 8GB of RAM) Am getting a 500 error running a query via the UI Looking in the logs I just see this with no stack trace 2016-01-12 02:04:22.181 ERROR (qtp59559151-7313) [c:qa_us-east-1_here_a

Solr Heap memory vs. OS memory

2015-12-09 Thread Kelly, Frank
Hi Folks, I was wondering if this link I found recommended by Erick is still accurate (for Solr 5.3.1) "For configuring your Java VM, you should rethink your memory requirements: Give only the really needed amount of heap space and leave as much as possible to the O/S. As a rule of thumb: Don

Wildcard searches - field:aaaa* works but field:a*a does not

2015-12-03 Thread Kelly, Frank
Hello Lucene Folks, Newbie here - I've found how Solr does Wildcard searches of the form field:a* using the EdgeNGramFilterFactory https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory but I can't seem to dig up how to support wildcards in the middle

Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-03 Thread Kelly, Frank
Just wondering if folks have any suggestions on using Schema.xml vs. Managed Schema going forward. Our deployment will be > 3 Zk, 3 Shards, 3 replicas > Copies of each collection in 5 AWS regions (EBS-backed EC2 instances) > Planning at least 1 Billion objects indexed (currently < 100 million) I

Re: Create Collection Admin Request - unable to specify collection configName

2015-12-02 Thread Kelly, Frank
Thank you everyone - this was EXACTLY my problem. I was using a chroot for startup but not on the upload of configurations. Now everything works as expected. Thanks everyone! -Frank On 12/2/15, 12:10 AM, "Upayavira" wrote: >Adding /solr to the zk string 'namespaces' the data within a sor >di
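
The root cause reported in this message was a chroot used at startup but not when uploading configurations. One way to avoid that kind of mismatch is to build the ZooKeeper connect string in a single place and reuse it for every tool. A minimal sketch, assuming hypothetical host names and a `/solr` chroot:

```python
def zk_connect_string(hosts, chroot=""):
    """Join host:port entries and append the chroot once, at the end.

    ZooKeeper connect strings carry the chroot after the last host,
    e.g. 'zk1:2181,zk2:2181,zk3:2181/solr'.
    """
    joined = ",".join(hosts)
    if not chroot:
        return joined
    return joined + "/" + chroot.strip("/")
```

The same returned value would then be passed both to the Solr startup and to the config upload, so the two cannot drift apart.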

Re: Create Collection Admin Request - unable to specify collection configName

2015-12-01 Thread Kelly, Frank
scbe_**public7_config/conf* -confname >>> scbe_public7 -z zk.zk.zk.zk:2181 >>> >>> This is how I do >>> zkcli.sh -zkhost $ZK_ENSEMBLE -cmd upconfig -confdir /tmp/access/conf >>> -confname access >>> >>> You can verify if you have properly

Re: Create Collection Admin Request - unable to specify collection configName

2015-12-01 Thread Kelly, Frank
>> >> This is how I do >> zkcli.sh -zkhost $ZK_ENSEMBLE -cmd upconfig -confdir /tmp/access/conf >> -confname access >> >> You can verify if you have properly uploaded the config to either by >> Upayavira's suggestion or using ./zkcli.sh Eg : >> h

Re: Create Collection Admin Request - unable to specify collection configName

2015-12-01 Thread Kelly, Frank
y >Upayavira's suggestion or using ./zkcli.sh Eg : >https://gist.github.com/manisnesan/52ffc84dd761365e0c22 > > > >On Tue, Dec 1, 2015 at 5:22 PM, Kelly, Frank wrote: > >> So I have an ensemble of three Zk nodes running >> >> I have tried upconfig to all th

Re: Create Collection Admin Request - unable to specify collection configName

2015-12-01 Thread Kelly, Frank
z) > >And the first form of your CREATE command is what's required, i.e. >"collection.configName=scbe_public7" >not >"collection.configName=/configs/scbe_public7" > >Don't worry about the linkconfig, create or anything else until you >can see your configs
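
The advice quoted here, to pass the bare config name rather than its ZooKeeper path, can be illustrated by building the Collections API request. This is only a sketch: the base URL, collection name and shard counts are assumptions, and `create_collection_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

def create_collection_url(solr_base, name, config_name,
                          num_shards, replication_factor):
    """Build a Collections API CREATE URL.

    collection.configName takes the name used at upconfig time
    (e.g. 'scbe_public7'), not a path such as '/configs/scbe_public7'.
    """
    if "/" in config_name:
        raise ValueError("config_name must be a bare name, not a path")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return solr_base + "/admin/collections?" + urlencode(params)
```

For example, `create_collection_url("http://localhost:8983/solr", "mycollection", "scbe_public7", 3, 3)` yields a URL whose query string contains `collection.configName=scbe_public7`.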

Re: Create Collection Admin Request - unable to specify collection configName

2015-12-01 Thread Kelly, Frank
i.e. one layer too deep. > >This is gonna be something trivial, I bet you! > >Upayavira > >On Tue, Dec 1, 2015, at 08:04 PM, Kelly, Frank wrote: >> Context: Solr 5.3.1 with ZooKeeper 3.4.6 (SolrCloud) >> >> Via the REST API I am trying to create a colle

Create Collection Admin Request - unable to specify collection configName

2015-12-01 Thread Kelly, Frank
Context: Solr 5.3.1 with ZooKeeper 3.4.6 (SolrCloud) Via the REST API I am trying to create a collection and tie it to a configuration I have loaded into ZooKeeper Here are the configs loaded into ZooKeeper [zk: localhost:2181(CONNECTED) 5] ls /configs [scbe_public7, mycollection, scbe_public_

Re: Is there a way to set zkClientTimeout from command line?

2015-12-01 Thread Kelly, Frank
t the top >of >bin/solr) > > > >: Date: Tue, 1 Dec 2015 15:43:59 + >: From: "Kelly, Frank" >: Reply-To: solr-user@lucene.apache.org >: To: "solr-user@lucene.apache.org" >: Subject: Is there a way to set zkClientTimeout from command line? >

Is there a way to set zkClientTimeout from command line?

2015-12-01 Thread Kelly, Frank
I am executing "/bin/solr" for SolrCloud and noticed that there is a Bash parameter that is being inspected but is never being set except by default if [ -z "$ZK_CLIENT_TIMEOUT" ]; then ZK_CLIENT_TIMEOUT="15000" #I would like this to be settable from command line fi CLOUD_MODE_OPTS=(
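
Because the Bash check quoted in this message only applies the default when `ZK_CLIENT_TIMEOUT` is empty, one plausible way to override it is to set that variable in the environment of the process that launches the script. A minimal sketch under that assumption (the helper name and the 30000 ms value are illustrative):

```python
import os

def solr_launch_env(zk_client_timeout_ms, base_env=None):
    """Copy an environment and set ZK_CLIENT_TIMEOUT in the copy.

    Assumption: the start script reads this variable exactly as in the
    snippet above and only falls back to 15000 when it is unset.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ZK_CLIENT_TIMEOUT"] = str(zk_client_timeout_ms)
    return env
```

The resulting dictionary could be passed as the `env` argument of `subprocess.run` when invoking the start script; the original environment is left untouched.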

Re: ZooKeeper nodes die taking down Solr Cluster?

2015-12-01 Thread Kelly, Frank
Thanks Emir - responses inline below >Can you please confirm that Solr nodes are aware of entire ZK ensemble? Can you explain how I could find that out - I looked into the logs and the Admin UI and didn't see a way to examine if the Solr nodes saw the entire ensemble >Can you give more info how i

ZooKeeper nodes die taking down Solr Cluster?

2015-11-30 Thread Kelly, Frank
I am somewhat new to SolrCloud and ZooKeeper. We are deploying ZK and SolrCloud on AWS. We are noticing an issue where one of the three nodes in the ZooKeeper ensemble "drops out" of the ensemble (although the Java process continues to run fine and nothing obviously bad in the ZooKeeper log