Re: How to check when a search exceeds the threshold of timeAllowed parameter

2015-12-23 Thread Jeff Wartes
Looks like it’ll set partialResults=true on your results if you hit the timeout. https://issues.apache.org/jira/browse/SOLR-502 https://issues.apache.org/jira/browse/SOLR-5986 On 12/22/15, 5:43 PM, "Vincenzo D'Amore" wrote: >Well... I can write everything, but really all this just to un
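A client-side check for that flag could look like the sketch below. It assumes the response was requested as JSON and already parsed into a dict; the function name and the sample values are made up for illustration.

```python
def hit_time_allowed(response: dict) -> bool:
    """Return True if Solr marked this response as partial."""
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))


# Illustrative response shape only; the numbers are invented.
sample = {
    "responseHeader": {"status": 0, "QTime": 503, "partialResults": True},
    "response": {"numFound": 12, "start": 0, "docs": []},
}

if hit_time_allowed(sample):
    print("timeAllowed was exceeded; results may be incomplete")
```

A response that finished in time simply lacks the key, so the function returns False for it.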

Re: SolrCloud: Setting/finding node names for deleting replicas

2016-01-08 Thread Jeff Wartes
I’m pretty sure you could change the name when you ADDREPLICA using a core.name property. I don’t know if you can when you initially create the collection though. The CLUSTERSTATUS command will tell you the core names: https://cwiki.apache.org/confluence/display/solr/Collections+API#Collectio
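As a rough sketch of reading core names out of a CLUSTERSTATUS response: the function below walks the parsed JSON and lists each replica's core and node. The sample dict is a trimmed, invented cluster; only the key layout (cluster, collections, shards, replicas, core, node_name) is what I'd expect from the real command.

```python
def list_cores(cluster_status: dict, collection: str) -> list:
    """Return (shard, replica, core, node_name) tuples for one collection."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    rows = []
    for shard_name, shard in sorted(shards.items()):
        for replica_name, replica in sorted(shard["replicas"].items()):
            rows.append(
                (shard_name, replica_name, replica["core"], replica["node_name"])
            )
    return rows


# Invented example data, trimmed to the fields used above.
sample_status = {
    "cluster": {
        "collections": {
            "collection1": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {
                                "core": "collection1_shard1_replica1",
                                "node_name": "host1:8983_solr",
                                "state": "active",
                            }
                        }
                    }
                }
            }
        }
    }
}

for row in list_cores(sample_status, "collection1"):
    print(row)
```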

Re: SolrCloud: Setting/finding node names for deleting replicas

2016-01-08 Thread Jeff Wartes
odeName=xxx > >btw, for your app, isn't "slice" old notation? > > > > >On 08/01/16 22:05, Jeff Wartes wrote: >> >> I’m pretty sure you could change the name when you ADDREPLICA using a >> core.name property. I don’t know if you can when you

Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes
My understanding is that the "version" represents the timestamp the searcher was opened, so it doesn’t really offer any assurances about your data. Although you could probably bounce a node and get your document counts back in sync (by provoking a check), it’s interesting that you’re in this si

Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes
"; > >>> >>> You might watch the achieved replication factor of your updates and see if >>> it ever changes >>> > >This is a good tip. I’m not sure I like the implication that any failure to >write all 3 of our replicas must be retried at the app layer

Re: SolrCloud replicas out of sync

2016-01-27 Thread Jeff Wartes
On 1/27/16, 8:28 AM, "Shawn Heisey" wrote: > >I don't think any documentation states this, but it seems like a good >idea to me to use an alias from day one, so that you always have the option >of swapping the "real" collection that you are using without needing to >change anything else. I'll

Re: SolrCloud replicas out of sync

2016-01-27 Thread Jeff Wartes
If you can identify the problem documents, you can just re-index those after forcing a sync. Might save a full rebuild and downtime. You might describe your cluster setup, including ZK. It sounds like you’ve done your research, but improper ZK node distribution could certainly invalidate some

Re: collection aliasing

2016-01-28 Thread Jeff Wartes
I enjoy using collection aliases in all client references, because that allows me to change the collection all clients use without updating the clients. I just move the alias. This is particularly useful if I’m doing a full index rebuild and want an atomic, zero-downtime switchover. On 1/2

Re: Restoring backups of solrcores

2016-02-01 Thread Jeff Wartes
Aliases work when indexing too. Create collection: collection1 Create alias: this_week -> collection1 Index to: this_week Next week... Create collection: collection2 Create (Move) alias: this_week -> collection2 Index to: this_week On 2/1/16, 2:14 AM, "vidya" wrote: >Hi > >How can that b
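The weekly rollover above maps onto two Collections API calls. This sketch only builds the request URLs; the base URL and the numShards value are placeholders, and a real CREATE call may need more parameters (a config name, for instance) depending on the cluster.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # hypothetical Solr base URL


def collections_api(action: str, **params) -> str:
    """Build a Collections API URL for the given action."""
    query = urlencode({"action": action, **params})
    return f"{BASE}/admin/collections?{query}"


def weekly_rollover(alias: str, new_collection: str, num_shards: int = 1) -> list:
    """URLs to create a new collection, then move the alias onto it."""
    return [
        collections_api("CREATE", name=new_collection, numShards=num_shards),
        # CREATEALIAS on an existing alias name re-points it.
        collections_api("CREATEALIAS", name=alias, collections=new_collection),
    ]


for url in weekly_rollover("this_week", "collection2"):
    print(url)
```

Indexing clients keep writing to this_week throughout and never need to learn the underlying collection name.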

Re: Shard allocation across nodes

2016-02-01 Thread Jeff Wartes
You could write your own snitch: https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement Or, it would be more annoying, but you can always add/remove replicas manually and juggle things yourself after you create the initial collection. On 2/1/16, 8:42 AM, "Tom Evans"

Re: Adding nodes

2016-02-17 Thread Jeff Wartes
Solrcloud does not come with any autoscaling functionality. If you want such a thing, you’ll need to write it yourself. https://github.com/whitepages/solrcloud_manager might be a useful head start though, particularly the “fill” and “cleancollection” commands. I don’t do *auto* scaling, but I d

Re: very slow frequent updates

2016-02-23 Thread Jeff Wartes
My suggestion would be to split your problem domain. Use Solr exclusively for search - index the id and only those fields you need to search on. Then use some other data store for retrieval. Get the id’s from the solr results, and look them up in the data store to get the rest of your fields. T

Re: very slow frequent updates

2016-02-24 Thread Jeff Wartes
stomer wants the list in descending >> > order of the price. >> > >> > So I have to get all the 1000 docids from solr and find the metadata of >> > them in a sql database or in cache in best case. This is the way you >> > suggested? Is it not too slow

Re: Shard State vs Replica State

2016-02-26 Thread Jeff Wartes
I believe the shard state is a reflection of whether that shard is still in use by the collection, and has nothing to do with the state of the replicas. I think doing a split-shard operation would create two new shards, and mark the old one as inactive, for example. On 2/26/16, 8:50 AM, "De

Re: SolrCloud - Strategy for recovering cluster states

2016-03-01 Thread Jeff Wartes
I’ve been running SolrCloud clusters in various versions for a few years here, and I can only think of two or three cases that the ZK-stored cluster state was broken in a way that I had to manually intervene by hand-editing the contents of ZK. I think I’ve seen Solr fixes go by for those cases,

Re: SolrCloud - Strategy for recovering cluster states

2016-03-02 Thread Jeff Wartes
? > > > >Your tool is very interesting, I just thought about writing such a tool >myself. >From the sources I understand that you represent each node as a path in the >git repository. >So, I guess that for restore purposes I will have to do >the opposite direction and create a

Re: XX:ParGCCardsPerStrideChunk

2016-03-03 Thread Jeff Wartes
I've experimented with that a bit, and Shawn added my comments in IRC to his Solr/GC page here: https://wiki.apache.org/solr/ShawnHeisey The relevant bit: "With values of 4096 and 32768, the IRC user was able to achieve 15% and 19% reductions in average pause time, respectively, with the maximu

Re: Separating cores from Solr home

2016-03-03 Thread Jeff Wartes
It’s a bit backwards feeling, but I’ve had luck setting the install dir and solr home, instead of the data dir. Something like: -Dsolr.solr.home=/data/solr -Dsolr.install.dir=/opt/solr So all of the Solr files are in /opt/solr and all of the index/core related files end up in /data/solr.

Re: SolrCloud no leader for collection

2016-04-05 Thread Jeff Wartes
I recall I had some luck fixing a leader-less shard (after a ZK quorum failure) by forcibly removing the records for the down-state replicas from the leader election list, and then forcing an election. The ZK path looks like collections//leader_elect/shardX/election. Usually you’ll find the dow

Re: SolrCloud backup/restore

2016-04-05 Thread Jeff Wartes
There is some automation around this process in the backup commands here: https://github.com/whitepages/solrcloud_manager It’s been tested with 5.4, and will restore arbitrary replication factors. Ever assuming the shared filesystem for backups, of course. On 4/5/16, 3:18 AM, "Reth RM" wrot

Re: HTTP Client Only

2016-04-14 Thread Jeff Wartes
If you’re already using java, just use the CloudSolrClient. If you’re using the default router, (CompositeId) it’ll figure out the leaders and send documents to the right place for you. If you’re not using java, then I’d still look there for hints on how to duplicate the functionality. On

Re: Adding replica on solr - 5.50

2016-04-14 Thread Jeff Wartes
I’m all for finding another way to make something work, but I feel like this is the wrong advice. There are two options: 1) You are doing something wrong. In which case, you should probably invest in figuring out what. 2) Solr is doing something wrong. In which case, you should probably invest

Re: Indexing 700 docs per second

2016-04-19 Thread Jeff Wartes
I have no numbers to back this up, but I’d expect Atomic Updates to be slightly slower than a full update, since the atomic approach has to retrieve the fields you didn't specify before it can write the new (updated) document. On 4/19/16, 11:54 AM, "Tim Robertson" wrote: >Hi Mark, > >We we

Re: Replicas for same shard not in sync

2016-04-26 Thread Jeff Wartes
At the risk of thread hijacking, this is an area where I don’t know I fully understand, so I want to make sure. I understand the case where a node is marked “down” in the clusterstate, but what if it’s down for less than the ZK heartbeat? That’s not unreasonable, I’ve seen some recommendations

Re: Replicas for same shard not in sync

2016-04-27 Thread Jeff Wartes
ome retry logic in the code that distributes the updates from >the leader as well. > >Best, >Erick > >On Tue, Apr 26, 2016 at 12:51 PM, Jeff Wartes wrote: >> >> At the risk of thread hijacking, this is an area where I don’t know I fully >> understand, so I want to ma

Re: Solr 5.2.1 on Java 8 GC

2016-04-28 Thread Jeff Wartes
Shawn Heisey’s page is the usual reference guide for GC settings: https://wiki.apache.org/solr/ShawnHeisey Most of the learnings from that are in the Solr 5.x startup scripts already, but your heap is bigger, so your mileage may vary. Some tools I’ve used while doing GC tuning: * VisualVM - Co

Re: Passing Ids in query takes more time

2016-05-05 Thread Jeff Wartes
An ID lookup is a very simple and fast query, for one ID. Or’ing a lookup for 80k ids though is basically 80k searches as far as Solr is concerned, so it’s not altogether surprising that it takes a while. Your complaint seems to be that the query planner doesn’t know in advance that should be

Cached fq decreases performance

2015-09-03 Thread Jeff Wartes
I have a query like: q=&fq=enabled:true For purposes of this conversation, "fq=enabled:true" is set for every query, I never open a new searcher, and this is the only fq I ever use, so the filter cache size is 1, and the hit ratio is 1. The fq=enabled:true clause matches about 15% of my docume

Re: Cached fq decreases performance

2015-09-03 Thread Jeff Wartes
and even a newsletter: >http://www.solr-start.com/ > > >On 3 September 2015 at 16:45, Jeff Wartes wrote: >> >> I have a query like: >> >> q=&fq=enabled:true >> >> For purposes of this conversation, "fq=enabled:true" is set for every >

Re: Cached fq decreases performance

2015-09-04 Thread Jeff Wartes
On 9/4/15, 7:06 AM, "Yonik Seeley" wrote: > >Lucene seems to always be changing its execution model, so it can be >difficult to keep up. What version of Solr are you using? >Lucene also changed how filters work, so now, a filter is >incorporated with the query like so: > >query = new BooleanQ

Autowarm and filtercache invalidation

2015-09-24 Thread Jeff Wartes
If I configure my filterCache like this: and I have <= 10 distinct filter queries I ever use, does that mean I’ve effectively disabled cache invalidation? So my cached filter query results will never change? (short of JVM restart) I’m unclear on whether autowarm simply copies the value into the

Re: Autowarm and filtercache invalidation

2015-09-24 Thread Jeff Wartes
whether it was populated via autowarm. On 9/24/15, 11:28 AM, "Jeff Wartes" wrote: > >If I configure my filterCache like this: >autowarmCount="10"/> > >and I have <= 10 distinct filter queries I ever use, does that mean I’ve >effectively disabled cache inv

Re: How to know index file in OS Cache

2015-09-25 Thread Jeff Wartes
I’ve been relying on this: https://code.google.com/archive/p/linux-ftools/ fincore will tell you what percentage of a given file is in cache, and fadvise can suggest to the OS that a file be cached. All of the solr start scripts at my company first call fadvise (FADV_WILLNEED) on all the files
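For anyone without linux-ftools handy, a similar hint can be issued from Python. This is a sketch, not what my start scripts actually run: it assumes a platform where os.posix_fadvise exists (Linux, mostly), and the directory you pass in is whatever holds the files you care about.

```python
import os


def advise_willneed(path: str) -> bool:
    """Hint to the OS that the whole file will be needed soon."""
    if not hasattr(os, "posix_fadvise"):
        return False  # not available on this platform
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset 0, length 0 means "from the start to the end of the file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def advise_tree(directory: str) -> int:
    """Apply the hint to every file under a directory; return how many succeeded."""
    advised = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if advise_willneed(os.path.join(root, name)):
                advised += 1
    return advised
```

The call is only advisory, so the kernel is free to ignore it under memory pressure.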

Re: Cost of having multiple search handlers?

2015-09-28 Thread Jeff Wartes
One would hope that https://issues.apache.org/jira/browse/SOLR-4735 will be done by then. On 9/28/15, 11:39 AM, "Walter Underwood" wrote: >We did the same thing, but reporting performance metrics to Graphite. > >But we won’t be able to add servlet filters in 6.x, because it won’t be a >webapp

Re: Cost of having multiple search handlers?

2015-09-29 Thread Jeff Wartes
production for a year, >but the config is pretty manual. > >wunder >Walter Underwood >wun...@wunderwood.org >http://observer.wunderwood.org/ (my blog) > > >> On Sep 28, 2015, at 4:41 PM, Jeff Wartes wrote: >> >> >> One would hope that h

Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes
I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud index on fields like this:

Re: Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes
f you set f.city.facet.limit=-1 ? > >On Thu, Oct 1, 2015 at 7:43 PM, Jeff Wartes >wrote: > >> >> I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud >> index on fields like this: >> >> > docValues="true”/> >

Re: Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes
here >https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Over-Re >questParameters >eg does it happen if you run with distrib=false? > >On Fri, Oct 2, 2015 at 12:27 AM, Jeff Wartes >wrote: > >> >> No change, still shows an insert per-reques

Re: Facet queries blow out the filterCache

2015-10-02 Thread Jeff Wartes
gh, because issuing new distinct queries causes a reported insert, but not a lookup, so the cache hit ratio is always exactly 1. On 10/2/15, 4:18 AM, "Toke Eskildsen" wrote: >On Thu, 2015-10-01 at 22:31 +, Jeff Wartes wrote: >> It still inserts if I address the core dire

Re: Facet queries blow out the filterCache

2015-10-06 Thread Jeff Wartes
I dug far enough yesterday to find the GET_DOCSET, but not far enough to find why. Thanks, a little context is really helpful sometimes. So, starting with an empty filterCache... http://localhost:8983/solr/techproducts/select?q=name:foo&rows=1&facet=true&facet.field=popularity New values:

Re: are there any SolrCloud supervisors?

2015-10-14 Thread Jeff Wartes
I’m aware of two public administration tools: This was announced to the list just recently: https://github.com/bloomreach/solrcloud-haft And I’ve been working on this: https://github.com/whitepages/solrcloud_manager Both of these hook the Solrcloud client’s ZK access to inspect the cluster state

Re: DevOps question : auto deployment/setup of Solr & Zookeeper on medium-large clusters

2015-10-20 Thread Jeff Wartes
If you’re using AWS, there’s this: https://github.com/LucidWorks/solr-scale-tk If you’re using chef, there’s this: https://github.com/vkhatri/chef-solrcloud (There are several other chef cookbooks for Solr out there, but this is the only one I’m aware of that supports Solr 5.3.) For ZK, I’m less

Re: copy data between collection

2015-10-26 Thread Jeff Wartes
The “copy” command in this tool automatically does what Upayavira describes, including bringing the replicas up to date. (if any) https://github.com/whitepages/solrcloud_manager I’ve been using it as a mechanism for copying a collection into a new cluster (different ZK), but it should work withi

Re: replica recovery

2015-10-27 Thread Jeff Wartes
On the face of it, your scenario seems plausible. I can offer two pieces of info that may or may not help you: 1. A write request to Solr will not be acknowledged until an attempt has been made to write to all relevant replicas. So, B won’t ever be missing updates that were applied to A, unless c

Re: Facet queries blow out the filterCache

2015-10-28 Thread Jeff Wartes
FWIW, since it seemed like there was at least one bug here (and possibly more), I filed https://issues.apache.org/jira/browse/SOLR-8171 On 10/6/15, 3:58 PM, "Jeff Wartes" wrote: > >I dug far enough yesterday to find the GET_DOCSET, but not far enough to >find why. Thanks,

Re: Data Import Handler / Backup indexes

2015-11-17 Thread Jeff Wartes
https://github.com/whitepages/solrcloud_manager supports 5.x, and I added some backup/restore functionality similar to SOLR-5750 in the last release. Like SOLR-5750, this backup strategy requires a shared filesystem, but note that unlike SOLR-5750, I haven’t yet added any backup functionality for

Re: replica recovery

2015-11-19 Thread Jeff Wartes
er but it isn't clear to me how high it should be or if >raising the limit will cause new problems. > >Any advice you could provide in this situation would be awesome! > >Cheers, >Brian > > > >> On Oct 27, 2015, at 20:50, Jeff Wartes wrote: >> >>

Re: Data Import Handler / Backup indexes

2015-11-23 Thread Jeff Wartes
be run >because the database is unavailable. > >Our collection is simple: 2 nodes - 1 collection - 2 shards with 2 >replicas >each > >So a simple copy (cp command) for both the nodes/shards might work for us? >How do I restore the data back? > > > >On

Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-03 Thread Jeff Wartes
I’ve never used the managed schema, so I’m probably biased, but I’ve never seen much of a point to the Schema API. I need to make changes sometimes to solrconfig.xml, in addition to schema.xml and other config files, and there’s no API for those, so my process has been like: 1. Put the entire con

Re: How to list all collections in solr-4.7.2

2015-12-03 Thread Jeff Wartes
Looks like LIST was added in 4.8, so I guess you’re stuck looking at ZK, or finding some tool that looks in ZK for you. The zkCli.sh that ships with zookeeper would probably suffice for a one-off manual inspection: https://zookeeper.apache.org/doc/trunk/zookeeperStarted.html#sc_ConnectingToZooKee

Re: Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas

2015-12-04 Thread Jeff Wartes
If you want two different collections to have two different schemas, those collections need to reference two different configsets. So you need another copy of your config available using a different name, and to reference that other name when you create the second collection. On 12/4/15, 6:26 AM

Re: Fully automated replica creation in AWS

2015-12-09 Thread Jeff Wartes
It’s a pretty common misperception that since solr scales, you can just spin up new nodes and be done. Amazon ElasticSearch and older solrcloud getting-started docs encourage this misperception, as does the HDFS-only autoAddReplicas flag. I agree that auto-scaling should be approached carefully,

Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Jeff Wartes
Don’t set solr.data.dir. Instead, set the install dir. Something like: -Dsolr.solr.home=/data/solr -Dsolr.install.dir=/opt/solr I have many solrcloud collections, and separate data/install dirs, and I’ve never had to do anything with manual per-collection or per-replica data dirs. That said, it’

state.json being downloaded every 10 seconds

2016-05-16 Thread Jeff Wartes
I have a solr 5.4 cluster with three collections, A, B, C. Nodes either host replicas for collection A, or B and C. Collections B and C are not currently used - no inserts or queries. Collection A is getting significant query traffic, but no insert traffic, and queries are only directed to node

Re: state.json being downloaded every 10 seconds

2016-05-16 Thread Jeff Wartes
>What the "something" is that sends requests I'm not quite sure, but >that's a place >to start. > >Best, >Erick > >On Mon, May 16, 2016 at 11:08 AM, Jeff Wartes wrote: >> >> I have a solr 5.4 cluster with three collections, A, B, C. >>

Re: SolrCloud replicas consistently out of sync

2016-05-19 Thread Jeff Wartes
That case related to consistency after a ZK outage or network connectivity issue. Your case is standard operation, so I’m not sure that’s really the same thing. I’m aware of a few issues that can happen if ZK connectivity goes wonky, that I hope are fixed in SOLR-8697. This one might be a close

Re: How to stop searches to solr while full data import is going in SOLR

2016-05-23 Thread Jeff Wartes
The PingRequestHandler contains support for a file check, which allows you to control whether the ping request succeeds based on the presence/absence of a file on disk on the node. http://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/handler/PingRequestHandler.html I suppose you could

Re: SolrCloud increase replication factor

2016-05-23 Thread Jeff Wartes
https://github.com/whitepages/solrcloud_manager was designed to provide some easier operations for common kinds of cluster operation. It hasn’t been tested with 6.0 though, so if you try it, please let me know your experience. On 5/23/16, 6:28 AM, "Tom Evans" wrote: >On Mon, May 23, 2016 at

Re: Solr cloud with Grouping query gives inconsistent results

2016-05-23 Thread Jeff Wartes
My first thought is that you haven’t indexed such that all values of the field you’re grouping on are found in the same cores. See the end of the article here: (Distributed Result Grouping Caveats) https://cwiki.apache.org/confluence/display/solr/Result+Grouping And the “Document Routing” sectio

Re: What if adding 3rd node exceeds replication Factor? [scottchu]

2016-05-25 Thread Jeff Wartes
SolrCloud never creates replicas automatically, unless perhaps you’re using the HDFS-only autoAddReplicas option. Start the new node using the same ZK, and then use the Collections API (https://cwiki.apache.org/confluence/display/solr/Collections+API) to ADDREPLICA. The replicationFactor you s
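An ADDREPLICA request is just a Collections API URL. The sketch below builds one; the base URL, collection, shard, and node name are all placeholders, and the node parameter can be dropped if you don't care where the replica lands.

```python
from urllib.parse import urlencode


def addreplica_url(base: str, collection: str, shard: str, node: str = None) -> str:
    """Build a Collections API ADDREPLICA URL."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        # Node names use the form shown in live_nodes, e.g. host:port_solr
        params["node"] = node
    return f"{base}/admin/collections?{urlencode(params)}"


print(addreplica_url(
    "http://localhost:8983/solr",  # hypothetical base URL
    "collection1",
    "shard1",
    node="192.168.1.10:8983_solr",
))
```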

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread Jeff Wartes
Oh, interesting. I’ve certainly encountered issues with multi-word synonyms, but I hadn’t come across this. If you end up using it with a recent solr version, I’d be glad to hear your experience. I haven’t used it, but I am aware of one other project in this vein that you might be interested in

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread Jeff Wartes
> > >> > >> > > > >> >> > > > >> > > >> > >> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/ >> > > > >> > >> >

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread Jeff Wartes
d line I get: > >/opt/solr-5.4.0/server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar > >But the log file is still carrying class not found exceptions when I >restart... > >Are you in "Cloud" mode? What version of Solr are you using? > >On Tue, May 3

Re: Solr off-heap FieldCache & HelioSearch

2016-06-03 Thread Jeff Wartes
For what it’s worth, I’d suggest you go into a conversation with Azul with a more explicit “I’m looking to buy” approach. I reached out to them with a more “I’m exploring my options” attitude, and never even got a trial. I get the impression their business model involves a fairly expensive (to

Re: Multiple calls across the distributed nodes for a query

2016-06-15 Thread Jeff Wartes
Any distributed query falls into the two-phase process. Actually, I think some components may require a third phase. (faceting?) However, there are also cases where only a single pass is required. A fl=id,score will only be a single pass, for example, since it doesn’t need to get the field valu

Re: Long STW GCs with Solr Cloud

2016-06-16 Thread Jeff Wartes
Check your gc log for CMS “concurrent mode failure” messages. If a concurrent CMS collection fails, it does a stop-the-world pause while it cleans up using a *single thread*. This means the stop-the-world CMS collection in the failure case is typically several times slower than a concurrent CMS
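A quick way to do that check is to count the lines that contain the phrase. This is a minimal sketch; the sample lines are placeholders I made up to exercise the function, not real JVM output, and the exact log format depends on your JVM flags.

```python
def count_concurrent_mode_failures(lines) -> int:
    """Count log lines that mention a CMS concurrent mode failure."""
    return sum(1 for line in lines if "concurrent mode failure" in line)


# Placeholder lines for illustration only.
sample_lines = [
    "... [GC (Allocation Failure) ...",
    "... (concurrent mode failure) ...",
    "... [GC (CMS Initial Mark) ...",
    "... (concurrent mode failure) ...",
]

print(count_concurrent_mode_failures(sample_lines))
```

For a real log, pass an open file handle instead of the list, since iterating a file yields its lines.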

Re: Long STW GCs with Solr Cloud

2016-06-17 Thread Jeff Wartes
res. I suspect there's a lot of garbage building up. >We're going to run tests with field collapsing disabled and see if that >makes a difference. > >Cas > > >On Thu, Jun 16, 2016 at 1:08 PM, Jeff Wartes wrote: > >> Check your gc log for CMS “concurrent mode fa

Re: SolrCloud: Adding a very large collection to a pre-existing cluster

2016-06-21 Thread Jeff Wartes
There’s no official way of doing #1, but there are some less official ways: 1. The Backup/Restore API provides some hooks into loading pre-existing data dirs into an existing collection. Lots of caveats. 2. If you don’t have many shards, there’s always rsync/reload. 3. There are some third-party

Re: Help with recovering shard range after zookeeper disaster

2016-06-28 Thread Jeff Wartes
This might come a little late to be helpful, but I had a similar situation with Solr 5.4 once. We ended up finding a ZK snapshot we could restore, but we did also get the cluster back up for most of the interim by taking the now-empty ZK cluster, re-uploading the configs that the collections us

Re: Full re-index without downtime

2016-07-06 Thread Jeff Wartes
A variation on #1 here - Use the same cluster, create a new collection, but use the createNodeSet option to logically partition your cluster so no node has both the old and new collection. If your clients all reference a collection alias, instead of a collection name, then all you need to do w
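To sketch the partitioning step: pick the live nodes that don't host the old collection, then pass them as createNodeSet when creating the new one. The node names, base URL, and shard count here are invented, and in practice the two node lists would come from CLUSTERSTATUS.

```python
from urllib.parse import urlencode


def nodes_without(live_nodes, old_collection_nodes):
    """Live nodes that host no replica of the old collection, in stable order."""
    occupied = set(old_collection_nodes)
    return [n for n in live_nodes if n not in occupied]


def create_on_nodes(base: str, name: str, num_shards: int, nodes) -> str:
    """Build a CREATE URL restricted to the given nodes via createNodeSet."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "createNodeSet": ",".join(nodes),  # comma-separated node names
    }
    return f"{base}/admin/collections?{urlencode(params)}"


live = ["host1:8983_solr", "host2:8983_solr", "host3:8983_solr", "host4:8983_solr"]
old = ["host1:8983_solr", "host2:8983_solr"]
free = nodes_without(live, old)
print(create_on_nodes("http://localhost:8983/solr", "collection_v2", 2, free))
```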

Re: solrcloud consumes more time than solr when write index

2016-07-12 Thread Jeff Wartes
Well, two thoughts: 1. If you’re not using solrcloud, presumably you don’t have any replicas. If you are, presumably you do. This makes for a biased comparison, because SolrCloud won’t acknowledge a write until it’s been safely written to all replicas. In short, solrcloud write time is max(per

Re: solrcloud consumes more time than solr when write index

2016-07-13 Thread Jeff Wartes
Kent > >2016-07-12 23:02 GMT+08:00 Jeff Wartes : > >> Well, two thoughts: >> >> >> 1. If you’re not using solrcloud, presumably you don’t have any replicas. >> If you are, presumably you do. This makes for a biased comparison, because >> SolrCloud won’t

Re: Node not recovering, leader elections not occuring

2016-07-19 Thread Jeff Wartes
It sounds like the node-local version of the ZK clusterstate has diverged from the ZK cluster state. You should check the contents of zookeeper and verify the state there looks sane. I’ve had issues (v5.4) on a few occasions where leader election got screwed up to the point where I had to delete

Effects of insert order on query performance

2016-08-11 Thread Jeff Wartes
This isn’t really a question, although some validation would be nice. It’s more of a warning. Tldr is that the insert order of documents in my collection appears to have had a huge effect on my query speed. I have a very large (sharded) SolrCloud 5.4 index. One aspect of this index is a mult

Re: Effects of insert order on query performance

2016-08-12 Thread Jeff Wartes
.com/blog/2015/09/29/solrcloud-large-tenants-and-routing/ Regards, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 11.08.2016 19:39, Jeff Wartes wrote: > This i

Re: Result Grouping vs. Collapsing Query Parser -- Can one be deprecated?

2016-10-20 Thread Jeff Wartes
I’ll also mention the choice to improve processing speed by allocating more memory, which increases the importance of GC tuning. This bit me when I tried using it on a larger index. https://issues.apache.org/jira/browse/SOLR-9125 I don’t know if the result grouping feature shares the same issue

Re: Facets based on sampling

2016-11-04 Thread Jeff Wartes
https://issues.apache.org/jira/browse/SOLR-5894 had some pretty interesting looking work on heuristic counts for facets, among other things. Unfortunately, it didn’t get picked up, but if you don’t mind using Solr 4.10, there’s a jar. On 11/4/16, 12:02 PM, "John Davis" wrote: Hi, I a

Re: CodaHale metrics for Solr 6?

2016-11-04 Thread Jeff Wartes
Expanding on my comment on the ticket, I’m really quite happy with using codahale/dropwizard metrics with Solr. I don’t know if I’m comfortable just sharing a screenshot of the resulting grafana dashboard, but I’ve got, per-host: - Percentile latencies and rates for GET vs POST (which in solrclo

Re: Queries regarding solr cache

2016-12-01 Thread Jeff Wartes
I found this, which intends to explore the usage of RoaringDocIdSet for solr: https://issues.apache.org/jira/browse/SOLR-9008 This suggests Lucene’s filter cache already uses it, or did at one point: https://issues.apache.org/jira/browse/LUCENE-6077 I was playing with id set implementations earl

Re: Memory leak in Solr

2016-12-04 Thread Jeff Wartes
Here’s an earlier post where I mentioned some GC investigation tools: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3c8f8fa32d-ec0e-4352-86f7-4b2d8a906...@whitepages.com%3E In my experience, there are many aspects of the Solr/Lucene memory allocation model that scale wi

Re: CREATEALIAS to non-existing collections

2016-12-09 Thread Jeff Wartes
I’d prefer it if the alias was required to be removed, or pointed elsewhere, before the collection could be deleted. As a best practice, I encourage all SolrCloud users to configure an alias to each collection, and use only the alias in their clients. This allows atomic switching between colle

Re: Latest advice on G1 collector?

2017-01-25 Thread Jeff Wartes
Hah, interesting. The fact that the CMS collector falls back to a *single-threaded* collection on concurrent-mode-failure had me seriously considering trying the Parallel collector a year or two ago. I figured out (and stopped) the queries that were doing the sudden massive allocations that wer

Re: Latest advice on G1 collector?

2017-01-26 Thread Jeff Wartes
Adding my anecdotes: I’m using heavily tuned ParNew/CMS. This is a SolrCloud collection, but per-node I’ve got a 28G heap and a 200G index. The large heap turned out to be necessary because certain operations in Lucene allocate memory based on things other than result size, (index size typica

Re: Collection will not replicate

2017-02-01 Thread Jeff Wartes
Sounds similar to a thread last year: http://lucene.472066.n3.nabble.com/Node-not-recovering-leader-elections-not-occuring-tp4287819p4287866.html On 2/1/17, 7:49 AM, "tedsolr" wrote: I have version 5.2.1. Short of an upgrade, are there any remedies? Erick Erickson wrote >

Solr performance on EC2 linux

2017-04-28 Thread Jeff Wartes
tldr: Recently, I tried moving an existing solrcloud configuration from a local datacenter to EC2. Performance was roughly 1/10th what I’d expected, until I applied a bunch of linux tweaks. This should’ve been a straight port: one datacenter server -> one EC2 node. Solr 5.4, Solrcloud, Ubuntu

Re: Solr performance on EC2 linux

2017-04-30 Thread Jeff Wartes
performance between local and EC2 But thanks for telling us about this! It's totally baffling Erick On Fri, Apr 28, 2017 at 9:09 AM, Jeff Wartes wrote: > > tldr: Recently, I tried moving an existing solrcloud configuration from a local datacent

Re: Solr performance on EC2 linux

2017-05-01 Thread Jeff Wartes
The R series is labeled "High-Memory" Which instance type did you end up using? On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey wrote: > On 4/28/2017 10:09 AM, Jeff Wartes wrote: > > tldr: Recently, I tried moving an existing solrcloud configuration fr

Re: Solr performance on EC2 linux

2017-05-01 Thread Jeff Wartes
I started with the same three-node 15-shard configuration I’d been used to, in an RF1 cluster. (the index is almost 700G so this takes three r4.8xlarge’s if I want to be entirely memory-resident) I eventually dropped down to a 1/3rd size index on a single node (so 5 shards, 100M docs each) so I

Re: Solr performance on EC2 linux

2017-05-01 Thread Jeff Wartes
Yes, that’s the Xenial I tried. Ubuntu 16.04.2 LTS. On 5/1/17, 7:22 PM, "Will Martin" wrote: Ubuntu 16.04 LTS - Xenial (HVM) Is this your Xenial version? On 5/1/2017 6:37 PM, Jeff Wartes wrote: > I tried a few variations of various things be

Re: Solr performance on EC2 linux

2017-05-03 Thread Jeff Wartes
It’s presumably not a small degradation - this guy very recently suggested it’s 77% slower: https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/ The other reason that blog post is interesting to me is that his benchmark utility showed the work of entering the kernel

Re: Result merging takes too long

2014-03-17 Thread Jeff Wartes
This is highly anecdotal, but I tried SOLR-1880 with 4.7 for some tests I was running, and saw almost a 30% improvement in latency. If you’re only doing document selection, it’s definitely worth having. I’m reasonably certain that the patch would work in 4.6 too, but the test file relies on some

Re: Bootstrapping SolrCloud cluster with multiple collections in differene sharding/replication setup

2014-03-20 Thread Jeff Wartes
Please note that although the article talks about the ADDREPLICA command, that feature is coming in Solr 4.8, so don’t be confused if you can’t find it yet. See https://issues.apache.org/jira/browse/SOLR-5130 On 3/20/14, 7:45 AM, "Erick Erickson" wrote: >You might find this useful: >http://he

Re: Logging which client connected to Solr

2014-03-27 Thread Jeff Wartes
You could always just pass the username as part of the GET params for the query. Solr will faithfully ignore and log any parameters it doesn’t recognize, so it’d show up in your {lot of params}. That means your log parser would need more intelligence, and your client would have to pass in the dat
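On the client side that amounts to one extra query parameter. In this sketch the parameter name client_user is something I picked for illustration; Solr has no such parameter, which is the point, since it gets ignored but still logged.

```python
from urllib.parse import urlencode


def select_url(base: str, collection: str, q: str, client_user: str) -> str:
    """Build a /select URL carrying an extra, unrecognized identifying param."""
    params = {
        "q": q,
        "client_user": client_user,  # hypothetical name; ignored by Solr
    }
    return f"{base}/{collection}/select?{urlencode(params)}"


print(select_url("http://localhost:8983/solr", "collection1", "name:foo", "alice"))
```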

Re: svn vs GIT

2014-04-14 Thread Jeff Wartes
I vastly prefer git, but last I checked, (admittedly, some time ago) you couldn't build the project from the git clone. Some of the build scripts assumed some svn commands would work. On 4/12/14, 3:56 PM, "Furkan KAMACI" wrote: >Hi Amon; > >There has been a conversation about it at dev list: >h

Re: svn vs GIT

2014-04-15 Thread Jeff Wartes
iyengar wrote: >> ant compile / ant -f solr dist / ant test certainly work, I use them >>with a >> git working copy. You trying something else? >> On 14 Apr 2014 19:36, "Jeff Wartes" wrote: >> >>> I vastly prefer git, but last I checked, (admittedly,

Re: timeAllowed in not honoring

2014-04-30 Thread Jeff Wartes
It’s not just FacetComponent, here’s the original feature ticket for timeAllowed: https://issues.apache.org/jira/browse/SOLR-502 As I read it, timeAllowed only limits the time spent actually getting documents, not the time spent figuring out what data to get or how. I think that means the primar

Re: When not to use NRTCachingDirectory and what to use instead.

2014-04-30 Thread Jeff Wartes
On 4/19/14, 6:51 AM, "Ken Krugler" wrote: > >The code I see seems to be using an FSDirectory, or is there another >layer of wrapping going on here? > >return new NRTCachingDirectory(FSDirectory.open(new File(path)), >maxMergeSizeMB, maxCachedMB); I was also curious about this subject. Not

Re: Strategy for removing an active shard from zookeeper

2014-07-03 Thread Jeff Wartes
To expand on that, the Collections API DELETEREPLICA command is available in Solr >= 4.6, but will not have the ability to wipe the disk until Solr 4.10. Note that whether or not it deletes anything from disk, DELETEREPLICA will remove that replica from your cluster state in ZK, so even in 4.10, reb

Re: Listening on SolrCloud events

2014-07-03 Thread Jeff Wartes
If you’re using SolrJ, CloudSolrServer exposes the information you need directly, although you’d have to poll it for changes. Specifically, this code path will get you a snapshot of the clusterstate: http://lucene.apache.org/solr/4_5_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.h

SolrCloud extended warmup support

2014-07-21 Thread Jeff Wartes
I’d like to ensure an extended warmup is done on each SolrCloud node prior to that node serving traffic. I can do certain things prior to starting Solr, such as pump the index dir through /dev/null to pre-warm the filesystem cache, and post-start I can use the ping handler with a health check f
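The "pump the index dir through /dev/null" step can be approximated by reading every file and discarding the bytes. A rough sketch, assuming the directory you pass is the node's index directory and that it fits in the page cache:

```python
import os


def read_through(index_dir: str, chunk_size: int = 1 << 20) -> int:
    """Read every file under index_dir, discarding the data; return bytes read."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            with open(os.path.join(root, name), "rb") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

This only warms the filesystem cache; it does nothing for Solr's own caches, which is why the post-start warmup is a separate question.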
