MatchAllDocsQuery is much slower in Solr 5.3.1 compared to Solr 4.7
We are running our search on Solr 4.7 and I am evaluating whether to upgrade to Solr 5.3.1. I found that MatchAllDocsQuery is much slower in Solr 5.3.1. Does anyone know why? We have a lot of queries without any query keyword, but we apply filters to them. Load testing shows those queries are much slower in Solr 5.3.1 compared to 4.7. If we load test with queries that have search keywords, the queries are much faster in Solr 5.3.1 than in Solr 4.7. Here is sample debug info for a match-all query with fq=categoryIdsPath:1001, rows=2, fl=id (~36.65M matching docs):

In Solr 4.7 (QTime 86):
    parsed query: MatchAllDocsQuery(*:*), parser: LuceneQParser
    parsed filter: +categoryIdsPath:1001
    explain: 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm
    debug timing: process 86.0 ms, of which query 85.0 ms

In Solr 5.3.1 (QTime 313):
    parsed query: MatchAllDocsQuery(*:*), parser: LuceneQParser
    parsed filter: +categoryIdsPath:1001
    explain: 1.0 = *:*, product of: 1.0 = boost, 1.0 = queryNorm
    debug timing: process 311.0 ms, of which query 311.0 ms

Thanks,
Wei
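For readers reproducing this, the request behind the debug output above is roughly the following (a hedged reconstruction from the debug parameters; host and collection name are illustrative):

    curl 'http://localhost:8983/solr/collection1/select?q=*:*&fq=categoryIdsPath:1001&rows=2&fl=id&debugQuery=true'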
Re: MatchAllDocsQuery is much slower in Solr 5.3.1 compared to Solr 4.7
Thanks Jack and Shawn. I checked these Jira tickets, but I am not sure if the slowness of MatchAllDocsQuery is also caused by the removal of the field cache. Can someone please explain a little bit?

Thanks,
Wei

On Fri, Nov 6, 2015 at 7:15 AM, Shawn Heisey wrote:
> On 11/5/2015 10:25 PM, Jack Krupansky wrote:
> > I vaguely recall some discussion concerning removal of the field cache in Lucene.
>
> The FieldCache wasn't exactly *removed* ... it's more like it was renamed, improved, and sort of hidden in a miscellaneous package. Some things still require this functionality, so they use the hidden class instead, which was changed to use the DocValues API.
>
> https://issues.apache.org/jira/browse/LUCENE-5666
>
> I am not qualified to discuss LUCENE-5666 beyond what I wrote in the paragraph above, and it's possible that some of what I said is wrong because I do not really understand the APIs involved.
>
> The change has caused problems for Solr. End result from Solr's perspective: certain things which used to work perfectly fine (mostly facets and grouping) in Solr 4.x have one of two problems in 5.x: either they don't work at all, or performance has gone way down. Some of these problems are documented in Jira. These are the issues I know about:
>
> https://issues.apache.org/jira/browse/SOLR-8088
> https://issues.apache.org/jira/browse/SOLR-7495
> https://issues.apache.org/jira/browse/SOLR-8096
>
> For fields where adding docValues is a viable option (most field types other than solr.TextField), adding docValues and reindexing is very likely to solve those problems.
>
> Sometimes adding docValues won't work, either because the field type doesn't allow it, or because it's the indexed terms that are needed, not the original field value. For those situations, there is currently no solution.
>
> Thanks,
> Shawn
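For reference, enabling docValues on a facet/sort field as Shawn suggests is a one-attribute schema.xml change followed by a full reindex; a minimal sketch, assuming a hypothetical string field named 'category':

    <!-- hypothetical field; docValues needs a non-analyzed type such as string -->
    <field name="category" type="string" indexed="true" stored="true" docValues="true"/>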
Re: MatchAllDocsQuery is much slower in Solr 5.3.1 compared to Solr 4.7
and see if that is a lot faster, both with old and new Solr.

-- Jack Krupansky

On Fri, Nov 6, 2015 at 3:01 PM, wei wrote:
> Thanks Jack and Shawn. I checked these Jira tickets, but I am not sure if the slowness of MatchAllDocsQuery is also caused by the removal of the field cache. Can someone please explain a little bit?
>
> [...]
Re: MatchAllDocsQuery is much slower in Solr 5.3.1 compared to Solr 4.7
The explain part is different in Solr 4.7 and Solr 5.3.1. In Solr 4.7 there is only one line:

    1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm

In Solr 5.3.1 there is actually a boost, and the score is the product of boost & queryNorm:

    1.0 = *:*, product of: 1.0 = boost 1.0 = queryNorm

Can that cause the problem, if Solr 5 needs to calculate the product for all the hits? I am not sure where the boost comes from, and why it is different from Solr 4.7.
Re: MatchAllDocsQuery is much slower in Solr 5.3.1 compared to Solr 4.7
Hi Shawn,

I took care of the warm-up problem during the test. I set up a JMeter project, got a query log from our production (>10 queries), and ran the same query log through JMeter to hit the Solr instances at the same qps (about 40). I removed warmup queries in both Solr setups, and also set cache autowarm to 0 in solrconfig. I ran the test for 1 hour. These two instances are not serving other query traffic, but they both get update traffic. I disabled soft commit in Solr 5 and set the hard commit to 2 minutes. The Solr 4 instance is a slave node replicating from a Solr 4 master instance; the master also has a 2-minute commit cycle, and the testing Solr 4 instance replicates the index every 2 minutes. Solr 5 is slower than Solr 4.

After some investigation I realized that it seems the queries containing q=*:* are causing the problem. I split the query log into two log files, one with q=*:* and another without (almost all our queries have filter queries). When I ran the test, Solr 5 was faster when running queries with a query keyword, but much slower when running the "q=*:*" query log. There is no other query traffic to the two instances (there is index traffic).

When I got the query debug log in my first email, I made sure there was no filter cache (verified through the Solr console; after a hard commit the filterCache is cleaned).

Hope my email addresses your concern about how I ran the test. What is obvious to me is that Solr 5 is faster in one test (with query keyword) and slower in the other (without query keyword).

Thanks,
Wei

On Fri, Nov 6, 2015 at 1:41 PM, Shawn Heisey wrote:
> On 11/6/2015 1:01 PM, wei wrote:
> > Thanks Jack and Shawn. I checked these Jira tickets, but I am not sure if the slowness of MatchAllDocsQuery is also caused by the removal of the field cache. Can someone please explain a little bit?
>
> I only glanced at your full output in the message at the start of this thread. I thought I saw facet output in it, but it turns out that the only mention of facets was the timing information from the debug, so that very likely rules out the FieldCache change as a culprit.
>
> I am suspecting that the 4.7 index is warmed better, and may have the specific filter query (categoryIdsPath:1001) already sitting in the filterCache.
>
> Try running that query a few times on both versions, then restart Solr on both versions so they both start clean, and run the query *once* on each system, and see whether there's still a large discrepancy.
>
> If one of the systems is receiving queries from active clients and the other is not, then the comparison will be unfair, and biased towards the one that is getting additional queries. Query activity, even if it seems unrelated to the query you are testing, has a tendency to reduce overall qtime values.
>
> Thanks,
> Shawn
Re: MatchAllDocsQuery is much slower in Solr 5.3.1 compared to Solr 4.7
Good point! I tried that; on Solr 5 the query time is around 100-110 ms, and on Solr 4 it is around 60-63 ms (very consistent). Solr 5 is slower.

Thanks,
Wei

On Fri, Nov 6, 2015 at 6:46 PM, Yonik Seeley wrote:
> On Fri, Nov 6, 2015 at 9:30 PM, wei wrote:
> > in solr 5.3.1, there is actually a boost, and the score is product of boost & queryNorm.
>
> Hmmm, well, it's worth putting on the list of stuff to investigate. Boosting was also changed in Lucene.
>
> What happens if you try this multiple times in a row?
>
> &rows=2&fl=id&q={!cache=false}*:*&fq=categoryIdsPath:1001
>
> (basically just add {!cache=false} as a prefix to the main query.)
>
> This would allow hotspot time to compile methods, and ensure that the filter query was cached, and do a better job of isolating the "filtered match-all-docs" part of the execution.
>
> -Yonik
Re: MatchAllDocsQuery is much slower in Solr 5.3.1 compared to Solr 4.7
Thanks Yonik. A JIRA bug is opened: https://issues.apache.org/jira/browse/SOLR-8251

Wei

On Fri, Nov 6, 2015 at 7:10 PM, Yonik Seeley wrote:
> On Fri, Nov 6, 2015 at 9:56 PM, wei wrote:
> > Good point! I tried that, on solr5 the query time is around 100-110ms, and on solr4 it is around 60-63ms (very consistent). Solr5 is slower.
>
> When it's something easy, there comes a point when it makes sense to stop asking more questions and just try it yourself...
> I just did this, and can confirm what you're seeing. For me, 5.3.1 is about 5x slower than 4.10 for this particular query.
> Thanks for your persistence / patience in reporting this. Could you open a JIRA issue for it?
>
> -Yonik
solr query latency spike when replicating index
I noticed a Solr query latency spike on the slave node when it replicates the index from the master. Especially when the master has just finished optimization, the slave node will copy the whole index, and the latency is really bad. Is there some way to fix it?

Thanks,
Wei
Re: solr query latency spike when replicating index
seems "sar" is not installed. This is product machine, so I can't install it. We use ssd, and the gc throughput is about 95.8. We already throttle the replication to below 20M. We also have enough memory to hold both the jvm and index in memory. I am not sure when replicating the index, if both indexes(old and new) need to be in the memory. The memory is not big enough to hold both(old index+new index+jvm). Thanks, Wei On Fri, Apr 3, 2015 at 3:35 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > In Solr 5.0 you can throttle the replication and limit the bandwidth it > uses. The Sematext guys wrote a nice blog post about it. See > http://blog.sematext.com/2015/01/26/solr-5-replication-throttling/ > > On Thu, Apr 2, 2015 at 1:53 PM, wei wrote: > > > I noticed the solr query latency spike on slave node when replicating > index > > from master. Especially when master just finished optimization, the slave > > node will copy the whole index, and the latency is really bad. > > > > Is there some way to fix it? > > > > Thanks, > > Wei > > > > > > -- > Regards, > Shalin Shekhar Mangar. >
Correct approach to copy index between solr clouds?
Hi,

In our setup there are two Solr clouds:

Cloud A: production cloud, serves both writes and reads
Cloud B: backup cloud, serves only writes

Cloud A and B have the same shard configuration. Write requests are sent to both cloud A and B. In certain circumstances when Cloud A's update lags behind, we want to bulk copy the binary index from B to A.

We have tried two approaches:

Approach 1. For cloud A:
    a. delete the collection to wipe out everything
    b. create a new collection (data is empty now)
    c. shut down the solr server
    d. copy the binary index from cloud B to the corresponding shard replicas in cloud A
    e. start the solr server

Approach 2. For cloud A:
    a. shut down the solr server
    b. remove the whole 'data' folder under index/ in each replica
    c. copy the binary index from cloud B to the corresponding shard replicas in cloud A
    d. start the solr server

Is approach 2 sufficient? I am wondering if deleting/recreating the collection each time is necessary to get the cloud into a "clean" state for copying the binary index between solr clouds.

Thanks for your advice!
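For concreteness, approach 2 for a single replica might look roughly like this (a hedged sketch; paths, ports, and host names are illustrative):

    # stop the node, wipe the replica's data dir, copy it from cloud B, restart
    bin/solr stop -p 8983
    rm -rf server/solr/mycollection_shard1_replica1/data
    rsync -a cloudB-host:/path/to/solr/mycollection_shard1_replica1/data/ \
        server/solr/mycollection_shard1_replica1/data/
    bin/solr start -c -p 8983 -z zk1:2181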
Re: Correct approach to copy index between solr clouds?
Thanks Erick. Can you explain a bit more about the write.lock file? So far I have been copying it over from B to A and haven't seen issues starting the replicas.

On Sat, Aug 26, 2017 at 9:25 AM, Erick Erickson wrote:
> Approach 2 is sufficient. You do have to insure that you don't copy over the write.lock file however, as you may not be able to start replicas if that's there.
>
> There's a relatively little-known third option. You can (ab)use the replication API "fetchindex" command, see: https://cwiki.apache.org/confluence/display/solr/Index+Replication to pull the index from Cloud B to replicas on Cloud A. That has the advantage of working even if you are actively indexing to Cloud B. NOTE: currently you cannot _query_ Cloud A (the target) while the fetchindex is going on, but I doubt you really care since you were talking about having Cloud A offline anyway. So for each replica you fetch to, you'll send the fetchindex command directly to the replica on Cloud A, and the "masterUrl" will be the corresponding replica on Cloud B.
>
> Finally, what I'd really do is _only_ have one replica for each shard on Cloud A active and fetch to _that_ replica. I'd also delete the data dir on all the other replicas for the shard on Cloud A. Then as you bring the additional replicas up they'll do a full sync from the leader.
>
> FWIW,
> Erick
>
> On Fri, Aug 25, 2017 at 6:53 PM, Wei wrote:
> > [...]
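For reference, the fetchindex call Erick describes would look roughly like this per replica (a hedged sketch; hosts and core names are illustrative):

    curl 'http://cloudA-host:8983/solr/mycollection_shard1_replica1/replication?command=fetchindex&masterUrl=http://cloudB-host:8983/solr/mycollection_shard1_replica1'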
commit time in solr cloud
Hi,

In Solr cloud we want to track the last commit time on each node. The information source is the luke handler: admin/luke?numTerms=0&wt=json, e.g.

    "userData": { "commitTimeMSec": "1504895505447" },
    "lastModified": "2017-09-08T18:31:45.447Z"

I'm assuming the lastModified time is when the latest hard commit happened. Is that correct? On all nodes we have autoCommit set to a 15-minute interval. One observation I don't understand is that quite often the last commit time on shard leaders lags behind the last commit time on replicas, sometimes by over 10 minutes. My understanding is that as update requests go to the leader first, the timer on the leaders would start earlier than on the replicas. Am I missing something here?

Thanks,
Wei
solr cloud without hard commit?
Hello All,

What are the impacts if Solr cloud is configured to have only soft commits but no hard commits? In that case, if a non-leader node crashes, will it still be able to recover from the leader? Basically we are wondering, in a read-heavy and write-heavy scenario, whether taking hard commits out could help improve query performance, and what the consequences are.

Thanks,
Wei
Re: solr cloud without hard commit?
Thanks Emir and Erick! This helps me a lot to understand the commit process. A few more questions:

1. https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ mentions that for soft commit, "new segments are created that will be merged". Does that mean without hard commit, soft commits will create many small segments in memory, and that could also slow down queries? As I understand, merge policy only kicks in with hard commit.

2. Without hard commit configured, will the segments still be fsynced to disk when accumulated updates exceed ramBufferSizeMB? Is there any concern with increasing ramBufferSizeMB to a large value?

3. Can transaction logs be disabled in Solr cloud? Will functionality (replication, peer sync) break without transaction logs?

Thanks,
Wei

On Fri, Sep 29, 2017 at 8:33 AM, Erick Erickson wrote:
> More than you want to know about hard and soft commits here: https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> You don't need to read it though, Emir did an admirable job of telling you why turning off hard commits is a terrible idea.
>
> Best,
> Erick
>
> On Fri, Sep 29, 2017 at 1:07 AM, Emir Arnautović wrote:
> > Hi Wei,
> > Hard commits are about data durability. They roll over transaction logs and create a new index segment. If configured with openSearcher=false, they do not affect query performance much (other than taking some resources) since they do not invalidate caches. If you have transaction logs enabled, without hard commits they would grow infinitely and can result in a full disk. In case of heavy indexing, even rare hard commits can result in large transaction logs, causing a Solr restart after a crash to take a while because transaction logs are replayed.
> > Soft commits are the ones that affect query performance and should be as rare as your requirements allow. They invalidate caches, causing cold searches, or if you have warming set up, take resources to do the warming.
> >
> > I would recommend keeping hard commits, set to every 20-60 seconds (depending on indexing volume), and making sure openSearcher is set to false.
> >
> > HTH,
> > Emir
> >
> >> On 29 Sep 2017, at 06:55, Wei wrote:
> >> [...]
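For reference, Emir's recommendation translates to something like this in solrconfig.xml (a minimal sketch; the 60-second interval is illustrative):

    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit every 60 seconds -->
      <openSearcher>false</openSearcher>  <!-- durability only; don't open a new searcher -->
    </autoCommit>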
Leader initiated recovery authentication failure
Hi All,

After enabling basic authentication for Solr cloud, I noticed that the internal leader-initiated recovery failed with a 401 response. The recovery request from the leader:

GET //replica1.mycloud.com:9090/solr/admin/cores?action=REQUESTRECOVERY&core=replica1&wt=javabin&version=2 HTTP/1.1" 401 310 "-" "Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0" 5

My authorization config is:

    "authorization": {
      "class": "solr.RuleBasedAuthorizationPlugin",
      "permissions": [
        { "name": "security-edit", "role": "admin", "index": 1 },
        { "name": "schema-edit", "role": "admin", "index": 2 },
        { "name": "config-edit", "role": "admin", "index": 3 },
        { "name": "core-admin-edit", "role": "admin", "index": 4 },
        { "name": "collection-admin-edit", "role": "admin", "index": 5 }
      ]
    }

It looks like the unauthorized error is because core-admin-edit requires admin access. How can I configure authentication credentials for Solr cloud's internal requests? Appreciate your help!

Thanks,
Wei
solr cloud updatehandler stats mismatch
Hi,

I use the following API to track the number of update requests:

/solr/collection1/admin/mbeans?cat=UPDATE&stats=true&wt=json

Result:

    "class": "org.apache.solr.handler.UpdateRequestHandler",
    "version": "6.4.2.1",
    "description": "Add documents using XML (with XSLT), CSV, JSON, or javabin",
    "src": null,
    "stats": {
      "handlerStart": 1509824945436,
      "requests": 106062,
      ...

I am quite confused that the number of requests reported above is quite different from the count in the Solr access logs. A few times the handler stats are much higher: the handler reports ~100k requests but in the access log there are only 5k update requests. What could be the possible cause?

Thanks,
Wei
Re: solr cloud updatehandler stats mismatch
Thanks Amrit. Can you explain a bit more what kind of requests won't be logged? Is that something configurable in Solr?

Best,
Wei

On Thu, Nov 9, 2017 at 3:12 AM, Amrit Sarkar wrote:
> Wei,
>
> Does the collection the requests come through to have multiple shards and replicas? Please mind that an update request is received by a node, redirected to the particular shard the doc belongs to, and then distributed to the replicas of the collection. On each replica, each core, the update request is played.
>
> That can be a probable reason for the mismatch between the MBeans stats and manual counting in logs, as not everything gets logged. Need to check that once.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Thu, Nov 9, 2017 at 4:34 PM, Furkan KAMACI wrote:
> > Hi Wei,
> >
> > Do you compare it with files which are under /var/solr/logs by default?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Sun, Nov 5, 2017 at 6:59 PM, Wei wrote:
> > > [...]
Lucene two-phase iteration question
Hi,

I noticed that Lucene has introduced a new two-phase iteration API since version 5, but I could not get a good understanding of how it works. Is there any detailed documentation, or are there examples? Does two-phase iteration result in better query performance? Appreciate your help.

Thanks,
Wei
Re: Lucene two-phase iteration question
Hello Mikhail,

Thank you so much for the info. Trying to digest it first. Can you elaborate more on what has changed? Any pointer is greatly appreciated.

Regards,
Wei

On Mon, Jan 1, 2018 at 10:04 AM, Mikhail Khludnev wrote:
> Hello, Wei.
> Some first details have been discussed here: https://www.youtube.com/watch?v=BM4-Mv0kWr8
> Unfortunately, things have changed from those times.
>
> On Sat, Dec 23, 2017 at 1:43 AM, Wei wrote:
> > [...]
>
> --
> Sincerely yours
> Mikhail Khludnev
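For reference, the API in question is Lucene's TwoPhaseIterator, which splits matching into a cheap approximation and a more expensive confirmation step. A minimal consumer sketch (assuming a Scorer obtained from a Weight; method names follow recent Lucene versions — in Lucene 5.x the accessor was Scorer.asTwoPhaseIterator()):

    import java.io.IOException;
    import org.apache.lucene.search.DocIdSetIterator;
    import org.apache.lucene.search.Scorer;
    import org.apache.lucene.search.TwoPhaseIterator;

    class TwoPhaseExample {
      // Walks every candidate from the cheap approximation and confirms
      // each one with the (possibly expensive) second-phase matches() check.
      static void collectMatches(Scorer scorer) throws IOException {
        TwoPhaseIterator twoPhase = scorer.twoPhaseIterator();
        if (twoPhase == null) {
          // No two-phase support: the plain iterator already yields exact matches.
          DocIdSetIterator it = scorer.iterator();
          for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            // process doc
          }
        } else {
          // Phase 1: approximation() iterates a superset of the true matches
          // (e.g. docs containing all phrase terms, ignoring positions).
          // Phase 2: matches() confirms the current doc, e.g. checking positions.
          DocIdSetIterator approximation = twoPhase.approximation();
          for (int doc = approximation.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS;
               doc = approximation.nextDoc()) {
            if (twoPhase.matches()) {
              // process doc -- a confirmed match
            }
          }
        }
      }
    }

The performance benefit comes from postponing the costly check: conjunctions can advance all approximations first and call matches() only on docs where every clause's approximation agrees.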
Multiple solr instances per host vs Multiple cores in same solr instance
Hi,

I have a question about the deployment configuration in Solr cloud. When we need to increase the number of shards in Solr cloud, there are two options:

1. Run multiple Solr instances per host, each with a different port and hosting a single core for one shard.

2. Run one Solr instance per host, and have multiple cores (shards) in the same Solr instance.

Which would be better performance-wise? For the first option I think the JVM size for each Solr instance can be smaller, but deployment is more complicated? Are there any differences in CPU utilization?

Thanks,
Wei
Re: Multiple solr instances per host vs Multiple cores in same solr instance
Thanks Shawn. When using multiple Solr instances per host, is there any way to prevent SolrCloud from putting multiple replicas of the same shard on the same host? I see it makes sense if we are splitting into multiple instances with a smaller heap size. Besides that, do you think multiple instances will be able to get better CPU utilization on a multi-core server?

Thanks,
Wei

On Sun, Aug 26, 2018 at 4:37 AM Shawn Heisey wrote:
> On 8/26/2018 12:00 AM, Wei wrote:
> > [...]
>
> My general advice is to only have one Solr instance per machine. One Solr instance can handle many indexes, and usually will do so with less overhead than two or more instances.
>
> I can think of *ONE* exception to this -- when a single Solr instance would require a heap that's extremely large. Splitting that into two or more instances MIGHT greatly reduce garbage collection pauses. But there's a caveat to the caveat -- in my strong opinion, if your Solr instance is so big that it requires a huge heap and you're considering splitting into multiple Solr instances on one machine, you very likely need to run each of those instances on *separate* machines, so that each one can have access to all the resources of the machine it's running on.
>
> For SolrCloud, when you're running multiple instances per machine, Solr will consider those to be completely separate instances, and you may end up with all of the replicas for a shard on a single machine, which is a problem for high availability.
>
> Thanks,
> Shawn
Re: Multiple solr instances per host vs Multiple cores in same solr instance
Thanks Bernd. Do you have preferLocalShards=true in both cases? Do you notice a CPU/memory utilization difference between the two deployments? How many servers did you use in total? I am curious what the bottleneck is for the one-instance, 3-cores configuration.

Thanks,
Wei

On Mon, Aug 27, 2018 at 1:45 AM Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
> My tests with many combinations (instance, node, core) on a 3-server cluster with SolrCloud pointed out that the highest performance is with multiple Solr instances and shards and replicas placed by rules, so that you get an advantage from preferLocalShards=true.
>
> The disadvantage is the handling of the system, which means setup, starting and stopping, setting up the shards and replicas with rules, and so on.
>
> I tested with a 3x3 SolrCloud (3 shards, 3 replicas).
> A 3x3 system with one instance and 3 cores per host could handle up to 30 QPS.
> A 3x3 system with multiple instances (different ports, single core and shard per instance) could handle 60 QPS on the same hardware with the same data.
>
> Also, the single-instance-per-server setup has spikes in the response time graph which are not seen with a multi-instance setup.
>
> Tested about 2 months ago with SolrCloud 6.4.2.
>
> Regards,
> Bernd
>
> Am 26.08.2018 um 08:00 schrieb Wei:
> > [...]
Re: Multiple solr instances per host vs Multiple cores in same solr instance
Hi Erick,

I am looking into the rule-based replica placement documentation and am confused. How do I ensure there is no more than one replica of any shard on the same host? There is an example rule

    shard:*,replica:<2,node:*

that seems to serve the purpose, but I am not sure if 'node' refers to a Solr instance or the actual physical host. Is there an example for defining the host?

Thanks

On Sun, Aug 26, 2018 at 8:37 PM Erick Erickson wrote:
> Yes, you can use the "node placement rules", see: https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>
> This is a variant of "rack awareness".
>
> Of course the simplest way, if you're not doing very many collections, is to create the collection with the special "EMPTY" createNodeSet, then just build out your collection with ADDREPLICA, placing each replica on a particular node. The idea of that capability was exactly to explicitly control where each and every replica landed.
>
> As a third alternative, just create the collection and let Solr put the replicas where it will, then use MOVEREPLICA to position replicas as you want.
>
> The node placement rules are primarily intended for automated or very large setups. Manually placing replicas is simpler for limited numbers.
>
> Best,
> Erick
>
> On Sun, Aug 26, 2018 at 8:10 PM Wei wrote:
> > [...]
question for rule-based replica placement
Hi,

In rule-based replica placement, how do I ensure there is no more than one replica of any shard on the same host? In the documentation there is an example rule

    shard:*,replica:<2,node:*

Does 'node' refer to a Solr instance or the actual physical host? Is there an example for defining the physical host?

Thanks,
Wei
Re: question for rule based replica placement
Thanks Erick. Suppose I have 5 hosts h1, h2, h3, h4, h5 and want to create a 5x2 Solr cloud of 5 shards, 2 replicas per shard. On each host I will run two Solr JVMs, each hosting a single Solr core. Solr's default 'snitch' provides a 'host' tag, so I wonder if I can use it to prevent any host from having two replicas of the same shard when creating the collection:

    /solr/admin/collections?action=CREATE&name=mycollection&numShards=5&replicationFactor=2&maxShardsPerNode=1&rule=shard:*,replica:<2,host:*

Is this the correct way to use the 'snitch'? I cannot find more relevant documentation on how to configure and customize a 'snitch'.

Thanks,
Wei

On Sun, Sep 2, 2018 at 9:30 PM Erick Erickson wrote:
> You need to provide a "snitch" and define a rule appropriately. This is a variant of "rack awareness".
>
> Solr considers two JVMs running on the same physical host as completely separate Solr instances, so to get replicas on different hosts you need a snitch etc.
>
> Best,
> Erick
>
> On Sun, Sep 2, 2018 at 4:39 PM Wei wrote:
> > [...]
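For reference: the default ImplicitSnitch provides tags such as host, port, node, cores, and freedisk, so a host-based rule like the one above should not need an explicit snitch parameter (an assumption worth verifying against the reference guide for your version). A cleaned-up sketch of the CREATE call (collection name illustrative; the rule value must be URL-encoded in practice):

    /admin/collections?action=CREATE&name=mycollection&numShards=5
        &replicationFactor=2&maxShardsPerNode=1&rule=shard:*,replica:<2,host:*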
preferLocalShards setting
Hi,

I am setting up a Solr cloud with an external load balancer. I noticed the 'preferLocalShards' configuration and I am wondering how it would impact performance. If one host can have replicas from all shards, it sure will be beneficial; but in my 5-shard / 2-replica cloud on 5 servers, each server will only host 2 of the 5 shards (2 JVMs per server, each JVM having one replica from a different shard). Is it useful to set preferLocalShards=true in this case?

Thanks,
Wei
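For context, preferLocalShards is an ordinary query parameter, so it can be passed per request or baked into handler defaults; a minimal sketch (collection and handler names illustrative):

    /solr/mycollection/select?q=test&preferLocalShards=true

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <bool name="preferLocalShards">true</bool>
      </lst>
    </requestHandler>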
Index optimization takes too long
Hello,

After a recent schema change, it takes almost 40 minutes to optimize the index. The schema change enables docValues for all sort/facet fields, which increases the index size from 12G to 14G. Before the change it took only 5 minutes to do the optimization. I have tried increasing maxMergeAtOnceExplicit because the default 30 could be too low:

    <int name="maxMergeAtOnceExplicit">100</int>

But it doesn't seem to help. Any suggestions?

Thanks,
Wei
Re: Index optimization takes too long
Thanks everyone! I checked the system metrics during the optimization process. CPU usage is quite low, there is no I/O wait, and memory usage is not much different from before the docValues change. So I wonder what the bottleneck could be.

Thanks,
Wei

On Sat, Nov 3, 2018 at 1:38 PM Erick Erickson wrote:
> Going from my phone so it'll be terse. See UninvertingMergeUpdateProcessor (or something like that). Also, there's an idea in SOLR-12259 IIRC, but that'll be in 7.6 at the earliest.
>
> On Sat, Nov 3, 2018, 07:13 Shawn Heisey wrote:
> > On 11/3/2018 5:32 AM, Dave wrote:
> > > On a side note, does adding docValues to an already indexed field, and then optimizing, prevent the need to reindex to take advantage of docValues? I was under the impression you had to reindex the content.
> >
> > You must reindex when changing the schema to add docValues. An optimize will not build the new data structures. It will only rebuild the data structures that are already there.
> >
> > Thanks,
> > Shawn
Retrieve field from docValues
Hi,

I have a few questions about using the useDocValuesAsStored option to retrieve fields from docValues:

1. For schema version 1.6, useDocValuesAsStored=true is the default, so there is no need to explicitly set it in schema.xml?

2. With useDocValuesAsStored=true and the following definition, will Solr retrieve id from docValues instead of the stored field, if fl=id,title,score and both id and title are single-value fields?

    <field name="id" ... stored="true" docValues="true" required="true"/>
    <field name="title" ... stored="true" docValues="true" required="true"/>

Do I need to have all fields stored="false" docValues="true" to make Solr retrieve from docValues only? I am using Solr 6.6.

Thanks,
Wei
Re: Retrieve field from docValues
Thanks Yasufumi and Erick.

> 2. "it depends". Solr will try to do the most efficient thing possible. If _all_ the fields are docValues, it will return the stored values from the docValues structure.

I found this jira: https://issues.apache.org/jira/browse/SOLR-8344. Does this mean "Solr will try to do the most efficient thing possible" only works for 7.x? Is the behavior available in 6.6?

> This prevents a disk seek and decompress cycle.

Does this still hold if the whole index is loaded into memory? Also, for the benefit of performance improvement, does the uniqueKey field need to always be docValues, since it is used in the first phase of distributed search?

Thanks,
Wei

On Tue, Nov 6, 2018 at 8:30 AM Erick Erickson wrote:
> 2. "it depends". Solr will try to do the most efficient thing possible. If _all_ the fields are docValues, it will return the stored values from the docValues structure. This prevents a disk seek and decompress cycle.
>
> However, if even one field is docValues=false Solr will by default return the stored values. For the multiValued case, you can explicitly tell Solr to return the docValues field.
>
> Best,
> Erick
>
> On Tue, Nov 6, 2018 at 1:46 AM Yasufumi Mizoguchi wrote:
> > Hi,
> >
> > > 1. For schema version 1.6, useDocValuesAsStored=true is default, so there is no need to explicitly set it in schema.xml?
> >
> > Yes.
> >
> > > 2. With useDocValuesAsStored=true and the following definition, will Solr retrieve id from docValues instead of stored field?
> >
> > No.
> > AFAIK, if you define both docValues="true" and stored="true" in your schema, Solr tries to retrieve the stored value.
> > (Except when using streaming expressions or the /export handler etc...
> > See: https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-EnablingDocValues )
> >
> > Thanks,
> > Yasufumi
> >
> > 2018年11月6日(火) 9:54 Wei:
> > > [...]
Re: Retrieve field from docValues
I see there is also a docValuesFormat option; what's the default for this setting? Performance-wise, is it good to set docValuesFormat="Memory"?

Best,
Wei

On Tue, Nov 6, 2018 at 11:55 AM Erick Erickson wrote:
> Yes, "the most efficient possible" is associated with that JIRA, so only in 7x.
>
> "Does this still hold if whole index is loaded into memory?"
> The decompression part yes, the disk seek part no. And it's also sensitive to whether the documentCache already has the document.
>
> I'd also make the uniqueKey and the _version_ fields docValues.
>
> Best,
> Erick
>
> On Tue, Nov 6, 2018 at 10:44 AM Wei wrote:
> > [...]
Re: Retrieve field from docValues
Also, I notice this issue is still open: https://issues.apache.org/jira/browse/SOLR-10816. Does that mean we still need to have stored=true for the uniqueKey?

On Tue, Nov 6, 2018 at 2:14 PM Wei wrote:
> I see there is also a docValuesFormat option, what's the default for this setting? Performance-wise, is it good to set docValuesFormat="Memory"?
>
> [...]
solr optimize command
Hi,

I use the following HTTP request to start Solr index optimization:

    http://localhost:8983/solr/<collection>/update?skipError=true -F stream.body='<optimize/>'

The request returns status code 200 shortly after, but looking at the Solr instance I noticed that the actual optimization had not completed yet, as there was more than 1 segment. Is the optimize command async? What is the best approach to validate that an optimize has truly completed?

Thanks,
Wei
Questions for SynonymGraphFilter and WordDelimiterGraphFilter
Hello,

We are upgrading to Solr 7.6.0 and noticed that SynonymFilter and WordDelimiterFilter have been deprecated. The Solr doc recommends using SynonymGraphFilter and WordDelimiterGraphFilter instead. In our current schema we have a text field type whose index analyzer includes both SynonymFilter and WordDelimiterFilter. The Solr documentation states that the graph filters produce correct token graphs, but cannot consume an input token graph correctly, and that when you use these graph filters during indexing, you must follow them with a FlattenGraphFilter. I am confused as to how to replace our filters with the new SynonymGraphFilter and WordDelimiterGraphFilter. A few questions:

1. Regarding the FlattenGraphFilter, is it to be used only once at the end of the chain, or multiple times, once after each graph filter?

2. Is it possible to have two graph filters, i.e. both SynonymGraphFilter and WordDelimiterGraphFilter, in the same analysis chain? If not, what's the best option to replace our current config?

3. With the StopFilterFactory between SynonymGraphFilter and WordDelimiterGraphFilter, I get a few indexing errors:

    Exception writing document id XX to the index; possible analysis error
    Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1

But if I move the StopFilter before the SynonymGraphFilter the errors are gone. I guess the StopFilter messes up the SynonymGraphFilter output? Not sure if it's a Solr defect or whether there is a guideline that StopFilter should not be put after graph filters.

Thanks in advance for your input.

Thanks,
Wei
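For reference, the pattern the reference guide documents for index-time graph filters is a single graph filter followed by FlattenGraphFilter at the end of the index chain, with no FlattenGraphFilter at query time. A minimal sketch (the tokenizer and synonyms file are illustrative; note this sidesteps the two-graph-filters question above by using only one graph filter per chain):

    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"/>
    </analyzer>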
Re: Questions for SynonymGraphFilter and WordDelimiterGraphFilter
Thanks Thomas. You mentioned "Also there is no need for the FlattenGraphFilter", which is quite interesting, because the Solr documentation says it's mandatory for indexing: https://lucene.apache.org/solr/guide/7_6/filter-descriptions.html. Is there any more explanation for this?

Best regards,
Wei

On Mon, Jan 7, 2019 at 7:56 AM Thomas Aglassinger <t.aglassin...@netconomy.net> wrote:
> Hi Wei,
>
> here's a fairly simple field type we currently use in a project that seems to do the job with graph synonyms. Maybe this helps as a starting point for you:
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="..." managed="de"/>
>     <filter class="..." managed="de"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   </analyzer>
> </fieldType>
>
> As you can see we use the same filters for both indexing and query, so this might have some impact on positional queries, but so far it seems negligible for the short synonyms we use in practice. Also there is no need for the FlattenGraphFilter.
>
> The WhitespaceTokenizerFactory ensures that you can define synonyms with hyphens like mac-book -> macbook.
>
> Best regards, Thomas.
>
> On 05.01.19, 02:11, "Wei" wrote:
> > [...]
Re: Questions for SynonymGraphFilter and WordDelimiterGraphFilter
bump..

On Mon, Jan 7, 2019 at 11:53 AM Wei wrote:
> Thanks Thomas. You mentioned "Also there is no need for the FlattenGraphFilter", which is quite interesting, because the Solr documentation says it's mandatory for indexing: https://lucene.apache.org/solr/guide/7_6/filter-descriptions.html. Is there any more explanation for this?
>
> [...]
solr 7 optimize with Tlog/Pull replicas
Hi,

Recently I encountered a strange issue with optimize in Solr 7.6. The cloud is created with 4 shards and 2 TLOG replicas per shard. After a batch index update I issue an optimize command to a randomly picked replica in the cloud. After a while when I check, all the non-leader TLOG replicas have finished optimization to a single segment; however, all the leader replicas still have multiple segments. Previously, in the all-NRT-replica cloud, I saw optimization triggered on all nodes. Is the optimization process different with TLOG/PULL replicas?

Best,
Wei
Re: solr 7 optimize with Tlog/Pull replicas
Thanks Erick.

1> TLOG replicas shouldn't optimize on the follower. They should optimize on the leader then replicate the entire index to the follower.

Does that mean the follower will ignore the optimize request? Or shall I send the optimize request only to one of the leaders?

2> As of Solr 7.5, optimize should not optimize to a single segment _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set numSegments on the optimize command.

Is the 5G limit controlled by the maxMergedSegmentMB setting? In solrconfig.xml I used these settings:

    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnceExplicit">100</int>
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
      <int name="maxMergedSegmentMB">20480</int>
    </mergePolicyFactory>

But in the end I see multiple segments much smaller than the 20GB limit. In 7.6 is it required to explicitly set the number of segments to 1? E.g. shall I use

    /update?optimize=true&waitSearcher=false&maxSegments=1

Best,
Wei

On Fri, Mar 8, 2019 at 12:29 PM Erick Erickson wrote:
> This is very odd for at least two reasons:
>
> 1> TLOG replicas shouldn't optimize on the follower. They should optimize on the leader then replicate the entire index to the follower.
>
> 2> As of Solr 7.5, optimize should not optimize to a single segment _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set numSegments on the optimize command.
>
> So if you can reliably reproduce this, it's probably worth a JIRA...
>
> > On Mar 8, 2019, at 11:21 AM, Wei wrote:
> > [...]
Re: solr 7 optimize with Tlog/Pull replicas
A side question: for heavy bulk indexing, what's the recommended setting for auto commit? As no queries are needed during the bulk indexing process, I have auto soft commit disabled. Is there any side effect if I also disable auto commit?

On Sun, Mar 10, 2019 at 10:22 PM Wei wrote:
> Thanks Erick.
>
> [...]
Re: solr 7 optimize with Tlog/Pull replicas
Thanks Erick, it's very helpful. So for bulk indexing in a Tlog or Tlog/Pull cloud, when we optimize at the end of updates, segments on the leader replica will change rapidly and the follower replicas will be continuously pulling from the leader, effectively downloading the whole index. Is there a more efficient way? On Mon, Mar 11, 2019 at 9:59 AM Erick Erickson wrote: > do _not_ turn off hard commits, even when bulk indexing. Set > openSearcher to false in your config. This is for two reasons: > 1> the only time the transaction log is rolled over is when a hard commit > happens. If you turn off commits it’ll grow to a very large size. > 2> If, for any reason, the node restarts, it’ll replay the transaction log > from the last hard commit point, potentially taking hours if you haven’t > committed. > > And you should probably open a new searcher occasionally, even while bulk > indexing. For Real Time Get there are some internal structures that grow in > proportion to the docs indexed since the last searcher was opened. > > And for your other questions: > <1> I believe so, try it and look at your solr log. > > <2> Yes. Have you looked at Mike’s video (the third one down) here: > http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html? > TieredMergePolicy is the third video. The merge policy combines like-sized > segments. It’s wasteful to rewrite, say, a 19G segment just to add a 1G one, so > having multiple segments < 20G is perfectly normal. > > Best, > Erick > > > On Mar 10, 2019, at 10:36 PM, Wei wrote: > > > > A side question, for heavy bulk indexing, what's the recommended setting > > for auto commit? As there is no query needed during the bulk indexing > > process, I have auto soft commit disabled. Is there any side effect if I > > also disable auto commit? > > > > On Sun, Mar 10, 2019 at 10:22 PM Wei wrote: > > > >> Thanks Erick. > >> > >> 1> TLOG replicas shouldn’t optimize on the follower. They should > optimize > >> on the leader then replicate the entire index to the follower. > >> > >> Does that mean the follower will ignore the optimize request? Or shall I > >> send the optimize request only to one of the leaders? > >> > >> 2> As of Solr 7.5, optimize should not optimize to a single segment > >> _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set > >> numSegments on the optimize command. > >> > >> -- Is the 5G limit controlled by the maxMergedSegmentMB setting? In > >> solrconfig.xml I used these settings: > >> > >> class="org.apache.solr.index.TieredMergePolicyFactory"> > >> 100 > >> 10 > >> 10 > >> 20480 > >> > >> > >> But in the end I see multiple segments much smaller than the 20GB limit. > >> In 7.6 is it required to explicitly set the number of segments to 1? e.g. > >> shall I use > >> > >> /update?optimize=true&waitSearcher=false&maxSegments=1 > >> > >> Best, > >> Wei > >> > >> > >> On Fri, Mar 8, 2019 at 12:29 PM Erick Erickson > > >> wrote: > >> > >>> This is very odd for at least two reasons: > >>> > >>> 1> TLOG replicas shouldn’t optimize on the follower. They should > optimize > >>> on the leader then replicate the entire index to the follower. > >>> > >>> 2> As of Solr 7.5, optimize should not optimize to a single segment > >>> _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set > >>> numSegments on the optimize command. > >>> > >>> So if you can reliably reproduce this, it’s probably worth a JIRA…
> >>> > >>>> On Mar 8, 2019, at 11:21 AM, Wei wrote: > >>>> > >>>> Hi, > >>>> > >>>> Recently I encountered a strange issue with optimize in Solr 7.6. The > >>> cloud > >>>> is created with 4 shards with 2 Tlog replicas per shard. After batch > >>> index > >>>> update I issue an optimize command to a randomly picked replica in the > >>>> cloud. After a while when I check, all the non-leader Tlog replicas > >>>> finished optimization to a single segment, however all the leader > >>> replicas > >>>> still have multiple segments. Previously in the all-NRT replica > >>> cloud, I > >>>> see optimization is triggered on all nodes. Is the optimization > process > >>>> different with Tlog/Pull replicas? > >>>> > >>>> Best, > >>>> Wei > >>> > >>> > >
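A minimal sketch of the commit settings Erick describes, for the updateHandler section of solrconfig.xml. The 60-second interval is a placeholder; pick one that bounds transaction-log growth for your ingest rate:

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit regularly so the tlog rolls over -->
  <openSearcher>false</openSearcher>  <!-- do not open a new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>-1</maxTime>               <!-- soft commits disabled during bulk indexing -->
</autoSoftCommit>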
Question for separate query and updates with TLOG and PULL replicas
Hi, I have a question about how to completely separate queries and updates in a cluster of mixed TLOG and PULL replicas. solr cloud setup: Solr-7.6.0, 10 shards, each shard has 2 TLOG + 4 PULL replicas. In solrconfig.xml we set the preferred replica type for queries to PULL: replica.type:PULL A load-balancer is set up in front of the solr cloud, including both TLOG and PULL replicas. Also we use a http client for queries. Some observations: 1. In the TLOG replicas, I see about the same number of external queries in the jetty access log. It is expected as our load balancer does not differentiate TLOG and PULL replicas. My question is, when the TLOG replica receives an external query, will it forward to one of the PULL replicas? Or will it send the shard requests to PULL replicas but still serve as the aggregate node for the query? 2. In the TLOG replicas, I am still seeing some internal shard requests, but in much lower volume compared to PULL replicas. I checked one leader TLOG replica; the number of shard requests is 1% of that on PULL replicas in the same shard. With shards.preference=replica.type:PULL, why would the TLOG receive any internal shard requests? To completely separate queries and updates, I think that I might need to have the load-balancer set up to include only the PULL replicas. Is there any other option? Thanks, Wei
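For reference, shards.preference can be sent per-request or set as a default on the search handler; a sketch of the solrconfig.xml form described above (the handler name /select is an assumption):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards.preference">replica.type:PULL</str>
  </lst>
</requestHandler>

Note this only steers the internal per-shard requests; the node that receives the external query still acts as the aggregator unless the load balancer targets PULL replicas only.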
BinaryResponseWriter fetches unnecessary fields?
Hi all, We observe that solr query time increases significantly with the number of rows requested, even when all we retrieve for each document is just fl=id,score. Debugged a bit and see that most of the increased time was spent in BinaryResponseWriter, converting the lucene document into a SolrDocument. Inside convertLuceneDocToSolrDoc(): https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182 for (IndexableField f : doc.getFields()) I am a bit puzzled why we need to iterate through all the fields in the document. Why can’t we just iterate through the requested fields in fl? Specifically: https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156 if we change sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema()) to sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames) and just iterate through fnames in convertLuceneDocToSolrDoc(), there is a significant performance boost in our case; the query time increase from rows=128 to rows=500 is much smaller. Am I missing something here? Thanks, Wei
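A rough sketch of the change Wei describes, not the actual patch that went into SOLR-11891: restrict the Document-to-SolrDocument conversion to the requested field names. Here fnames is assumed to hold the field names from the fl parameter; the method shape loosely follows DocsStreamer and is otherwise an assumption:

import java.util.Set;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;

public class FlRestrictedConverter {
  // Sketch only: convert just the requested stored fields instead of all of them.
  public static SolrDocument convertLuceneDocToSolrDoc(Document doc,
                                                       IndexSchema schema,
                                                       Set<String> fnames) {
    SolrDocument out = new SolrDocument();
    for (String fname : fnames) {
      // getFields(name) returns only the stored values for this field,
      // so documents with many other fields are never iterated in full
      for (IndexableField f : doc.getFields(fname)) {
        SchemaField sf = schema.getFieldOrNull(f.name());
        Object val = (sf != null) ? sf.getType().toObject(f) : f.stringValue();
        out.addField(f.name(), val);
      }
    }
    return out;
  }
}

As the thread goes on to explain, the real fix ended up being about why the Document contains all fields in the first place (the documentCache path), but the sketch shows why iterating only fl fields sidesteps the cost.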
Re: BinaryResponseWriter fetches unnecessary fields?
Thanks Chris! Is RetrieveFieldsOptimizer a new functionality introduced in 7.x? Our observation is with both 5.4 & 6.4. I have created a jira for the issue: https://issues.apache.org/jira/browse/SOLR-11891 I am also wondering how enableLazyFieldLoading affects the case, but haven't tested yet. Please let us know if you catch anything. Thanks, Wei On Mon, Jan 22, 2018 at 3:15 PM, Chris Hostetter wrote: > > : Inside convertLuceneDocToSolrDoc(): > : > : > : https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182 > : > : > :for (IndexableField f : doc.getFields()) > : > : > : I am a bit puzzled why we need to iterate through all the fields in the > : document. Why can’t we just iterate through the requested fields in fl? > : Specifically: > > I have a hunch here -- but i haven't verified it. > > First of all: the specific code in question that you mention assumes it > doesn't *need* to filter out the result of "doc.getFields()" based on the > 'fl' because at the point in the processing where the DocsStreamer is > looping over the result of "doc.getFields()" the "Document" object it's > dealing with *should* only contain the specific (subset of stored) fields > requested by the fl param -- this is handled by RetrieveFieldsOptimizer & > SolrDocumentFetcher that the DocsStreamer builds up according to the > results of ResultContext.getReturnFields() when asking the > SolrIndexSearcher to fetch the doc() > > But i think what's happening here is that because of the documentCache, > there are cases where the SolrIndexSearcher is not actually using > a SolrDocumentStoredFieldVisitor to limit what's requested from the > IndexReader, and the resulting Document contains all fields -- which is > then compounded by code that loops over every field. > > At a quick glance, I'm a little fuzzy on how exactly > enableLazyFieldLoading may/may-not be affecting things here, but either > way I think you are correct -- we can/should make this overall stack of > code smarter about looping over fields we know we want, vs looping over > all fields in the doc. > > Can you please file a jira for this? > > > -Hoss > http://www.lucidworks.com/
facet.method=uif not working in solr cloud?
Hi, I am using the following parameters for faceting, requesting solr to use the UIF method: &facet=on&facet.field=color&q=*:*&facet.method=uif&facet.mincount=1&debugQuery=true It works as expected in my local standalone solr: facet-debug: { elapse: 2, sub-facet: [ { processor: "SimpleFacets", elapse: 2, action: "field facet", maxThreads: 0, sub-facet: [ { elapse: 2, requestedMethod: "UIF", appliedMethod: "UIF", inputDocSetSize: 8191, field: "color" } ] } ] } However when I apply the same query to solr cloud with multiple shards, the appliedMethod is always FC instead of UIF: { processor: "SimpleFacets", elapse: 18, action: "field facet", maxThreads: 0, sub-facet: [ { elapse: 58, requestedMethod: "UIF", appliedMethod: "FC", inputDocSetSize: 33487, field: "color", numBuckets: 238 } ] } I also see that in standalone mode fieldValueCache is used with UIF applied, but in cloud mode fieldValueCache is always empty. Are there any other parameters I need to set to apply UIF faceting in solr cloud? Thanks, Wei
Re: facet.method=uif not working in solr cloud?
Thanks Alessandro. Totally agree that from the logic I can't see why the requested facet.method=uif is not accepted. I don't see anything in solr.log either. However I find that the uif method somehow works with the json facet api in cloud mode, e.g: curl http://mysolrcloud:8983/solr/mycollection/select -d 'q=*:*&wt=json&rows=0&json.facet={color: {type: terms, field : color, method : uif, limit:1000, mincount:1}}&debugQuery=true' Then in the debug response I see: "facet-trace":{ "processor":"FacetQueryProcessor", "elapse":453, "query":null, "domainSize":70215, "sub-facet":[ { "processor":"FacetFieldProcessorByArrayUIF", "elapse":1, "field":"color", "limit":1000, "numBuckets":20, "domainSize":7166 }, { "processor":"FacetFieldProcessorByArrayUIF", "elapse":1, "field":"color", "limit":1000, "numBuckets":19, "domainSize":7004 }, { "processor":"FacetFieldProcessorByArrayUIF", "elapse":2, "field":"color", "limit":1000, "numBuckets":20, "domainSize":7030 }, { "processor":"FacetFieldProcessorByArrayUIF", "elapse":80, "field":"color", "limit":1000, "numBuckets":20, "domainSize":6969 }, { "processor":"FacetFieldProcessorByArrayUIF", "elapse":85, "field":"color", "limit":1000, "numBuckets":20, "domainSize":6953 }, { "processor":"FacetFieldProcessorByArrayUIF", "elapse":85, "field":"color", "limit":1000, "numBuckets":20, "domainSize":6901 }, { "processor":"FacetFieldProcessorByArrayUIF", "elapse":93, "field":"color", "limit":1000, "numBuckets":20, "domainSize":6951 }, { "processor":"FacetFieldProcessorByArrayUIF", "elapse":104, "field":"color", "limit":1000, "numBuckets":19, "domainSize":7127 } ] } A few things puzzled me here. Looks like when using the json facet api, SimpleFacets is not used, replaced by the FacetFieldProcessorByArrayUIF processor. Is that the expected behavior? Also with the uif method applied, facet latency is greatly increased. Some shards have much bigger elapse times reported (104 vs 1); I wonder what could cause the discrepancy as my index is evenly distributed across the shards. Thanks, Wei On Wed, Jan 31, 2018 at 2:24 AM, Alessandro Benedetti wrote: > I worked personally on the SimpleFacets class which does the facet method > selection : > > FacetMethod appliedFacetMethod = selectFacetMethod(field, > sf, requestedMethod, mincount, > exists); > > RTimer timer = null; > if (fdebug != null) { >fdebug.putInfoItem("requestedMethod", requestedMethod==null?"not > specified":requestedMethod.name()); >fdebug.putInfoItem("appliedMethod", appliedFacetMethod.name()); >fdebug.putInfoItem("inputDocSetSize", docs.size()); >fdebug.putInfoItem("field", field); >timer = new RTimer(); > } > > Within the selectFacetMethod method, the only code block related to UIF is > (another block can apply when the facet method arrives null at the Solr node, but > that should not apply as we see the facet method in the debug): > > /* UIF without DocValues can't deal with mincount=0, the reason is because > we create the buckets based on the values present in the result > set. So we are not going to see facet values which are not in the > result set */ > if (method == FacetMethod.UIF > && !field.hasDocValues() && mincount == 0) { >method = field.multiValued() ? FacetMethod.FC : FacetMethod.FCS; > } > > So is there anything in the logs? > Because that seems to me the only point where you can change from UIF to FC > and you clearly have mincount=1.
> > > > > > - > --- > Alessandro Benedetti > Search Consultant, R&D Software Engineer, Director > Sease Ltd. - www.sease.io > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
Re: facet.method=uif not working in solr cloud?
I tried to debug a bit and see that when executing on a cloud solr server, although I put facet.field=color&q=*:*&facet.method=uif&facet.mincount=1 in the request url, by the point it reaches SimpleFacets the req.params have somehow been rewritten to f.color.facet.mincount=0, no wonder the method chosen became FC. So one myth solved; but the new myth is why the facet.mincount is overridden to 0 in the solr req? Cheers, Wei On Thu, Feb 1, 2018 at 2:01 AM, Alessandro Benedetti wrote: > "Looks like when using the json facet api, > SimpleFacets is not used, replaced by FacetFieldProcessorByArrayUIF" > > That is expected, I remember Yonik stressing the fact that it is a > completely different approach to faceting (and different components and > classes are involved). > > But your first case, it may be worth an investigation. > If you have the tools and you are used to it I would encourage you to > reproduce the issue and remote debug it from a Solr server. > Putting a breakpoint in the SimpleFacets method you should be able to solve > the mystery (a bug maybe? I am very curious about it.) > > Cheers > > > > > - > --- > Alessandro Benedetti > Search Consultant, R&D Software Engineer, Director > Sease Ltd. - www.sease.io > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
Re: facet.method=uif not working in solr cloud?
Adding facet.distrib.mco=true did the trick. Thanks Toke and Alessandro! Cheers, Wei On Thu, Feb 8, 2018 at 1:23 AM, Toke Eskildsen wrote: > On Fri, 2018-02-02 at 17:40 -0800, Wei wrote: > > I tried to debug a bit and see that when executing on a cloud solr > > server, although I put > > facet.field=color&q=*:*&facet.method=uif&facet.mincount=1 in > > the request url, by the point it reaches SimpleFacets the > > req.params have somehow been rewritten > > to f.color.facet.mincount=0, no wonder the > > method chosen became FC. So one myth solved; but the new myth is why > > the facet.mincount is overridden to 0 in the solr req? > > AFAIK, it is due to an attempt of optimisation for distributed > faceting. The relevant JIRA seems to be https://issues.apache.org/jira/browse/SOLR-8988 > > Try setting facet.distrib.mco=true > > - Toke Eskildsen, Royal Danish Library > >
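Putting the thread's findings together, a working request sketch (the field name color comes from the earlier messages):

&facet=on&facet.field=color&facet.method=uif&facet.mincount=1&facet.distrib.mco=true&q=*:*

With facet.distrib.mco=true, the distributed-faceting optimisation no longer rewrites the per-field mincount to 0 on the shard requests, so the UIF method survives the mincount check in SimpleFacets.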
Re: facet.method=uif not working in solr cloud?
Thanks all! It's been really great learning. A bit off topic: after I enabled facet.method=uif in solr cloud, the faceting performance is actually much worse than the original fc (~1000 ms with uif vs ~200 ms with fc). My cloud has 8 shards with 6 replicas in each shard. I do see that fieldValueCache is getting utilized. Any reason uif could be so slow? On Tue, Feb 13, 2018 at 7:41 AM, Yonik Seeley wrote: > Great, thanks for tracking that down! > It's interesting that a mincount of 0 disables uif processing in the > first place. IIRC, it's only the hash-based method (as opposed to > array-based) that can't return zero counts. > > -Yonik > > > On Tue, Feb 13, 2018 at 6:17 AM, Alessandro Benedetti > wrote: > > *Update* : This has been actually already solved by Hoss. > > > > https://issues.apache.org/jira/browse/SOLR-11711 and this is the Pull > > Request : https://github.com/apache/lucene-solr/pull/279/files > > > > This should go live with 7.3 > > > > Cheers > > > > > > > > - > > --- > > Alessandro Benedetti > > Search Consultant, R&D Software Engineer, Director > > Sease Ltd. - www.sease.io > > -- > > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
Re: facet.method=uif not working in solr cloud?
Thanks Yonik. If uif has a big upfront cost when it hits solr the first time, then in solr cloud the same faceting request could hit different replicas in the same shard, so that cost will be paid at least once per replica? And if we are doing frequent auto commits, fieldValueCache will be invalidated and uif will have to pay the upfront cost again after each commit? On Wed, Feb 14, 2018 at 11:51 AM, Yonik Seeley wrote: > On Wed, Feb 14, 2018 at 2:28 PM, Wei wrote: > > Thanks all! It's been really great learning. A bit off topic: after I > > enabled facet.method=uif in solr cloud, the faceting performance is > > actually much worse than the original fc (~1000 ms with uif vs ~200 ms > > with fc). My cloud has 8 shards with 6 replicas in each shard. I do see > > that fieldValueCache is getting utilized. Any reason uif could be so > > slow? > > I haven't seen that before. Are you sure it's not the first time > faceting on a field? uif has a big upfront cost, but is usually faster > once that cost has been paid. > > > -Yonik > > > On Tue, Feb 13, 2018 at 7:41 AM, Yonik Seeley wrote: > > > >> Great, thanks for tracking that down! > >> It's interesting that a mincount of 0 disables uif processing in the > >> first place. IIRC, it's only the hash-based method (as opposed to > >> array-based) that can't return zero counts. > >> > >> -Yonik > >> > >> > >> On Tue, Feb 13, 2018 at 6:17 AM, Alessandro Benedetti > >> wrote: > >> > *Update* : This has been actually already solved by Hoss. > >> > > >> > https://issues.apache.org/jira/browse/SOLR-11711 and this is the Pull > >> > Request : https://github.com/apache/lucene-solr/pull/279/files > >> > > >> > This should go live with 7.3 > >> > > >> > Cheers > >> > > >> > > >> > > >> > - > >> > --- > >> > Alessandro Benedetti > >> > Search Consultant, R&D Software Engineer, Director > >> > Sease Ltd. - www.sease.io > >> > -- > >> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html > >> >
Different solr score between stand alone vs cloud mode solr
Hi, Recently we have an observation that really puzzled us. We have two instances of Solr, one in stand-alone mode and one a single-shard solr cloud with a couple of replicas. Both are indexed with the same documents and have the same solr version 6.6.2. When issuing the same query, the solr scores from stand-alone and cloud are different. How could this happen? With the same data, software version and query, shouldn't the solr score be exactly the same regardless of cloud mode or not? Thanks, Wei
Re: Different solr score between stand alone vs cloud mode solr
Thanks Erick. However our indexes on stand-alone and cloud are both static -- we indexed them from the same source xmls, optimized, and have had no updates since. Also in cloud there is only one single shard (with multiple replicas). I assume distributed stats don't have an effect in this case? Thanks, Wei On Thu, Jun 7, 2018 at 12:18 PM, Erick Erickson wrote: > Short form: > > As docs are updated, they're marked as deleted until the segment is > merged. This affects things like term frequency and doc frequency, > which in turn influence the score. > > Due to how commits happen, i.e. autocommit will hit at slightly skewed > wall-clock times, different segments are merged on different replicas > of the same shard. Thus the scores can be slightly different. > > You can turn on distributed stats which will help with this: > https://issues.apache.org/jira/browse/SOLR-1632 > > Best, > Erick > > On Thu, Jun 7, 2018 at 12:07 PM, Wei wrote: > > Hi, > > > > Recently we have an observation that really puzzled us. We have two > > instances of Solr, one in stand-alone mode and one a single-shard > solr > > cloud with a couple of replicas. Both are indexed with the same > documents > > and have the same solr version 6.6.2. When issuing the same query, the solr > > scores from stand-alone and cloud are different. How could this happen? > > With the same data, software version and query, shouldn't the solr score be > > exactly the same regardless of cloud mode or not? > > > > Thanks, > > Wei >
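For reference, the distributed-stats option Erick points to (SOLR-1632) is switched on with a statsCache entry in solrconfig.xml; a minimal sketch (ExactStatsCache is one of several implementations; LRUStatsCache and ExactSharedStatsCache also exist in the same package):

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>

With this in place, term statistics are aggregated globally before scoring, which should remove most of the score drift between replicas or shards of the same logical index.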
How to exclude certain values in multi-value field filter query
Hi, I have a multi-value field, and there is a limited set of values for the field: A, B, C, D. Is there a way to filter out documents that have only A or B values in the multi-value field? Basically I want to exclude documents whose values are: A, B, or A B; and get documents whose values are: C, D, C D, A C, B C, A D, B D, A B C, A B D, A C D, B C D, A B C D. Thanks, Wei
Re: How to exclude certain values in multi-value field filter query
Thanks Mikhail and Alessandro. On Tue, Jun 19, 2018 at 2:37 AM, Mikhail Khludnev wrote: > you need to index num vals > <https://lucene.apache.org/solr/7_1_0//solr-core/org/apache/solr/update/processor/CountFieldValuesUpdateProcessorFactory.html> > in a separate field, and then *:* -(V:(A AND B) AND numVals:2) -(V:(A OR > B) AND numVals:1) > > > On Tue, Jun 19, 2018 at 9:20 AM Wei wrote: > > Hi, > > > > I have a multi-value field, and there is a limited set of values for the > > field: A, B, C, D. > > Is there a way to filter out documents that have only A or B values in the > > multi-value field? > > > > Basically I want to exclude documents whose values are: > > > > A > > > > B > > > > A B > > > > and get documents whose values are: > > > > > > C > > > > D > > > > C D > > > > A C > > > > B C > > > > A D > > > > B D > > > > A B C > > > > A B D > > > > A C D > > > > B C D > > > > A B C D > > > > > > Thanks, > > > > Wei > > > > > -- > Sincerely yours > Mikhail Khludnev >
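A sketch of Mikhail's suggestion, assuming the multi-valued field is named V and the count field numVals (both names come from his query; the chain name is an assumption, and the clone/count/default sequence follows the pattern in the CountFieldValuesUpdateProcessorFactory javadoc). numVals would be a plain integer field in the schema:

<updateRequestProcessorChain name="count-values" default="true">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">V</str>
    <str name="dest">numVals</str>
  </processor>
  <processor class="solr.CountFieldValuesUpdateProcessorFactory">
    <str name="fieldName">numVals</str>  <!-- replaces the cloned values with their count -->
  </processor>
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">numVals</str>
    <int name="value">0</int>            <!-- documents with no V values get count 0 -->
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

After reindexing with this chain active, the exclusion can run as a filter: fq=*:* -(V:(A AND B) AND numVals:2) -(V:(A OR B) AND numVals:1)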
solr filter query on text field
Hi, I am running filter queries on a field of text_general type and see completely different results for the following queries: fq=my_text_field:"Jurassic park the movie" returns 0 results fq=my_text_field:(Jurassic park the movie) returns 20 results fq=my_text_field:Jurassic park the movie returns thousands of results Which one is the correct syntax? I am confused why the first query doesn't have any match at all. I also thought 2 and 3 were the same, but they turn out quite different. Thanks, Wei
Re: solr filter query on text field
Thanks Erick and Andrea! If my default operator is OR, fq=my_text_field:(Jurassic park the movie) is equivalent to my_text_field:(Jurassic OR park OR the OR movie)? That makes sense. On Wed, Jul 11, 2018 at 9:06 AM, Andrea Gazzarini wrote: > The syntax is valid in all those three examples; the right one depends on > what you need. > > The first query executes a proximity search (you can think of it as a phrase > search, for simplicity) so it returns no result because probably you don't > have any matching docs with that whole literal. > > The second is querying the my_text_field for all terms which compose the > value between parentheses. You can think of a query where each term is an > optional clause, something like mytextfield:jurassic OR mytextfield:park... > (it's not exactly an OR but this could give you the idea) > > The third example is not doing what you think. my_text_field is used only > with the first term (Jurassic) while the others are using the default > field. Something like mytextfield:jurassic OR defaultfield:park OR > defaultfield:the... That's the reason you have so many results (I guess > the default field is a catch-all field) > > Sorry for typos, I'm using my mobile > > Andrea > > On Wed, Jul 11, 2018 at 17:54, Wei wrote: > > > Hi, > > > > I am running filter queries on a field of text_general type and see > > completely different results for the following queries: > > > >fq=my_text_field:"Jurassic park the movie" returns 0 > > results > > > >fq=my_text_field:(Jurassic park the movie) returns 20 > > results > > > >fq=my_text_field:Jurassic park the movie returns > > thousands of results > > > > > > Which one is the correct syntax? I am confused why the first query > doesn't > > have any match at all. I also thought 2 and 3 were the same, but they turn > out > > quite different. > > > > > > Thanks, > > Wei > > >
Re: solr filter query on text field
btw, is there any difference if the fq field is a string field vs a text field? On Wed, Jul 11, 2018 at 11:59 AM, Wei wrote: > Thanks Erick and Andrea! If my default operator is OR, fq= > my_text_field:(Jurassic park the movie) is equivalent to > my_text_field:(Jurassic > OR park OR the OR movie)? That makes sense. > > On Wed, Jul 11, 2018 at 9:06 AM, Andrea Gazzarini > wrote: > >> The syntax is valid in all those three examples; the right one depends on >> what you need. >> >> The first query executes a proximity search (you can think of it as a phrase >> search, for simplicity) so it returns no result because probably you don't >> have any matching docs with that whole literal. >> >> The second is querying the my_text_field for all terms which compose the >> value between parentheses. You can think of a query where each term is an >> optional clause, something like mytextfield:jurassic OR >> mytextfield:park... >> (it's not exactly an OR but this could give you the idea) >> >> The third example is not doing what you think. my_text_field is used only >> with the first term (Jurassic) while the others are using the default >> field. Something like mytextfield:jurassic OR defaultfield:park OR >> defaultfield:the... That's the reason you have so many results (I guess >> the default field is a catch-all field) >> >> Sorry for typos, I'm using my mobile >> >> Andrea >> >> On Wed, Jul 11, 2018 at 17:54, Wei wrote: >> >> > Hi, >> > >> > I am running filter queries on a field of text_general type and see >> > completely different results for the following queries: >> > >> >fq=my_text_field:"Jurassic park the movie" returns 0 >> > results >> > >> >fq=my_text_field:(Jurassic park the movie) returns 20 >> > results >> > >> >fq=my_text_field:Jurassic park the movie returns >> > thousands of results >> > >> > >> > Which one is the correct syntax? I am confused why the first query >> doesn't >> > have any match at all. I also thought 2 and 3 were the same, but they turn >> out >> > quite different. >> > >> > >> > Thanks, >> > Wei >> > >> >
Solr timeAllowed metric
Hi, We tried to use solr's timeAllowed parameter to restrict the time spent on expensive queries. But as described at https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter "This value is only checked at the time of Query Expansion and Document collection". Does that mean Solr will not abort the request if timeAllowed is exceeded during the scoring process? For which components (query, facet, stats, debug etc.) is this limit effectively enforced? Thanks, Wei
Re: Solr timeAllowed metric
Thanks Mikhail! Is traditional faceting subject to timeAllowed? On Mon, Aug 6, 2018 at 3:46 AM, Mikhail Khludnev wrote: > One note: enum facets might be stopped by timeAllowed. > > On Mon, Aug 6, 2018 at 1:45 PM Mikhail Khludnev wrote: > > > Hello, Wei. > > > > "Document collection" is done alongside the "scoring process". So, Solr > > will abort the request if > > timeAllowed is exceeded during the scoring process. > > Query, MLT, and grouping are subject to timeAllowed constraints, but facet, > > json.facet https://issues.apache.org/jira/browse/SOLR-12478, stats, and debug > > are not. > > > > On Fri, Aug 3, 2018 at 11:34 PM Wei wrote: > > > >> Hi, > >> > >> We tried to use solr's timeAllowed parameter to restrict the time spent on > >> expensive queries. But as described at > >> > >> > >> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter > >> > >> "This value is only checked at the time of Query Expansion and Document > >> collection". Does that mean Solr will not abort the request if > >> timeAllowed is exceeded during the scoring process? For which > >> components > >> (query, facet, stats, debug etc.) is this limit effectively enforced? > >> > >> Thanks, > >> Wei > >> > > > > > > -- > > Sincerely yours > > Mikhail Khludnev > > > > > -- > Sincerely yours > Mikhail Khludnev >
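For reference, timeAllowed is an ordinary request parameter with a value in milliseconds; a sketch of a request capped at one second (collection name and query are placeholders):

http://localhost:8983/solr/mycollection/select?q=title:expensive+wildcard*&timeAllowed=1000

When the limit trips during one of the checked phases, Solr returns whatever it has collected so far and sets partialResults=true in the responseHeader, so clients should check that flag rather than assume a complete result set.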
Unbalanced shard requests
Hi everyone, I have a strange issue after upgrading from 7.6.0 to 8.4.1. My cloud has 6 shards with 10 TLOG replicas per shard. After the upgrade I noticed that one of the replicas in each shard is handling most of the distributed shard requests, so 6 nodes are heavily loaded while other nodes are idle. There is no change in the shard handler configuration: 3 30000 500 What could cause the unbalanced internal distributed requests? Thanks in advance. Wei
Re: Unbalanced shard requests
Hi Erick, I am measuring the number of shard requests, and it's for queries only, no indexing requests. I have an external load balancer and see each node receiving about an equal number of external queries. However for the internal shard queries, the distribution is uneven: 6 nodes (one in each shard; some of them are leaders and some are non-leaders) get about 80% of the shard requests, while the other 54 nodes get about 20% of the shard requests. I checked a few other parameters that are set: -Dsolr.disable.shardsWhitelist=true shards.preference=replica.location:local,replica.type:TLOG Nothing seems to explain the strange behavior. Any suggestions how to debug this? -Wei On Mon, Apr 27, 2020 at 5:42 PM Erick Erickson wrote: > Wei: > > How are you measuring utilization here? The number of incoming requests or > CPU? > > The leaders for each shard are certainly handling all of the indexing > requests since they’re TLOG replicas, so that’s one thing that might be > skewing your measurements. > > Best, > Erick > > > On Apr 27, 2020, at 7:13 PM, Wei wrote: > > > > Hi everyone, > > > > I have a strange issue after upgrading from 7.6.0 to 8.4.1. My cloud has 6 > > shards with 10 TLOG replicas per shard. After the upgrade I noticed that > one > > of the replicas in each shard is handling most of the distributed shard > > requests, so 6 nodes are heavily loaded while other nodes are idle. > There > > is no change in the shard handler configuration: > > > > > "HttpShardHandlerFactory"> > > > >3 > > > >30000 > > > >500 > > > > > > > > > > What could cause the unbalanced internal distributed requests? > > > > > > Thanks in advance. > > > > > > > > Wei > >
solr payloads performance
Hi everyone, Have a question regarding a typical e-commerce scenario: each item may have a different price in each store. Suppose there are 10 million items and 1000 stores. Option 1: use solr payloads; each document has store_prices_payload: store1|price1 store2|price2 ... store1000|price1000 Option 2: use dynamic fields and have 1000 fields in each document, i.e. field1: store1_price: price1 field2: store2_price: price2 ... field1000: store1000_price: price1000 Option 2 doesn't look elegant, but is there any performance benchmark on solr payloads? In terms of filtering, sorting or faceting, how would query performance compare between the two? Thanks, Wei
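A sketch of option 1's moving parts, assuming a delimited-payload field type like the payloads example that ships with Solr (the field and store names come from the message above; store1 in the queries is a placeholder for whichever store is being looked up):

<fieldType name="delimited_payloads_float" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- token "store1|9.99" becomes term "store1" with float payload 9.99 -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float" delimiter="|"/>
  </analyzer>
</fieldType>
<field name="store_prices_payload" type="delimited_payloads_float" indexed="true" stored="false"/>

At query time the payload() function (available since Solr 6.6) reads the per-store price, e.g. fl=id,price:payload(store_prices_payload,store1), sort=payload(store_prices_payload,store1) asc, or filtering via fq={!frange l=10 u=50}payload(store_prices_payload,store1). As far as I know there is no direct faceting on payload values, which is where option 2's real fields would win.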
Re: Unbalanced shard requests
Update: after I removed the shards.preference parameter from solrconfig.xml, the issue is gone and internal shard requests are now balanced. The same parameter works fine with solr 7.6. Still not sure of the root cause, but I observed a strange coincidence: the nodes that are most frequently picked for shard requests are the first node in each shard returned from the CLUSTERSTATUS api. Something seems wrong with the shuffling of equally preferred replicas when shards.preference is set. Will report back if I find more. On Mon, Apr 27, 2020 at 5:59 PM Wei wrote: > Hi Erick, > > I am measuring the number of shard requests, and it's for queries only, no > indexing requests. I have an external load balancer and see each node > receiving about an equal number of external queries. However for the > internal shard queries, the distribution is uneven: 6 nodes (one in > each shard; some of them are leaders and some are non-leaders) get about > 80% of the shard requests, while the other 54 nodes get about 20% of the shard > requests. I checked a few other parameters that are set: > > -Dsolr.disable.shardsWhitelist=true > shards.preference=replica.location:local,replica.type:TLOG > > Nothing seems to explain the strange behavior. Any suggestions how to > debug this? > > -Wei > > > On Mon, Apr 27, 2020 at 5:42 PM Erick Erickson > wrote: > >> Wei: >> >> How are you measuring utilization here? The number of incoming requests >> or CPU? >> >> The leaders for each shard are certainly handling all of the indexing >> requests since they’re TLOG replicas, so that’s one thing that might be >> skewing your measurements. >> >> Best, >> Erick >> >> > On Apr 27, 2020, at 7:13 PM, Wei wrote: >> > >> > Hi everyone, >> > >> > I have a strange issue after upgrading from 7.6.0 to 8.4.1. My cloud has 6 >> > shards with 10 TLOG replicas per shard. After the upgrade I noticed that >> one >> > of the replicas in each shard is handling most of the distributed shard >> > requests, so 6 nodes are heavily loaded while other nodes are idle. >> There >> > is no change in the shard handler configuration: >> > >> > >> "HttpShardHandlerFactory"> >> > >> >3 >> > >> >30000 >> > >> >500 >> > >> > >> > >> > >> > What could cause the unbalanced internal distributed requests? >> > >> > >> > Thanks in advance. >> > >> > >> > >> > Wei >> >>
Re: Unbalanced shard requests
Thanks Michael! Yes in each shard I have 10 Tlog replicas, no other type of replicas, and each Tlog replica is an individual solr instance on its own physical machine. In the jira you mentioned 'when "last place matches" == "first place matches" – e.g. when shards.preference specified matches *all* available replicas'. My setting is shards.preference=replica.location:local,replica.type:TLOG, I also tried just shards.preference=replica.location:local and it still has the issue. Can you explain a bit more? On Mon, May 11, 2020 at 12:26 PM Michael Gibney wrote: > FYI: https://issues.apache.org/jira/browse/SOLR-14471 > Wei, assuming you have only TLOG replicas, your "last place" matches > (to which the random fallback ordering would not be applied -- see > above issue) would be the same as the "first place" matches selected > for executing distributed requests. > > > On Mon, May 11, 2020 at 1:49 PM Michael Gibney > wrote: > > > > Wei, probably no need to answer my earlier questions; I think I see > > the problem here, and believe it is indeed a bug, introduced in 8.3. > > Will file an issue and submit a patch shortly. > > Michael > > > > On Mon, May 11, 2020 at 12:49 PM Michael Gibney > > wrote: > > > > > > Hi Wei, > > > > > > In considering this problem, I'm stumbling a bit on terminology > > > (particularly, where you mention "nodes", I think you're referring to > > > "replicas"?). Could you confirm that you have 10 TLOG replicas per > > > shard, for each of 6 shards? How many *nodes* (i.e., running solr > > > server instances) do you have, and what is the replica placement like > > > across those nodes? What, if any, non-TLOG replicas do you have per > > > shard (not that it's necessarily relevant, but just to get a complete > > > picture of the situation)? > > > > > > If you're able without too much trouble, can you determine what the > > > behavior is like on Solr 8.3? (there were different changes introduced > > > to potentially relevant code in 8.3 and 8.4, and knowing whether the > > > behavior you're observing manifests on 8.3 would help narrow down > > > where to look for an explanation). > > > > > > Michael > > > > > > On Fri, May 8, 2020 at 7:34 PM Wei wrote: > > > > > > > > Update: after I remove the shards.preference parameter from > > > > solrconfig.xml, issue is gone and internal shard requests are now > > > > balanced. The same parameter works fine with solr 7.6. Still not > sure of > > > > the root cause, but I observed a strange coincidence: the nodes that > are > > > > most frequently picked for shard requests are the first node in each > shard > > > > returned from the CLUSTERSTATUS api. Seems something wrong with > shuffling > > > > equally compared nodes when shards.preference is set. Will report > back if > > > > I find more. > > > > > > > > On Mon, Apr 27, 2020 at 5:59 PM Wei wrote: > > > > > > > > > Hi Eric, > > > > > > > > > > I am measuring the number of shard requests, and it's for query > only, no > > > > > indexing requests. I have an external load balancer and see each > node > > > > > received about the equal number of external queries. However for > the > > > > > internal shard queries, the distribution is uneven:6 nodes > (one in > > > > > each shard, some of them are leaders and some are non-leaders ) > gets about > > > > > 80% of the shard requests, the other 54 nodes gets about 20% of > the shard > > > > > requests. 
I checked a few other parameters set: > > > > > > > > > > -Dsolr.disable.shardsWhitelist=true > > > > > shards.preference=replica.location:local,replica.type:TLOG > > > > > > > > > > Nothing seems to cause the strange behavior. Any suggestions how > to > > > > > debug this? > > > > > > > > > > -Wei > > > > > > > > > > > > > > > On Mon, Apr 27, 2020 at 5:42 PM Erick Erickson < > erickerick...@gmail.com> > > > > > wrote: > > > > > > > > > >> Wei: > > > > >> > > > > >> How are you measuring utilization here? The number of incoming > requests > > > > >> or CPU? > > > >
Re: Unbalanced shard requests
Hi Phill, What is the RAM config you are referring to, JVM size? How is that related to the load balancing, if each node has the same configuration? Thanks, Wei On Mon, May 18, 2020 at 3:07 PM Phill Campbell wrote: > In my previous report I was configured to use as much RAM as possible. > With that configuration it seemed it was not load balancing. > So, I reconfigured and redeployed to use 1/4 the RAM. What a difference > for the better! > > 10.156.112.50 load average: 13.52, 10.56, 6.46 > 10.156.116.34 load average: 11.23, 12.35, 9.63 > 10.156.122.13 load average: 10.29, 12.40, 9.69 > > Very nice. > My tool that tests records RPS. In the “bad” configuration it was less > than 1 RPS. > NOW it is showing 21 RPS. > > > http://10.156.112.50:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > < > http://10.156.112.50:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > > > { > "responseHeader":{ > "status":0, > "QTime":161}, > "metrics":{ > "solr.core.BTS.shard1.replica_n2":{ > "QUERY./select.requestTimes":{ > "count":5723, > "meanRate":6.8163888639859085, > "1minRate":11.557013215119536, > "5minRate":8.760356217628159, > "15minRate":4.707624230995833, > "min_ms":0.131545, > "max_ms":388.710848, > "mean_ms":30.300492048215947, > "median_ms":6.336654, > "stddev_ms":51.527164088667035, > "p75_ms":35.427943, > "p95_ms":140.025957, > "p99_ms":230.533099, > "p999_ms":388.710848 > > > > http://10.156.122.13:10004/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > < > http://10.156.122.13:10004/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > > > { > "responseHeader":{ > "status":0, > "QTime":11}, > "metrics":{ > "solr.core.BTS.shard2.replica_n8":{ > "QUERY./select.requestTimes":{ > "count":6469, > "meanRate":7.502581801189549, > "1minRate":12.211423085368564, > "5minRate":9.445681397767322, > "15minRate":5.216209798637846, > "min_ms":0.154691, > "max_ms":701.657394, > "mean_ms":34.2734699171445, > "median_ms":5.640378, > "stddev_ms":62.27649205954566, > "p75_ms":39.016371, > "p95_ms":156.997982, > "p99_ms":288.883028, > "p999_ms":538.368031 > > > http://10.156.116.34:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > < > http://10.156.116.34:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > > > { > "responseHeader":{ > "status":0, > "QTime":67}, > "metrics":{ > "solr.core.BTS.shard3.replica_n16":{ > "QUERY./select.requestTimes":{ > "count":7109, > "meanRate":7.787524673806184, > "1minRate":11.88519763582083, > "5minRate":9.893315557386755, > "15minRate":5.620178363676527, > "min_ms":0.150887, > "max_ms":472.826462, > "mean_ms":32.184282366621204, > "median_ms":6.977733, > "stddev_ms":55.729908615189196, > "p75_ms":36.655011, > "p95_ms":151.12627, > "p99_ms":251.440162, > "p999_ms":472.826462 > > > Compare that to the previous report and you can see the improvement. > So, note to myself. Figure out the sweet spot for RAM usage. Use too much > and strange behavior is noticed. While using too much all the load focused > on one box and query times slowed. > I did not see any OOM errors during any of this. > > Regards > > > > > On May 18, 2020, at 3:23 PM, Phill Campbell > wrote: > > > > I have been testing 8.5.2 and it looks like the load has moved but is > still on one machine. > > > > Setup: > > 3 physical machines. > > Each machine hosts 8 instances of Solr. > > Each instance of Solr hosts one replica. > > > > Another way to say it: > > Number of shards
Re: Unbalanced shard requests
Hi Michael, I also verified the patch in SOLR-14471 with 8.4.1 and it fixed the issue with shards.preference=replica.location:local,replica.type:TLOG in my setting. Thanks! Wei On Thu, May 21, 2020 at 12:09 PM Phill Campbell wrote: > Yes, JVM heap settings. > > > On May 19, 2020, at 10:59 AM, Wei wrote: > > > > Hi Phill, > > > > What is the RAM config you are referring to, JVM size? How is that > related > > to the load balancing, if each node has the same configuration? > > > > Thanks, > > Wei > > > > On Mon, May 18, 2020 at 3:07 PM Phill Campbell > > wrote: > > > >> In my previous report I was configured to use as much RAM as possible. > >> With that configuration it seemed it was not load balancing. > >> So, I reconfigured and redeployed to use 1/4 the RAM. What a difference > >> for the better! > >> > >> 10.156.112.50 load average: 13.52, 10.56, 6.46 > >> 10.156.116.34 load average: 11.23, 12.35, 9.63 > >> 10.156.122.13 load average: 10.29, 12.40, 9.69 > >> > >> Very nice. > >> My tool that tests records RPS. In the “bad” configuration it was less > >> than 1 RPS. > >> NOW it is showing 21 RPS. > >> > >> > >> > http://10.156.112.50:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > >> < > >> > http://10.156.112.50:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > >>> > >> { > >> "responseHeader":{ > >>"status":0, > >>"QTime":161}, > >> "metrics":{ > >>"solr.core.BTS.shard1.replica_n2":{ > >> "QUERY./select.requestTimes":{ > >>"count":5723, > >>"meanRate":6.8163888639859085, > >>"1minRate":11.557013215119536, > >>"5minRate":8.760356217628159, > >>"15minRate":4.707624230995833, > >>"min_ms":0.131545, > >>"max_ms":388.710848, > >>"mean_ms":30.300492048215947, > >>"median_ms":6.336654, > >>"stddev_ms":51.527164088667035, > >>"p75_ms":35.427943, > >>"p95_ms":140.025957, > >>"p99_ms":230.533099, > >>"p999_ms":388.710848 > >> > >> > >> > >> > http://10.156.122.13:10004/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > >> < > >> > http://10.156.122.13:10004/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > >>> > >> { > >> "responseHeader":{ > >>"status":0, > >>"QTime":11}, > >> "metrics":{ > >>"solr.core.BTS.shard2.replica_n8":{ > >> "QUERY./select.requestTimes":{ > >>"count":6469, > >>"meanRate":7.502581801189549, > >>"1minRate":12.211423085368564, > >>"5minRate":9.445681397767322, > >>"15minRate":5.216209798637846, > >>"min_ms":0.154691, > >>"max_ms":701.657394, > >>"mean_ms":34.2734699171445, > >>"median_ms":5.640378, > >>"stddev_ms":62.27649205954566, > >>"p75_ms":39.016371, > >>"p95_ms":156.997982, > >>"p99_ms":288.883028, > >>"p999_ms":538.368031 > >> > >> > >> > http://10.156.116.34:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > >> < > >> > http://10.156.116.34:10002/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes > >>> > >> { > >> "responseHeader":{ > >>"status":0, > >>"QTime":67}, > >> "metrics":{ > >>"solr.core.BTS.shard3.replica_n16":{ > >> "QUERY./select.requestTimes":{ > >>"count":7109, > >>"meanRate":7.787524673806184, > >>"1minRate":11.88519763582083, > >>"5minRate":9.893315557386755, > >>"15minRate":5.620178363676527, > >
How to disable cache for facet.query?
Hi, I am trying to disable filter cache for some filter queries as they contain unique ids and cause cache evictions. By adding {!cache=false} the fq is no longer stored in filter cache, however I have similar conditions in facet.query and using facet.query={!cache=false}(color:red AND id:XXX) does not work. Is it possible to stop solr from putting facet.query into filter cache? Thanks, Wei
solr performance with >1 NUMAs
Hi, Recently we deployed solr 8.4.1 on a batch of new servers with 2 NUMAs. I noticed that query latency almost doubled compared to deployment on single NUMA machines. Not sure what's causing the huge difference. Is there any tuning to boost the performance on multiple NUMA machines? Any pointer is appreciated. Best, Wei
Re: solr performance with >1 NUMAs
Thanks Dominique. I'll start with the -XX:+UseNUMA option. Best, Wei On Fri, Sep 25, 2020 at 7:04 AM Dominique Bejean wrote: > Hi, > > This would be a Java VM option, not something Solr itself can know about. > Take a look at this article in comments. May be it will help. > > https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html?showComment=1347033706559#c229885263664926125 > > Regards > > Dominique > > > > Le jeu. 24 sept. 2020 à 03:42, Wei a écrit : > > > Hi, > > > > Recently we deployed solr 8.4.1 on a batch of new servers with 2 NUMAs. I > > noticed that query latency almost doubled compared to deployment on > single > > NUMA machines. Not sure what's causing the huge difference. Is there any > > tuning to boost the performance on multiple NUMA machines? Any pointer is > > appreciated. > > > > Best, > > Wei > > >
Re: solr performance with >1 NUMAs
Thanks Shawn! Currently we are still using the CMS collector for solr with Java 8. When we last evaluated with Solr 7, CMS performed better than G1 for our case. When using G1, is it better to upgrade from Java 8 to Java 11? From https://lucene.apache.org/solr/guide/8_4/solr-system-requirements.html, it seems Java 14 is not officially supported for Solr 8. Best, Wei On Fri, Sep 25, 2020 at 5:50 PM Shawn Heisey wrote: > On 9/23/2020 7:42 PM, Wei wrote: > > Recently we deployed solr 8.4.1 on a batch of new servers with 2 NUMAs. I > > noticed that query latency almost doubled compared to deployment on > single > > NUMA machines. Not sure what's causing the huge difference. Is there any > > tuning to boost the performance on multiple NUMA machines? Any pointer is > > appreciated. > > If you're running with standard options, Solr 8.4.1 will start using the > G1 garbage collector. > > As of Java 14, G1 has gained the ability to use the -XX:+UseNUMA option, > which makes better decisions about memory allocations and multiple > NUMAs. If you're running a new enough Java, it would probably be > beneficial to add this to the garbage collector options. Solr itself is > unaware of things like NUMA -- Java must handle that. > > https://openjdk.java.net/jeps/345 > > Thanks, > Shawn >
Re: What does current mean?
My understanding is that current means whether there is data pending to be committed. Best, Wei On Sat, Sep 26, 2020 at 5:09 PM Kayak28 wrote: > Hello, Solr community: > > I would like to ask a question about the current icon on the core-overview > page under statistics. > I previously thought that the current flag tells users whether the index is > searchable or not (committed or not committed), because if I send a > commit request, it changes from an NG-ish icon to an OK-ish icon. > > If anyone knows the meaning of the icon, I would like to hear about it. > > -- > > Sincerely, > Kaya > github: https://github.com/28kayak >
Re: solr performance with >1 NUMAs
Thanks Shawn. Looks like Java 11 is the way to go with -XX:+UseNUMA. Do you see any backward compatibility issues for Solr 8 with Java 11? Can we run Solr 8 built with JDK 8 on a Java 11 JRE, or do we need to rebuild solr with the Java 11 JDK? Best, Wei On Sat, Sep 26, 2020 at 6:44 PM Shawn Heisey wrote: > On 9/26/2020 1:39 PM, Wei wrote: > > Thanks Shawn! Currently we are still using the CMS collector for solr > with > > Java 8. When we last evaluated with Solr 7, CMS performed better than G1 for > > our case. When using G1, is it better to upgrade from Java 8 to Java 11? > > From > https://lucene.apache.org/solr/guide/8_4/solr-system-requirements.html, > > it seems Java 14 is not officially supported for Solr 8. > > It has been a while since I was working with Solr every day, and when I > was, Java 11 did not yet exist. I have no idea whether Java 11 improves > things beyond Java 8. That said ... all software evolves and usually > improves as time goes by. It is likely that the newer version has SOME > benefit. > > Regarding whether or not Java 14 is supported: There are automated > tests where all the important code branches are run with all major > versions of Java, including pre-release versions, and those tests do > include various garbage collectors. Somebody notices when a combination > doesn't work, and big problems with newer Java versions are something > that gets discussed on our mailing lists. > > Java 14 has been out for a while, with no big problems being discussed > so far. So it is likely that it works with Solr. Can I say for sure? > No. I haven't tried it myself. > > I don't have any hardware available where there is more than one NUMA, > or I would look deeper into this myself. It would be interesting to > find out whether the -XX:+UseNUMA option makes a big difference in > performance. > > Thanks, > Shawn >
Re: solr performance with >1 NUMAs
Hi Shawn, I'm circling back with some new findings on our 2-NUMA issue. After a few iterations, we do see improvement with the UseNUMA flag and other JVM setting changes. Here are the current settings, with Java 11: -XX:+UseNUMA -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+UseTLAB -XX:G1MaxNewSizePercent=20 -XX:MaxGCPauseMillis=150 -XX:+DisableExplicitGC -XX:+DoEscapeAnalysis -XX:+ParallelRefProcEnabled -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions Compared to the previous Java 8 + CMS on 2-NUMA servers, P99 latency has improved over 20%. Thanks, Wei On Mon, Sep 28, 2020 at 4:02 PM Shawn Heisey wrote: > On 9/28/2020 12:17 PM, Wei wrote: > > Thanks Shawn. Looks like Java 11 is the way to go with -XX:+UseNUMA. Do > you > > see any backward compatibility issues for Solr 8 with Java 11? Can we run > > Solr 8 built with JDK 8 on a Java 11 JRE, or do we need to rebuild solr with the Java > > 11 JDK? > > I do not know of any problems running the binary release of Solr 8 > (which is most likely built with the Java 8 JDK) with a newer release > like Java 11 or higher. > > I think Sun was really burned by such problems cropping up in the days > of Java 5 and 6, and their developers have worked really hard to make > sure that never happens again. > > If you're running Java 11, you will need to pick a different garbage > collector if you expect the NUMA flag to function. The most recent > releases of Solr are defaulting to G1GC, which as previously mentioned, > did not gain NUMA optimizations until Java 14. > > It is not clear to me whether the NUMA optimizations will work with any > collector other than Parallel until Java 14. You would need to check > the Java documentation carefully or ask someone involved with development of > Java. > > If you do see an improvement using the NUMA flag with Java 11, please > let us know exactly what options Solr was started with. > > Thanks, > Shawn >
docValues usage
Hi, I have a couple of primitive single-value numeric fields whose values are used in boosting functions, but not used in sort/facet or in the returned response. Should I use docValues for them in the schema? I can think of the following options: 1) indexed=true, stored=true, docValues=false 2) indexed=true, stored=false, docValues=true 3) indexed=false, stored=false, docValues=true What would be the performance implications of these options? Best, Wei
Re: docValues usage
Thanks Erick. As indexed is not necessary, and docValues are more efficient than stored fields for function queries, we shall go with the following: 3) indexed=false, stored=false, docValues=true. Is my understanding correct? Best, Wei On Wed, Nov 4, 2020 at 5:24 AM Erick Erickson wrote: > You don’t need to index the field for function queries, see: > https://lucene.apache.org/solr/guide/8_6/docvalues.html. > > Function queries, as opposed to sorting, faceting and grouping, are > evaluated at search time where the > search process is already parked on the document anyway, so it just answers the > question “for doc X, what > is the value of field Y” to compute the score. DocValues are still more > efficient I think, although I > haven’t measured explicitly... > > For sorting, faceting and grouping, it’s a much different story. Take > sorting. You have to ask > “for field Y, what’s the value in docX and docZ?”. Say you’re parked on > docX. Doc Z is long gone, > and getting the value for field Y is much more expensive. > > Also, docValues will not increase memory requirements _unless used_. > Otherwise they’ll > just sit there on disk. They will certainly increase disk space whether > used or not. > > And _not_ using docValues when you facet, group or sort will also > _certainly_ increase > your heap requirements, since the docValues structure must be built on the > heap rather > than live in MMapDirectory space. > > Best, > Erick > > > On Nov 4, 2020, at 5:32 AM, uyilmaz wrote: > > > > Hi, > > > > I'm by no means an expert on this, so if anyone sees a mistake please > correct me. > > > > I think you need to index this field, since boost functions are added to > the query as optional clauses (https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter). > It's like boosting a regular field by putting ^2 next to it in a query. > Storing or enabling docValues will unnecessarily consume space/memory. > > > > On Tue, 3 Nov 2020 16:10:50 -0800 > > Wei wrote: > > > >> Hi, > >> > >> I have a couple of primitive single-value numeric fields whose > >> values are used in boosting functions, but not used in sort/facet or in > >> the returned response. Should I use docValues for them in the schema? I > can > >> think of the following options: > >> > >> 1) indexed=true, stored=true, docValues=false > >> 2) indexed=true, stored=false, docValues=true > >> 3) indexed=false, stored=false, docValues=true > >> > >> What would be the performance implications of these options? > >> > >> Best, > >> Wei > > > > > > -- > > uyilmaz >
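A sketch of option 3 in schema form, with the kind of boost it supports; the field name popularity_score and the pfloat type are assumptions for illustration:

<field name="popularity_score" type="pfloat" indexed="false" stored="false" docValues="true"/>

With edismax, the value can then feed a boost function without the field being indexed or stored, e.g. bf=field(popularity_score) or a multiplicative boost=log(sum(field(popularity_score),1)), since the field() function reads docValues directly.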
Re: docValues usage
And in the case of both stored=true and docValues=true, will Solr 8.x choose the optimal approach by itself? On Wed, Nov 4, 2020 at 9:15 AM Wei wrote: > Thanks Erick. As indexed is not necessary, and docValues are more > efficient than stored fields for function queries, we shall go with the > following: > > 3) indexed=false, stored=false, docValues=true. > > Is my understanding correct? > > Best, > Wei > > On Wed, Nov 4, 2020 at 5:24 AM Erick Erickson > wrote: > >> You don’t need to index the field for function queries, see: >> https://lucene.apache.org/solr/guide/8_6/docvalues.html. >> >> Function queries, as opposed to sorting, faceting and grouping, are >> evaluated at search time where the >> search process is already parked on the document anyway, so it just answers the >> question “for doc X, what >> is the value of field Y” to compute the score. DocValues are still more >> efficient I think, although I >> haven’t measured explicitly... >> >> For sorting, faceting and grouping, it’s a much different story. Take >> sorting. You have to ask >> “for field Y, what’s the value in docX and docZ?”. Say you’re parked on >> docX. Doc Z is long gone, >> and getting the value for field Y is much more expensive. >> >> Also, docValues will not increase memory requirements _unless used_. >> Otherwise they’ll >> just sit there on disk. They will certainly increase disk space whether >> used or not. >> >> And _not_ using docValues when you facet, group or sort will also >> _certainly_ increase >> your heap requirements, since the docValues structure must be built on the >> heap rather >> than live in MMapDirectory space. >> >> Best, >> Erick >> >> >> > On Nov 4, 2020, at 5:32 AM, uyilmaz wrote: >> > >> > Hi, >> > >> > I'm by no means an expert on this, so if anyone sees a mistake please >> correct me. >> > >> > I think you need to index this field, since boost functions are added >> to the query as optional clauses (https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter). >> It's like boosting a regular field by putting ^2 next to it in a query. >> Storing or enabling docValues will unnecessarily consume space/memory. >> > >> > On Tue, 3 Nov 2020 16:10:50 -0800 >> > Wei wrote: >> > >> >> Hi, >> >> >> >> I have a couple of primitive single-value numeric fields whose >> >> values are used in boosting functions, but not used in sort/facet or >> in >> >> the returned response. Should I use docValues for them in the schema? I >> can >> >> think of the following options: >> >> >> >> 1) indexed=true, stored=true, docValues=false >> >> 2) indexed=true, stored=false, docValues=true >> >> 3) indexed=false, stored=false, docValues=true >> >> >> >> What would be the performance implications of these options? >> >> >> >> Best, >> >> Wei >> > >> > >> > -- >> > uyilmaz >>
Solr filter query on text fields
Hi, I have always been using solr fq on string fields. Recently I needed to apply fq on a text field defined as follows: For the query q=*:*&fq=description:"ice cream", the filter query returns matches for "ice cream bar" and "vanilla ice cream", but does not match "ice cold cream". The results seem to be neither an exact match nor a phrase match. What's the expected behavior for fq on text fields? I have tried to look into the solr docs but there is no clear explanation. Thanks, Wei
Re: Solr filter query on text fields
Thanks Shawn! I didn't notice the asterisks were created during copy/paste, one lesson learned :) Does that mean that when fq is applied to a text field, it does a text match within the field just like q does on a query field, while for string fields it is an exact match? If it is a phrase query, what are the values for related parameters such as ps? Thanks, Wei On Mon, Jun 24, 2019 at 4:51 PM Shawn Heisey wrote: > On 6/24/2019 5:37 PM, Wei wrote: > > stored="true"/> > > I'm assuming that the asterisks here are for emphasis, that they are not > actually present. This can be very confusing. It is far better to > relay the precise information and not try to emphasize anything. > > > For query q=*:*&fq=description:”ice cream”, the filter query returns > > matches for “ice cream bar” and “vanilla ice cream” , but does not match > > for “ice cold cream”. > > > > The results seem neither exact match nor phrase match. What's the > expected > > behavior for fq on text fields? I have tried to look into the solr docs > > but there is no clear explanation. > > If the quotes are present in what you actually sent to Solr, then that > IS a phrase query. And that is why it did not match your third example. > > Try one of these instead: > > q=*:*&fq=description:(ice cream) > > q=*:*&fq=description:ice description:cream > > Thanks, > Shawn
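To make the distinction concrete, here are the three shapes side by side, assuming the default lucene parser with the default OR operator:

  fq=description:"ice cream"               -- phrase: terms must be adjacent, so "ice cold cream" does not match
  fq=description:(ice cream)               -- ice OR cream: any document containing either term
  fq=+description:ice +description:cream   -- both terms required, in any position or order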
Re: Solr filter query on text fields
Thanks Erick for the clarification. How does ps work for fq? I configured ps=4 for q, but it doesn't apply to fq. For phrase queries in fq it seems ps=0 is used. Is there a way to configure it for fq as well? Best, Wei On Tue, Jun 25, 2019 at 9:51 AM Erick Erickson wrote: > q and fq do _exactly_ the same thing in terms of query parsing, subject to > all the same conditions. > > There are two things that apply to fq clauses that have nothing to do with > the query _parsing_. > 1> there is no scoring, so it’s cheaper from that perspective > 2> the results are cached in a bitmap and can be re-used later > > Best, > Erick > > > On Jun 24, 2019, at 7:06 PM, Wei wrote: > > > > Thanks Shawn! I didn't notice the asterisks are created during > copy/paste, > > one lesson learned :) > > Does that mean when fq is applied to text fields, it is doing text match > > in the field just like q in a query field? While for string fields, it > is > > exact match. > > If it is a phrase query, what are the values for relate parameters such > as > > ps? > > > > Thanks, > > Wei > > > > On Mon, Jun 24, 2019 at 4:51 PM Shawn Heisey > wrote: > > > >> On 6/24/2019 5:37 PM, Wei wrote: > >>> >> stored="true"/> > >> > >> I'm assuming that the asterisks here are for emphasis, that they are not > >> actually present. This can be very confusing. It is far better to > >> relay the precise information and not try to emphasize anything. > >> > >>> For query q=*:*&fq=description:”ice cream”, the filter query returns > >>> matches for “ice cream bar” and “vanilla ice cream” , but does not > match > >>> for “ice cold cream”. > >>> > >>> The results seem neither exact match nor phrase match. What's the > >> expected > >>> behavior for fq on text fields? I have tried to look into the solr > docs > >>> but there is no clear explanation. > >> > >> If the quotes are present in what you actually sent to Solr, then that > >> IS a phrase query. And that is why it did not match your third example. > >> > >> Try one of these instead: > >> > >> q=*:*&fq=description:(ice cream) > >> > >> q=*:*&fq=description:ice description:cream > >> > >> Thanks, > >> Shawn > >> > >
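For a phrase with slop inside an fq, the lucene syntax itself offers the ~N operator, independent of any ps parameter (the slop value below is arbitrary):

  fq=description:"ice cream"~4

As far as I know, ps only applies to the implicit phrase that (e)dismax builds from q against the pf fields, which is why it never touches fq clauses.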
Function Query with multi-value field
Hi, I have a question regarding function queries that operate on multi-value fields. For the following field: Each value is a hex string representation of an RGB value. For example, there are 3 values indexed: #FF00FF - C1 #EE82EE - C2 #DA70D6 - C3 How would I write a function query that operates on all values of the field? Given color S in the query, how do I calculate the similarity between S and C1/C2/C3 and find which one is the closest? I checked https://lucene.apache.org/solr/guide/6_6/function-queries.html but didn't see an example. Thanks, Wei
Re: Function Query with multi-value field
Any suggestion? On Thu, Jul 11, 2019 at 3:03 PM Wei wrote: > Hi, > > I have a question regarding function query that operates on multi-value > fields. For the following field: > > multivalued="true"/> > > Each value is a hex string representation of RGB value. for example > there are 3 values indexed > > #FF00FF- C1 > #EE82EE - C2 > #DA70D6 - C3 > > How would I write a function query that operates on all values of the > field? Given color S in query, how to calculate the similarities between > S and C1/C2/C3 and find which one is the closest? > I checked https://lucene.apache.org/solr/guide/6_6/function-queries.html but > didn't see an example. > > Thanks, > Wei >
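No answer was recorded in the archive. As far as I know, the stock function queries only cover single-valued fields and min/max over multivalued numeric docValues, so a color-distance function over a multivalued string field would need a custom ValueSourceParser. Below is a rough, untested sketch against the Lucene 8 function API; every class and function name here is invented, and the field is assumed to be a string type with docValues="true":

  import java.io.IOException;
  import java.util.Map;
  import org.apache.lucene.index.DocValues;
  import org.apache.lucene.index.LeafReaderContext;
  import org.apache.lucene.index.SortedSetDocValues;
  import org.apache.lucene.queries.function.FunctionValues;
  import org.apache.lucene.queries.function.ValueSource;
  import org.apache.lucene.queries.function.docvalues.FloatDocValues;
  import org.apache.solr.search.FunctionQParser;
  import org.apache.solr.search.SyntaxError;
  import org.apache.solr.search.ValueSourceParser;

  // Register in solrconfig.xml:
  //   <valueSourceParser name="colordist" class="com.example.ColorDistParser"/>
  // Use as e.g.:  sort=colordist(colors,'#FF00AA') asc
  public class ColorDistParser extends ValueSourceParser {
    @Override
    public ValueSource parse(FunctionQParser fp) throws SyntaxError {
      final String field = fp.parseArg();
      final int target = Integer.parseInt(fp.parseArg().substring(1), 16); // strip '#'
      return new ValueSource() {
        @Override
        public FunctionValues getValues(Map context, LeafReaderContext ctx) throws IOException {
          // Requires docValues on the field; Solr visits docs in increasing order per segment.
          final SortedSetDocValues dv = DocValues.getSortedSet(ctx.reader(), field);
          return new FloatDocValues(this) {
            @Override
            public float floatVal(int doc) throws IOException {
              float best = Float.MAX_VALUE; // smaller = closer color
              if (dv.advanceExact(doc)) {
                for (long ord = dv.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = dv.nextOrd()) {
                  int rgb = Integer.parseInt(dv.lookupOrd(ord).utf8ToString().substring(1), 16);
                  best = Math.min(best, dist(rgb, target));
                }
              }
              return best;
            }
          };
        }
        // Euclidean distance between two packed 0xRRGGBB colors
        private float dist(int a, int b) {
          int dr = ((a >> 16) & 0xFF) - ((b >> 16) & 0xFF);
          int dg = ((a >> 8) & 0xFF) - ((b >> 8) & 0xFF);
          int db = (a & 0xFF) - (b & 0xFF);
          return (float) Math.sqrt(dr * dr + dg * dg + db * db);
        }
        @Override public String description() { return "colordist(" + field + ")"; }
        @Override public boolean equals(Object o) { return o == this; } // simplistic; a real impl
        @Override public int hashCode() { return field.hashCode(); }    // should compare state
      };
    }
  }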
How to block expensive solr queries
Hi, Recently we encountered a problem where solr cloud query latency suddenly increased: many simple queries with small recall were timing out. After digging a bit I found that the root cause is some stats queries happening at the same time, such as /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true unique_ids is a high-cardinality field, so this query is quite expensive. But why does a small volume of such queries block other queries and make simple queries time out? I checked the solr thread pool and see there are plenty of idle threads available. We are using solr 7.6.2 with a 10-shard cloud setup. Is there a way to block certain solr queries based on url pattern, i.e. ignore the stats.calcdistinct request in this case? Thanks, Wei
Re: How to block expensive solr queries
Hi Mikhail, Yes I have the timeAllowed parameter configured; still, in this case it doesn't seem to prevent the stats request from blocking other normal queries. Is it possible to drop the request before solr executes it, maybe with a jetty request filter? Thanks, Wei On Mon, Oct 7, 2019 at 1:39 PM Mikhail Khludnev wrote: > Hello, Wei. > > Have you tried to abandon heavy queries with > > https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter > ? > It may or may not be able to stop stats. > > https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223 > can clarify it. > > On Mon, Oct 7, 2019 at 8:19 PM Wei wrote: > > > Hi, > > > > Recently we encountered a problem when solr cloud query latency suddenly > > increase, many simple queries that has small recall gets time out. After > > digging a bit I found that the root cause is some stats queries happen at > > the same time, such as > > > > > > > /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true > > > > > > > > I see unique_ids is a high cardinality field so this query is quite > > expensive. But why a small volume of such query blocks other queries and > > make simple queries time out? I checked the solr thread pool and see > there > > are plenty of idle threads available. We are using solr 7.6.2 with a 10 > > shard cloud set up. > > > > Is there a way to block certain solr queries based on url pattern? i.e. > > ignore the stats.calcdistinct request in this case. > > > > > > Thanks, > > > > Wei > > > > > -- > Sincerely yours > Mikhail Khludnev
Re: How to block expensive solr queries
On Wed, Oct 9, 2019 at 9:59 AM Wei wrote: > Thanks all. I debugged a bit and see timeAllowed does not limit stats > call. Also I think it would be useful for solr to support a white list or > black list of operations as Toke suggested. Will create jira for it. > Currently seems the only option to explore is adding filter to solr's > embedded jetty. Does anyone have experience doing that? Do I also need to > change SolrDispatchFilter? > > On Tue, Oct 8, 2019 at 3:50 AM Toke Eskildsen wrote: > >> On Mon, 2019-10-07 at 10:18 -0700, Wei wrote: >> > /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.cal >> > cdistinct=true >> ... >> > Is there a way to block certain solr queries based on url pattern? >> > i.e. ignore the stats.calcdistinct request in this case. >> >> It sounds like it is possible for users to issue arbitrary queries >> against your Solr installation. As you have noticed, it makes it easy >> to perform a Denial Of Service (intentional or not). Filtering out >> stats.calcdistinct won't help with the next request for >> group.ngroups=true, facet.field=unique_id&facet.limit=1, >> rows=1 or something fifth. >> >> I recommend you flip your logic and only allow specific types of >> requests and put limits on those. To my knowledge that is not a build- >> in feature of Solr. >> >> - Toke Eskildsem, Royal Danish Library >> >> >>
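For the jetty-filter route mentioned above, here is a minimal sketch of a servlet Filter that rejects blacklisted requests before they reach Solr; it would be declared in Solr's web.xml ahead of SolrDispatchFilter. All names are mine and this is untested, not an endorsement over Toke's allow-list approach:

  import java.io.IOException;
  import javax.servlet.Filter;
  import javax.servlet.FilterChain;
  import javax.servlet.FilterConfig;
  import javax.servlet.ServletException;
  import javax.servlet.ServletRequest;
  import javax.servlet.ServletResponse;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  // Rejects any request whose query string carries stats.calcdistinct=true.
  public class QueryBlacklistFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
        throws IOException, ServletException {
      String qs = ((HttpServletRequest) req).getQueryString();
      if (qs != null && qs.contains("stats.calcdistinct=true")) {
        ((HttpServletResponse) resp).sendError(HttpServletResponse.SC_FORBIDDEN,
            "stats.calcdistinct is disabled on this cluster");
        return; // never reaches SolrDispatchFilter
      }
      chain.doFilter(req, resp);
    }
    @Override public void init(FilterConfig cfg) {}
    @Override public void destroy() {}
  }

Note that matching the raw query string misses URL-encoded variants and parameters sent in a POST body, so a production version would need to normalize the request first.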
Updates blocked in Tlog solr cloud?
Hi, I am puzzled by a problem in solr cloud with Tlog replicas and would appreciate your insights. Our solr cloud has two shards and each shard has 5 tlog replicas. When one of the non-leader replicas had a hardware issue and became unreachable, updates to the whole cloud stopped. We are on solr 7.6 and use the solrj client to send updates only to leaders. To my understanding, with the Tlog replica type the leader only forwards update requests to replicas for the transaction log update, and each replica periodically pulls segments from the leader. When one replica fails to respond, why are update requests to the cloud blocked? Does the leader need to wait for a response from each replica before informing the client that an update is successful? Best, Wei
Re: Updates blocked in Tlog solr cloud?
Hi Erick, I observed that the update request rate dropped from 20 per sec to 3 per sec for about 8 minutes. After that there was a huge burst of updates. This closely matches the queue-up behavior you mentioned, but I don't think the timeout took that long. Is there a configurable setting for the timeout? Also, the bad tlog replica was not reachable at the time, so we issued a DELETEREPLICA command with the collections API to remove it from the cloud. Thanks, Wei On Tue, Nov 19, 2019 at 5:52 AM Erick Erickson wrote: > How long are updates blocked and how did the tlog replica on the bad > hardware go down? > > Solr has to wait for an ack back from the tlog follower to be certain that > the follower has all the documents in case it has to switch to that replica > to become the leader. If the update to the follower times out, the leader > will put it into a recovering state. > > So I’d expect the collection to queue up indexing until the request to the > follower on the bad hardware timed out, did you wait at least that long? > > Best, > Erick > > > On Nov 18, 2019, at 7:11 PM, Wei wrote: > > > > Hi, > > > > I am puzzled by a problem in solr cloud with Tlog replicas and would > > appreciate your insights. Our solr cloud has two shards and each shard > > have 5 tlog replicas. When one of the non-leader replica has hardware > issue > > and become unreachable, updates to the whole cloud stopped. We are on > > solr 7.6 and use solrj client to send updates only to leaders. To my > > understanding, with Tlog replica type, the leader only forward update > > requests to replicas for transaction log update and each replica > > periodically pulls the segment from leader. When one replica fails to > > respond, why update requests to the cloud are blocked? Does leader need > > to wait for response from each replica to inform client that update is > > successful? > > > > Best, > > Wei > >
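For reference, the distributed-update timeouts involved here are configurable; to my knowledge they live in the <solrcloud> section of solr.xml (the values below are arbitrary examples, in milliseconds):

  <solrcloud>
    <int name="distribUpdateConnTimeout">60000</int>
    <int name="distribUpdateSoTimeout">600000</int>
  </solrcloud>

Lowering distribUpdateSoTimeout should shorten how long the leader waits on an unresponsive follower before putting it into recovery.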
Lucene optimization to disable hit count
Hi, I see this lucene optimization to disable hit counts for better query performance: https://issues.apache.org/jira/browse/LUCENE-8060 Is the feature available in Solr 8.3? Thanks, Wei
Re: Lucene optimization to disable hit count
Thanks! Looking forward to having this feature in Solr. On Wed, Nov 20, 2019 at 5:30 PM Tomás Fernández Löbbe wrote: > Not yet: > https://issues.apache.org/jira/browse/SOLR-13289 > > On Wed, Nov 20, 2019 at 4:57 PM Wei wrote: > > > Hi, > > > > I see this lucene optimization to disable hit counts for better query > > performance: > > > > https://issues.apache.org/jira/browse/LUCENE-8060 > > > > Is the feature available in Solr 8.3? > > > > Thanks, > > Wei > > >
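For anyone wanting the Lucene-level shape of LUCENE-8060 while SOLR-13289 is pending, a sketch against the Lucene 8 API (the method name is mine):

  import java.io.IOException;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.search.TopScoreDocCollector;

  // Collect the top 10 docs but stop counting hits exactly after 1000;
  // past the threshold totalHits becomes a lower bound
  // (TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO) and BlockMax WAND can
  // skip whole blocks of non-competitive documents.
  static TopDocs topTenApproxCount(IndexSearcher searcher, Query query) throws IOException {
    TopScoreDocCollector collector = TopScoreDocCollector.create(10, null, 1000);
    searcher.search(query, collector);
    return collector.topDocs();
  }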
Re: Updates blocked in Tlog solr cloud?
Update with another observation: after the follower replica became unresponsive, I noticed multiple commits happening on the leader within two minutes, and then saw the following OOM error on the leader:

o.a.s.s.HttpSolrCall null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Direct buffer memory
    at org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:662)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:530)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:531)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at ...

The commits are not in line with our autocommit interval. I am wondering if the commits could be caused by the leader-initiated recovery process. Will the Tlog leader do extra commits for the replica to sync up during the recovery process? Best, Wei On Tue, Nov 19, 2019 at 1:22 PM Wei wrote: > Hi Erick, > > I observed that the update request rate dropped from 20 per sec to 3 per > sec for about 8 minutes. After that there is a huge burst of updates. This > looks quite match the queue up behavior you mentioned. But I don't think > the time out took that long. Is there a configurable setting for the time > out? 
> Also the bad tlog replica is not reachable at the time, so we did a > DELETEREPLICA command with collections API to remove it from the cloud. > > Thanks, > Wei > > > On Tue, Nov 19, 2019 at 5:52 AM Erick Erickson > wrote: > >> How long are updates blocked and how did the tlog replica on the bad >> hardware go down? >> >> Solr has to wait for an ack back from the tlog follower to be certain >> that the follower has all the documents in case it has to switch to that >> replica to become the leader. If the update to the follower times out, the >> leader will put it into a recovering state. >> >> So I’d expect the collection to queue up indexing until the request to >> the follower on the bad hardware timed out, did you wait at least that long? >> >> Best, >> Erick >> >> > On Nov 18, 2019, at 7:11 PM, Wei wrote: >> > >> > Hi, >> > >> > I am puzzled by a problem in solr cloud with Tlog replicas and would >> > appreciate your insights. Our solr cloud has two shards and each shard >> > have 5 tlog replicas. When one of the non-leader replica has hardware >> issue >> > and become unreachable, updates to the whole cloud stopped. We are on >> > solr 7.6 and use solrj client to send updates only to leaders.
Convert javabin to json
Hi, Is there a reliable way to convert solr's javabin response to json format? We use the solrj client with wt=javabin, but want to convert the received javabin response to json to pass on to the client. We don't want to use wt=json since javabin is more efficient. We tried noggit's JSONUtil https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/noggit/JSONUtil.java but it seems unable to convert parts of the query response such as facets. Are there any other options available? Thanks, Wei
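One option I'm aware of (worth verifying against your SolrJ version) is org.apache.solr.common.util.Utils.toJSONString, which understands the NamedList tree that plain JSONUtil chokes on; the method name below is mine:

  import java.io.IOException;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.util.Utils;

  // Query over javabin (SolrJ's default wire format), then re-serialize the
  // whole response tree -- docs, facets, highlighting -- as one JSON string.
  static String queryAsJson(SolrClient client, String q) throws SolrServerException, IOException {
    QueryResponse rsp = client.query(new SolrQuery(q));
    return Utils.toJSONString(rsp.getResponse());
  }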
Early termination in Lucene 8
Hi, I am excited to see that Lucene 8 introduced BlockMax WAND as a major speed improvement: https://issues.apache.org/jira/browse/LUCENE-8135. My question is, how does it integrate with facet requests, when numFound won't be exact? I did some searching but haven't found any documentation on this. Any pointer is greatly appreciated. Best, Wei
Re: Early termination in Lucene 8
Thanks Mikhail. Do you know of any example of a query parser using WAND? On Thu, Jan 23, 2020 at 1:02 AM Mikhail Khludnev wrote: > If one creates query parser wrapping queries with WAND it just produce > incomplete docset (I guess), which will be passed to facet component and > produce fewer counts. > > On Thu, Jan 23, 2020 at 2:11 AM Wei wrote: > > > Hi, > > > > I am excited to see Lucene 8 introduced BlockMax WAND as a major speed > > improvement https://issues.apache.org/jira/browse/LUCENE-8135. My > > question > > is, how does it integrate with facet request, when the numFound won't be > > exact? I did some search but haven't found any documentation on this. Any > > pointer is greatly appreciated. > > > > Best, > > Wei > > > > > -- > Sincerely yours > Mikhail Khludnev
Re: solr 5 leaving tomcat, will I be the only one fearing about this?
I think it just means they won't officially support deploying the war to tomcat or another container. Makes sense to me: if I were in charge of solr, I would just support jetty, predictable with a single configuration. I wouldn't want to spend countless hours supporting various configurations; instead, use those hours to further solr development. I'm sure someone with enough familiarity with tomcat, Java and solr shouldn't have any issue; after all, solr is free, but you need to pay for support. On Fri, Oct 7, 2016, 7:13 PM Renee Sun wrote: > I just read through the following link Shawn shared in his reply: > https://wiki.apache.org/solr/WhyNoWar > > While the following statement is true: > > "Supporting a single set of binary bits is FAR easier than worrying > about what kind of customized environment the user has chosen for their > deployment. " > > But it also probably will reduce the flexibility... for example, we tune > for > Scalability at tomcat level, such as its thread pool etc. I assume the > standalone Solr (which is still using Jetty underlying) would expose > sufficient configurable 'knobs' that allow me to turn 'Solr' to meet our > data work load. > > If we want to minimize the migration work, our existing business logic > component will remain in tomcat, then the fact that we will have co-exist > jetty and tomcat deployed in production system is a bit strange... or is > it? > > Even if I could port our webapps to use Jetty, I assume the way solr is > embedding Jetty I would be able to integrate at that level, I probably end > up with 2 Jetty container instances running on same server, correct? It is > still too early for me to be sure how this will impact our system but I am > a > little worried. > > Renee
2 question about solr and lucene
Hi, guys: I ran into two questions about solr and lucene, and hope people can help out. 1. Payload queries work, but NOT when combined with a numeric field type. For example: I implemented my own requesthandler, following http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/ I query in solr: sinaTag:operate solr response: "numFound": 2, "start": 0, "maxScore": 99, "docs": [ {"id": "1628209010", "followersCount": 752, "sinaTag": "operate|99 C2C|98 B2C|97 OnlineShopping|96 E-commercial|94", "score": 99 }, {"id": "1900546410", "followersCount": 1002, "sinaTag": "Startup|99 Benz|98 PublicRelation|97 operate|96 Activity|95 Media|94 AD|93 Vehicle|92 ", "score": 96 } This works well. But a query combined with another numeric condition, such as: sinaTag:operate and followersCount:[752 TO 752] returns: {"responseHeader": {"status": 0, "QTime": 40 }, "response": {"numFound": 0, "start": 0, "maxScore": 0, "docs": [] } } Given this dataset, the first record should be returned rather than NOT FOUND. I don't know why. 2. About string field fuzzy-match filtering: how is the score computed? What is the formula? When I use two or several string fuzzy matches, possibly combined with AND or OR, how is the score computed, and what is the formula? If I implement my own scoring formula class, which interface or abstract class should I extend? Thanks in advance.
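One thing worth checking, independent of payloads: the lucene query parser only treats AND/OR as boolean operators when they are uppercase, so the lowercase "and" above is parsed as a literal search term. A guess at the intended query:

  sinaTag:operate AND followersCount:[752 TO 752]

If that still returns nothing, the next thing I would verify is that followersCount is an indexed numeric field in the schema used by the custom handler.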
Solr 4 memory usage increase
We are migrating from Solr 3.5 to Solr 4.2. After some performance testing, we found 4.2's memory usage is a lot higher than 3.5's. Our 12GB max-heap process used to handle the test pretty well with 3.5, while with 4.2 the same test runs into serious GC halfway (20 minutes) into the test. Does anyone know of something significantly different from Solr 3.5 in terms of memory usage? We also notice that on a slave, the IndexWriter class is actually taking a significant portion (around 3GB) of the heap. Why does Solr open an IndexWriter on a slave? Is there a conf I can use to turn it off? I don't remember seeing such heap usage by a similar class in Solr 3.5.
Re: Solr 4 memory usage increase
No, exactly the same JVM of Java 6.
Re: Solr 4 memory usage increase
Here is the JVM info:

$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
Re: Solr 4 memory usage increase
We have a master/slave setup. We disabled autocommits/autosoftcommits, so the slave only replicates from the master and serves queries. The master does all the indexing and commits every 5 minutes. The slave polls the master every 2.5 minutes and does replication. Both tests, with Solr 3.5 and 4.2, were run with the same setup, both with master/slave replication running.
solr data config questions
Hi All, I am a new user of Solr. We are now trying to enable searching on a Digg dataset. It has story_id as the primary key, and comment_ids identify the comments on a story, so story_id to comment_id is a one-to-many relationship. Each comment can be replied to by several repliers, so comment_id to replier is also one-to-many. The problem is that within a single returned document the search results show an array of comment_ids and an array of repliers without indicating which repliers replied to which comment. For example: now we get comment_id:[c1,c2,...,cn], repliers:[r1,r2,r3,...,rm]. Can we get something like comment_id:[c1,c2,...,cn], repliers:[{r1,r2},{r3},...,{rm-1,rm}] so that {r1,r2} corresponds to c1? Our current data-config is attached: Please help me on this. Many thanks Vivian
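Since the attachment did not survive the archive, here is a generic sketch of nested DataImportHandler entities for a story -> comment -> replier hierarchy; all table and column names are invented. Note that flat Solr documents keep the two multivalued fields independent, so preserving which replier belongs to which comment requires either child documents or encoding the pair into a single value (e.g. "c1:r1"):

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/digg" user="u" password="p"/>
    <document>
      <entity name="story" query="SELECT story_id, title FROM story">
        <entity name="comment" query="SELECT comment_id FROM comment WHERE story_id='${story.story_id}'">
          <entity name="replier" query="SELECT replier FROM reply WHERE comment_id='${comment.comment_id}'"/>
        </entity>
      </entity>
    </document>
  </dataConfig>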