Optimizing fq query performance
Hi there,

We noticed a sizable performance degradation when we add certain fq filters to the query, even though the result set does not change between the two queries. I would've expected Solr to optimize internally by picking the most constrained fq filter first, but maybe my understanding is wrong. Here's an example:

query1: fq = 'field1:* AND field2:value'
query2: fq = 'field2:value'

If we assume that the result set is identical between the two queries, and field1 is in general more frequent in the index, we noticed query1 takes 100x longer than query2. In case it matters, field1 is of type tlongs while field2 is a string.

Any tips for optimizing this?

John
Re: Optimizing fq query performance
More constrained but matching the same set of documents just guarantees that there is more information to evaluate per document matched.

For your specific case, you can optimize fq = 'field1:* AND field2:value' to

&fq=field1:*&fq=field2:value

This will at least cause field1:* to be cached and reused if it's a common pattern.

field1:* is slow in general for indexed fields because all terms for the field need to be iterated (e.g. does term1 match doc1, does term2 match doc1, etc.). One can optimize this by indexing a term in a different field to turn it into a single-term query (i.e. exists:field1).

-Yonik

On Sat, Apr 13, 2019 at 2:58 PM John Davis wrote:
> Hi there,
>
> We noticed a sizable performance degradation when we add certain fq filters
> to the query even though the result set does not change between the two
> queries. [...]
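To see why iterating every term of a field costs more than a single-term lookup, here is a toy inverted-index sketch in Python. This is purely illustrative (not Solr's actual data structures), and the terms and doc ids are made up: field1:* must union the postings of every term indexed for the field, while a dedicated exists:field1 term is one postings lookup.

```python
# Toy inverted index: term -> set of doc ids containing that term.
field1_terms = {
    "10": {1, 4},
    "27": {2},
    "99": {4, 5},
}

# Companion "exists" index, maintained at indexing time (Yonik's trick).
exists_index = {
    "field1": {1, 2, 4, 5},
}

def match_wildcard(term_index):
    """field1:* — union the postings of every term for the field."""
    docs = set()
    for postings in term_index.values():  # one pass per distinct term
        docs |= postings
    return docs

def match_exists(exists_idx, field):
    """exists:field1 — a single term lookup, no per-term iteration."""
    return exists_idx.get(field, set())
```

Both return the same doc set here, but match_wildcard does work proportional to the number of distinct terms in the field, which is exactly the cost being discussed.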
Multivalue Field lookup
Hello! I am new to Solr. This is my field type definition:

> stored="true" multiValued="true" omitTermFreqAndPositions="true"
> omitNorms="true" />

One use-case we have is to look up multiple myid values with an OR, like:

> fq=myid:(1 2 3 4..)

I wish to know which entry in the fq matched this document. I am doing a group query now as a hack, like:

> "group.query":["myid:1", "myid:2",...]

Is there a better way to do this?

Regards,
Kumaresh
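For what it's worth, the group.query workaround can at least be post-processed to recover which myid matched each document. A minimal sketch, assuming a response shaped like Solr's grouped output (the doc ids and values are invented for illustration):

```python
def matches_per_doc(grouped):
    """Map each doc id to the list of group.query strings that matched it.

    `grouped` mirrors the "grouped" section of a Solr group.query response:
    one entry per query, each with a doclist of matching docs.
    """
    hits = {}
    for query, group in grouped.items():
        for doc in group["doclist"]["docs"]:
            hits.setdefault(doc["id"], []).append(query)
    return hits

# Mocked response fragment (shape assumed, values invented):
response = {
    "myid:1": {"doclist": {"numFound": 1, "docs": [{"id": "doc-a"}]}},
    "myid:2": {"doclist": {"numFound": 2, "docs": [{"id": "doc-a"}, {"id": "doc-b"}]}},
}
```

Here matches_per_doc(response) would report that doc-a matched myid:1 and myid:2, while doc-b matched only myid:2.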
Re: Optimizing fq query performance
Also note that field1:* does not necessarily match all documents. A document without that field will not match. So it really can’t be optimized the way you might expect since, as Yonik says, all the terms have to be enumerated….

Best,
Erick

> On Apr 13, 2019, at 12:30 PM, Yonik Seeley wrote:
>
> More constrained but matching the same set of documents just guarantees
> that there is more information to evaluate per document matched.
> For your specific case, you can optimize fq = 'field1:* AND field2:value'
> to &fq=field1:*&fq=field2:value
> [...]
Re: Shard and replica went down in Solr 6.1.0
Thanks for your reply.

2> In production, lots of documents arrive for indexing within a second. If I set the hard commit interval to 60 seconds, searchers will be opened less often, only when the hard commit executes. Is that okay for performance?

3> We cannot set the soft commit to 60 seconds as of now, because our product works NRT-style: changes must show instantly after indexing.

I have analysed my production issue logs further: the replica becomes the leader, and some time later documents arrive for indexing at the new leader while it is still understood to be the previous replica.

My production logs:

Shard log (10.102.119.85):

2019-04-08 12:54:01.405 ERROR (updateExecutor-2-thread-36358-processing-http:10.101.111.80:8983//solr//product x:product r:core_node1 n:10.102.119.85:8983_solr s:shard1 c:product) [c:product s:shard1 r:core_node1 x:product] o.a.s.u.StreamingSolrClients error
org.apache.solr.common.SolrException: Service Unavailable
request: http://10.101.111.80:8983/solr/product/update?update.distrib=FROMLEADER&distrib.from=http%3A%2F%2F10.102.119.85%3A8983%2Fsolr%2Fproduct%2F&wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:320)
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$3/30175207.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Replica log (10.101.111.80):

2019-04-08 12:52:19.717 INFO (zkCallback-4-thread-207-processing-n:10.101.111.80:8983_solr) [ ] o.a.s.c.c.ZkStateReader A live node change: [WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes], has occurred - updating... (live nodes size: [4])
2019-04-08 12:52:19.725 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.ShardLeaderElectionContext Running the leader process for shard=shard1 and weAreReplacement=true and leaderVoteWait=18
2019-04-08 12:52:19.727 INFO (zkCallback-4-thread-207-processing-n:10.101.111.80:8983_solr) [ ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (4) -> (3)
2019-04-08 12:52:19.764 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.ShardLeaderElectionContext Checking if I should try and be the leader.
2019-04-08 12:52:19.765 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.ShardLeaderElectionContext My last published State was Active, it's okay to be the leader.
2019-04-08 12:52:19.766 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.ShardLeaderElectionContext I may be the new leader - try and sync
2019-04-08 12:52:19.767 WARN (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.RecoveryStrategy Stopping recovery for core=[product] coreNodeName=[core_node3]
2019-04-08 12:52:20.291 INFO (zkCallback-4-thread-207-processing-n:10.101.111.80:8983_solr) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json], has occurred - updating... (live nodes size: [3])
2019-04-08 12:52:20.532 INFO (zkCallback-4-thread-207-processing-n:10.101.111.80:8983_solr) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json], has occurred - updating... (live nodes size: [3])
2019-04-08 12:52:22.274 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.SyncStrategy Sync replicas to http://10.101.111.80:8983/solr/product/
2019-04-08 12:52:22.274 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.SyncStrategy Sync Success - now sync replicas to me
2019-04-08 12:52:22.274 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.SyncStrategy http://10.101.111.80:8983/solr/product/ has no replicas
2019-04-08 12:52:22.275 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.ShardLeaderElectionContextBase Creating leader registration node /collections/product/leaders/shard1/leader after winning as /collections/product/leader_elect/shard1/election/245953614547976270-core_node3-n_01
2019-04-08 1
Re: Spatial Search using two separate fields for lat and long
Hi,

I think your requirement of exporting back to CSV is fine, but it's quite normal for there to be some transformation steps on input and/or output... and such steps you mostly do yourself (not Solr). That said, one straightforward solution is to have your spatial field be redundant with the lat & lon kept separately. Your spatial field could be stored=false, and the separate fields would be stored but otherwise not be indexed or have other characteristics that add weight. The result is efficient.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Apr 3, 2019 at 1:54 AM Tim Hedlund wrote:
> Hi all,
>
> I'm importing documents (rows in excel file) that includes latitude and
> longitude fields. I want to use those two separate fields for searching
> with a bounding box. Is this possible (not using deprecated LatLonType) or
> do I need to combine them into one single field when indexing? The reason I
> want to keep the fields as two separate ones is that I want to be able to
> export from solr back to exact same excel file structure, i.e. solr fields
> maps exactly to excel columns.
>
> I'm using solr 7. Any thoughts or suggestions would be appreciated.
>
> Regards
> Tim
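A sketch of that arrangement in schema.xml (the field and type names are assumptions, not from Tim's schema; `location` uses Solr 7's LatLonPointSpatialField):

<fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true"/>
<fieldType name="pdouble" class="solr.DoublePointField" docValues="false"/>

<!-- Indexed spatial field for bounding-box queries; not stored -->
<field name="latlon" type="location" indexed="true" stored="false"/>

<!-- Stored-only fields that map 1:1 to the Excel columns -->
<field name="lat" type="pdouble" indexed="false" stored="true"/>
<field name="lon" type="pdouble" indexed="false" stored="true"/>

Queries like fq=latlon:[45,-94 TO 46,-93] go against the spatial field, while export reads back lat and lon unchanged.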
Re: Spatial Search using two separate fields for lat and long
Specifically, the pre-processing can be done with UpdateRequestProcessors:
https://lucene.apache.org/solr/guide/7_2/update-request-processors.html

In your case, you probably want to chain:
*) CloneUpdate: http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
*) ConcatField: http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html

Note that the ConcatField URP inherits from the FieldMutating URP, so you have some flexibility in how you define the fields:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html

Set the target field to index-only (not stored) and it will only be used for search. The original fields can be set to be not-indexed, as David already explained.

Regards,
Alex.

On Sat, 13 Apr 2019 at 23:50, David Smiley wrote:
>
> Hi,
>
> I think your requirement of exporting back to CSV is fine but it's quite
> normal for there to be some transformation steps on input and/or output...
> and that such steps you mostly do yourself (not Solr).
> [...]
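Roughly, such a chain could look like this in solrconfig.xml (the chain name and field names lat/lon/latlon are assumptions; check the linked javadocs for the exact parameters):

<updateRequestProcessorChain name="latlon">
  <!-- Copy lat then lon into the spatial field, in that order -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">lat</str>
    <str name="dest">latlon</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">lon</str>
    <str name="dest">latlon</str>
  </processor>
  <!-- Join the two cloned values into a single "lat,lon" string -->
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">latlon</str>
    <str name="delimiter">,</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The chain then has to be selected per request (update.chain=latlon) or made the default for your update handler.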
Re: Optimizing fq query performance
> field1:* is slow in general for indexed fields because all terms for the
> field need to be iterated (e.g. does term1 match doc1, does term2 match
> doc1, etc)

This feels like something that could be optimized internally by tracking the existence of the field in a doc, instead of making users index yet another field to track existence. BTW, does this same behavior apply to tlong fields too, where the values might be more continuous vs discrete strings?

On Sat, Apr 13, 2019 at 12:30 PM Yonik Seeley wrote:
> More constrained but matching the same set of documents just guarantees
> that there is more information to evaluate per document matched.
> For your specific case, you can optimize fq = 'field1:* AND field2:value'
> to &fq=field1:*&fq=field2:value
> [...]
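Following up on Yonik's exists-field suggestion: the extra term can be added when preparing documents for indexing, before they are sent to Solr. A minimal sketch (the field names field1 and exists are assumptions, matching the examples in this thread):

```python
def add_exists_field(doc, tracked_fields=("field1",)):
    """Return a copy of the doc with an 'exists' multivalued field
    naming each tracked field that is present, so queries can use the
    single-term lookup exists:field1 instead of field1:*."""
    present = [f for f in tracked_fields if f in doc]
    if present:
        doc = {**doc, "exists": present}
    return doc
```

For example, add_exists_field({"id": "1", "field1": 42}) gains exists=["field1"], while a doc without field1 is returned unchanged, which also preserves Erick's point that such docs should not match.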