Optimizing fq query performance
Hi there,

We noticed a sizable performance degradation when we add certain fq filters to the query, even though the result set does not change between the two queries. I would've expected Solr to optimize internally by picking the most constrained fq filter first, but maybe my understanding is wrong. Here's an example:

query1: fq = 'field1:* AND field2:value'
query2: fq = 'field2:value'

If we assume that the result set is identical between the two queries, and field1 is in general more frequent in the index, we noticed query1 takes 100x longer than query2. In case it matters, field1 is of type tlongs while field2 is a string.

Any tips for optimizing this?

John
Re: Optimizing fq query performance
More constrained but matching the same set of documents just guarantees that there is more information to evaluate per document matched.

For your specific case, you can optimize fq = 'field1:* AND field2:value' to

&fq=field1:*&fq=field2:value

This will at least cause field1:* to be cached and reused if it's a common pattern.

field1:* is slow in general for indexed fields because all terms for the field need to be iterated (e.g. does term1 match doc1, does term2 match doc1, etc.). One can optimize this by indexing a term in a different field to turn it into a single-term query (i.e. exists:field1).

-Yonik

On Sat, Apr 13, 2019 at 2:58 PM John Davis wrote:
> Hi there,
>
> We noticed a sizable performance degradation when we add certain fq filters
> to the query even though the result set does not change between the two
> queries. [...]
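To see why iterating every term of a field costs more than a single-term lookup, here is a toy inverted-index sketch in Python. This is purely illustrative (not Solr's actual data structures), and the terms and doc ids are made up: field1:* must union the postings of every term indexed for the field, while a dedicated exists:field1 term is one postings lookup.

```python
# Toy inverted index: term -> set of doc ids containing that term.
field1_terms = {
    "10": {1, 4},
    "27": {2},
    "99": {4, 5},
}

# Companion "exists" index, maintained at indexing time (Yonik's trick).
exists_index = {
    "field1": {1, 2, 4, 5},
}

def match_wildcard(term_index):
    """field1:* — union the postings of every term for the field."""
    docs = set()
    for postings in term_index.values():  # one pass per distinct term
        docs |= postings
    return docs

def match_exists(exists_idx, field):
    """exists:field1 — a single term lookup, no per-term iteration."""
    return exists_idx.get(field, set())
```

Both return the same doc set here, but match_wildcard does work proportional to the number of distinct terms in the field, which is exactly the cost being discussed.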
Multivalue Field lookup
Hello! I am new to Solr. This is my field type definition:

> stored="true" multiValued="true" omitTermFreqAndPositions="true"
> omitNorms="true" />

One use-case we have is to look up multiple myid values with an OR, like:

> fq=myid:(1 2 3 4..)

I wish to know which entry in the fq matched this document. I am doing a group query now as a hack, like:

> "group.query":["myid:1", "myid:2",...]

Is there a better way to do this?

Regards,
Kumaresh
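For what it's worth, the group.query workaround can at least be post-processed to recover which myid matched each document. A minimal sketch, assuming a response shaped like Solr's grouped output (the doc ids and values are invented for illustration):

```python
def matches_per_doc(grouped):
    """Map each doc id to the list of group.query strings that matched it.

    `grouped` mirrors the "grouped" section of a Solr group.query response:
    one entry per query, each with a doclist of matching docs.
    """
    hits = {}
    for query, group in grouped.items():
        for doc in group["doclist"]["docs"]:
            hits.setdefault(doc["id"], []).append(query)
    return hits

# Mocked response fragment (shape assumed, values invented):
response = {
    "myid:1": {"doclist": {"numFound": 1, "docs": [{"id": "doc-a"}]}},
    "myid:2": {"doclist": {"numFound": 2, "docs": [{"id": "doc-a"}, {"id": "doc-b"}]}},
}
```

Here matches_per_doc(response) would report that doc-a matched myid:1 and myid:2, while doc-b matched only myid:2.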
Re: Optimizing fq query performance
Also note that field1:* does not necessarily match all documents. A document without that field will not match. So it really can’t be optimized the way you might expect since, as Yonik says, all the terms have to be enumerated….

Best,
Erick

> On Apr 13, 2019, at 12:30 PM, Yonik Seeley wrote:
>
> More constrained but matching the same set of documents just guarantees
> that there is more information to evaluate per document matched.
> For your specific case, you can optimize fq = 'field1:* AND field2:value'
> to &fq=field1:*&fq=field2:value
> [...]
Re: Shard and replica went down in Solr 6.1.0
Thanks for your reply.

2> In production, lots of documents arrive for indexing within a second. If I set the hard commit interval to 60 seconds, searchers will be opened less often, only when the hard commit executes. Is that okay for performance?

3> We cannot set the soft commit to 60 seconds as of now, because our product works NRT-style: changes must show instantly after indexing.

I have analysed my production issue logs further: the replica becomes the leader, and some time later documents arrive for indexing at the new leader while it is still understood to be the previous replica.

My production logs:

Shard log (10.102.119.85):

2019-04-08 12:54:01.405 ERROR (updateExecutor-2-thread-36358-processing-http:10.101.111.80:8983//solr//product x:product r:core_node1 n:10.102.119.85:8983_solr s:shard1 c:product) [c:product s:shard1 r:core_node1 x:product] o.a.s.u.StreamingSolrClients error
org.apache.solr.common.SolrException: Service Unavailable
request: http://10.101.111.80:8983/solr/product/update?update.distrib=FROMLEADER&distrib.from=http%3A%2F%2F10.102.119.85%3A8983%2Fsolr%2Fproduct%2F&wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:320)
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$3/30175207.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Replica log (10.101.111.80):

2019-04-08 12:52:19.717 INFO (zkCallback-4-thread-207-processing-n:10.101.111.80:8983_solr) [ ] o.a.s.c.c.ZkStateReader A live node change: [WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes], has occurred - updating... (live nodes size: [4])
2019-04-08 12:52:19.725 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.ShardLeaderElectionContext Running the leader process for shard=shard1 and weAreReplacement=true and leaderVoteWait=18
2019-04-08 12:52:19.727 INFO (zkCallback-4-thread-207-processing-n:10.101.111.80:8983_solr) [ ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (4) -> (3)
2019-04-08 12:52:19.764 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.ShardLeaderElectionContext Checking if I should try and be the leader.
2019-04-08 12:52:19.765 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.ShardLeaderElectionContext My last published State was Active, it's okay to be the leader.
2019-04-08 12:52:19.766 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.ShardLeaderElectionContext I may be the new leader - try and sync
2019-04-08 12:52:19.767 WARN (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.RecoveryStrategy Stopping recovery for core=[product] coreNodeName=[core_node3]
2019-04-08 12:52:20.291 INFO (zkCallback-4-thread-207-processing-n:10.101.111.80:8983_solr) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json], has occurred - updating... (live nodes size: [3])
2019-04-08 12:52:20.532 INFO (zkCallback-4-thread-207-processing-n:10.101.111.80:8983_solr) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json], has occurred - updating... (live nodes size: [3])
2019-04-08 12:52:22.274 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.SyncStrategy Sync replicas to http://10.101.111.80:8983/solr/product/
2019-04-08 12:52:22.274 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.SyncStrategy Sync Success - now sync replicas to me
2019-04-08 12:52:22.274 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.SyncStrategy http://10.101.111.80:8983/solr/product/ has no replicas
2019-04-08 12:52:22.275 INFO (zkCallback-4-thread-202-processing-n:10.101.111.80:8983_solr) [c:product s:shard1 r:core_node3 x:product] o.a.s.c.ShardLeaderElectionContextBase Creating leader registration node /collections/product/leaders/shard1/leader after winning as /collections/product/leader_elect/shard1/election/245953614547976270-core_node3-n_01
2019-04-08 1
Re: Spatial Search using two separate fields for lat and long
Hi,

I think your requirement of exporting back to CSV is fine, but it's quite normal for there to be some transformation steps on input and/or output... and such steps you mostly do yourself (not Solr). That said, one straightforward solution is to have your spatial field be redundant with the lat & lon kept separately. Your spatial field could be stored=false, and the separate fields would be stored but otherwise not be indexed or have other characteristics that add weight. The result is efficient.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Apr 3, 2019 at 1:54 AM Tim Hedlund wrote:
> Hi all,
>
> I'm importing documents (rows in excel file) that includes latitude and
> longitude fields. I want to use those two separate fields for searching
> with a bounding box. Is this possible (not using deprecated LatLonType) or
> do I need to combine them into one single field when indexing? The reason I
> want to keep the fields as two separate ones is that I want to be able to
> export from solr back to exact same excel file structure, i.e. solr fields
> maps exactly to excel columns.
>
> I'm using solr 7. Any thoughts or suggestions would be appreciated.
>
> Regards
> Tim
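A sketch of that arrangement in schema.xml (the field and type names are assumptions, not from Tim's schema; `location` uses Solr 7's LatLonPointSpatialField):

<fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true"/>
<fieldType name="pdouble" class="solr.DoublePointField" docValues="false"/>

<!-- Indexed spatial field for bounding-box queries; not stored -->
<field name="latlon" type="location" indexed="true" stored="false"/>

<!-- Stored-only fields that map 1:1 to the Excel columns -->
<field name="lat" type="pdouble" indexed="false" stored="true"/>
<field name="lon" type="pdouble" indexed="false" stored="true"/>

Queries like fq=latlon:[45,-94 TO 46,-93] go against the spatial field, while export reads back lat and lon unchanged.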
Re: Spatial Search using two separate fields for lat and long
Specifically, the pre-processing can be done with UpdateRequestProcessors:
https://lucene.apache.org/solr/guide/7_2/update-request-processors.html

In your case, you probably want to chain:
*) CloneUpdate: http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
*) ConcatField: http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html

Note that the ConcatField URP inherits from the FieldMutating URP, so you have some flexibility in how you define the fields:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html

Set the target field to index-only (not stored) and it will only be used for search. The original fields can be set to be not-indexed, as David already explained.

Regards,
Alex.

On Sat, 13 Apr 2019 at 23:50, David Smiley wrote:
>
> Hi,
>
> I think your requirement of exporting back to CSV is fine but it's quite
> normal for there to be some transformation steps on input and/or output...
> and that such steps you mostly do yourself (not Solr).
> [...]
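Roughly, such a chain could look like this in solrconfig.xml (the chain name and field names lat/lon/latlon are assumptions; check the linked javadocs for the exact parameters):

<updateRequestProcessorChain name="latlon">
  <!-- Copy lat then lon into the spatial field, in that order -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">lat</str>
    <str name="dest">latlon</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">lon</str>
    <str name="dest">latlon</str>
  </processor>
  <!-- Join the two cloned values into a single "lat,lon" string -->
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">latlon</str>
    <str name="delimiter">,</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The chain then has to be selected per request (update.chain=latlon) or made the default for your update handler.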
Re: Optimizing fq query performance
> field1:* is slow in general for indexed fields because all terms for the
> field need to be iterated (e.g. does term1 match doc1, does term2 match
> doc1, etc)

This feels like something that could be optimized internally by tracking the existence of the field in a doc, instead of making users index yet another field to track existence. BTW, does this same behavior apply to tlong fields too, where the values might be more continuous vs discrete strings?

On Sat, Apr 13, 2019 at 12:30 PM Yonik Seeley wrote:
> More constrained but matching the same set of documents just guarantees
> that there is more information to evaluate per document matched.
> For your specific case, you can optimize fq = 'field1:* AND field2:value'
> to &fq=field1:*&fq=field2:value
> [...]
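Following up on Yonik's exists-field suggestion: the extra term can be added when preparing documents for indexing, before they are sent to Solr. A minimal sketch (the field names field1 and exists are assumptions, matching the examples in this thread):

```python
def add_exists_field(doc, tracked_fields=("field1",)):
    """Return a copy of the doc with an 'exists' multivalued field
    naming each tracked field that is present, so queries can use the
    single-term lookup exists:field1 instead of field1:*."""
    present = [f for f in tracked_fields if f in doc]
    if present:
        doc = {**doc, "exists": present}
    return doc
```

For example, add_exists_field({"id": "1", "field1": 42}) gains exists=["field1"], while a doc without field1 is returned unchanged, which also preserves Erick's point that such docs should not match.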