Facet Query performance

2019-07-08 Thread Midas A
Hi,

I have enabled docValues on the facet field, but the query is still taking time.

How can I improve the query time?
 

Query:
http://X.X.X.X:
/solr/search/select?df=ttl&ps=0&hl=true&fl=id,upt&f.ind.mincount=1&hl.usePhraseHighlighter=true&f.pref.mincount=1&q.op=OR&fq=NOT+hemp:(%22xgidx29760%22+%22xmwxmonster%22+%22xmwxmonsterindia%22+%22xmwxcom%22+%22xswxmonster+com%22+%22xswxmonster%22+%22xswxmonsterindia+com%22+%22xswxmonsterindia%22)&fq=NOT+cEmp:(%
22nomster.com%22+OR+%22utyu%22)&fq=NOT+pEmp:(%22nomster.com
%22+OR+%22utyu%22)&fq=ind:(5)&fq=NOT+is_udis:2&fq=NOT+id:(92197+OR+240613+OR+249717+OR+1007148+OR+2500513+OR+2534675+OR+2813498+OR+9401682)&lowercaseOperators=true&ps2=0&bq=is_resume:0^-1000&bq=upt_date:[*+TO+NOW/DAY-36MONTHS]^2&bq=upt_date:[NOW/DAY-36MONTHS+TO+NOW/DAY-24MONTHS]^3&bq=upt_date:[NOW/DAY-24MONTHS+TO+NOW/DAY-12MONTHS]^4&bq=upt_date:[NOW/DAY-12MONTHS+TO+NOW/DAY-9MONTHS]^5&bq=upt_date:[NOW/DAY-9MONTHS+TO+NOW/DAY-6MONTHS]^10&bq=upt_date:[NOW/DAY-6MONTHS+TO+NOW/DAY-3MONTHS]^15&bq=upt_date:[NOW/DAY-3MONTHS+TO+*]^20&bq=NOT+country:isoin^-10&facet.query=exp:[+10+TO+11+]&facet.query=exp:[+11+TO+13+]&facet.query=exp:[+13+TO+15+]&facet.query=exp:[+15+TO+17+]&facet.query=exp:[+17+TO+20+]&facet.query=exp:[+20+TO+25+]&facet.query=exp:[+25+TO+109+]&facet.query=ctc:[+100+TO+101+]&facet.query=ctc:[+101+TO+101.5+]&facet.query=ctc:[+101.5+TO+102+]&facet.query=ctc:[+102+TO+103+]&facet.query=ctc:[+103+TO+104+]&facet.query=ctc:[+104+TO+105+]&facet.query=ctc:[+105+TO+107.5+]&facet.query=ctc:[+107.5+TO+110+]&facet.query=ctc:[+110+TO+115+]&facet.query=ctc:[+115+TO+10100+]&ps3=0&qf=contents^0.05+currdesig^1.5+predesig^1.5+lng^2+ttl+kw_skl+kw_it&f.cl.mincount=1&sow=false&hl.fl=ttl,kw_skl,kw_it,contents&wt=json&f.cat.mincount=1&qs=0&facet.field=ind&facet.field=cat&facet.field=rol&facet.field=cl&facet.field=pref&debug=timing&qt=/resumesearch&f.rol.mincount=1&start=0&rows=40&version=2&q=*&facet.limit=10&pf=id&hl.q=&facet.mincount=1&pf3=id&pf2=id&facet=true&debugQuery=false


Creating HttpEntityEnclosingRequestBase with a repeatable entity

2019-07-08 Thread Tomer Shahar
Hi. I'm using SolrJ (7.3.1).

I encountered an error for delete queries that fail with an unauthorized
exception, while other requests succeed.

I managed to track it down to NTLM authentication.
org.apache.http.impl.execchain.MainClientExec (line 315) removes the
authentication headers before the retry, so for all of the other requests the
second attempt passes.

Before getting into why it fails on the first attempt, I needed to understand
what makes the delete queries special: those requests were never attempted a
second time.

That led me to this code in HttpSolrClient, in createMethod:


if (contentWriter != null) {
  String fullQueryUrl = url + wparams.toQueryString();
  HttpEntityEnclosingRequestBase postOrPut =
      SolrRequest.METHOD.POST == request.getMethod()
          ? new HttpPost(fullQueryUrl)
          : new HttpPut(fullQueryUrl);
  postOrPut.addHeader("Content-Type", contentWriter.getContentType());
  postOrPut.setEntity(new BasicHttpEntity() {
    @Override
    public boolean isStreaming() {
      return true;
    }

    @Override
    public void writeTo(OutputStream outstream) throws IOException {
      contentWriter.write(outstream);
    }
  });
  return postOrPut;
}

I changed the BasicHttpEntity to be repeatable, as follows:


if (contentWriter != null) {
  String fullQueryUrl = url + wparams.toQueryString();
  HttpEntityEnclosingRequestBase postOrPut =
      SolrRequest.METHOD.POST == request.getMethod()
          ? new HttpPost(fullQueryUrl)
          : new HttpPut(fullQueryUrl);
  postOrPut.addHeader("Content-Type", contentWriter.getContentType());
  postOrPut.setEntity(new BasicHttpEntity() {
    @Override
    public boolean isStreaming() {
      return true;
    }

    @Override
    public void writeTo(OutputStream outstream) throws IOException {
      contentWriter.write(outstream);
    }

    @Override
    public boolean isRepeatable() {
      return true;
    }
  });
  return postOrPut;
}


This change allowed delete queries to succeed on the second attempt, just like
the other requests.


My questions:


  1.  Is there a reason for the entity to NOT be repeatable?
  2.  It seems that the authentication headers that are added are implemented
in httpclient-4.5.3, which Solr depends on. Is there a way for me to configure
this from SolrJ?
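
For context, here is a minimal sketch of passing a preconfigured HttpClient
(with NTLM credentials) into SolrJ instead of letting HttpSolrClient build its
own -- host, domain and credentials are placeholders, assuming httpclient 4.5.x:

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.NTCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

// Build an HttpClient that knows the NTLM credentials, then hand it to SolrJ
// so every request (including deletes) goes through the same client.
CredentialsProvider creds = new BasicCredentialsProvider();
creds.setCredentials(AuthScope.ANY,
    new NTCredentials("user", "password", "WORKSTATION", "DOMAIN"));

CloseableHttpClient httpClient = HttpClients.custom()
    .setDefaultCredentialsProvider(creds)
    .build();

HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection")
    .withHttpClient(httpClient)
    .build();

This only controls which HttpClient SolrJ uses; the header handling on retries
still happens inside MainClientExec, so it does not by itself answer question 2.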







Re: Facet Query performance

2019-07-08 Thread Midas A
Hi,
How can I know whether docValues are actually being used or not?
Please help me here.

On Mon, Jul 8, 2019 at 2:38 PM Midas A  wrote:

> Hi,
>
> I have enabled docValues on the facet field, but the query is still taking time.
>
> How can I improve the query time?
>  docValues="true" multiValued="true" termVectors="true" />
>
> Query:
> http://X.X.X.X:
> /solr/search/select?df=ttl&ps=0&hl=true&fl=id,upt&f.ind.mincount=1&hl.usePhraseHighlighter=true&f.pref.mincount=1&q.op=OR&fq=NOT+hemp:(%22xgidx29760%22+%22xmwxmonster%22+%22xmwxmonsterindia%22+%22xmwxcom%22+%22xswxmonster+com%22+%22xswxmonster%22+%22xswxmonsterindia+com%22+%22xswxmonsterindia%22)&fq=NOT+cEmp:(%
> 22nomster.com%22+OR+%22utyu%22)&fq=NOT+pEmp:(%22nomster.com
> %22+OR+%22utyu%22)&fq=ind:(5)&fq=NOT+is_udis:2&fq=NOT+id:(92197+OR+240613+OR+249717+OR+1007148+OR+2500513+OR+2534675+OR+2813498+OR+9401682)&lowercaseOperators=true&ps2=0&bq=is_resume:0^-1000&bq=upt_date:[*+TO+NOW/DAY-36MONTHS]^2&bq=upt_date:[NOW/DAY-36MONTHS+TO+NOW/DAY-24MONTHS]^3&bq=upt_date:[NOW/DAY-24MONTHS+TO+NOW/DAY-12MONTHS]^4&bq=upt_date:[NOW/DAY-12MONTHS+TO+NOW/DAY-9MONTHS]^5&bq=upt_date:[NOW/DAY-9MONTHS+TO+NOW/DAY-6MONTHS]^10&bq=upt_date:[NOW/DAY-6MONTHS+TO+NOW/DAY-3MONTHS]^15&bq=upt_date:[NOW/DAY-3MONTHS+TO+*]^20&bq=NOT+country:isoin^-10&facet.query=exp:[+10+TO+11+]&facet.query=exp:[+11+TO+13+]&facet.query=exp:[+13+TO+15+]&facet.query=exp:[+15+TO+17+]&facet.query=exp:[+17+TO+20+]&facet.query=exp:[+20+TO+25+]&facet.query=exp:[+25+TO+109+]&facet.query=ctc:[+100+TO+101+]&facet.query=ctc:[+101+TO+101.5+]&facet.query=ctc:[+101.5+TO+102+]&facet.query=ctc:[+102+TO+103+]&facet.query=ctc:[+103+TO+104+]&facet.query=ctc:[+104+TO+105+]&facet.query=ctc:[+105+TO+107.5+]&facet.query=ctc:[+107.5+TO+110+]&facet.query=ctc:[+110+TO+115+]&facet.query=ctc:[+115+TO+10100+]&ps3=0&qf=contents^0.05+currdesig^1.5+predesig^1.5+lng^2+ttl+kw_skl+kw_it&f.cl.mincount=1&sow=false&hl.fl=ttl,kw_skl,kw_it,contents&wt=json&f.cat.mincount=1&qs=0&facet.field=ind&facet.field=cat&facet.field=rol&facet.field=cl&facet.field=pref&debug=timing&qt=/resumesearch&f.rol.mincount=1&start=0&rows=40&version=2&q=*&facet.limit=10&pf=id&hl.q=&facet.mincount=1&pf3=id&pf2=id&facet=true&debugQuery=false
>
>


Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-08 Thread Joseph_Tucker
Thanks again.

I guess I'll have to start researching how to create such custom indexing
scripts and determine which language would be best based on the environment
I'm using (Azure in this case). 

Appreciate the help greatly 




Charlie Hull-3 wrote
> On 05/07/2019 14:33, Joseph_Tucker wrote:
>> Thanks for your help / suggestion.
>>
>> I'm not sure I completely follow in this case.
>> SolrJ looks like a method to allow Java applications to talk to Solr, or
>> any
>> other third party application would simply be a communication method
>> between
>> Solr and the language of your choosing.
>>
>> I guess what I'm after is, how would using SolrJ improve performance when
>> indexing?
> 
> It's not just about improving performance (although DIH is single 
> threaded, so you could obtain a marked indexing performance gain using a 
> client such as SolrJ).  With DIH you will embed a lot of SQL code into 
> Solr's configuration files, and the more sources you add the more 
> complicated, hard to debug and unmaintainable it's going to be. You 
> should thus consider writing a proper indexing script in Java, Python or 
> whatever language you are most familiar with - this has always been our 
> approach.
> 
> Best
> 
> 
> Charlie
> 
>>
>> *** I could be wrong in my assumptions as I'm still learning a great deal
>> about Solr. ***
>>
>> I appreciate your help
>>
>> Regards,
>>
>> Joe
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 
> 
> -- 
> Charlie Hull
> Flax - Open Source Enterprise Search
> 
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-08 Thread Alexandre Rafalovitch
You may also want to look at the existing systems, such as
https://nifi.apache.org/

Regards,
   Alex.

On Mon, 8 Jul 2019 at 08:23, Joseph_Tucker
 wrote:
>
> Thanks again.
>
> I guess I'll have to start researching how to create such custom indexing
> scripts and determine which language would be best based on the environment
> I'm using (Azure in this case).
>
> Appreciate the help greatly
>
>
>
>
> Charlie Hull-3 wrote
> > On 05/07/2019 14:33, Joseph_Tucker wrote:
> >> Thanks for your help / suggestion.
> >>
> >> I'm not sure I completely follow in this case.
> >> SolrJ looks like a method to allow Java applications to talk to Solr, or
> >> any
> >> other third party application would simply be a communication method
> >> between
> >> Solr and the language of your choosing.
> >>
> >> I guess what I'm after is, how would using SolrJ improve performance when
> >> indexing?
> >
> > It's not just about improving performance (although DIH is single
> > threaded, so you could obtain a marked indexing performance gain using a
> > client such as SolrJ).  With DIH you will embed a lot of SQL code into
> > Solr's configuration files, and the more sources you add the more
> > complicated, hard to debug and unmaintainable it's going to be. You
> > should thus consider writing a proper indexing script in Java, Python or
> > whatever language you are most familiar with - this has always been our
> > approach.
> >
> > Best
> >
> >
> > Charlie
> >
> >>
> >> *** I could be wrong in my assumptions as I'm still learning a great deal
> >> about Solr. ***
> >>
> >> I appreciate your help
> >>
> >> Regards,
> >>
> >> Joe
> >>
> >>
> >>
> >> --
> >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
> >
> > --
> > Charlie Hull
> > Flax - Open Source Enterprise Search
> >
> > tel/fax: +44 (0)8700 118334
> > mobile:  +44 (0)7767 825828
> > web: www.flax.co.uk
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-08 Thread Jörn Franke
Ideally you use scripts that run on the JVM -- that way you can always use
the latest SolrJ client library as well as other relevant libraries (e.g.
Tika for unstructured content).
This does not have to be Java directly; it can also be based on Scala or JVM
scripting languages such as Groovy.

There are also wrappers for Python etc., but those may not always leverage the
latest version of the library.
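
To make that concrete, here is a rough sketch of such a JVM-based indexing
script using SolrJ over JDBC -- the connection string, SQL statement and field
names are placeholders only, not anything from this thread:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class JdbcToSolr {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr =
             new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
         Connection db = DriverManager.getConnection(
             "jdbc:sqlserver://dbhost;databaseName=db1", "user", "password");
         Statement st = db.createStatement();
         ResultSet rs = st.executeQuery("SELECT id, title, body FROM products")) {

      List<SolrInputDocument> batch = new ArrayList<>();
      while (rs.next()) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", rs.getString("id"));
        doc.addField("title", rs.getString("title"));
        doc.addField("body", rs.getString("body"));
        batch.add(doc);
        if (batch.size() == 1000) {   // send in batches instead of one document per request
          solr.add(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        solr.add(batch);
      }
      solr.commit();
    }
  }
}

Running several such loops in parallel (one per source, or partitioned by id
range) is typically where the gain over the single-threaded DIH comes from.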

> On 08.07.2019 at 14:23, Joseph_Tucker wrote:
> 
> Thanks again.
> 
> I guess I'll have to start researching how to create such custom indexing
> scripts and determine which language would be best based on the environment
> I'm using (Azure in this case). 
> 
> Appreciate the help greatly 
> 
> 
> 
> 
> Charlie Hull-3 wrote
>>> On 05/07/2019 14:33, Joseph_Tucker wrote:
>>> Thanks for your help / suggestion.
>>> 
>>> I'm not sure I completely follow in this case.
>>> SolrJ looks like a method to allow Java applications to talk to Solr, or
>>> any
>>> other third party application would simply be a communication method
>>> between
>>> Solr and the language of your choosing.
>>> 
>>> I guess what I'm after is, how would using SolrJ improve performance when
>>> indexing?
>> 
>> It's not just about improving performance (although DIH is single 
>> threaded, so you could obtain a marked indexing performance gain using a 
>> client such as SolrJ).  With DIH you will embed a lot of SQL code into 
>> Solr's configuration files, and the more sources you add the more 
>> complicated, hard to debug and unmaintainable it's going to be. You 
>> should thus consider writing a proper indexing script in Java, Python or 
>> whatever language you are most familiar with - this has always been our 
>> approach.
>> 
>> Best
>> 
>> 
>> Charlie
>> 
>>> 
>>> *** I could be wrong in my assumptions as I'm still learning a great deal
>>> about Solr. ***
>>> 
>>> I appreciate your help
>>> 
>>> Regards,
>>> 
>>> Joe
>>> 
>>> 
>>> 
>>> --
>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>> 
>> 
>> -- 
>> Charlie Hull
>> Flax - Open Source Enterprise Search
>> 
>> tel/fax: +44 (0)8700 118334
>> mobile:  +44 (0)7767 825828
>> web: www.flax.co.uk
> 
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Relevance by term position

2019-07-08 Thread Jay Potharaju
Thanks, use of payloads works for my use case.
Jay

> On Jun 28, 2019, at 6:46 AM, Alexandre Rafalovitch  wrote:
> 
> This past thread may be relevant: 
> https://markmail.org/message/aau6bjllkpwcpmro
> It suggests that using SpanFirst of XMLQueryParser will have automatic
> boost for earlier matches.
> The other approach suggested was to use Payloads (which got better
> since the original thread).
> 
> Regards,
>   Alex.
> 
>> On Thu, 27 Jun 2019 at 22:01, Jay Potharaju  wrote:
>> 
>> Hi,
>> I am trying to implement autocomplete feature that should rank documents 
>> based on term position in the search field.
>> Example-
>> Doc1- hello world
>> Doc2- blue sky hello
>> Doc3 - John hello
>> 
>> Searching for hello should return
>> Hello world
>> John hello
>> Blue sky hello
>> 
>> I am currently using ngram to do autocomplete. But this does not allow me to 
>> rank results based on term position.
>> 
>> Any suggestions on how this can be done?
>> Thanks
>> 


Re: Facet Query performance

2019-07-08 Thread Shawn Heisey

On 7/8/2019 3:08 AM, Midas A wrote:

I have enabled docValues on the facet field, but the query is still taking time.

How can I improve the query time?
docValues="true" multiValued="true" termVectors="true" /> 


Query:




There's very little information here -- only a single field definition 
and the query URL.  No information about how many documents, what sort 
of cardinality there is in the fields being used in the query, no 
information about memory and settings, etc.  You haven't even told us 
how long the query takes.


Your main query is a single * wildcard.  A wildcard query is typically 
quite slow.  If you are aiming for all documents, change that to q=*:* 
instead -- this is special syntax that the query parser understands, and 
is normally executed very quickly.
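
As a rough SolrJ sketch of the reshaped request (assuming the core name
"search" from the URL; only the main query, one filter and the field facets
are shown, not the whole parameter set):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/search").build();

SolrQuery q = new SolrQuery("*:*");      // match-all query; much cheaper than the single wildcard q=*
q.addFilterQuery("ind:(5)");             // keep restrictions in fq where they can be cached
q.setFacet(true);
q.addFacetField("ind", "cat", "rol", "cl", "pref");
q.setFacetMinCount(1);
q.setFacetLimit(10);
q.setRows(40);
QueryResponse rsp = solr.query(q);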


When a field has DocValues defined, it will automatically be used for 
field-based sorting, field-based facets, and field-based grouping. 
DocValues should not be relied on for queries, because indexed data is 
far faster for that usage.  Queries *can* be done with docValues, but it 
would be VERY slow.  Solr will avoid that usage if it can.


I'm reasonably certain that docValues will NOT be used for facet.query 
as long as the field is indexed.


You do have three field-based facets -- using the facet.field parameter. 
 If docValues was present on cat for ALL of the indexing that has 
happened, then they will work for that field, but you have not told us 
whether rol and pref have them defined.


You have a lot of faceting in this query.  That can cause things to be slow.

Thanks,
Shawn


Re: Relevance by term position

2019-07-08 Thread Erick Erickson
To re-enforce what Alex said, payloads have first-class Solr support as of Solr 
6.6, see: https://lucidworks.com/post/solr-payloads/
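
As a rough sketch of the query side (the field name ttl_payload is hypothetical
and assumes terms were indexed with a payload-carrying field type such as
delimited_payloads_float, with larger payloads on earlier terms):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/autocomplete").build();

SolrQuery q = new SolrQuery();
// payload_score scores each match by the payload attached to the matching term,
// so terms indexed with a larger payload rank their documents higher.
q.setQuery("{!payload_score f=ttl_payload func=max v=$qq}");
q.set("qq", "hello");
q.setRows(10);
QueryResponse rsp = solr.query(q);

With func=max the highest payload among the matching occurrences is used, so a
document where the term appears early (with a larger payload) sorts first.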

> On Jul 8, 2019, at 7:15 AM, Jay Potharaju  wrote:
> 
> Thanks, use of payloads works for my use case.
> Jay
> 
>> On Jun 28, 2019, at 6:46 AM, Alexandre Rafalovitch  
>> wrote:
>> 
>> This past thread may be relevant: 
>> https://markmail.org/message/aau6bjllkpwcpmro
>> It suggests that using SpanFirst of XMLQueryParser will have automatic
>> boost for earlier matches.
>> The other approach suggested was to use Payloads (which got better
>> since the original thread).
>> 
>> Regards,
>>  Alex.
>> 
>>> On Thu, 27 Jun 2019 at 22:01, Jay Potharaju  wrote:
>>> 
>>> Hi,
>>> I am trying to implement autocomplete feature that should rank documents 
>>> based on term position in the search field.
>>> Example-
>>> Doc1- hello world
>>> Doc2- blue sky hello
>>> Doc3 - John hello
>>> 
>>> Searching for hello should return
>>> Hello world
>>> John hello
>>> Blue sky hello
>>> 
>>> I am currently using ngram to do autocomplete. But this does not allow me 
>>> to rank results based on term position.
>>> 
>>> Any suggestions on how this can be done?
>>> Thanks
>>> 



Solr 7.7 autoscaling trigger

2019-07-08 Thread Mark Thill
My scenario is:

   - 60 GB collection
   - 2 shards of ~30GB
   - Each shard having 2 replicas so I have a backup
   - So I have 4 nodes with each node holding a single core

My goal is to have autoscaling handle when I lose a node.  So upon loss of
a node the nodeLost event deletes the node.  Then when I add back in
another node I want it to replace the node I lost keeping each shard with 2
replicas.   The problem is that I can't find a policy that keeps 2 replicas
per shard because when the nodeAdded event fires it wants to add a 3rd
replica to the shard that already has 2 replicas.  I can't seem to get it
to add the replica to the shard that is left with the single replica.

Any suggestions on a policy to keep this balanced?

Mark


Re: Facet Query performance

2019-07-08 Thread Shawn Heisey

On 7/8/2019 12:00 PM, Midas A wrote:

Number of Docs: 500,000+ docs
Index Size: 300 GB
RAM: 256 GB
JVM: 32 GB


Half a million documents producing an index size of 300GB suggests 
*very* large documents.  That typically produces an index with fields 
that have very high cardinality, due to text tokenization.


Is Solr the only thing running on this machine, or does it have other 
memory-hungry software running on it?


The screenshot described at the following URL may provide more insight. 
It will be important to get the sort correct.  If the columns have been 
customized to show information other than the examples, it may need to 
be adjusted:


https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

Assuming that Solr is the only thing on the machine, then it means you 
have about 224 GB of memory available to cache your index data, which is 
at least 300GB.  Normally I would think being able to cache two thirds 
of the index should be enough for good performance, but it's always 
possible that there is something about your setup that means you don't 
have enough memory.


Are you sure that you need a 32GB heap?  Half a million documents should 
NOT require anywhere near that much heap.



Cardinality:
cat=44
rol=1005
ind=504
cl=2000


These cardinality values are VERY low.  If you are certain about those 
numbers, it is not likely that these fields are significant contributors 
to query time, either with or without docValues.  How did you obtain 
those numbers?


Those are not the only fields referenced in your query.  I also see these:

hemp
cEmp
pEmp
is_udis
id
is_resume
upt_date
country
exp
ctc
contents
currdesig
predesig
lng
ttl
kw_sql
kw_it


QTime:  2988 ms


Three seconds for a query with so many facets is something I would 
probably be pretty happy to get.



Our 35% queries takes more than 10 sec.


I have no idea what this sentence means.

Please suggest the ways to improve response time . Attached queries and 
schema.xml and solrconfig.xml


1. Are there any other ways to rewrite the queries that would improve our
query performance?


With the information available, the only suggestion I have currently is 
to replace "q=*" with "q=*:*" -- assuming that the intent is to match 
all documents with the main query.  According to what you attached 
(which I am very surprised to see -- attachments usually don't make it 
to the list), your df parameter is "ttl" ... a field that is heavily 
tokenized.  That means that the cardinality of the ttl field is probably 
VERY high, which would make the wildcard query VERY slow.


2. Can we see the docValues cache in the Plugins / Stats -> Cache section of
the Solr admin UI?


The admin UI only shows Solr caches.  If Lucene even has a docValues 
cache (and I do not know whether it does), it will not be available in 
Solr's statistics.  I am unaware of any cache in Solr for docValues. 
The entire point of docValues is to avoid the need to generate and cache 
large amounts of data, so I suspect there is not going to be anything 
available in this regard.


Thanks,
Shawn


Are docValues useful for FilterQueries?

2019-07-08 Thread Ashwin Ramesh
Hi everybody,

I can't find concrete evidence whether docValues are indeed useful for
filter queries. One example of a field:



This field will have a value between 0-1. The only use case for this
field is to filter on a range / subset of values. There will be no scoring
/ querying on this field. Is this a good use case for docValues? Regards, Ash


Re: Are docValues useful for FilterQueries?

2019-07-08 Thread Erick Erickson
DocValues are irrelevant for scoring. Here’s the way I think of it.

When querying (and thus scoring), you have a term X. I need to know:
- what docs does it appear in?
- how many docs does it appear in?
- how often does the term appear in the entire corpus?

These are questions the inverted index (indexed=“true”) was designed
to answer.

For faceting, sorting and grouping, I want to know for a _document_,
what value appears in field Y. This is what docValues does much more
efficiently.
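
A small SolrJ sketch of the filtering case from the original question (the
field name popularityScore and the range are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();

SolrQuery q = new SolrQuery("title:design");
// Range filter on the numeric field: no scoring, just restricting the result set.
// Whether Solr answers it from the inverted index (indexed="true") or from docValues
// depends on how the field is declared; the indexed version is generally faster for
// filtering, while docValues is what you want for faceting/sorting/grouping on it.
q.addFilterQuery("popularityScore:[100 TO 500]");
QueryResponse rsp = solr.query(q);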

Best,
Erick
> On Jul 8, 2019, at 5:36 PM, Ashwin Ramesh  wrote:
> 
> Hi everybody,
> 
> I can't find concrete evidence whether docValues are indeed useful for
> filter queries. One example of a field:
> 
>  required="false" multiValued="false" />
> 
> This field will have a value between 0-1. The only use case for this
> field is to filter on a range / subset of values. There will be no scoring
> / querying on this field. Is this a good use case for docValues? Regards, Ash
> 



Understanding DebugQuery

2019-07-08 Thread Paresh Khandelwal
Hi All,

I tried to get the debug information for my INNER JOIN and ACROSS JOIN
query, and I am trying to understand it.

See the query response below (QTime: 1487 ms):
{
  "responseHeader":{
    "status":0,
    "QTime":1487,
    "params":{
      "q":"*:*",
      "fq.op":"AND",
      "indent":"on",
      "fl":"TC_0Y0_Item_ID",
      "fq":["TC_0Y0_Occurrence_Name:\"6935 style rear MY11+\"",
            "TC_0Y0_ProductScope:xtWNf_fTAaLUgD",
            "{!join to=TC_0Y0_Item_ID from=TC_0Y0_ItemRevision_0Y0_awp0Item_item_id fromIndex=collection1}TC_0Y0_ItemRevision_0Y0_awp0Item_item_id:92138773"],
      "wt":"json",
      "debugQuery":"on",
      "group.field":"TC_0Y0_Item_ID", ..
  "debug":{
    "join":{
      "{!join from=TC_0Y0_ItemRevision_0Y0_awp0Item_item_id to=TC_0Y0_Item_ID fromIndex=collection1}TC_0Y0_ItemRevision_0Y0_awp0Item_item_id:92138773":{
        "time":955,
        "fromSetSize":3,
        "toSetSize":14560,
        "fromTermCount":6632106,
        "fromTermTotalDf":6632106,
        "fromTermDirectCount":6632106,
        "fromTermHits":1,
        "fromTermHitsTotalDf":1,
        "toTermHits":1,
        "toTermHitsTotalDf":14560,
        "toTermDirectCount":0,
        "smallSetsDeferred":1,
        "toSetDocsAdded":14560}},
    "rawquerystring":"*:*",
    "querystring":"*:*",
    "parsedquery":"MatchAllDocsQuery(*:*)",
    "parsedquery_toString":"*:*",
    "explain":{
      "AZD1uV0qgj6GxC":"\n1.0 = *:*, product of:\n  1.0 = boost\n  1.0 = queryNorm\n"},
    "QParser":"LuceneQParser",
    "filter_queries":["TC_0Y0_Occurrence_Name:\"6935 style rear MY11+\"",
                      "TC_0Y0_ProductScope:xtWNf_fTAaLUgD",
                      "{!join to=TC_0Y0_Item_ID from=TC_0Y0_ItemRevision_0Y0_awp0Item_item_id fromIndex=collection1}TC_0Y0_ItemRevision_0Y0_awp0Item_item_id:92138773"],
    "parsed_filter_queries":["TC_0Y0_Occurrence_Name:6935 style rear MY11+",
                             "TC_0Y0_ProductScope:xtWNf_fTAaLUgD",
                             "JoinQuery({!join from=TC_0Y0_ItemRevision_0Y0_awp0Item_item_id to=TC_0Y0_Item_ID fromIndex=collection1}TC_0Y0_ItemRevision_0Y0_awp0Item_item_id:92138773)"],
    "timing":{
      "time":1487.0,  ..

I am trying to understand why fromTermCount is so high when fromSetSize and
toSetSize are so small.

Where can I find the details about all the contents of debugQuery and how
to read each component?

Any help is appreciated.

Regards,
Paresh

