Re: query optimization

2019-07-03 Thread Midas A
Please suggest here

On Wed, Jul 3, 2019 at 10:23 AM Midas A  wrote:

> Hi,
>
> How can I optimize the following query? It is taking a long time.
>
>  webapp=/solr path=/search params={
> df=ttl&ps=0&hl=true&f.ind.mincount=1&hl.usePhraseHighlighter=true&lowercaseOperators=true&ps2=0&ps3=0&qf=contents^0.05+currdesig^1.5+predesig^1.5+lng^2+ttl+kw_skl+kw_it&sow=false&hl.fl=ttl,kw_skl,kw_it,contents&semantictermsttl=&f.cat.mincount=1&semanticfieldttl=ttl^0.1+currdesig^0.1+predesig^0.1&qs=0&qt=/resumesearch&semantictermsskl="mbbss"+OR+"medicine"&version=2&omitHeader=true&hl.q=mbbs,+"medical+officer",+doctor,+physician+("medical+officer")+"medical+officer"+"physician""+""general+physician""+""physicians""+""consultant+physician""+""house+physician"+"physician"+"doctor"+"mbbs"+"general+physician"+"physicians"+"consultant+physician"+"house+physician"&typeId=(293)&debugQuery=false&bq1=&echoParams=none&fl=id,upt&f.pref.mincount=1&q.op=OR&fq=NOT+contents:("liaise+with+medical+officer"+"worked+with+medical+officer"+"working+with+medical+officer"+"reported+to+medical+officer"+"references+are+medical+officer"+"coordinated+with+medical+officer"+"closely+with+medical+officer"+"signature+of+medical+officer"+"seal+of++medical+officer"+"liaise+with+physician"+"worked+with+physician"+"working+with+physician"+"reported+to+physician"+"references+are+physician"+"coordinated+with+physician"+"closely+with+physician"+"signature+of+physician"+"seal+of++physician"+"liaise+with+doctor"+"worked+with+doctor"+"working+with+doctor"+"reported+to+doctor"+"references+are+doctor"+"coordinated+with+doctor"+"closely+with+doctor"+"signature+of+doctor"+"seal+of++doctor")&fq=NOT+hemp:("xmwxagency"+"xmwxlimited"+"xmwxplacement"+"xmwxplus"+"xmwxprivate"+"xmwxsecurity"+"xmwxz2"+"xmwxand"+"xswxz2+plus+placement+and+security+agency+private+limited"+"xswxz2+plus+placement+and+security+agency+private"+"xswxz2+plus+placement+and+security+agency"+"xswxz2+plus+placement+and+security"+"xswxz2+plus+placement+and"+"xswxz2+plus+placement"+"xswxz2+plus"+"xswxz2")&fq=ctc:[100.0+TO+107.2]+OR+ctc:[-1.0+TO+-1.0]&fq=(dlh:(22))&fq=ind:(24++42++24++8)&fq=(rol:(292+293+294+322))&fq=(cat:(9))&fq=cat:(1000+OR+907+
OR+1+OR+2+OR+3+OR+786+OR+4+OR+5+OR+6+OR+7+OR+8+OR+9+OR+10+OR+11+OR+12+OR+13+OR+14+OR+785+OR+15+OR+16+OR+17+OR+18+OR+908+OR+19+OR+20+OR+21+OR+23+OR+24)&fq=NOT+is_udis:2&bq=is_resume:0^-1000&bq=upt_date:[*+TO+NOW/DAY-36MONTHS]^2&bq=upt_date:[NOW/DAY-36MONTHS+TO+NOW/DAY-24MONTHS]^3&bq=upt_date:[NOW/DAY-24MONTHS+TO+NOW/DAY-12MONTHS]^4&bq=upt_date:[NOW/DAY-12MONTHS+TO+NOW/DAY-9MONTHS]^5&bq=upt_date:[NOW/DAY-9MONTHS+TO+NOW/DAY-6MONTHS]^10&bq=upt_date:[NOW/DAY-6MONTHS+TO+NOW/DAY-3MONTHS]^15&bq=upt_date:[NOW/DAY-3MONTHS+TO+*]^20&bq=_query_:"{!edismax+qf%3Drol^2+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$typeId+q.op%3DOR+bq%3D\$bq1+bf%3D}"&bq=_query_:"{!edismax+qf%3Drol^2+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$typeId+q.op%3DOR+bq%3D\$bq1+bf%3D}"&bq=_query_:"{!edismax+qf%3Drol^2+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$typeId+q.op%3DOR+bq%3D\$bq1+bf%3D}"&bq=dlh:(22)^8&bq={!boost+b%3D4}+_query_:{!edismax+qf%3D"currdesig^8+predesig^6+ttl^3+kw_skl^2+contents"+v%3D"\"doctor\"+\"medical+officer\"+\"physician\""+q.op%3DAND+bq%3D}&bq=_query_:{!edismax+qf%3D"currdesig+predesig+ttl+kw_skl+contents^0.01"+v%3D"\"doctor\"+\"medical+officer\"+\"physician\""+q.op%3DOR+bq%3D}&bq=NOT+country:isoin^-10&facet.query=exp:[+10+TO+11+]&facet.query=exp:[+11+TO+13+]&facet.query=exp:[+13+TO+15+]&facet.query=exp:[+15+TO+17+]&facet.query=exp:[+17+TO+20+]&facet.query=exp:[+20+TO+25+]&facet.query=exp:[+25+TO+109+]&facet.query=ctc:[+100+TO+101+]&facet.query=ctc:[+101+TO+101.5+]&facet.query=ctc:[+101.5+TO+102+]&facet.query=ctc:[+102+TO+103+]&facet.query=ctc:[+103+TO+104+]&facet.query=ctc:[+104+TO+105+]&facet.query=ctc:[+105+TO+107.5+]&facet.query=ctc:[+107.5+TO+110+]&facet.query=ctc:[+110+TO+115+]&facet.query=ctc:[+115+TO+10100+]&f.cl.mincount=1&queryany3=(22)&wt=javabin&queryany2=(293)&queryany1=(294)&queryany0=(322)&facet.field=ind&facet.field=cat&facet.field=rol&facet.field=cl&facet.field=pref&debug=false&f.rol.mincount=1&start=0&rows=40&q=((mbbs+OR+_query_:"{!edis
max+qf%3Ddlh+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$queryany3+q.op%3DOR+bq%3D$bq1+bf%3D}")+OR+((("medical+officer")+OR+"medical+officer"~0)+OR+_query_:"{!edismax+qf%3Drol+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$queryany0+q.op%3DOR+bq%3D$bq1+bf%3D}")+OR+(("doctor"+OR+doctor)+OR+_query_:"{!edismax+qf%3Drol+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$queryany2+q.op%3DOR+bq%3D$bq1+bf%3D}")+OR+(("physician"+OR+"physicians"+OR+"general+physician"+OR+"house+physician"+OR+"consultant+physician"+OR+physician)+OR+_query_:"{!edismax+qf%3Drol+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$queryany1+q.op%3DOR+bq%3D$bq1+bf%3D}")+OR+_query_:"{!edismax+qf%3D\$semanticfieldskl+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D\$semantictermsskl+q.op%3DOR+bq%3D\$bq1+bf%3D}

tlog/commit questions

2019-07-03 Thread Avi Steiner
Hi

We had some cases with customers (Solr 5.3.1, one search node, one shard) with 
huge tlog files (more than 1 GB).

Our settings:



1
3 
false 



180 



${solr.data.dir:}

  

I don't have enough logs, so I don't know whether the commit failed or not. I just 
remember there were OOM messages.

As you may know, during restart Solr tries to replay from the tlog, which may take a 
lot of time. I tried moving the files to another location, started Solr, and only 
after the core was loaded did I move the tlogs back to their original location. They 
were cleared after a while.

So I have a few questions:

  1.  Do you have any idea what could cause commit failures?
  2.  Should we decrease the maxTime for hard commit or any other settings?
  3.  Is there any way to replay the tlog asynchronously (or disable replay, so we 
can trigger it programmatically from our code in a separate thread), so 
Solr will load more quickly?
  4.  Is there any improvement in Solr 7.3.1?

Thanks in advance

Avi



This email and any attachments thereto may contain private, confidential, and 
privileged material for the sole use of the intended recipient. Any review, 
copying, or distribution of this email (or any attachments thereto) by others 
is strictly prohibited. If you are not the intended recipient, please contact 
the sender immediately and permanently delete the original and any copies of 
this email and any attachments thereto.


Nested document using ContentStreamUpdateRequest

2019-07-03 Thread Sreejith Variyath
Hi,

I am trying to achieve the nested document structure below in Solr using
ContentStreamUpdateRequest. It represents a document and its metadata.

{
  "id": "1235",
  "title": "test",
  "sourcemetadata": [
    { "id": "455" },
    { "size": "455" }
  ],
  "content": "this is the content"
}

But when I use ContentStreamUpdateRequest, I am not able to upload the
document to Solr. It throws the error:

missing required field: sourcemetadata.id

Below is the Java code I used.

ContentStreamUpdateRequest contentStreamUpdateRequest =
    new ContentStreamUpdateRequest("/update/extract?fmap.content=content&commit=true");

ModifiableSolrParams params = new ModifiableSolrParams();
params.add("literal.id", "5632");
params.add("literal.sourcemetadata.id", "567");
params.add("literal.sourcemetadata.size", String.valueOf(file.length()));
contentStreamUpdateRequest.setParams(params);

Could you please help me with how to write nested documents in Solr using
ContentStreamUpdateRequest?




-- 
Best Regards,


*Sreejith Variyath*
Lead Software Engineer
Tarams Software Technologies Pvt. Ltd.
Venus Buildings, 2nd Floor 1/2,3rd Main,
Kalyanamantapa Road Jakasandra, 1st Block Kormangala
Bangalore - 560034
Tarams 

-- 
w: www.tarams.com
=
DISCLAIMER:
The information in this message is confidential and may be legally privileged. 
It is intended solely for the addressee. Access to this message by anyone else 
is unauthorized. If you are not the intended recipient, any disclosure, 
copying, or distribution of the message, or any action or omission taken by 
you in reliance on it, is prohibited and may be unlawful. Please immediately 
contact the sender if you have received this message in error. Further, this 
e-mail may contain viruses and all reasonable precaution to minimize the risk 
arising therefrom is taken by Tarams. Tarams is not liable for any damage 
sustained by you as a result of any virus in this e-mail. All applicable virus 
checks should be carried out by you before opening this e-mail or any 
attachment thereto.
Thank you - Tarams Software Technologies Pvt. Ltd.
=


Re: SolrCloud indexing triggers merges and timeouts

2019-07-03 Thread Shawn Heisey

On 7/2/2019 10:53 PM, Rahul Goswami wrote:

Hi Shawn,

Thank you for the detailed suggestions. However, I would like to
understand the maxMergeCount and maxThreadCount params better. The
documentation mentions that:

maxMergeCount : The maximum number of simultaneous merges that are allowed.
maxThreadCount : The maximum number of simultaneous merge threads that
should be running at once

Since one thread can only do 1 merge at any given point of time, how does
maxMergeCount being greater than maxThreadCount help anyway? I am having
difficulty wrapping my head around this, and would appreciate if you could
help clear it for me.


The maxMergeCount setting controls the number of merges that can be 
*scheduled* at the same time.  As soon as that number of merges is 
reached, the indexing thread(s) will be paused until the number of 
merges in the schedule drops below this number.  This ensures that no 
more merges will be scheduled.


By setting maxMergeCount higher than the number of merges that are 
expected in the schedule, you can ensure that indexing will never be 
paused.  It would require very atypical merge policy settings for the 
number of scheduled merges to ever reach six.  On my own indexing, I 
reached three scheduled merges quite frequently.  The default setting 
for maxMergeCount is three.


The maxThreadCount setting controls how many of the scheduled merges 
will be simultaneously executed.  With index data on standard spinning 
disks, you do not want to increase this number beyond 1, or you will 
have a performance problem due to thrashing disk heads.  If your data is 
on SSD, you can make it larger than 1.
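In solrconfig.xml these two settings live on the merge scheduler inside the indexConfig block; a minimal sketch with illustrative values only (six scheduled merges, one executing, per the spinning-disk advice above):

```xml
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- merges that may be *scheduled* before indexing threads pause -->
    <int name="maxMergeCount">6</int>
    <!-- merges that may *execute* at once; keep at 1 on spinning disks -->
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
</indexConfig>
```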


Thanks,
Shawn


Re: SolrCloud indexing triggers merges and timeouts

2019-07-03 Thread Erick Erickson
Two more tidbits to add to Shawn’s explanation:

There are heuristics built in to ConcurrentMergeScheduler.
From the Javadocs:
* If it's an SSD,
*  {@code maxThreadCount} is set to {@code max(1, min(4, cpuCoreCount/2))},
*  otherwise 1.  Note that detection only currently works on
*  Linux; other platforms will assume the index is not on an SSD.
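The quoted heuristic is easy to sanity-check as a standalone function. This is only a sketch of the formula from the Javadoc, not Lucene's actual code, and the defaultMaxThreadCount name is made up here:

```java
public class MergeThreadDefaults {

    // Mirrors the quoted Javadoc: max(1, min(4, cpuCoreCount/2)) on SSD, else 1
    static int defaultMaxThreadCount(int cpuCoreCount, boolean onSsd) {
        return onSsd ? Math.max(1, Math.min(4, cpuCoreCount / 2)) : 1;
    }

    public static void main(String[] args) {
        System.out.println(defaultMaxThreadCount(8, true));   // 4 on an 8-core SSD box
        System.out.println(defaultMaxThreadCount(2, true));   // 1 (the floor)
        System.out.println(defaultMaxThreadCount(16, false)); // spinning disk: always 1
    }
}
```

So on typical 4-to-8-core SSD hardware the scheduler picks 2-4 merge threads by itself, which is why explicitly tuning maxThreadCount mostly matters for spinning disks or unusual core counts.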

Second, TieredMergePolicy (the default) merges in “tiers” that
are of similar size. So you can have multiple merges going on
at the same time on disjoint sets of segments.

Best,
Erick

> On Jul 3, 2019, at 7:54 AM, Shawn Heisey  wrote:
> 
> On 7/2/2019 10:53 PM, Rahul Goswami wrote:
>> Hi Shawn,
>> Thank you for the detailed suggestions. Although, I would like to
>> understand the maxMergeCount and maxThreadCount params better. The
>> documentation
>> 
>> mentions
>> that
>> maxMergeCount : The maximum number of simultaneous merges that are allowed.
>> maxThreadCount : The maximum number of simultaneous merge threads that
>> should be running at once
>> Since one thread can only do 1 merge at any given point of time, how does
>> maxMergeCount being greater than maxThreadCount help anyway? I am having
>> difficulty wrapping my head around this, and would appreciate if you could
>> help clear it for me.
> 
> The maxMergeCount setting controls the number of merges that can be 
> *scheduled* at the same time.  As soon as that number of merges is reached, 
> the indexing thread(s) will be paused until the number of merges in the 
> schedule drops below this number.  This ensures that no more merges will be 
> scheduled.
> 
> By setting maxMergeCount higher than the number of merges that are expected 
> in the schedule, you can ensure that indexing will never be paused.  It would 
> require very atypical merge policy settings for the number of scheduled 
> merges to ever reach six.  On my own indexing, I reached three scheduled 
> merges quite frequently.  The default setting for maxMergeCount is three.
> 
> The maxThreadCount setting controls how many of the scheduled merges will be 
> simultaneously executed. With index data on standard spinning disks, you do 
> not want to increase this number beyond 1, or you will have a performance 
> problem due to thrashing disk heads.  If your data is on SSD, you can make it 
> larger than 1.
> 
> Thanks,
> Shawn



Re: tlog/commit questions

2019-07-03 Thread Erick Erickson
Let’s take this a piece at a time.


1. commit failures are very rare, in fact the only time I’ve seen them is when 
running out of disk space, OOMs, pulling the plug, etc. Look in your log files, 
is there any evidence of same?

2. OOM messages. To support Real Time Get, internal structures are kept for all 
docs that have been indexed but no searcher has been opened to make visible. So 
you’re collecting up to 30 minutes of updates. This _may_ be relevant to your 
OOM problem. So I’d recommend dropping your soft commit interval to maybe 5 
minutes.
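For reference, a 5-minute soft commit of the kind suggested here would look something like this in solrconfig.xml. The hard-commit values are illustrative only, since the original settings were stripped from the quoted config:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit: flushes segments and truncates the tlog -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit: opens a new searcher every 5 minutes -->
    <maxTime>300000</maxTime>
  </autoSoftCommit>
</updateHandler>
```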

3. Tlogs shouldn’t replay much. They only replay when Solr quits abnormally, 
OOM, kill -9, pull the plug etc. When Solr is shut down gracefully, i.e. 
“bin/solr stop” etc, it should commit before closing and should _not_ replay 
anything from the tlog. Of course you should stop indexing while shutting down 
Solr…


4. There are lots of improvements in Solr 7x. Go to the latest Solr version 
(7.7.2) rather than 7.3.1. That said, TMP has been around for a long, long 
time. The low-level process of merging segments hasn’t been changed. One thing 
that _has_ changed is that TMP will now respect the max segment size (5G) when 
optimizing or doing an expungeDeletes. And I strongly recommend that you do 
neither of those unless you demonstrate need, just mentioning in case you 
already do that. 
7.4-: 
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
7.5+: https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/

All in all, I’d recommend getting to the bottom of your OOM issues. Absent 
abnormal termination, tlog replay really shouldn’t be happening. Are you 
totally sure that it was TLOG replay and not a full sync from the leader?

Best,
Erick

> On Jul 3, 2019, at 12:36 AM, Avi Steiner  wrote:
> 
> Hi
> 
> We had some cases with customers (Solr 5.3.1, one search node, one shard) 
> with huge tlog files (more than 1 GB).
> 
> Our settings:
> 
> 
> 
>1
>3 
>false 
>
> 
>
>180 
> 
> 
>
>${solr.data.dir:}
>
>  
> 
> I don't have enough logs so I don't know if commit failed or not. I just 
> remember there were OOM messages.
> 
> As you may know, during restart, Solr tries to replay from tlog. It may take 
> a lot of time. I tried to move the files to other location, started Solr and 
> only after the core was loaded, I moved tlog back to their original location. 
> They were cleared after a while.
> 
> So I have few questions:
> 
>  1.  Do you have any idea for commit failures?
>  2.  Should we decrease the maxTime for hard commit or any other settings?
>  3.  Is there any way to replay tlog asynchronously (or disable it, so we 
> will be able to call it programmatically from our code in a separate thread), 
> so Solr will be loaded more quickly?
>  4.  Is there any improvement in Solr 7.3.1?
> 
> Thanks in advance
> 
> Avi
> 
> 
> 



Solr query with best match returning high score

2019-07-03 Thread mganeshs
Hello Experts,

I have the following query:

product:TV OR os:Android OR size:(55 60 65) OR brand:samsung OR issmart:yes
OR ram:[4 TO *] OR rate:[10 TO *] OR bezel:no OR sound:dolby

In total there are 9 conditions.

Now I need the document with the best match to be returned at the top. By best
match I mean one that satisfies all 9 conditions (as if AND were used instead
of OR): a document where product is TV, os is Android, size is 55, brand is
samsung, issmart is yes, ram is 4, rate is 115000, bezel is no, and sound is
dolby.

Next would come documents that match any 8 of the conditions. I also have a
scenario where certain fields (e.g. brand:samsung) should have some priority,
so I can give a boost for those.

Let me know how this could be achieved. Normal scoring works a bit
differently: a term that is rare across all the documents gets a high score.
How can we disable that part?
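One way to get a "most clauses matched" ordering is the constant-score operator (^=) available in Solr 5.1+, which makes every matching clause contribute a fixed score so term rarity stops mattering. A sketch against the fields above, untested and with illustrative boost values:

```
q=product:TV^=1 OR os:Android^=1 OR size:(55 60 65)^=1 OR brand:samsung^=2
  OR issmart:yes^=1 OR ram:[4 TO *]^=1 OR rate:[10 TO *]^=1
  OR bezel:no^=1 OR sound:dolby^=1
```

A document matching all nine clauses then scores the sum of the constants, one matching only eight scores strictly less, and a preferred field such as brand:samsung can be weighted by giving it a larger constant.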




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


org.apache.solr.search.SyntaxError with Join and Tag

2019-07-03 Thread Mikhail Ibraheem
Hi,

This query:

{!tag=_Customer_facet}{!join fromIndex=sf-details toIndex=Downloads
from=_CUSTOMER_facet to=_Customer_facet}_ACTIVITY_facet:("Welcome")

gives the exception:

org.apache.solr.search.SyntaxError: Cannot parse 
'_ACTIVITY_facet:((\"Welcome\"': Encountered \"\" at line 1, column 27

while these work fine:

1- {!tag=_Customer_facet}{!join fromIndex=sf-details toIndex=Downloads 
from=_CUSTOMER_facet to=_Customer_facet}_ACTIVITY_facet:"Welcome"

2- {!join fromIndex=sf-details toIndex=Downloads from=_CUSTOMER_facet 
to=_Customer_facet}_ACTIVITY_facet:("Welcome")

So when using {!tag} and {!join} together I can't wrap the filter value in 
parentheses, but when using only {!join} I can, e.g. ("Welcome").
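One workaround worth trying when local-params parsers are chained like this is to move the query text into the `v` local parameter, so the join parser receives it whole instead of re-parsing the trailing text. A sketch, not verified against this schema:

```
{!tag=_Customer_facet}{!join fromIndex=sf-details toIndex=Downloads
    from=_CUSTOMER_facet to=_Customer_facet
    v='_ACTIVITY_facet:("Welcome")'}
```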
Please advise.
Thanks


Sort on PointFieldType

2019-07-03 Thread Prince Manohar
Hi,
I have a *field* that is of *PointType* and I tried to sort on that field.
But it looks like sorting does not work on PointType.
Or am I doing something wrong?
Find my query below:
http://localhost:8983/solr/testcollection/select?indent=on&q=*:*&sort=abc.pqr_d
DESC&wt=json


-- 
*Regards,*
*Prince Manohar*
*B.Tech (Information Technology)*
*Bengaluru*
*+91 7797045315*


Re: Release of Solr 8.1.2 bug fix

2019-07-03 Thread Jason Gerlowski
Hi Edwin,

Solr releases can be a messy process.  They're subject to a lot of
unforeseen issues that can drag the process out: test failures
springing up at the last minute, other committers asking to squeeze in
last minute fixes, infrastructure problems cropping up unexpectedly,
etc.  So release-managers rarely offer timelines for when they'll be
able to finish a release.

Cao Manh Dat has volunteered to do the release, and is actively
working on it.  And all of the bugs have been merged that committers
asked Dat to wait for.  Beyond that, there's no real timeline.
Hopefully it'll be soon, but not necessarily.

Best,

Jason

On Wed, Jul 3, 2019 at 2:34 AM Zheng Lin Edwin Yeo  wrote:
>
> Hi,
>
> I understand that currently there is plan for a Solr 8.1.2 bug fix release
> to resolve some of the bugs, like the SOLR-13510 basic authentication issue.
>
> Would like to check, what is the timeline like for the release?
>
> Regards,
> Edwin


Re: Sort on PointFieldType

2019-07-03 Thread Mark Sholund
My thought is that “greater than” and “less than” are generally undefined for 
n-dimensional points where n>1.
Is (45,45) > (-45,-45)?  If you’re talking about distance from (0,0) they’re 
“equal”. If you’re talking about distance from some arbitrary point then they 
are not necessarily “equal”; what would make one sort higher/lower?
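For that reason, the usual approach is to reduce the point to a scalar and sort on that, typically distance from a chosen reference point. For a spatial field this looks something like the following; the `store` field name and the coordinates are illustrative, not from the original question:

```
q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc
```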

On Wed, Jul 3, 2019 at 2:50 PM, Prince Manohar  
wrote:

> Hi,
> I have a *field* that is of *PointType *and I tried to sort on that field.
> But looks like sorting does not work on PointType.
> Or am I doing something wrong?
> Find my query below:-
> http://localhost:8983/solr/testcollection/select?indent=on&q=*:*&sort=abc.pqr_d
> DESC&wt=json
> 
>
> --
> *Regards,*
> *Prince Manohar*
> *B.Tech (Information Technology)*
> *Bengaluru*
> *+91 7797045315*

Re: Release of Solr 8.1.2 bug fix

2019-07-03 Thread Zheng Lin Edwin Yeo
Hi Jason,

Ok understand. Thanks for the info.

Regards,
Edwin

On Thu, 4 Jul 2019 at 03:13, Jason Gerlowski  wrote:

> Hi Edwin,
>
> Solr releases can be a messy process.  They're subject to a lot of
> unforeseen issues that can drag the process out: test failures
> springing up at the last minute, other committers asking to squeeze in
> last minute fixes, infrastructure problems cropping up unexpectedly,
> etc.  So release-managers rarely offer timelines for when they'll be
> able to finish a release.
>
> Cao Manh Dat has volunteered to do the release, and is actively
> working on it.  And all of the bugs have been merged that committers
> asked Dat to wait for.  Beyond that, there's no real timeline.
> Hopefully it'll be soon, but not necessarily.
>
> Best,
>
> Jason
>
> On Wed, Jul 3, 2019 at 2:34 AM Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi,
> >
> > I understand that currently there is plan for a Solr 8.1.2 bug fix
> release
> > to resolve some of the bugs, like the SOLR-13510 basic authentication
> issue.
> >
> > Would like to check, what is the timeline like for the release?
> >
> > Regards,
> > Edwin
>


Re: Faceting with Stats

2019-07-03 Thread Ahmed Adel
Hi,

As per the documentation recommendation of using pivot with stats component
instead (
https://lucene.apache.org/solr/guide/8_1/faceting.html#combining-stats-component-with-pivots),
replacing the stats options that were previously used with the newer pivot
options as follows:

q: *
stats=true
stats.field={!tag=piv1 mean=true}average_rating_f
facet=true
facet.pivot={!stats=piv1}author_s

returns the following error:

Bad Message 400
reason: Illegal character SPACE=' '

This looks like a syntax issue rather than a logical one, however. Any
thoughts on what could be missing would be appreciated.
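The `Bad Message 400 / Illegal character SPACE` is the servlet container rejecting a literal space in the request line, so the local-params value has to be percent-encoded before it goes into the URL. A minimal sketch using the JDK's URLEncoder (which encodes a space as `+`, also accepted in query strings); the class name is made up for illustration:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeParam {
    public static void main(String[] args) {
        // Local-params values contain '{', '!', '=' and a space, none of which
        // may appear raw in a URL query string.
        String raw = "{!tag=piv1 mean=true}average_rating_f";
        String encoded = URLEncoder.encode(raw, StandardCharsets.UTF_8);
        System.out.println("stats.field=" + encoded);
        // prints: stats.field=%7B%21tag%3Dpiv1+mean%3Dtrue%7Daverage_rating_f
    }
}
```

Most Solr clients (SolrJ, curl with --data-urlencode) do this encoding automatically; the error usually means the parameter was pasted into a browser or a hand-built URL.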

Thanks,
A. Adel

On Tue, Jul 2, 2019 at 4:38 PM Ahmed Adel  wrote:

> Hi,
>
> How can the stats field value be calculated for the top facet values? In
> other words, the following request parameters should return the stats.field
> measures for facets sorted by count:
>
> q: *
> wt: json
> stats: true
> stats.facet: authors_s
> stats.field: average_rating_f
> facet.missing: true
> f.authors_s.facet.sort: count
>
> However, the response is not sorted by facet field count. Is there
> something missing?
>
> Best,
> A.
>


RE: tlog/commit questions

2019-07-03 Thread Avi Steiner
Thanks for your reply, Erick.

1. Unfortunately, we got those incidents after a long time, and the relevant log 
files had already been rolled, so I couldn't find commit failure messages. But 
since I found OOM messages in other logs, I can guess that was the root cause.
2. Just to be sure I understand: on every soft commit, a new searcher is opened 
and all caches are warmed again. Doesn't that impact performance (memory, I/O, 
CPU)? Is there a best practice?
3. I think I used the wrong term. We saw cases where the tlog files were huge 
(more than 1 GB). We tried changing some settings and restarting Solr, but it took 
a long time to load, I guess because of these tlog files. So again, is there 
a way Solr can do this asynchronously and not suspend loading?
4. We can't currently upgrade to a version greater than 7.3.1. The question is 
whether something in commit/tlog management was improved up to 7.3.1.

Thanks again.


-Original Message-
From: Erick Erickson 
Sent: Wednesday, July 3, 2019 6:42 PM
To: solr-user@lucene.apache.org
Subject: Re: tlog/commit questions

External Email: Don’t open links/attachments from untrusted senders


Let’s take this a piece at a time.


1. commit failures are very rare, in fact the only time I’ve seen them is when 
running out of disk space, OOMs, pulling the plug, etc. Look in your log files, 
is there any evidence of same?

2. OOM messages. To support Real Time Get, internal structures are kept for all 
docs that have been indexed but no searcher has been opened to make visible. So 
you’re collecting up to 30 minutes of updates. This _may_ be relevant to your 
OOM problem. So I’d recommend dropping your soft commit interval to maybe 5 
minutes.

3. Tlogs shouldn’t replay much. They only replay when Solr quits abnormally, 
OOM, kill -9, pull the plug etc. When Solr is shut down gracefully, i.e. 
“bin/solr stop” etc, it should commit before closing and should _not_ replay 
anything from the tlog. Of course you should stop indexing while shutting down 
Solr…


4. There are lots of improvements in Solr 7x. Go to the latest Solr version 
(7.7.2) rather than 7.3.1. That said, TMP has been around for a long, long 
time. The low-level process of merging segments hasn’t been changed. One thing 
that _has_ changed is that TMP will now respect the max segment size (5G) when 
optimizing or doing an expungeDeletes. And I strongly recommend that you do 
neither of those unless you demonstrate need, just mentioning in case you 
already do that.
7.4-: 
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
7.5+: https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/

All in all, I’d recommend getting to the bottom of your OOM issues. Absent 
abnormal termination, tlog replay really shouldn’t be happening. Are you 
totally sure that it was TLOG replay and not a full sync from the leader?

Best,
Erick

> On Jul 3, 2019, at 12:36 AM, Avi Steiner  wrote:
>
> Hi
>
> We had some cases with customers (Solr 5.3.1, one search node, one shard) 
> with huge tlog files (more than 1 GB).
>
> Our settings:
>
> 
> 
>1
>3 
>false 
>
>
>
>180  
>
>
>${solr.data.dir:}
>
>  
>
> I don't have enough logs so I don't know if commit failed or not. I just 
> remember there were OOM messages.
>
> As you may know, during restart, Solr tries to replay from tlog. It may take 
> a lot of time. I tried to move the files to other location, started Solr and 
> only after the core was loaded, I moved tlog back to their original location. 
> They were cleared after a while.
>
> So I have few questions:
>
>  1.  Do you have any idea for commit failures?
>  2.  Should we decrease the maxTime for hard commit or any other settings?
>  3.  Is there any way to replay tlog asynchronously (or disable it, so we 
> will be able to call it programmatically from our code in a separate thread), 
> so Solr will be loaded more quickly?
>  4.  Is there any improvement in Solr 7.3.1?
>
> Thanks in advance
>
> Avi
>
>