Solr datePointField facet

2020-02-25 Thread Srinivas Kashyap
Hi all,

I have a date field in my schema, and I'm trying to facet on that field and
getting the error below:



I'm copying this field to a text field (copyField) as well.



Error:
Can't facet on a PointField without docValues

I tried adding like below:





And after the changes, I did full reindex of the core and restarted as well.

But still facing the same error. Can somebody please help.

Thanks,
Srinivas
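[For reference, the usual fix for "Can't facet on a PointField without docValues" is to declare docValues on the point-based date field itself (the copyField target doesn't matter for faceting) and then fully reindex. A sketch with illustrative field/type names:]

```xml
<!-- illustrative names; the key attribute is docValues="true" on the pdate field -->
<field name="my_date" type="pdate" indexed="true" stored="true"
       docValues="true" multiValued="true"/>
```

Note that docValues cannot be flipped on an existing index in place: reindexing over old segments can leave the error, so index into a clean core/index directory after changing the field definition.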




DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.

Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast Ltd, an innovator in Software as a Service 
(SaaS) for business. Providing a safer and more useful place for your human 
generated data. Specializing in; Security, archiving and compliance. To find 
out more visit the Mimecast website.


Re: Is it possible to add stemming in a text_exact field

2020-02-25 Thread Paras Lehana
Hi Dhanesh,

Use KeywordRepeatFilterFactory.
It will emit each token twice, marking one of them as KEYWORD so that
stemming won't be applied to that copy. Use RemoveDuplicatesTokenFilterFactory
to remove the duplicates after this.
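A sketch of such an analysis chain (the type name and tokenizer choice are illustrative, and KStem is just one stemmer option):

```xml
<fieldType name="text_exact_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit each token twice; the KEYWORD-flagged copy bypasses the stemmer -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <!-- drop tokens the stemmer left unchanged (exact duplicates) -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain, "restaurants" is indexed both as-is and as "restaurant", so exact and stemmed queries both match; RemoveDuplicates collapses tokens the stemmer left unchanged, so the index barely grows.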

On Fri, 24 Jan 2020 at 17:13, Lucky Sharma  wrote:

> Hi Dhanesh,
> I have also encountered the problem long back when we have 'skimmed milk'
> and need to search for 'skim milk', for that we have written one filter,
> such that we can customize it, and then use KStemmer, then apply the custom
> ConcatPhraseFilterFactory.
>
> You can use the link mentioned below to review:
> https://github.com/MighTguY/solr-extensions
>
> Regards,
> Lucky Sharma
>
> On Thu, 23 Jan, 2020, 8:58 pm Alessandro Benedetti, 
> wrote:
>
> > Edward is correct; furthermore, using a stemmer in an analysis chain that
> > doesn't tokenise is going to work only for single-term queries and
> > single-term field values...
> > Not sure if that was intended...
> >
> > Cheers
> >
> >
> > --
> > Alessandro Benedetti
> > Search Consultant, R&D Software Engineer, Director
> > www.sease.io
> >
> >
> > On Wed, 22 Jan 2020 at 16:26, Edward Ribeiro 
> > wrote:
> >
> > > Hi,
> > >
> > > One possible solution would be to create a second field (e.g.,
> > > text_general) that uses DefaultTokenizer, or other tokenizer that
> breaks
> > > the string into tokens, and use a copyField to copy the content from
> > > text_exact to text_general. Then, you can use edismax parser to search
> > both
> > > fields, but giving text_exact a higher boost (qf=text_exact^5
> > > text_general). In this case, both fields should be indexed, but only
> one
> > > needs to be stored.
> > >
> > > Edward
> > >
> > > On Wed, Jan 22, 2020 at 10:34 AM Dhanesh Radhakrishnan <
> > dhan...@hifx.co.in
> > > >
> > > wrote:
> > >
> > > > Hello,
> > > > I'm facing an issue with stemming.
> > > > My search query is "restaurant dubai" and it returns results.
> > > > If I search "restaurants dubai" it returns no data.
> > > >
> > > > How can I stem "restaurants dubai" so that it matches "restaurant dubai"?
> > > >
> > > > I'm using a text exact field for search.
> > > >
> > > >  > > > multiValued="true" omitNorms="false"
> omitTermFreqAndPositions="false"/>
> > > >
> > > > Here is the field definition
> > > >
> > > >  > > > positionIncrementGap="100">
> > > > 
> > > >
> > > >
> > > >
> > > >
> > > > 
> > > > 
> > > >   
> > > >   
> > > >   
> > > >   
> > > >
> > > > 
> > > >
> > > > Is there any solution without changing the tokenizer class?
> > > >
> > > >
> > > >
> > > >
> > > > Dhanesh S.R
> > > >
> > > > --
> > > > IMPORTANT: This is an e-mail from HiFX IT Media Services Pvt. Ltd.
> Its
> > > > content are confidential to the intended recipient. If you are not
> the
> > > > intended recipient, be advised that you have received this e-mail in
> > > error
> > > > and that any use, dissemination, forwarding, printing or copying of
> > this
> > > > e-mail is strictly prohibited. It may not be disclosed to or used by
> > > > anyone
> > > > other than its intended recipient, nor may it be copied in any way.
> If
> > > > received in error, please email a reply to the sender, then delete it
> > > from
> > > > your system.
> > > >
> > > > Although this e-mail has been scanned for viruses, HiFX
> > > > cannot ultimately accept any responsibility for viruses and it is
> your
> > > > responsibility to scan attachments (if any).
> > > >
> > > > ​Before you print this email
> > > > or attachments, please consider the negative environmental impacts
> > > > associated with printing.
> > > >
> > >
> >
>


-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Solr console showing error in 7.7

2020-02-25 Thread Paras Lehana
Please post the full error, ideally with the stack trace (see the logs).

On Mon, 20 Jan 2020 at 22:29, Rajdeep Sahoo 
wrote:

> When reloading the Solr console, it shows an error in the console
> itself for a short amount of time.
> The error is "error reloading/initialising the core".
>




Re: Solr datePointField facet

2020-02-25 Thread Paras Lehana
Hi Srinivas,

> But still facing the same error.


The same error? Can you please post the facet query? Please post (part of)
your schema too.

On Tue, 25 Feb 2020 at 16:00, Srinivas Kashyap
 wrote:

> Hi all,
>
> I have a date field in my schema and I'm trying to facet on that field and
> getting below error:
>
>  omitTermFreqAndPositions="true"  multiValued="true" />
>
> This field I'm copying to text field(copyfield) as well.
>
> 
>
> Error:
> Can't facet on a PointField without docValues
>
> I tried adding like below:
>
> 
>
>  omitTermFreqAndPositions="true"  multiValued="true" />
>
> And after the changes, I did full reindex of the core and restarted as
> well.
>
> But still facing the same error. Can somebody please help.
>
> Thanks,
> Srinivas
>
>
>
> 
>




RE: Solr datePointField facet

2020-02-25 Thread Srinivas Kashyap
Hi Paras,

PFB details:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://localhost:8983/tssindex/party: SolrCore is loading
	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
	at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
	at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)

SCHEMA FILE:

[The schema.xml was pasted here, but its XML was stripped by the mail
archive; only the uniqueKey value, "ID", survives.]

Thanks and Regards,
Srinivas Kashyap

From: Paras Lehana 
Sent: 25 February 2020 16:33
To: solr-user@lucene.apache.org
Subject: Re: Solr datePointField facet

Hi Srinivas,

But still facing the same error.


The same error? Can you please post

Re: Re: Query Autocomplete Evaluation

2020-02-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Thank you, Walter & Paras! 

So, from the MRR equation, I was under the impression the suggestions all 
needed a binary label (0,1) indicating relevance.* But it's great to know that 
you guys use proxies for relevance, such as clicks.

*The reason I think MRR has to have binary relevance labels is this Wikipedia 
article: https://en.wikipedia.org/wiki/Mean_reciprocal_rank, where it states 
below the formula that rank_i = "refers to the rank position of the first 
relevant document for the i-th query." If the suggestions are not labeled as 
relevant (1) or not relevant (0), then how do you compute the rank of the first 
RELEVANT document? 

I'll check out these readings asap, thank you!

And @Paras, the third and fourth evaluation metrics you listed in your first 
reply seem the same to me. What is the difference between the two?

Best,
Audrey
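[For what it's worth, Paras's position-count MRR (quoted below) needs no separate relevance labels: the clicked suggestion is treated as the single relevant item for its query. A self-contained sketch using the counts from the thread; the class and method names are mine:]

```java
public class MrrExample {

    // Mean reciprocal rank from per-position selection counts
    // (index 0 = position 1, index 1 = position 2, ...).
    static double mrr(long[] selectionsByPosition) {
        double reciprocalSum = 0, totalSelections = 0;
        for (int i = 0; i < selectionsByPosition.length; i++) {
            reciprocalSum += selectionsByPosition[i] / (double) (i + 1);
            totalSelections += selectionsByPosition[i];
        }
        return reciprocalSum / totalSelections;
    }

    public static void main(String[] args) {
        // Position-selection counts from Paras's message
        long[] counts = {107699, 58736, 23507, 12250, 7980,
                         5653, 4193, 3511, 2997, 2428};
        System.out.printf("MRR = %.2f%%%n", mrr(counts) * 100);
    }
}
```

This prints MRR = 66.44%; the 66.45% in the thread presumably comes from rounding intermediate terms.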

On 2/25/20, 1:11 AM, "Walter Underwood"  wrote:

Here is a blog article with a worked example for MRR based on customer 
clicks.


https://observer.wunderwood.org/2016/09/12/measuring-search-relevance-with-mrr/

At my place of work, we compare the CTR and MRR of queries using 
suggestions to those that do not use suggestions. Solr autosuggest based on 
lexicon of book titles is highly effective for us.

wunder
Walter Underwood
wun...@wunderwood.org

http://observer.wunderwood.org/  (my blog)

> On Feb 24, 2020, at 9:52 PM, Paras Lehana  
wrote:
> 
> Hey Audrey,
> 
> I assume MRR is about the ranking of the intended suggestion. For this, no
> human judgement is required. We track position selection - the position
> (1-10) of the selected suggestion. For example, this is our recent 
numbers:
> 
> Position 1 Selected (B3) 107,699
> Position 2 Selected (B4) 58,736
> Position 3 Selected (B5) 23,507
> Position 4 Selected (B6) 12,250
> Position 5 Selected (B7) 7,980
> Position 6 Selected (B8) 5,653
> Position 7 Selected (B9) 4,193
> Position 8 Selected (B10) 3,511
> Position 9 Selected (B11) 2,997
> Position 10 Selected (B12) 2,428
> *Total Selections (B13)* *228,954*
> MRR = (B3+B4/2+B5/3+B6/4+B7/5+B8/6+B9/7+B10/8+B11/9+B12/10)/B13 = 66.45%
> 
> Refer here for MRR calculation keeping Auto-Suggest in perspective:
> 
https://medium.com/@dtunkelang/evaluating-search-measuring-searcher-behavior-5f8347619eb0
> 
> "In practice, this is inverted to obtain the reciprocal rank, e.g., if the
> searcher clicks on the 4th result, the reciprocal rank is 0.25. The 
average
> of these reciprocal ranks is called the mean reciprocal rank (MRR)."
> 
> nDCG may require human intervention. Please let me know in case I have not
> understood your question properly. :)
> 
> 
> 
> On Mon, 24 Feb 2020 at 20:49, Audrey Lorberfeld - 
audrey.lorberf...@ibm.com
>  wrote:
> 
>> Hi Paras,
>> 
>> This is SO helpful, thank you. Quick question about your MRR metric -- do
>> you have binary human judgements for your suggestions? If no, how do you
>> label suggestions successful or not?
>> 
>> Best,
>> Audrey
>> 
>> On 2/24/20, 2:27 AM, "Paras Lehana"  wrote:
>> 
>>Hi Audrey,
>> 
>>I work for Auto-Suggest at IndiaMART. Although we don't use the
>> Suggester
>>component, I think you need evaluation metrics for Auto-Suggest as a
>>business product and not specifically for Solr Suggester which is the
>>backend. We use edismax parser with EdgeNGrams Tokenization.
>> 
>>Every week, as the property owner, I report around 500 metrics. I 
would
>>like to mention a few of those:
>> 
>>   1. MRR (Mean Reciprocal Rate): How high the user selection was
>> among the
>>   returned result. Ranges from 0 to 1, the higher the better.
>>   2. APL (Average Prefix Length): Prefix is the query by user. Lesser
>> the
>>   better. This reports how less an average user has to type for
>> getting the
>>   intended suggestion.
>>   3. Acceptance Rate or Selection: How many of the total searches are
>>   being served from Auto-Suggest. We are around 50%.
>>   4. Selection to Display Rati

Re: Reindex Required for Merge Policy Changes?

2020-02-25 Thread Zimmermann, Thomas
Thanks so much Erick. Sounds like this should be a perfect approach to helping 
resolve our current issue.

On 2/24/20, 6:48 PM, "Erick Erickson"  wrote:

Thomas:
Yes, upgrading to 7.5+ will automagically take advantage of the 
improvements, eventually... No, you don’t have to reindex.

The “eventually” part. As you add, and particularly replace, existing 
documents, TMP will make decisions based on the new policy. If you’ve optimized 
in the past and have a very large segment (I.e. > 5G), it’ll be rewritten when 
the number of deleted docs exceeds the threshold; I don’t remember what the 
exact number is. Point is it’ll recover from having an over-large segment over 
time and _eventually_ the largest segment will be < 5G.

Absent a previous optimize making a large segment, I’d just consider 
optimizing after you’ve upgraded. The TMP revisions respect the max segment 
size, so that should purge all deleted documents from your index without 
creating a too-large one. Thereafter the number of deleted docs should remain < 
about 33%. It only really approaches that percentage when you’re updating lots 
of existing docs.

Finally, expungeDeletes is less expensive than optimize because it doesn’t 
rewrite segments with 10% deleted docs so that’s an alternative to optimizing 
after upgrading.
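[For reference, expungeDeletes is a parameter on an ordinary commit request; one way to issue it, with the URL and core name as placeholders:]

```
curl 'http://localhost:8983/solr/<core>/update?commit=true&expungeDeletes=true'
```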


Best,
Erick

> On Feb 24, 2020, at 5:42 PM, Zimmermann, Thomas 
 wrote:
> 
> Hi Folks –
> 
> A few questions before I tackle an upgrade here. Looking to go from 7.4 to 
7.7.2 to take advantage of the improved Tiered Merge Policy and segment cleanup 
– we are dealing with some high (45%) deleted doc counts in a few cores. Would 
simply upgrading Solr and setting the cores to use Lucene 7.7.2 take advantage 
of these features? Would I need to reindex to get existing segments merged more 
efficiently? Does it depend on the size of my current segments vs the 
configuration of the merge policy or would upgrading simply allow solr to do 
its own thing help mitigate this issue?
> 
> Also – I noticed the 7.5+ defaults to the Autoscaling for replication, 
and 8.0 defaults to legacy. Would I potentially need to make changes to my 
existing configs to ensure they stay on Legacy replication?
> 
> Thanks much!
> TZ
> 
> 
> 




Re: Re: Re: Query Autocomplete Evaluation

2020-02-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
This article http://wwwconference.org/proceedings/www2011/proceedings/p107.pdf 
also indicates that MRR needs binary relevance labels, p. 114: "To this end, we 
selected a random sample of 198 (query, context) pairs from the set of 7,311 
pairs, and manually tagged each of them as related (i.e., the query is related 
to the context; 60% of the pairs) and unrelated (40% of the pairs)."

On 2/25/20, 10:25 AM, "Audrey Lorberfeld - audrey.lorberf...@ibm.com" 
 wrote:

Thank you, Walter & Paras! 

So, from the MRR equation, I was under the impression the suggestions all 
needed a binary label (0,1) indicating relevance.* But it's great to know that 
you guys use proxies for relevance, such as clicks.

*The reason I think MRR has to have binary relevance labels is this 
Wikipedia article: 
https://en.wikipedia.org/wiki/Mean_reciprocal_rank, where it states below the formula that rank_i = "refers to the rank position 
of the first relevant document for the i-th query." If the suggestions are not 
labeled as relevant (1) or not relevant (0), then how do you compute the rank 
of the first RELEVANT document? 

I'll check out these readings asap, thank you!

And @Paras, the third and fourth evaluation metrics you listed in your 
first reply seem the same to me. What is the difference between the two?

Best,
Audrey


Need Help in Apache SOLR scores logic

2020-02-25 Thread Karthik Reddy
Hello Team,

How are you? This is Karthik Reddy and I am working as a software
developer. I have a question regarding Solr scores. On one of the projects
I am working on, we are using Apache Solr.
We were using Solr 5.4.1 initially and then migrated to Solr 8.4.1. After
the migration, I see that the score returned by Solr has changed (I first
noticed this on 8.2.0). I would like to use the same scoring logic as Solr
5.4.1. Could you please tell me what configuration I should change in Solr
8.4.1 to get the same scores as version 5.4.1? Thanks in advance.



Regards
Karthik Reddy


Optimize Solr 8.4.1

2020-02-25 Thread Massimiliano Randazzo
Good morning,

recently I went from version 6.4 to version 8.4.1. I access Solr through
Java applications written by me, which I have updated to the
solr-solrj-8.4.1.jar libraries.

I am performing OCR indexing of a newspaper of about 550,000 pages in
production, for which I have calculated at least 1,000,000,000 words, and I
am experiencing slowness. I wanted to know if you could advise me on
changes to the configuration.

The server I'm using is a server with 12 cores and 64GB of Ram, the only
changes I made in the configuration are:
In the solr.in.sh file:
SOLR_HEAP="20480m"
SOLR_JAVA_MEM="-Xms20480m -Xmx20480m"
GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"
The Java version I use is
java version "1.8.0_51"
Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)

Also, comparing the Solr web interfaces, I noticed a difference on the
"Overview" page: in Solr 6.4 it showed Optimized and Current and allowed me
to launch Optimize if necessary, while in 8.4.1 Optimized is no longer
present. I assume this activity is now done with the commit or through some
background operation; if so, is it still necessary to run the Optimize
command from my application when I have finished indexing? I noticed that
Optimize requires considerable time and resources, especially on large
indexes.

Thank you for your attention --
Sent from Gmail Mobile


How to check for uncommitted changes

2020-02-25 Thread Connor Howington
Is there a request I can make to Solr from a client to tell me whether a
core has any uncommitted changes?

Thanks,
Connor

--

*Connor Howington*
*Associate Research Programmer*
Center for Research Computing (CRC)
University of Notre Dame
crc.nd.edu

832M Flanner Hall
Notre Dame, IN 46556



Why does Solr sort on _docid_ with rows=0 ?

2020-02-25 Thread S G
Hi,

I see a lot of such queries in my Solr 7.6.0 logs:


path=/select
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2}
hits=287128180 status=0 QTime=7173
From some searching, this seems to be the code that fires the above:
https://github.com/apache/lucene-solr/blob/f80e8e11672d31c6e12069d2bd12a28b92e5a336/solr/solrj/src/java/org/apache/solr/client/solrj/impl/LBSolrClient.java#L89-L101

Can someone explain why Solr is doing this?
Note that "hits" is a very large value, which could be impacting
performance.

If you want to check a zombie server, shouldn't there be a much less
expensive way to do a health-check instead?

Thanks
SG


How to use existing SolrClient with Streaming

2020-02-25 Thread sambasivarao giddaluri
Hi All ,

I have created a SolrClient bean and am checking how to use it with SolrStream.

@Configuration(proxyBeanMethods = false)
public class SolrConfiguration {

    @Bean
    public SolrClient solrClient() {
        String solrBaseUrl = "http://***";
        return new Http2SolrClient.Builder(solrBaseUrl).build();
    }
}

And a separate streaming class, e.g.:

public List<Tuple> streamQuery(String expr) {
    List<Tuple> tuples = null;
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("expr", expr);
    params.set("qt", "/stream");
    TupleStream tupleStream = new SolrStream("http://***", params);
    StreamContext context = new StreamContext();
    tupleStream.setStreamContext(context);
    tuples = getTuples(tupleStream);
    return tuples;
}

This works, but is there any other way to use the existing SolrClient? I
don't have a ZooKeeper setup as of now.

Regards

sambasiva


Re: How to use existing SolrClient with Streaming

2020-02-25 Thread sambasivarao giddaluri
during SolrStream initialization  i had to pass the URL again rather would
like to see if i can get it by any other way .

>


Rule of thumb for determining maxTime of AutoCommit

2020-02-25 Thread Kayak28
Hello, Solr Community:

The other day, I had the error "exceeded limit of maxWarmingSearchers=2."
I know this error occurs when multiple commits (which open a new searcher)
are requested too frequently.

As far as I can tell from the Solr wiki, it recommends leaving more time
between commits, i.e. committing less frequently.
Using autoCommit, I would like to decrease the commit frequency, but I am
not sure how much I should increase the value of maxTime in autoCommit.

My current configuration is the following:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>



How do you determine how much you increase the value in this case?
Is there any rule of thumb advice to configure commit frequency?

Any help will be appreciated.

Sincerely,
Kaya Ota
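[For reference: with openSearcher=false, hard autoCommit never opens a searcher, so the maxWarmingSearchers error usually points at frequent soft commits or explicit commits from clients. A common starting point is to keep hard commits frequent for durability and stretch the searcher-opening (soft) commits to the longest interval the application can tolerate; values here are illustrative:]

```xml
<autoCommit>
  <!-- hard commit: flushes to disk, does not open a searcher -->
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <!-- soft commit: opens a new searcher; make this as long as freshness allows -->
  <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>
```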


RE: How to use existing SolrClient with Streaming

2020-02-25 Thread Gael Jourdan-Weil
Hello,

If I understand correctly, you want to share a SolrClient between streaming
code and another piece of non-streaming code?

I think you can have a look to the SolrClientCache class.
Instantiate a SolrClientCache once (as a Bean in your case I guess).

Then you can use it for both:

  *   getting a usual SolrClient instance (or a SolrCloud one, if you someday want to)
  *   passing it to the StreamContext with streamContext.setSolrClientCache(solrClientCache)
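[A minimal sketch of that wiring; it assumes solr-solrj on the classpath, and the class name, URL, and expression handling are illustrative:]

```java
import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.stream.SolrStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.common.params.ModifiableSolrParams;

public class StreamingService {

    // One cache per application; it creates and reuses SolrClients per URL.
    private final SolrClientCache solrClientCache = new SolrClientCache();

    public void runStream(String baseUrl, String expr) throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("expr", expr);
        params.set("qt", "/stream");

        SolrStream stream = new SolrStream(baseUrl, params);
        StreamContext context = new StreamContext();
        // The stream pulls its client from the shared cache instead of
        // opening a new connection per request.
        context.setSolrClientCache(solrClientCache);
        stream.setStreamContext(context);
        try {
            stream.open();
            // read tuples with stream.read() until the EOF tuple ...
        } finally {
            stream.close();
        }
    }
}
```

Remember to close the SolrClientCache on application shutdown so the underlying clients are released.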

Gaël Jourdan-Weil



Re: Need Help in Apache SOLR scores logic

2020-02-25 Thread Jon Kjær Amundsen
Relevance scoring has indeed changed since Solr 6 from the tf/idf vector
model to Okapi BM25.
You will need to set the similarity to ClassicSimilarityFactory in the
schema.

Consult the reference guide[1] for how to do it.

[1]
https://lucene.apache.org/solr/guide/8_4/other-schema-elements.html#similarity
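Concretely, that is a one-line change at the top level of schema.xml (or the managed schema):

```xml
<similarity class="solr.ClassicSimilarityFactory"/>
```

Reindexing afterwards is advisable, since per-field norms are encoded differently between the two similarities.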

Venlig hilsen/Best regards

*Jon Kjær Amundsen*
Developer


Phone: +45 7023 9080
E-mail: j...@udbudsvagten.dk
Web: www.udbudsvagten.dk
Parken - Tårn D - 8. Sal
Øster Allé 48 | DK - 2100 København



Intelligent Public Procurement
*Before, during and after tenders*

*Follow UdbudsVagten and the market here: LinkedIn*


Den tir. 25. feb. 2020 kl. 18.24 skrev Karthik Reddy :

> Hello Team,
>
> How are you? This is Karthik Reddy and I am working as a Software
> Developer. I have one question regarding SOLR scores. One of the projects,
> which I am working on we are using Lucene Apache SOLR.
> We were using SOLR 5.4.1 initially and then migrated to SOLR 8.4.1. After
> migration, I do see the score which is returned by SOLR is got changed in
> 8.2.0. I would like to use the same score logic as SOLR 5.4.1. Could you
> please help what configuration should I change in SOLR 8.4.1 to get the
> same scores as version 5.4.1. Thanks in advance.
>
>
>
> Regards
> Karthik Reddy
>