Re: Recommended Java Distribution
Hi Kaya,

We have been using Amazon Corretto for Solr for the past 6 months without issue. We did not notice any difference from running on OpenJDK prior to that.

Cheers
Eric

On 2020-03-19, 6:04 AM, "Jan Høydahl" wrote:

Our official statement is here https://lucene.apache.org/solr/guide/8_4/solr-system-requirements.html#sources-for-java

I have no experience with Corretto in production, but Amazon uses it heavily for all their Java workloads in the cloud. I believe it is based on OpenJDK but with Amazon’s own patches. I would not hesitate to make such a decision.

But perhaps people with first-hand experience can share what they found?

Jan

> On 19 Mar 2020, at 11:13, Kayak28 wrote:
>
> Hello, Solr Community:
>
> My customer would like to use Amazon Corretto JDK instead of OpenJDK.
>
> I wonder if it is ok to say, "yes, you can use" or I should not recommend
> it at all.
>
> Is anyone in the Community using Amazon Corretto for your Solr?
>
> Have you ever had any problems with that?
>
> If you share any experience, I would really appreciate it.
>
> --
>
> Sincerely,
> Kaya
> github: https://github.com/28kayak
FlattenGraphFilter Eliminates Tokens - Can't match "Can't"
Hi all,

I have been trying to solve an issue where FlattenGraphFilter (FGF) removes tokens produced by WordDelimiterGraphFilter (WDGF) - consequently searches that contain the contraction "can't" do not match.

This is on Solr version 7.7.1.

The field in question is defined as follows:

And the relevant fieldType "text_general":

Finally, the relevant entries in synonyms.txt are:

can,cans
cants,cant

Using the Solr console Analysis and "can't" as the Field Value, the following tokens are produced (find the verbose output at the bottom of this email):

Index
ST    | can't
SF    | can't
WDGF  | cant  | can't | can   | t
FGF   | cant  | can't | can   | t
SGF   | cants | cant  | can't |       | cans  | can   | t
ICUFF | cants | cant  | can't |       | cans  | can   | t
FGF   | cants | cant  | can't |       | t

Query
ST    | can't
SF    | can't
WDGF  | can   | t
SF    | can   | t
ICUFF | can   | t

As you can see after the FGF the tokens "can" and "cans" are pruned so the query does not match. Is there a reasonable way to preserve these tokens?

My key concern is that I want the "fix" for this to have as little impact on other queries as possible.

Some things I have checked/tried:

Searching for similar problems I found this thread:
https://lucene.472066.n3.nabble.com/Questions-for-SynonymGraphFilter-and-WordDelimiterGraphFilter-td4420154.html
Here it is suggested that FGF is not necessary (without any supporting evidence). This goes directly against the documentation that states "If you use [the SynonymGraphFilter] during indexing, you must follow it with a Flatten Graph Filter":
https://lucene.apache.org/solr/guide/7_0/filter-descriptions.html
Despite this warning I tried out removing the FGF on a local cluster and indeed it still runs and this search now works, however I am paranoid that this will break far more things than it fixes.

I have tried adding the FGF as a filter to the query. This does not eliminate the "can" term in the query analysis.

I have tested other contracted words. Some have this issue as well - others do not. "haven't", "shouldn't", "couldn't", "I'll", "weren't", "ain't" all preserve their tokens; "won't" does not. I believe the pattern here is that whenever part of the contraction has synonyms this problem manifests.

Eliminating WDGF is not viable as we rely on this functionality for other uses of delimiters (such as wi-fi -> wi fi).

Performing WDGF after synonyms is also not viable as in the case that we have the data "historical-text" we want this to match the search "history text".

The hacky solution I have found is to use the PatternReplaceFilterFactory to replace "can't" with "cant". Though this technically solves the issue, I hope it is obvious why this does not feel like an ideal solution.

Has anyone encountered this type of issue before? Any advice on how the filter use here could be improved to handle this case?

Thanks,
Eric Buss

PS. The verbose output from Analysis of "can't"

Index
ST   | text           | can't            |
     | raw_bytes      | [63 61 6e 27 74] |
     | start          | 0                |
     | end            | 5                |
     | positionLength | 1                |
     | type           |                  |
     | termFrequency  | 1                |
     | position       | 1                |
SF   | text           | can't            |
     | raw_bytes      | [63 61 6e 27 74] |
     | start          | 0                |
     | end            | 5                |
     | positionLength | 1                |
     | type           |                  |
     | termFrequency  | 1                |
     | position       | 1                |
WDGF | text           | cant          | can't            | can        | t     |
     | raw_bytes      | [63 61 6e 74] | [63 61 6e 27 74] | [63 61 6e] | [74]  |
     | start          | 0             | 0                | 0          | 4     |
     | end            | 5             | 5                | 3          | 5     |
     | positionLength | 2             | 2                | 1          | 1     |
     | type           |               |                  |            |       |
     | termFrequency  | 1             | 1                | 1          | 1     |
     | position       | 1             | 1                | 1          | 2     |
     | keyword        | false         | false            | false      | false |
FGF  | text           | cant          | can't            | can        | t     |
     | raw_bytes      | [63 61 6e 74]
Re: FlattenGraphFilter Eliminates Tokens - Can't match "Can't"
Thanks for the reply,

I wouldn't be surprised if the issue you linked is related. I also found another similar issue:
https://issues.apache.org/jira/browse/LUCENE-8723

You are absolutely right that the FlattenGraphFilter should only be used once, but as you noted the issue I am experiencing seems unrelated.

On 2019-12-05, 10:23 AM, "Michael Gibney" wrote:

I wonder if this might be similar/related to the underlying problem that is intended to be addressed by https://issues.apache.org/jira/browse/LUCENE-8985?

btw, I think you only want to use FlattenGraphFilter *once* in the indexing analysis chain, towards the end (after all components that emit graphs). ...though that's probably *not* what's causing the problem (based on the fact that the extra FGF doesn't seem to modify any attributes).

On Mon, Nov 25, 2019 at 2:19 PM Eric Buss wrote:
>
> Hi all,
>
> I have been trying to solve an issue where FlattenGraphFilter (FGF) removes
> tokens produced by WordDelimiterGraphFilter (WDGF) - consequently searches that
> contain the contraction "can't" do not match.
>
> This is on Solr version 7.7.1.
>
> The field in question is defined as follows:
>
> And the relevant fieldType "text_general":
>
> Finally, the relevant entries in synonyms.txt are:
>
> can,cans
> cants,cant
>
> Using the Solr console Analysis and "can't" as the Field Value, the following
> tokens are produced (find the verbose output at the bottom of this email):
>
> Index
> ST    | can't
> SF    | can't
> WDGF  | cant  | can't | can   | t
> FGF   | cant  | can't | can   | t
> SGF   | cants | cant  | can't |       | cans  | can   | t
> ICUFF | cants | cant  | can't |       | cans  | can   | t
> FGF   | cants | cant  | can't |       | t
>
> Query
> ST    | can't
> SF    | can't
> WDGF  | can   | t
> SF    | can   | t
> ICUFF | can   | t
>
> As you can see after the FGF the tokens "can" and "cans" are pruned so the query
> does not match. Is there a reasonable way to preserve these tokens?
>
> My key concern is that I want the "fix" for this to have as little impact on
> other queries as possible.
>
> Some things I have checked/tried:
>
> Searching for similar problems I found this thread:
> https://lucene.472066.n3.nabble.com/Questions-for-SynonymGraphFilter-and-WordDelimiterGraphFilter-td4420154.html
> Here it is suggested that FGF is not necessary (without any supporting
> evidence). This goes directly against the documentation that states "If you use
> [the SynonymGraphFilter] during indexing, you must follow it with a Flatten
> Graph Filter":
> https://lucene.apache.org/solr/guide/7_0/filter-descriptions.html
> Despite this warning I tried out removing the FGF on a local
> cluster and indeed it still runs and this search now works, however I am
> paranoid that this will break far more things than it fixes.
>
> I have tried adding the FGF as a filter to the query. This does not eliminate
> the "can" term in the query analysis.
>
> I have tested other contracted words. Some have this issue as well - others do
> not. "haven't", "shouldn't", "couldn't", "I'll", "weren't", "ain't" all
> preserve their tokens; "won't" does not. I believe the pattern here is that
> whenever part of the contraction has synonyms this problem manifests.
>
> Eliminating WDGF is not viable as we rely on this functionality for other uses
> of delimiters (such as wi-fi -> wi fi).
>
> Performing WDGF after synonyms is also not viable as in the case that we have
> the data "historical-text" we want this to match the search "history text".
>
> The hacky solution I have found is to use the PatternReplaceFilterFactory to
> replace "can't" with "cant". Though this technically solves the issue, I hope it
> is obvious why this does not feel like an ideal solution.
>
> Has anyone encount