YeonghyeonKO opened a new issue, #13802:
URL: https://github.com/apache/lucene/issues/13802

   ### Description
   
   As of org.apache.lucene:lucene-analysis-common:9.11.1, the static variable `DEFAULT_MAX_GRAM_SIZE` of `EdgeNGramTokenizer` is 1, not 2.
   
   Logically, the maximum n-gram size only has to be >= `minGramSize`, so a value of 1 is valid. However, many libraries (**git code**: [Elasticsearch](https://github.com/elastic/elasticsearch/blob/main/modules/analysis-common/src/main/java/org/elasticsearch/analysis/common/CommonAnalysisPlugin.java#L511), [OpenSearch](https://github.com/opensearch-project/OpenSearch/blob/main/modules/analysis-common/src/main/java/org/opensearch/analysis/common/EdgeNGramTokenizerFactory.java#L54)) use `NGramTokenizer.DEFAULT_MAX_NGRAM_SIZE` rather than `EdgeNGramTokenizer.DEFAULT_MAX_GRAM_SIZE`, so I suggest changing the latter from 1 to 2.
   Would my suggestion cause any dependency problems in the Solo project?
   
   See the code below:
   
   ```java
   public class EdgeNGramTokenizer extends NGramTokenizer {
       public static final int DEFAULT_MAX_GRAM_SIZE = 1;    /* How about changing '1' to '2'? */
       public static final int DEFAULT_MIN_GRAM_SIZE = 1;

       public EdgeNGramTokenizer(int minGram, int maxGram) {
           super(minGram, maxGram, true);
       }

       public EdgeNGramTokenizer(AttributeFactory factory, int minGram, int maxGram) {
           super(factory, minGram, maxGram, true);
       }
   }
   ```
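
As a rough illustration of what the two candidate defaults mean in practice, here is a minimal standalone sketch. It does not use any Lucene classes: `EdgeNGramSketch` and `edgeNGrams` are hypothetical names made up for this example, and the method simply collects the prefixes of a string, counting Java `char`s. The real tokenizer may differ, for example in how it handles supplementary characters.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration only; this is NOT Lucene's implementation.
class EdgeNGramSketch {

    // Returns the prefixes of text whose length lies in [minGram, maxGram].
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // Stop early when the text is shorter than maxGram.
        for (int n = minGram; n <= maxGram && n <= text.length(); n++) {
            grams.add(text.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // maxGram = 1, the current EdgeNGramTokenizer default
        System.out.println(edgeNGrams("lucene", 1, 1)); // prints [l]
        // maxGram = 2, the value proposed in this issue
        System.out.println(edgeNGrams("lucene", 1, 2)); // prints [l, lu]
    }
}
```

With a maximum of 1, only the first character is emitted, so anyone relying on the `EdgeNGramTokenizer` constants gets single-character prefixes unless they override the maximum.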


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

