binshengliu opened a new issue, #14137:
URL: https://github.com/apache/lucene/issues/14137

   ### Description
   
   Hi, I'd like to report an issue when using `FixedShingleFilter` with `WordDelimiterGraphFilter`. An exception is raised under the following conditions:
   * The tokenizer produces 1 token
   * `WordDelimiterGraphFilter` splits it into multiple tokens
   * `FixedShingleFilter` is used
   
   I ran into the issue when using Elasticsearch's [search_as_you_type](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html), which uses `FixedShingleFilter`.
   ```
   Exception in thread "main" java.lang.IllegalArgumentException: first position increment must be > 0 (got 0) for field 'contents'
           at org.apache.lucene.index.IndexingChain$PerField.invertTokenStream(IndexingChain.java:1232)
           at org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1196)
           at org.apache.lucene.index.IndexingChain.processField(IndexingChain.java:741)
           at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:618)
           at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:274)
           at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425)
           at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1552)
           at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1837)
           at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1477)
           at org.apache.lucene.demo.IndexFiles.indexDoc(IndexFiles.java:283)
           at org.apache.lucene.demo.IndexFiles.indexDocs(IndexFiles.java:234)
           at org.apache.lucene.demo.IndexFiles.main(IndexFiles.java:167)
   ```
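
The exception message says that the first token reaching the indexer carried a position increment of 0. To illustrate why that is rejected, here is a minimal, standard-library-only sketch of how position increments turn into absolute positions. It is a toy model, not Lucene's actual code: the class `PositionIncrementModel` and the method `toPositions` are hypothetical names, and the assumption that positions start at -1 before the first token is part of the model.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of position bookkeeping; NOT Lucene's implementation. */
public class PositionIncrementModel {

    /**
     * Converts per-token position increments into absolute positions.
     * Assumption of this sketch: the running position starts at -1, so the
     * first token needs an increment of at least 1 to land on position 0.
     */
    public static List<Integer> toPositions(int[] increments) {
        List<Integer> positions = new ArrayList<>();
        int position = -1;
        for (int inc : increments) {
            if (inc < 0) {
                throw new IllegalArgumentException(
                    "position increment must be >= 0 (got " + inc + ")");
            }
            position += inc;
            if (position < 0) {
                // Only reachable when the very first increment is 0.
                throw new IllegalArgumentException(
                    "first position increment must be > 0 (got " + inc + ")");
            }
            positions.add(position);
        }
        return positions;
    }

    public static void main(String[] args) {
        // A stacked token (increment 0) after a normal first token is fine.
        System.out.println(toPositions(new int[] {1, 0, 1})); // prints [0, 0, 1]

        // A stream whose first token has increment 0 is rejected.
        try {
            toPositions(new int[] {0, 1});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model an increment of 0 is valid for any token except the first, which matches the error above: the analysis chain appears to emit a leading token with increment 0.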
   
   This is the change to `IndexFiles.java` that can trigger the exception, tested on commit c20e09e62f4.
   ```
   diff --git a/lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java b/lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java
   index dca01f61254..e39f6e440d4 100644
   --- a/lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java
   +++ b/lucene/demo/src/java/org/apache/lucene/demo/IndexFiles.java
   @@ -30,7 +30,11 @@ import java.nio.file.attribute.BasicFileAttributes;
    import java.util.Date;
    import java.util.Objects;
    import org.apache.lucene.analysis.Analyzer;
   -import org.apache.lucene.analysis.standard.StandardAnalyzer;
   +import org.apache.lucene.analysis.core.FlattenGraphFilterFactory;
   +import org.apache.lucene.analysis.custom.CustomAnalyzer;
   +import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilterFactory;
   +import org.apache.lucene.analysis.shingle.FixedShingleFilterFactory;
   +import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
    import org.apache.lucene.demo.knn.DemoEmbeddings;
    import org.apache.lucene.demo.knn.KnnVectorDict;
    import org.apache.lucene.document.Document;
   @@ -126,7 +130,12 @@ public class IndexFiles implements AutoCloseable {
          System.out.println("Indexing to directory '" + indexPath + "'...");
    
          Directory dir = FSDirectory.open(Paths.get(indexPath));
   -      Analyzer analyzer = new StandardAnalyzer();
   +      Analyzer analyzer = CustomAnalyzer.builder()
   +              .withTokenizer(StandardTokenizerFactory.NAME)
   +              .addTokenFilter(WordDelimiterGraphFilterFactory.NAME, "catenateNumbers", "1")
   +              .addTokenFilter(FlattenGraphFilterFactory.NAME)
   +              .addTokenFilter(FixedShingleFilterFactory.NAME)
   +              .build();
          IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    
          if (create) {
   ```
   
   Test data:
   I have a file with the following content, which is then fed to `IndexFiles`.
   ```
   555,0
   ```
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

