Re: WordDelimiterFilter loses position increments of tokens

Yonik Seeley Wed, 05 Jul 2006 07:28:11 -0700

On 7/5/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> So fixing the first token at the end of next() and also at the other
> exit point (line 276) is probably the easiest fix.


Something like this I suppose:

Index: src/java/org/apache/solr/analysis/WordDelimiterFilter.java
===================================================================
--- src/java/org/apache/solr/analysis/WordDelimiterFilter.java  (revision 417024)
+++ src/java/org/apache/solr/analysis/WordDelimiterFilter.java  (working copy)
@@ -170,6 +170,7 @@
    // Would it actually be faster to check for the common form
    // of isLetter() isLower()*, and then backtrack if it doesn't match?

+    int origPosOffset;
    while(true) {
      Token t = input.next();
      if (t == null) return null;
@@ -180,6 +181,8 @@
      int end=s.length();
      if (end==0) continue;

+      origPosOffset = t.getPositionIncrement();
+
      // Avoid calling charType more than once for each char (basically
      // avoid any backtracking).
      // makes code slightly more difficult, but faster.
@@ -273,6 +276,7 @@
            // optimization... if this is the only token,
            // return it immediately.
            if (queue.size()==0) {
+              newtok.setPositionIncrement(origPosOffset);
              return newtok;
            }

@@ -376,7 +380,9 @@
    // System.out.println("##########AFTER COMBINATIONS:"+ str(queue));

    queuePos=1;
-    return queue.get(0);
+    Token tok = queue.get(0);
+    tok.setPositionIncrement(origPosOffset);
+    return tok;
  }
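
To see the effect outside of Solr, here is a standalone sketch of the idea, not the actual filter code. The Tok class, split() and positions() are hypothetical stand-ins invented for illustration, and the splitting rule (break on '-') is far simpler than what WordDelimiterFilter really does. It only shows that when the first subword inherits the original token's increment, the gap left by upstream filters survives the split.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionIncrementSketch {

    /** Hypothetical stand-in for a token: a term plus its position increment. */
    static final class Tok {
        final String term;
        int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /**
     * Splits one token on '-' (a simplified rule, assumed for illustration).
     * The first part inherits the original token's increment, which is the
     * idea behind the patch; each later part advances by one position.
     */
    static List<Tok> split(Tok original) {
        List<Tok> parts = new ArrayList<>();
        for (String piece : original.term.split("-")) {
            if (piece.isEmpty()) continue;   // skip empty pieces such as in "a--b"
            parts.add(new Tok(piece, 1));
        }
        if (!parts.isEmpty()) {
            // Carry the incoming increment over to the first emitted token.
            parts.get(0).posInc = original.posInc;
        }
        return parts;
    }

    /**
     * Turns increments into absolute positions. Counting starts at -1 so
     * that an increment of 1 on the first token yields position 0.
     */
    static List<Integer> positions(List<Tok> stream) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : stream) {
            pos += t.posInc;
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        // "wi-fi" arrives with increment 3, e.g. because an upstream
        // filter removed two tokens in front of it.
        List<Tok> parts = split(new Tok("wi-fi", 3));
        for (Tok t : parts) {
            System.out.println(t.term + " +" + t.posInc);   // wi +3, then fi +1
        }
        System.out.println(positions(parts));               // [2, 3]
    }
}
```

Without the carry-over step, the first part would come out with increment 1 and the two subwords would land at positions 0 and 1, so the gap would be lost.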





-Yonik
