On 7/5/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> So fixing the first token at the end of next() and also at the other
> exit point (line 276) is probably the easiest fix.
Something like this I suppose:

Index: src/java/org/apache/solr/analysis/WordDelimiterFilter.java
===================================================================
--- src/java/org/apache/solr/analysis/WordDelimiterFilter.java (revision 417024)
+++ src/java/org/apache/solr/analysis/WordDelimiterFilter.java (working copy)
@@ -170,6 +170,7 @@
     // Would it actually be faster to check for the common form
     // of isLetter() isLower()*, and then backtrack if it doesn't match?

+    int origPosOffset;
     while(true) {
       Token t = input.next();
       if (t == null) return null;
@@ -180,6 +181,8 @@
       int end=s.length();
       if (end==0) continue;

+      origPosOffset = t.getPositionIncrement();
+
       // Avoid calling charType more than once for each char (basically
       // avoid any backtracking).
       // makes code slightly more difficult, but faster.
@@ -273,6 +276,7 @@
         // optimization... if this is the only token,
         // return it immediately.
         if (queue.size()==0) {
+          newtok.setPositionIncrement(origPosOffset);
           return newtok;
         }

@@ -376,7 +380,9 @@
       // System.out.println("##########AFTER COMBINATIONS:"+ str(queue));

       queuePos=1;
-      return queue.get(0);
+      Token tok = queue.get(0);
+      tok.setPositionIncrement(origPosOffset);
+      return tok;
     }

-Yonik