On 2/15/07, nick19701 <[EMAIL PROTECTED]> wrote:


Mike Klaas wrote:
>
> Corresponds to:
>                                         startOffset =
> tokenGroup.matchStartOffset;
>                                         endOffset =
> tokenGroup.matchEndOffset;
>                                         tokenText =
> text.substring(startOffset, endOffset);
>
> where the offsets are token offsets from analysis, and should not be
> -52.  Are you using term vectors?  Is the field multi-valued?  Also,
> what version of Solr are you using?
>
> Could you copy and paste the output of verbose analysis of this text
> from the Solr admin?
>
> thanks,
> -Mike
>
>

As far as I know, I'm not using term vectors, and this field is
single-valued.
The Solr version is 1.1.0, dated 12/17/2006.

Below is the verbose analysis:

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory   {}


term position
1       2       3       4       5       6       7       8       9       10      
11      12      13


term text
Best    buy     -       Acer    Aspire  AS5610-2273     -       $599.   Windows 
vista,  1       GB      RAM


term type
word    word    word    word    word    word    word    word    word    word    
word    word    word


source start,end
0,4     5,8     9,10    11,15   16,22   23,34   35,36   37,42   43,50   51,57   
58,59   60,62   63,66


org.apache.solr.analysis.SynonymFilterFactory   {expand=true,
ignoreCase=true, synonyms=index_synonyms.txt}


term position
1       2       3       4       5       6       7       8       9       10      
11      12      13


term text
bestbuy buy     -       Acer    Aspire  AS5610-2273     -       $599.   Windows 
vista,  1       GB      RAM


bb      gib

best    gigabyte

gigabytes

term type
word    word    word    word    word    word    word    word    word    word    
word    word    word


word    word

word    word

word

source start,end
0,8     0,8     9,10    11,15   16,22   23,34   35,36   37,42   43,50   51,57   
58,59   60,8    63,66


0,8     60,8

0,8     60,8

60,8

org.apache.solr.analysis.StopFilterFactory   {words=stopwords.txt,
ignoreCase=true}


term position

1       2       3       4       5       6       7       8       9       10      
11      12      13

term text

bestbuy buy     -       Acer    Aspire  AS5610-2273     -       $599.   Windows 
vista,  1       GB      RAM

bb      gib


best    gigabyte

gigabytes

term type
word    word    word    word    word    word    word    word    word    word    
word    word    word


word    word

word    word

word

source start,end
0,8     0,8     9,10    11,15   16,22   23,34   35,36   37,42   43,50   51,57   
58,59   60,8    63,66


0,8     60,8

0,8     60,8

60,8

org.apache.solr.analysis.WordDelimiterFilterFactory   {catenateWords=1,
catenateNumbers=1, catenateAll=0, generateNumberParts=1,
generateWordParts=1}


term position

1       2       3       4       5       6       7       8       9       10      
11      12      13

term text

bestbuy buy     Acer    Aspire  AS      5610    2273    599     Windows vista   
1       GB      RAM

bb      56102273        gib


best    gigabyte

gigabytes

term type
word    word    word    word    word    word    word    word    word    word    
word    word    word


word    word    word

word    word

word

source start,end
0,8     0,8     11,15   16,22   23,25   25,29   30,34   38,41   43,50   51,56   
58,59   60,8    63,66


0,8     25,34   60,8

0,8     60,8

60,8

org.apache.solr.analysis.LowerCaseFilterFactory   {}



term position
1       2       3       4       5       6       7       8       9       10      
11      12      13


term text
bestbuy buy     acer    aspire  as      5610    2273    599     windows vista   
1       gb      ram


bb      56102273        gib

best    gigabyte

gigabytes

term type
word    word    word    word    word    word    word    word    word    word    
word    word    word


word    word    word

word    word

word

source start,end
0,8     0,8     11,15   16,22   23,25   25,29   30,34   38,41   43,50   51,56   
58,59   60,8    63,66


0,8     25,34   60,8

0,8     60,8

60,8

org.apache.solr.analysis.EnglishPorterFilterFactory
{protected=protwords.txt}



term position
1       2       3       4       5       6       7       8       9       10      
11      12      13


term text
bestbuy buy     acer    aspir   as      5610    2273    599     window  vista   
1       gb      ram


bb      56102273        gib

best    gigabyt

gigabyt

term type
word    word    word    word    word    word    word    word    word    word    
word    word    word


word    word    word

word    word

word

source start,end
0,8     0,8     11,15   16,22   23,25   25,29   30,34   38,41   43,50   51,56   
58,59   60,8    63,66


0,8     25,34   60,8

0,8     60,8

60,8

That 60,8 offset pair produced by the synonym filter is surely a sign of
a bug, and it is what produces the -52 (end offset 8 minus start offset
60 is -52).  What is your list of synonyms?
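
For illustration only, here is a minimal sketch of why a start,end pair
like 60,8 breaks the substring call quoted above. The class and method
names (OffsetCheck, spanLength, isValidSpan) are hypothetical and not
part of Solr or Lucene; the text and offsets are taken from the analysis
output above.

```java
public class OffsetCheck {
    // Length implied by a (start, end) offset pair; negative when end < start.
    static int spanLength(int startOffset, int endOffset) {
        return endOffset - startOffset;
    }

    // True when the pair can safely be passed to String.substring(start, end).
    static boolean isValidSpan(String text, int startOffset, int endOffset) {
        return startOffset >= 0
                && startOffset <= endOffset
                && endOffset <= text.length();
    }

    public static void main(String[] args) {
        String text =
            "Best buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM";

        // Offsets from the tokenizer for "GB": a valid span.
        System.out.println(text.substring(60, 62));

        // Offsets reported after the synonym filter for the same position.
        int start = 60;
        int end = 8;
        System.out.println("span length: " + spanLength(start, end));
        System.out.println("valid: " + isValidSpan(text, start, end));

        try {
            text.substring(start, end);
        } catch (StringIndexOutOfBoundsException e) {
            // substring rejects any pair where start is greater than end.
            System.out.println("substring failed: " + e.getMessage());
        }
    }
}
```

A check like isValidSpan before the substring call would only hide the
symptom; the offsets coming out of the synonym filter are what need
fixing.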

-Mike
