Mike Klaas wrote:
>
> Corresponds to:
>   startOffset = tokenGroup.matchStartOffset;
>   endOffset = tokenGroup.matchEndOffset;
>   tokenText = text.substring(startOffset, endOffset);
>
> where the offsets are token offsets from analysis, and should not be
> -52. Are you using term vectors? Is the field multi-valued? Also,
> what version of Solr are you using?
>
> Could you c&p the output of verbose analysis of this text in the solr
> admin?
>
> thanks,
> -Mike
>
>
As far as I know, I'm not using term vectors, and the field is
single-valued. The Solr version is 1.1.0, dated 12/17/2006.
The verbose analysis output is below:
Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position
1 2 3 4 5 6 7 8 9 10 11 12 13
term text
Best buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM
term type
word word word word word word word word word word word word word
source start,end
0,4 5,8 9,10 11,15 16,22 23,34 35,36 37,42 43,50 51,57 58,59 60,62 63,66
org.apache.solr.analysis.SynonymFilterFactory {expand=true, ignoreCase=true, synonyms=index_synonyms.txt}
term position
1 2 3 4 5 6 7 8 9 10 11 12 13
term text
bestbuy buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM
bb gib
best gigabyte
gigabytes
term type
word word word word word word word word word word word word word
word word
word word
word
source start,end
0,8 0,8 9,10 11,15 16,22 23,34 35,36 37,42 43,50 51,57 58,59 60,8 63,66
0,8 60,8
0,8 60,8
60,8
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position
1 2 3 4 5 6 7 8 9 10 11 12 13
term text
bestbuy buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM
bb gib
best gigabyte
gigabytes
term type
word word word word word word word word word word word word word
word word
word word
word
source start,end
0,8 0,8 9,10 11,15 16,22 23,34 35,36 37,42 43,50 51,57 58,59 60,8 63,66
0,8 60,8
0,8 60,8
60,8
org.apache.solr.analysis.WordDelimiterFilterFactory {catenateWords=1, catenateNumbers=1, catenateAll=0, generateNumberParts=1, generateWordParts=1}
term position
1 2 3 4 5 6 7 8 9 10 11 12 13
term text
bestbuy buy Acer Aspire AS 5610 2273 599 Windows vista 1 GB RAM
bb 56102273 gib
best gigabyte
gigabytes
term type
word word word word word word word word word word word word word
word word word
word word
word
source start,end
0,8 0,8 11,15 16,22 23,25 25,29 30,34 38,41 43,50 51,56 58,59 60,8 63,66
0,8 25,34 60,8
0,8 60,8
60,8
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position
1 2 3 4 5 6 7 8 9 10 11 12 13
term text
bestbuy buy acer aspire as 5610 2273 599 windows vista 1 gb ram
bb 56102273 gib
best gigabyte
gigabytes
term type
word word word word word word word word word word word word word
word word word
word word
word
source start,end
0,8 0,8 11,15 16,22 23,25 25,29 30,34 38,41 43,50 51,56 58,59 60,8 63,66
0,8 25,34 60,8
0,8 60,8
60,8
org.apache.solr.analysis.EnglishPorterFilterFactory {protected=protwords.txt}
term position
1 2 3 4 5 6 7 8 9 10 11 12 13
term text
bestbuy buy acer aspir as 5610 2273 599 window vista 1 gb ram
bb 56102273 gib
best gigabyt
gigabyt
term type
word word word word word word word word word word word word word
word word word
word word
word
source start,end
0,8 0,8 11,15 16,22 23,25 25,29 30,34 38,41 43,50 51,56 58,59 60,8 63,66
0,8 25,34 60,8
0,8 60,8
60,8
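
One thing that stands out in the output above: the tokenizer reports
source start,end of 60,62 for GB, but from SynonymFilterFactory onward
the tokens at position 12 (GB and its synonyms) report 60,8. A start
offset of 60 with an end offset of 8 gives 8 - 60 = -52, which may be
where the -52 comes from. Below is a minimal sketch of that arithmetic,
assuming the text is sliced with substring as in the quoted snippet.
OffsetCheck and its methods are made-up names for illustration, not
Lucene or Solr API.

```java
// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class OffsetCheck {
    // The field text from the analysis output above (66 characters).
    static final String TEXT =
        "Best buy - Acer Aspire AS5610-2273 - $599. Windows vista, 1 GB RAM";

    // Length implied by an offset pair; a negative value means the pair is inverted.
    static int span(int startOffset, int endOffset) {
        return endOffset - startOffset;
    }

    // True when the pair can safely be passed to String.substring.
    static boolean isValid(String text, int startOffset, int endOffset) {
        return 0 <= startOffset
            && startOffset <= endOffset
            && endOffset <= text.length();
    }

    public static void main(String[] args) {
        // Offsets reported by the tokenizer for "GB".
        System.out.println(span(60, 62));           // 2
        System.out.println(TEXT.substring(60, 62)); // GB

        // Offsets reported for position 12 after the synonym filter.
        System.out.println(span(60, 8));            // -52
        System.out.println(isValid(TEXT, 60, 8));   // false

        try {
            TEXT.substring(60, 8);
        } catch (StringIndexOutOfBoundsException e) {
            // The message wording varies between JDK versions.
            System.out.println("substring(60, 8) failed: " + e.getMessage());
        }
    }
}
```

Running main prints 2, GB, -52 and false, and then reports that
substring(60, 8) was rejected. A check like isValid before slicing would
avoid the exception, but it would not explain why the end offset became 8
in the first place.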