: pos  token              offset
: 1    3                  0-1
: 2    diphenyl           2-10
: 3    propanoic          11-20
: 3    diphenylpropanoic  2-20

: Say someone enters the query string 3-diphenylpropanoic
: 
: The query parser I'm using transforms this into a phrase query, and the
: indexed form is missed because the positions of the terms '3' and
: 'diphenylpropanoic' indicate that they are not adjacent?
: 
: Is this intended behavior? I expect that the catenated word
: 'diphenylpropanoic' should have a position of 2 based on the position
: of the first term in the concatenation, but perhaps I'm missing

I believe this is correct, but I'm not certain of the reason - I think 
it's just an implementation detail.  Consider the opposite scenario: if 
your indexed text was diphenyl-propanoic-3 and things worked the way 
you are suggesting they should, the term diphenylpropanoic 
would wind up at position 1 (with diphenyl) and "diphenylpropanoic-3" would 
not match, because then the terms wouldn't be adjacent.
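
To make the trade-off concrete, here is a rough sketch in Python. It is a toy 
model of position-based phrase matching, not Lucene or Solr code: the dict-based 
index and the phrase_matches helper are made up for illustration, and it 
assumes the query side produces the two terms at consecutive positions.

```python
def phrase_matches(index, terms):
    """Exact phrase match: term i must sit at position start + i."""
    if not terms:
        return False
    return any(
        all(start + i in index.get(term, ()) for i, term in enumerate(terms))
        for start in index.get(terms[0], ())
    )

# Indexed text "3-diphenylpropanoic".
# Current behavior (as in the table): catenated token shares the LAST part's position.
current_a = {"3": [1], "diphenyl": [2], "propanoic": [3], "diphenylpropanoic": [3]}
# Suggested behavior: catenated token shares the FIRST part's position.
proposed_a = {"3": [1], "diphenyl": [2], "propanoic": [3], "diphenylpropanoic": [2]}

# Indexed text "diphenyl-propanoic-3", same two conventions.
current_b = {"diphenyl": [1], "propanoic": [2], "diphenylpropanoic": [2], "3": [3]}
proposed_b = {"diphenyl": [1], "propanoic": [2], "diphenylpropanoic": [1], "3": [3]}

query_a = ["3", "diphenylpropanoic"]   # from "3-diphenylpropanoic"
query_b = ["diphenylpropanoic", "3"]   # from "diphenylpropanoic-3"

results = {
    "current_a": phrase_matches(current_a, query_a),
    "proposed_a": phrase_matches(proposed_a, query_a),
    "current_b": phrase_matches(current_b, query_b),
    "proposed_b": phrase_matches(proposed_b, query_b),
}
```

Under these assumptions each convention matches one ordering and misses the 
other, so changing the position only moves the problem.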

damned if you do, damned if you don't

Typically, for fields where you are using WDF with the "concat" options, 
you would use a bit of slop on the generated phrase queries to 
allow for the looseness of the position information.  (In an ideal world, 
the token stream wouldn't have monotonic integer positions, it would be 
a DAG, and then these things would be easily represented, but that's 
pretty non-trivial to do with the internals.)
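
As a rough illustration of why a little slop helps, here is a toy Python model. 
It is a simplification, not Lucene's actual sloppy phrase scoring: the 
dict-based index and the sloppy_phrase_matches helper are hypothetical, and 
"slop" here just means how far the terms may drift from their expected relative 
positions.

```python
from itertools import product

def sloppy_phrase_matches(index, terms, slop):
    """Match if some choice of positions keeps the terms within `slop` of phrase order."""
    if not terms:
        return False
    position_lists = [index.get(term, []) for term in terms]
    for combo in product(*position_lists):
        # Shift each position back by the term's place in the query; a perfect
        # phrase match gives identical shifts, so the spread measures looseness.
        shifts = [pos - i for i, pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False

# Positions from the table in the question (indexed text "3-diphenylpropanoic").
index = {"3": [1], "diphenyl": [2], "propanoic": [3], "diphenylpropanoic": [3]}
query = ["3", "diphenylpropanoic"]

exact = sloppy_phrase_matches(index, query, slop=0)
loose = sloppy_phrase_matches(index, query, slop=1)
```

With the positions from the table, the query misses at slop 0 and matches at 
slop 1, since the catenated token sits one position later than an exact phrase 
would expect.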


-Hoss
