: Of course. What I meant to say was there is
: always exactly one token in a non-tokenized
: field and its offset is always exactly 0. There
: will never be tokens at position 1.
: 
: So asking to match phrases, which is based on
: term positions, is basically a no-op.

That's not always true.

Consider a situation where you have a multivalued "author_exact" field 
containing the author's full name as a literal string -- either using 
StrField or TextField w/KeywordTokenizer; and it's copyFielded from an 
"author" field which is similar but tokenized.

So if a document contains the following two values in the author field...
        "David Smiley"
        "Eric Pugh"

then that document should be matched by all three of these queries...

defType=edismax&q=David&qf=author&pf=author_exact
defType=edismax&q=David+Pugh&qf=author&pf=author_exact
defType=edismax&q=David+Smiley&qf=author&pf=author_exact

...but it should score *really* high for that last query because it not 
only matches on the author field, but it also gets an exact match on the 
entire query string as an implicit phrase in the author_exact field.
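
To make the expectation concrete, here is a rough toy model in Python. It is 
only a sketch, not how Solr analyzes or scores anything: the helper names 
(build_params, qf_matches, pf_matches) are made up for illustration, "author" 
is approximated by lowercased whitespace tokens, and "author_exact" by a 
literal comparison against each stored value.

```python
from urllib.parse import urlencode

# The two values from the example document (multivalued field).
AUTHORS = ["David Smiley", "Eric Pugh"]

def build_params(q):
    """Build the request parameters used by the three example queries."""
    return urlencode({"defType": "edismax", "q": q,
                      "qf": "author", "pf": "author_exact"})

def qf_matches(q, values):
    """Toy stand-in for qf=author: count query words found among the
    whitespace-split, lowercased tokens of all values."""
    tokens = {t.lower() for v in values for t in v.split()}
    return sum(1 for w in q.split() if w.lower() in tokens)

def pf_matches(q, values):
    """Toy stand-in for pf=author_exact: the whole query string must
    equal one literal, untokenized value."""
    return q in values

for q in ["David", "David Pugh", "David Smiley"]:
    print(build_params(q), qf_matches(q, AUTHORS), pf_matches(q, AUTHORS))
```

In this toy model all three queries match on the tokenized side, but only 
"David Smiley" also matches a literal value, which is where the extra pf 
boost would be expected to come from.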

Dismax does behave this way, as you can see using the 3.5 example configs 
& data (note that "cat" is a StrField)...

http://localhost:8983/solr/select/?debugQuery=true&defType=dismax&qf=name^5+features^3&pf=features^2+cat^4&q=hard+drive
<str name="parsedquery">
  +((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0)) 
     DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0))
    )~2) 
   DisjunctionMaxQuery((features:"hard drive"^2.0 | cat:hard drive^4.0))


But for some reason EDismax doesn't behave similarly...

http://localhost:8983/solr/select/?debugQuery=true&defType=edismax&qf=name^5+features^3&pf=features^2+cat^4&q=hard+drive
<str name="parsedquery">
  +((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0)) 
     DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0))
    )~2) 
   DisjunctionMaxQuery((features:"hard drive"^2.0))

...that definitely seems like a bug to me.  But it's not entirely clear 
why it's happening (the pf-related code in edismax is kind of hairy).

https://issues.apache.org/jira/browse/SOLR-2988
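
If you want to check your own setup for the same symptom, one crude way is to 
diff the field names that show up in the two parsedquery strings. The sketch 
below is an assumption-laden helper, not anything Solr provides: it pastes in 
the debug output from above and pulls out whatever precedes a colon with a 
simple regex (the function name "fields" is made up).

```python
import re

# parsedquery output copied from the dismax and edismax requests above.
DISMAX = '''+((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0))
     DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0))
    )~2)
   DisjunctionMaxQuery((features:"hard drive"^2.0 | cat:hard drive^4.0))'''

EDISMAX = '''+((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0))
     DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0))
    )~2)
   DisjunctionMaxQuery((features:"hard drive"^2.0))'''

def fields(parsed):
    """Collect every field name, i.e. each word directly before a colon."""
    return set(re.findall(r"(\w+):", parsed))

# Fields that dismax queried but edismax silently dropped.
missing = fields(DISMAX) - fields(EDISMAX)
print(missing)  # {'cat'}
```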

-Hoss
