Peter Keane wrote:
I've used Luke to figure out what is going on, and I see in the fields that
fail to match, a "null_1".  Could someone tell me what that is?  I see some
null_100s there as well, which seem to separate field values.  Clearly the
null_1s are causing the search to fail.

You used the "Reconstruct" function to obtain the field values for unstored fields, right? null_NNN is Luke's way of telling you that the tokens that should be at these positions are absent: the analyzer removed them during indexing, and there is no stored value of this field from which the original text could be recovered. In other words, they are holes in the token stream, of length NNN.
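To illustrate what such a reconstruction amounts to, here is a minimal sketch in plain Python. It does not use Luke or Lucene code; the function name reconstruct and the dictionary layout of the postings are made up for this example. It only assumes that the index keeps, for each term, the positions at which it occurred, and that a gap of N positions is reported as null_N.

```python
def reconstruct(postings):
    """Rebuild a token sequence from {term: [positions]}.

    Hypothetical helper for illustration only: positions with no
    surviving term are reported as a single "null_N" placeholder,
    where N is the number of consecutive missing positions.
    """
    # Flatten to (position, term) pairs and order them by position.
    by_position = sorted(
        (pos, term) for term, positions in postings.items() for pos in positions
    )
    tokens = []
    expected = 0  # next position we expect to see a token at
    for pos, term in by_position:
        gap = pos - expected
        if gap > 0:
            tokens.append(f"null_{gap}")
        tokens.append(term)
        # max() keeps things sane if several terms share one position
        expected = max(expected, pos + 1)
    return tokens


# "jack and jill" indexed with "and" removed as a stop word:
# "jack" stays at position 0, "jill" stays at position 2.
print(reconstruct({"jack": [0], "jill": [2]}))
```

With the stop word gone and no stored value to fall back on, position 1 can only be shown as a hole.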

Such holes may also be produced by artificially increasing the token positions, hence the null_100 that serves to separate multiple field values so that e.g. phrase queries don't match unrelated text.

Phrase queries that you can construct using QueryParser can't match two tokens separated by a hole, unless you set a slop value > 0.
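The effect of holes on phrase matching can be sketched in a few lines of plain Python. This is a deliberately simplified model and not Lucene's actual sloppy-phrase algorithm: it handles two-term, in-order phrases only, it assumes the query terms are adjacent, and the names index_positions, phrase_matches and value_gap are invented for the example.

```python
def index_positions(values, stop_words=frozenset(), value_gap=100):
    """Assign token positions for a multi-valued field.

    Simplified model (assumption): a removed stop word still consumes
    one position, and value_gap extra positions are skipped between
    consecutive values.
    """
    positions = {}
    next_pos = 0
    for value in values:
        for token in value.split():
            if token not in stop_words:
                positions.setdefault(token, []).append(next_pos)
            next_pos += 1  # stop words leave a hole of length 1
        next_pos += value_gap  # hole separating field values
    return positions


def phrase_matches(positions, first, second, slop=0):
    """True if `second` follows `first` with at most `slop` positions between."""
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )


pos = index_positions(["jack and jill", "hill climb"], stop_words={"and"})
print(pos)
print(phrase_matches(pos, "jack", "jill"))           # hole of 1, slop 0
print(phrase_matches(pos, "jack", "jill", slop=1))   # hole of 1, slop 1
print(phrase_matches(pos, "jill", "hill", slop=1))   # hole of 100 between values
```

In this model the stop-word hole is bridged by a slop of 1, while the gap of 100 between values keeps "jill hill" from matching unless the slop is raised to 100.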

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

