Peter Keane wrote:
I've used Luke to figure out what is going on, and I see in the fields that fail to match, a "null_1". Could someone tell me what that is? I see some null_100s there as well, which seem to separate field values. Clearly the null_1s are causing the search to fail.
You used the "Reconstruct" function to obtain the field values for unstored fields, right? null_NNN is Luke's way of telling you that the tokens that should be at these positions are absent, because they were removed by the analyzer during indexing, and there is no stored value of this field from which the original text could be recovered. In other words, they are holes in the token stream, of length NNN.
Such holes may also be produced by artificially increasing the token positions, hence the null_100 that serves to separate multiple field values so that e.g. phrase queries don't match unrelated text.
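To make the two kinds of holes concrete, here is a small toy model in Python. It is only a sketch, not Luke's or Lucene's actual code: the stop-word list, the gap of 100 between field values, and the names index_positions and reconstruct are all assumptions chosen for illustration.

```python
STOP_WORDS = {"the", "of", "a"}  # assumed stop list, for illustration only
POSITION_GAP = 100               # assumed gap inserted between field values


def index_positions(values, stop_words=STOP_WORDS, gap=POSITION_GAP):
    """Map token position -> token for a multi-valued field.

    Stop words are dropped but still consume a position, which leaves a
    hole. Each value after the first starts `gap` positions later.
    """
    positions = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for word in value.lower().split():
            if word not in stop_words:
                positions[pos] = word
            pos += 1
    return positions


def reconstruct(positions):
    """Rebuild a readable string, marking each hole of length N as null_N."""
    out = []
    expected = 0
    for pos in sorted(positions):
        if pos > expected:
            out.append(f"null_{pos - expected}")
        out.append(positions[pos])
        expected = pos + 1
    return " ".join(out)


if __name__ == "__main__":
    field = index_positions(["history of art", "modern art"])
    print(reconstruct(field))  # history null_1 art null_100 modern art
```

In this model the null_1 is the position once occupied by the removed stop word "of", and the null_100 is the artificial gap between the two values.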
Phrase queries that you can construct using QueryParser can't match two tokens separated by a hole, unless you set a slop value > 0 that is large enough to span the hole (e.g. "term1 term2"~1 for a null_1).
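The effect of slop on a hole can be sketched with a simplified matcher. This is a toy model for two-term, in-order phrases only, not Lucene's sloppy phrase scoring; the function name phrase_matches and the sample positions are hypothetical.

```python
def phrase_matches(positions, first, second, slop=0):
    """True if `second` follows `first` with at most `slop` empty
    positions between them (simplified, in-order, two terms only)."""
    firsts = [p for p, tok in positions.items() if tok == first]
    seconds = [p for p, tok in positions.items() if tok == second]
    return any(0 <= s - f - 1 <= slop for f in firsts for s in seconds)


# Assumed token positions: a hole of 1 after "history" (a removed stop
# word) and a hole of 100 between the two field values.
positions = {0: "history", 2: "art", 103: "modern", 104: "art"}

if __name__ == "__main__":
    print(phrase_matches(positions, "history", "art"))          # False
    print(phrase_matches(positions, "history", "art", slop=1))  # True
    print(phrase_matches(positions, "art", "modern", slop=1))   # False
```

With slop 0 the null_1 hole blocks the match, slop 1 bridges it, and the 100-position gap still keeps tokens from different field values apart for any small slop.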
-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com