> Looks to me like MultiPhraseQuery is getting in the way. Shingles
> that begin at the same word are given the same position by
> ShingleFilter, and Solr's FieldQParserPlugin creates a
> MultiPhraseQuery when it encounters tokens in a query with the same
> position. I think what you want is to convert queries into shingle
> disjunctions (*any* matching shingle results in a hit), right?

Yes, you're right, Steve. Thank you.

One way, I see now, to get the behaviour I want is to set the unigrams'
positionIncrement to zero instead of one. For example, in
ShingleFilter.fillOutputBuffer(..), replacing the two occurrences of

    .setPositionIncrement(1);

with

    .setPositionIncrement(0);

Then I end up with a MultiPhraseQuery with

    termArrays[0] = {
        list_entry_shingles:abcd
        list_entry_shingles:abcd efgh
        list_entry_shingles:abcd efgh ijkl
        list_entry_shingles:efgh
        list_entry_shingles:efgh ijkl
        list_entry_shingles:ijkl
    }

and it works perfectly :-)

I see no way of configuring this behaviour, though. If it is possible
and someone can say how, that would be a real godsend.

Otherwise, would a patch to ShingleFilter that offers an option
"unigramPositionIncrement" (defaulting to 1) be likely to be accepted
into trunk?

~mck

-- 
"Between two evils, I always pick the one I never tried before." Mae West
| semb.wever.org | sesat.no | sesam.no |
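
To make the effect of the increment change concrete, here is a minimal sketch in plain Java that mimics how the shingles end up grouped by position. It does not use Lucene at all: the class ShinglePositions and the method groupByPosition are hypothetical names made up for this illustration, and the parameter unigramPositionIncrement only mirrors the option proposed above. It assumes, as described in this thread, that each unigram carries the configurable increment and every longer shingle starting at the same word carries an increment of zero. As a simplification, any increment greater than zero just opens one new position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Lucene's ShingleFilter.
public class ShinglePositions {

    /**
     * Builds all shingles of up to maxShingleSize words and groups them by
     * token position. A token with increment > 0 opens a new position; a
     * token with increment 0 stays at the current position.
     */
    public static List<List<String>> groupByPosition(
            List<String> words, int maxShingleSize, int unigramPositionIncrement) {
        List<List<String>> positions = new ArrayList<>();
        for (int start = 0; start < words.size(); start++) {
            StringBuilder shingle = new StringBuilder();
            for (int size = 1;
                 size <= maxShingleSize && start + size <= words.size();
                 size++) {
                if (size > 1) {
                    shingle.append(' ');
                }
                shingle.append(words.get(start + size - 1));

                // Assumption: unigrams carry the configurable increment,
                // longer shingles always carry an increment of zero.
                int increment = (size == 1) ? unigramPositionIncrement : 0;

                // The very first token always needs a position to live in.
                if (positions.isEmpty() || increment > 0) {
                    positions.add(new ArrayList<>());
                }
                positions.get(positions.size() - 1).add(shingle.toString());
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> words = List.of("abcd", "efgh", "ijkl");
        // Increment 1: one group of shingles per starting word.
        System.out.println(groupByPosition(words, 3, 1));
        // Increment 0: every shingle lands in a single group.
        System.out.println(groupByPosition(words, 3, 0));
    }
}
```

With an increment of 1 the sketch yields three groups, one per starting word, which corresponds to a MultiPhraseQuery with three term arrays that must all match in sequence. With an increment of 0 it yields a single group holding all six shingles, which corresponds to the single termArrays[0] listed above, where any one shingle is enough for a hit.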