On Nov 24, 2008, at 8:52 AM, Erik Hatcher wrote:


On Nov 24, 2008, at 8:37 AM, David Santamauro wrote:
I need to search for something like
myText:billion AND guarantee

I need only the record where both words occur in the same value to be returned (in this case, only the first record), because in the second record the two words are in different values.

Is it possible?

It's not possible with a purely boolean query like this, but it is possible with a sloppy phrase query where the position increment gap (see example schema.xml) is greater than the slop factor.
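To make the gap-versus-slop idea concrete, here is a minimal sketch in plain Java. It is a simplified model, not Lucene's API: the class and method names are hypothetical, tokenization is a naive lower-case whitespace split, and the gap of 100 is an assumption modeled on the positionIncrementGap="100" setting commonly seen in the example schema.xml. Each value of the multi-valued field gets consecutive positions, a large gap is inserted between values, and a two-term sloppy phrase "first second"~slop matches when some pair of positions has edit distance |(p2 - 1) - p1| within the slop (so swapping two adjacent words costs 2).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model (NOT Lucene's API) of positions in a multi-valued field
// and of two-term sloppy phrase matching against those positions.
public class PositionGapDemo {

    // Assumed gap, modeled on positionIncrementGap="100" in the example schema.xml.
    static final int POSITION_INCREMENT_GAP = 100;

    // Maps each term to every position it occupies across all field values.
    static Map<String, List<Integer>> index(String[] values) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int pos = -1;
        for (int v = 0; v < values.length; v++) {
            if (v > 0) {
                pos += POSITION_INCREMENT_GAP; // virtual gap between field instances
            }
            for (String token : values[v].toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // skip the empty token produced by leading whitespace
                }
                pos++;
                positions.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
            }
        }
        return positions;
    }

    // True if "first second"~slop would match in this model: the distance for a
    // position pair is |(p2 - 1) - p1| (0 for adjacent in-order terms).
    static boolean sloppyPhraseMatches(Map<String, List<Integer>> idx,
                                       String first, String second, int slop) {
        List<Integer> firstPositions = idx.getOrDefault(first, List.of());
        List<Integer> secondPositions = idx.getOrDefault(second, List.of());
        for (int p1 : firstPositions) {
            for (int p2 : secondPositions) {
                if (Math.abs((p2 - 1) - p1) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Both words in one value: positions 1 and 3, distance 1.
        String[] sameValue = {"a billion dollar guarantee", "nothing here"};
        // Words in different values: positions 2 and 105, distance 102.
        String[] splitValues = {"over a billion", "with a guarantee"};
        System.out.println(sloppyPhraseMatches(index(sameValue), "billion", "guarantee", 10));   // true
        System.out.println(sloppyPhraseMatches(index(splitValues), "billion", "guarantee", 10)); // false
    }
}
```

As long as the slop (10 here) stays below the gap, the phrase can only match when both words fall inside the same value, which is exactly the behavior asked for in the original question.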

        Erik



I think what is needed here is the concept of SAME, i.e., myText:billion SAME guarantee. I know of a few full-text engines that handle this operator in one way or another. Without it, I don't quite understand the usefulness of multiValued fields.

Yeah, multi-valued fields are a bit awkward to grasp fully in Lucene, especially in this context where it's a full-text field. Basically, as far as indexing goes, there's no such thing as a "multi-valued" field. An indexed field gets split into terms, and terms have positional information attached to them (thus a position increment gap can be used to put a big virtual gap between the last term of one field instance and the first term of the next one). A multi-valued field gets stored (if it is set to be stored, that is) as separate strings, and is retrievable as the separate values.

Multi-valued fields are handy for facets where, say, a product can have multiple categories associated with it. In this case it's a bit clearer. It's the full-text multi-valued fields that seem a bit strange.

        Erik



OK, so it seems the missing piece is the multi-dimensional aspect:

field[0]: A B C D
field[1]:   B   D

...and the concept of a field array would need to be introduced (probably at the Lucene level).
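For illustration only, the proposed SAME semantics over such a field array could be sketched as follows. This is a hypothetical operator that exists in neither Lucene nor Solr; the class and method names are invented here, and the sketch evaluates the query against the separate stored values with a naive lower-case whitespace tokenizer rather than against the index. A document matches only if a single element of the array contains every query term.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical SAME operator (not part of Lucene or Solr): matches only if one
// element of a multi-valued field contains all of the query terms.
public class SameOperatorDemo {

    // Lower-cased, whitespace-split tokens of a single field value.
    static Set<String> tokens(String value) {
        Set<String> out = new HashSet<>();
        for (String t : value.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // True if some field[i] contains every term (case-insensitive).
    static boolean same(List<String> fieldValues, String... terms) {
        Set<String> wanted = new HashSet<>();
        for (String t : terms) {
            wanted.add(t.toLowerCase());
        }
        for (String value : fieldValues) {
            if (tokens(value).containsAll(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("A B C D", "  B   D"); // field[0], field[1]
        System.out.println(same(doc, "A", "C"));                 // true: field[0]
        System.out.println(same(doc, "B", "D"));                 // true: field[0] and field[1]
        System.out.println(same(List.of("A B", "C"), "A", "C")); // false: split across values
    }
}
```

Unlike the sloppy phrase workaround, this formulation needs no tuning of gap against slop, but it assumes the engine can tell which array element a term came from, which is the information Lucene's flat position stream does not keep.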

Do you know if there has been any serious thought given to this, i.e., the possibility of introducing a new SAME operator, or is this a corner case not worth pursuing?

thanks
David




