Hi David,

On 03/29/2010 at 4:54 PM, David Smiley (@MITRE.org) wrote:
> Did you read my original message where I suggested perhaps a solution
> might lie in intersecting different queries based on common multi-value
> field offsets derived from matching term positions? I have no idea how
> far off the current codebase is to exposing enough information to make
> such an approach possible.

AFAICT, your above-described solution addresses the "one-to-many problem" by representing multiple records within a single document via parallel arrays, one array per address-part field. The parallel array alignment is effected via alignment of position increments. What's missing from Solr/Lucene is the ability to constrain matches such that the position increment of all matching address-part fields is the same.

I suspect that the Flexible Indexing branch would allow a slightly less involved index usage pattern: you could add a new term attribute that explicitly represents the record index. That way you wouldn't have to fiddle around with increment gaps and guess about maximum record size. You still need to perform the equivalent of an SQL table join across the matching address-part fields (in addition to any non-address constraints), using parallel array index equality as the join predicate.

I don't know how hard it would be to implement this, but you'd need to: add the ability to express this kind of constraint in the query language; make a new Similarity implementation that could handle it; and, if you go the route of adding a new record index term attribute, add a new postings codec that handles writing/reading it.

Steve
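
To make the join predicate described above concrete, here is a minimal toy sketch in Python. It is not Lucene or Solr code: `ToyIndex`, `add_document`, `search_any_record`, and `search_same_record` are hypothetical names, and the postings are plain in-memory sets of (document id, record index) pairs. It only illustrates why matching each field independently gives false positives, and how requiring an equal record index across fields removes them.

```python
from collections import defaultdict


class ToyIndex:
    """Toy in-memory stand-in for an inverted index (hypothetical, not Lucene's API)."""

    def __init__(self):
        # field -> term -> set of (doc_id, record_index) pairs
        self.postings = defaultdict(lambda: defaultdict(set))

    def add_document(self, doc_id, records):
        """Index one document holding several records (e.g. several addresses)."""
        for record_index, record in enumerate(records):
            for field, value in record.items():
                for term in value.lower().split():
                    self.postings[field][term].add((doc_id, record_index))

    def _hits(self, field, term):
        # Read-only lookup so that queries never create empty postings entries.
        return self.postings.get(field, {}).get(term.lower(), set())

    def search_any_record(self, constraints):
        """Naive match: every constraint matches somewhere in the document."""
        docs = None
        for field, term in constraints.items():
            field_docs = {doc_id for doc_id, _ in self._hits(field, term)}
            docs = field_docs if docs is None else docs & field_docs
        return sorted(docs or set())

    def search_same_record(self, constraints):
        """Join on record index: every constraint matches within the same record."""
        pairs = None
        for field, term in constraints.items():
            hits = self._hits(field, term)
            pairs = set(hits) if pairs is None else pairs & hits
        return sorted({doc_id for doc_id, _ in (pairs or set())})


index = ToyIndex()
index.add_document(1, [{"city": "Boston", "state": "MA"},
                       {"city": "Portland", "state": "OR"}])
index.add_document(2, [{"city": "Portland", "state": "ME"}])

query = {"city": "portland", "state": "ma"}
naive = index.search_any_record(query)     # doc 1 matches, but across two records
joined = index.search_same_record(query)   # no single record has both values
```

In this toy model the record index is stored explicitly with each posting, which corresponds to the "new term attribute" route; the position-increment route would instead have to derive the same index from term positions and the increment gap.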