Hi David,

On 03/29/2010 at 4:54 PM, David Smiley (@MITRE.org) wrote:
> Did you read my original message where I suggested perhaps a solution
> might lie in intersecting different queries based on common multi-value
> field offsets derived from matching term positions?  I have no idea how
> far off the current codebase is to exposing enough information to make
> such an approach possible.

AFAICT, the solution you describe addresses the "one-to-many problem" by 
representing multiple records within a single document as parallel arrays, one 
array per address-part field.  The arrays are kept aligned through term 
positions, which you control via position increments and increment gaps.  
What's missing from Solr/Lucene is the ability to constrain matches such that 
the matching terms in all address-part fields fall in the same array slot, 
i.e. the same position range.
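
To make the layout concrete, here is a toy sketch in plain Python (not 
Lucene code).  It assumes every value is pinned to a fixed slot starting at 
record_index * gap, which is a simplification I'm making for illustration and 
not how Lucene assigns positions by default; the gap of 100 and the names 
assign_positions and record_index are made up.

```python
# Hypothetical gap; it must exceed the token count of the longest value.
POSITION_INCREMENT_GAP = 100


def assign_positions(values, gap=POSITION_INCREMENT_GAP):
    """Return (term, position) pairs for one multi-valued field.

    Assumption: value i starts at position i * gap, so the slot a term
    belongs to can be recovered from its position alone.
    """
    postings = []
    for index, value in enumerate(values):
        tokens = value.lower().split()
        if len(tokens) >= gap:
            raise ValueError("value is longer than the assumed gap")
        for offset, token in enumerate(tokens):
            postings.append((token, index * gap + offset))
    return postings


def record_index(position, gap=POSITION_INCREMENT_GAP):
    """Map a term position back to the array slot (record) it falls in."""
    return position // gap


# Two address records stored as parallel arrays, one array per field.
city_postings = assign_positions(["New York", "Boston"])
state_postings = assign_positions(["NY", "MA"])
```

The ValueError branch is the "guess about maximum record size" problem: if a 
value is longer than the gap, the slots overlap and the alignment breaks.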

I suspect that the Flexible Indexing branch would allow a slightly less 
involved index usage pattern: you could add a new term attribute that 
explicitly represents the record index.  That way you wouldn't have to fiddle 
around with increment gaps and guess about maximum record size.

You still need to perform the equivalent of an SQL table join across the 
matching address-part fields (in addition to any non-address constraints), 
using parallel array index equality as the join predicate.  I don't know how 
hard it would be to implement this, but you'd need to: add the ability to 
express this kind of constraint in the query language; make a new Similarity 
implementation that could handle it; and, if you go the route of adding a new 
record index term attribute, add a new postings codec that handles 
writing/reading it.
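
The join itself is easy to state outside of Lucene.  A minimal sketch in plain 
Python, assuming each field's matches have already been reduced to (term, 
record index) pairs; the function name matching_records and the data layout 
are hypothetical and say nothing about how a real scorer would do this:

```python
def matching_records(doc, constraints):
    """Return the record indexes at which every field constraint matches.

    doc:         dict mapping field name -> list of (term, record_index)
    constraints: dict mapping field name -> required term
    The join predicate is equality of record_index across all fields.
    """
    result = None
    for field, term in constraints.items():
        records = {idx for t, idx in doc.get(field, []) if t == term}
        result = records if result is None else result & records
        if not result:
            return set()  # no record can satisfy the remaining constraints
    return result if result is not None else set()


# One document holding two addresses: (Springfield, IL) and (Boston, MA).
doc = {
    "city": [("springfield", 0), ("boston", 1)],
    "state": [("il", 0), ("ma", 1)],
}

cross_record = matching_records(doc, {"city": "springfield", "state": "ma"})
same_record = matching_records(doc, {"city": "boston", "state": "ma"})
```

An ordinary boolean AND would match this document for city:springfield AND 
state:ma, since both terms occur somewhere in it; with the join predicate the 
first query yields no records and the second yields record 1.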

Steve
