Hi,

I want to issue queries where queried fields have a specified value or are
"missing".  I know that I can query missing values using a negated
full-range query, but it doesn't seem like that's very efficient (the fields
in question have a lot of possible values).  So I've opted to store a special
"missing" value for each field that a document doesn't have, and issue
queries like "+(field1:value field1:missing) +(field2:value
field2:missing)".
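
To make the intended semantics concrete, here's a rough plain-Java sketch.
It uses java.util.BitSet over an in-memory list of documents, NOT the Lucene
API, and the class and method names are made up for illustration.  Each
parenthesized clause is a union (value OR missing), and the "+" clauses are
intersected.  The missing() helper computes "doesn't have the field", which
is the same set the negated full-range query gives you.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Illustration of the query logic only; this is not how Lucene evaluates queries.
public class MissingOrValue {

    // Documents whose `field` equals `value`.
    static BitSet matching(List<Map<String, String>> docs, String field, String value) {
        BitSet bits = new BitSet(docs.size());
        for (int i = 0; i < docs.size(); i++) {
            if (value.equals(docs.get(i).get(field))) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Documents with no value at all for `field` (complement of "has the field").
    static BitSet missing(List<Map<String, String>> docs, String field) {
        BitSet bits = new BitSet(docs.size());
        for (int i = 0; i < docs.size(); i++) {
            if (!docs.get(i).containsKey(field)) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Evaluates +(f1:v1 f1:missing) +(f2:v2 f2:missing) ...
    static BitSet query(List<Map<String, String>> docs, Map<String, String> wanted) {
        BitSet result = new BitSet(docs.size());
        result.set(0, docs.size()); // start from all documents
        for (Map.Entry<String, String> e : wanted.entrySet()) {
            BitSet clause = matching(docs, e.getKey(), e.getValue());
            clause.or(missing(docs, e.getKey())); // value OR missing
            result.and(clause);                   // every clause is required
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("field1", "value", "field2", "value"), // doc 0: both match
            Map.of("field1", "value"),                    // doc 1: field2 missing
            Map.of("field1", "other", "field2", "value"), // doc 2: field1 mismatch
            Map.of());                                    // doc 3: both missing
        // prints {0, 1, 3}
        System.out.println(query(docs, Map.of("field1", "value", "field2", "value")));
    }
}
```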

The issue is that storing the missing values increases the size of the index
by 30%, because a lot of documents don't have values for all fields.  I'd
like to keep the index as small as possible so it can be cached in memory.

Any ideas on an alternative approach?  Is there a way to convince Lucene to
store the doc-id list for the "missing" field value as a bitmap?  What if I
added some boolean fields to my schema, e.g., field1_missing and
field2_missing, and stored true in those fields for documents that were
missing the corresponding fields?  Does Lucene store BoolFields as bitmaps?
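
For what it's worth, here's the back-of-the-envelope comparison behind the
bitmap question, as a Java sketch.  The "list" side is a simplified model of
a delta-encoded, variable-byte doc-id list (7 payload bits per byte); it's an
assumption for illustration, not Lucene's exact postings format.  A bitmap
costs maxDoc/8 bytes no matter how many documents have the value, while the
list costs at least one byte per document, so the bitmap only comes out ahead
for values that appear in a large fraction of documents.

```java
// Rough size model for one term's doc-id list; simplified, not Lucene's real format.
public class DocIdListSize {

    // Bytes for a fixed-size bitmap covering maxDoc documents.
    static long bitmapBytes(int maxDoc) {
        return (maxDoc + 7L) / 8;
    }

    // Bytes to write a non-negative int as a variable-length int (7 bits per byte).
    static int vintBytes(int v) {
        int bytes = 1;
        while ((v >>>= 7) != 0) {
            bytes++;
        }
        return bytes;
    }

    // Bytes for a sorted doc-id list stored as gaps between consecutive ids.
    static long deltaVintBytes(int[] sortedDocIds) {
        long total = 0;
        int prev = 0;
        for (int id : sortedDocIds) {
            total += vintBytes(id - prev);
            prev = id;
        }
        return total;
    }

    // Doc ids 0, step, 2*step, ... below maxDoc, i.e. density 1/step.
    static int[] everyNth(int maxDoc, int step) {
        int n = (maxDoc + step - 1) / step;
        int[] ids = new int[n];
        for (int i = 0; i < n; i++) {
            ids[i] = i * step;
        }
        return ids;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        for (int step : new int[] {2, 8, 100, 1000}) {
            System.out.printf("density 1/%d: list=%d bytes, bitmap=%d bytes%n",
                step, deltaVintBytes(everyNth(maxDoc, step)), bitmapBytes(maxDoc));
        }
    }
}
```

With a million documents the two break even at a density of 1/8 (125,000
bytes each); at 1/2 the list is four times bigger, and at 1/1000 it's about
2,000 bytes against the bitmap's 125,000.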

-dallan
