Hi, I want to issue queries where each queried field either has a specified value or is "missing". I know that I can match missing values using a negated full-range query, but that doesn't seem very efficient (the fields in question have a lot of possible values). So I've opted to store a special "missing" value for each field that isn't found in a document, and issue queries like "+(field1:value field1:missing) +(field2:value field2:missing)".
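To make the setup concrete, here is a rough Python sketch of the sentinel approach described above. The helper names (`fill_missing`, `build_query`) are made up for illustration and are not part of Lucene or Solr. It also assumes field names and values contain no characters that need query-syntax escaping.

```python
# Sketch only: hypothetical helpers, not a Lucene/Solr API.
SENTINEL = "missing"  # the special value stored when a field is absent


def fill_missing(doc, fields, sentinel=SENTINEL):
    """Return the values to index: the real value, or the sentinel if absent."""
    return {field: doc.get(field, sentinel) for field in fields}


def build_query(wanted, sentinel=SENTINEL):
    """Build '+(f:v f:missing)' clauses, one required clause per field.

    Assumes names and values need no escaping for the query parser.
    """
    clauses = [
        f"+({field}:{value} {field}:{sentinel})"
        for field, value in wanted.items()
    ]
    return " ".join(clauses)


if __name__ == "__main__":
    print(fill_missing({"field1": "red"}, ["field1", "field2"]))
    print(build_query({"field1": "red", "field2": "large"}))
```

Each required clause matches documents that either carry the wanted value or were indexed with the sentinel, which is why every document without the field has to pay for an extra stored term.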
The issue is that storing the missing values increases the size of the index by 30%, because a lot of documents don't have values for all fields. I'd like to keep the index as small as possible so it can be cached in memory. Any ideas on an alternative approach? Is there a way to convince Lucene to store the doc-id list for the "missing" field value as a bitmap? What if I added some boolean fields to my schema, e.g., field1_missing and field2_missing, and stored true in those fields for documents that were missing the corresponding fields? Does Lucene store BoolField values as bitmaps?

-dallan