Hi,
We have a requirement to implement a boolean filter with up to 500k values.

We took the post filter approach.

Our environment has 7 servers, each with 128 GB of RAM and 64 CPUs. We have 
20-40m very large documents. Each Solr instance has 64 shards with 2 replicas, 
and the JVM heap is set to 31 GB (both -Xms and -Xmx).

We are seeing that a single post filter with 1000 values on 20m documents 
takes about 4.5 seconds.

Logic in our collect method:
                numericDocValues = reader.getNumericDocValues(FileFilterPostQuery.this.metaField);

                // Skip documents that have no value for the field.
                if (numericDocValues != null && numericDocValues.advanceExact(docNumber)) {
                    longVal = numericDocValues.longValue();
                } else {
                    return;
                }

                // Pass the document on only if its value is in the filter set.
                if (numericValuesSet.contains(longVal)) {
                    super.collect(docNumber);
                }
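
For reference, the membership check at the end of the snippet can be sketched on its own. The message does not say what type numericValuesSet is. If it is a java.util.Set<Long>, every contains call boxes the long value, which may add up over 20m documents. The class below is a minimal sketch of a primitive long set using open addressing; LongHashSet is a hypothetical helper written for illustration, not a Solr or Lucene class, and whether it helps in this setup is an assumption that would need measuring.

```java
/**
 * Minimal open-addressing set of primitive longs.
 * Hypothetical helper for illustration; not part of Solr or Lucene.
 */
final class LongHashSet {
    private final long[] keys;     // stored values
    private final boolean[] used;  // marks which slots are occupied
    private final int mask;        // capacity - 1, capacity is a power of two

    LongHashSet(long[] values) {
        // Power-of-two capacity strictly greater than 2x the input size,
        // so the load factor stays below 0.5 and probing always terminates.
        int capacity = Integer.highestOneBit(Math.max(2, values.length * 2)) << 1;
        keys = new long[capacity];
        used = new boolean[capacity];
        mask = capacity - 1;
        for (long v : values) {
            add(v);
        }
    }

    // Multiplicative hash: spread the bits, then keep the low bits of the upper half.
    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h >>> 32) & mask;
    }

    private void add(long key) {
        int i = slot(key);
        while (used[i]) {
            if (keys[i] == key) {
                return; // duplicate, already present
            }
            i = (i + 1) & mask; // linear probing
        }
        used[i] = true;
        keys[i] = key;
    }

    /** Returns true if key was among the values given to the constructor. */
    boolean contains(long key) {
        int i = slot(key);
        while (used[i]) {
            if (keys[i] == key) {
                return true;
            }
            i = (i + 1) & mask;
        }
        return false;
    }
}
```

Under these assumptions the set would be built once from the filter values when the query is created, and collect would call contains(longVal) on it without allocating.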


Is this the best performance we can expect?


Thanks,
Artur Rudenko

