Hi,
We need to implement a boolean filter with up to 500k values, and we took the
post filter approach.
Our environment has 7 servers, each with 128 GB of RAM and 64 CPUs. We have
20-40M very large documents. Each Solr instance has 64 shards with 2 replicas,
and the JVM heap (Xms and Xmx) is set to 31 GB.
We are seeing that a single post filter with 1000 values on 20M documents
takes about 4.5 seconds.
Logic in our collect method:
    numericDocValues =
        reader.getNumericDocValues(FileFilterPostQuery.this.metaField);
    if (numericDocValues != null &&
            numericDocValues.advanceExact(docNumber)) {
        longVal = numericDocValues.longValue();
    } else {
        // No value for this document, so it cannot match the filter.
        return;
    }
    if (numericValuesSet.contains(longVal)) {
        super.collect(docNumber);
    }
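
To show the decision we make for each document in isolation, here is a minimal, self-contained sketch of the same logic in plain Java. It does not use the Lucene or Solr classes: the class name PostFilterSketch, the docValues map (standing in for the numeric doc values of metaField) and the collected list (standing in for super.collect) are hypothetical names used for illustration only, so it says nothing about the real timing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the post filter collector; not the real Solr class.
public class PostFilterSketch {
    // Assumption: docNumber -> numeric value, replacing NumericDocValues.
    private final Map<Integer, Long> docValues;
    // The filter values (numericValuesSet in the snippet above).
    private final Set<Long> numericValuesSet;
    // Documents that passed the filter, replacing super.collect(docNumber).
    private final List<Integer> collected = new ArrayList<>();

    public PostFilterSketch(Map<Integer, Long> docValues, Set<Long> numericValuesSet) {
        this.docValues = docValues;
        this.numericValuesSet = numericValuesSet;
    }

    public void collect(int docNumber) {
        Long longVal = docValues.get(docNumber);
        if (longVal == null) {
            // Document has no value for the field, so it is skipped.
            return;
        }
        if (numericValuesSet.contains(longVal)) {
            collected.add(docNumber);
        }
    }

    public List<Integer> collected() {
        return collected;
    }
}
```

In this sketch a document is kept only if it has a value and that value is in the set; documents without a value are dropped, as in our collect method.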
Is this the best performance we can get?
Thanks,
Artur Rudenko