Tony-X commented on issue #12358: URL: https://github.com/apache/lucene/issues/12358#issuecomment-1585758064
Just caught up on this thread -- the design tenet of the current benchmark game is to measure the time taken to do the same work in a contention-free environment. I'm still working to build trust in the benchmarks, so thank you for your evaluation and feedback @uschindler! So far I believe they are doing the "same" work, as I have chased down a few tokenization issues. Right now the indexes on both sides have:

* almost the "same" tokenization -- split on whitespace and remove tokens with length >= 256
* the same index sort
* the same set of deleted docs (2% in total)
* a single segment

Regarding the JVM, here is what we do now:

* warm up the JVM with 6.1k queries for each of `COUNT` and `TOP_10_COUNT`. We could easily increase the warmup iterations [here](https://github.com/Tony-X/search-benchmark-game/blob/4402d42c906830e85d8d79a30ae776f204ade770/Makefile#L18). As I was typing, I changed the warmup iteration count to 3 and kicked off a run.

Admittedly, we haven't looked into playing with different JVM arguments. @mikemccand, thanks for creating https://github.com/Tony-X/search-benchmark-game/issues/37 to explore heap sizes :)

IMO, GC is less of an issue here since we measure the best latency (the min) across 10 runs of each query (a slight favor to the JVM). The probability that all 10 out of 10 runs of the same query hit a GC pause is very small. It would be great if you could share your insights on an optimal JVM setting for this case.
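For readers following along, the tokenization rule described above (split on whitespace, drop tokens of length >= 256) can be sketched in plain Java. This is an illustrative sketch only -- the class and method names are made up for this comment and are not code from either engine:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TokenizeSketch {
    // Cutoff from the benchmark description: tokens with length >= 256 are removed.
    static final int MAX_TOKEN_LEN = 256;

    // Split on runs of whitespace, then drop empty tokens and over-long tokens.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.split("\\s+"))
                .filter(t -> !t.isEmpty() && t.length() < MAX_TOKEN_LEN)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A 300-char token exceeds the cutoff and is dropped.
        String longToken = "x".repeat(300);
        List<String> tokens = tokenize("hello world " + longToken + " lucene");
        System.out.println(tokens); // [hello, world, lucene]
    }
}
```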
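The min-of-10-runs argument can be made concrete with a small sketch (names are illustrative, not the benchmark's actual harness code): a single GC pause inflates only the run it lands in, so the reported minimum is unaffected unless every run pauses, which happens with probability p^10 if a pause occurs in any one run with probability p:

```java
import java.util.Arrays;

public class MinLatencySketch {
    // Best (minimum) latency across the runs of one query.
    static long bestLatencyMicros(long[] runsMicros) {
        return Arrays.stream(runsMicros).min().orElseThrow();
    }

    public static void main(String[] args) {
        // One run hit a GC pause (the 90_000 outlier); the min ignores it.
        long[] runs = {1200, 1180, 90_000, 1210, 1195, 1201, 1188, 1192, 1199, 1185};
        System.out.println(bestLatencyMicros(runs)); // 1180

        // The min is distorted only if all 10 runs pause: probability p^10.
        // Even a pessimistic p = 0.05 gives roughly 1e-13.
        double p = 0.05;
        System.out.println(Math.pow(p, 10)); // ~9.77e-14
    }
}
```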