jpountz opened a new pull request, #12444:
URL: https://github.com/apache/lucene/pull/12444
Lucene's scorers that can dynamically prune on score provide great speedups
when they manage to skip many hits. Unfortunately, there are also cases when
they cannot skip hits efficiently, one example
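For context on what "dynamic pruning on score" refers to here, a minimal sketch (not part of the PR) of how an application lets WAND/MAXSCORE-style pruning kick in: as long as the collector only needs a lower bound on the total hit count, the scorer may skip documents that cannot enter the top-k. The index path and field name below are assumptions.
```java
import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class DynamicPruningSketch {
  public static void main(String[] args) throws Exception {
    try (DirectoryReader reader =
        DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")))) {
      IndexSearcher searcher = new IndexSearcher(reader);

      // A disjunction of two terms; "body" is an assumed field name.
      BooleanQuery query = new BooleanQuery.Builder()
          .add(new TermQuery(new Term("body", "several")), BooleanClause.Occur.SHOULD)
          .add(new TermQuery(new Term("body", "following")), BooleanClause.Occur.SHOULD)
          .build();

      // search(query, n) counts hits accurately only up to a threshold
      // (1,000 by default), which is what allows the scorer to skip
      // non-competitive documents instead of scoring every match.
      TopDocs top = searcher.search(query, 10);
      System.out.println(top.totalHits); // e.g. "1000+ hits" when pruning kicked in
    }
  }
}
```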
jpountz commented on PR #12444:
URL: https://github.com/apache/lucene/pull/12444#issuecomment-1637514621
I played with the following tasks file to evaluate the impact of this change:
```
OrHigh2: several following
OrHigh3: several following publisher
OrHigh4: several followin
jpountz commented on PR #12444:
URL: https://github.com/apache/lucene/pull/12444#issuecomment-1637792504
Here is the usual set of queries, still on wikimedium10m. Sparser
disjunctive queries like `Fuzzy1`, `Fuzzy2` and `OrHighLow` can get a slowdown
when the majority of clauses have very fe
jpountz commented on PR #12444:
URL: https://github.com/apache/lucene/pull/12444#issuecomment-1637854931
Here is a table similar to the one above, but with low-cardinality clauses instead of
high-cardinality clauses, in order to show how the overhead of the bitset
manifests:
```
OrLow2: riv
jpountz commented on issue #12439:
URL: https://github.com/apache/lucene/issues/12439#issuecomment-1638155116
The above idea actually works quite well, see #12444.
epotyom opened a new pull request, #12445:
URL: https://github.com/apache/lucene/pull/12445
Keep a set of Expression variables that are used more than once. This set
can then be used by a Lucene application to decide if the corresponding
DoubleValuesSource can benefit from caching.
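A hedged sketch of how an application might consume such a set, using Lucene's expressions module; the `variablesUsedMoreThanOnce` and `cached` helpers below are placeholders for whatever this PR actually exposes and for an application-side caching wrapper, not real Lucene APIs.
```java
import java.util.Set;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.search.DoubleValuesSource;

public class ExpressionCachingSketch {
  public static void main(String[] args) throws Exception {
    // "popularity" appears twice, so recomputing it per hit is wasted work.
    Expression expr = JavascriptCompiler.compile("ln(popularity) + popularity * boost");

    SimpleBindings bindings = new SimpleBindings();
    DoubleValuesSource popularity = DoubleValuesSource.fromDoubleField("popularity");
    DoubleValuesSource boost = DoubleValuesSource.fromDoubleField("boost");

    // Placeholder for the information this PR tracks: which variables the
    // expression references more than once.
    Set<String> repeated = variablesUsedMoreThanOnce(expr);

    // Only wrap sources whose variable is evaluated multiple times.
    bindings.add("popularity",
        repeated.contains("popularity") ? cached(popularity) : popularity);
    bindings.add("boost",
        repeated.contains("boost") ? cached(boost) : boost);

    DoubleValuesSource scoreSource = expr.getDoubleValuesSource(bindings);
    System.out.println(scoreSource);
  }

  // Placeholder: a real application would read the set exposed by the PR.
  static Set<String> variablesUsedMoreThanOnce(Expression expr) {
    return Set.of("popularity");
  }

  // Placeholder: a real application would wrap the source so its per-document
  // value is computed once and reused within a single expression evaluation.
  static DoubleValuesSource cached(DoubleValuesSource in) {
    return in;
  }
}
```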
jpountz opened a new pull request, #12446:
URL: https://github.com/apache/lucene/pull/12446
Both MAXSCORE and WAND can easily be tuned to perform rank-unsafe
optimizations, by skipping doc IDs that are unlikely to make it to the top-k.
The main challenge is how to expose this kind of optimi
shubhamvishu commented on PR #12183:
URL: https://github.com/apache/lucene/pull/12183#issuecomment-1638374134
> I'm starting to believe that we should fix the executor to run tasks in
the current thread if called from a thread of the pool instead of fixing our
collectors in the testing fram
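For reference, a minimal sketch of the pattern being quoted (not Lucene's actual code): an executor wrapper that remembers its own worker threads and runs tasks inline when invoked from one of them, so the pool never queues work behind a caller that is itself waiting on that work.
```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SameThreadAwareExecutor implements Executor {
  private final ExecutorService delegate;
  // Threads created for this pool, so re-entrant calls can be recognized.
  private final Set<Thread> poolThreads = ConcurrentHashMap.newKeySet();

  public SameThreadAwareExecutor(int threads) {
    this.delegate = Executors.newFixedThreadPool(threads, r -> {
      Thread t = new Thread(r);
      poolThreads.add(t);
      return t;
    });
  }

  @Override
  public void execute(Runnable task) {
    if (poolThreads.contains(Thread.currentThread())) {
      // Called from one of our own workers: run inline instead of re-queueing,
      // which would risk starving the pool when workers block on sub-tasks.
      task.run();
    } else {
      delegate.execute(task);
    }
  }

  public void shutdown() {
    delegate.shutdown();
  }
}
```
Running re-entrant tasks inline trades a little parallelism for the guarantee that a worker blocked on its own sub-tasks cannot deadlock the pool.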
shubhamvishu commented on issue #12394:
URL: https://github.com/apache/lucene/issues/12394#issuecomment-1638379241
I see, thanks for clarifying @jpountz
jpountz commented on PR #12446:
URL: https://github.com/apache/lucene/pull/12446#issuecomment-1638400035
As an example, with this PR and calling
`searcher.setMaxEvaluatedHitRatio(.001f)`, the query `be (+mostly +interview)`
goes from 7.0ms to 2.7ms while still returning the same top 100 hit
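A sketch of what driving that knob could look like end to end; `setMaxEvaluatedHitRatio` is the setter this PR proposes, while the index path, analyzer, and field name are assumptions.
```java
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class RankUnsafeSketch {
  public static void main(String[] args) throws Exception {
    try (DirectoryReader reader =
        DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")))) {
      IndexSearcher searcher = new IndexSearcher(reader);

      // Setter proposed by PR #12446: let the scorer evaluate at most this
      // fraction of matching docs, trading rank-safety for speed.
      searcher.setMaxEvaluatedHitRatio(.001f);

      // "body" is an assumed field name; the query mirrors the example above.
      Query query = new QueryParser("body", new StandardAnalyzer())
          .parse("be (+mostly +interview)");
      TopDocs top = searcher.search(query, 100);
      System.out.println(top.totalHits);
    }
  }
}
```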
benwtrent commented on PR #12434:
URL: https://github.com/apache/lucene/pull/12434#issuecomment-1638442919
@jpountz my original benchmarks were flawed. There was a bug in my testing.
Nested is actually 80% slower (i.e. 1.8x) than the current search times.
I am investigating the c
ChrisHegarty commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1638836015
Here's where I'm at, after spending the best part of the last three days
hacking in this area - I'm on the fence about whether or not this is worth it.
The current code and fo
tang-hi commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1639151537
I also currently believe that it may not be a good time to vectorize it.
Although vectorized code combined with lazy compute does improve performance,
we currently cannot achieve scalar