mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1751999625
Here are the results from running `test_all_sizes.py` then
`results_to_md.py`:
|NodeHash size|FST (MB)|RAM (MB)|FST build time (sec)|
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752003494
Thanks for looking into this @rmuir, I've been thinking along similar lines myself (just
didn't get around to anything other than the thinking!)
On my Mac M2 with JDK 20.0.2:
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752024575
```
// sum into accumulators
Vector<Short> prod16 = prod16_1.add(prod16_2);
acc = acc.add(prod16.convert(VectorOperators.S2I, 0));
acc = acc.add(prod16.convert(VectorOper
```
mikemccand commented on PR #12628:
URL: https://github.com/apache/lucene/pull/12628#issuecomment-1752028823
Very cool, surprisingly impactful!
> I ran the Tantivy benchmark with TOP_10 and TOP_100 commands
This is the Tantivy benchmark tooling, but you are comparing Lucene (mai
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752029230
And of course, `ZERO_EXTEND_S2I` will work in the maximum boundary case,
but not in the others. So the question is really just about the maximum value of the
bytes in these input arrays.
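A scalar sketch of that boundary problem, assuming (as in the `prod16_1.add(prod16_2)` snippet above) that each 16-bit lane holds the sum of two signed byte*byte products before being widened to int. Each product lies in [-16256, 16384], so the only sum that overflows a signed short is 2 * (-128 * -128) = 32768. Zero-extension recovers that single case, while sign-extension recovers every other (possibly negative) sum. The class and method names are illustrative only, not Lucene or Panama APIs.

```java
// Hypothetical demo class: models one 16-bit lane with plain Java shorts.
final class ShortOverflowDemo {
  /** Sum of two byte*byte products computed in 16 bits; the sum may wrap around. */
  static short sumInShort(byte a1, byte b1, byte a2, byte b2) {
    short p1 = (short) (a1 * b1); // each product fits in a short
    short p2 = (short) (a2 * b2);
    return (short) (p1 + p2); // only 2 * (-128 * -128) overflows
  }

  /** Widening in the style of S2I: sign-extends the 16-bit value. */
  static int signExtend(short s) {
    return s;
  }

  /** Widening in the style of ZERO_EXTEND_S2I: treats the 16-bit value as unsigned. */
  static int zeroExtend(short s) {
    return s & 0xFFFF;
  }

  public static void main(String[] args) {
    byte min = Byte.MIN_VALUE;
    byte max = Byte.MAX_VALUE;
    // Maximum case: the true sum 32768 wraps to -32768; zero-extension fixes it.
    short hi = sumInShort(min, min, min, min);
    System.out.println(signExtend(hi) + " " + zeroExtend(hi)); // prints "-32768 32768"
    // Negative case: the true sum -32512 fits; zero-extension corrupts it.
    short lo = sumInShort(min, max, min, max);
    System.out.println(signExtend(lo) + " " + zeroExtend(lo)); // prints "-32512 33024"
  }
}
```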
mikemccand commented on issue #12542:
URL: https://github.com/apache/lucene/issues/12542#issuecomment-1752030874
Talking to @sokolovm at Community Over Code 2023, he suggested another idea
here: instead of a (RAM-hungry) hash table, couldn't we use the growing FST
itself to look up suffixes?
mikemccand commented on PR #12631:
URL: https://github.com/apache/lucene/pull/12631#issuecomment-1752031210
> sum | 31606784 | 27188690 | -13.98%
WHOA, wow! This is a massive gain for such a tiny change :) I'll try to
review soon! Nice to revisit ancient `TODO`s in the source code
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752033176
> What is the maximum value that we can see in the input bytes?
All possible values is how I test.
> Can they ever hold `-128`?
Yes!
> Do we need to handle "ove
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752035773
Ok, cool. If there is not already one, we should add a test to the Panama /
scalar unit test for the boundary values.
--
This is an automated message from the Apache Git Service.
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752036396
yeah agreed: we should test the boundaries for all 3 functions.
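A minimal sketch of such a boundary test, assuming the three functions are presumably the byte-vector dot product, cosine and square distance. The scalar loops below are illustrative stand-ins written for this example, not copied from Lucene's `VectorUtil`; in a real test the Panama implementation would be compared against the scalar one on vectors filled with `Byte.MIN_VALUE` and `Byte.MAX_VALUE`.

```java
import java.util.Arrays;

// Hypothetical test helper; names are not Lucene APIs.
final class ByteVectorBoundaryCheck {
  /** Scalar dot product of two byte vectors, accumulated in an int. */
  static int dotProduct(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Scalar squared Euclidean distance, accumulated in an int. */
  static int squareDistance(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      int diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /** Scalar cosine similarity; the final division is done in double. */
  static float cosine(byte[] a, byte[] b) {
    int sum = 0, norm1 = 0, norm2 = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
      norm1 += a[i] * a[i];
      norm2 += b[i] * b[i];
    }
    return (float) (sum / Math.sqrt((double) norm1 * (double) norm2));
  }

  static byte[] filled(int dims, byte value) {
    byte[] v = new byte[dims];
    Arrays.fill(v, value);
    return v;
  }

  static void check(boolean ok, String what) {
    if (!ok) throw new AssertionError(what);
  }

  public static void main(String[] args) {
    int dims = 1024;
    byte[] min = filled(dims, Byte.MIN_VALUE);
    byte[] max = filled(dims, Byte.MAX_VALUE);
    // Expected values use long arithmetic so the reference itself cannot overflow.
    check(dotProduct(min, min) == (long) dims * 128 * 128, "dot(min, min)");
    check(dotProduct(min, max) == (long) dims * -128 * 127, "dot(min, max)");
    check(squareDistance(min, max) == (long) dims * 255 * 255, "square(min, max)");
    check(cosine(min, min) == 1f, "cosine(min, min)");
    check(cosine(min, max) == -1f, "cosine(min, max)");
    System.out.println("boundary values OK");
  }
}
```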
mikemccand commented on code in PR #12631:
URL: https://github.com/apache/lucene/pull/12631#discussion_r1349699402
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java:
##
@@ -99,6 +102,26 @@ public final class FieldReader extends Terms {
*/
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752039360
yeah, you are right, i am wrong. the trick only works in the unsigned case,
Byte.MIN_VALUE is a problem :(
rmuir opened a new pull request, #12634:
URL: https://github.com/apache/lucene/pull/12634
Let's improve the testing for the boundary cases and check them explicitly.
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752041404
at least we can improve the testing out of this:
https://github.com/apache/lucene/pull/12634
gf2121 commented on code in PR #12631:
URL: https://github.com/apache/lucene/pull/12631#discussion_r1349705693
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsReader.java:
##
@@ -81,8 +81,11 @@ public final class Lucene90BlockTreeTermsRe
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752049654
don't worry, i have a plan B. it is just frustrating due to the nightmare of
operating on the mac, combined with the fact that this benchmark and the lucene
source are in separate repos. it makes the
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752050233
see latest commit for the idea. on my mac it gives a decent boost. it uses a
"32-bit" vector by loading a 64-bit vector from the array but only processing half of
it. The tests should fail as i n
mikemccand commented on code in PR #12631:
URL: https://github.com/apache/lucene/pull/12631#discussion_r1349711457
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsReader.java:
##
@@ -81,8 +81,11 @@ public final class Lucene90BlockTreeTer
mikemccand commented on PR #12631:
URL: https://github.com/apache/lucene/pull/12631#issuecomment-1752050479
I kicked off a `luceneutil` run ... I'll post results here soonish.
rmuir merged PR #12634:
URL: https://github.com/apache/lucene/pull/12634
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752063622
ok on my mac i see:
```
Benchmark                                 (size)   Mode  Cnt  Score   Error  Units
BinaryCosineBenchmark.cosineDistanceNew     1024  thrpt    5  2.
```
mikemccand commented on PR #12631:
URL: https://github.com/apache/lucene/pull/12631#issuecomment-1752064474
`luceneutil` results on `wikimediumall` look good: it all looks like noise
(even for `PKLookup`), or any signal (change) is very low, making the ~15%
reduction very much worth it.
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752098666
I get similar bench results, the new impl is faster.
```
Benchmark                                 (size)   Mode  Cnt  Score   Error  Units
BinaryDotProductBenchmark.
```
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752099845
My sense here is that accessing a `part` other than `0` is less performant
than just reloading the data, which seems a little off.
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752100681
> My sense here is that accessing a `part` other than `0` is less performant
> than just reloading the data, which seems a little off.
It seems to have a heavy cost no matter how i do
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752101786
btw, another crazy avenue to possibly explore here another day, since we
seem bottlenecked on integer multiply. We could try it on arm too. It is faster
than the current binary code on my
shubhamvishu opened a new issue, #12635:
URL: https://github.com/apache/lucene/issues/12635
### Description
Currently, there is a lot of code duplication due to
[ByteVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.ja
shubhamvishu opened a new pull request, #12636:
URL: https://github.com/apache/lucene/pull/12636
### Description
The classes
[ByteVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.java)
and
[FloatVectorValues](http
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752107370
The other thought I had around conversion costs would be to look into
reinterpret+shuffle/shift/mask crap ourselves, which seems really crazy but i'm
running low on ideas.
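For intuition only, a scalar analogue of the "reinterpret + shift/mask" idea: read eight bytes at a time as one little-endian `long`, then pull out each signed byte lane with a shift and a `(byte)` cast, which sign-extends it. This is an illustrative assumption, not what Vector API code would look like and not expected to be faster; `MethodHandles.byteArrayViewVarHandle` is the standard JDK way to view a `byte[]` as `long`s, and the class name is hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical scalar illustration, not a Lucene or Panama implementation.
final class ReinterpretSketch {
  // Views a byte[] as little-endian longs: lane 0 is the lowest-order byte.
  private static final VarHandle LONG_LE =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  /** Dot product over 8-byte chunks reinterpreted as longs, plus a scalar tail. */
  static int dotProduct(byte[] a, byte[] b) {
    int sum = 0;
    int i = 0;
    for (; i + 8 <= a.length; i += 8) {
      long wa = (long) LONG_LE.get(a, i); // "reinterpret" 8 bytes as one long
      long wb = (long) LONG_LE.get(b, i);
      for (int lane = 0; lane < 8; lane++) {
        int shift = 8 * lane;
        // shift the lane down, then the (byte) cast masks and sign-extends it
        sum += (byte) (wa >>> shift) * (byte) (wb >>> shift);
      }
    }
    for (; i < a.length; i++) {
      sum += a[i] * b[i]; // leftover bytes
    }
    return sum;
  }
}
```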
epugh commented on PR #448:
URL: https://github.com/apache/lucene/pull/448#issuecomment-1752112078
It would be nice if this was updated to the awesome new OpenNLP 2.x line!
jpountz commented on PR #12628:
URL: https://github.com/apache/lucene/pull/12628#issuecomment-1752152301
I'll try to give a bit more context on how I ended up here. With recent work on
vector search and excitement around it, I can't help thinking
that all users who are happy to
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1752165322
For comparison, this is how the curve (RAM required during construction vs
final FST size) looks on trunk, using the god-like parameters as best I could.
I sorted the results in reve
benwtrent commented on PR #12636:
URL: https://github.com/apache/lucene/pull/12636#issuecomment-1752194821
It was sort of this way before, but we decided to switch it because a common
interface required either:
- having to use generics
- an API where things weren't fully implemented or r
Shibi-bala opened a new issue, #12637:
URL: https://github.com/apache/lucene/issues/12637
### Description
Found that the [replace
method](https://github.com/qcri/solr-6/blob/master/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L875-L878)
doesn't set `userData` with t
pzygielo commented on PR #12611:
URL: https://github.com/apache/lucene/pull/12611#issuecomment-1752377046
Thanks for checking.
yugushihuang opened a new pull request, #12638:
URL: https://github.com/apache/lucene/pull/12638
### Description
A simple API in TermStates to expose the `needStats` flag.
Addresses #12617 #
dweiss merged PR #12611:
URL: https://github.com/apache/lucene/pull/12611
dweiss commented on PR #12611:
URL: https://github.com/apache/lucene/pull/12611#issuecomment-1752397871
I've applied this to main and branch_9x (9.9). Thank you.
jpountz commented on PR #12638:
URL: https://github.com/apache/lucene/pull/12638#issuecomment-1752414836
Can you explain how/when you plan to use this new API?
dweiss commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1752416032
I didn't get into all the details but I think this looks good. Your
questions are indeed intriguing - I can't provide any explanation off the top
of my head, really.
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752723993
@rmuir
Building on your idea, and focusing again on the x64 case, I get a bit of a
boost by just converting directly to int (rather than the short dance).
On my Rocket
gf2121 commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752758194
> Benchmark                                 (size)   Mode  Cnt   Score   Error   Units
> BinaryDotProductBenchmark.dotProductNew     1024  thrpt    5  20.675 ± 0.051  ops/us
> Binar
mikemccand commented on code in PR #12631:
URL: https://github.com/apache/lucene/pull/12631#discussion_r1350166040
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java:
##
@@ -99,6 +102,26 @@ public final class FieldReader extends Terms {
*/
mikemccand commented on code in PR #12630:
URL: https://github.com/apache/lucene/pull/12630#discussion_r1350170058
##
lucene/core/src/test/org/apache/lucene/index/TestBufferedUpdates.java:
##
@@ -61,10 +61,10 @@ public void testRamBytesUsed() {
public void testDeletedTerms()
mikemccand commented on PR #12625:
URL: https://github.com/apache/lucene/pull/12625#issuecomment-1752818499
> While working in the code base I stumble with this
[TODO](https://github.com/apache/lucene/blob/2474940bffe6118ed31ceb717fd49705d819e1fc/lucene/core/src/java/org/apache/lucene/util/P
mikemccand commented on code in PR #12624:
URL: https://github.com/apache/lucene/pull/12624#discussion_r1350191783
##
lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java:
##
@@ -21,19 +21,18 @@
import java.util.List;
import org.apache.lucene.store.DataInput;
impor
mikemccand commented on code in PR #12625:
URL: https://github.com/apache/lucene/pull/12625#discussion_r1350196379
##
lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java:
##
@@ -312,14 +261,14 @@ private int findHash(BytesRef bytes) {
// final position
int ha
jpountz commented on code in PR #12631:
URL: https://github.com/apache/lucene/pull/12631#discussion_r1350212761
##
lucene/core/src/test/org/apache/lucene/codecs/lucene90/blocktree/TestMSBVLong.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
jpountz commented on PR #12630:
URL: https://github.com/apache/lucene/pull/12630#issuecomment-1752884136
wow good catch. Out of curiosity, how did you catch it? Are you running
snapshot Lucene builds in production?
gf2121 opened a new issue, #12639:
URL: https://github.com/apache/lucene/issues/12639
### Description
I played with the Vector API to sum up bit counts. This pattern can be used in
[bitset
cardinality](https://github.com/apache/lucene/blob/dfff1e635805ffc61dd6029a8060e2635bfcbdb9/lucene/c
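For comparison, a sketch of the plain scalar loop a Vector API version would be measured against: summing `Long.bitCount` over the words of the bitset, where the input length is a number of longs. The second variant keeps four independent accumulators, loosely mirroring per-lane accumulation; whether that helps depends on the JIT and would need benchmarking. The class and method names are hypothetical, and Lucene's own bitset code may be written differently.

```java
// Hypothetical baseline for bitset cardinality over an array of 64-bit words.
final class BitCountSketch {
  /** Straightforward loop: one popcount per word, one accumulator. */
  static long cardinality(long[] words) {
    long count = 0;
    for (long w : words) {
      count += Long.bitCount(w);
    }
    return count;
  }

  /** Same result using four independent accumulators plus a scalar tail. */
  static long cardinalityUnrolled(long[] words) {
    long c0 = 0, c1 = 0, c2 = 0, c3 = 0;
    int i = 0;
    for (; i + 3 < words.length; i += 4) {
      c0 += Long.bitCount(words[i]);
      c1 += Long.bitCount(words[i + 1]);
      c2 += Long.bitCount(words[i + 2]);
      c3 += Long.bitCount(words[i + 3]);
    }
    for (; i < words.length; i++) {
      c0 += Long.bitCount(words[i]); // leftover words
    }
    return c0 + c1 + c2 + c3;
  }
}
```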
gsmiller opened a new pull request, #12640:
URL: https://github.com/apache/lucene/pull/12640
As DrillSidewaysScorer is currently written, if any leaf collectors throw
CollectionTerminatedException then `LeafCollector#finish` won't properly get
called. This patch makes sure we always call `#
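A minimal sketch of the pattern the fix describes, using hypothetical stand-in types rather than Lucene's LeafCollector and CollectionTerminatedException: a collector that terminates early is dropped from collection, but its finish hook is still called, exactly once, on every collector. It does not reproduce how DrillSidewaysScorer actually dispatches documents.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class FinishAlwaysSketch {
  /** Hypothetical stand-in for Lucene's CollectionTerminatedException. */
  static final class TerminatedException extends RuntimeException {}

  /** Hypothetical minimal collector with a finish() hook. */
  interface Collector {
    void collect(int doc);

    void finish();
  }

  /**
   * Feeds every doc to every still-active collector. A collector that throws
   * TerminatedException stops receiving docs, yet finish() is still called
   * exactly once on every collector, including the terminated ones.
   */
  static void collectAll(int[] docs, List<Collector> collectors) {
    List<Collector> active = new ArrayList<>(collectors);
    for (int doc : docs) {
      if (active.isEmpty()) {
        break; // every collector terminated early
      }
      for (Iterator<Collector> it = active.iterator(); it.hasNext(); ) {
        Collector c = it.next();
        try {
          c.collect(doc);
        } catch (TerminatedException e) {
          it.remove(); // stop feeding this collector, but do not skip finish()
        }
      }
    }
    for (Collector c : collectors) {
      c.finish();
    }
  }
}
```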
jpountz commented on issue #12639:
URL: https://github.com/apache/lucene/issues/12639#issuecomment-1752922284
This looks appealing. What is the `size` parameter in your micro benchmark,
is it the number of longs or the number of bits?
gf2121 commented on issue #12639:
URL: https://github.com/apache/lucene/issues/12639#issuecomment-1752926917
> is it the number of longs or the number of bits?
It is the number of longs. Here is the whole class:
```
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICR
```
gf2121 commented on PR #12630:
URL: https://github.com/apache/lucene/pull/12630#issuecomment-1752936864
> Just to confirm: the previous PR was not released/included in 9.8.0 right?
So users are not hitting this memory leak when using the 9.8.0 release.
Yes, the previous PR is not incl
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752963144
Oh! I didn't look back as far as the original commit on this PR, sorry. I
see now that @rmuir tried exactly the same thing.
@gf2121 Strange that we see different results. C
gf2121 commented on code in PR #12630:
URL: https://github.com/apache/lucene/pull/12630#discussion_r1350277816
##
lucene/core/src/test/org/apache/lucene/index/TestBufferedUpdates.java:
##
@@ -61,10 +61,10 @@ public void testRamBytesUsed() {
public void testDeletedTerms() {
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1753003897
> Oh! I didn't look back as far as the original commit on this PR, sorry. I
see now that @rmuir tried exactly the same thing.
>
> @gf2121 Strange that we see different results. Could
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1753006621
> Especially clang already makes a reasonable choice that's only sub-optimal
because of CPU quirks (32x32 => 32-bit SIMD multiplication costs more on recent
Intel microarchitectures than 2
benwtrent commented on issue #12621:
URL: https://github.com/apache/lucene/issues/12621#issuecomment-1753021709
Thank you @rmuir && @ChrisHegarty for digging into this!
The current Panama Vector API makes doing this kind of thing frustrating.
Thank y'all for wrestling with it to make
rmuir commented on issue #12639:
URL: https://github.com/apache/lucene/issues/12639#issuecomment-1753033439
This is confusing since IMO the compiler should be doing this already? I
remember seeing it relatively recently but you are testing with JDK20...
https://bugs.openjdk.org/browse/JDK
iverase commented on code in PR #12625:
URL: https://github.com/apache/lucene/pull/12625#discussion_r1350345121
##
lucene/core/src/java/org/apache/lucene/util/BytesRefBlockPool.java:
##
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
iverase commented on PR #12625:
URL: https://github.com/apache/lucene/pull/12625#issuecomment-1753069277
I ran luceneutil on wikimedium10m and I don't think it shows any slowdown
(I find the output hard to understand):
```
Task    QPS baseline    StdDevQ
```
gf2121 commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1753077559
I reran on Java 21; `squareDistanceNewNew` looks faster:
```
openjdk version "21" 2023-09-19
OpenJDK Runtime Environment (build 21+35-2513)
OpenJDK 64-Bit Server VM (build 21+35
```
gf2121 commented on issue #12639:
URL: https://github.com/apache/lucene/issues/12639#issuecomment-1753084518
The scalar impl in JDK21 looks better
```
Benchmark                       (size)   Mode  Cnt  Score  Error  Units
BitcountBenchmark.bitCountNew     1024  thrpt    5
```
dungba88 commented on code in PR #12624:
URL: https://github.com/apache/lucene/pull/12624#discussion_r1350425577
##
lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java:
##
@@ -21,19 +21,18 @@
import java.util.List;
import org.apache.lucene.store.DataInput;
import
rmuir opened a new issue, #12641:
URL: https://github.com/apache/lucene/issues/12641
### Description
Background: I'm having a hard time keeping
https://github.com/rmuir/vectorbench up to date: its code differs from the
vector code integrated into lucene, so I have to copy/
dungba88 commented on code in PR #12624:
URL: https://github.com/apache/lucene/pull/12624#discussion_r1350439623
##
lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java:
##
@@ -21,19 +21,18 @@
import java.util.List;
import org.apache.lucene.store.DataInput;
import
rmuir commented on issue #12621:
URL: https://github.com/apache/lucene/issues/12621#issuecomment-1753202064
@benwtrent I think a big source of confusion is that while the data might be
`byte`, the related functions return 4-byte `int` and 4-byte `float`, so from a
vector api perspective, the
uschindler commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1753361544
> I reran on Java 21; `squareDistanceNewNew` looks faster:
There is no change to square distance in this PR!? It only optimizes cosine and
dotProduct.
gf2121 commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1753372410
> There is no change to square distance in this PR!? It only optimizes cosine and
> dotProduct.
See the [first commit of this PR](132bf28ecf86f06f6a015f5797139d7dcf3d2fb0)
and [the corresp
yugushihuang commented on PR #12638:
URL: https://github.com/apache/lucene/pull/12638#issuecomment-1753389852
Because TermStates can be built with or without `needStats`. If an
application builds TermStates and passes them around, it is worthwhile
for the application to check if the
gsmiller opened a new pull request, #12642:
URL: https://github.com/apache/lucene/pull/12642
Small bug fix where `#finish` can be called multiple times on the base
collector during drill-sideways
gsmiller opened a new pull request, #12643:
URL: https://github.com/apache/lucene/pull/12643
(no comment)
benwtrent commented on PR #12629:
URL: https://github.com/apache/lucene/pull/12629#issuecomment-1753485937
I am going to merge this unless there is prevailing negative sentiment. This
change should significantly reduce code churn for vector codecs that require
reading/writing vectors in a f
dweiss commented on issue #12641:
URL: https://github.com/apache/lucene/issues/12641#issuecomment-1753495875
JMH is fairly self-contained; I don't think it should be a big deal to wrap
it up into a separate module, without external plugins (which are problematic
to debug, in case of problem
clayburn commented on PR #12293:
URL: https://github.com/apache/lucene/pull/12293#issuecomment-1753607350
@dsmiley - Here is the PR we were discussing at Community Over Code
mikemccand commented on PR #12625:
URL: https://github.com/apache/lucene/pull/12625#issuecomment-1753617337
> I ran luceneutil on wikimedium10m and I don't think it shows any
slowdown (I find the output hard to understand):
Hmm, surprisingly noisy, especially for the biggest regress
jpountz opened a new issue, #12644:
URL: https://github.com/apache/lucene/issues/12644
### Description
Counts on disjunctions could be optimized in the following case:
- 2 clauses
- both clauses are term queries
- there are no deletes
Then we could compute the count
epugh commented on PR #448:
URL: https://github.com/apache/lucene/pull/448#issuecomment-1753691080
@jzonthemtn not sure I have the knowledge or chops to do this upgrade...
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1753705229
Translating/merging the above two tables into a graph:

Some observations:
gsmiller commented on code in PR #12587:
URL: https://github.com/apache/lucene/pull/12587#discussion_r1350774457
##
lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java:
##
@@ -112,7 +113,23 @@ private static PrefixCodedTerms packTerms(String field, Collection ter
jzonthemtn commented on PR #448:
URL: https://github.com/apache/lucene/pull/448#issuecomment-1753825042
> @jzonthemtn not sure I have the knowledge or chops to do this upgrade...
I'll push an update!
gsmiller commented on issue #12644:
URL: https://github.com/apache/lucene/issues/12644#issuecomment-1753879508
Oh, +1. Interesting idea to try out!
gsmiller commented on PR #12638:
URL: https://github.com/apache/lucene/pull/12638#issuecomment-1753905354
I'd also be curious to better understand the need here. Is it really about
making `#docFreq` and `#totalTermFreq` calls safer/easier for callers somehow?
It looks like you'll get `Illeg
gf2121 commented on code in PR #12587:
URL: https://github.com/apache/lucene/pull/12587#discussion_r1351349192
##
lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java:
##
@@ -112,7 +113,23 @@ private static PrefixCodedTerms packTerms(String field, Collection ter
jpountz commented on issue #12644:
URL: https://github.com/apache/lucene/issues/12644#issuecomment-1754413611
You are right, I added the condition on deleted docs and term queries so
that `count(clause)` can be computed as the doc freq of the term.
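The issue text above is cut off before it states how the count would be computed. Assuming the intended computation is inclusion-exclusion, count(A OR B) = docFreq(A) + docFreq(B) - count(A AND B), here is a sketch over plain sorted doc-ID arrays that are hypothetical stand-ins for postings lists. The no-deletes condition matters because only then does a term's doc freq equal the exact number of matching docs, which the array length models here.

```java
// Hypothetical illustration; real postings would come from a TermsEnum, not arrays.
final class DisjunctionCountSketch {
  /** Number of doc IDs present in both sorted, duplicate-free lists (merge intersection). */
  static int intersectionCount(int[] a, int[] b) {
    int i = 0, j = 0, count = 0;
    while (i < a.length && j < b.length) {
      if (a[i] < b[j]) {
        i++;
      } else if (a[i] > b[j]) {
        j++;
      } else {
        count++;
        i++;
        j++;
      }
    }
    return count;
  }

  /** count(A OR B) via inclusion-exclusion; only valid when there are no deleted docs. */
  static int disjunctionCount(int[] postingsA, int[] postingsB) {
    int docFreqA = postingsA.length; // without deletes, docFreq == number of matching docs
    int docFreqB = postingsB.length;
    return docFreqA + docFreqB - intersectionCount(postingsA, postingsB);
  }
}
```

For example, with postings {1, 3, 5, 7} and {3, 4, 5} the result is 4 + 3 - 2 = 5, matching the union {1, 3, 4, 5, 7}.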
gf2121 merged PR #12630:
URL: https://github.com/apache/lucene/pull/12630
gf2121 commented on PR #12631:
URL: https://github.com/apache/lucene/pull/12631#issuecomment-1754442681
@jpountz @mikemccand Thanks a lot for the great suggestions and benchmarks!
gf2121 merged PR #12631:
URL: https://github.com/apache/lucene/pull/12631
gf2121 closed issue #12620: Write VLong in opposite order for better outputs
sharing in the FST
URL: https://github.com/apache/lucene/issues/12620
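To illustrate the idea behind #12620, a sketch of a variable-length long encoding that writes the most significant 7-bit group first, with the continuation bit set on every byte except the last. With the usual least-significant-first order, 300 and 301 encode as `0xAC 0x02` and `0xAD 0x02` and share no leading byte; most-significant-first they become `0x82 0x2C` and `0x82 0x2D`, so the shared high-order byte can become a common output prefix in the FST. This is an assumption about the general technique, not the exact encoding in PR #12631, and the class name is hypothetical.

```java
// Hypothetical MSB-first VLong codec for non-negative longs.
final class MSBVLongSketch {
  /** Encodes value most-significant 7-bit group first; high bit = "more bytes follow". */
  static byte[] encode(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative: " + value);
    }
    int groups = 1;
    for (long v = value >>> 7; v != 0; v >>>= 7) {
      groups++; // count the 7-bit groups needed
    }
    byte[] out = new byte[groups];
    for (int i = 0; i < groups; i++) {
      int shift = 7 * (groups - 1 - i); // highest group first
      int b = (int) ((value >>> shift) & 0x7F);
      if (i < groups - 1) {
        b |= 0x80; // continuation bit on all but the last byte
      }
      out[i] = (byte) b;
    }
    return out;
  }

  /** Decodes bytes produced by encode(). */
  static long decode(byte[] in) {
    long value = 0;
    for (byte b : in) {
      value = (value << 7) | (b & 0x7F);
      if ((b & 0x80) == 0) {
        break; // last byte reached
      }
    }
    return value;
  }
}
```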
gf2121 opened a new pull request, #12645:
URL: https://github.com/apache/lucene/pull/12645
No need to fill with zeros, as `computeDocFreqs` will do it.
gf2121 commented on code in PR #12645:
URL: https://github.com/apache/lucene/pull/12645#discussion_r1351615624
##
lucene/CHANGES.txt:
##
@@ -178,7 +178,7 @@ Optimizations
* GITHUB#12623: Use a MergeSorter taking advantage of extra storage for
StableMSBRadixSorter. (Guo Feng)
gf2121 commented on code in PR #12643:
URL: https://github.com/apache/lucene/pull/12643#discussion_r1351629111
##
lucene/core/src/java/org/apache/lucene/search/LeafCollector.java:
##
@@ -125,6 +125,8 @@ default DocIdSetIterator competitiveIterator() throws IOException {
* i
dungba88 opened a new pull request, #12646:
URL: https://github.com/apache/lucene/pull/12646
### Description
Currently FSTCompiler and FST have a circular dependency on each other.
FSTCompiler creates an instance of FST, and when adding a node, it delegates to
`FST.addNode()` and passin
gf2121 merged PR #12645:
URL: https://github.com/apache/lucene/pull/12645
gf2121 commented on code in PR #12642:
URL: https://github.com/apache/lucene/pull/12642#discussion_r1351726366
##
lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java:
##
@@ -1490,7 +1542,22 @@ public List reduce(Collection collectors) {
.collect(Coll
romseygeek commented on PR #12646:
URL: https://github.com/apache/lucene/pull/12646#issuecomment-1754625497
Thanks for opening @dungba88! This FST building code is very hairy and this
is a nice start at cleaning it up.
Given how expert this code is and that the relevant methods are al
jpountz commented on code in PR #12642:
URL: https://github.com/apache/lucene/pull/12642#discussion_r1351794343
##
lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java:
##
@@ -316,6 +316,58 @@ public void testBasic() throws Exception {
IOUtils.close(searcher