dweiss commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1752416032
I didn't get into all the details but I think this looks good. Your
questions are indeed intriguing - I can't provide any explanation off the top
of my head, really.
--
This is an auto
jpountz commented on PR #12638:
URL: https://github.com/apache/lucene/pull/12638#issuecomment-1752414836
Can you explain how/when you plan to use this new API?
dweiss commented on PR #12611:
URL: https://github.com/apache/lucene/pull/12611#issuecomment-1752397871
I've applied this to main and branch_9x (9.9). Thank you.
dweiss merged PR #12611:
URL: https://github.com/apache/lucene/pull/12611
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apac
yugushihuang opened a new pull request, #12638:
URL: https://github.com/apache/lucene/pull/12638
### Description
A simple API in TermStates to expose the `needStats` flag.
Addresses #12617 #
pzygielo commented on PR #12611:
URL: https://github.com/apache/lucene/pull/12611#issuecomment-1752377046
Thanks for checking.
Shibi-bala opened a new issue, #12637:
URL: https://github.com/apache/lucene/issues/12637
### Description
Found that the [replace
method](https://github.com/qcri/solr-6/blob/master/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L875-L878)
doesn't set `userData` with t
benwtrent commented on PR #12636:
URL: https://github.com/apache/lucene/pull/12636#issuecomment-1752194821
It was sort of this way before but we decided to switch it as a common
interface required either:
- having to use generics
- an API where things weren't fully implemented or r
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1752165322
For comparison, this is how the curve (RAM required during construction vs
final FST size) looks on trunk, using the god-like parameters as best I could.
I sorted the results in reve
jpountz commented on PR #12628:
URL: https://github.com/apache/lucene/pull/12628#issuecomment-1752152301
I'll try to give a bit more context on how I ended up here. With recent work on
vector search and excitement around it, I can't prevent myself from thinking
that all users who are happy to
epugh commented on PR #448:
URL: https://github.com/apache/lucene/pull/448#issuecomment-1752112078
It would be nice if this was updated to the awesome new OpenNLP 2.x line!
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752107370
The other thought I had around conversion costs would be to look into
reinterpret+shuffle/shift/mask crap ourselves, which seems really crazy but i'm
running low on ideas.
shubhamvishu opened a new pull request, #12636:
URL: https://github.com/apache/lucene/pull/12636
### Description
The classes
[ByteVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.java)
and
[FloatVectorValues](http
shubhamvishu opened a new issue, #12635:
URL: https://github.com/apache/lucene/issues/12635
### Description
Currently, there is a lot of code duplication due to
[ByteVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.ja
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752101786
btw, another crazy avenue to possibly explore here another day, since we
seem bottlenecked on integer multiply. We could try it on arm too. It is faster
than the current binary code on my
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752100681
> My sense here is that accessing a `part` other than `0` is less performant
than just reloading the data, which seems a little off.
It seems to have a heavy cost no matter how i do
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752099845
My sense here is that accessing a `part` other than `0` is less performant
than just reloading the data, which seems a little off.
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752098666
I get similar bench results, the new impl is faster.
```
Benchmark (size) Mode Cnt Score Error Units
BinaryDotProductBenchmark.
mikemccand commented on PR #12631:
URL: https://github.com/apache/lucene/pull/12631#issuecomment-1752064474
`luceneutil` results on `wikimediumall` look good -- looks like all noise
(even for `PKLookup`), or, any signal (change) is very low, making the ~15%
reduction very much worth it.
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752063622
ok on my mac i see:
```
Benchmark (size) Mode Cnt Score Error Units
BinaryCosineBenchmark.cosineDistanceNew 1024 thrpt 5 2.
rmuir merged PR #12634:
URL: https://github.com/apache/lucene/pull/12634
mikemccand commented on PR #12631:
URL: https://github.com/apache/lucene/pull/12631#issuecomment-1752050479
I kicked off a `luceneutil` run ... I'll post results here soonish.
mikemccand commented on code in PR #12631:
URL: https://github.com/apache/lucene/pull/12631#discussion_r1349711457
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsReader.java:
##
@@ -81,8 +81,11 @@ public final class Lucene90BlockTreeTer
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752050233
see latest commit for the idea. on my mac it gives a decent boost. it uses
"32-bit" vector by loading 64-bit vector from array but only processing half of
it. The tests should fail as i n
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752049654
don't worry, i have a plan B. it is just frustrating due to the nightmare of
operating on the mac, combined with the fact this benchmark and lucene source
is a separate repo. it makes the
gf2121 commented on code in PR #12631:
URL: https://github.com/apache/lucene/pull/12631#discussion_r1349705693
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsReader.java:
##
@@ -81,8 +81,11 @@ public final class Lucene90BlockTreeTermsRe
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752041404
at least we can improve the testing out of this:
https://github.com/apache/lucene/pull/12634
rmuir opened a new pull request, #12634:
URL: https://github.com/apache/lucene/pull/12634
Let's improve the testing for the boundary cases and check them explicitly.
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752039360
yeah, you are right, i am wrong. the trick only works in the unsigned case,
Byte.MIN_VALUE is a problem :(
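A minimal scalar sketch of why `Byte.MIN_VALUE` is the awkward case, assuming the trick under discussion is adding two byte products in a 16-bit lane before widening to int (as in the `prod16_1.add(prod16_2)` snippet quoted elsewhere in this thread). The class and method names are hypothetical and are not taken from the PR.

```java
// Hypothetical scalar model of a 16-bit vector lane; not code from the PR.
final class ShortLaneOverflow {
  /** Adds two byte products in 16 bits, wrapping the way a short lane would. */
  static short sumOfTwoProducts16(byte a1, byte b1, byte a2, byte b2) {
    int p1 = a1 * b1; // a single product always fits in a short: [-16256, 16384]
    int p2 = a2 * b2;
    return (short) (p1 + p2); // the sum can reach 32768, one past Short.MAX_VALUE
  }

  public static void main(String[] args) {
    byte max = Byte.MAX_VALUE; // 127
    byte min = Byte.MIN_VALUE; // -128
    // 127*127 + 127*127 = 32258, still representable as a short
    System.out.println(sumOfTwoProducts16(max, max, max, max));
    // (-128)*(-128) + (-128)*(-128) = 32768, which wraps to -32768
    System.out.println(sumOfTwoProducts16(min, min, min, min));
  }
}
```

If the inputs never contained `-128`, the largest possible sum would be 32258 and the 16-bit add would be safe; a single pair of `-128` values in both operands is enough to wrap.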
mikemccand commented on code in PR #12631:
URL: https://github.com/apache/lucene/pull/12631#discussion_r1349699402
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java:
##
@@ -99,6 +102,26 @@ public final class FieldReader extends Terms {
*/
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752036396
yeah agreed: we should test the boundaries for all 3 functions.
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752035773
Ok, cool. If there is not already one, we should add a test to the Panama /
scalar unit test for the boundary values.
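A rough sketch of what such a boundary check could look like for the dot product alone, using a plain scalar reference that accumulates in an int. The class name, helper names, and the array length of 4 are assumptions for illustration; this is not the test added to Lucene, and a real test would compare the vectorized implementation against a reference like this one.

```java
import java.util.Arrays;

// Hypothetical scalar reference and boundary inputs; not Lucene's actual test code.
final class BoundaryDotProduct {
  /** Plain scalar reference: accumulate byte products in an int. */
  static int dotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("length mismatch");
    }
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i]; // bytes are promoted to int before multiplying
    }
    return total;
  }

  /** Builds an array of the given length filled with one value. */
  static byte[] filled(int length, byte value) {
    byte[] array = new byte[length];
    Arrays.fill(array, value);
    return array;
  }

  public static void main(String[] args) {
    byte[] min = filled(4, Byte.MIN_VALUE);
    byte[] max = filled(4, Byte.MAX_VALUE);
    System.out.println(dotProduct(min, min)); // 4 * 16384
    System.out.println(dotProduct(max, max)); // 4 * 16129
    System.out.println(dotProduct(min, max)); // 4 * -16256
  }
}
```

Filling both inputs with the same extreme value exercises the largest positive products, while pairing `Byte.MIN_VALUE` with `Byte.MAX_VALUE` exercises the most negative one.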
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752033176
> What is the maximum value that we can see in the input bytes?
All possible values is how i test
> Can they ever hold `-128`?
Yes!
> Do we need to handle "ove
mikemccand commented on PR #12631:
URL: https://github.com/apache/lucene/pull/12631#issuecomment-1752031210
> sum | 31606784 | 27188690 | -13.98%
WHOA, wow! This is a massive gain for such a tiny change :) I'll try to
review soon! Nice to revisit ancient `TODO`s in the source code
mikemccand commented on issue #12542:
URL: https://github.com/apache/lucene/issues/12542#issuecomment-1752030874
Talking to @sokolovm at Community Over Code 2023 he suggested another idea
here: instead of a (RAM hungry) hash table, couldn't we use the growing FST
itself to lookup suffixes?
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752029230
And of course, `ZERO_EXTEND_S2I`, will work in the maximum boundary case,
but not in others. So the question is then just about the maximum value of the
bytes in these input arrays
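To make the trade-off concrete, here is a small scalar sketch of sign-extending versus zero-extending a 16-bit lane that holds the sum of two byte products. It assumes the lane has already wrapped as a Java `short` would; the class and method names are hypothetical, and the comments only relate them loosely to `VectorOperators.S2I` and `VectorOperators.ZERO_EXTEND_S2I`.

```java
// Hypothetical scalar model of widening a 16-bit lane to int; not code from the PR.
final class WidenShortLane {
  /** Sign-extending widening, analogous to an S2I conversion. */
  static int signExtend(short lane) {
    return lane;
  }

  /** Zero-extending widening, analogous to a ZERO_EXTEND_S2I conversion. */
  static int zeroExtend(short lane) {
    return lane & 0xFFFF;
  }

  public static void main(String[] args) {
    // Maximum boundary: 2 * (-128 * -128) = 32768, stored in 16 bits.
    short wrappedMax = (short) (2 * (-128 * -128));
    System.out.println(signExtend(wrappedMax)); // -32768, wrong sum
    System.out.println(zeroExtend(wrappedMax)); // 32768, correct sum

    // A genuinely negative sum: 2 * (-128 * 127) = -32512.
    short negative = (short) (2 * (-128 * 127));
    System.out.println(signExtend(negative)); // -32512, correct sum
    System.out.println(zeroExtend(negative)); // 33024, wrong sum
  }
}
```

Zero extension recovers the one overflowing value but corrupts every negative sum, so neither conversion alone is correct across the full signed byte range.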
mikemccand commented on PR #12628:
URL: https://github.com/apache/lucene/pull/12628#issuecomment-1752028823
Very cool, surprisingly impactful!
> I ran the Tantivy benchmark with TOP_10 and TOP_100 commands
This is the Tantivy benchmark tooling, but you are comparing Lucene (mai
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752024575
```
// sum into accumulators
Vector prod16 = prod16_1.add(prod16_2);
acc = acc.add(prod16.convert(VectorOperators.S2I, 0));
acc = acc.add(prod16.convert(VectorOper
ChrisHegarty commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1752003494
Thanks for looking into this @rmuir, I've been thinking similar myself (just
didn't get around to anything other than the thinking!)
On my Mac M2.
JDK 20.0.2.
```
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1751999625
Here are the results from running `test_all_sizes.py` then
`results_to_md.py`:
|NodeHash size|FST (mb)|RAM (mb)|FST build time (sec)|