jpountz commented on PR #14203:
URL: https://github.com/apache/lucene/pull/14203#issuecomment-2724038977
I have some small concerns:
- The fact that the 512 step is tied to the number of points per leaf,
though it's not a big deal at all, postings are similar: their encoding logic
is sp
jpountz commented on PR #14333:
URL: https://github.com/apache/lucene/pull/14333#issuecomment-2724046501
I started looking at the code but you would know better: does this new
encoding make it easier to know the length of leaf blocks while traversing the
terms index so that we could prefetc
dweiss commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2724314718
Or we can just embrace the fact that it can be a non-minimal NFA and just let
it run like that (with NFARunAutomaton).
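Running an NFA directly means tracking the set of currently active states for each input code point, instead of determinizing (and minimizing) the automaton up front. The minimal sketch below assumes a toy representation. `TinyNfa` and its methods are hypothetical and are not Lucene's `NFARunAutomaton` API:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical toy NFA over code points, simulated without determinization (not a Lucene API). */
public class TinyNfa {
  private record Transition(int min, int max, int dest) {}

  private final List<List<Transition>> transitions = new ArrayList<>();
  private final BitSet accept = new BitSet();

  /** Creates a new state; the first state created (0) is the start state. */
  public int createState() {
    transitions.add(new ArrayList<>());
    return transitions.size() - 1;
  }

  public void setAccept(int state) {
    accept.set(state);
  }

  /** Adds a transition from src to dest on any code point in [min, max]. */
  public void addTransition(int src, int dest, int min, int max) {
    transitions.get(src).add(new Transition(min, max, dest));
  }

  /** Runs the NFA from state 0, keeping the set of active states after each code point. */
  public boolean run(String input) {
    BitSet active = new BitSet();
    active.set(0);
    for (int i = 0; i < input.length(); ) {
      int cp = input.codePointAt(i);
      i += Character.charCount(cp);
      BitSet next = new BitSet();
      for (int s = active.nextSetBit(0); s >= 0; s = active.nextSetBit(s + 1)) {
        for (Transition t : transitions.get(s)) {
          if (cp >= t.min() && cp <= t.max()) {
            next.set(t.dest());
          }
        }
      }
      if (next.isEmpty()) {
        return false; // no active state left: reject early
      }
      active = next;
    }
    return active.intersects(accept);
  }

  public static void main(String[] args) {
    // Case-insensitive "s" built non-minimally: one branch (and one accept state)
    // per variant: 's', 'S' and U+017F LATIN SMALL LETTER LONG S.
    TinyNfa nfa = new TinyNfa();
    int start = nfa.createState();
    for (int cp : new int[] {'s', 'S', 0x017F}) {
      int end = nfa.createState(); // a minimal automaton would merge these end states
      nfa.setAccept(end);
      nfa.addTransition(start, end, cp, cp);
    }
    System.out.println(nfa.run("S")); // true
    System.out.println(nfa.run("\u017F")); // true
    System.out.println(nfa.run("x")); // false
  }
}
```

Each input code point costs work proportional to the active states and their transitions, which is the price a non-minimal NFA pays at match time in exchange for skipping determinization.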
--
This is an automated message from the Apache Git Service.
To res
DivyanshIITB commented on PR #14335:
URL: https://github.com/apache/lucene/pull/14335#issuecomment-2724394013
Just a gentle reminder
dweiss commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2724062564
I don't know Unicode as well as Rob, so I can't say what these alternate case
folding equivalence classes are... but they definitely don't have a "canonical"
representation with rega
gf2121 commented on code in PR #14333:
URL: https://github.com/apache/lucene/pull/14333#discussion_r1994867386
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java:
##
@@ -0,0 +1,486 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one o
thecoop commented on code in PR #14304:
URL: https://github.com/apache/lucene/pull/14304#discussion_r1987194449
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##
@@ -907,4 +907,87 @@ public static long int4BitDotProduct128(byte[]
jpountz commented on PR #14345:
URL: https://github.com/apache/lucene/pull/14345#issuecomment-2724895262
Nightly benchmarks confirmed the speedup:
https://benchmarks.mikemccandless.com/FilteredAndHighHigh.html. I'll push an
annotation.
jpountz commented on PR #14335:
URL: https://github.com/apache/lucene/pull/14335#issuecomment-2724923272
Apologies, I had missed your reply.
> should this be a shared global pool across all IndexWriters, or should
each writer have its own pool?
It should be shared, we don't want
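One way to get a single pool shared by every writer in the JVM is a lazily initialized static holder. The minimal sketch below assumes a fixed pool sized to the number of available processors. `SharedWriterPool` is hypothetical and is not part of Lucene or of this PR:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical JVM-wide pool shared by all writer instances (not a Lucene API). */
public final class SharedWriterPool {
  private SharedWriterPool() {}

  // Initialization-on-demand holder: the JVM creates the pool lazily and exactly
  // once, the first time any caller invokes get().
  private static final class Holder {
    private static final AtomicInteger COUNTER = new AtomicInteger();
    private static final ThreadFactory FACTORY =
        r -> {
          Thread t = new Thread(r, "shared-writer-pool-" + COUNTER.incrementAndGet());
          t.setDaemon(true); // don't keep the JVM alive just for this pool
          return t;
        };
    static final ExecutorService POOL =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors(), FACTORY);
  }

  /** Returns the same executor to every caller. */
  public static ExecutorService get() {
    return Holder.POOL;
  }
}
```

Because every writer receives the same executor, the total number of threads stays bounded by the pool size no matter how many writers are open. A per-writer pool would instead multiply threads by the number of writers.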
jpountz merged PR #14354:
URL: https://github.com/apache/lucene/pull/14354
dweiss commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2725496380
Ok, fair enough.
renatoh opened a new pull request, #14356:
URL: https://github.com/apache/lucene/pull/14356
### Description
rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2724585337
> Or we can just embrace the fact that it can be a non-minimal NFA and
just let it run like that (with NFARunAutomaton).
I don't think this is currently a good option either: users wo
rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2725736846
It isn't a good idea. If the user wants to "erase case differences" then
they should apply `foldcase(ch)`. That's what case-folding means. That
CaseFolding class does everything, except, t
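The `CaseFolding` API itself isn't shown in this thread. As a stand-in, the minimal sketch below approximates a `foldcase` function that maps every member of a case equivalence class to one representative, using only JDK calls (upper-case, then lower-case). `FoldCaseSketch` and its `foldcase` method are hypothetical. This round trip is an approximation and is not identical to Unicode's CaseFolding.txt:

```java
/** Hypothetical helper approximating simple case folding with JDK APIs (not Lucene's CaseFolding). */
public final class FoldCaseSketch {
  private FoldCaseSketch() {}

  /** Maps a code point to a representative of its case class via an upper/lower round trip. */
  public static int foldcase(int codePoint) {
    return Character.toLowerCase(Character.toUpperCase(codePoint));
  }

  public static void main(String[] args) {
    // s, S, U+017F (long s); k, K, U+212A (Kelvin sign); final sigma, sigma, capital sigma
    int[] samples = {'s', 'S', 0x017F, 'k', 'K', 0x212A, 0x03C2, 0x03C3, 0x03A3};
    for (int cp : samples) {
      System.out.printf("U+%04X -> U+%04X%n", cp, foldcase(cp));
    }
  }
}
```

With this round trip, `s`, `S` and U+017F all map to U+0073; `k`, `K` and the Kelvin sign U+212A map to U+006B; and U+03C2, U+03C3 and U+03A3 map to U+03C3. One known divergence: it maps U+0131 DOTLESS I to `i`, whereas CaseFolding.txt has no simple folding for U+0131.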
rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2724580292
This is why I recommended not to use the Unicode function and to start
simple. Then you have a potential way to get it working efficiently.
msfroh commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2725709282
This is kind of what I had in mind:
```java
private static int canonicalize(int codePoint) {
int[] alternatives = CaseFolding.lookupAlternates(codePoint);
if (alte
hanbj commented on PR #14352:
URL: https://github.com/apache/lucene/pull/14352#issuecomment-2724230306
Thank you for providing ideas. In scenarios with multiple dimensions, the
internal nodes in the bkd tree can only be sorted according to a certain
dimension. Different internal nodes may h
jpountz commented on PR #14203:
URL: https://github.com/apache/lucene/pull/14203#issuecomment-2726015514
Thanks for running benchmarks. So it looks like the JVM doesn't think these
shorter loops (with step 128) are worth unrolling? This makes me wonder how
something like that performs on y
gf2121 commented on PR #14203:
URL: https://github.com/apache/lucene/pull/14203#issuecomment-2725390772
> There must be something that happens with this 512 step that doesn't
happen otherwise such as using different instructions, loop unrolling, better
CPU pipelining or something else.
msfroh commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2726097192
Hmm... I'm thinking of just requiring that input is lowercase (per
`Character.toLowerCase(c)`), then check for collisions on uppercase versions when
adding transitions, and throw an excepti
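The proposed check (lowercase-only input, then rejecting code points whose uppercase forms collide) can be sketched as follows. This is a minimal illustration under that reading of the comment. `UppercaseCollisionCheck` and `check` are hypothetical names and do not come from the PR:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical check: input must be lowercase, and uppercase forms must not collide. */
public final class UppercaseCollisionCheck {
  private UppercaseCollisionCheck() {}

  /**
   * Returns a map from each uppercase form to the lowercase code point that produced it.
   * Throws if a code point is not lowercase (per Character.toLowerCase) or if two
   * distinct lowercase code points share the same uppercase form.
   */
  public static Map<Integer, Integer> check(int[] codePoints) {
    Map<Integer, Integer> byUpper = new HashMap<>();
    for (int cp : codePoints) {
      if (Character.toLowerCase(cp) != cp) {
        throw new IllegalArgumentException(String.format("U+%04X is not lowercase", cp));
      }
      int upper = Character.toUpperCase(cp);
      Integer previous = byUpper.putIfAbsent(upper, cp);
      if (previous != null && previous != cp) {
        throw new IllegalArgumentException(
            String.format("U+%04X and U+%04X both uppercase to U+%04X", previous, cp, upper));
      }
    }
    return byUpper;
  }
}
```

For example, `check(new int[] {'s', 0x017F})` throws because both `s` and U+017F LATIN SMALL LETTER LONG S upper-case to `S`. The pairs `i`/U+0131 and U+03C3/U+03C2 collide the same way.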