rmuir commented on PR #14193:
URL: https://github.com/apache/lucene/pull/14193#issuecomment-2635997545
My example for this one, if you have something like `[^a-gklM-O\s]`, with
the case-insensitive flag maybe, it just calls the new
`makeCharClass(int[],int[])` method and you get minimal aut
rmuir commented on PR #14193:
URL: https://github.com/apache/lucene/pull/14193#issuecomment-2635939281
anyway, I think this is the right path, rather than fight with union(),
let's just get it out of our way. with this change union() is only used for
union operator (`|`) and not internally.
rmuir commented on PR #14193:
URL: https://github.com/apache/lucene/pull/14193#issuecomment-2635936648
That's error-prone that's broke trying to do some null analysis :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
rmuir commented on PR #14193:
URL: https://github.com/apache/lucene/pull/14193#issuecomment-2635933607
I generalized this to `makeCharClass(int[],int[])`, added a "character
class" node to use it instead of unioning many nodes, replaced the pre-built
class functionality with it too.
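
To illustrate why a single "character class" node can be simpler than unioning many nodes, here is a minimal sketch that normalizes the ranges of one class into sorted, non-overlapping intervals. The `CharClassRanges` class and its `merge` method are hypothetical names for illustration only and are not the PR's `makeCharClass(int[],int[])` implementation. Negation and `\s` are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of Lucene: normalizes code point ranges for one character class. */
final class CharClassRanges {

  /** Sorts inclusive [start, end] ranges by start and merges ranges that overlap or are adjacent. */
  static int[][] merge(int[] starts, int[] ends) {
    if (starts.length != ends.length) {
      throw new IllegalArgumentException("starts and ends must have the same length");
    }
    int[][] ranges = new int[starts.length][];
    for (int i = 0; i < starts.length; i++) {
      if (starts[i] > ends[i]) {
        throw new IllegalArgumentException("start > end at index " + i);
      }
      ranges[i] = new int[] {starts[i], ends[i]};
    }
    Arrays.sort(ranges, Comparator.comparingInt((int[] r) -> r[0]));

    List<int[]> merged = new ArrayList<>();
    for (int[] r : ranges) {
      int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      // Use long arithmetic so that last[1] + 1 cannot overflow.
      if (last != null && (long) r[0] <= (long) last[1] + 1) {
        last[1] = Math.max(last[1], r[1]);
      } else {
        merged.add(new int[] {r[0], r[1]});
      }
    }
    return merged.toArray(new int[0][]);
  }

  public static void main(String[] args) {
    // The positive part of [a-gklM-O]: a-g, k, l, M-O.
    int[][] merged = merge(new int[] {'a', 'k', 'l', 'M'}, new int[] {'g', 'k', 'l', 'O'});
    for (int[] r : merged) {
      System.out.println((char) r[0] + "-" + (char) r[1]);
    }
  }
}
```

For the four input ranges above, the sketch yields three intervals (`M-O`, `a-g`, `k-l`), because `k` and `l` are adjacent code points and collapse into one range.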
MaruHyl opened a new pull request, #14199:
URL: https://github.com/apache/lucene/pull/14199
fix typo
cheng66551 commented on PR #14163:
URL: https://github.com/apache/lucene/pull/14163#issuecomment-2635560399
> I don't think we should merge this change, but it's good that you were
able to use it to confirm that merging would reclaim these deleted docs.
>
> Can you add your data about
cheng66551 commented on PR #14163:
URL: https://github.com/apache/lucene/pull/14163#issuecomment-2635558626
> I don't think we should merge this change, but it's good that you were
able to use it to confirm that merging would reclaim these deleted docs.
>
> Can you add your data about
cheng66551 commented on PR #14163:
URL: https://github.com/apache/lucene/pull/14163#issuecomment-2635548335
> I don't think we should merge this change, but it's good that you were
able to use it to confirm that merging would reclaim these deleted docs.
>
> Can you add your data about
cheng66551 commented on PR #14163:
URL: https://github.com/apache/lucene/pull/14163#issuecomment-2635542291
> If you are able to turn on `InfoStream` for the ES shard that won't merge
segments with so many deletions, and post a chunk here, I can have a look and
see if there are clues.
cheng66551 closed pull request #14163: supports force merge based on specified
segments.
URL: https://github.com/apache/lucene/pull/14163
cheng66551 commented on PR #14163:
URL: https://github.com/apache/lucene/pull/14163#issuecomment-2635519801
> It's terrible that `TieredMergePolicy` was not merging these segments,
naturally or under `forceMerge` -- let's understand why it's failing to do so?
It's like we need an `explain`
cheng66551 commented on PR #14163:
URL: https://github.com/apache/lucene/pull/14163#issuecomment-2635518397
> It's terrible that `TieredMergePolicy` was not merging these segments,
naturally or under `forceMerge` -- let's understand why it's failing to do so?
It's like we need an `explain`
cheng66551 commented on PR #14163:
URL: https://github.com/apache/lucene/pull/14163#issuecomment-2635516819
> It's terrible that `TieredMergePolicy` was not merging these segments,
naturally or under `forceMerge` -- let's understand why it's failing to do so?
It's like we need an `explain`
msfroh opened a new pull request, #14198:
URL: https://github.com/apache/lucene/pull/14198
### Description
This resurrects the OpenNLP model training task from Ant
(https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/analysis/opennlp/build.xml#L52-L84)
to Gr
github-actions[bot] commented on PR #14119:
URL: https://github.com/apache/lucene/pull/14119#issuecomment-2635430815
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
rmuir commented on code in PR #14192:
URL: https://github.com/apache/lucene/pull/14192#discussion_r1942013537
##
lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java:
##
@@ -696,17 +896,52 @@ private Automaton toAutomaton(
return a;
}
- private Automaton
rmuir commented on code in PR #14192:
URL: https://github.com/apache/lucene/pull/14192#discussion_r1942009773
##
lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java:
##
@@ -436,6 +478,160 @@ public enum Kind {
*/
@Deprecated public static final int DEPRECATE
john-wagster commented on code in PR #14192:
URL: https://github.com/apache/lucene/pull/14192#discussion_r1941963110
##
lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java:
##
@@ -696,17 +896,52 @@ private Automaton toAutomaton(
return a;
}
- private Aut
john-wagster commented on code in PR #14192:
URL: https://github.com/apache/lucene/pull/14192#discussion_r1941960613
##
lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java:
##
@@ -436,6 +478,160 @@ public enum Kind {
*/
@Deprecated public static final int DE
benwtrent opened a new pull request, #14197:
URL: https://github.com/apache/lucene/pull/14197
The tests caught a bug! Good thing!
The code wasn't taking account of the underlying leaf context doc base when
creating the top doc iterator for a segment.
No changes entry as it's a b
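
As a rough illustration of the class of bug described here: in Lucene, doc IDs inside a segment are segment-local, and `LeafReaderContext.docBase` is the offset that turns them into index-wide IDs. The sketch below is not the PR's fix. `DocBaseExample` and `toGlobalDocIds` are hypothetical names, and plain `int` arrays stand in for real iterators.

```java
/** Hypothetical illustration, not Lucene code: maps segment-local doc IDs to index-wide doc IDs. */
final class DocBaseExample {

  /** Adds the segment's docBase to every segment-local doc ID. */
  static int[] toGlobalDocIds(int docBase, int[] localDocIds) {
    int[] global = new int[localDocIds.length];
    for (int i = 0; i < localDocIds.length; i++) {
      global[i] = docBase + localDocIds[i];
    }
    return global;
  }

  public static void main(String[] args) {
    // A second segment whose first document has index-wide ID 100.
    int[] global = toGlobalDocIds(100, new int[] {0, 5, 42});
    System.out.println(java.util.Arrays.toString(global));
  }
}
```

Skipping the offset goes unnoticed on a single-segment index, where `docBase` is 0, which may be why this kind of bug tends to surface only in randomized multi-segment tests.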
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2635120100
Build failure seems unrelated, created #14196
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2635103176
### Some more points / thoughts
- Built for Faiss `v1.10.0` (version is validated at runtime)
- Can be compiled with lower versions of Java, and run with 22+ (using an
MR-JAR)
-
kaivalnp opened a new issue, #14196:
URL: https://github.com/apache/lucene/issues/14196
### Description
[`TestSeededKnnFloatVectorQuery.testSeedWithTimeout`](https://github.com/apache/lucene/blob/e4321619bba8669e93311ffb9456fa043d519b21/lucene/core/src/test/org/apache/lucene/search/Te
benwtrent opened a new issue, #14195:
URL: https://github.com/apache/lucene/issues/14195
### Description
java.lang.IllegalArgumentException: The number of entry points provided is
less than the number of entry points requested
```
java.lang.IllegalArgumentException: The numb
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1941904975
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
msfroh commented on PR #14194:
URL: https://github.com/apache/lucene/pull/14194#issuecomment-2635079150
Test failure:
```
Reproduce with: gradlew :lucene:core:test --tests
"org.apache.lucene.search.TestSeededKnnByteVectorQuery.testSeedWithTimeout"
-Ptests.jvms=1 -Ptests.jvmargs= -
msfroh opened a new pull request, #14194:
URL: https://github.com/apache/lucene/pull/14194
### Description
This allows users to use either a Penn or UD part-of-speech tagging model,
but output tags in the other format. This allows users to combine a Penn POS
tagging model with a lemm
tteofili commented on PR #14191:
URL: https://github.com/apache/lucene/pull/14191#issuecomment-2634636630
I'm going to try a more promising way of slicing segments to threads
tteofili commented on PR #14191:
URL: https://github.com/apache/lucene/pull/14191#issuecomment-2634633824
my previous luceneutil runs were useless, now with changes in luceneutil
(`NoMergePolicy` and no force merge on index side, using the `ExecutorService`
on the search side), I get far di
rmuir commented on issue #14182:
URL: https://github.com/apache/lucene/issues/14182#issuecomment-2634632482
@mikemccand rather than mess with info stream logging could we consider
adding some counters to indexwriter to give visibility? Eg if you have a flush
count with a simple int getter,
tteofili commented on PR #14191:
URL: https://github.com/apache/lucene/pull/14191#issuecomment-2634489270
I've adjusted `AbstractKnnVectorQuery` to pick the largest
`LeafReaderContext` (largest `#reader().numDocs()`) for the first search, this
introduces an additive O(|leafReaderContexts|)
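
The selection step mentioned here can be sketched as a single linear scan over per-segment document counts, which is where the extra O(|leafReaderContexts|) term comes from. This is only an assumption about the general shape, not the actual `AbstractKnnVectorQuery` change. `LargestSegmentFirst` and `indexOfLargest` are hypothetical names, and an `int[]` of counts stands in for calls to `reader().numDocs()`.

```java
/** Hypothetical illustration, not Lucene code: finds the segment with the most documents. */
final class LargestSegmentFirst {

  /** Returns the index of the largest entry; ties resolve to the earliest segment. */
  static int indexOfLargest(int[] numDocsPerSegment) {
    if (numDocsPerSegment.length == 0) {
      throw new IllegalArgumentException("at least one segment is required");
    }
    int best = 0;
    for (int i = 1; i < numDocsPerSegment.length; i++) {
      if (numDocsPerSegment[i] > numDocsPerSegment[best]) {
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    System.out.println(indexOfLargest(new int[] {10, 500, 42}));
  }
}
```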
mikemccand closed issue #14182: Add easier segment tracing / verbosity /
transparency to `IndexWriter`
URL: https://github.com/apache/lucene/issues/14182
benchaplin commented on PR #14160:
URL: https://github.com/apache/lucene/pull/14160#issuecomment-2634270282
@benwtrent Yep, everything's in the PR. I ran on 1M docs, 100 queries to
keep the benchmark under an hour.
benwtrent commented on PR #14160:
URL: https://github.com/apache/lucene/pull/14160#issuecomment-2634245107
@benchaplin I found another bug. The recall numbers were indeed way too good
to be true. I was returning duplicate documents 🤦 . So, recall was great
because we contained a valid docum
benwtrent commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2634182175
> Java limits the size of arrays (and lists) to 'int max' and does not allow
'long' array indices. These will need to be changed to use a different data
structure.
Yeah, I don't
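
For context on the quoted limitation, one common workaround for Java's `int`-indexed arrays is to split storage into fixed-size pages and address it with a `long`. The sketch below shows that general technique only. It is not what the PR does or proposes, and `PagedLongArray` is a hypothetical name.

```java
/** Hypothetical illustration, not Lucene code: a long-indexed array backed by fixed-size pages. */
final class PagedLongArray {
  private final long size;
  private final int pageShift;
  private final int pageMask;
  private final long[][] pages;

  /** Creates an array of {@code size} zeros with pages of 2^pageShift elements each. */
  PagedLongArray(long size, int pageShift) {
    if (size < 0 || pageShift < 1 || pageShift > 30) {
      throw new IllegalArgumentException("invalid size or pageShift");
    }
    this.size = size;
    this.pageShift = pageShift;
    this.pageMask = (1 << pageShift) - 1;
    int pageSize = 1 << pageShift;
    long numPages = (size + pageSize - 1) >>> pageShift;
    if (numPages > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("too many pages for this pageShift");
    }
    pages = new long[(int) numPages][];
    for (int p = 0; p < pages.length; p++) {
      // The last page may be shorter than a full page.
      long remaining = size - ((long) p << pageShift);
      pages[p] = new long[(int) Math.min(pageSize, remaining)];
    }
  }

  long get(long index) {
    checkIndex(index);
    return pages[(int) (index >>> pageShift)][(int) (index & pageMask)];
  }

  void set(long index, long value) {
    checkIndex(index);
    pages[(int) (index >>> pageShift)][(int) (index & pageMask)] = value;
  }

  private void checkIndex(long index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index + " out of range for size " + size);
    }
  }
}
```

The high bits of the index select the page and the low bits select the slot within it, so each lookup costs one shift, one mask, and two array accesses.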
mikemccand commented on issue #14182:
URL: https://github.com/apache/lucene/issues/14182#issuecomment-2634071572
I have been tinkering with fun little Python tools in
[luceneutil](https://github.com/mikemccand/luceneutil) to 1) [parse a full
`InfoStream`
log](https://github.com/mikemccand/
benwtrent commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1941120908
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
rmuir commented on PR #14193:
URL: https://github.com/apache/lucene/pull/14193#issuecomment-2633741911
I considered it but then didn't use any varargs after Dawid's email about
compiler performance problems coming from them.
jpountz commented on code in PR #14176:
URL: https://github.com/apache/lucene/pull/14176#discussion_r1940875523
##
lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java:
##
@@ -115,30 +117,24 @@ void writeDocIds(int[] docIds, int start, int count,
DataOutput out) th
ChrisHegarty commented on PR #14131:
URL: https://github.com/apache/lucene/pull/14131#issuecomment-2633256419
I've made the cuvs-java api Java 21 friendly, with an spi and a java-22
specific impl in the versioned section of an mrjar - MemorySegment and Arena
have been removed from the api,
kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r1940738982
##
lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde