s1monw commented on PR #12829:
URL: https://github.com/apache/lucene/pull/12829#issuecomment-1829488114
@mikemccand @jpountz thanks for your ideas. I'd love to flesh this out more
before we add anything we write to the index. Today we'd only use this for
sorting but if that field can be use
dungba88 commented on code in PR #12624:
URL: https://github.com/apache/lucene/pull/12624#discussion_r1407526003
##
lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java:
##
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
mikemccand commented on PR #12624:
URL: https://github.com/apache/lucene/pull/12624#issuecomment-1829601799
> Tested Test2BFST with -Dtests.seed=D193E7FD4B9E68C4
Duh, I forgot to fix the seed! And the test is indeed random in the inputs
it compiles. Sorry for the false alarm :)
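Here is a minimal Java sketch of why fixing the seed matters. Two `java.util.Random` instances built from the same seed produce identical "random" inputs, so a re-run with the same seed reproduces the test exactly. Parsing the `-Dtests.seed` value as an unsigned 64-bit hex number with `Long.parseUnsignedLong` is only an assumption for illustration. The randomized-testing framework Lucene uses derives per-test randomness from the master seed in its own way. `SeedDemo`, `parseSeed` and `randomInputs` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

public class SeedDemo {
  // Assumption: treat the seed string as an unsigned 64-bit hex value.
  // The real test framework derives its per-test randoms differently.
  static long parseSeed(String hex) {
    return Long.parseUnsignedLong(hex, 16);
  }

  // Generate pseudo-random "inputs" from a seeded Random; same seed -> same inputs.
  static int[] randomInputs(long seed, int count) {
    Random r = new Random(seed);
    int[] out = new int[count];
    for (int i = 0; i < count; i++) {
      out[i] = r.nextInt(1000);
    }
    return out;
  }

  public static void main(String[] args) {
    long seed = parseSeed("D193E7FD4B9E68C4");
    int[] first = randomInputs(seed, 5);
    int[] second = randomInputs(seed, 5);
    System.out.println(Arrays.equals(first, second)); // prints "true"
  }
}
```

Without a fixed seed, each run draws different inputs, which is why an unseeded re-run is not a reliable comparison.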
--
mikemccand commented on PR #12624:
URL: https://github.com/apache/lucene/pull/12624#issuecomment-1829613228
Thanks @dungba88 -- I will catch up with the latest iterations soon. I
tested just how much slower the `ByteBuffer` based store is than the FST's
`BytesStore`:
9.x:
```
mikemccand commented on PR #12624:
URL: https://github.com/apache/lucene/pull/12624#issuecomment-1829633268
> More than two orders-of-magnitude (base 10) slower!
I wonder: are there other places in Lucene that might fall prey to this
performance trap (calling `toDataInput` frequently
mikemccand commented on code in PR #12847:
URL: https://github.com/apache/lucene/pull/12847#discussion_r1407685508
##
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##
@@ -867,6 +867,10 @@ public long fstRamBytesUsed() {
return fst.ramBytesUsed();
}
dungba88 commented on code in PR #12847:
URL: https://github.com/apache/lucene/pull/12847#discussion_r1407739948
##
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##
@@ -867,6 +867,10 @@ public long fstRamBytesUsed() {
return fst.ramBytesUsed();
}
+
slow-J commented on PR #12797:
URL: https://github.com/apache/lucene/pull/12797#issuecomment-1829926883
Hi @mikemccand thanks for all the comments, addressed them all and now
resolved the new merge conflicts!
--
This is an automated message from the Apache Git Service.
To respond to the m
jpountz commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1829969643
Sorry I wasn't clear, I meant to replace entries of the treeset with entries
of the other treeset by clearing it first, and then doing an `addAll`.
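Here is a minimal Java sketch of the clear-then-`addAll` idea. It assumes both sets are `java.util.TreeSet`s with the same ordering. `replaceContents` is a hypothetical helper, not Lucene code. Clearing first means `target` is empty when `addAll` runs. As far as I know, OpenJDK's `TreeSet.addAll` then takes a linear-time bulk-build path when the argument is a `SortedSet` with an equal comparator, instead of inserting elements one at a time.

```java
import java.util.List;
import java.util.TreeSet;

public class ReplaceTreeSetContents {
  // Replace the entries of `target` with those of `source`, reusing the
  // existing TreeSet instance instead of allocating a new one.
  static <T> void replaceContents(TreeSet<T> target, TreeSet<T> source) {
    target.clear();        // drop the old entries; the set object itself survives
    target.addAll(source); // copy in the entries of the other set
  }

  public static void main(String[] args) {
    TreeSet<Integer> target = new TreeSet<>(List.of(5, 1, 9));
    TreeSet<Integer> source = new TreeSet<>(List.of(3, 2));
    replaceContents(target, source);
    System.out.println(target); // prints "[2, 3]"
  }
}
```

`target` keeps its own comparator. Calling this with `target == source` would simply leave the set empty.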
jpountz commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1407925571
##
lucene/core/src/java/org/apache/lucene/util/ArrayUtil.java:
##
@@ -330,6 +330,29 @@ public static int[] growExact(int[] array, int newLength) {
return copy;
benwtrent commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1407980242
##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -31,18 +33,21 @@
*
* @lucene.internal
*/
-public class NeighborArray {
+public cl
cpoerschke commented on code in PR #12674:
URL: https://github.com/apache/lucene/pull/12674#discussion_r1407985494
##
lucene/licenses/opennlp-tools-NOTICE.txt:
##
@@ -1,11 +1,101 @@
Apache OpenNLP
Review Comment:
https://github.com/apache/opennlp/blob/opennlp-2.3.1/NOTICE
cpoerschke commented on code in PR #12674:
URL: https://github.com/apache/lucene/pull/12674#discussion_r1407992397
##
lucene/analysis/opennlp/src/test/org/apache/lucene/analysis/opennlp/TestOpenNLPChunkerFilterFactory.java:
##
@@ -58,7 +58,7 @@ public class TestOpenNLPChunkerFil
mikemccand commented on PR #12552:
URL: https://github.com/apache/lucene/pull/12552#issuecomment-1830133688
I think this was mistakenly not backported to 9.x? (I only caught this
because I was seeing merge conflicts trying to backport #12803 and saw this.
I'll backport shortly -- I think
cpoerschke commented on code in PR #12674:
URL: https://github.com/apache/lucene/pull/12674#discussion_r1408000809
##
lucene/licenses/slf4j-api-LICENSE-MIT.txt:
##
@@ -0,0 +1,24 @@
+Copyright (c) 2004-2022 QOS.ch Sarl (Switzerland)
Review Comment:
https://github.com/qos-ch/s
mikemccand commented on PR #12803:
URL: https://github.com/apache/lucene/pull/12803#issuecomment-1830145262
This one is also low risk for 9.9.0 -- it's cutting over to a cleaner FST
ctor API, and has been baking in main for almost a week. I had meant to
backport last week but Turkey interv
msokolov commented on PR #12829:
URL: https://github.com/apache/lucene/pull/12829#issuecomment-1830150503
@s1monw that makes sense. I think I was confusing index-time changes and
query-time changes. This whole piece of functionality is a little confusing
given how loosely coupled these thin
msokolov commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1408011800
##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -31,18 +33,21 @@
*
* @lucene.internal
*/
-public class NeighborArray {
+public cla
mikemccand commented on PR #12688:
URL: https://github.com/apache/lucene/pull/12688#issuecomment-1830166075
> This is reasonable as the terms index (FST) holds all the terms.
+1, nice!
> Fuzzy/Wildcard/Prefix queries got _much slower_
> This is also expected because curr
msokolov commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1408025348
##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -31,18 +33,21 @@
*
* @lucene.internal
*/
-public class NeighborArray {
+public cla
benwtrent opened a new pull request, #12848:
URL: https://github.com/apache/lucene/pull/12848
Periodic and random merge policies can cause docs to be iterated in a
different order (as they are merged).
This commit reduces the randomness of the merge policy for more consistent
ve
mikemccand merged PR #12797:
URL: https://github.com/apache/lucene/pull/12797
mikemccand closed issue #11023: Make CheckIndex doChecksumsOnly / -fast as
default [LUCENE-9984]
URL: https://github.com/apache/lucene/issues/11023
mikemccand merged PR #12525:
URL: https://github.com/apache/lucene/pull/12525
mikemccand closed issue #12522: Add support for ignoreKeywords in
WordDelimiterGraphFilterFactory
URL: https://github.com/apache/lucene/issues/12522
zhaih commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1408142129
##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -31,18 +33,21 @@
*
* @lucene.internal
*/
-public class NeighborArray {
+public class
ChrisHegarty commented on code in PR #12848:
URL: https://github.com/apache/lucene/pull/12848#discussion_r1408143557
##
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java:
##
@@ -732,7 +733,13 @@ public void testIndexedValueNotAliased(
slow-J commented on PR #12797:
URL: https://github.com/apache/lucene/pull/12797#issuecomment-1830400036
> Thanks @slow-J -- looks great!
>
> This is a 10.0 only change right? I'll merge soon.
Thanks!
And yes 10.0 only.
mikemccand commented on code in PR #12847:
URL: https://github.com/apache/lucene/pull/12847#discussion_r1408282267
##
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##
@@ -867,6 +867,10 @@ public long fstRamBytesUsed() {
return fst.ramBytesUsed();
}
benwtrent merged PR #12848:
URL: https://github.com/apache/lucene/pull/12848
slow-J opened a new pull request, #12849:
URL: https://github.com/apache/lucene/pull/12849
Fixing a basic bug in UnescapedCharSequence
https://github.com/apache/lucene/blob/2bb69f3246218dd8176cf92d8064623688c5272c/lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/core/util/U
slow-J commented on code in PR #12849:
URL: https://github.com/apache/lucene/pull/12849#discussion_r1408514109
##
lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/core/util/UnescapedCharSequence.java:
##
@@ -101,7 +90,7 @@ public String toStringEscaped() {
gsmiller commented on issue #12558:
URL: https://github.com/apache/lucene/issues/12558#issuecomment-1830955298
OK, came back across this while cleaning up open browser tabs and decided to
repro it myself. I know what's going on. It has to do with `#finish` not
properly getting called on sid
dungba88 commented on code in PR #12847:
URL: https://github.com/apache/lucene/pull/12847#discussion_r1408573574
##
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##
@@ -867,6 +867,10 @@ public long fstRamBytesUsed() {
return fst.ramBytesUsed();
}
+
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1831132910
> replace entries of the treeset with entries of the other treeset by
> clearing it first, and then doing an addAll.
Sorry, I am confused about this. If we clear a non-empty treeset,
david-sitsky commented on issue #12313:
URL: https://github.com/apache/lucene/issues/12313#issuecomment-1831197772
> The key issue is document collection. Right now, the `topK` is limited to
> only `topK` children documents. Really, what you want is the `topK` parent
> documents based on childr
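The difference between the top K children and the top K parents can be sketched in Java as a post-processing step over child hits. First keep each parent's best child score, then select the K best parents with a size-bounded min-heap. This is only an illustration under stated assumptions: a parent's score is the max of its children's scores, and the child hits have already been collected. `ChildHit` and `topKParents` are hypothetical. The sketch does not address how the underlying vector search would gather enough children to fill K parents.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopKParents {
  // Hypothetical scored child hit that knows its parent document.
  record ChildHit(int childDoc, int parentDoc, float score) {}

  // Returns up to k parent doc ids, best first.
  // Assumption: a parent's score is the max of its children's scores.
  static List<Integer> topKParents(List<ChildHit> hits, int k) {
    Map<Integer, Float> best = new HashMap<>();
    for (ChildHit h : hits) {
      best.merge(h.parentDoc(), h.score(), Float::max);
    }
    // Min-heap ordered by parent score; its head is the weakest kept parent.
    PriorityQueue<Map.Entry<Integer, Float>> heap =
        new PriorityQueue<>(Map.Entry.<Integer, Float>comparingByValue());
    for (Map.Entry<Integer, Float> e : best.entrySet()) {
      heap.offer(e);
      if (heap.size() > k) {
        heap.poll(); // evict the lowest-scoring parent
      }
    }
    List<Integer> result = new ArrayList<>();
    while (!heap.isEmpty()) {
      result.add(heap.poll().getKey()); // ascending score order
    }
    Collections.reverse(result); // highest score first
    return result;
  }

  public static void main(String[] args) {
    List<ChildHit> hits = List.of(
        new ChildHit(1, 10, 0.9f),
        new ChildHit(2, 10, 0.8f),
        new ChildHit(3, 10, 0.7f),
        new ChildHit(4, 20, 0.6f),
        new ChildHit(5, 30, 0.5f));
    System.out.println(topKParents(hits, 2)); // prints "[10, 20]"
  }
}
```

With these hits, a plain top-2 over children would return docs 1 and 2. Both belong to parent 10, so only one distinct parent comes back. Aggregating per parent first yields two distinct parents.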
Jeevananthan-23 commented on issue #12531:
URL: https://github.com/apache/lucene/issues/12531#issuecomment-1831313847
Hi @uschindler, I came across an interesting article on the Qdrant vector
database that uses io_uring for async and mmap benchmarking.
https://qdrant.tech/articles/io_uring/
dungba88 commented on PR #12847:
URL: https://github.com/apache/lucene/pull/12847#issuecomment-1831363117
The test failed with just `Error: The operation was canceled.` but I can't
tell why it happened. The same PR in my local branch works:
https://github.com/dungba88/lucene/pull/20