Tony-X commented on PR #12688:
URL: https://github.com/apache/lucene/pull/12688#issuecomment-1857417012
Here is the even more interesting stuff. After all those allocation
optimizations, I also implemented the on-paper more "efficient" algorithm to
intersect FST and FSA for Terms.intersect(
daixque commented on code in PR #12915:
URL: https://github.com/apache/lucene/pull/12915#discussion_r1427647228
##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java:
##
@@ -60,15 +60,13 @@ public JapaneseHiraganaUppercaseFilter(
Tony-X commented on PR #12688:
URL: https://github.com/apache/lucene/pull/12688#issuecomment-1857380213
## Non-trivial amount of allocations for? building IndexInput slice
descriptions !?
`jdk.internal.misc.Unsafe#allocateUninitializedArray()`. This was not
trivial to find out
dungba88 commented on issue #12902:
URL: https://github.com/apache/lucene/issues/12902#issuecomment-1857375041
I realized FSTPostingsFormat is an experimental one, which is only being
used in 5 places! Those Lucene9xPostingsFormat seem to be active ones, which in
turn use `Lucene90BlockTree
Tony-X commented on PR #12688:
URL: https://github.com/apache/lucene/pull/12688#issuecomment-1857371557
Since the first working version, I iterated with a list of profiling-guided
allocation optimizations, as they stood out quite obviously from the merged JFR
reports (thanks to luceneutil !
dungba88 commented on issue #12902:
URL: https://github.com/apache/lucene/issues/12902#issuecomment-1857304704
I just briefly looked at the code, but it seems `FSTTermsWriter` will write
the field metadata (number of terms, term freq, doc freq, etc), FST metadata,
and FST main body for each
dungba88 commented on issue #12902:
URL: https://github.com/apache/lucene/issues/12902#issuecomment-1857254522
A candidate could be the `FSTTermsWriter`, which can help building
FSTPostingsFormat with much less heap size.
--
This is an automated message from the Apache Git Service.
To res
dungba88 commented on issue #12513:
URL: https://github.com/apache/lucene/issues/12513#issuecomment-1857252458
I'm still consuming this thread, pardon me if I ask something that's already
discussed.
> Yes, I actually tried to use FSTPostingsFormat in the benchmarks game and
I had to
stefanvodita commented on issue #12734:
URL: https://github.com/apache/lucene/issues/12734#issuecomment-1856645411
Both the follow-up PRs are merged. I don't think it's worth pursuing this
further. Closing.
stefanvodita closed issue #12734: Should reseting a ByteBlockPool zero out the
buffers?
URL: https://github.com/apache/lucene/issues/12734
benwtrent commented on issue #12945:
URL: https://github.com/apache/lucene/issues/12945#issuecomment-1856557362
Bumping the searched vectors to 70 from 60 makes the test pass, but this
still bugs me a bit as that commit shouldn't have changed any behavior...
benwtrent commented on issue #12945:
URL: https://github.com/apache/lucene/issues/12945#issuecomment-1856480501
This is interesting, that commit shouldn't have changed anything, just a
refactor.
I have confirmed I can repeat it (after several attempts), but cannot when
going to the
benwtrent closed issue #12940: Test failure in
TestKnnGraph.testMultiThreadedSearch
URL: https://github.com/apache/lucene/issues/12940
benwtrent merged PR #12943:
URL: https://github.com/apache/lucene/pull/12943
dweiss commented on issue #12946:
URL: https://github.com/apache/lucene/issues/12946#issuecomment-1856319652
+1.
easyice commented on PR #12842:
URL: https://github.com/apache/lucene/pull/12842#issuecomment-1856293777
Sorry for the late update! I spent some more time on another PR. I encoded the
positions with group-varint when `storeOffsets` is false and there are no
payloads. With the last commit, it
mikemccand commented on issue #12877:
URL: https://github.com/apache/lucene/issues/12877#issuecomment-1856109370
> > am I making this up?
>
> Ha! No, you are not hallucinating @jpountz! We do have something like this
for Amazon product search -- it's crucial for our usage to keep long
mikemccand commented on code in PR #12872:
URL: https://github.com/apache/lucene/pull/12872#discussion_r1426907008
##
lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java:
##
@@ -389,13 +389,39 @@ private static void parseSegmentInfos(
}
long totalDocs = 0;
rmuir commented on issue #12946:
URL: https://github.com/apache/lucene/issues/12946#issuecomment-1856094262
we can still ban it and just use `@SuppressWarnings` before
SleepingLockWrapper or any other exceptional cases? It prevents any new sleeps
from creeping in without someone thinking tw
mikemccand commented on code in PR #12872:
URL: https://github.com/apache/lucene/pull/12872#discussion_r1426901678
##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -957,6 +974,9 @@ private Status.SegmentInfoStatus testSegment(
SegmentReader reader = nu
mikemccand commented on PR #12875:
URL: https://github.com/apache/lucene/pull/12875#issuecomment-1856089585
> I am looking at `TestUnifiedHighlighter*` tests. Does it mean that I need
to use specific fieldType? Can I use any fieldType(s) from existing
`UHTestHelper.parametersFactoryList()`?
easyice opened a new pull request, #12948:
URL: https://github.com/apache/lucene/pull/12948
In https://github.com/apache/lucene/pull/12594, we mark
`ByteBuffersDataInput#size()` as `Deprecated`. For simplicity, maybe we should
replace the usage of deprecated `size()` with `length()` ?
mikemccand commented on code in PR #12862:
URL: https://github.com/apache/lucene/pull/12862#discussion_r1426891701
##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java:
##
@@ -32,8 +32,8 @@
import org.apache.lucene.document.Field;
im
msokolov commented on issue #12946:
URL: https://github.com/apache/lucene/issues/12946#issuecomment-1856076075
+1 this does seem to be shaking out a lot of dust
msokolov commented on issue #12946:
URL: https://github.com/apache/lucene/issues/12946#issuecomment-1856074053
and `TimerThread` in `TimeLimitingCollector` has this historical artifact.
Maybe it's time to clean up that TODO:
public void run() {
while (!stop) {
msokolov commented on issue #12946:
URL: https://github.com/apache/lucene/issues/12946#issuecomment-1856070627
I agree with the idea, but we do have a lot of these now. EG
SleepingLockWrapper, although I see this in its javadocs "this is not a good
idea" LOL
mikemccand commented on code in PR #12894:
URL: https://github.com/apache/lucene/pull/12894#discussion_r1426884836
##
lucene/test-framework/src/java/org/apache/lucene/tests/util/fst/FSTTester.java:
##
@@ -283,14 +283,17 @@ public FST doTest() throws IOException {
}
}
msokolov commented on code in PR #12943:
URL: https://github.com/apache/lucene/pull/12943#discussion_r1426843989
##
lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java:
##
@@ -100,22 +100,18 @@ public KnnVectorsFormat
getKnnVectorsFormatForField(String field) {
msokolov commented on issue #12945:
URL: https://github.com/apache/lucene/issues/12945#issuecomment-1856057317
Here, `git bisect` identifies [18bb826564bb16fde70bab3c06a167280b6cc632]
Extract the hnsw graph merging from being part of the vector writer (#12657) as
the commit where this test
stefanvodita opened a new pull request, #12947:
URL: https://github.com/apache/lucene/pull/12947
`LSBRadixSorter.sort` doesn't need the buffer to preserve the data that was
written to it for a previous sort.
`TaskSequence` doesn't need to grow arrays beyond the number of iterations
i
stefanvodita commented on issue #12941:
URL: https://github.com/apache/lucene/issues/12941#issuecomment-1856044036
I went through most of the calls to grow int arrays. There aren't a lot of
places where there's an obvious way to improve, but I opened #12947 for the
couple of cases I spotted.
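To illustrate what "growing up to a limit" means here, the following is a minimal sketch, not Lucene's implementation. The class `BoundedGrowth`, the method `growUpTo`, and the roughly one-eighth oversizing factor are all assumptions made for this example; the idea is only that amortized oversizing is capped when the caller already knows an upper bound.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; this is not Lucene's ArrayUtil.
class BoundedGrowth {

  /**
   * Returns an array with at least minLength slots and at most maxLength slots.
   * The array is oversized a little for amortized growth, but the known upper
   * bound is never exceeded, which avoids allocating slots that cannot be used.
   */
  static int[] growUpTo(int[] array, int minLength, int maxLength) {
    if (minLength > maxLength) {
      throw new IllegalArgumentException(
          "minLength " + minLength + " exceeds maxLength " + maxLength);
    }
    if (array.length >= minLength) {
      return array; // already large enough, no reallocation
    }
    // Assumed growth policy: about 1/8 extra plus a small constant.
    // Computed as a long so that very large requests do not overflow.
    long oversized = (long) minLength + (minLength >>> 3) + 3;
    int newLength = (int) Math.min(oversized, maxLength);
    // Arrays.copyOf keeps the existing contents and zero-fills the rest.
    return Arrays.copyOf(array, newLength);
  }
}
```

With this policy, asking for 5 slots with a limit of 6 yields a 6-slot array instead of the 8 slots that unbounded oversizing would allocate.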
mikemccand commented on PR #12912:
URL: https://github.com/apache/lucene/pull/12912#issuecomment-1856040733
> Solr used that, as solr is no longer part of our tree we could add sleep
to forbiddenapis (maybe globally for both tests and main code). Maybe Lucene does
not sleep in tests, so it co
mikemccand opened a new issue, #12946:
URL: https://github.com/apache/lucene/issues/12946
### Description
Spinoff from #12912.
`Thread.sleep` should ideally never appear in our main and test sources.
Let's add it to forbidden APIs?
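As a rough illustration of what usually replaces a sleep-and-hope wait, here is a minimal sketch that blocks on an explicit completion signal with a timeout. The class `LatchInsteadOfSleep` and the method `runAndWait` are hypothetical names for this example and are not taken from the Lucene code base.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical example class; not part of Lucene.
class LatchInsteadOfSleep {

  /**
   * Runs the work on another thread and waits until it has finished or the
   * timeout elapses. Returns true only if the work finished in time.
   */
  static boolean runAndWait(Runnable work, long timeoutMillis) {
    CountDownLatch done = new CountDownLatch(1);
    Thread worker =
        new Thread(
            () -> {
              try {
                work.run();
              } finally {
                done.countDown(); // signal completion even if work throws
              }
            });
    worker.start();
    try {
      // Blocks until countDown() is called or the timeout expires,
      // instead of sleeping for a guessed duration.
      return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }
}
```

The caller returns as soon as the work completes, so the wait is neither longer than needed nor dependent on the machine being fast enough for a fixed delay.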
mikemccand merged PR #12942:
URL: https://github.com/apache/lucene/pull/12942
msokolov opened a new issue, #12945:
URL: https://github.com/apache/lucene/issues/12945
### Description
./gradlew :lucene:core:test --tests
"org.apache.lucene.util.hnsw.TestHnswFloatVectorGraph.testSortedAndUnsortedIndicesReturnSameResults"
-Ptests.jvms=4 -Ptests.jvmargs= -Ptests
benwtrent commented on code in PR #12943:
URL: https://github.com/apache/lucene/pull/12943#discussion_r1426839005
##
lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java:
##
@@ -100,22 +100,18 @@ public KnnVectorsFormat
getKnnVectorsFormatForField(String field) {
msokolov commented on code in PR #12943:
URL: https://github.com/apache/lucene/pull/12943#discussion_r1426817545
##
lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java:
##
@@ -100,22 +100,18 @@ public KnnVectorsFormat
getKnnVectorsFormatForField(String field) {
benwtrent commented on issue #12940:
URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855974696
> I can see that in this test run we are using a quantizing scorer, but I
don't think the test case explicitly calls for that. I wonder if we beefed up
the test framework to rando
benwtrent commented on code in PR #12943:
URL: https://github.com/apache/lucene/pull/12943#discussion_r1426804219
##
lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java:
##
@@ -100,22 +100,18 @@ public KnnVectorsFormat
getKnnVectorsFormatForField(String field) {
msokolov commented on code in PR #12943:
URL: https://github.com/apache/lucene/pull/12943#discussion_r1426801907
##
lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java:
##
@@ -100,22 +100,18 @@ public KnnVectorsFormat
getKnnVectorsFormatForField(String field) {
msokolov commented on issue #12940:
URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855957479
ah, thanks @benwtrent I'll check your fix then
msokolov commented on issue #12940:
URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855955234
I can see that in this test run we are using a quantizing scorer, but I
don't think the test case explicitly calls for that. I wonder if we beefed up
the test framework to randomly
easyice opened a new pull request, #12944:
URL: https://github.com/apache/lucene/pull/12944
Currently, `readLongs/readInts/readFloats` in `ByteBufferIndexInput` may
throw `NullPointerException` when the `IndexInput` is closed. The expected
exception is `AlreadyClosedException`.
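The behavior described above can be sketched with a simplified stand-in. Everything in this example is hypothetical: `SketchInput` is not `ByteBufferIndexInput`, `ClosedInputException` only plays the role of `AlreadyClosedException`, and the explicit null check is one possible way to get the intended exception, not necessarily what the PR does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for an "already closed" exception type.
class ClosedInputException extends IllegalStateException {
  ClosedInputException(String message) {
    super(message);
  }
}

// Hypothetical, simplified input backed by a single ByteBuffer.
class SketchInput implements AutoCloseable {
  private ByteBuffer buffer; // set to null on close

  SketchInput(byte[] data) {
    this.buffer = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
  }

  /** Bulk-reads length ints into dst starting at offset. */
  void readInts(int[] dst, int offset, int length) {
    ByteBuffer b = buffer;
    if (b == null) {
      // Without this guard, the calls below would fail with a bare
      // NullPointerException that hides the real cause.
      throw new ClosedInputException("Already closed: " + this);
    }
    // The int view starts at the current position and keeps the byte order.
    b.asIntBuffer().get(dst, offset, length);
    // The view has its own position, so advance the byte buffer manually.
    b.position(b.position() + length * Integer.BYTES);
  }

  @Override
  public void close() {
    buffer = null;
  }
}
```

A caller that reads after `close()` then sees a descriptive exception rather than a `NullPointerException` from inside the bulk read.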
benwtrent commented on issue #12940:
URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855929384
https://github.com/apache/lucene/pull/12943
@msokolov LOL
benwtrent commented on issue #12940:
URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855929679
We both figured it out at the same time.
benwtrent opened a new pull request, #12943:
URL: https://github.com/apache/lucene/pull/12943
While quantization generally works well, when the number of dimensions is
tiny (just two like in our tests), and we are indexing a circle, and we have
random merge policies, we can end up getting u
msokolov commented on issue #12940:
URL: https://github.com/apache/lucene/issues/12940#issuecomment-1855926010
Thanks @vsop-479, it reproduces for me as well, both on main and 9x
branches. The same test passes on 9.8.0 release. I'll try `git bisect` ... and
it blames this commit:
[a
slow-J commented on code in PR #12930:
URL: https://github.com/apache/lucene/pull/12930#discussion_r1426770295
##
dev-docs/codec-version-bump-howto.md:
##
@@ -0,0 +1,74 @@
+
+
+# Lucene Codec Version Bump How-To Manual
+
+Changing the name of the codec in Lucene is required for
zhaih closed issue #12839: Grow arrays up to a given limit to avoid
overallocation where possible
URL: https://github.com/apache/lucene/issues/12839
zhaih merged PR #12844:
URL: https://github.com/apache/lucene/pull/12844
benwtrent commented on code in PR #12942:
URL: https://github.com/apache/lucene/pull/12942#discussion_r1426699287
##
lucene/core/src/test/org/apache/lucene/index/Test2BPoints.java:
##
@@ -143,6 +143,6 @@ public void test2D() throws Exception {
}
private static Codec getC
mikemccand commented on PR #772:
URL: https://github.com/apache/lucene/pull/772#issuecomment-1855783693
@mocobeta hello! I hit conflicts backporting
https://github.com/apache/lucene/issues/12911 because this PR was never
backported to 9.x.
Is there any reason not to backport? It lo
mikemccand merged PR #12933:
URL: https://github.com/apache/lucene/pull/12933
mikemccand closed issue #12911: Require bundled FSTs to be on the current FST
version
URL: https://github.com/apache/lucene/issues/12911
mikemccand commented on PR #12829:
URL: https://github.com/apache/lucene/pull/12829#issuecomment-1855769298
I opened https://github.com/mikemccand/luceneutil/issues/252 to try to
measure the performance change of `addDocument` N times vs `addDocuments` once.
mikemccand commented on PR #12829:
URL: https://github.com/apache/lucene/pull/12829#issuecomment-1855755782
One small observation here: one can use the `add/updateDocuments` API today
with no intention of using those as doc blocks at search time, purely as an
optimization over calling separ
uschindler commented on PR #12912:
URL: https://github.com/apache/lucene/pull/12912#issuecomment-1855739201
Solr used that, as solr is no longer part of our tree we could add sleep to
forbiddenapis (maybe globally for both tests and main code). Maybe Lucene does
not sleep in tests, so it coul
mikemccand commented on code in PR #12829:
URL: https://github.com/apache/lucene/pull/12829#discussion_r1426637555
##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java:
##
@@ -2164,6 +2166,83 @@ public void testSortedIndex() throws
mikemccand commented on PR #12912:
URL: https://github.com/apache/lucene/pull/12912#issuecomment-1855716194
... but at least it led to discovering a horrifying `Thread.sleep` in our
test code!
Can we ban `Thread.sleep` throughout our code? Or are there actually useful
places for it
mikemccand commented on code in PR #12933:
URL: https://github.com/apache/lucene/pull/12933#discussion_r1426611498
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -1236,5 +1236,9 @@ public FSTMetadata(
this.version = version;
this.numBytes = numB
uschindler commented on PR #12933:
URL: https://github.com/apache/lucene/pull/12933#issuecomment-1855691742
> > The test is not the nicest looking thing, but I accept it, because it
doesn't break classloading of resources. 👍
>
> Ha! I take this as strong positive feedback @uschindler ;
uschindler commented on code in PR #12933:
URL: https://github.com/apache/lucene/pull/12933#discussion_r1426604242
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -1236,5 +1236,9 @@ public FSTMetadata(
this.version = version;
this.numBytes = numB
mikemccand commented on code in PR #12915:
URL: https://github.com/apache/lucene/pull/12915#discussion_r1426601556
##
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java:
##
@@ -60,15 +60,13 @@ public JapaneseHiraganaUppercaseFilt
mikemccand commented on issue #12884:
URL: https://github.com/apache/lucene/issues/12884#issuecomment-1855669679
Thanks @dungba88.
mikemccand commented on issue #12884:
URL: https://github.com/apache/lucene/issues/12884#issuecomment-1855669391
> I can look into this. Is this place
https://github.com/apache/lucene/tree/main/lucene/benchmark/src/java/org/apache/lucene/benchmark
the correct path to add the benchmark, or i
mikemccand commented on PR #12929:
URL: https://github.com/apache/lucene/pull/12929#issuecomment-1855665643
> @mikemccand luceneutil is better at remaining up-to-date with Lucene than
Lucene itself :)
[mikemccand/luceneutil@76ff349](https://github.com/mikemccand/luceneutil/commit/76ff349499
mikemccand commented on PR #12933:
URL: https://github.com/apache/lucene/pull/12933#issuecomment-1855660806
> The test is not the nicest looking thing, but I accept it, because it
doesn't break classloading of resources. 👍
Ha! I take this as strong positive feedback @uschindler ;)
mikemccand commented on issue #12931:
URL: https://github.com/apache/lucene/issues/12931#issuecomment-1855650638
Thanks @uschindler and @rmuir!
mikemccand commented on PR #12936:
URL: https://github.com/apache/lucene/pull/12936#issuecomment-1855649826
Thanks @rmuir!
mikemccand closed issue #12934: Should we clean up the few remaining references
to `Lucene/Solr`?
URL: https://github.com/apache/lucene/issues/12934
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to t
mikemccand commented on code in PR #12939:
URL: https://github.com/apache/lucene/pull/12939#discussion_r1426567725
##
gradle/help.gradle:
##
@@ -46,7 +46,7 @@ configure(rootProject) {
help {
doLast {
println ""
- println "This is an experimental Lucene/Solr g
mikemccand merged PR #12939:
URL: https://github.com/apache/lucene/pull/12939
mikemccand commented on issue #12932:
URL: https://github.com/apache/lucene/issues/12932#issuecomment-1855574409
> For me it fails when running: `./gradlew check -Ptests.heapsize=16g
-Dtests.monster=true` with
It fails for me too -- it's silly (trying to write to an arbitrary "example
stefanvodita commented on PR #12844:
URL: https://github.com/apache/lucene/pull/12844#issuecomment-1855460382
Done, thank you @zhaih! I've opened #12941 to replace other uses of the
unbounded growth API.