Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-07 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1844846435 > > The best idea that I have instead of VarHandles: Create an implementation for ByteBuffer > > do you mean create an implementation to get the current block as `ByteBuffer` f

Re: [PR] Add ParentJoin KNN support [lucene]

2023-12-07 Thread via GitHub
gauravj88 commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1844886604 Hello, I hope this message finds you well. I am reaching out regarding the recent enhancement described in https://lucene.apache.org/core/9_8_0/changes/Changes.html#v9.8.0.new_f

Re: [PR] Remove some redundant modifiers from code [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on PR #12880: URL: https://github.com/apache/lucene/pull/12880#issuecomment-1845055537 +1 to clean these up, and open a follow-on issue to find some way to statically detect / remove these redundant modifiers. As long as we backport this to 9.x it should not make futur

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418728841 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418734995 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418738243 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,22 +125,44 @@ public class FSTCompiler { final float directAddressingMaxOversizin

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418742557 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -218,13 +279,19 @@ public Builder allowFixedLengthArcs(boolean allowFixedLengthArcs) {

[I] Create a simple JMH benchmark to measure FST compilation / traversal times [lucene]

2023-12-07 Thread via GitHub
mikemccand opened a new issue, #12884: URL: https://github.com/apache/lucene/issues/12884 ### Description Over in #12543 we are struggling to measure the performance cost of different ways of creating an on-heap reader/writer. We have been using the "rough" numbers coming out of `Te

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1845106275 Since we are struggling to best measure FST performance impact of these changes, I opened a spinoff [issue to create a dedicated FST microbenchmark](https://github.com/apache/lucene/p

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
uschindler commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418761384 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
uschindler commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418764876 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418746164 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -419,6 +417,8 @@ public FST(FSTMetadata metadata, DataInput in, FSTStore fstStore) throws IOEx

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1845155159 +1 to backport the deprecation message/tags to 9.x, even if we cannot fully remove our own internal usage of the deprecated APIs before releasing 10.0. The two can / should be decoupled

Re: [PR] CheckIndex - Removal of some dead code [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on PR #12876: URL: https://github.com/apache/lucene/pull/12876#issuecomment-1845176680 > I was wondering about the use of `@lucene.experimental` throughout this class. It has been experimental for > 10 years. Will the entire class or some parts, ever be classified as no

Re: [PR] CheckIndex - Removal of some dead code [lucene]

2023-12-07 Thread via GitHub
mikemccand merged PR #12876: URL: https://github.com/apache/lucene/pull/12876 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] CheckIndex - Removal of some dead code [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12876: URL: https://github.com/apache/lucene/pull/12876#discussion_r1418816397 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -97,11 +96,11 @@ */ public final class CheckIndex implements Closeable { + private fina

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1845249074 Thanks everyone! I addressed comments, putting a simpler implementation. +1 to the FST micro benchmarking

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418892433 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -419,7 +418,8 @@ public FST(FSTMetadata metadata, DataInput in, FSTStore fstStore) throws IOEx

[PR] [Draft] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-07 Thread via GitHub
kuramitsu opened a new pull request, #12885: URL: https://github.com/apache/lucene/pull/12885 ### Description I found a bug in JapaneseReadingFormFilter where some hiragana are not converted to romaji. (For example, "ぐ" does not become "gu". I noticed this because "マスキング" did not

Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12868: URL: https://github.com/apache/lucene/pull/12868#discussion_r1418900086 ## lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java: ## @@ -150,9 +150,10 @@ private FuzzySet(FixedBitSet filter, int bloomSize, int hashCount)

Re: [I] Add a MergePolicy wrapper that preserves search concurrency? [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on issue #12877: URL: https://github.com/apache/lucene/issues/12877#issuecomment-1845276785 +1, I like this idea. It might be implemented by having `TieredMergePolicy` dynamically set the max segment size (in doc count, not just bytes) as a function of total `maxDoc` i

Re: [PR] CheckIndex - Removal of some dead code [lucene]

2023-12-07 Thread via GitHub
slow-J commented on PR #12876: URL: https://github.com/apache/lucene/pull/12876#issuecomment-1845277796 > Thanks @slow-J -- this looks great. I'll merge. Thanks @mikemccand for the review, merge and port to 9_x! I will tackle the removal of some of the `@lucene.experimental` nex

Re: [PR] Fix position increment in (Reverse)PathHierarchyTokenizer [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12875: URL: https://github.com/apache/lucene/pull/12875#discussion_r1418918221 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/path/PathHierarchyTokenizer.java: ## @@ -112,11 +115,8 @@ public PathHierarchyTokenizer( public fin

Re: [PR] Fix position increment in (Reverse)PathHierarchyTokenizer [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12875: URL: https://github.com/apache/lucene/pull/12875#discussion_r1418918500 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/path/ReversePathHierarchyTokenizer.java: ## @@ -158,10 +161,9 @@ public final boolean incrementToken()

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1418919257 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -419,7 +418,8 @@ public FST(FSTMetadata metadata, DataInput in, FSTStore fstStore) throws IOEx

Re: [PR] Fix position increment in (Reverse)PathHierarchyTokenizer [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on PR #12875: URL: https://github.com/apache/lucene/pull/12875#issuecomment-1845294522 Thanks for tackling this @lukas-vlcek and @msfroh! I left a couple small comments, but otherwise it looks great. Given that this alters the indexed tokens (makes them non-overl

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-07 Thread via GitHub
jpountz commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1845316218 > The only downside of this is that the caller code must know beforehand how large the slice must be. I think that there is another downside, which is that this might not allow yo

Re: [PR] Remove stale BWC tests [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12874: URL: https://github.com/apache/lucene/pull/12874#discussion_r1418942637 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestManyPointsInOldIndex.java: ## @@ -1,77 +0,0 @@ -/* - * Licensed to the Apache Software Found

Re: [PR] Remove stale BWC tests [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12874: URL: https://github.com/apache/lucene/pull/12874#discussion_r1418943291 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestIndexWriterOnOldIndex.java: ## @@ -1,68 +0,0 @@ -/* - * Licensed to the Apache Software Foun

Re: [I] Add a MergePolicy wrapper that preserves search concurrency? [lucene]

2023-12-07 Thread via GitHub
jpountz commented on issue #12877: URL: https://github.com/apache/lucene/issues/12877#issuecomment-1845325507 > +1, I like this idea. I have a vague recollection of you saying you already implemented something like that, am I making this up? (it's quite possible, I struggle to keep lo

Re: [PR] Enable CheckIndex to exorcise segments with missing segment infos (.si) (#7820) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12872: URL: https://github.com/apache/lucene/pull/12872#discussion_r1418947126 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -873,13 +876,26 @@ public int compare(String a, String b) { if (0 == result.numBadSegmen

Re: [I] Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on issue #12180: URL: https://github.com/apache/lucene/issues/12180#issuecomment-1845362248 > Sorry for confusion, but this issue is not fully done yet Ahh, thanks @epotyom. Since this issue was partially merged and released in 9.9.0, can you open a new issue for

Re: [PR] Add Facets#getBulkSpecificValues method (#12180) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12862: URL: https://github.com/apache/lucene/pull/12862#discussion_r1418982289 ## lucene/CHANGES.txt: ## @@ -67,6 +67,8 @@ API Changes * GITHUB#11023: Adding -level param to CheckIndex, making the old -fast param the default behaviour. (Ja

Re: [PR] Avoid null PointValues when merging points in SlowCompositeCodecReaderWrapper [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12859: URL: https://github.com/apache/lucene/pull/12859#discussion_r1418996383 ## lucene/core/src/java/org/apache/lucene/index/SlowCompositeCodecReaderWrapper.java: ## @@ -599,7 +599,11 @@ public PointValues getValues(String field) throws IOE

Re: [PR] Correct last remaining instances of typo e.g. "Levenstein" -> "Levenshtein" [lucene]

2023-12-07 Thread via GitHub
shaikhu closed pull request #12519: Correct last remaining instances of typo e.g. "Levenstein" -> "Levenshtein" URL: https://github.com/apache/lucene/pull/12519

Re: [PR] Avoid null PointValues when merging points in SlowCompositeCodecReaderWrapper [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12859: URL: https://github.com/apache/lucene/pull/12859#discussion_r1419007805 ## lucene/core/src/java/org/apache/lucene/index/SlowCompositeCodecReaderWrapper.java: ## @@ -599,7 +599,11 @@ public PointValues getValues(String field) throws IOE

Re: [PR] Avoid null PointValues when merging points in SlowCompositeCodecReaderWrapper [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12859: URL: https://github.com/apache/lucene/pull/12859#discussion_r1419010987 ## lucene/core/src/java/org/apache/lucene/index/SlowCompositeCodecReaderWrapper.java: ## @@ -599,7 +599,11 @@ public PointValues getValues(String field) throws IOE

Re: [I] Remove redundant fieldType.stored() check [LUCENE-9603] [lucene]

2023-12-07 Thread via GitHub
hurutoriya commented on issue #10643: URL: https://github.com/apache/lucene/issues/10643#issuecomment-1845408466 Hello, can we resolve this issue? Sorry for the sudden mention, @uschindler. I would like to clean up the issues for beginner Lucene contributors. > This can be resolv

Re: [I] Remove redundant fieldType.stored() check [LUCENE-9603] [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on issue #10643: URL: https://github.com/apache/lucene/issues/10643#issuecomment-1845418543 Ahh yes this is indeed done -- I'll close this issue. Thanks @hurutoriya and @slow-J and @mrkm4ntr!

Re: [I] Remove redundant fieldType.stored() check [LUCENE-9603] [lucene]

2023-12-07 Thread via GitHub
mikemccand closed issue #10643: Remove redundant fieldType.stored() check [LUCENE-9603] URL: https://github.com/apache/lucene/issues/10643

Re: [PR] Correct last remaining instances of typo e.g. "Levenstein" -> "Levenshtein" [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on PR #12519: URL: https://github.com/apache/lucene/pull/12519#issuecomment-1845422890 Woops -- this is still an issue @shaikhu? And it looks like a few more mis-spellings crept in?

Re: [PR] Correct last remaining instances of typo e.g. "Levenstein" -> "Levenshtein" [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on PR #12519: URL: https://github.com/apache/lucene/pull/12519#issuecomment-1845423581 Reopening, though the repo that had the PR is now deleted. Not sure how GH handles this ...

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1419045318 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -419,7 +418,8 @@ public FST(FSTMetadata metadata, DataInput in, FSTStore fstStore) throws IOEx

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1845435406 Thanks for persisting @dungba88 -- this was a crazy long and tricky exercise. I'm so excited Lucene can finally build arbitrarily large FSTs with bounded heap usage. I'll merg

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-12-07 Thread via GitHub
mikemccand merged PR #12624: URL: https://github.com/apache/lucene/pull/12624

Re: [I] Remove Google Analytics from Lucene site [LUCENE-9858] [lucene]

2023-12-07 Thread via GitHub
hurutoriya commented on issue #10897: URL: https://github.com/apache/lucene/issues/10897#issuecomment-1845441643 @janhoy Hello, can we close this issue? It seems GA was already disabled on the website by your commit: https://github.com/apache/lucene-site/commit/429cda11b70f7ffa250

[PR] Bump Yetus version to 0.15.0 [lucene]

2023-12-07 Thread via GitHub
hurutoriya opened a new pull request, #12886: URL: https://github.com/apache/lucene/pull/12886 ### Description Refer to https://github.com/apache/lucene/issues/9561

[PR] Add test that tickles a JVM JIT crash on JDK's less than 21.0.1 [lucene]

2023-12-07 Thread via GitHub
ChrisHegarty opened a new pull request, #12887: URL: https://github.com/apache/lucene/pull/12887 Add test that tickles a JVM JIT crash on JDK's less than 21.0.1 Work in Progress

Re: [PR] Disable suffix sharing for block tree index [lucene]

2023-12-07 Thread via GitHub
mikemccand commented on PR #12722: URL: https://github.com/apache/lucene/pull/12722#issuecomment-1845575349 Unfortunately, because I messed up conflict resolution on backport of #12633 (commit https://github.com/apache/lucene/commit/2ca906d99c64fc45dd4246b7daad57db3a7abdf8), I accidentally

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-07 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1419203464 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java: ## @@ -2164,6 +2166,83 @@ public void testSortedIndex() throws Exce

Re: [PR] Fix position increment in (Reverse)PathHierarchyTokenizer [lucene]

2023-12-07 Thread via GitHub
lukas-vlcek commented on PR #12875: URL: https://github.com/apache/lucene/pull/12875#issuecomment-1845620998 > Note that this should make highlighting based on postings offsets (e.g. UnifiedHighlighter, in certain modes) work on such fields when it does not today. True... so it sound

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-07 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1419209272 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java: ## @@ -2164,6 +2166,83 @@ public void testSortedIndex() throws Exce

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-07 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1419210065 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java: ## @@ -262,6 +290,44 @@ long updateDocuments( } } + private Iterable filterPar

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-07 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1419216698 ## lucene/core/src/java/org/apache/lucene/index/IndexingChain.java: ## @@ -219,15 +222,33 @@ private Sorter.DocMap maybeSortSegment(SegmentWriteState state) throws IOE

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-07 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1845632140 Thank you for sharing the code; it seems very clear. Alternatively, could we pass the current block (ByteBuffer) to the decode function, like below? This will keep the remaining byte checki

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-07 Thread via GitHub
jpountz commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1419268112 ## lucene/core/src/java/org/apache/lucene/index/IndexingChain.java: ## @@ -219,15 +222,33 @@ private Sorter.DocMap maybeSortSegment(SegmentWriteState state) throws IO

Re: [PR] Mark DrillSideways#createDrillDownFacetsCollector as @Deprecated [lucene]

2023-12-07 Thread via GitHub
stefanvodita commented on code in PR #12854: URL: https://github.com/apache/lucene/pull/12854#discussion_r1419296316 ## lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java: ## @@ -133,7 +133,12 @@ public DrillSideways( /** * Subclass can override to customize

Re: [I] Remove Google Analytics from Lucene site [LUCENE-9858] [lucene]

2023-12-07 Thread via GitHub
janhoy closed issue #10897: Remove Google Analytics from Lucene site [LUCENE-9858] URL: https://github.com/apache/lucene/issues/10897

Re: [I] Remove Google Analytics from Lucene site [LUCENE-9858] [lucene]

2023-12-07 Thread via GitHub
janhoy commented on issue #10897: URL: https://github.com/apache/lucene/issues/10897#issuecomment-1845735057 Sure, closing

Re: [PR] [Minor] Shorten getOrdinal synchronized loop [lucene]

2023-12-07 Thread via GitHub
stefanvodita commented on PR #12870: URL: https://github.com/apache/lucene/pull/12870#issuecomment-1845739868 Thank you both! I’ve added a CHANGES entry.

Re: [PR] Add test that tickles a JVM JIT crash on JDK's less than 21.0.1 [lucene]

2023-12-07 Thread via GitHub
ChrisHegarty commented on code in PR #12887: URL: https://github.com/apache/lucene/pull/12887#discussion_r1419356772 ## gradle/testing/defaults-tests.gradle: ## @@ -51,7 +51,7 @@ allprojects { includeInReproLine: false ], [propName: 'tests.jvmar

Re: [PR] Add test that tickles a JVM JIT crash on JDK's less than 21.0.1 [lucene]

2023-12-07 Thread via GitHub
dweiss commented on code in PR #12887: URL: https://github.com/apache/lucene/pull/12887#discussion_r1419356575 ## gradle/testing/defaults-tests.gradle: ## @@ -51,7 +51,7 @@ allprojects { includeInReproLine: false ], [propName: 'tests.jvmargs', -

Re: [PR] Add test that tickles a JVM JIT crash on JDK's less than 21.0.1 [lucene]

2023-12-07 Thread via GitHub
dweiss commented on code in PR #12887: URL: https://github.com/apache/lucene/pull/12887#discussion_r1419353966 ## gradle/testing/defaults-tests.gradle: ## @@ -51,7 +51,7 @@ allprojects { includeInReproLine: false ], [propName: 'tests.jvmargs', -

Re: [PR] Re-use information from graph traversal during exact search [lucene]

2023-12-07 Thread via GitHub
benwtrent commented on code in PR #12820: URL: https://github.com/apache/lucene/pull/12820#discussion_r1419363698 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnCollector.java: ## @@ -66,4 +69,19 @@ public final int k() { @Override public abstract TopDocs to

Re: [PR] Performance improvements to MatchHighlighter and MatchRegionRetriever [lucene]

2023-12-07 Thread via GitHub
dweiss commented on PR #12881: URL: https://github.com/apache/lucene/pull/12881#issuecomment-1845808948 Thanks, @romseygeek !

Re: [PR] Performance improvements to MatchHighlighter and MatchRegionRetriever [lucene]

2023-12-07 Thread via GitHub
dweiss merged PR #12881: URL: https://github.com/apache/lucene/pull/12881

Re: [PR] [Draft] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-07 Thread via GitHub
msfroh commented on PR #12885: URL: https://github.com/apache/lucene/pull/12885#issuecomment-1845843504 This is really interesting. It looks like the filter logic is already trying to convert to katakana before converting to romaji. Specifically in https://github.com/apache/lucene

Re: [PR] Re-use information from graph traversal during exact search [lucene]

2023-12-07 Thread via GitHub
benwtrent commented on code in PR #12820: URL: https://github.com/apache/lucene/pull/12820#discussion_r1419374932 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnCollector.java: ## @@ -66,4 +69,19 @@ public final int k() { @Override public abstract TopDocs to

Re: [I] Reproducible failure in TestUnifiedHighlighter.testOneSentence (and others) - index order [lucene]

2023-12-07 Thread via GitHub
dweiss commented on issue #12883: URL: https://github.com/apache/lucene/issues/12883#issuecomment-1845854432 The test occasionally gets MockRandomMergePolicy and it relies on document addition order.

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-07 Thread via GitHub
jpountz commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1845870621 @uschindler I pushed a quick ugly impl of reading group vints via a `ByteBuffer` at https://github.com/apache/lucene/commit/9f5d9f7ab6777b6331c7e0456b5f7660cb64d55b. `DataInput` gets a

Re: [PR] Enable CheckIndex to exorcise segments with missing segment infos (.si) (#7820) [lucene]

2023-12-07 Thread via GitHub
gokaai commented on code in PR #12872: URL: https://github.com/apache/lucene/pull/12872#discussion_r1419439605 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -4270,11 +4290,21 @@ public int doCheck(Options opts) throws IOException, InterruptedException {

Re: [PR] Enable CheckIndex to exorcise segments with missing segment infos (.si) (#7820) [lucene]

2023-12-07 Thread via GitHub
gokaai commented on code in PR #12872: URL: https://github.com/apache/lucene/pull/12872#discussion_r1419442594 ## lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java: ## @@ -389,13 +389,39 @@ private static void parseSegmentInfos( } long totalDocs = 0; +

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-07 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1845904374 Lol. That's strange! I would not have added a readNBytes method and just do the ByteBuffer wrapping in the readVInt method that calls the static method to decode. If that

Re: [PR] Enable CheckIndex to exorcise segments with missing segment infos (.si) (#7820) [lucene]

2023-12-07 Thread via GitHub
gokaai commented on code in PR #12872: URL: https://github.com/apache/lucene/pull/12872#discussion_r1419454517 ## lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java: ## @@ -389,13 +389,39 @@ private static void parseSegmentInfos( } long totalDocs = 0; +

[PR] [WIP] LUCENE-10002: Deprecate FacetsCollector#search helper methods as they internally use IndexSearcher#search(Query, Collector) API [lucene]

2023-12-07 Thread via GitHub
zacharymorn opened a new pull request, #12890: URL: https://github.com/apache/lucene/pull/12890 This is a WIP / discussion-only PR. As the next step for https://github.com/apache/lucene/issues/11041, I'm thinking of deprecating the FacetsCollector#search methods, as they internally use Ind

Re: [PR] Make unified highlighter tests avoid mock random merge policy's document reordering [lucene]

2023-12-07 Thread via GitHub
dweiss commented on PR #12889: URL: https://github.com/apache/lucene/pull/12889#issuecomment-1845931168 Refactored the code a bit and pulled up a method to create a writer dodging random document reordering merge policy. Ran tests with -Ptests.iters=100 and no failures: ``` gradlew

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-07 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1419505134 ## lucene/core/src/java/org/apache/lucene/search/Sort.java: ## @@ -59,10 +61,28 @@ public Sort() { * is still a tie after all SortFields are checked, the internal L

Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

2023-12-07 Thread via GitHub
gsmiller commented on code in PR #12853: URL: https://github.com/apache/lucene/pull/12853#discussion_r1409693152 ## lucene/CHANGES.txt: ## @@ -112,6 +112,9 @@ Bug Fixes * GITHUB#12220: Hunspell: disallow hidden title-case entries from compound middle/end +* GITHUB#12558: E

Re: [PR] Mark DrillSideways#createDrillDownFacetsCollector as @Deprecated [lucene]

2023-12-07 Thread via GitHub
gsmiller commented on code in PR #12854: URL: https://github.com/apache/lucene/pull/12854#discussion_r1419576038 ## lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java: ## @@ -133,7 +133,12 @@ public DrillSideways( /** * Subclass can override to customize dri

Re: [PR] Mark DrillSideways#createDrillDownFacetsCollector as @Deprecated [lucene]

2023-12-07 Thread via GitHub
gsmiller commented on code in PR #12854: URL: https://github.com/apache/lucene/pull/12854#discussion_r1419580023 ## lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java: ## @@ -133,7 +133,12 @@ public DrillSideways( /** * Subclass can override to customize dri

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-07 Thread via GitHub
benwtrent commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1419656899 ## lucene/CHANGES.txt: ## @@ -167,7 +167,10 @@ API Changes New Features - -(No changes) + +* GITHUB#12679: Add support for similarity-based v

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-07 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1846106597 Thanks for all the help here @benwtrent ! > Could you add changes for Lucene 9.10? Added an entry under "New Features" (also added one of my teammates along with whom this

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-07 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1419660341 ## lucene/CHANGES.txt: ## @@ -167,7 +167,10 @@ API Changes New Features - -(No changes) + +* GITHUB#12679: Add support for similarity-based ve

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-07 Thread via GitHub
benwtrent commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1419666959 ## lucene/CHANGES.txt: ## @@ -167,7 +167,10 @@ API Changes New Features - -(No changes) + +* GITHUB#12679: Add support for similarity-based v

Re: [PR] Upgrade ECJ to 3.36.0 [lucene]

2023-12-07 Thread via GitHub
ChrisHegarty merged PR #12888: URL: https://github.com/apache/lucene/pull/12888 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucen

Re: [I] Explore a single scoring implementation in DrillSidewaysScorer [LUCENE-10037] [lucene]

2023-12-07 Thread via GitHub
gsmiller commented on issue #11076: URL: https://github.com/apache/lucene/issues/11076#issuecomment-1846150788 Oh that's an interesting find and good question @slow-J. I'm not sure I have a great answer to whether-or-not we could trust results from this. I wonder if we have any past changes

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-12-07 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1419688833 ## lucene/CHANGES.txt: ## @@ -167,7 +167,10 @@ API Changes New Features - -(No changes) + +* GITHUB#12679: Add support for similarity-based ve

[PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-12-07 Thread via GitHub
zacharymorn opened a new pull request, #12891: URL: https://github.com/apache/lucene/pull/12891 This PR backports https://github.com/apache/lucene/pull/240 to `branch_9x` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[I] Remove all deprecated IndexSearcher#search(Query, Collector) usage / methods in the next major release [lucene]

2023-12-07 Thread via GitHub
zacharymorn opened a new issue, #12892: URL: https://github.com/apache/lucene/issues/12892 ### Description As a follow-up of https://github.com/apache/lucene/issues/11041, we would like to remove all deprecated `IndexSearcher#search(Query, Collector)` methods in the next major releas

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-12-07 Thread via GitHub
zacharymorn commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1846181930 Thanks both @javanna @mikemccand . I have adjusted the change entry and opened a backport PR, as well as a new spinoff issue for `10.0.0` https://github.com/apache/lucene/issues/12892.

Re: [PR] [Minor] Shorten getOrdinal synchronized loop [lucene]

2023-12-07 Thread via GitHub
gsmiller merged PR #12870: URL: https://github.com/apache/lucene/pull/12870 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.ap

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-07 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1846510407 > Thank you for sharing the code, it seems very clear, another way, could we pass the current block(ByteBuffer) to the decode function like below? this will keep the remaining bytes chec

Re: [PR] [Draft] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-07 Thread via GitHub
kuramitsu commented on PR #12885: URL: https://github.com/apache/lucene/pull/12885#issuecomment-1846526187 @msfroh Thank you for your confirmation. As you said, I think it would be better to check and fix the bug in the JaMorphData.getReading() part. I will try to check and fix it.

Re: [PR] [Draft] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-07 Thread via GitHub
kuramitsu commented on PR #12885: URL: https://github.com/apache/lucene/pull/12885#issuecomment-1846598183 I found that one of the implementations of getReading, UnknownMorphData.getReading(), always returns null. Since the problem seems to be that termAttr is used as it is for OOV Term,

Re: [I] Reproducible failure in TestUnifiedHighlighter.testOneSentence (and others) - index order [lucene]

2023-12-07 Thread via GitHub
dweiss closed issue #12883: Reproducible failure in TestUnifiedHighlighter.testOneSentence (and others) - index order URL: https://github.com/apache/lucene/issues/12883 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] Make unified highlighter tests avoid mock random merge policy's document reordering [lucene]

2023-12-07 Thread via GitHub
dweiss merged PR #12889: URL: https://github.com/apache/lucene/pull/12889 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-07 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1846710600 > Thank you for quick impl Adrien, for reference, i tried this approach [code link](https://github.com/easyice/lucene/commit/13851013e98ff8e27f05fa6dc4bc2e450ea6c03d#diff-c81a04bd13d2
