Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
easyice commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1791978766 I ran this with wikimedium10m and wikimediumall. No significant performance improvement or regression was found. The total size of tip is slightly reduced: |

Re: [PR] Add a specialized bulk scorer for regular conjunctions. [lucene]

2023-11-03 Thread via GitHub
jpountz commented on PR #12719: URL: https://github.com/apache/lucene/pull/12719#issuecomment-1792001309 Interestingly, it seems to also help with facets: http://people.apache.org/~mikemccand/lucenebench/AndHighHighDayTaxoFacets.html. -- This is an automated message from the Apache Git Se

Re: [PR] Add a specialized bulk scorer for regular conjunctions. [lucene]

2023-11-03 Thread via GitHub
jpountz commented on PR #12719: URL: https://github.com/apache/lucene/pull/12719#issuecomment-1792000689 This yielded a good speedup on [nightly benchmarks](http://people.apache.org/~mikemccand/lucenebench/CountAndHighHigh.html). I pushed an annotation.

Re: [PR] LUCENE-10560: Faster merging of TermsEnum [lucene]

2023-11-03 Thread via GitHub
jpountz merged PR #1052: URL: https://github.com/apache/lucene/pull/1052

Re: [I] Can we speed up OrdinalMap construction? [LUCENE-10560] [lucene]

2023-11-03 Thread via GitHub
jpountz closed issue #11596: Can we speed up OrdinalMap construction? [LUCENE-10560] URL: https://github.com/apache/lucene/issues/11596

Re: [PR] LUCENE-10560: Faster merging of TermsEnum [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #1052: URL: https://github.com/apache/lucene/pull/1052#issuecomment-1792099793 > +1 I fell a bit into a trap by trying to make long shared prefixes less adversarial. Let's do progress over perfection and start with a simple approach and look into whether/how we ca

Re: [I] TestIndexWriterOnVMError.testUnknownError timesout [lucene]

2023-11-03 Thread via GitHub
s1monw commented on issue #12654: URL: https://github.com/apache/lucene/issues/12654#issuecomment-1792101597 @dweiss I agree this is the problem. We should execute that `IOUtils.closeWhileHandlingException(readerPool, deleter, writeLock);` in a try / finally block. I can open a PR for that
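
The suggestion above (run the close call in a try / finally block) can be sketched in plain Java. This is only an illustration, not the change that went into `IndexWriter`: `CloseInFinally`, `closeQuietly` and `runThenClose` are hypothetical names, and `closeQuietly` merely imitates the close-and-suppress behaviour of the `IOUtils.closeWhileHandlingException(...)` call quoted in the comment.

```java
import java.io.Closeable;

public final class CloseInFinally {
  private CloseInFinally() {}

  /** Hypothetical stand-in helper: closes every resource and swallows any failure. */
  public static void closeQuietly(Closeable... resources) {
    for (Closeable resource : resources) {
      try {
        if (resource != null) {
          resource.close();
        }
      } catch (Throwable t) {
        // Deliberately ignored so the remaining resources are still closed.
      }
    }
  }

  /**
   * Runs a step that may throw, then releases the resources no matter what.
   * The exception from the step (if any) still propagates to the caller.
   */
  public static void runThenClose(Runnable riskyStep, Closeable... resources) {
    try {
      riskyStep.run();
    } finally {
      closeQuietly(resources);
    }
  }
}
```

Without the `finally`, an exception thrown by the risky step would skip the close call and leave resources such as a write lock held, which is one way a later caller could end up waiting forever.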

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1792198963 I tested this PR using `IndexToFST` from `luceneutil`. This just tests construction time and final FST size, on all `wikimediumall` unique terms, allowing up to 64 MB RAM while build

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1792202492 I'll run `Test2BFST` too ... takes a few hours!

Re: [I] TestIndexWriterOnVMError.testUnknownError timesout [lucene]

2023-11-03 Thread via GitHub
dweiss commented on issue #12654: URL: https://github.com/apache/lucene/issues/12654#issuecomment-1792205263 Thanks, Simon. I'll open up a PR.

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12748: URL: https://github.com/apache/lucene/pull/12748#discussion_r1381468008 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -444,9 +446,15 @@ long addNode(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

[PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-03 Thread via GitHub
dweiss opened a new pull request, #12751: URL: https://github.com/apache/lucene/pull/12751 Fixes #12654.

[PR] The default tests.multiplier passed from gradle was 1, but [lucene]

2023-11-03 Thread via GitHub
dweiss opened a new pull request, #12752: URL: https://github.com/apache/lucene/pull/12752 LuceneTestCase tried to compute its default value from TESTS_NIGHTLY. This could lead to subtle errors: nightly mode failures would not report tests.multiplier=1 and when started from the IDE, the test

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381520513 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException { }

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1792271552 `Test2BFSTs` is happy: ``` BUILD SUCCESSFUL in 50m 6s ```

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
gf2121 commented on code in PR #12748: URL: https://github.com/apache/lucene/pull/12748#discussion_r1381513302 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsReader.java: ## @@ -86,8 +86,11 @@ public final class Lucene90BlockTreeTermsRe

Re: [PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-03 Thread via GitHub
s1monw commented on code in PR #12751: URL: https://github.com/apache/lucene/pull/12751#discussion_r1381543043 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -2560,10 +2560,15 @@ private void rollbackInternalNoCommit() throws IOException {

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381538494 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBlockPoolReverseBytesReader.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1792293648 Thanks @dungba88! I confirmed that `IndexToFST` now works again, and, when given "up to" `inf` RAM to use, it produces the same sized minimal `fst.bin` as main at `367244208 by

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381559347 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381564163 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -110,25 +117,39 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381565556 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381570983 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h;

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
easyice commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1792322904 @mikemccand Thanks for the benchmarking. I also wrote 10 million docs of random long values, then used `TermInSetQuery` for benchmarking. Here is the result: The file size of tip r

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1792327869 `Test2BFST` is happy, yay! ``` BUILD SUCCESSFUL in 56m 36s ```

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1381580022 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermType.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-03 Thread via GitHub
dweiss commented on code in PR #12751: URL: https://github.com/apache/lucene/pull/12751#discussion_r1381608903 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -2560,10 +2560,15 @@ private void rollbackInternalNoCommit() throws IOException {

Re: [PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-03 Thread via GitHub
dweiss commented on code in PR #12751: URL: https://github.com/apache/lucene/pull/12751#discussion_r1381610595 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -2560,10 +2560,15 @@ private void rollbackInternalNoCommit() throws IOException {

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
jimczi commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792391067 I agree with Adrien that hardcoded formats with a clear strategy are better. We want to avoid exposing a knn format that takes another abstract format. That would be cryptic and diffic

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
easyice commented on code in PR #12748: URL: https://github.com/apache/lucene/pull/12748#discussion_r1381647779 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -96,6 +96,8 @@ public enum INPUT_TYPE { */ static final byte ARCS_FOR_DIRECT_ADDRESSING = 1 <

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-03 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1792404783 > So we are removing this half of the undirected connection but I don't think we are removing the other half c ---> b anywhere. This will leave inconsistent Graph This is by

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
easyice commented on code in PR #12748: URL: https://github.com/apache/lucene/pull/12748#discussion_r1381655684 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -444,9 +446,15 @@ long addNode(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792528482 Hi @mikemccand, I reset the branch to the initial commit (without BytesRefHash & Co. changes ). Then I merged and pushed. I will now try to redo the changes. In fact, on x86 m

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792537252 @mikemccand, I checked in main branch, it no longer uses any varhandles in BytesRefHash and ByteBlockPool. No idea where the code moved to. It now uses BytesRefBlockPool, but thi

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792545702 @mikemccand: If you want to see the changes I reverted, see the above comparison: https://github.com/apache/lucene/compare/c1b626c0636821f4d7c085895359489e7dfa330f..36de2bb7fa7a0587a102cf

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792546852 @jimczi what do you mean "existing format as implementation detail"? The flat format is an implementation detail. Folks using the quantized hnsw do not have to supply a flat form

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
easyice commented on code in PR #12748: URL: https://github.com/apache/lucene/pull/12748#discussion_r1381794470 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsReader.java: ## @@ -86,8 +86,11 @@ public final class Lucene90BlockTreeTermsR

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) [lucene]

2023-11-03 Thread via GitHub
javanna commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1792564301 heya @zacharymorn I worked quite a bit on this last year. I should have addressed all of this little by little, although we are still not very close to deprecating search(Query, Collector).

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792570482 > @mikemccand: If you want to see the changes I reverted, see the above comparison: https://github.com/apache/lucene/compare/36de2bb7fa7a0587a102cf5c4d35ac8f94976bbd..c1b626c0636821f4d7c0

Re: [PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-03 Thread via GitHub
s1monw commented on code in PR #12751: URL: https://github.com/apache/lucene/pull/12751#discussion_r1381800115 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -2560,10 +2560,15 @@ private void rollbackInternalNoCommit() throws IOException {

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
jpountz commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381802489 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-03 Thread via GitHub
jpountz commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1792586157 For reference, I'm interested in taking advantage of the fact we're changing the codec anyway to look into other smaller changes, like switching tail postings from vints to group-varint,

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381811085 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h;

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381812079 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h;

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
jimczi commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792590687 > The flat format is an implementation detail. Folks using the quantized hnsw do not have to supply a flat format. We can register the flat format for direct usage (outside of HNSW)

Re: [I] Explore partially decoding blocks (within-block skipping) [lucene]

2023-11-03 Thread via GitHub
jpountz commented on issue #12749: URL: https://github.com/apache/lucene/issues/12749#issuecomment-1792592185 How would it work? Since blocks are delta-coded, you can't know the value at a given index without decoding all previous values and computing their sum? Or you need to store some ch
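
The point about delta coding can be made concrete with a small sketch. It assumes a simplified block that stores only the gaps between consecutive sorted values; `DeltaBlock`, `encode` and `valueAt` are hypothetical names for illustration and do not reflect Lucene's actual postings format.

```java
public final class DeltaBlock {
  private DeltaBlock() {}

  /** Stores each sorted value as its gap from the previous one (the first gap is from base). */
  public static int[] encode(int base, int[] sortedValues) {
    int[] deltas = new int[sortedValues.length];
    int previous = base;
    for (int i = 0; i < sortedValues.length; i++) {
      deltas[i] = sortedValues[i] - previous;
      previous = sortedValues[i];
    }
    return deltas;
  }

  /** Recovering the value at index means summing every delta up to and including it. */
  public static int valueAt(int base, int[] deltas, int index) {
    int value = base;
    for (int i = 0; i <= index; i++) {
      value += deltas[i];
    }
    return value;
  }
}
```

Because `valueAt` has to walk all earlier deltas, jumping to the middle of a block saves no decoding work unless some extra information is stored alongside the deltas.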

[PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty opened a new pull request, #12753: URL: https://github.com/apache/lucene/pull/12753 [ There is no intent to merge this PR ] This PR is intended to help tease out potential issues that may arise from compiling with JDK 21. We can use it to identify and pick out the individ

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792595250 > @uschindler pushed 0 commits. Huh, how do you do that? Mike McCandless http://blog.mikemccandless.com On Fri, Nov 3, 2023 a

Re: [PR] speedup arm int functions? [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1792596894 @ChrisHegarty I did some investigation, looked at the assembly on ARM machines, did some experiments, etc. I didn't mess around with intel, but i think the situation is the same. My though

Re: [PR] Skip docs with Docvalues in NumericLeafComparator [lucene]

2023-11-03 Thread via GitHub
jpountz commented on PR #12405: URL: https://github.com/apache/lucene/pull/12405#issuecomment-1792600276 Sorry, since I had approved the PR, I had not understood it was still waiting on me. It's a great change, let's see how to get it in.

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792606649 Thanks @uschindler! Removing vShort and switching to LE (or native -- I didn't understand the problem with that -- this is never (directly) serialized to a Lucene index) short seems good

Re: [PR] speedup arm int functions? [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1792630498 equivalent on intel ice lake: https://www.felixcloutier.com/x86/vpdpbusd IMO, we should figure out a path to using these, to get the best performance from the binary vectors. it isn't us

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792647332 > 2\. we use deprecated java.util Locale constructor - usage should likely be replaced with Locale:of > Locale:of factories are added in Java 19, so this kinda the change to the ver

Re: [PR] Skip docs with Docvalues in NumericLeafComparator [lucene]

2023-11-03 Thread via GitHub
jpountz commented on PR #12405: URL: https://github.com/apache/lucene/pull/12405#issuecomment-1792654011 I did my best at fixing conflicts, @LuXugang are you able to check the changes?

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792669164 I'll try to cutover your branch, seems some stuff here should be using it already. For example Luke GUI already expects a field to be language tag, so we shouldn't be using this Locale.of(

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792685739 Thanks @rmuir

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792686970 @jimczi the HNSWWriter and Readers need the passed flat vector readers and writers to provide specific functions. Like the mergeOneField that returns closeable scorers. I am not

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792687728 the only ugly one was the benchmark locale task, because it's got a method shaped just like the deprecated java-ism: ``` static Locale createLocale(String language, String country, Str

Re: [PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-03 Thread via GitHub
dweiss commented on code in PR #12751: URL: https://github.com/apache/lucene/pull/12751#discussion_r1381916122 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -2560,10 +2560,15 @@ private void rollbackInternalNoCommit() throws IOException {

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792702231 I'd say let's factor out the cleanups and commit those without the java-21 stuff? it would make the java-21 PR smaller and these are really just tech-debt type fixes that should be addresse

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792700032 oops, sorry i missed some, thanks.

Re: [PR] tests.multiplier could be omitted in failed test reproduce line [lucene]

2023-11-03 Thread via GitHub
dweiss merged PR #12752: URL: https://github.com/apache/lucene/pull/12752

Re: [PR] Skip docs with Docvalues in NumericLeafComparator [lucene]

2023-11-03 Thread via GitHub
jpountz commented on PR #12405: URL: https://github.com/apache/lucene/pull/12405#issuecomment-1792713630 Tests fail because the optimization kicks in in more cases than the test expects, it's not clear to me yet if it's a bug or not.

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792714766 > I'd say let's factor out the cleanups and commit those without the java-21 stuff? it would make the java-21 PR smaller and these are really just tech-debt type fixes that should be

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792737633 LOL forbidden apis caught us with the language tags: yes let's use error handling: > If the specified language tag contains any ill-formed subtags, the first such subtag and all fol

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792745161 I just noticed that too! easy fix! ;-) ( this PR is marked non-draft, just to get the CI building/testing, which helps spot such issues, without warming my home! )

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792750934 I am working on it

[PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
uschindler opened a new pull request, #12754: URL: https://github.com/apache/lucene/pull/12754 This code was previously in `RamUsageEstimator` and also in `PanamaVectorUtilSupport`. In addition this moves detection of Client VM and fast FMA support to `Constants` class (in preparatio

Re: [PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
uschindler commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1381959455 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProper

Re: [PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
uschindler commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1381961561 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProper

Re: [PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
uschindler commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1381963481 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProper

Re: [PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
rmuir commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1381962437 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProperty("j

[PR] use URI where possible [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty opened a new pull request, #12755: URL: https://github.com/apache/lucene/pull/12755 This commit replaces the usage of the deprecated `java.net.URL` constructor with `URI`, later converting `toURL` where necessary to interoperate with the URLConnection API. The usage is mostly
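
A minimal sketch of the pattern described above: parse the string as a `URI` first, then convert with `toURL()` only where a `URL` is still required. `UriToUrl` and `toUrl` are hypothetical names for illustration, not code from the PR, and wrapping the checked exception is a choice made here to keep the example short.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public final class UriToUrl {
  private UriToUrl() {}

  /**
   * Parses the string as a URI and converts it to a URL.
   * Throws IllegalArgumentException for syntax errors, relative URIs,
   * or schemes without a registered protocol handler.
   */
  public static URL toUrl(String spec) {
    try {
      return URI.create(spec).toURL();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("No protocol handler for: " + spec, e);
    }
  }
}
```

For example, `UriToUrl.toUrl("https://lucene.apache.org/core/")` yields a `URL` that can be handed to `openConnection()`, while a relative string such as `core/index.html` is rejected because `URI.toURL()` requires an absolute URI.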

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792805460 I will fix the norwegian problems in the tests. not sure what this `NY` stuff is. there is: `no`, `nn`, `nb` for norwegian, nynorsk, and bokmål. I assume the test wants `nn`.

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1381984403 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermStateCodecImpl.java: ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792821264 ok, i think the best fix is to just cutover this benchmark task to take a tag. The parsing is strict, so if someone has .alg file with `en,US` or whatever, they will get a nice error messa
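
One way to get strict language-tag parsing from the JDK is `Locale.Builder.setLanguageTag`, which throws `IllformedLocaleException` on ill-formed input instead of silently dropping subtags. The sketch below is an assumption about how such parsing could look, not the code committed to the benchmark task; `StrictLocaleTag` and `parse` are hypothetical names.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

public final class StrictLocaleTag {
  private StrictLocaleTag() {}

  /**
   * Parses a BCP 47 language tag strictly.
   * Ill-formed input such as "en,US" raises IllformedLocaleException.
   */
  public static Locale parse(String tag) throws IllformedLocaleException {
    return new Locale.Builder().setLanguageTag(tag).build();
  }
}
```

A tag such as `de-DE-u-co-phonebk` parses to a German locale for Germany carrying the `co` (collation) Unicode extension, which is the kind of detail the older language/country/variant arguments cannot express directly.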

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1381990655 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermType.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1381991598 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermsIndexBuilder.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792837540 primary purpose of this task is to benchmark collation, where you really want to use the tag anyway, e.g. `de-DE-u-co-phonebk`

Re: [PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
rmuir commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1382004593 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProperty("j

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792850561 build is green

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
jimczi commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792854167 > I think we should expose the flat formats in the codec. But the required new functions for indexing the vectors seem to justify a new abstraction. Can we add the abstraction as an

Re: [PR] Replace usage of deprecated java.net.URL constructor with URI [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12755: URL: https://github.com/apache/lucene/pull/12755#issuecomment-1792858675 needs @uschindler to review. Only Germans understand the difference between URI and URL. Probably not great usability-wise for Java to deprecate URL and force everyone to deal with this st

Re: [PR] Replace usage of deprecated java.net.URL constructor with URI [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12755: URL: https://github.com/apache/lucene/pull/12755#issuecomment-1792862514 oh yeah, this is also the same class that does DNS lookups in its `equals()` method :) -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382020346 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java: ## @@ -33,7 +33,7 @@ class GermanStemmer { /** Amount of characters that a

Re: [PR] Refactor access to VM options and move some VM options to oal.util.Constants [lucene]

2023-11-03 Thread via GitHub
uschindler commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1382022330 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProper

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382022708 ## lucene/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java: ## @@ -793,19 +793,17 @@ public void testLocale() throws Exception {

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792871213 >build is green Woot! Thanks @rmuir -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] Refactor access to VM options and move some VM options to oal.util.Constants [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #12754: URL: https://github.com/apache/lucene/pull/12754#issuecomment-1792879954 Hi @rmuir, I also fixed the broken security manager and NULL property handling in Constants.java, so we won't crash. That's an improvement, but long overdue. -- This is an aut

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382037193 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java: ## @@ -33,7 +33,7 @@ class GermanStemmer { /** Amount of characters that are remo

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382039175 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java: ## @@ -33,7 +33,7 @@ class GermanStemmer { /** Amount of characters that are remo

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382043830 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java: ## @@ -33,7 +33,7 @@ class GermanStemmer { /** Amount of characters that a

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1381994228 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermsIndexBuilder.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382046377 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java: ## @@ -33,7 +33,7 @@ class GermanStemmer { /** Amount of characters that a

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
Tony-X commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1792903331 > It seems like you have the low level encode/decode working? So all that remains is to hook that up with the Codec components that read/write the terms dict ... then you can test the Cod

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
nknize commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1382078005 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/bitpacking/BitPacker.java: ## Review Comment: > I'm in the process of building th

Re: [PR] Replace usage of deprecated java.net.URL constructor with URI [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12755: URL: https://github.com/apache/lucene/pull/12755#issuecomment-1792956804 > The German explanation: one is a location the other is just an opaque name. Every URL is an URI, but not otherwise round. If every URL is a URI, then how come `URL.equals()` do a D

Re: [PR] Replace usage of deprecated java.net.URL constructor with URI [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #12755: URL: https://github.com/apache/lucene/pull/12755#issuecomment-1792961270 > > The German explanation: one is a location the other is just an opaque name. Every URL is an URI, but not otherwise round. > > If every URL is a URI, then how come `URL.equal
