Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793362865 I tweaked the FMA logic for AMD cpus, to only avoid the high-latency scalar FMA where necessary. Should appease germans to get that extra ulp or whatever. sysprops default to "auto"
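For readers unfamiliar with the scalar FMA path discussed in this thread, here is a minimal sketch of a float dot product written three ways: plain multiply-add, `Math.fma`, and a two-accumulator unrolled `Math.fma` loop. The class and method names are hypothetical and this is not the code from the PR; it only illustrates the technique under the assumption that both arrays have the same length.

```java
// Hypothetical illustration only; not the VectorUtil implementation from the PR.
final class FmaDotSketch {

  /** Plain loop: each product and each sum is rounded separately. */
  static float dotPlain(float[] a, float[] b) {
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      acc += a[i] * b[i];
    }
    return acc;
  }

  /** Fused loop: Math.fma computes a*b+acc with a single rounding per step. */
  static float dotFma(float[] a, float[] b) {
    float acc = 0f;
    for (int i = 0; i < a.length; i++) {
      acc = Math.fma(a[i], b[i], acc);
    }
    return acc;
  }

  /** Two independent accumulators, so consecutive FMAs do not depend on each other. */
  static float dotFmaUnrolled(float[] a, float[] b) {
    float acc0 = 0f;
    float acc1 = 0f;
    int i = 0;
    int bound = a.length & ~1; // largest even length
    for (; i < bound; i += 2) {
      acc0 = Math.fma(a[i], b[i], acc0);
      acc1 = Math.fma(a[i + 1], b[i + 1], acc1);
    }
    float acc = acc0 + acc1;
    for (; i < a.length; i++) { // scalar tail for odd lengths
      acc = Math.fma(a[i], b[i], acc);
    }
    return acc;
  }
}
```

Whether the fused form is actually faster depends on the CPU and the JIT, which is what the sysprop mentioned above is meant to let users control.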

Re: [I] Should we not enlarge PagedGrowableWriter initial bitPerValue on NodeHash.rehash()? [lucene]

2023-11-03 Thread via GitHub
dungba88 commented on issue #12744: URL: https://github.com/apache/lucene/issues/12744#issuecomment-1793276663 I think this should be enhancement instead of bug, but I can't edit it. @mikemccand can you help to change the label? -- This is an automated message from the Apache Git Service.

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1382304599 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [I] surpriseMePolygon and createRegularPolygon in test util class returns invalid polygon [lucene]

2023-11-03 Thread via GitHub
stefanvodita commented on issue #12596: URL: https://github.com/apache/lucene/issues/12596#issuecomment-1793273774 What happens with `createRegularPolygon` is very interesting. The smaller the polygon's radius is and the more vertices it has, the smaller the sides are. At some point they ge
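The relationship between radius, vertex count and side length can be made concrete with a small sketch. It assumes a planar regular polygon inscribed in a circle, where each side is a chord of length 2·r·sin(π/n); the test utility in question works on geographic coordinates, so this is only an approximation of the effect described, and the class name is hypothetical.

```java
// Hypothetical planar illustration; the Lucene test utility works on lat/lon values.
final class RegularPolygonSideSketch {

  /** Side length of a regular polygon with the given circumradius and vertex count. */
  static double sideLength(double radius, int vertices) {
    if (vertices < 3) {
      throw new IllegalArgumentException("a polygon needs at least 3 vertices");
    }
    // Each side is a chord subtending an angle of 2*pi/vertices at the centre.
    return 2.0 * radius * Math.sin(Math.PI / vertices);
  }
}
```

Halving the radius halves every side, and adding vertices shrinks the sides roughly in proportion to 1/n.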

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793238778 So you can see the difference in approach. Personally i prefer how this AMD AVX-512 works: that for some operations, the 512-bit variant just isn't any faster than the 256-bit variant, ver

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793231388 vector results for this AMD CPU are unchanged by this PR. Float-relevant performance info from avxturbo. This CPU doesn't downclock but 512-bit FMA is 2x as slow as 256-bit FMA, so

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793202062 What are the results for vector API with this CPU?

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-03 Thread via GitHub
kevindrosendahl commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1793186041 Hey @benwtrent and all, just wanted to let you know that I'm experimenting some with different index structures for larger than memory indexes. I have a working implem

Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1793153141 AMD EPYC 9R14: `INFO: Java vector incubator API enabled; uses preferredBitSize=512; FMA enabled` main: ``` Benchmark (size) Mode Cnt Sc

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) [lucene]

2023-11-03 Thread via GitHub
zacharymorn commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1793075423 Thanks @javanna for the quick confirmation! I will pick it back up in the next few days and see what still needs to be done then.

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382155265 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java: ## @@ -33,7 +33,7 @@ class GermanStemmer { /** Amount of characters that are remo

Re: [PR] Replace usage of deprecated java.net.URL constructor with URI [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty merged PR #12755: URL: https://github.com/apache/lucene/pull/12755

Re: [PR] Replace usage of deprecated java.net.URL constructor with URI [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on PR #12755: URL: https://github.com/apache/lucene/pull/12755#issuecomment-1793024558 > oh yeah, this is also the same class that does DNS lookups in its `equals()` method :) Yeah, it’s a 20+ yr old issue, that is too late to change. I tried… Anyway, thanks f

Re: [PR] Refactor access to VM options and move some VM options to oal.util.Constants [lucene]

2023-11-03 Thread via GitHub
uschindler merged PR #12754: URL: https://github.com/apache/lucene/pull/12754

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1382113535 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/bitpacking/BitPacker.java: ## Review Comment: > +1. I'm assuming that will be add

[PR] During concurrent slice searches in IndexSearcher stop other tasks if one throws an Exception [lucene]

2023-11-03 Thread via GitHub
quux00 opened a new pull request, #12756: URL: https://github.com/apache/lucene/pull/12756 ### Description Since TaskExecutor now waits for all concurrent tasks to finish, even if one throws an Exception and when an exception is thrown, any remaining unscheduled tasks are cancelled,

Re: [PR] Replace usage of deprecated java.net.URL constructor with URI [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #12755: URL: https://github.com/apache/lucene/pull/12755#issuecomment-1792961270 > > The German explanation: one is a location the other is just an opaque name. Every URL is an URI, but not otherwise round. > > If every URL is a URI, then how come `URL.equal

Re: [PR] Replace usage of deprecated java.net.URL constructor with URI [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12755: URL: https://github.com/apache/lucene/pull/12755#issuecomment-1792956804 > The German explanation: one is a location the other is just an opaque name. Every URL is an URI, but not otherwise round. If every URL is a URI, then how come `URL.equals()` do a D

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
nknize commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1382078005 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/bitpacking/BitPacker.java: ## Review Comment: > I'm in the process of building th

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
Tony-X commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1792903331 > It seems like you have the low level encode/decode working? So all that remains is to hook that up with the Codec components that read/write the terms dict ... then you can test the Cod

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382046377 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java: ## @@ -33,7 +33,7 @@ class GermanStemmer { /** Amount of characters that a

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1381994228 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermsIndexBuilder.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382043830 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java: ## @@ -33,7 +33,7 @@ class GermanStemmer { /** Amount of characters that a

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382039175 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java: ## @@ -33,7 +33,7 @@ class GermanStemmer { /** Amount of characters that are remo

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382037193 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java: ## @@ -33,7 +33,7 @@ class GermanStemmer { /** Amount of characters that are remo

Re: [PR] Refactor access to VM options and move some VM options to oal.util.Constants [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #12754: URL: https://github.com/apache/lucene/pull/12754#issuecomment-1792879954 Hi @rmuir , I also fixed the broken security manager and NULL property handling in Constants.java, so we won't crash. That's an improvement, but long overdue.

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792871213 >build is green Woot! Thanks @rmuir

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382022708 ## lucene/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java: ## @@ -793,19 +793,17 @@ public void testLocale() throws Exception {

Re: [PR] Refactor access to VM options and move some VM options to oal.util.Constants [lucene]

2023-11-03 Thread via GitHub
uschindler commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1382022330 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProper

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on code in PR #12753: URL: https://github.com/apache/lucene/pull/12753#discussion_r1382020346 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java: ## @@ -33,7 +33,7 @@ class GermanStemmer { /** Amount of characters that a

Re: [PR] Replace usage of deprecated java.net.URL constructor with URI [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12755: URL: https://github.com/apache/lucene/pull/12755#issuecomment-1792862514 oh yeah, this is also the same class that does DNS lookups in its `equals()` method :)

Re: [PR] Replace usage of deprecated java.net.URL constructor with URI [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12755: URL: https://github.com/apache/lucene/pull/12755#issuecomment-1792858675 needs @uschindler to review. Only germans understand the difference between URI and URL. Probably not great usability-wise for java to deprecate URL and force everyone to deal with this st

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
jimczi commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792854167 > I think we should expose the flat formats in the codec. But the required new functions for indexing the vectors seem to justify a new abstraction. Can we add the abstraction as an

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792850561 build is green

Re: [PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
rmuir commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1382004593 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProperty("j

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792837540 primary purpose of this task is to benchmark collation, where you really want to use the tag anyway, e.g. `de-DE-u-co-phonebk`

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1381991598 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermsIndexBuilder.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1381990655 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermType.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792821264 ok, i think the best fix is to just cutover this benchmark task to take a tag. The parsing is strict, so if someone has .alg file with `en,US` or whatever, they will get a nice error messa
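A minimal sketch of what strict versus lenient language-tag parsing looks like in the JDK follows. It assumes the strict path goes through `Locale.Builder.setLanguageTag`, which throws `IllformedLocaleException` on ill-formed input, while `Locale.forLanguageTag` silently ignores the ill-formed part; the comment above does not say which call the benchmark task ends up using, and the class and method names here are hypothetical.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

// Hypothetical helper; not the benchmark task's actual code.
final class StrictLocaleTagSketch {

  /** Strict: throws IllformedLocaleException if the tag is not well-formed BCP 47. */
  static Locale parseStrict(String tag) {
    return new Locale.Builder().setLanguageTag(tag).build();
  }

  /** Lenient: ill-formed subtags and everything after them are ignored. */
  static Locale parseLenient(String tag) {
    return Locale.forLanguageTag(tag);
  }

  public static void main(String[] args) {
    System.out.println(parseStrict("de-DE-u-co-phonebk"));
    try {
      parseStrict("en,US"); // legacy comma-separated form, not a language tag
    } catch (IllformedLocaleException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With the strict variant, a legacy `.alg` value such as `en,US` fails loudly instead of quietly turning into some other locale.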

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
Tony-X commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1381984403 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermStateCodecImpl.java: ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792805460 I will fix the norwegian problems in the tests. not sure what this `NY` stuff is. there is: `no`, `nn`, `nb` for norwegian, nynorsk, and bokmål. I assume the test wants `nn`.

[PR] use URI where possible [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty opened a new pull request, #12755: URL: https://github.com/apache/lucene/pull/12755 This commit replaces the usage of the deprecated `java.net.URL` constructor with `URI`, later converting `toURL` where necessary to interoperate with the URLConnection API. The usage is mostly
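The replacement pattern described here can be sketched as follows: build a `URI` from the string and call `toURL()` only at the point where a `URL` is required. The helper name and the decision to wrap the checked exception are assumptions made for illustration, not the PR's code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper showing the URI-first pattern; not taken from the PR.
final class UriToUrlSketch {

  /** Parses the string as a URI, then converts to URL for URLConnection-style APIs. */
  static URL toUrl(String spec) {
    try {
      // URI.create throws IllegalArgumentException on syntax errors;
      // toURL throws IllegalArgumentException if the URI is not absolute.
      return URI.create(spec).toURL();
    } catch (MalformedURLException e) {
      // Raised when no protocol handler exists for the scheme.
      throw new IllegalArgumentException("unsupported URL: " + spec, e);
    }
  }
}
```

Comparing the intermediate `URI` values rather than the resulting `URL` objects also avoids the host resolution that `URL.equals()` may perform.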

Re: [PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
rmuir commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1381962437 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProperty("j

Re: [PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
uschindler commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1381963481 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProper

Re: [PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
uschindler commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1381961561 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProper

Re: [PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
uschindler commented on code in PR #12754: URL: https://github.com/apache/lucene/pull/12754#discussion_r1381959455 ## lucene/core/src/java/org/apache/lucene/util/Constants.java: ## @@ -53,28 +53,33 @@ private Constants() {} // can't construct /** The value of System.getProper

[PR] Refactor access to VM options [lucene]

2023-11-03 Thread via GitHub
uschindler opened a new pull request, #12754: URL: https://github.com/apache/lucene/pull/12754 This code was previously in `RamUsageEstimator` and also in `PanamaVectorUtilSupport`. In addition this moves detection of Client VM and fast FMA support to `Constants` class (in preparatio

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792750934 I am working on it

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792745161 I just noticed that too! easy fix! ;-) ( this PR is marked non-draft, just to get the CI building/testing, which helps spot such issues, without warming my home! )

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792737633 LOL forbidden apis caught us with the language tags: yes let's use error handling: > If the specified language tag contains any ill-formed subtags, the first such subtag and all fol

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792714766 > I'd say let's factor out the cleanups and commit those without the java-21 stuff? it would make the java-21 PR smaller and these are really just tech-debt type fixes that should be

Re: [PR] Skip docs with Docvalues in NumericLeafComparator [lucene]

2023-11-03 Thread via GitHub
jpountz commented on PR #12405: URL: https://github.com/apache/lucene/pull/12405#issuecomment-1792713630 Tests fail because the optimization kicks in in more cases than the test expects, it's not clear to me yet if it's a bug or not.

Re: [PR] tests.multiplier could be omitted in failed test reproduce line [lucene]

2023-11-03 Thread via GitHub
dweiss merged PR #12752: URL: https://github.com/apache/lucene/pull/12752

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792700032 oops, sorry i missed some, thanks.

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792702231 I'd say let's factor out the cleanups and commit those without the java-21 stuff? it would make the java-21 PR smaller and these are really just tech-debt type fixes that should be addresse

Re: [PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-03 Thread via GitHub
dweiss commented on code in PR #12751: URL: https://github.com/apache/lucene/pull/12751#discussion_r1381916122 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -2560,10 +2560,15 @@ private void rollbackInternalNoCommit() throws IOException {

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792687728 the only ugly one was the benchmark locale task, because it's got a method shaped just like the deprecated java-ism: ``` static Locale createLocale(String language, String country, Str

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792686970 @jimczi the HNSWWriter and Readers need the passed flat vector readers and writers to provide specific functions. Like the mergeOneField that returns closeable scorers. I am not

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792685739 Thanks @rmuir

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792669164 I'll try to cutover your branch, seems some stuff here should be using it already. For example Luke GUI already expects a field to be language tag, so we shouldn't be using this Locale.of(

Re: [PR] Skip docs with Docvalues in NumericLeafComparator [lucene]

2023-11-03 Thread via GitHub
jpountz commented on PR #12405: URL: https://github.com/apache/lucene/pull/12405#issuecomment-1792654011 I did my best at fixing conflicts, @LuXugang are you able to check the changes?

Re: [PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1792647332 > 2\. we use deprecated java.util Locale constructor - usage should likely be replaced with Locale:of > Locale:of factories are added in Java 19, so this kinda the change to the ver

Re: [PR] speedup arm int functions? [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1792630498 equivalent on intel ice lake: https://www.felixcloutier.com/x86/vpdpbusd IMO, we should figure out a path to using these, to get the best performance from the binary vectors. it isn't us

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792606649 Thanks @uschindler! Removing vShort and switching to LE (or native -- I didn't understand the problem with that -- this is never (directly) serialized to a Lucene index) short seems good

Re: [PR] Skip docs with Docvalues in NumericLeafComparator [lucene]

2023-11-03 Thread via GitHub
jpountz commented on PR #12405: URL: https://github.com/apache/lucene/pull/12405#issuecomment-1792600276 Sorry, since I had approved the PR, I had not understood it was still waiting on me. It's a great change, let's see how to get it in.

Re: [PR] speedup arm int functions? [lucene]

2023-11-03 Thread via GitHub
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1792596894 @ChrisHegarty I did some investigation, looked at the assembly on ARM machines, did some experiments, etc. I didn't mess around with intel, but i think the situation is the same. My though

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792595250 > @uschindler pushed 0 commits. Huh, how do you do that? Mike McCandless http://blog.mikemccandless.com On Fri, Nov 3, 2023 a

[PR] [DRAFT] Bump release to Java 21 [lucene]

2023-11-03 Thread via GitHub
ChrisHegarty opened a new pull request, #12753: URL: https://github.com/apache/lucene/pull/12753 [ There is no intent to merge this PR ] This PR is intended to help tease out potential issues that may arise from compiling with JDK 21. We can use it to identify and pick out the individ

Re: [I] Explore partially decoding blocks (within-block skipping) [lucene]

2023-11-03 Thread via GitHub
jpountz commented on issue #12749: URL: https://github.com/apache/lucene/issues/12749#issuecomment-1792592185 How would it work? Since blocks are delta-coded, you can't know the value at a given index without decoding all previous values and computing their sum? Or you need to store some ch
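The point about delta coding can be illustrated with a small sketch: once doc IDs are stored as gaps, reading the value at position `i` means summing every gap up to and including `i`. The class and method names are hypothetical, and the encoding here starts from a base of 0; it is not Lucene's postings format.

```java
// Hypothetical illustration of delta coding; not Lucene's postings format.
final class DeltaBlockSketch {

  /** Stores each value as the gap from its predecessor (the first gap is from 0). */
  static int[] encode(int[] sortedValues) {
    int[] deltas = new int[sortedValues.length];
    int previous = 0;
    for (int i = 0; i < sortedValues.length; i++) {
      deltas[i] = sortedValues[i] - previous;
      previous = sortedValues[i];
    }
    return deltas;
  }

  /** Recovering one value needs the running sum of all gaps before it. */
  static int valueAt(int[] deltas, int index) {
    int sum = 0;
    for (int i = 0; i <= index; i++) {
      sum += deltas[i];
    }
    return sum;
  }
}
```

Skipping inside a block therefore needs either this prefix sum or some extra stored data, which is the trade-off the comment raises.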

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
jimczi commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792590687 > The flat format is an implementation detail. Folks using the quantized hnsw do not have to supply a flat format. We can register the flat format for direct usage (outside of HNSW)

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381812079 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h;

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381811085 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h;

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-03 Thread via GitHub
jpountz commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1792586157 For reference, I'm interested in taking advantage of the fact we're changing the codec anyway to look into other smaller changes, like switching tail postings from vints to group-varint,

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
jpountz commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381802489 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-03 Thread via GitHub
s1monw commented on code in PR #12751: URL: https://github.com/apache/lucene/pull/12751#discussion_r1381800115 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -2560,10 +2560,15 @@ private void rollbackInternalNoCommit() throws IOException {

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792570482 > @mikemccand: If you want to see the changes I reverted, see the above comparison: https://github.com/apache/lucene/compare/36de2bb7fa7a0587a102cf5c4d35ac8f94976bbd..c1b626c0636821f4d7c0

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) [lucene]

2023-11-03 Thread via GitHub
javanna commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1792564301 heya @zacharymorn I worked quite a bit on this last year. I should have addressed all of this little by little, although we are still not very close on deprecating search(Query, Collector).

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
easyice commented on code in PR #12748: URL: https://github.com/apache/lucene/pull/12748#discussion_r1381794470 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsReader.java: ## @@ -86,8 +86,11 @@ public final class Lucene90BlockTreeTermsR

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792546852 @jimczi what do you mean "existing format as implementation detail"? The flat format is an implementation detail. Folks using the quantized hnsw do not have to supply a flat form

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792545702 @mikemccand: If you want to see the changes I reverted, see the above comparison: https://github.com/apache/lucene/compare/c1b626c0636821f4d7c085895359489e7dfa330f..36de2bb7fa7a0587a102cf

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792537252 @mikemccand, I checked in main branch, it no longer uses any varhandles in BytesRefHash and ByteBlockPool. No idea where the code moved to. It now uses BytesRefBlockPool, but thi

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2023-11-03 Thread via GitHub
uschindler commented on PR #888: URL: https://github.com/apache/lucene/pull/888#issuecomment-1792528482 Hi @mikemccand, I reset the branch to the initial commit (without BytesRefHash & Co. changes ). Then I merged and pushed. I will now try to redo the changes. In fact, on x86 m

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
easyice commented on code in PR #12748: URL: https://github.com/apache/lucene/pull/12748#discussion_r1381655684 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -444,9 +446,15 @@ long addNode(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-03 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1792404783 > So we are removing this half of the undirected connection but I don't think we are removing the other half c ---> b anywhere. This will leave inconsistent Graph This is by

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
easyice commented on code in PR #12748: URL: https://github.com/apache/lucene/pull/12748#discussion_r1381647779 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -96,6 +96,8 @@ public enum INPUT_TYPE { */ static final byte ARCS_FOR_DIRECT_ADDRESSING = 1 <

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
jimczi commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792391067 I agree with Adrien that hardcoded formats with a clear strategy are better. We want to avoid exposing a knn format that takes another abstract format. That would be cryptic and diffic

Re: [PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-03 Thread via GitHub
dweiss commented on code in PR #12751: URL: https://github.com/apache/lucene/pull/12751#discussion_r1381610595 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -2560,10 +2560,15 @@ private void rollbackInternalNoCommit() throws IOException {

Re: [PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-03 Thread via GitHub
dweiss commented on code in PR #12751: URL: https://github.com/apache/lucene/pull/12751#discussion_r1381608903 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -2560,10 +2560,15 @@ private void rollbackInternalNoCommit() throws IOException {

Re: [PR] Random access term dictionary [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1381580022 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/lucene90/randomaccess/TermType.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1792327869 `Test2BFST` is happy, yay! ``` BUILD SUCCESSFUL in 56m 36s ```

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-03 Thread via GitHub
easyice commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1792322904 @mikemccand Thanks for the benchmarking. I also wrote 10 million docs of random long values, then used `TermInSetQuery` for benchmarking. Here is the result: The file size of tip r

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381570983 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h;

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381565556 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381564163 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -110,25 +117,39 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381559347 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,149 +218,228 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; }

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1792293648 Thanks @dungba88! I confirmed that `IndexToFST` now works again, and, when given "up to" `inf` RAM to use, it produces the same sized minimal `fst.bin` as main at `367244208 by

Re: [PR] Use value-based LRU cache in NodeHash [lucene]

2023-11-03 Thread via GitHub
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1381538494 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBlockPoolReverseBytesReader.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF

Re: [PR] TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). [lucene]

2023-11-03 Thread via GitHub
s1monw commented on code in PR #12751: URL: https://github.com/apache/lucene/pull/12751#discussion_r1381543043 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -2560,10 +2560,15 @@ private void rollbackInternalNoCommit() throws IOException {
