[GitHub] [lucene] gf2121 commented on pull request #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries

2022-12-14 Thread GitBox
gf2121 commented on PR #12006: URL: https://github.com/apache/lucene/pull/12006#issuecomment-1350639546 This brings a small jump for Distance Filter LatLonPoint task. http://people.apache.org/~mikemccand/geobench.html -- This is an automated message from the Apache Git Service. To respond

[GitHub] [lucene] jpountz merged pull request #12004: Move byte vector queries into new KnnByteVectorQuery

2022-12-14 Thread GitBox
jpountz merged PR #12004: URL: https://github.com/apache/lucene/pull/12004 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
uschindler commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1350666129 Hi, I agree with all Robert say. I would also like to make another suggestion (that could also be applied when this has been fixed). To me it looks like a bad decission of antlr, that

[GitHub] [lucene] iverase commented on pull request #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries

2022-12-14 Thread GitBox
iverase commented on PR #12006: URL: https://github.com/apache/lucene/pull/12006#issuecomment-1350678308 > If it gives a good speedup, we could think about applying the same trick to other queries too, if we can do it without making things too ugly. For example LatLonPoint, IntPoint, etc cu

[GitHub] [lucene] jpountz commented on pull request #12004: Move byte vector queries into new KnnByteVectorQuery

2022-12-14 Thread GitBox
jpountz commented on PR #12004: URL: https://github.com/apache/lucene/pull/12004#issuecomment-1350702865 @benwtrent Would you mind helping with the backport by creating a PR? There are a few conflicts and I could use some help with resolving them. -- This is an automated message from the

[GitHub] [lucene] uschindler commented on pull request #12004: Move byte vector queries into new KnnByteVectorQuery

2022-12-14 Thread GitBox
uschindler commented on PR #12004: URL: https://github.com/apache/lucene/pull/12004#issuecomment-1350722742 > @benwtrent Would you mind helping with the backport by creating a PR? There are a few conflicts and I could use some help with resolving them. There are also Java 17 switch ex

[GitHub] [lucene] jpountz commented on pull request #11997: Add IntField, LongField, FloatField and DoubleField

2022-12-14 Thread GitBox
jpountz commented on PR #11997: URL: https://github.com/apache/lucene/pull/11997#issuecomment-1350964750 @fcofdez I hope you don't mind but I checked out your branch to look into the above issue and ended up pushing my changes. These new fields not only support indexing sorted numeric doc v

[GitHub] [lucene] fcofdez commented on pull request #11997: Add IntField, LongField, FloatField and DoubleField

2022-12-14 Thread GitBox
fcofdez commented on PR #11997: URL: https://github.com/apache/lucene/pull/11997#issuecomment-1351000741 @jpountz thanks for looking into this, unfortunately I didn't have time to go through the last round of review comments. The change makes sense to me. -- This is an automated message f

[GitHub] [lucene] LuXugang opened a new pull request, #12017: Aggressive `count` in BooleanWeight

2022-12-14 Thread GitBox
LuXugang opened a new pull request, #12017: URL: https://github.com/apache/lucene/pull/12017 When BooleanQuery is pure disjunction, if at least one clause could match all docs, then we could get right `count` even though there was other clause whose `count` is unknown. -- This is

[GitHub] [lucene] dweiss commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
dweiss commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351155914 > With ASM this was hard to do but with Grade it is quite easy. Just add another configuration with those two dependencies and use the "Gradle Shadow Plugin" (https://imperceptiblethought

[GitHub] [lucene] jpountz commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight

2022-12-14 Thread GitBox
jpountz commented on code in PR #12017: URL: https://github.com/apache/lucene/pull/12017#discussion_r1048383541 ## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ## @@ -470,14 +470,19 @@ private int reqCount(LeafReaderContext context) throws IOException {

[GitHub] [lucene] benwtrent commented on pull request #12004: Move byte vector queries into new KnnByteVectorQuery

2022-12-14 Thread GitBox
benwtrent commented on PR #12004: URL: https://github.com/apache/lucene/pull/12004#issuecomment-1351211685 @uschindler 100%! I can get the backport PR done asap -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351332547 personally i am against the shading. I think it is a huge antipattern, it hides third-party artifacts completely. think about someone trying to do security or license audit and they have "

[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351345609 I also dont understand the issue where ppl think they can modify arbitrary versions of lucene dependencies. Can we specify our dependencies in a different way (e.g. exact version) in

[GitHub] [lucene] dweiss commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
dweiss commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351355396 > Can we specify our dependencies in a different way (e.g. exact version) in our maven stuff so this won't happen? e.g. you can do this in python, and specify that you depend on antlr ==

[GitHub] [lucene] benwtrent opened a new pull request, #12018: Move byte vector queries into new KnnByteVectorQuery (#12004)

2022-12-14 Thread GitBox
benwtrent opened a new pull request, #12018: URL: https://github.com/apache/lucene/pull/12018 Backport of #12004 This is the first commit of a much larger refactor. The overall goal is to separate the concerns of byte vectors and float vectors. Making their usage and APIs clearer fo

[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351366346 Yes, java showed what a disaster it is around supply chains with the log4j vulnerability. Shading should not even be considered as an option. -- This is an automated message from the Apa

[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
uschindler commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351368145 > > Can we specify our dependencies in a different way (e.g. exact version) in our maven stuff so this won't happen? e.g. you can do this in python, and specify that you depend on ant

[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351374308 I feel that if we publish a shaded jar we become guilty of "contributing to the delinquency" of terrible supply chain management, by hiding third party dependencies and their versions from

[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351378582 another way to say it: shading is a terrible, TERRIBLE idea and you only hear about it in java, because java is the only language with developers that are bad enough to consider it.

[GitHub] [lucene] benwtrent commented on pull request #12018: Move byte vector queries into new KnnByteVectorQuery (#12004)

2022-12-14 Thread GitBox
benwtrent commented on PR #12018: URL: https://github.com/apache/lucene/pull/12018#issuecomment-1351392230 @jpountz Here is the backport. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

[GitHub] [lucene] jpountz merged pull request #12018: Move byte vector queries into new KnnByteVectorQuery (#12004)

2022-12-14 Thread GitBox
jpountz merged PR #12018: URL: https://github.com/apache/lucene/pull/12018 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

[GitHub] [lucene] benwtrent commented on pull request #12014: Ban use of Math.fma across the entire codebase

2022-12-14 Thread GitBox
benwtrent commented on PR #12014: URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351560532 Holy crap, creating `BigDecimal` and then multiplying & adding is crazy. This is a completely unacceptable fallback calculation for this method. +1 on banning its use in the code

[GitHub] [lucene] dweiss commented on pull request #12014: Ban use of Math.fma across the entire codebase

2022-12-14 Thread GitBox
dweiss commented on PR #12014: URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351660443 I honestly don't know who can use this method without any provided cpuid check... We actually use fma in our code but do so by detecting the performance difference between a naive impleme

[GitHub] [lucene] jdconrad commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
jdconrad commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351666949 One other possibility would be to hand-roll the expression lexer/parser. This would get rid of the need for any additional dependencies and generated code. From what I can tell the API

[GitHub] [lucene] rmuir commented on pull request #12014: Ban use of Math.fma across the entire codebase

2022-12-14 Thread GitBox
rmuir commented on PR #12014: URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351680479 I looked at what e.g. glibc does here as a fallback out of curiousity, for floats it is very simple (using Dekker algorithm), but requires changing the FP rounding mode, which you cant do

[GitHub] [lucene] uschindler commented on pull request #12014: Ban use of Math.fma across the entire codebase

2022-12-14 Thread GitBox
uschindler commented on PR #12014: URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351734347 +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscrib

[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
uschindler commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351743460 > One other possibility would be to hand-roll the expression lexer/parser. This would get rid of the need for any additional dependencies and generated code. From what I can tell the

[GitHub] [lucene] msokolov commented on pull request #11946: add similarity threshold for hnsw

2022-12-14 Thread GitBox
msokolov commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1351747886 It seems there are conflicts due to a recent refactor of this query - would you mind merging from main and resolving those please? -- This is an automated message from the Apache Git

[GitHub] [lucene] jdconrad commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
jdconrad commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351772233 > javacc? I was under the impression this was EOL? If it's still well supported I'm not familiar enough with the code generation to know if this would avoid the pitfalls ANTLR ha

[GitHub] [lucene] benwtrent commented on a diff in pull request #11946: add similarity threshold for hnsw

2022-12-14 Thread GitBox
benwtrent commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1048763428 ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) { * @thro

[GitHub] [lucene] benwtrent commented on a diff in pull request #11946: add similarity threshold for hnsw

2022-12-14 Thread GitBox
benwtrent commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1048764283 ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) { * @thro

[GitHub] [lucene] AlmostFamiliar commented on issue #9231: HyphenationCompoundWordTokenFilter creates overlapping tokens with onlyLongestMatch enabled [LUCENE-8183]

2022-12-14 Thread GitBox
AlmostFamiliar commented on issue #9231: URL: https://github.com/apache/lucene/issues/9231#issuecomment-1351830837 Are there still plans to merge this? I would also be very interested in this feature. -- This is an automated message from the Apache Git Service. To respond to the m

[GitHub] [lucene] benwtrent commented on issue #11963: Improve vector quantization API

2022-12-14 Thread GitBox
benwtrent commented on issue #11963: URL: https://github.com/apache/lucene/issues/11963#issuecomment-1351900658 OK, step one of this is done with the `KnnByteVectorQuery`. I am next approaching a new `KnnByteVectorField` and this will cause some refactoring to `LeafReader` or `VectorValues`

[GitHub] [lucene] rmuir commented on issue #11963: Improve vector quantization API

2022-12-14 Thread GitBox
rmuir commented on issue #11963: URL: https://github.com/apache/lucene/issues/11963#issuecomment-1351946133 I would tend to expect some organization like this: indexing: * ByteVectorField * FloatVectorField searching: * ByteVectorField.newQuery * FloatVectorField.new

[GitHub] [lucene] msokolov commented on a diff in pull request #11946: add similarity threshold for hnsw

2022-12-14 Thread GitBox
msokolov commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1048846385 ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) { * @throw

[GitHub] [lucene] benwtrent commented on a diff in pull request #11946: add similarity threshold for hnsw

2022-12-14 Thread GitBox
benwtrent commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1048856364 ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) { * @thro

[GitHub] [lucene] benwtrent opened a new pull request, #12019: Clean up vector backward-codecs

2022-12-14 Thread GitBox
benwtrent opened a new pull request, #12019: URL: https://github.com/apache/lucene/pull/12019 I noticed that the Lucene94 backward-codec was still allowing KNN fields writer. I have removed it, and updated the related tests similarly to other KNN backward-codec changes. There are var

[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
reta commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352084948 @rmuir first pass over pickiness, all tests are now run w/o diagnostics listener (picky / not picky mode), the rough observations so far are encouraging, there are no significant changes in

[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352177919 we don't need to parameterize the pickiness IMO, we can just turn it on in these tests. Thanks for getting this hooked up. I will look at your branch and try to play with it. -- This is

[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352248407 I pushed a proposal to your branch (only fixing one of the tests in .js to use it). If you are really against it, just revert the commit. But i think it keeps tests simpler. -- This is

[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
reta commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352258413 > I pushed a proposal to your branch (only fixing one of the tests in .js to use it). If you are really against it, just revert the commit. But i think it keeps tests simpler. :+1: N

[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352259233 sorry i didnt do the other tests, I gotta run for now. i can do them later tonight or tomorrow if you need but just wanted to prototype it out -- This is an automated message from the Ap

[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
uschindler commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049005864 ## lucene/expressions/src/java/module-info.java: ## @@ -19,7 +19,7 @@ module org.apache.lucene.expressions { Review Comment: Can we now remove this suppression

[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
uschindler commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049010145 ## lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [lucene] reta commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
reta commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049014391 ## lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
uschindler commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049015663 ## lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [lucene] rmuir commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
rmuir commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049016403 ## lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [lucene] reta commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
reta commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049019777 ## lucene/expressions/src/java/module-info.java: ## @@ -19,7 +19,7 @@ module org.apache.lucene.expressions { Review Comment: ah ... it is still automatic :( although

[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
reta commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352291207 > sorry i didnt do the other tests, I gotta run for now. i can do them later tonight or tomorrow if you need but just wanted to prototype it out @rmuir tests have been migrated, thank

[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
uschindler commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049026102 ## lucene/expressions/src/java/module-info.java: ## @@ -19,7 +19,7 @@ module org.apache.lucene.expressions { Review Comment: Ok. No problem. I just noticed nam

[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352322297 I will check it out and inspect the coverage report from .js tests and see if there are any holes. If i find them I will push more tests. I am just really paranoid about some of the slow t

[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox
uschindler commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352328979 > I will check it out and inspect the coverage report from .js tests and see if there are any holes. If i find them I will push more tests. I am just really paranoid about some of the

[GitHub] [lucene] LuXugang commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight

2022-12-14 Thread GitBox
LuXugang commented on code in PR #12017: URL: https://github.com/apache/lucene/pull/12017#discussion_r1049162892 ## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ## @@ -470,14 +470,19 @@ private int reqCount(LeafReaderContext context) throws IOException {

[GitHub] [lucene] LuXugang commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight

2022-12-14 Thread GitBox
LuXugang commented on code in PR #12017: URL: https://github.com/apache/lucene/pull/12017#discussion_r1049163040 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java: ## @@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() throws Excep