abhirkz commented on issue #9883:
URL: https://github.com/apache/lucene/issues/9883#issuecomment-1586719249
Any update on this?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comme
uschindler commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1586750531
Hi,
so this looks like 2 different issues:
- incorrect validation of query and indexed field contents.
- some vectors cause NaN when the cosine is calculated. I am not sure how
uschindler commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1586774660
I think I know why it happens: in both providers we calculate the cosine
like that, and when applying the sqrt it gets negative for one of these reasons:
https://github.com/apache/l
uschindler commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1586803881
I debugged through it: the vector causes `norm1` and `norm2`, as well as
`sum`, to become `Infinity`. `Infinity/Infinity` results in `NaN`.
So it is not caused by sqrt. In general y
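To make this failure mode concrete, here is a small self-contained sketch that computes a cosine the naive way in `float`: it accumulates the dot product and both squared norms, then divides by the square root of their product. It assumes a computation of that general shape; the class and method names are hypothetical and this is not the actual Lucene provider code. With components around `1e30f`, every square overflows `float` (maximum about 3.4e38), so all three accumulators become `Infinity`, and the final `Infinity / Infinity` yields `NaN` even though the square root itself is well defined.

```java
// Hypothetical demo class, not Lucene code: naive float cosine similarity.
final class NaiveCosineDemo {
  static float cosine(float[] a, float[] b) {
    float sum = 0f, norm1 = 0f, norm2 = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];   // dot product accumulator
      norm1 += a[i] * a[i]; // squared norm of a
      norm2 += b[i] * b[i]; // squared norm of b
    }
    // sqrt(Infinity * Infinity) is still Infinity, so the NaN comes from the division.
    return (float) (sum / Math.sqrt((double) norm1 * (double) norm2));
  }

  public static void main(String[] args) {
    float[] small = {1f, 2f, 3f};
    float[] huge = {1e30f, 1e30f, 1e30f};
    System.out.println(cosine(small, small)); // 1.0
    System.out.println(cosine(huge, huge));   // NaN
  }
}
```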
uschindler commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587138822
I think the related issue of some vectors producing a NaN cosine
(which happens when the floats have exponents so large that the result overflows
to infinity after multiplication) is a sep
uschindler commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587147803
I think we could decide to also disallow such vectors. If the square of one of
its components overflows to infinity, the vector should maybe also be rejected.
What do you think?
```java
float y
```
uschindler commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587159643
Actually, with the current similarity methods we should make sure that for
each vector component `v`, the following is true: `Float.isFinite(v * v *
vector.length)`.
This is quit
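The condition above can be written as a small validation helper. This is a minimal sketch under stated assumptions: the class and method names are hypothetical (not the code committed to Lucene), components are `float`, and the check uses the JDK's `Float.isFinite`. It rejects any vector where some component `v` has a non-finite `v * v * vector.length`, which also rejects `NaN` and infinite components themselves, because their products are never finite.

```java
// Hypothetical helper, not the code committed to Lucene.
final class VectorChecks {
  /**
   * Returns true if, for every component v, v * v * vector.length is finite.
   * NaN and infinite components fail automatically. An empty vector passes
   * trivially, so a separate "size > 0" check is still needed.
   */
  static boolean hasSafeComponents(float[] vector) {
    final float len = vector.length;
    boolean ok = true;
    for (float v : vector) {
      // No early exit: a plain loop with a boolean accumulator.
      ok &= Float.isFinite(v * v * len);
    }
    return ok;
  }

  static float[] checkVector(float[] vector) {
    if (!hasSafeComponents(vector)) {
      throw new IllegalArgumentException("vector components too large or not finite");
    }
    return vector;
  }
}
```

Multiplying by the length reflects the worst case where every component has the same magnitude, so the accumulated sum of squares cannot overflow. The loop deliberately avoids an early exit to keep it simple for the JIT; whether HotSpot actually auto-vectorizes this exact loop has not been verified here.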
uschindler commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587183519
I modified the function like that, but have not yet committed:
```java
/**
 * Checks if a float vector only has finite components and the square of
 * its components mult
```
uschindler commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-158716
The above check should auto-vectorize in HotSpot, so the check during
indexing/searching should be cheap.
benwtrent commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226584452
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Fou
benwtrent commented on PR #12253:
URL: https://github.com/apache/lucene/pull/12253#issuecomment-1587256870
@alessandrobenedetti , I see many `to discuss` messages, but no discussion
clearly labeled in Lucene or the dev lists. My searching skills may be failing
me. Could you link these discu
uschindler commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226632668
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Fo
uschindler commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226634599
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Fo
eliaporciani commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226642488
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software
benwtrent commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587319629
@uschindler I think this change (`v[i] * v[i] * v.length`) is getting
complicated. It really makes me wonder whether we should do any complex
infinity checking other than `isFinite(flo
eliaporciani commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226655236
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software
benwtrent commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226688194
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Fou
uschindler commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587489458
That would be my plan! Let's open a new issue and discuss that there.
So we should merge this PR for now. Before doing that I will only add the
"vector size>0" check in the API,
eliaporciani commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226796096
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software
cfournie commented on code in PR #12249:
URL: https://github.com/apache/lucene/pull/12249#discussion_r1226823890
##
lucene/core/src/test/org/apache/lucene/util/graph/TestGraphTokenStreamFiniteStrings.java:
##
@@ -16,6 +16,9 @@
*/
package org.apache.lucene.util.graph;
+impor
cfournie commented on code in PR #12249:
URL: https://github.com/apache/lucene/pull/12249#discussion_r1226824178
##
lucene/CHANGES.txt:
##
@@ -80,6 +80,8 @@ Bug Fixes
* GITHUB#12220: Hunspell: disallow hidden title-case entries from compound
middle/end
+* LUCENE-10181: Res
benwtrent commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226828445
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Fou
uschindler commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587551398
Hi, I added the dimension check to the constructor which uses a predefined
field type. For the query it can't be done in the constructor, as we do not
know the field type. The query w
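A minimal sketch of the constructor-side validation, assuming a field built from a field type that already declares its vector dimension. `VectorFieldSketch` and its fields are hypothetical stand-ins, not Lucene's `KnnFloatVectorField` or `FieldType` API. It shows why both the "size > 0" check and the dimension match can be enforced eagerly in such a constructor, whereas a query only learns the field's dimension when it runs against an index.

```java
import java.util.Objects;

// Hypothetical stand-in for a vector field built from a predefined field type.
final class VectorFieldSketch {
  final String name;
  final int declaredDimension;
  final float[] vector;

  VectorFieldSketch(String name, float[] vector, int declaredDimension) {
    this.name = Objects.requireNonNull(name, "name");
    Objects.requireNonNull(vector, "vector");
    // Reject empty vectors up front.
    if (vector.length == 0) {
      throw new IllegalArgumentException("vector must have at least one dimension");
    }
    // The field type is known here, so the dimension can be checked eagerly.
    if (vector.length != declaredDimension) {
      throw new IllegalArgumentException(
          "vector has " + vector.length + " dimensions, but field type expects " + declaredDimension);
    }
    this.vector = vector;
    this.declaredDimension = declaredDimension;
  }
}
```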
uschindler commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226849047
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Fo
jpountz commented on code in PR #12249:
URL: https://github.com/apache/lucene/pull/12249#discussion_r1226846973
##
lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java:
##
@@ -45,6 +45,8 @@
* different paths of the {@link Automaton}.
*/
publi
uschindler commented on PR #12294:
URL: https://github.com/apache/lucene/pull/12294#issuecomment-1587577046
I talked with @mcimadamore in the meantime. At least for the Panama Foreign
API (FFI), no further changes are planned, so this is safe to merge into the
main branch.
Keep in m
mikemccand commented on PR #12357:
URL: https://github.com/apache/lucene/pull/12357#issuecomment-1587578171
`wikimedium10k` is a tiny corpus, really just for quickly testing that your
benchy is set up correctly. If you look at the QPS of each task, they are
ridiculously high :)
In the f
jbellis commented on PR #12281:
URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587588219
SGTM, thank you for taking the lead on investigating the root cause of the
NaNs!
msokolov commented on PR #12360:
URL: https://github.com/apache/lucene/pull/12360#issuecomment-1587648171
It sounds to me as if order is no longer important. We wouldn't change the
order because these are stored in the index, but if, say, someone were adding a
new option for some reason, they
alessandrobenedetti commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226909040
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache So
alessandrobenedetti commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226910532
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache So
jpountz merged PR #12249:
URL: https://github.com/apache/lucene/pull/12249
jpountz closed issue #11218:
GraphTokenStreamFiniteStrings#articulationPointsRecurse can run into stack
overflows [LUCENE-10181]
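Stack overflows in a deeply recursive graph walk, as in the issue title above, are commonly avoided by replacing recursion with an explicit stack. The sketch below is a generic iterative version of Tarjan's articulation-point algorithm for an undirected graph given as adjacency lists. It illustrates that technique only; it is not taken from the Lucene fix, the class name is hypothetical, and it assumes a simple graph without parallel edges.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration: iterative articulation points (Tarjan low-link)
// using an explicit stack instead of recursion.
final class IterativeArticulationPoints {
  static List<Integer> find(List<List<Integer>> adj) {
    int n = adj.size();
    int[] disc = new int[n];     // discovery time; 0 means "not visited yet"
    int[] low = new int[n];      // lowest discovery time reachable from the subtree
    int[] parent = new int[n];   // DFS tree parent, -1 for roots
    int[] nextEdge = new int[n]; // next neighbor index to examine for each vertex
    boolean[] isCut = new boolean[n];
    Arrays.fill(parent, -1);
    int time = 0;
    Deque<Integer> stack = new ArrayDeque<>();
    for (int root = 0; root < n; root++) {
      if (disc[root] != 0) continue;
      int rootChildren = 0;
      disc[root] = low[root] = ++time;
      stack.push(root);
      while (!stack.isEmpty()) {
        int u = stack.peek();
        if (nextEdge[u] < adj.get(u).size()) {
          int v = adj.get(u).get(nextEdge[u]++);
          if (disc[v] == 0) {
            // Tree edge: descend into v (this replaces the recursive call).
            parent[v] = u;
            if (u == root) rootChildren++;
            disc[v] = low[v] = ++time;
            stack.push(v);
          } else if (v != parent[u]) {
            // Edge to an already visited vertex other than the parent.
            low[u] = Math.min(low[u], disc[v]);
          }
        } else {
          // All neighbors of u done: "return" to the parent and propagate low.
          stack.pop();
          int p = parent[u];
          if (p != -1) {
            low[p] = Math.min(low[p], low[u]);
            if (p != root && low[u] >= disc[p]) isCut[p] = true;
          }
        }
      }
      // A DFS root is an articulation point only if it has two or more tree children.
      if (rootChildren > 1) isCut[root] = true;
    }
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < n; i++) if (isCut[i]) result.add(i);
    return result;
  }
}
```

For a path of 200,000 vertices, a recursive version would need a call frame per vertex, while this version keeps the same state in heap-allocated arrays and a deque.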
URL: https://github.com/apache/lucene/issues/11218
msokolov commented on issue #12358:
URL: https://github.com/apache/lucene/issues/12358#issuecomment-1587661349
> No no. There is no specific capitalization :). I was correcting tantivity
> -> tantivy.
Just to keep everyone accurate: the original misspelling was "tantitvy", not
"tantivit
benwtrent commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1226928216
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Fou
benwtrent commented on code in PR #12281:
URL: https://github.com/apache/lucene/pull/12281#discussion_r1226928953
##
lucene/core/src/java/org/apache/lucene/document/KnnByteVectorField.java:
##
@@ -137,7 +137,12 @@ public KnnByteVectorField(String name, byte[] vector,
FieldType
sohami commented on issue #12347:
URL: https://github.com/apache/lucene/issues/12347#issuecomment-1587721480
@jpountz and @javanna It seems there is a plan for a feature freeze this Friday
(06/16) for the next release. I was really hoping we could get it into the 9.x
release and would appreciate yo
jmazanec15 commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1587866962
@benwtrent Thanks for taking a look! Interesting, I am not too familiar
with MBW. I'll take a look.
The main reason I wanted to avoid returning the dot product was to avoi
uschindler commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1227084420
##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java:
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Fo
benwtrent commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1587878704
> Also, what do you mean by scaling continuously?
Your algorithm is piecewise rather than continuous. But I am not sure how we could
do a continuous transformation (everything is
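For readers unfamiliar with the distinction, here is one example of a piecewise mapping from an unbounded dot product to a strictly positive, order-preserving score. It is a hypothetical illustration (the class and method are invented here) and is not claimed to be the algorithm proposed in this issue: negative inputs are squashed into (0, 1) with `1 / (1 - x)`, and non-negative inputs are shifted to [1, inf) with `x + 1`. Both pieces give 1 at x = 0, so there is no jump, but the function is still defined by cases rather than by a single formula.

```java
// Hypothetical illustration of a piecewise score transform.
final class PiecewiseScore {
  static float scale(float dotProduct) {
    if (dotProduct < 0) {
      // Maps (-inf, 0) into (0, 1), preserving order.
      return 1f / (1f - dotProduct);
    }
    // Maps [0, inf) into [1, inf).
    return dotProduct + 1f;
  }
}
```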
uschindler merged PR #12294:
URL: https://github.com/apache/lucene/pull/12294
uschindler commented on PR #12294:
URL: https://github.com/apache/lucene/pull/12294#issuecomment-1587934813
Hi @ChrisHegarty, I think we can start with a JDK 21 branch for Panama
Vectors; the directories are there. Of course, you need to regenerate the
apijars with vector classes (that's no
uschindler commented on PR #12294:
URL: https://github.com/apache/lucene/pull/12294#issuecomment-1587947849
I also removed a relic of the old filename of the apijar in the 9.x folder.
It was not deleted while cherry-picking when we integrated the vector API.
ChrisHegarty commented on PR #12294:
URL: https://github.com/apache/lucene/pull/12294#issuecomment-1587973337
> Hi @ChrisHegarty, i think we can start with a JDK 21 branch for Panama
> Vectors, the directories are there. Of course, you need to regenerate the
> apijars with vector classes (that'
ChrisHegarty opened a new pull request, #12363:
URL: https://github.com/apache/lucene/pull/12363
Port of the Java 20 version of this code to Java 21.
* cut'n'paste VectorUtilPanamaProvider - there are opportunities to
eventually remove some workarounds, but this is ok for now
* Upd
ChrisHegarty commented on PR #12363:
URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588061449
I verified this locally by running the tests:
```
JENKINS_XX=true ./gradlew :lucene:core:test --tests "org.apache.lucene.util.TestVectorUtil**" -Pvalidation.git.failOnModi
```
ChrisHegarty commented on code in PR #12363:
URL: https://github.com/apache/lucene/pull/12363#discussion_r1227226190
##
lucene/CHANGES.txt:
##
@@ -139,8 +139,8 @@ New Features
* GITHUB#12257: Create OnHeapHnswGraphSearcher to let OnHeapHnswGraph to be
searched in a thread-safe
uschindler commented on PR #12363:
URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588081418
That was fast. 😍🐇
I will check tomorrow morning, but this looks great. @rmuir wanted to run
the benchmark with 21, too.
Did you use the latest OpenJDK 21-ea build to extract?
uschindler commented on code in PR #12363:
URL: https://github.com/apache/lucene/pull/12363#discussion_r1227231734
##
lucene/core/src/java21/org/apache/lucene/util/VectorUtilPanamaProvider.java:
##
@@ -0,0 +1,493 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
uschindler commented on code in PR #12363:
URL: https://github.com/apache/lucene/pull/12363#discussion_r1227232392
##
lucene/core/src/java21/org/apache/lucene/util/VectorUtilPanamaProvider.java:
##
@@ -0,0 +1,493 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
uschindler commented on code in PR #12363:
URL: https://github.com/apache/lucene/pull/12363#discussion_r1227235078
##
lucene/CHANGES.txt:
##
@@ -139,8 +139,8 @@ New Features
* GITHUB#12257: Create OnHeapHnswGraphSearcher to let OnHeapHnswGraph to be
searched in a thread-safety
uschindler commented on PR #12363:
URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588094332
> That was fast. 😍🐇
>
> I will check tomorrow morning but this looks great. @rmuir wanted to run
> the benchmark with 21, too.
>
> Did you use latest openjdk 21-ea build to
ChrisHegarty commented on code in PR #12363:
URL: https://github.com/apache/lucene/pull/12363#discussion_r1227243997
##
lucene/core/src/java21/org/apache/lucene/util/VectorUtilPanamaProvider.java:
##
@@ -0,0 +1,493 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
tflobbe commented on PR #12360:
URL: https://github.com/apache/lucene/pull/12360#issuecomment-1588112203
But we do things like:
> @Override
> public boolean hasFreqs() {
>   return fieldInfo.getIndexOptions().compareTo(IndexOptions.DOCS_AND_FREQS) >= 0;
> }

So we need to k
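Java enums compare by declaration order (`Enum.compareTo` is based on `ordinal()`), which is why a check like `hasFreqs()` above relies on the constants being declared from least to most indexed detail. The sketch uses a local enum that mirrors the constant names of Lucene's `IndexOptions` purely for illustration; it is a hypothetical stand-in, not the Lucene class.

```java
// Hypothetical stand-in mirroring the constant names of Lucene's IndexOptions.
enum IndexOptionsSketch {
  NONE,
  DOCS,
  DOCS_AND_FREQS,
  DOCS_AND_FREQS_AND_POSITIONS,
  DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;

  // compareTo uses ordinal(), i.e. declaration order, so this is true for
  // DOCS_AND_FREQS and every constant declared after it.
  boolean hasFreqs() {
    return compareTo(DOCS_AND_FREQS) >= 0;
  }
}
```

Inserting a new constant in the middle of such an enum would silently change the result of every ordering-based comparison, even if the stored values themselves were unaffected.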
uschindler commented on code in PR #12363:
URL: https://github.com/apache/lucene/pull/12363#discussion_r1227252561
##
lucene/core/src/java21/org/apache/lucene/util/VectorUtilPanamaProvider.java:
##
@@ -0,0 +1,493 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
uschindler commented on PR #12363:
URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588134825
I tried it out: I reverted the APIJAR changes and only left in the following
changes:
- VectorUtilsProvider.java to enable 21
- `vectorIncubatorJavaVersions = [ JavaVersion.VERSIO
ChrisHegarty commented on PR #12363:
URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588172519
> I tried it out: I reverted the APIJAR changes and only left in following
> changes:
>
> * VectorUtilsProvider.java to enable of 21
> * `vectorIncubatorJavaVersions = [ Java
uschindler commented on PR #12363:
URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588191838
Maybe just leave a readme file in the folder of the Java file stating that
the impl is identical to Java 20.
uschindler commented on PR #12363:
URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588209511
I checked the commits:
https://github.com/openjdk/jdk21/commits/master/src/jdk.incubator.vector/share/classes
There were some changes, but nothing that affects us. It is mostly
gf2121 merged PR #12324:
URL: https://github.com/apache/lucene/pull/12324
gf2121 opened a new pull request, #12364:
URL: https://github.com/apache/lucene/pull/12364
backport of https://github.com/apache/lucene/pull/12324