[GitHub] [lucene] searchivarius commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-07-11 Thread via GitHub
searchivarius commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1631808301 Thank you [@jmazanec15](https://github.com/jmazanec15): there's also an unpublished paper (I can share the preprint privately) where we benchmarked HNSW for maximum inner produ

[GitHub] [lucene] searchivarius commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-07-11 Thread via GitHub
searchivarius commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1631795367 Thank you @jmazanec15: there's also an unpublished paper where we benchmarked HNSW for maximum inner product search, and it was just fine. In my thesis, I benchmarked SW-graph (w

[GitHub] [lucene] gautamworah96 commented on pull request #12427: StringsToAutomaton#build to take List as parameter instead of Collection

2023-07-11 Thread via GitHub
gautamworah96 commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1631629455 I was looking at this problem too and the change in test classes to meld arbitrary collections to `Lists` irked me as well. +1 to throw an `IllegalArgumentException` in `#a

[GitHub] [lucene] jpountz commented on pull request #12434: Add ParentJoin KNN support

2023-07-11 Thread via GitHub
jpountz commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1631586992 From a quick look, this lower-level KNN collection API looks interesting. It currently has a large surface area - presumably because extending the queue was easier to have a working prototype,

[GitHub] [lucene] jmazanec15 commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-07-11 Thread via GitHub
jmazanec15 commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1631577657 @benwtrent Interesting, I'm still not sure if this approach is necessary. I spoke with @searchivarius, who is the maintainer of nmslib, and he mentioned that there was some researc

[GitHub] [lucene] gsmiller commented on pull request #12427: StringsToAutomaton#build to take List as parameter instead of Collection

2023-07-11 Thread via GitHub
gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1631550554 Yeah, good points/questions! I'd be curious how much overhead it would actually add to sort the input when it's already sorted? But to take a step back for a moment, we also have

[GitHub] [lucene] benwtrent opened a new pull request, #12434: Add ParentJoin KNN support

2023-07-11 Thread via GitHub
benwtrent opened a new pull request, #12434: URL: https://github.com/apache/lucene/pull/12434 A `join` within Lucene is built by adding child-docs and parent-docs in order. Since our vector field already supports sparse indexing, it should be able to support parent join indexing. Ho

[GitHub] [lucene] gsmiller commented on pull request #12417: forutil add vectorized and scalar code

2023-07-11 Thread via GitHub
gsmiller commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1631529704 Those are interesting results @tang-hi. Thanks for sharing! I also notice a regression in `wildcard` in the vectorized approach. The prefix3 and wildcard queries both build an u

[GitHub] [lucene] sohami closed pull request #12348: Make SliceExecutor extensible and include method to computeSlices. This will allow different custom implementation to be plugged in with IndexSearc

2023-07-11 Thread via GitHub
sohami closed pull request #12348: Make SliceExecutor extensible and include method to computeSlices. This will allow different custom implementation to be plugged in with IndexSearcher to compute and execute slices on provided executor URL: https://github.com/apache/lucene/pull/12348 -- Thi

[GitHub] [lucene] sohami commented on pull request #12348: Make SliceExecutor extensible and include method to computeSlices. This will allow different custom implementation to be plugged in with Inde

2023-07-11 Thread via GitHub
sohami commented on PR #12348: URL: https://github.com/apache/lucene/pull/12348#issuecomment-1631500582 Closing this PR as it is solved by https://github.com/apache/lucene/pull/12374

[GitHub] [lucene] almogtavor commented on issue #12406: Register nested queries (ToParentBlockJoinQuery) to Lucene Monitor

2023-07-11 Thread via GitHub
almogtavor commented on issue #12406: URL: https://github.com/apache/lucene/issues/12406#issuecomment-1631109326 @romseygeek @dweiss I'd love to get feedback from you on the subject @jpountz @benwtrent I saw that Elasticsearch does have the option of percolating nested queries. I wond

[GitHub] [lucene] shubhamvishu commented on pull request #12183: Make some heavy query rewrites concurrent

2023-07-11 Thread via GitHub
shubhamvishu commented on PR #12183: URL: https://github.com/apache/lucene/pull/12183#issuecomment-1631040694 > I suspected that there would be cases when we would fork from the executor into itself, which causes deadlocks I see, thanks for the explanation @jpountz! > Now th

[GitHub] [lucene] tang-hi commented on pull request #12417: forutil add vectorized and scalar code

2023-07-11 Thread via GitHub
tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1631024688 Hi, everyone. I tried the lazy compute idea that I mentioned before. First, I attempted to change the code in the main branch to lazy compute; the benchmark results didn't show much dif

[GitHub] [lucene] jpountz commented on pull request #12426: Introduce VerifyingQuery

2023-07-11 Thread via GitHub
jpountz commented on PR #12426: URL: https://github.com/apache/lucene/pull/12426#issuecomment-1630974964 LongField, DoubleField, FloatField and IntField provide the same index-time enforcement as KeywordField. Plus there is a similar pull request about doing the same for geo points: https:/

[GitHub] [lucene] dantuzi opened a new pull request, #12433: Introduce the similarity as boost functionality to the Word2VecSynonymFilter

2023-07-11 Thread via GitHub
dantuzi opened a new pull request, #12433: URL: https://github.com/apache/lucene/pull/12433 ### Description This is the follow-up of https://github.com/apache/lucene/pull/12169 In the Word2VecSynonymFilter, when we extract the synonyms of a term, we have the cosine similarity b

[GitHub] [lucene] mikemccand commented on pull request #12426: Introduce VerifyingQuery

2023-07-11 Thread via GitHub
mikemccand commented on PR #12426: URL: https://github.com/apache/lucene/pull/12426#issuecomment-1630910575 It will indeed be very costly in some cases, and likely should only be used in a test context. Maybe we move this to `test-framework`, and rename to `SlowVerifyingQuery`? Or .

[GitHub] [lucene] jpountz commented on pull request #12183: Make some heavy query rewrites concurrent

2023-07-11 Thread via GitHub
jpountz commented on PR #12183: URL: https://github.com/apache/lucene/pull/12183#issuecomment-1630848267 Thanks for sharing the failing seeds. When I saw your message, I suspected that there would be cases when we would fork from the executor into itself, which causes deadlocks. I saw the f

[GitHub] [lucene] slow-J commented on issue #10643: Remove redundant fieldType.stored() check [LUCENE-9603]

2023-07-11 Thread via GitHub
slow-J commented on issue #10643: URL: https://github.com/apache/lucene/issues/10643#issuecomment-1630519546 This can be resolved, it was removed in https://github.com/apache/lucene/commit/d1297e52d91dbbcd83012c438ee0122d96808fa8 .

[GitHub] [lucene] jpountz commented on pull request #12405: Skip docs with Docvalues in NumericLeafComparator

2023-07-11 Thread via GitHub
jpountz commented on PR #12405: URL: https://github.com/apache/lucene/pull/12405#issuecomment-1630496789 We probably need to change `SortField#getComparator`. We introduced `enableSkipping` in the past, so that the first comparator in the chain could know it can dynamically prune, maybe it

[GitHub] [lucene] LuXugang commented on pull request #12405: Skip docs with Docvalues in NumericLeafComparator

2023-07-11 Thread via GitHub
LuXugang commented on PR #12405: URL: https://github.com/apache/lucene/pull/12405#issuecomment-1630431580 > I'm not clear if this change is still correct when there is another sort field after the one that gets optimized Thanks @jpountz. Oh, you are right, sorry for missing this.

[GitHub] [lucene] shubhamvishu commented on pull request #12183: Make some heavy query rewrites concurrent

2023-07-11 Thread via GitHub
shubhamvishu commented on PR #12183: URL: https://github.com/apache/lucene/pull/12183#issuecomment-1630275732 @jpountz As I was running the tests (`./gradlew test`) with this PR, I observed it getting stuck on `TestPayloadCheckQuery` or `TestBasics` a couple of times intermittently. Looking i