Re: [PR] python: enable all linting checks and type-hint the code [lucene]

2025-03-06 Thread via GitHub
rmuir commented on PR #14326: URL: https://github.com/apache/lucene/pull/14326#issuecomment-2705165587 By the way, thank you for reviewing these changes! I know it isn't fun, and I know I'll cause something to break, but these scripts are important (releases, backwards-compatibility, communi

Re: [PR] python: enable all linting checks and type-hint the code [lucene]

2025-03-06 Thread via GitHub
stefanvodita commented on code in PR #14326: URL: https://github.com/apache/lucene/pull/14326#discussion_r1984191116 ## dev-tools/scripts/pyproject.toml: ## @@ -1,21 +1,163 @@ [tool.pyright] +pythonVersion = "3.12" venvPath = "." venv = ".venv" -# TODO: improve! -# typeChecki

Re: [PR] python: enable all linting checks and type-hint the code [lucene]

2025-03-06 Thread via GitHub
rmuir commented on code in PR #14326: URL: https://github.com/apache/lucene/pull/14326#discussion_r1984173422 ## dev-tools/scripts/create_line_file_docs.py: ## @@ -60,7 +60,7 @@ def compress_with_seek_points(file_name_in, file_name_out, num_seek_points): break

Re: [PR] python: enable all linting checks and type-hint the code [lucene]

2025-03-06 Thread via GitHub
rmuir commented on code in PR #14326: URL: https://github.com/apache/lucene/pull/14326#discussion_r1984174681 ## dev-tools/scripts/releaseWizard.py: ## @@ -887,7 +873,7 @@ def get_release_version(): version = Version.parse(v) except Exception: print("Not a valid ver

Re: [PR] python: enable all linting checks and type-hint the code [lucene]

2025-03-06 Thread via GitHub
rmuir commented on code in PR #14326: URL: https://github.com/apache/lucene/pull/14326#discussion_r1984178330 ## dev-tools/scripts/releaseWizard.py: ## @@ -887,7 +873,7 @@ def get_release_version(): version = Version.parse(v) except Exception: print("Not a valid ver

Re: [PR] python: enable all linting checks and type-hint the code [lucene]

2025-03-06 Thread via GitHub
stefanvodita commented on code in PR #14326: URL: https://github.com/apache/lucene/pull/14326#discussion_r1984194382 ## dev-tools/scripts/create_line_file_docs.py: ## @@ -60,7 +60,7 @@ def compress_with_seek_points(file_name_in, file_name_out, num_seek_points): break

Re: [PR] python: enable all linting checks and type-hint the code [lucene]

2025-03-06 Thread via GitHub
stefanvodita commented on code in PR #14326: URL: https://github.com/apache/lucene/pull/14326#discussion_r1984194652 ## dev-tools/scripts/releaseWizard.py: ## @@ -887,7 +873,7 @@ def get_release_version(): version = Version.parse(v) except Exception: print("Not a va

Re: [PR] reformat the python code with 'make reformat' and enable format in CI check [lucene]

2025-03-06 Thread via GitHub
rmuir commented on PR #14322: URL: https://github.com/apache/lucene/pull/14322#issuecomment-2705113695 It is merged; squash-based merging is not useful for situations like this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-03-06 Thread via GitHub
msokolov commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2704319560 Another wrinkle here is that KnnGraphTester currently never runs its queries multithreaded; it does not pass an Executor when it creates IndexSearcher. Because of that we are not

Re: [PR] python: enable all linting checks and type-hint the code [lucene]

2025-03-06 Thread via GitHub
rmuir commented on code in PR #14326: URL: https://github.com/apache/lucene/pull/14326#discussion_r1984161499 ## dev-tools/scripts/create_line_file_docs.py: ## @@ -60,7 +60,7 @@ def compress_with_seek_points(file_name_in, file_name_out, num_seek_points): break

Re: [PR] [WIP] A specialized Trie for Block Tree Index [lucene]

2025-03-06 Thread via GitHub
gf2121 closed pull request #14333: [WIP] A specialized Trie for Block Tree Index URL: https://github.com/apache/lucene/pull/14333

Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-03-06 Thread via GitHub
jpountz commented on PR #14304: URL: https://github.com/apache/lucene/pull/14304#issuecomment-2705628064 12.5% faster search overall if I read correctly? This is pretty cool! We've been excited about smaller speedups many times in Lucene's history. :)

Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-03-06 Thread via GitHub
jpountz commented on PR #14304: URL: https://github.com/apache/lucene/pull/14304#issuecomment-2705644072 Hmm maybe I got confused, as quantization only needs to be applied to the query vector at query time, so the search speedup is noise and I should rather be looking at the indexing speedu

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-03-06 Thread via GitHub
kaivalnp commented on code in PR #14178: URL: https://github.com/apache/lucene/pull/14178#discussion_r1982959099 ## lucene/sandbox/src/java22/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java: ## @@ -0,0 +1,457 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-03-06 Thread via GitHub
kaivalnp commented on code in PR #14178: URL: https://github.com/apache/lucene/pull/14178#discussion_r1982960581 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java: ## @@ -0,0 +1,488 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [I] Incorrect use of fsync [lucene]

2025-03-06 Thread via GitHub
viliam-durina commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2704000723 TL;DR: I think this issue is still relevant to Lucene today. Explanation: Quoting [from here](https://wiki.postgresql.org/wiki/Fsync_Errors): > Linux 4.13 and 4

[I] Incorrect use of fsync [lucene]

2025-03-06 Thread via GitHub
viliam-durina opened a new issue, #14334: URL: https://github.com/apache/lucene/issues/14334 ### Description According to [this answer](https://stackoverflow.com/a/50158433/952135), calling `fsync` after the file descriptor is closed gives no guarantees about what's persistent on dis
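To make the ordering concern in this issue concrete, here is a minimal Python sketch (not Lucene's code, and not taken from the issue itself). It assumes the point of the linked answer is that data should be synced while the writing descriptor is still open. The helper name `write_durably` and the file name `segment.bin` are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data to path and fsync it BEFORE the descriptor is closed.

    Hypothetical helper for illustration only; it is not a Lucene API.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Sync on the same descriptor that performed the writes, so any
        # write-back error is reported here rather than lost after close.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "segment.bin")
        write_durably(target, b"hello")
        print(os.path.getsize(target))
```

The sketch deliberately leaves out syncing the parent directory, which a real implementation may also need so that the new directory entry itself survives a crash.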

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-03-06 Thread via GitHub
benwtrent commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2703746035 > @benwtrent @navneet1v I wonder if either of you were able to replicate benchmarks? I didn't want to leave you hanging @kaivalnp, especially after you have obviously put a ton

[PR] [WIP] A specialized Trie for Block Tree Index [lucene]

2025-03-06 Thread via GitHub
gf2121 opened a new pull request, #14333: URL: https://github.com/apache/lucene/pull/14333 **Context** * #12631 introduced an MSBVLong format to encode the first fp of FST output. It is the first time we benefit from the output sharing in blocktree. The change reduces tip size by ~13%, i

Re: [I] Incorrect use of fsync [lucene]

2025-03-06 Thread via GitHub
msokolov commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2703792662 Interesting: I would note that info is from ~8 years ago; I wonder if it is still valid. Also, if you sync() and then close() isn't there a risk there may be intervening writes tha

Re: [PR] reformat the python code with 'make reformat' and enable format in CI check [lucene]

2025-03-06 Thread via GitHub
rmuir commented on PR #14322: URL: https://github.com/apache/lucene/pull/14322#issuecomment-2705096447 Thank you @stefanvodita for reviewing (in both places). I am sure it won't go 100% smoothly and there will be short-term problems, but I think in the long term it will make things easier.

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-03-06 Thread via GitHub
msokolov commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2705100651 > Agreed, I think a "num_search_threads" parameter would be beneficial.

Re: [PR] python: enable all linting checks and type-hint the code [lucene]

2025-03-06 Thread via GitHub
rmuir commented on code in PR #14326: URL: https://github.com/apache/lucene/pull/14326#discussion_r1984165645 ## dev-tools/scripts/pyproject.toml: ## @@ -1,21 +1,163 @@ [tool.pyright] +pythonVersion = "3.12" venvPath = "." venv = ".venv" -# TODO: improve! -# typeCheckingMode

Re: [PR] python: enable all linting checks and type-hint the code [lucene]

2025-03-06 Thread via GitHub
stefanvodita commented on code in PR #14326: URL: https://github.com/apache/lucene/pull/14326#discussion_r1984136378 ## dev-tools/scripts/create_line_file_docs.py: ## @@ -60,7 +60,7 @@ def compress_with_seek_points(file_name_in, file_name_out, num_seek_points): break

Re: [PR] reformat the python code with 'make reformat' and enable format in CI check [lucene]

2025-03-06 Thread via GitHub
rmuir closed pull request #14322: reformat the python code with 'make reformat' and enable format in CI check URL: https://github.com/apache/lucene/pull/14322

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-03-06 Thread via GitHub
benwtrent commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2704331980 > Net/net I think we ought to be adding some multithreaded test capability to KnnGraphTester. Agreed, I think a "num_search_threads" parameter would be beneficial. Then the sear

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-03-06 Thread via GitHub
dungba88 commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2703590078 Sorry I'm back today and will try this idea first > - What if we just set the per-leaf k to the same as global k in the second pass, and stop at second pass? I'm curious about the