[GitHub] [lucene] mohamedniyaz1996 opened a new issue, #11902: Customization of Edit distance costs for different operations

2022-11-07 Thread GitBox
mohamedniyaz1996 opened a new issue, #11902: URL: https://github.com/apache/lucene/issues/11902 ### Description I came across this python library [weighted-levenshtein](https://pypi.org/project/weighted-levenshtein/) which has a way to specify different costs for insertion, deletion,
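
The issue asks for separate costs for insertion, deletion and substitution. Below is a minimal sketch of the textbook dynamic-programming edit distance with one configurable cost per operation type. The class `WeightedEditDistance` and its constructor are hypothetical illustrations, not Lucene APIs. The sketch does not attempt per-character costs.

```java
/**
 * Hypothetical sketch (not a Lucene API): Levenshtein distance where insertion,
 * deletion and substitution each carry their own cost.
 */
final class WeightedEditDistance {
    private final double insertCost;
    private final double deleteCost;
    private final double substituteCost;

    WeightedEditDistance(double insertCost, double deleteCost, double substituteCost) {
        this.insertCost = insertCost;
        this.deleteCost = deleteCost;
        this.substituteCost = substituteCost;
    }

    /** Minimum total cost of turning {@code source} into {@code target}. */
    double distance(String source, String target) {
        int n = source.length();
        int m = target.length();
        // prev[j] = cost of turning the first i-1 chars of source into the first j chars of target
        double[] prev = new double[m + 1];
        double[] curr = new double[m + 1];
        for (int j = 1; j <= m; j++) {
            prev[j] = prev[j - 1] + insertCost; // empty source: insert every target char
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = prev[0] + deleteCost; // empty target: delete every source char
            for (int j = 1; j <= m; j++) {
                boolean same = source.charAt(i - 1) == target.charAt(j - 1);
                double sub = prev[j - 1] + (same ? 0.0 : substituteCost);
                double del = prev[j] + deleteCost;
                double ins = curr[j - 1] + insertCost;
                curr[j] = Math.min(sub, Math.min(del, ins));
            }
            double[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        System.out.println(new WeightedEditDistance(1, 1, 1).distance("kitten", "sitting"));   // 3.0
        System.out.println(new WeightedEditDistance(0.5, 1, 1).distance("kitten", "sitting")); // 2.5
    }
}
```

With unit costs, "kitten" to "sitting" costs 3.0 (two substitutions and one insertion). Making insertions cost 0.5 lowers the total to 2.5 without changing the alignment.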

[GitHub] [lucene] jpountz commented on pull request #11875: Usability improvements for timeout support in IndexSearcher

2022-11-07 Thread GitBox
jpountz commented on PR #11875: URL: https://github.com/apache/lucene/pull/11875#issuecomment-1305387673 Historically configuring timeouts on searches has been too complicated: users had to wrap their collector with a `TimeLimitingCollector` and to wrap their readers with an `ExitableDirect

[GitHub] [lucene] jpountz opened a new pull request, #11903: Speed up sorting on unique string fields.

2022-11-07 Thread GitBox
jpountz opened a new pull request, #11903: URL: https://github.com/apache/lucene/pull/11903 Since increasing the number of hits retrieved in nightly benchmarks from 10 to 100, the performance of sorting documents by title dropped back to the level it had before introducing dynamic pruning.

[GitHub] [lucene] LuXugang merged pull request #11884: Simplify the logic of matchAll() in IndexSortSortedNumericDocValuesRangeQuery

2022-11-07 Thread GitBox
LuXugang merged PR #11884: URL: https://github.com/apache/lucene/pull/11884 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.ap

[GitHub] [lucene] jpountz commented on pull request #11895: count() in BooleanQuery could be early quit

2022-11-07 Thread GitBox
jpountz commented on PR #11895: URL: https://github.com/apache/lucene/pull/11895#issuecomment-1305558470 This makes me think that we could also enhance this logic to count queries that have a mix of `SHOULD` and `MUST_NOT` clauses, in case this is something you are interested in looking int

[GitHub] [lucene] donnerpeter opened a new pull request, #11904: [hunspell] speed up WordFormGenerator

2022-11-07 Thread GitBox
donnerpeter opened a new pull request, #11904: URL: https://github.com/apache/lucene/pull/11904 Various minor optimizations to reduce allocations and unnecessary checks

[GitHub] [lucene] dweiss commented on a diff in pull request #11904: [hunspell] speed up WordFormGenerator

2022-11-07 Thread GitBox
dweiss commented on code in PR #11904: URL: https://github.com/apache/lucene/pull/11904#discussion_r1015435007 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/AffixCondition.java: ## @@ -31,8 +31,30 @@ */ interface AffixCondition { String ALWAYS_TRUE

[GitHub] [lucene] donnerpeter commented on a diff in pull request #11904: [hunspell] speed up WordFormGenerator

2022-11-07 Thread GitBox
donnerpeter commented on code in PR #11904: URL: https://github.com/apache/lucene/pull/11904#discussion_r1015510285 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/AffixCondition.java: ## @@ -31,8 +31,30 @@ */ interface AffixCondition { String ALWAYS

[GitHub] [lucene] jpountz commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-07 Thread GitBox
jpountz commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1305835975 I guess that encoding each block with a different number of bits per value would mostly help if node IDs are somewhat clustered so that the set of neighbors to a given node would be clos
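
The comment argues that per-block bit widths mostly pay off when a node's neighbor IDs sit close together. Below is a minimal sketch of that arithmetic. It assumes a hypothetical scheme that is not the format proposed in #11860. In this scheme, each block's neighbor IDs are sorted. The first ID is stored as a plain 32-bit int, and the remaining gaps are packed at one shared bit width set by the largest gap. All names are illustrative.

```java
import java.util.Arrays;

/** Hypothetical sketch: size of one delta-encoded, per-block bit-packed neighbor list. */
final class NeighborBlockBits {

    /** Bits needed to store the non-negative value v (at least 1). */
    static int bitsRequired(int v) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(v));
    }

    /** Shared bit width for the gaps between consecutive sorted neighbor IDs. */
    static int deltaBitsPerValue(int[] neighbors) {
        int[] sorted = neighbors.clone();
        Arrays.sort(sorted);
        int maxDelta = 0;
        for (int i = 1; i < sorted.length; i++) {
            maxDelta = Math.max(maxDelta, sorted[i] - sorted[i - 1]);
        }
        return bitsRequired(maxDelta);
    }

    /** Total bits: first ID as a plain 32-bit int, then (n - 1) packed gaps. */
    static long encodedBits(int[] neighbors) {
        if (neighbors.length == 0) {
            return 0L;
        }
        return 32L + (long) (neighbors.length - 1) * deltaBitsPerValue(neighbors);
    }

    public static void main(String[] args) {
        int[] clustered = {1012, 1000, 1007, 1003, 1010};
        int[] scattered = {5, 2_000_000, 250_000, 1_500_000, 900_000};
        System.out.println(deltaBitsPerValue(clustered) + " bits/gap, " + encodedBits(clustered) + " bits total");
        System.out.println(deltaBitsPerValue(scattered) + " bits/gap, " + encodedBits(scattered) + " bits total");
    }
}
```

Five clustered IDs need 3 bits per gap (44 bits in total). Five scattered IDs need 20 bits per gap (112 bits in total). Five plain ints would take 160 bits. This is why the gain from per-block widths depends on how clustered the neighbor IDs are.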

[GitHub] [lucene] mikemccand commented on issue #11885: Refactor and generalize file deleter

2022-11-07 Thread GitBox
mikemccand commented on issue #11885: URL: https://github.com/apache/lucene/issues/11885#issuecomment-1306025198 I'm not sure these classes should be made public? What is the use-case where this would help? But big +1 to somehow un-fork them! It's crazy such a scary functionality i

[GitHub] [lucene] donnerpeter merged pull request #11904: [hunspell] speed up WordFormGenerator

2022-11-07 Thread GitBox
donnerpeter merged PR #11904: URL: https://github.com/apache/lucene/pull/11904

[GitHub] [lucene] zhaih commented on issue #11885: Refactor and generalize file deleter

2022-11-07 Thread GitBox
zhaih commented on issue #11885: URL: https://github.com/apache/lucene/issues/11885#issuecomment-1306203835 @mikemccand Like when we want to do a segment replication (indexer sends new segments to searcher instead of searcher consuming live updates) but decided not to use `nrt` module. Then

[GitHub] [lucene] benwtrent opened a new pull request, #11905: Fix integer overflow when seeking the vector index for connections

2022-11-07 Thread GitBox
benwtrent opened a new pull request, #11905: URL: https://github.com/apache/lucene/pull/11905 This bug has been around since 9.1. It relates directly to the number of nodes that are contained in the level 0 of the HNSW graph. Since level 0 contains all the nodes, this implies the following:
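
The description ties the overflow to the number of nodes in level 0. Below is a minimal sketch of this class of bug. It assumes, hypothetically, that a node's connection list starts at `nodeOrd * bytesPerNode` in the file. The class, the method names and the 132-byte node size are illustrative. None of them are taken from the Lucene source or from the fix in #11905.

```java
/**
 * Hypothetical sketch: computing a file offset as nodeOrd * bytesPerNode in
 * 32-bit int arithmetic wraps around once the product exceeds Integer.MAX_VALUE.
 */
final class ConnectionOffsets {

    /** Buggy: the product is computed as an int and only then widened to long. */
    static long offsetIntMath(int nodeOrd, int bytesPerNode) {
        return nodeOrd * bytesPerNode;
    }

    /** Fixed: widen one operand first so the multiplication happens in 64 bits. */
    static long offsetLongMath(int nodeOrd, int bytesPerNode) {
        return (long) nodeOrd * bytesPerNode;
    }

    /** Alternative: keep int arithmetic but throw ArithmeticException instead of wrapping. */
    static int offsetChecked(int nodeOrd, int bytesPerNode) {
        return Math.multiplyExact(nodeOrd, bytesPerNode);
    }

    public static void main(String[] args) {
        int bytesPerNode = 132; // hypothetical: a neighbor count plus 32 neighbor IDs, 4 bytes each
        int nodeOrd = 20_000_000;
        System.out.println(offsetIntMath(nodeOrd, bytesPerNode));  // -1654967296
        System.out.println(offsetLongMath(nodeOrd, bytesPerNode)); // 2640000000
    }
}
```

Under this assumed layout, the int product first wraps for ordinals above Integer.MAX_VALUE / 132 = 16,268,815. Only the node count matters, and the vector dimension plays no role. That fits the remark later in the thread that a reproducing test needs many vectors rather than high-dimensional ones, though the real on-disk layout may differ.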

[GitHub] [lucene] msokolov commented on pull request #11852: Luke Webapp

2022-11-07 Thread GitBox
msokolov commented on PR #11852: URL: https://github.com/apache/lucene/pull/11852#issuecomment-1306374073 I found this simple thing useful without all that, but nobody else seems to like the idea - it's fine, I won't push it On Sun, Nov 6, 2022 at 7:37 AM Tomoko Uchida ***@***.***>

[GitHub] [lucene] rmuir commented on pull request #11905: Fix integer overflow when seeking the vector index for connections

2022-11-07 Thread GitBox
rmuir commented on PR #11905: URL: https://github.com/apache/lucene/pull/11905#issuecomment-1306601477 Not good: thanks for splitting this out from your other PR! wonder if we can start cooking up something similar to #11867 ? Looks like we need more vectors but they can have less dimension

[GitHub] [lucene] rmuir opened a new pull request, #11906: Add monster test for many knn docs

2022-11-07 Thread GitBox
rmuir opened a new pull request, #11906: URL: https://github.com/apache/lucene/pull/11906 Goal is to reproduce #11905 Basically I adapted the test from #11867 as a start, tweaked the indexing and merging to try to get it running reasonably, since it needs to make single segment of 16

[GitHub] [lucene] rmuir commented on pull request #11905: Fix integer overflow when seeking the vector index for connections

2022-11-07 Thread GitBox
rmuir commented on PR #11905: URL: https://github.com/apache/lucene/pull/11905#issuecomment-1306713558 ok i tried to make a stab at a test in that draft PR, but it's still pretty slow so I'm gonna leave it running. we have to start building up tests for these cases because this seems like de