[GitHub] [lucene] mikemccand commented on pull request #12526: Speed up disjunctions by computing estimations of the score of the k-th top hit up-front.

2023-08-31 Thread via GitHub
mikemccand commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1700726359 Wow, impressive! Maybe we should add `OrHighVeryLow` to nightly benchy too? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to
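
The PR title describes estimating the score of the k-th top hit before scoring starts. Below is one way such an estimate can be made safe. It is a sketch under stated assumptions and not necessarily how #12526 does it. Assume a disjunction's score is the sum of its matching clauses' non-negative scores. Then every document scores at least as much as any single clause it matches, so the k-th largest score within one clause is a lower bound on the k-th top hit of the whole disjunction. That bound could seed the minimum competitive score used for skipping. The class and method names are hypothetical.

```java
import java.util.PriorityQueue;

/** Hypothetical helper: lower bound on the k-th best score of a sum-of-clauses disjunction. */
final class KthScoreEstimate {

  /** Returns the k-th largest value in scores, or 0 if there are fewer than k values. */
  static float kthLargest(float[] scores, int k) {
    if (k < 1) {
      throw new IllegalArgumentException("k must be >= 1");
    }
    // Min-heap holding the k largest scores seen so far; its head is the k-th largest.
    PriorityQueue<Float> heap = new PriorityQueue<>(k);
    for (float s : scores) {
      if (heap.size() < k) {
        heap.add(s);
      } else if (s > heap.peek()) {
        heap.poll();
        heap.add(s);
      }
    }
    return heap.size() < k ? 0f : heap.peek();
  }

  /**
   * Each array holds the per-document scores of one clause (one entry per matching document).
   * Assuming non-negative clause scores summed per document, the k-th largest score within any
   * single clause is a lower bound on the k-th best total score; keep the best bound found.
   */
  static float lowerBoundOfKthTopScore(float[][] clauseScores, int k) {
    float bound = 0f;
    for (float[] scores : clauseScores) {
      bound = Math.max(bound, kthLargest(scores, k));
    }
    return bound;
  }
}
```

For example, with clause scores {1, 3, 2} and {0.5, 4} and k = 2, the bound is 2. The first clause alone guarantees two documents scoring at least 2.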

[GitHub] [lucene] mikemccand commented on issue #7350: FieldCacheRangeFilter missing from MIGRATE.html [LUCENE-6288]

2023-08-31 Thread via GitHub
mikemccand commented on issue #7350: URL: https://github.com/apache/lucene/issues/7350#issuecomment-1700812722 `FieldCache` is indeed long gone from Lucene -- I'm not sure how we missed adding this to `MIGRATE.txt`. For efficient range filtering, it is usually best now to index your n

[GitHub] [lucene] mikemccand commented on issue #7047: Populate MIGRATE.txt with useful information for upgrade to Lucene 5.0 [LUCENE-5985]

2023-08-31 Thread via GitHub
mikemccand commented on issue #7047: URL: https://github.com/apache/lucene/issues/7047#issuecomment-1700832220 @PenghaiZhang sorry about all this. Given how ancient these Lucene versions are, few people even remember the specifics of how to do this migration. Maybe you could jot some

[GitHub] [lucene] mikemccand commented on issue #12527: Optimize readInts24 performance for DocIdsWriter

2023-08-31 Thread via GitHub
mikemccand commented on issue #12527: URL: https://github.com/apache/lucene/issues/12527#issuecomment-1700871714 I like this idea, reducing possible IO overhead. But I tested it with `luceneutil` on `wikimediumall`: ``` TaskQPS base StdDev QP
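
The name suggests that `readInts24` decodes doc IDs stored in 24 bits (3 bytes) each, which is enough whenever every doc ID is below 2^24. The standalone sketch below only illustrates that encoding with plain byte arrays. It is not Lucene's `DocIdsWriter` code, which reads through Lucene's own input APIs. The big-endian byte order and the `Ints24` names are assumptions.

```java
/** Hypothetical illustration of storing doc IDs in 24 bits (3 bytes) each. */
final class Ints24 {

  /** Packs each value (must be in [0, 2^24)) into 3 big-endian bytes. */
  static byte[] encode(int[] values) {
    byte[] out = new byte[values.length * 3];
    for (int i = 0; i < values.length; i++) {
      int v = values[i];
      // Any set bit above bit 23 (including the sign bit) means the value does not fit.
      if ((v >>> 24) != 0) {
        throw new IllegalArgumentException("value does not fit in 24 bits: " + v);
      }
      out[3 * i] = (byte) (v >>> 16);
      out[3 * i + 1] = (byte) (v >>> 8);
      out[3 * i + 2] = (byte) v;
    }
    return out;
  }

  /** Reads count 24-bit values back; "& 0xFF" undoes Java's sign extension of bytes. */
  static int[] decode(byte[] in, int count) {
    int[] out = new int[count];
    for (int i = 0; i < count; i++) {
      out[i] = ((in[3 * i] & 0xFF) << 16) | ((in[3 * i + 1] & 0xFF) << 8) | (in[3 * i + 2] & 0xFF);
    }
    return out;
  }
}
```

Each decoded value is rebuilt from three masked bytes. The masks matter because Java bytes are signed, so 0xFF would otherwise decode as -1.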

[GitHub] [lucene] benwtrent commented on pull request #12421: Concurrent hnsw graph and builder, take two

2023-08-31 Thread via GitHub
benwtrent commented on PR #12421: URL: https://github.com/apache/lucene/pull/12421#issuecomment-1700939168 Haven't forgotten about this. Just been bogged down with other things. Hope to revisit again soon!

[GitHub] [lucene] mikemccand opened a new pull request, #12530: Fix CheckIndex to detect major corruption with old (not the latest) commit point

2023-08-31 Thread via GitHub
mikemccand opened a new pull request, #12530: URL: https://github.com/apache/lucene/pull/12530 ### Description Relates #7820. `CheckIndex` today only detects and exorcises corruption with the latest commit point, yet `IndexWriter` will be angry on init if there are older commit

[GitHub] [lucene] mikemccand commented on issue #7820: CheckIndex cannot "fix" indexes that have individual segments with missing or corrupt .si files because sanity checks will fail trying to read th

2023-08-31 Thread via GitHub
mikemccand commented on issue #7820: URL: https://github.com/apache/lucene/issues/7820#issuecomment-1700986388 > Third off, there is possibly a separate improvement we could make to IndexWriter, to remove segments_N files before removing all other files when a commit point is deleted, to tr
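
The quoted suggestion is about ordering: when a commit point is deleted, remove its `segments_N` file before the other files. Below is a minimal sketch of that ordering with `java.nio.file`. It assumes the caller passes only files that no other commit still references. Lucene's `IndexFileDeleter` tracks such sharing with reference counts, and that bookkeeping is omitted here. The `CommitFileDeleter` class is hypothetical. With this order, a deletion interrupted partway leaves at worst unreferenced files behind. It never leaves a `segments_N` that points at files which are already gone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper: deletes a commit point's unshared files, segments_N first. */
final class CommitFileDeleter {

  /** Orders file names so that segments_N files come before all other files. */
  static List<String> deletionOrder(Collection<String> fileNames) {
    List<String> ordered = new ArrayList<>();
    List<String> others = new ArrayList<>();
    for (String name : fileNames) {
      if (name.startsWith("segments_")) {
        ordered.add(name);
      } else {
        others.add(name);
      }
    }
    ordered.addAll(others);
    return ordered;
  }

  /** Deletes the given files under indexDir in the order computed above. */
  static void deleteCommit(Path indexDir, Collection<String> fileNames) throws IOException {
    for (String name : deletionOrder(fileNames)) {
      Files.deleteIfExists(indexDir.resolve(name));
    }
  }
}
```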

[GitHub] [lucene] mikemccand commented on pull request #12530: Fix CheckIndex to detect major corruption with old (not the latest) commit point

2023-08-31 Thread via GitHub
mikemccand commented on PR #12530: URL: https://github.com/apache/lucene/pull/12530#issuecomment-1701119507 Thanks @rmuir > as far as the exorcise stuff, I think a good next step would be to start writing some unit tests that invoke exorcise? we have a grand total of zero tests exer

[GitHub] [lucene] benwtrent commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-31 Thread via GitHub
benwtrent commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1311719842 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene94/Lucene94HnswVectorsWriter.java: ## @@ -630,7 +621,8 @@ private abstract static class Fi

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-31 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1311935419 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -172,106 +83,36 @@ public static KnnCollector search( * @return a set of collected

[GitHub] [lucene] Jeevananthan-23 opened a new issue, #12531: Virtual threads and Lucene (support async tasks)

2023-08-31 Thread via GitHub
Jeevananthan-23 opened a new issue, #12531: URL: https://github.com/apache/lucene/issues/12531 ### Description In **LuceneNet**(`C#`) we had a long conversation about adding support for [Async API issue](https://github.com/apache/lucenenet/issues/763) but the scope of the project was

[GitHub] [lucene] benwtrent commented on pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-31 Thread via GitHub
benwtrent commented on PR #12529: URL: https://github.com/apache/lucene/pull/12529#issuecomment-1701731417 Ran a benchmark with luceneutil and here are the results: candidate (this PR) is consistently slightly slower. ``` recall latency nDoc fanout maxConn beamWidth visit

[GitHub] [lucene] zhaih commented on a diff in pull request #12480: Enhancement 11236 lazy compute similarity score

2023-08-31 Thread via GitHub
zhaih commented on code in PR #12480: URL: https://github.com/apache/lucene/pull/12480#discussion_r1312372706 ## lucene/CHANGES.txt: ## @@ -90,6 +90,8 @@ Optimizations * GITHUB#12408: Lazy initialization improvements for Facets implementations when there are segments with no

[GitHub] [lucene] Tony-X commented on issue #12513: Try out a tantivy's term dictionary format

2023-08-31 Thread via GitHub
Tony-X commented on issue #12513: URL: https://github.com/apache/lucene/issues/12513#issuecomment-1701907189 I'd like to seek some advice regarding the situation I am in -- I want to preserve the nice properties of tantivy's termdict as I port it over to Lucene 1. defini

[GitHub] [lucene] easyice opened a new pull request, #12532: Update outdated comment about maxPointsInLeafNode in BKD tree

2023-08-31 Thread via GitHub
easyice opened a new pull request, #12532: URL: https://github.com/apache/lucene/pull/12532 ### Description Since https://github.com/apache/lucene-solr/pull/1464, the default for maxPointsPerLeafNode has changed from 1024 to 512, so some comments are outdated.

[GitHub] [lucene] zhaih commented on pull request #12480: Enhancement 11236 lazy compute similarity score

2023-08-31 Thread via GitHub
zhaih commented on PR #12480: URL: https://github.com/apache/lucene/pull/12480#issuecomment-1702214309 @Jackyrie2 Seems the precommit fails?

[GitHub] [lucene] zhaih opened a new issue, #12533: Init HNSW merge with graph containing deleted documents

2023-08-31 Thread via GitHub
zhaih opened a new issue, #12533: URL: https://github.com/apache/lucene/issues/12533 ### Description Currently, when merging HNSW graphs, we're able to start from an existing graph rather than inserting nodes from scratch, thanks to #12050. But we have set a constraint that the init gr
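
The issue asks to seed an HNSW merge from a graph whose segment has deleted documents. Below is a hypothetical first step. It is simplified to a single graph layer with neighbors stored as plain `int[]` arrays and is not Lucene's `HnswGraph` API. It renumbers the surviving nodes and drops edges that point at deleted ones. Nodes that lose neighbors this way would likely need repair, for example by re-running neighbor selection, and the sketch leaves that step out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: carry a single-layer graph into a merged segment, dropping deleted nodes. */
final class GraphRemap {

  /** Maps old ordinals to new ones (-1 for deleted nodes); live nodes keep their relative order. */
  static int[] oldToNew(boolean[] live) {
    int[] map = new int[live.length];
    int next = 0;
    for (int i = 0; i < live.length; i++) {
      map[i] = live[i] ? next++ : -1;
    }
    return map;
  }

  /**
   * Rebuilds adjacency lists over the surviving nodes, indexed by new ordinal. Edges into deleted
   * nodes are dropped, so some nodes may end up with fewer neighbors (or none) and need repair.
   */
  static List<int[]> remap(List<int[]> neighbors, boolean[] live) {
    int[] map = oldToNew(live);
    List<int[]> out = new ArrayList<>();
    for (int oldOrd = 0; oldOrd < neighbors.size(); oldOrd++) {
      if (!live[oldOrd]) {
        continue; // deleted node: not carried over
      }
      // Translate each neighbor to its new ordinal and discard neighbors that were deleted.
      out.add(Arrays.stream(neighbors.get(oldOrd)).map(n -> map[n]).filter(n -> n >= 0).toArray());
    }
    return out;
  }
}
```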

[GitHub] [lucene] zhaih commented on issue #12440: Make HNSW merges faster

2023-08-31 Thread via GitHub
zhaih commented on issue #12440: URL: https://github.com/apache/lucene/issues/12440#issuecomment-1702229556 One problem with optimization #12050 is that with a hybrid text-embedding index you can hardly have even one big segment that does not have deletions, which makes real world index me

[GitHub] [lucene] zhaih commented on issue #12440: Make HNSW merges faster

2023-08-31 Thread via GitHub
zhaih commented on issue #12440: URL: https://github.com/apache/lucene/issues/12440#issuecomment-1702237913 Another idea I have is a bit wild: what if we do less merging? For example, if segments 1, 2, 3, 4 want to merge and form a new segment, can we just leave the HNSW graphs as-