[GitHub] [lucene] vsop-479 commented on pull request #11722: Binary search the entries when all suffixes have the same length in a leaf block.

2022-09-28 Thread GitBox
vsop-479 commented on PR #11722: URL: https://github.com/apache/lucene/pull/11722#issuecomment-1260490249 @jpountz Thanks for your review and suggestion. I have added a CHANGES entry and code that asserts the term value. Please have a review. -- This is an automated message from the Apache Git
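
The idea named in the PR title can be illustrated with a small, standalone sketch. It is not the code from PR #11722: the class name `FixedLengthSuffixSearch`, the method `find`, and the flat `byte[]` layout are all hypothetical, and the sketch assumes the entries are packed back to back, all have the same length, and are sorted in unsigned byte order. Under those assumptions the start of entry `i` is simply `i * suffixLength`, so the block can be binary searched instead of scanned linearly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not Lucene's block tree terms reader.
public class FixedLengthSuffixSearch {

  /**
   * Binary searches entryCount sorted entries of suffixLength bytes each,
   * packed back to back in block. Returns the entry index if target is
   * found, otherwise -(insertionPoint + 1), like Arrays.binarySearch.
   */
  public static int find(byte[] block, int entryCount, int suffixLength, byte[] target) {
    int lo = 0;
    int hi = entryCount - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      // Fixed-length entries make the offset a simple multiplication.
      int start = mid * suffixLength;
      int cmp =
          Arrays.compareUnsigned(
              block, start, start + suffixLength, target, 0, target.length);
      if (cmp < 0) {
        lo = mid + 1;
      } else if (cmp > 0) {
        hi = mid - 1;
      } else {
        return mid;
      }
    }
    return -(lo + 1);
  }

  public static void main(String[] args) {
    // Four 3-byte suffixes in sorted order: bar, baz, foo, qux.
    byte[] block = "barbazfooqux".getBytes(StandardCharsets.US_ASCII);
    System.out.println(find(block, 4, 3, "foo".getBytes(StandardCharsets.US_ASCII)));
    System.out.println(find(block, 4, 3, "cat".getBytes(StandardCharsets.US_ASCII)));
  }
}
```

With variable-length suffixes the offset of entry `i` is not known without reading the preceding lengths, which is why the sketch relies on the equal-length condition stated in the PR title.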

[GitHub] [lucene] uschindler commented on issue #11827: Release manager should review lucene benchmarks before building release candidates

2022-09-28 Thread GitBox
uschindler commented on issue #11827: URL: https://github.com/apache/lucene/issues/11827#issuecomment-1260501918 I fully agree, some checks should be done. But here are a few bits that came into my mind: - The @mikemccand benchmarks are running against main branch only. So the first ch

[GitHub] [lucene] jpountz commented on a diff in pull request #11722: Binary search the entries when all suffixes have the same length in a leaf block.

2022-09-28 Thread GitBox
jpountz commented on code in PR #11722: URL: https://github.com/apache/lucene/pull/11722#discussion_r982050404 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BasePostingsFormatTestCase.java: ## @@ -367,6 +367,53 @@ public void testGhosts() throws Exception {

[GitHub] [lucene] Mahdi-Seeker commented on issue #10177: Introduce IVFFlat to Lucene for ANN similarity search [LUCENE-9136]

2022-09-28 Thread GitBox
Mahdi-Seeker commented on issue #10177: URL: https://github.com/apache/lucene/issues/10177#issuecomment-1260514435 Hi guys, thanks for your great job on Lucene and especially this ANN search! Any progress on this issue? We're trying to use vector search, but HNSW seems to take too mu

[GitHub] [lucene] iverase commented on issue #11824: Performance regression on LatLonPoint#newPolygonQuery

2022-09-28 Thread GitBox
iverase commented on issue #11824: URL: https://github.com/apache/lucene/issues/11824#issuecomment-1260667934 Fix seems to bring performance back to previous levels: https://user-images.githubusercontent.com/29038686/192749125-0ac0b341-b2cb-4395-991c-9f676322592a.png -- This

[GitHub] [lucene] jpountz commented on a diff in pull request #1039: LUCENE-10635: Ensure test coverage for WANDScorer by using a test query

2022-09-28 Thread GitBox
jpountz commented on code in PR #1039: URL: https://github.com/apache/lucene/pull/1039#discussion_r982215123 ## lucene/core/src/test/org/apache/lucene/search/TestWANDScorer.java: ## @@ -947,4 +988,82 @@ public long cost() { }; } } + + private static class WANDSco

[GitHub] [lucene] mikemccand commented on issue #11827: Release manager should review lucene benchmarks before building release candidates

2022-09-28 Thread GitBox
mikemccand commented on issue #11827: URL: https://github.com/apache/lucene/issues/11827#issuecomment-1260778045 > yup. Possibly too if Mike M is bored he could implement an alarming system :) or export the data somehow so we could bolt one on the side? Actually I rather like the alar

[GitHub] [lucene] mikemccand commented on issue #11824: Performance regression on LatLonPoint#newPolygonQuery

2022-09-28 Thread GitBox
mikemccand commented on issue #11824: URL: https://github.com/apache/lucene/issues/11824#issuecomment-1260787630 Thanks for catching this @iverase and the quick fix, and the follow-on issue to better detect such regressions before release: #11827 -- This is an automated message from the A

[GitHub] [lucene] mikemccand commented on issue #11827: Release manager should review lucene benchmarks before building release candidates

2022-09-28 Thread GitBox
mikemccand commented on issue #11827: URL: https://github.com/apache/lucene/issues/11827#issuecomment-1260788095 This was a spinoff from #11824. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [lucene] jpountz opened a new issue, #11829: Reproducible TestShapeDocValues failure

2022-09-28 Thread GitBox
jpountz opened a new issue, #11829: URL: https://github.com/apache/lucene/issues/11829 ### Description The test seems to be creating invalid polygons. ``` 03:40:22 org.apache.lucene.document.TestShapeDocValues > testXYPolygonCentroid FAILED 03:40:22 WARNING: The Security

[GitHub] [lucene] nknize commented on issue #11824: Performance regression on LatLonPoint#newPolygonQuery

2022-09-28 Thread GitBox
nknize commented on issue #11824: URL: https://github.com/apache/lucene/issues/11824#issuecomment-1260938633 What's annoying is how incredibly trappy this override logic is. That a method call literally moving from `createWeight` to `getScorerSupplier` results in a 72.2% regression even sli

[GitHub] [lucene] mikemccand commented on a diff in pull request #11780: GH#11601: Add ability to compute reader states after refresh

2022-09-28 Thread GitBox
mikemccand commented on code in PR #11780: URL: https://github.com/apache/lucene/pull/11780#discussion_r982553986 ## lucene/core/src/java/org/apache/lucene/search/ReferenceManager.java: ## @@ -219,6 +219,36 @@ public final boolean maybeRefresh() throws IOException { return

[GitHub] [lucene] stefanvodita commented on a diff in pull request #11780: GH#11601: Add ability to compute reader states after refresh

2022-09-28 Thread GitBox
stefanvodita commented on code in PR #11780: URL: https://github.com/apache/lucene/pull/11780#discussion_r982630324 ## lucene/core/src/java/org/apache/lucene/search/ReferenceManager.java: ## @@ -219,6 +219,36 @@ public final boolean maybeRefresh() throws IOException { retur

[GitHub] [lucene] gsmiller commented on pull request #11828: TermInSetQuery optimization when all docs in a field match a term

2022-09-28 Thread GitBox
gsmiller commented on PR #11828: URL: https://github.com/apache/lucene/pull/11828#issuecomment-1261195636 > I assume we already have tests that cover this case? Good question. I'm going to go tweak our tests. We added tests that cover the completely dense case (i.e., all docs in a seg

[GitHub] [lucene] gsmiller commented on pull request #11803: DrillSideways optimizations

2022-09-28 Thread GitBox
gsmiller commented on PR #11803: URL: https://github.com/apache/lucene/pull/11803#issuecomment-1261198752 @zhaih I reexamined our test coverage and think we're in good shape already actually. We've got good coverage for covering drill-sideways correctness with multiple dimensions, etc. (inc

[GitHub] [lucene] zhaih commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
zhaih commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982692228 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
mdmarshmallow commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982759362 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
mdmarshmallow commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982764818 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [lucene] zhaih commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
zhaih commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982825433 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [lucene] zhaih commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
zhaih commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982825989 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [lucene] zhaih commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
zhaih commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982826316 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [lucene] gsmiller commented on pull request #11803: DrillSideways optimizations

2022-09-28 Thread GitBox
gsmiller commented on PR #11803: URL: https://github.com/apache/lucene/pull/11803#issuecomment-1261458849 @zhaih that's a good point and valid concern. I dug into the existing tests and it looks like we have lots of coverage _except_ that the majority of the coverage is using basic, single-

[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
mdmarshmallow commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982891007 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982891128 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
mdmarshmallow commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982902771 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [lucene] mdmarshmallow commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
mdmarshmallow commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1261519604 @jpountz , in response to this: > I'm considering exposing write amplification separately for flushes (as flushedBytes / totalIndexSize), merges (as (totalIndexSize + mergedB

[GitHub] [lucene] uschindler commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
uschindler commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982909584 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [lucene] uschindler commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-28 Thread GitBox
uschindler commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1261531792 I was also doing consulting for a huge Elasticsearch user and they also had this problem of keeping deletes as low as possible and the 20% limit was way too high. 20% looks like

[GitHub] [lucene] zhaih commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-28 Thread GitBox
zhaih commented on code in PR #11796: URL: https://github.com/apache/lucene/pull/11796#discussion_r982921333 ## lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [lucene] zhaih commented on pull request #11803: DrillSideways optimizations

2022-09-28 Thread GitBox
zhaih commented on PR #11803: URL: https://github.com/apache/lucene/pull/11803#issuecomment-1261541919 @gsmiller Thank you for checking and for your continuous effort! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [lucene] gsmiller commented on pull request #11803: DrillSideways optimizations

2022-09-28 Thread GitBox
gsmiller commented on PR #11803: URL: https://github.com/apache/lucene/pull/11803#issuecomment-1261579341 @zhaih well, thank you for keeping me honest with testing. I think I've already found an insidious, potential bug with some beefier tests. -- This is an automated message from the Ap

[GitHub] [lucene] jtibshirani opened a new issue, #11830: Store HNSW graph connections more compactly

2022-09-28 Thread GitBox
jtibshirani opened a new issue, #11830: URL: https://github.com/apache/lucene/issues/11830 ### Description HNSW search is most efficient when all vector data fits in page cache. So it is good to keep the size of vector files as small as possible. We currently write all HNSW graph con