[PR] [Minor] Document operation costs for stale workflow [lucene]
stefanvodita opened a new pull request, #13000: URL: https://github.com/apache/lucene/pull/13000 Documenting the operation costs from the [latest stale workflow run](https://github.com/apache/lucene/actions/runs/7454785611/job/20282760199) for posterity. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]
daixque commented on PR #12915: URL: https://github.com/apache/lucene/pull/12915#issuecomment-1882633384 @mikemccand @dungba88 Let me ping. Do I still have anything to do for this PR? If not, could you merge it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Get rid of deprecated assertThat() usages [lucene]
sabi0 commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445816935 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -485,7 +470,7 @@ public void testNRTDeletedDocFiltering() throws Exception { PrefixCompletionQuery query = new PrefixCompletionQuery(analyzer, new Term("suggest_field", "abc_")); TopSuggestDocs suggest = indexSearcher.suggest(query, numLive, false); -assertSuggestions(suggest, expectedEntries.toArray(new Entry[expectedEntries.size()])); +assertSuggestions(suggest, expectedEntries.toArray(new Entry[0])); Review Comment: I had a closer look at the `assertSuggestions()` method. It has 56 usages in 6 different classes. So here are the options I see: 1. return pre-calculated array sizes and keep this part as is 2. overload the method to accept either `Entry...` or `Collection` and update the 13 usages in the `TestSuggestField` (leaving the other classes for later) 3. proceed with the refactoring to `Collection` and update all 6 classes (in a separate PR) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Get rid of deprecated assertThat() usages [lucene]
sabi0 commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445839349 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -766,12 +749,12 @@ public void testScoring() throws Exception { assertTrue(topScore >= scoreDoc.score); } topScore = scoreDoc.score; -assertThat((float) mappings.get(scoreDoc.key.toString()), equalTo(scoreDoc.score)); +assertEquals(scoreDoc.score, (float) mappings.get(scoreDoc.key.toString()), 1.0e-8); Review Comment: There are 33 usages of `assertEquals(String, double, double, double)` and 653 usages of `assertEquals(String, float, float, float)` across the project. The deltas vary considerably, from 0 to 0.0001 to 0.001 to 1e-10. Maybe I should just use 0 delta in this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Get rid of deprecated assertThat() usages [lucene]
sabi0 commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445839349 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -766,12 +749,12 @@ public void testScoring() throws Exception { assertTrue(topScore >= scoreDoc.score); } topScore = scoreDoc.score; -assertThat((float) mappings.get(scoreDoc.key.toString()), equalTo(scoreDoc.score)); +assertEquals(scoreDoc.score, (float) mappings.get(scoreDoc.key.toString()), 1.0e-8); Review Comment: There are 33 usages of `assertEquals(String, double, double, double)` and 653 usages of `assertEquals(float, float, float)` across the project. The deltas vary considerably, from 0 to 0.0001 to 0.001 to 1e-10. Maybe I should just use 0 delta in this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Get rid of deprecated assertThat() usages [lucene]
sabi0 commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445816935 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -485,7 +470,7 @@ public void testNRTDeletedDocFiltering() throws Exception { PrefixCompletionQuery query = new PrefixCompletionQuery(analyzer, new Term("suggest_field", "abc_")); TopSuggestDocs suggest = indexSearcher.suggest(query, numLive, false); -assertSuggestions(suggest, expectedEntries.toArray(new Entry[expectedEntries.size()])); +assertSuggestions(suggest, expectedEntries.toArray(new Entry[0])); Review Comment: I had a closer look at the `assertSuggestions()` method. It has 56 usages in 6 different classes. So here are the options I see: 1. revert pre-calculated array sizes and keep this part as is 2. overload the method to accept either `Entry...` or `Collection` and update the 13 usages in the `TestSuggestField` (leaving the other classes for later) 3. proceed with the refactoring to `Collection` and update all 6 classes (in a separate PR) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix broken testAllVersionHaveCfsAndNocfs() [lucene]
sabi0 commented on code in PR #12969: URL: https://github.com/apache/lucene/pull/12969#discussion_r1445850623 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java: ## @@ -731,6 +732,7 @@ public void testAllVersionHaveCfsAndNocfs() { String prefix = prevFile.replace("-cfs", ""); assertEquals("Missing -nocfs for backcompat index " + prefix, prefix + "-nocfs", file); } + prevFile = file; Review Comment: The assert in the loop was never executed because this line was missing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix broken testAllVersionHaveCfsAndNocfs() [lucene]
dweiss merged PR #12969: URL: https://github.com/apache/lucene/pull/12969 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Get rid of deprecated assertThat() usages [lucene]
dweiss commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445882248 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -485,7 +470,7 @@ public void testNRTDeletedDocFiltering() throws Exception { PrefixCompletionQuery query = new PrefixCompletionQuery(analyzer, new Term("suggest_field", "abc_")); TopSuggestDocs suggest = indexSearcher.suggest(query, numLive, false); -assertSuggestions(suggest, expectedEntries.toArray(new Entry[expectedEntries.size()])); +assertSuggestions(suggest, expectedEntries.toArray(new Entry[0])); Review Comment: Given the number of use cases, I'd go with option 3 or 1 - maybe I'm overly sensitive about that new Foo[0]... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Get rid of deprecated assertThat() usages [lucene]
dweiss commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445883098 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -766,12 +749,12 @@ public void testScoring() throws Exception { assertTrue(topScore >= scoreDoc.score); } topScore = scoreDoc.score; -assertThat((float) mappings.get(scoreDoc.key.toString()), equalTo(scoreDoc.score)); +assertEquals(scoreDoc.score, (float) mappings.get(scoreDoc.key.toString()), 1.0e-8); Review Comment: Yep, please do. Seems like this is what it was before (without any delta). Let's see if CI passes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Get rid of deprecated assertThat() usages [lucene]
sabi0 commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445888566 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -485,7 +470,7 @@ public void testNRTDeletedDocFiltering() throws Exception { PrefixCompletionQuery query = new PrefixCompletionQuery(analyzer, new Term("suggest_field", "abc_")); TopSuggestDocs suggest = indexSearcher.suggest(query, numLive, false); -assertSuggestions(suggest, expectedEntries.toArray(new Entry[expectedEntries.size()])); +assertSuggestions(suggest, expectedEntries.toArray(new Entry[0])); Review Comment: There are 135 matches for `toArray\(new \w+\[0]` ... and 138 occurrences with pre-calculated array size. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]
dungba88 commented on PR #12915: URL: https://github.com/apache/lucene/pull/12915#issuecomment-1882788058 I think it's good to go, but I don't have merge permission. Mike should be able to help you, otherwise you can try notify the dev mailing list as suggested by the bot -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Remove stale BWC tests [lucene]
s1monw merged PR #12874: URL: https://github.com/apache/lucene/pull/12874 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Get rid of deprecated assertThat() usages [lucene]
dweiss commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445940966 ## lucene/core/src/test/org/apache/lucene/geo/TestTessellator.java: ## @@ -841,20 +839,19 @@ public void testComplexPolygon50() throws Exception { public void testComplexPolygon50_WithMonitor() throws Exception { String geoJson = GeoTestUtil.readShape("lucene-10563-1.geojson.gz"); Polygon[] polygons = Polygon.fromGeoJSON(geoJson); -assertThat("Only one polygon", polygons.length, equalTo(1)); +assertEquals("Only one polygon", 1, polygons.length); Polygon polygon = polygons[0]; TestCountingMonitor monitor = new TestCountingMonitor(); Tessellator.tessellate(polygon, true, monitor); -assertThat("Expected many monitor calls", monitor.count, greaterThan(390)); -assertThat("Expected specific number of splits", monitor.splitsStarted, equalTo(3)); -assertThat( -"Expected splits to start and end", monitor.splitsStarted, equalTo(monitor.splitsEnded)); +assertTrue("Expected many monitor calls", monitor.count > 390); Review Comment: The downside of converting to an explicit comparison is that you won't know what the variable (monitor.count, etc.) was. Maybe these should be converted to hamcrest MatcherAssert.assertThat(...), just as JUnit deprecation message suggests? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Forebidden Thread.sleep API [lucene]
shubhamvishu opened a new pull request, #13001: URL: https://github.com/apache/lucene/pull/13001 ### Description This API mark `Thread.sleep` API as forebidden for futures uses in the codebase and suppresses the existing usages. Closes #12946 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Can we ban `Thread.sleep`? [lucene]
shubhamvishu commented on issue #12946: URL: https://github.com/apache/lucene/issues/12946#issuecomment-1882877659 Opened a PR #13001 to address this (I liked the PR id, first one in the 13k range :)). Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for index sorting with document blocks [lucene]
s1monw commented on PR #12829: URL: https://github.com/apache/lucene/pull/12829#issuecomment-1882919819 @mikemccand I did another pass on it and change the wording. I think I know why you are confused and I tired to adress it. We use the exact same wording in the soft deletes case. I will work on this in a sep. issue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
stefanvodita commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883060775 We would still have a few more issues to solve after #12757, which are documented in #12596, but overall I agree that it makes sense to merge this as is. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] [BROKEN, for reference only] concurrent hnsw [lucene]
msokolov closed pull request #12683: [BROKEN, for reference only] concurrent hnsw URL: https://github.com/apache/lucene/pull/12683 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] NeighborArray is now fixed size [lucene]
msokolov closed pull request #11784: NeighborArray is now fixed size URL: https://github.com/apache/lucene/pull/11784 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Forebidden Thread.sleep API [lucene]
uschindler commented on code in PR #13001: URL: https://github.com/apache/lucene/pull/13001#discussion_r1446133887 ## gradle/validation/forbidden-apis/defaults.all.txt: ## @@ -74,3 +74,8 @@ javax.sql.rowset.spi.SyncFactory @defaultMessage Math.fma is insanely slow (2500x) in many environments (e.g. VMs). Use multiply/add and suffer the extra rounding java.lang.Math#fma(float,float,float) java.lang.Math#fma(double,double,double) + +@defaultMessage We should not use Thread.sleep in lucene code +java.lang.Thread#sleep(long) Review Comment: I'd use `java.lang.Thread#sleep(**)`. This prevents any sleep variant coming soon. E.g., the Duration one not yet available in JDK 17. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Forebidden Thread.sleep API [lucene]
uschindler commented on code in PR #13001: URL: https://github.com/apache/lucene/pull/13001#discussion_r1446135710 ## gradle/validation/forbidden-apis/defaults.all.txt: ## @@ -74,3 +74,8 @@ javax.sql.rowset.spi.SyncFactory @defaultMessage Math.fma is insanely slow (2500x) in many environments (e.g. VMs). Use multiply/add and suffer the extra rounding java.lang.Math#fma(float,float,float) java.lang.Math#fma(double,double,double) + +@defaultMessage We should not use Thread.sleep in lucene code +java.lang.Thread#sleep(long) Review Comment: you should also move the message into the signature. I'd also prefer a more meaningful message. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
nknize commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883224420 > There are existing tests which should fails. https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/document/TestShapeDocValues.java#L67 > However because we are catching exception and try until we get the correct polygon, it never fails. That try-catch is intentional. The evil, esoteric / invalid test geometry generation is also intentional so we can maximize our test coverage across the tessellator and Shape indexing. This is why we have [Tessellator logic](https://github.com/apache/lucene/blob/4d916a754be608e7ffb4e5626e4f99b21b8513f1/lucene/core/src/java/org/apache/lucene/geo/Tessellator.java#L1293) to filter colinear and coplanar points so we can best effort index shapes that may not be well formed or may not have been pre-validated or cleaned. While we expect well formed data, that's not the real world. Users (especially geo users) often throw dirty data at the API. And to pre-validate geometries is an expensive operation. Not all users want to pay that price. This is why we left much of that diligence out of Lucene and in the hands of the integrating API. For example, see Elasticsearch [`ignore_malformed` parameter](https://github.com/elastic/elasticsearch/pull/24654). For the explicit shape doc value boundi ng box test referenced, we expect a valid polygon bounding box for test purposes, but (as just pointed out) that reused polygon generator could create a malformed one and an invalid bounding box. Rather than write a new polygon generator I opted to reuse the existing one and just retry until a valid polygon is created. What you could do is just write a new polygon generator that always creates a valid well-formed polygon. I'd be concerned, though, about new contributors always using that to create perfect data, which is not real world. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Avoid reset BlockDocsEnum#freqBuffer when indexHasFreq is false [lucene]
rmuir commented on PR #12997: URL: https://github.com/apache/lucene/pull/12997#issuecomment-1883250347 > I tried to add checks to AssertingLeafReader to fail when reading freqs/positions/offsets/payloads if they have not been requested in the flags, and there are many test failures. It's not entirely clear to me how intentional that is, @rmuir do you have context by any chance? @jpountz The current behavior is documented: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/PostingsEnum.java#L73-L99 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Avoid reset BlockDocsEnum#freqBuffer when indexHasFreq is false [lucene]
rmuir commented on PR #12997: URL: https://github.com/apache/lucene/pull/12997#issuecomment-1883254450 as far as failing the methods, i'm not really sure if it is a good idea for frequencies. Treating the value as 1 seems reasonable, and e.g. i'm pretty sure TermScorer works correctly because of it, the `omitTF` has behaved like this for a really long time as far as i know, as the typical use case for this was super-short documents where frequency doesn't matter. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Forebidden Thread.sleep API [lucene]
shubhamvishu commented on code in PR #13001: URL: https://github.com/apache/lucene/pull/13001#discussion_r1446269754 ## gradle/validation/forbidden-apis/defaults.all.txt: ## @@ -74,3 +74,8 @@ javax.sql.rowset.spi.SyncFactory @defaultMessage Math.fma is insanely slow (2500x) in many environments (e.g. VMs). Use multiply/add and suffer the extra rounding java.lang.Math#fma(float,float,float) java.lang.Math#fma(double,double,double) + +@defaultMessage We should not use Thread.sleep in lucene code +java.lang.Thread#sleep(long) Review Comment: Makes sense. I addressed these in the new revision -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Forebidden Thread.sleep API [lucene]
shubhamvishu commented on code in PR #13001: URL: https://github.com/apache/lucene/pull/13001#discussion_r1446269754 ## gradle/validation/forbidden-apis/defaults.all.txt: ## @@ -74,3 +74,8 @@ javax.sql.rowset.spi.SyncFactory @defaultMessage Math.fma is insanely slow (2500x) in many environments (e.g. VMs). Use multiply/add and suffer the extra rounding java.lang.Math#fma(float,float,float) java.lang.Math#fma(double,double,double) + +@defaultMessage We should not use Thread.sleep in lucene code +java.lang.Thread#sleep(long) Review Comment: Makes sense. I updated these in the new revision -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] HnwsGraph creates disconnected components [lucene]
angadp commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1883327952 I +1 this issue and would like to try the gains in Amazon codebase. I think with lowered max-conn this can give some latency gains. I also feel that there is a use case for indexes with less number of updates where the disconnected nodes might never be connected again. Looking forward to the results from the latest benchmark run. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Explore moving HNSW's NeighborQueue to a radix heap [LUCENE-10383] [lucene]
angadp commented on issue #11419: URL: https://github.com/apache/lucene/issues/11419#issuecomment-1883368469 Given the comment by @benwtrent is this issue still relevant? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Forebidden Thread.sleep API [lucene]
uschindler commented on PR #13001: URL: https://github.com/apache/lucene/pull/13001#issuecomment-1883368845 P.S.: You don't need to force-push, it is better to keep track what was changed in a PR. The PR will get squashed anyways. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Forebidden Thread.sleep API [lucene]
shubhamvishu commented on PR #13001: URL: https://github.com/apache/lucene/pull/13001#issuecomment-1883422583 > In general this looks fine, although I think we should also work on remove some of those sleeps. For all others there should be an explanation, why the test needs sleeping. +1 to remove some to these usages. I didn't pursue since there are just too many of them. I think we could scope this PR to only add the api to forbidden ones and suppress all the usages and then follow up with a second pass or PR to remove those usages as much as possible. If that sounds good I could create a separate issue to reduce the existing usages(or we could use the same issue too). Let me know what do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
heemin32 commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883449137 >That try-catch is intentional. However, the implementation between latlon and xy are different then. https://github.com/apache/lucene/blob/7b8aece125aabff2823626d5b939abf4747f63a7/lucene/core/src/test/org/apache/lucene/document/TestShapeDocValues.java#L77-L95 For latlon, it does not have try-catch statement wrapping `GeoTestUtil.nextPolygon()`. I believe `GeoTestUtil.nextPolygon()` returns valid polygon always but `ShapeTestUtil.nextPolygon()` does not. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
nknize commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883477836 > ... I believe `GeoTestUtil.nextPolygon()` returns valid polygon always but `ShapeTestUtil.nextPolygon()` does not. A quick glance looks like the difference is from [LUCENE-9192](https://github.com/apache/lucene/issues/10232) where ShapeTestUtil#nextPolygon will only consider executing the try-catch path on Nightly runs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Split taxonomy arrays across chunks [lucene]
stefanvodita commented on PR #12995: URL: https://github.com/apache/lucene/pull/12995#issuecomment-1883476554 The results are in. I don't see any significant p-values (< 0.05). `python3 src/python/localrun.py -source wikimediumall -r` ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value MedPhrase 27.55 (7.2%) 27.08 (5.8%) -1.7% ( -13% - 12%) 0.408 IntNRQ 19.20 (10.6%) 18.89 (7.1%) -1.6% ( -17% - 18%) 0.566 BrowseMonthTaxoFacets9.67 (8.9%)9.52 (10.7%) -1.6% ( -19% - 19%) 0.613 BrowseRandomLabelTaxoFacets4.01 (25.1%)3.97 (26.0%) -1.0% ( -41% - 66%) 0.902 HighSloppyPhrase 28.64 (3.5%) 28.41 (2.4%) -0.8% ( -6% -5%) 0.406 AndHighLow 325.06 (3.8%) 322.67 (7.0%) -0.7% ( -11% - 10%) 0.680 BrowseRandomLabelSSDVFacets2.92 (1.8%)2.90 (2.6%) -0.6% ( -4% -3%) 0.403 BrowseDayOfYearSSDVFacets4.08 (3.8%)4.06 (3.7%) -0.6% ( -7% -7%) 0.625 Prefix3 142.86 (5.0%) 142.32 (5.4%) -0.4% ( -10% - 10%) 0.819 HighTermTitleBDVSort7.40 (6.3%)7.38 (6.4%) -0.3% ( -12% - 13%) 0.862 LowSloppyPhrase 11.47 (1.9%) 11.44 (1.7%) -0.3% ( -3% -3%) 0.657 LowSpanNear6.86 (2.0%)6.85 (1.8%) -0.2% ( -3% -3%) 0.754 HighPhrase 67.07 (3.1%) 66.94 (3.1%) -0.2% ( -6% -6%) 0.850 LowPhrase7.77 (3.2%)7.76 (3.4%) -0.1% ( -6% -6%) 0.888 HighSpanNear3.81 (3.0%)3.80 (2.8%) -0.1% ( -5% -5%) 0.875 HighIntervalsOrdered1.61 (2.5%)1.61 (2.4%) -0.1% ( -4% -4%) 0.853 AndHighMed 76.92 (6.7%) 76.83 (8.0%) -0.1% ( -13% - 15%) 0.959 MedSloppyPhrase 15.30 (1.9%) 15.29 (1.7%) -0.1% ( -3% -3%) 0.850 OrHighHigh 20.33 (7.3%) 20.31 (6.4%) -0.1% ( -12% - 14%) 0.961 LowIntervalsOrdered7.12 (2.5%)7.12 (2.4%) -0.0% ( -4% -4%) 0.955 AndHighMedDayTaxoFacets 11.03 (2.2%) 11.04 (2.2%)0.0% ( -4% -4%) 0.959 OrNotHighMed 216.20 (2.6%) 216.28 (3.0%)0.0% ( -5% -5%) 0.967 MedIntervalsOrdered 23.70 (1.8%) 23.71 (1.6%)0.0% ( -3% -3%) 0.944 MedSpanNear 13.33 (1.9%) 13.34 (1.7%)0.1% ( -3% -3%) 0.910 AndHighHighDayTaxoFacets7.73 (3.5%)7.74 (3.2%)0.1% ( -6% -7%) 0.906 MedTermDayTaxoFacets 15.49 (2.2%) 15.54 (1.6%)0.3% ( -3% -4%) 0.590 HighTermTitleSort 121.92 (1.8%) 122.33 (1.6%)0.3% ( -2% -3%) 0.533 TermDTSort 142.74 (2.1%) 143.26 (3.4%)0.4% ( -5% -5%) 0.683 PKLookup 172.65 (3.0%) 173.29 (2.8%)0.4% ( -5% -6%) 0.687 Respell 43.69 (2.3%) 43.87 (2.7%)0.4% ( -4% -5%) 0.613 HighTermDayOfYearSort 210.29 (2.8%) 211.25 (3.1%)0.5% ( -5% -6%) 0.626 AndHighHigh 23.87 (11.4%) 23.99 (11.7%)0.5% ( -20% - 26%) 0.888 BrowseDateSSDVFacets1.19 (5.2%)1.19 (4.2%)0.6% ( -8% - 10%) 0.698 Wildcard 186.59 (2.6%) 187.68 (2.9%)0.6% ( -4% -6%) 0.505 OrHighMed 90.56 (5.3%) 91.09 (4.9%)0.6% ( -9% - 11%) 0.715 OrNotHighHigh 354.86 (3.6%) 356.98 (3.9%)0.6% ( -6% -8%) 0.611 OrHighNotHigh 306.95 (3.8%) 308.88 (4.2%)0.6% ( -7% -9%) 0.620 OrHighNotMed 363.76 (3.9%) 366.46 (4.0%)0.7% ( -6% -8%) 0.548 BrowseDateTaxoFacets4.97 (27.3%)5.00 (27.3%)0.8% ( -42% - 76%) 0.929 BrowseDayOfYearTaxoFacets5.06 (26.8%)5.09 (26.9%)0.8% ( -41% - 74%) 0.927
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
heemin32 commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883491755 Yes. But also what I meant is `testLatLonPolygonCentroid()` test calls `Polygon p = GeoTestUtil.nextPolygon();` directly, but `testXYPolygonCentroid()` test calls `XYPolygon p = (XYPolygon) BaseXYShapeTestCase.ShapeType.POLYGON.nextShape();` which calls `ShapeTestUtil.nextPolygon();` inside try-catch. We need to call `ShapeTestUtil.nextPolygon();` directly from `testXYPolygonCentroid()`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Explore moving HNSW's NeighborQueue to a radix heap [LUCENE-10383] [lucene]
benwtrent commented on issue #11419: URL: https://github.com/apache/lucene/issues/11419#issuecomment-1883500270 @angadp I think it's more relevant given recent refactors. A radix heap may be possible now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Explore moving HNSW's NeighborQueue to a radix heap [LUCENE-10383] [lucene]
angadp commented on issue #11419: URL: https://github.com/apache/lucene/issues/11419#issuecomment-1883522255 Thanks! I was looking for an open issue and will check this out. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] migrate OpenNLP 'ant train-test-models' to Gradle [lucene]
cpoerschke opened a new issue, #13002: URL: https://github.com/apache/lucene/issues/13002 ### Description ref: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/analysis/opennlp/build.xml#L52-L84 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] upgrade to OpenNLP 2.3.1 [lucene]
cpoerschke commented on PR #12674: URL: https://github.com/apache/lucene/pull/12674#issuecomment-1883525174 > It would be good migrate the model regeneration to gradle - probably a good follow-up issue on its own though. Makes sense, opened #13002 for that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] upgrade to OpenNLP 2.3.1 [lucene]
cpoerschke commented on code in PR #12674: URL: https://github.com/apache/lucene/pull/12674#discussion_r1446425261 ## lucene/analysis/opennlp/src/tools/test-model-data/README.txt: ## @@ -3,4 +3,4 @@ Training data derived from Reuters corpus in very unscientific way. Tagging done with CCG Urbana-Champaign online demos: http://cogcomp.cs.illinois.edu/page/demos -Run 'ant train-test-models' to generate models from training data here. Review Comment: #13002 is for migration to Gradle. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] upgrade to OpenNLP 2.3.1 [lucene]
cpoerschke commented on PR #12674: URL: https://github.com/apache/lucene/pull/12674#issuecomment-1883534556 > What's probably missing is ... a note on migration? Yes, indeed. It being a 1.x to 2.x dependency upgrade, I'm not yet familiar enough with OpenNLP to gauge what's involved in migrating. Perhaps @jzonthemtn and/or @epugh would have insights to share on that? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
nknize commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883537742 > We need to call `ShapeTestUtil.nextPolygon();` directly from `testXYPolygonCentroid()`. Doing that could create a polygon that can't be tessellated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
heemin32 commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883551295 > Doing that could create a polygon that can't be tessellated. But `testLatLonPolygonCentroid()` does call `GeoTestUtils.nextPolygon()` directly and tessellation works fine. https://github.com/apache/lucene/blob/7b8aece125aabff2823626d5b939abf4747f63a7/lucene/core/src/test/org/apache/lucene/document/TestShapeDocValues.java#L77-L86 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] Exploring GPU based kNN vector search [lucene]
chatman opened a new issue, #13003: URL: https://github.com/apache/lucene/issues/13003 ### Description Through this issue, I wish to explore integrating NVIDIA's kNN indexing and search support, https://github.com/rapidsai/raft. Through our initial benchmarks/prototypes, we found it a lot faster than our HNSW based search. Along the way, we shall add more details of our experiments through this issue. And will open a PR as soon as something takes shape (right now, things are in extremely early proof-of-concept state). One of the potential results of this work, if explorations prove worthwhile, can be a Lucene module based on a JNI wrapper around the Raft library. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] Add post-filter capability to `SynonymGraphFilter` [lucene]
mikemccand opened a new issue, #13004: URL: https://github.com/apache/lucene/issues/13004 ### Description [I'm not sure how general this is but figured I'd open this to see if there is interest / other use cases:] At Amazon product search team we have synonyms that are sometimes conditionally applied depending on some context about the document or query. For example, `apple` might be a fruit in the grocery subset of Amazon's catalog, or a computer in the electronics catalog. Today we implement this very inefficiently: we compile N massive synonym maps, mostly (wastefully) with the same synonyms except for a few that are catalog / query context specific. This is wasteful and takes gobbs of heap. (Hmm, separately: `SynonymGraphFilter` should be fixed to use off-heap FSTs -- I'll open a crab spinoff issue). Maybe instead we could allow each synonym rule to optionally have some metadata that may be used, at matching time, to post-filter, only applying the synonym based on the context of the current document/query? I'm not sure how this would work -- maybe N labels that are compiled to an int/long bitset recorded into each FST rule? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] `SynonymGraphFilter` should read FSTs off-heap? [lucene]
mikemccand opened a new issue, #13005: URL: https://github.com/apache/lucene/issues/13005 ### Description [Spinoff from #13004] Recently we added off-heap FST reading, but only switched to it in limited cases, starting with the terms index in `BlockTree` terms dictionary. Should we switch to off-heap for `SynonymGraphFilter` too? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Forbidden Thread.sleep API [lucene]
mikemccand commented on PR #13001: URL: https://github.com/apache/lucene/pull/13001#issuecomment-1883719867 +1 to merge this, first, so we stop the bleeding (no more `Thread.sleep` added to the code base), and in follow-on PR we can start whittling down the existing grandfather'd `Thread.sleep` calls. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]
MarcusSorealheis commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1883963445 Great to finally see you in the Lucene repo @kevindrosendahl after all these years. 🍰 The work you have done here is stellar and the whole world welcomes the diligence. I hope we can keep this momentum. I have a few concerns that are not intended to block the project I second the question, "How is segment merging implemented by Lucene Vamana?" I have a suspicion that extending `MergePolicy` could be in order for a production-ready implementation but I have such limited knowledge for such a thought. I have general concerns about large copy overheads and the restrictions of POSIX AIO from experience dealing with the failures of storing data on disk, as well as random disk failures. `io_uring` certainly offers some theoretical promise in my mind. I know that is a newer Linux kernel module/interface, so that would lead me to believe actually taking this implementation forward would require a bit of comprehensive testing. One particular concern I have, as you specifically might have guessed, would be around soak testing. Has anyone looked into this issue since November, or has anyone run the tests for an extended period? The other concern I have is that I suspect the community would probably need to keep an eye on `io_uring` given its age. I was instructed long ago to use old interfaces whenever possible but the person who told me retired so I cannot ask why. I would suspect the newer interfaces tend to evolve more quickly than the ones than the ones that have been around for a while. Surely that burden would likely fall to the Policeman. In general, this is an interesting development and I look forward to continuing to explore. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] upgrade to OpenNLP 2.3.1 [lucene]
epugh commented on PR #12674: URL: https://github.com/apache/lucene/pull/12674#issuecomment-1883984387 I don't know about the migration side specifically, but based on https://opennlp.apache.org/news/release-200.html and some of the other release notes, here is a first stab. ``` OpenNLP 2x opens the door to accessing various models via the ONNX runtime. To migrate you will need to update any deprecated OpenNLP methods that you may be using and be running on Java 17. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Lazily write the FST padding byte [lucene]
github-actions[bot] commented on PR #12981: URL: https://github.com/apache/lucene/pull/12981#issuecomment-1883992139 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org