[PR] [Minor] Document operation costs for stale workflow [lucene]

2024-01-09 Thread via GitHub
stefanvodita opened a new pull request, #13000: URL: https://github.com/apache/lucene/pull/13000 Documenting the operation costs from the [latest stale workflow run](https://github.com/apache/lucene/actions/runs/7454785611/job/20282760199) for posterity. -- This is an automated message f

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2024-01-09 Thread via GitHub
daixque commented on PR #12915: URL: https://github.com/apache/lucene/pull/12915#issuecomment-1882633384 @mikemccand @dungba88 Let me ping. Do I still have anything to do for this PR? If not, could you merge it? -- This is an automated message from the Apache Git Service. To respond to th

Re: [PR] Get rid of deprecated assertThat() usages [lucene]

2024-01-09 Thread via GitHub
sabi0 commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445816935 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -485,7 +470,7 @@ public void testNRTDeletedDocFiltering() throws Except

Re: [PR] Get rid of deprecated assertThat() usages [lucene]

2024-01-09 Thread via GitHub
sabi0 commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445839349 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -766,12 +749,12 @@ public void testScoring() throws Exception {

Re: [PR] Fix broken testAllVersionHaveCfsAndNocfs() [lucene]

2024-01-09 Thread via GitHub
sabi0 commented on code in PR #12969: URL: https://github.com/apache/lucene/pull/12969#discussion_r1445850623 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java: ## @@ -731,6 +732,7 @@ public void testAllVersionHaveCfsAndNocfs() {

Re: [PR] Fix broken testAllVersionHaveCfsAndNocfs() [lucene]

2024-01-09 Thread via GitHub
dweiss merged PR #12969: URL: https://github.com/apache/lucene/pull/12969 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

Re: [PR] Get rid of deprecated assertThat() usages [lucene]

2024-01-09 Thread via GitHub
dweiss commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445882248 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -485,7 +470,7 @@ public void testNRTDeletedDocFiltering() throws Excep

Re: [PR] Get rid of deprecated assertThat() usages [lucene]

2024-01-09 Thread via GitHub
dweiss commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445883098 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -766,12 +749,12 @@ public void testScoring() throws Exception {

Re: [PR] Get rid of deprecated assertThat() usages [lucene]

2024-01-09 Thread via GitHub
sabi0 commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445888566 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -485,7 +470,7 @@ public void testNRTDeletedDocFiltering() throws Except

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2024-01-09 Thread via GitHub
dungba88 commented on PR #12915: URL: https://github.com/apache/lucene/pull/12915#issuecomment-1882788058 I think it's good to go, but I don't have merge permission. Mike should be able to help you; otherwise you can try notifying the dev mailing list, as suggested by the bot -- This is an a

Re: [PR] Remove stale BWC tests [lucene]

2024-01-09 Thread via GitHub
s1monw merged PR #12874: URL: https://github.com/apache/lucene/pull/12874 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

Re: [PR] Get rid of deprecated assertThat() usages [lucene]

2024-01-09 Thread via GitHub
dweiss commented on code in PR #12982: URL: https://github.com/apache/lucene/pull/12982#discussion_r1445940966 ## lucene/core/src/test/org/apache/lucene/geo/TestTessellator.java: ## @@ -841,20 +839,19 @@ public void testComplexPolygon50() throws Exception { public void testCo

[PR] Forebidden Thread.sleep API [lucene]

2024-01-09 Thread via GitHub
shubhamvishu opened a new pull request, #13001: URL: https://github.com/apache/lucene/pull/13001 ### Description This PR marks the `Thread.sleep` API as forbidden for future uses in the codebase and suppresses the existing usages. Closes #12946 -- This is an automated

Re: [I] Can we ban `Thread.sleep`? [lucene]

2024-01-09 Thread via GitHub
shubhamvishu commented on issue #12946: URL: https://github.com/apache/lucene/issues/12946#issuecomment-1882877659 Opened a PR #13001 to address this (I liked the PR id, first one in the 13k range :)). Thanks! -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Add support for index sorting with document blocks [lucene]

2024-01-09 Thread via GitHub
s1monw commented on PR #12829: URL: https://github.com/apache/lucene/pull/12829#issuecomment-1882919819 @mikemccand I did another pass on it and changed the wording. I think I know why you are confused and I tried to address it. We use the exact same wording in the soft deletes case. I will w

Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-09 Thread via GitHub
stefanvodita commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883060775 We would still have a few more issues to solve after #12757, which are documented in #12596, but overall I agree that it makes sense to merge this as is. -- This is an automated

Re: [PR] [BROKEN, for reference only] concurrent hnsw [lucene]

2024-01-09 Thread via GitHub
msokolov closed pull request #12683: [BROKEN, for reference only] concurrent hnsw URL: https://github.com/apache/lucene/pull/12683 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] NeighborArray is now fixed size [lucene]

2024-01-09 Thread via GitHub
msokolov closed pull request #11784: NeighborArray is now fixed size URL: https://github.com/apache/lucene/pull/11784 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

Re: [PR] Forebidden Thread.sleep API [lucene]

2024-01-09 Thread via GitHub
uschindler commented on code in PR #13001: URL: https://github.com/apache/lucene/pull/13001#discussion_r1446133887 ## gradle/validation/forbidden-apis/defaults.all.txt: ## @@ -74,3 +74,8 @@ javax.sql.rowset.spi.SyncFactory @defaultMessage Math.fma is insanely slow (2500x) in ma

Re: [PR] Forebidden Thread.sleep API [lucene]

2024-01-09 Thread via GitHub
uschindler commented on code in PR #13001: URL: https://github.com/apache/lucene/pull/13001#discussion_r1446135710 ## gradle/validation/forbidden-apis/defaults.all.txt: ## @@ -74,3 +74,8 @@ javax.sql.rowset.spi.SyncFactory @defaultMessage Math.fma is insanely slow (2500x) in ma

Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-09 Thread via GitHub
nknize commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883224420 > There are existing tests which should fail. https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/document/TestShapeDocValues.java#L67 > However because

Re: [PR] Avoid reset BlockDocsEnum#freqBuffer when indexHasFreq is false [lucene]

2024-01-09 Thread via GitHub
rmuir commented on PR #12997: URL: https://github.com/apache/lucene/pull/12997#issuecomment-1883250347 > I tried to add checks to AssertingLeafReader to fail when reading freqs/positions/offsets/payloads if they have not been requested in the flags, and there are many test failures. It's no

Re: [PR] Avoid reset BlockDocsEnum#freqBuffer when indexHasFreq is false [lucene]

2024-01-09 Thread via GitHub
rmuir commented on PR #12997: URL: https://github.com/apache/lucene/pull/12997#issuecomment-1883254450 as far as failing the methods, i'm not really sure if it is a good idea for frequencies. Treating the value as 1 seems reasonable, and e.g. i'm pretty sure TermScorer works correctly becau

Re: [PR] Forebidden Thread.sleep API [lucene]

2024-01-09 Thread via GitHub
shubhamvishu commented on code in PR #13001: URL: https://github.com/apache/lucene/pull/13001#discussion_r1446269754 ## gradle/validation/forbidden-apis/defaults.all.txt: ## @@ -74,3 +74,8 @@ javax.sql.rowset.spi.SyncFactory @defaultMessage Math.fma is insanely slow (2500x) in

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-01-09 Thread via GitHub
angadp commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1883327952 I +1 this issue and would like to try the gains in Amazon codebase. I think with lowered max-conn this can give some latency gains. I also feel that there is a use case for ind

Re: [I] Explore moving HNSW's NeighborQueue to a radix heap [LUCENE-10383] [lucene]

2024-01-09 Thread via GitHub
angadp commented on issue #11419: URL: https://github.com/apache/lucene/issues/11419#issuecomment-1883368469 Given the comment by @benwtrent is this issue still relevant? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] Forebidden Thread.sleep API [lucene]

2024-01-09 Thread via GitHub
uschindler commented on PR #13001: URL: https://github.com/apache/lucene/pull/13001#issuecomment-1883368845 P.S.: You don't need to force-push; it is better to keep track of what was changed in a PR. The PR will get squashed anyway. -- This is an automated message from the Apache Git Servic

Re: [PR] Forebidden Thread.sleep API [lucene]

2024-01-09 Thread via GitHub
shubhamvishu commented on PR #13001: URL: https://github.com/apache/lucene/pull/13001#issuecomment-1883422583 > In general this looks fine, although I think we should also work on removing some of those sleeps. For all others there should be an explanation of why the test needs sleeping.

Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-09 Thread via GitHub
heemin32 commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883449137 >That try-catch is intentional. However, the implementation between latlon and xy are different then. https://github.com/apache/lucene/blob/7b8aece125aabff2823626d5b939abf4747f63

Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-09 Thread via GitHub
nknize commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883477836 > ... I believe `GeoTestUtil.nextPolygon()` returns valid polygon always but `ShapeTestUtil.nextPolygon()` does not. A quick glance looks like the difference is from [LUCENE-9192](

Re: [PR] Split taxonomy arrays across chunks [lucene]

2024-01-09 Thread via GitHub
stefanvodita commented on PR #12995: URL: https://github.com/apache/lucene/pull/12995#issuecomment-1883476554 The results are in. I don't see any significant p-values (< 0.05). `python3 src/python/localrun.py -source wikimediumall -r` ``` TaskQPS basel

Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-09 Thread via GitHub
heemin32 commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883491755 Yes. But also what I meant is `testLatLonPolygonCentroid()` test calls `Polygon p = GeoTestUtil.nextPolygon();` directly, but `testXYPolygonCentroid()` test calls `XYPolygon p = (XYPoly

Re: [I] Explore moving HNSW's NeighborQueue to a radix heap [LUCENE-10383] [lucene]

2024-01-09 Thread via GitHub
benwtrent commented on issue #11419: URL: https://github.com/apache/lucene/issues/11419#issuecomment-1883500270 @angadp I think it's more relevant given recent refactors. A radix heap may be possible now. -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [I] Explore moving HNSW's NeighborQueue to a radix heap [LUCENE-10383] [lucene]

2024-01-09 Thread via GitHub
angadp commented on issue #11419: URL: https://github.com/apache/lucene/issues/11419#issuecomment-1883522255 Thanks! I was looking for an open issue and will check this out. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[I] migrate OpenNLP 'ant train-test-models' to Gradle [lucene]

2024-01-09 Thread via GitHub
cpoerschke opened a new issue, #13002: URL: https://github.com/apache/lucene/issues/13002 ### Description ref: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/analysis/opennlp/build.xml#L52-L84 -- This is an automated message from the Apache Git Servic

Re: [PR] upgrade to OpenNLP 2.3.1 [lucene]

2024-01-09 Thread via GitHub
cpoerschke commented on PR #12674: URL: https://github.com/apache/lucene/pull/12674#issuecomment-1883525174 > It would be good to migrate the model regeneration to gradle - probably a good follow-up issue on its own though. Makes sense, opened #13002 for that. -- This is an automa

Re: [PR] upgrade to OpenNLP 2.3.1 [lucene]

2024-01-09 Thread via GitHub
cpoerschke commented on code in PR #12674: URL: https://github.com/apache/lucene/pull/12674#discussion_r1446425261 ## lucene/analysis/opennlp/src/tools/test-model-data/README.txt: ## @@ -3,4 +3,4 @@ Training data derived from Reuters corpus in very unscientific way. Tagging do

Re: [PR] upgrade to OpenNLP 2.3.1 [lucene]

2024-01-09 Thread via GitHub
cpoerschke commented on PR #12674: URL: https://github.com/apache/lucene/pull/12674#issuecomment-1883534556 > What's probably missing is ... a note on migration? Yes, indeed. It being a 1.x to 2.x dependency upgrade, I'm not yet familiar enough with OpenNLP to gauge what's involved in

Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-09 Thread via GitHub
nknize commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883537742 > We need to call `ShapeTestUtil.nextPolygon();` directly from `testXYPolygonCentroid()`. Doing that could create a polygon that can't be tessellated. -- This is an automated me

Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-09 Thread via GitHub
heemin32 commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883551295 > Doing that could create a polygon that can't be tessellated. But `testLatLonPolygonCentroid()` does call `GeoTestUtils.nextPolygon()` directly and tessellation works fine.

[I] Exploring GPU based kNN vector search [lucene]

2024-01-09 Thread via GitHub
chatman opened a new issue, #13003: URL: https://github.com/apache/lucene/issues/13003 ### Description Through this issue, I wish to explore integrating NVIDIA's kNN indexing and search support, https://github.com/rapidsai/raft. Through our initial benchmarks/prototypes, we found it

[I] Add post-filter capability to `SynonymGraphFilter` [lucene]

2024-01-09 Thread via GitHub
mikemccand opened a new issue, #13004: URL: https://github.com/apache/lucene/issues/13004 ### Description [I'm not sure how general this is but figured I'd open this to see if there is interest / other use cases:] At Amazon product search team we have synonyms that are sometime

[I] `SynonymGraphFilter` should read FSTs off-heap? [lucene]

2024-01-09 Thread via GitHub
mikemccand opened a new issue, #13005: URL: https://github.com/apache/lucene/issues/13005 ### Description [Spinoff from #13004] Recently we added off-heap FST reading, but only switched to it in limited cases, starting with the terms index in `BlockTree` terms dictionary. Shou

Re: [PR] Forbidden Thread.sleep API [lucene]

2024-01-09 Thread via GitHub
mikemccand commented on PR #13001: URL: https://github.com/apache/lucene/pull/13001#issuecomment-1883719867 +1 to merge this, first, so we stop the bleeding (no more `Thread.sleep` added to the code base), and in follow-on PR we can start whittling down the existing grandfather'd `Thread.sl

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2024-01-09 Thread via GitHub
MarcusSorealheis commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1883963445 Great to finally see you in the Lucene repo @kevindrosendahl after all these years. 🍰 The work you have done here is stellar and the whole world welcomes the diligence. I h

Re: [PR] upgrade to OpenNLP 2.3.1 [lucene]

2024-01-09 Thread via GitHub
epugh commented on PR #12674: URL: https://github.com/apache/lucene/pull/12674#issuecomment-1883984387 I don't know about the migration side specifically, but based on https://opennlp.apache.org/news/release-200.html and some of the other release notes, here is a first stab. ```

Re: [PR] Lazily write the FST padding byte [lucene]

2024-01-09 Thread via GitHub
github-actions[bot] commented on PR #12981: URL: https://github.com/apache/lucene/pull/12981#issuecomment-1883992139 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi