Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-03-05 Thread via GitHub
github-actions[bot] commented on PR #13914: URL: https://github.com/apache/lucene/pull/13914#issuecomment-2702387083 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2025-03-05 Thread via GitHub
lpld commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2702238939 @benwtrent Thanks for your response, it was quite helpful. Could you please also share other parameters of your benchmark (ndoc, maxConn, beamWidthIndex, fanout, etc.) ? I was able t

Re: [I] develocity build scans fail to upload sometimes [lucene]

2025-03-05 Thread via GitHub
dweiss commented on issue #14305: URL: https://github.com/apache/lucene/issues/14305#issuecomment-2702086799 Still not working - https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/lastBuild/console ``` The Develocity server (develocity.apache.org) rejected the

Re: [I] Add a workflow generating gh stats for board summary reports [lucene]

2025-03-05 Thread via GitHub
dweiss commented on issue #14332: URL: https://github.com/apache/lucene/issues/14332#issuecomment-2702057560 Ok, working now. ![Image](https://github.com/user-attachments/assets/3ab11c13-69c2-404d-85d2-5c604da545fe) -- This is an automated message from the Apache Git Service. To re

Re: [I] Add a workflow generating gh stats for board summary reports [lucene]

2025-03-05 Thread via GitHub
dweiss closed issue #14332: Add a workflow generating gh stats for board summary reports URL: https://github.com/apache/lucene/issues/14332

Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]

2025-03-05 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2702007572 I've downloaded and moved all those data sets that were present in gradle build files (specifically, in external-datasets.gradle). If there is anything else I should place there, let

Re: [I] Add a workflow generating gh stats for board summary reports [lucene]

2025-03-05 Thread via GitHub
dweiss commented on issue #14332: URL: https://github.com/apache/lucene/issues/14332#issuecomment-2701997460 Oops, something is not working with gh cli. Looking.

Re: [I] Add a workflow generating gh stats for board summary reports [lucene]

2025-03-05 Thread via GitHub
dweiss closed issue #14332: Add a workflow generating gh stats for board summary reports URL: https://github.com/apache/lucene/issues/14332

Re: [I] Add a workflow generating gh stats for board summary reports [lucene]

2025-03-05 Thread via GitHub
dweiss commented on issue #14332: URL: https://github.com/apache/lucene/issues/14332#issuecomment-2701993756 https://github.com/apache/lucene/actions/workflows/activity-report.yml You have to run it manually, providing the time window: ![Image](https://github.com/user-attachmen

[I] Add a workflow generating gh stats for board summary reports [lucene]

2025-03-05 Thread via GitHub
dweiss opened a new issue, #14332: URL: https://github.com/apache/lucene/issues/14332 ### Description Apache's reporting utility is currently broken. I wrote a small gh workflow to generate the stats on demand.

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-05 Thread via GitHub
msokolov commented on PR #14331: URL: https://github.com/apache/lucene/pull/14331#issuecomment-2701861504 oh, this is a neat idea! Looks like we sacrifice some query performance (in some cases) for a big improvement in indexing time. I wonder if we've tried other values of `beamWidth` to se

Re: [PR] introduce new parameter onlyLongestMatchNoSubwords replacing onlyLongestMatch [lucene]

2025-03-05 Thread via GitHub
uschindler commented on PR #14311: URL: https://github.com/apache/lucene/pull/14311#issuecomment-2701841157 > @renatoh that seems fine, If you have a way to do it so it works. Because they are just booleans I wasn't sure? > > I also can't remember if there is a way that you can signal

[PR] Speedup merging of HNSW graphs [lucene]

2025-03-05 Thread via GitHub
mayya-sharipova opened a new pull request, #14331: URL: https://github.com/apache/lucene/pull/14331 Currently when doing merging of HNSW graphs incrementally, we first initialize a graph from the biggest segment, and for other segments, we rebuild the graphs completely by going through

Re: [PR] Speedup merging of HNSW graphs [lucene]

2025-03-05 Thread via GitHub
mayya-sharipova commented on PR #14331: URL: https://github.com/apache/lucene/pull/14331#issuecomment-2701756507 Evaluation is done with Luceneutil on these datasets: Rebased against Lucene main branch: 1. **quora-E5-small**; 522931 docs; 384 dims; 7 bits quantized; cosine metri

Re: [PR] introduce new parameter onlyLongestMatchNoSubwords replacing onlyLongestMatch [lucene]

2025-03-05 Thread via GitHub
renatoh commented on PR #14311: URL: https://github.com/apache/lucene/pull/14311#issuecomment-2701699595 > @renatoh that seems fine, If you have a way to do it so it works. Because they are just booleans I wasn't sure? > > I also can't remember if there is a way that you can signal a

Re: [PR] introduce new parameter onlyLongestMatchNoSubwords replacing onlyLongestMatch [lucene]

2025-03-05 Thread via GitHub
rmuir commented on PR #14311: URL: https://github.com/apache/lucene/pull/14311#issuecomment-2701555710 @renatoh that seems fine, If you have a way to do it so it works. Because they are just booleans I wasn't sure? I also can't remember if there is a way that you can signal a deprecat

Re: [PR] introduce new parameter onlyLongestMatchNoSubwords replacing onlyLongestMatch [lucene]

2025-03-05 Thread via GitHub
renatoh commented on PR #14311: URL: https://github.com/apache/lucene/pull/14311#issuecomment-2701225973 @rmuir Sorry for rushing, but have you seen my suggestion regarding deprecating the constructor?

Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]

2025-03-05 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2701189251 @dweiss I can't tell from above -- are there other corpora that need a home still?

Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]

2025-03-05 Thread via GitHub
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2701096339 > I guess this 94GB comes from `33M x 768 x 4` bytes? Frankly I never test with indexes > ~2M docs, but maybe there is a call for the 33M-doc index in nightlies? Yeah ...
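
The `33M x 768 x 4` estimate quoted in the last message can be checked with a few lines of arithmetic. This is only a sketch under assumptions the thread does not spell out: that 33M means 33,000,000 vectors, that each of the 768 dimensions takes 4 bytes (float32), and that "GB" is read as GiB (2**30 bytes). The function and constant names are illustrative, not taken from Lucene or luceneutil.

```python
# Sanity check of the quoted `33M x 768 x 4` size estimate.
# Assumptions (not stated in the thread): 33M = 33,000,000 vectors,
# 4 bytes per dimension (float32), and "GB" read as GiB (2**30 bytes).
NUM_VECTORS = 33_000_000
DIMS = 768
BYTES_PER_DIM = 4


def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_dim: int) -> int:
    """Size of the raw vector data only, ignoring any index overhead."""
    return num_vectors * dims * bytes_per_dim


total_bytes = raw_vector_bytes(NUM_VECTORS, DIMS, BYTES_PER_DIM)
total_gib = total_bytes / 2**30

print(f"{total_bytes:,} bytes = {total_gib:.1f} GiB")
# prints: 101,376,000,000 bytes = 94.4 GiB
```

Under those assumptions the raw vectors come to about 94.4 GiB, which matches the 94GB figure in the quote; read as decimal gigabytes the same data would be about 101 GB.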