benwtrent closed issue #14330: Writing too many identical vector documents can
cause flush blocking
URL: https://github.com/apache/lucene/issues/14330
mikemccand commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2697675763
> [@mikemccand](https://github.com/mikemccand) would you be able to expose
the files [@dsmiley](https://github.com/dsmiley) rescued on your server?
oh, hmm, no I haven't y
benwtrent commented on PR #14304:
URL: https://github.com/apache/lucene/pull/14304#issuecomment-2697969649
I compared this branch with main. There are measurable improvements, but the
quantization step isn't the main bottleneck. Vector comparisons still dominate
the costs. But it's a nice
benwtrent merged PR #14078:
URL: https://github.com/apache/lucene/pull/14078
rmuir commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2697599407
@dweiss we could fetch
https://whimsy.apache.org/public/public_ldap_people.json and retrieve
committer's GPG fingerprint that way?
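A minimal sketch of that lookup, assuming the JSON nests per-committer records under a "people" key with a "key_fingerprints" array (field names unverified), and using Jackson for parsing:
```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FingerprintLookup {
  public static void main(String[] args) throws Exception {
    // Fetch the public LDAP dump published by Whimsy.
    HttpRequest request = HttpRequest.newBuilder(
        URI.create("https://whimsy.apache.org/public/public_ldap_people.json")).build();
    String body = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString()).body();

    // Assumed layout: {"people": {"<apache id>": {"key_fingerprints": [...], ...}}}
    JsonNode person = new ObjectMapper().readTree(body).path("people").path(args[0]);
    for (JsonNode fp : person.path("key_fingerprints")) {
      System.out.println(fp.asText());
    }
  }
}
```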
mikemccand commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2697666568
Oooh we have an official S3 bucket to use now? I had already uploaded the
benchy corpus files to my own S3 bucket ... I think the URLs are in the
setup.py (just renamed to `init
benwtrent commented on issue #14330:
URL: https://github.com/apache/lucene/issues/14330#issuecomment-2697394527
This particular (many duplicate vectors) case is handled here:
https://github.com/apache/lucene/pull/14215
But the overall issue of connectComponents taking until the "heat
msokolov commented on issue #13745:
URL: https://github.com/apache/lucene/issues/13745#issuecomment-2698550635
HNSW vector search heavy lifting is done in `rewrite`, so out of scope for
this, right? Maybe multi-term queries would need to do some work. What about
join queries? TermInSet quer
msokolov commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2698456606
Maybe as a short-term mitigation we should revert or disable the
`connectComponents` impl since its supposed improvements are kind of
theoretical and it comes with a deadly vulnera
msokolov commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2698452319
I tried indexing some [NOAA climate
data](https://www.ncei.noaa.gov/products/land-based-station/noaa-global-temp)
that is four-dimensional (temperature over last 150 years for ever
lpld commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2698569925
Hi @benwtrent
Thanks again for your previous comment. I was able to modify luceneutil and
run some benchmarks. I am quite new to Lucene, so I would appreciate some help
in understan
dweiss commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2698418094
Hmm... 100 GB may be stretching Apache Infra's patience... I don't even know
if this bucket has a limit of some sort.
jimczi commented on PR #14076:
URL: https://github.com/apache/lucene/pull/14076#issuecomment-2698222473
> I do think some APIs like updateReadAdvice and finishMerge are helpful, I
would want to see if we want to keep those and have a noop for this use case.
This API was added in [Lucene 10
github-actions[bot] commented on PR #14203:
URL: https://github.com/apache/lucene/pull/14203#issuecomment-2699339952
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
javanna commented on issue #13745:
URL: https://github.com/apache/lucene/issues/13745#issuecomment-2699189350
> HNSW vector search heavy lifting is done in rewrite, so out of scope for
this, right?
I believe so, mostly because query rewrite does not parallelize on slices,
but across
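For context, a rough sketch of the distinction being drawn (hypothetical `dir` and `query`; an executor-backed IndexSearcher fans collection out across leaf slices, while `rewrite` runs up front, which for kNN queries is where the graph search happens):
```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

public class RewriteVsCollect {
  static TopDocs run(Directory dir, Query query) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader, executor);
      // rewrite happens up front; for kNN queries this is where the
      // vector search heavy lifting runs, outside the per-slice fan-out
      Query rewritten = searcher.rewrite(query);
      // collection is what gets distributed across leaf slices
      return searcher.search(rewritten, 10);
    } finally {
      executor.shutdown();
    }
  }
}
```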
navneet1v commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2699321614
> @benwtrent @navneet1v I wonder if either of you were able to replicate
benchmarks? (FYI I also opened
[facebookresearch/faiss#4186](https://github.com/facebookresearch/faiss/pull/418
uschindler commented on PR #14328:
URL: https://github.com/apache/lucene/pull/14328#issuecomment-2697079923
Looks fine!
dweiss commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2696786476
There are two or three references in test files. There is one reference
remaining in releaseWizard.py:
```
key_url = "https://home.apache.org/keys/committer/%s.asc" % id.strip()
```
DivyanshIITB commented on PR #14325:
URL: https://github.com/apache/lucene/pull/14325#issuecomment-2696661645
Thank you for your feedback! I understand your concern that
KeepOnlyLastCommit might imply retaining only a single commit. My intention
behind modifying this policy to retain the la
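Not the code under discussion in this PR, but as a sketch of the general pattern, a custom IndexDeletionPolicy that retains the newest N commits could look like this:
```java
import java.util.List;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexDeletionPolicy;

// Sketch only: keep the newest n commits, delete everything older.
public class KeepLastNCommitsPolicy extends IndexDeletionPolicy {
  private final int n;

  public KeepLastNCommitsPolicy(int n) {
    this.n = n;
  }

  @Override
  public void onInit(List<? extends IndexCommit> commits) {
    onCommit(commits);
  }

  @Override
  public void onCommit(List<? extends IndexCommit> commits) {
    // Commits are ordered oldest to newest; delete all but the last n.
    for (int i = 0; i < commits.size() - n; i++) {
      commits.get(i).delete();
    }
  }
}
```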
javanna commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2697098397
I agree with everything you wrote above, @uschindler!
I can try and update my existing PR targeted at suggestion fields (#14270),
following your suggested approach. The PR current
ChrisHegarty commented on PR #14275:
URL: https://github.com/apache/lucene/pull/14275#issuecomment-2697094951
> My proposal would be: Let's add some key-value pairs of "codec options"
like done in Analyzers, that can be passed as part of the IndexWriterConfig
(while writing) or passed to Di
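To make the shape of that proposal concrete, a purely hypothetical sketch (no `setCodecOption` exists in Lucene today; `analyzer` and `dir` are placeholders):
```java
// Hypothetical API, shown only to illustrate the key-value proposal above.
IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
iwc.setCodecOption("vectors.quantization", "int7");  // hypothetical option key/value
iwc.setCodecOption("postings.blockSize", "128");     // hypothetical option key/value
IndexWriter writer = new IndexWriter(dir, iwc);
```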
weizijun opened a new issue, #14330:
URL: https://github.com/apache/lucene/issues/14330
### Description
I found a serious bad case: writing documents that all contain the same
vector causes the flush to block.
The cost comes from the `connectComponents` process.
When all vectors are the s
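A minimal sketch reproducing the described pattern (ByteBuffersDirectory, 128 dimensions, and 100k docs are placeholder choices, not the reporter's setup):
```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnFloatVectorField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DuplicateVectorRepro {
  public static void main(String[] args) throws Exception {
    float[] same = new float[128]; // every document gets this identical vector
    same[0] = 1f;
    try (Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      for (int i = 0; i < 100_000; i++) {
        Document doc = new Document();
        doc.add(new KnnFloatVectorField("vec", same));
        writer.addDocument(doc);
      }
      writer.commit(); // the flush here is where connectComponents was reported to stall
    }
  }
}
```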
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2697323024
@benwtrent @navneet1v I wonder if either of you were able to replicate
benchmarks?
(FYI I also opened https://github.com/facebookresearch/faiss/pull/4186 to
start publishing the C_AP
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2697320627
Summary of latest changes:
1. Added tests! These will only run if `libfaiss_c.so` (along with all
dependencies) is present during runtime (in `$LD_LIBRARY_PATH` or
`-Djava.library.pa
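The PR has its own mechanism for this; purely as an illustration of the idea, one way to gate JUnit 4 tests on a native library being resolvable at runtime:
```java
import org.junit.Assume;
import org.junit.Before;

public class FaissTestGate {
  // Sketch: returns true only if libfaiss_c.so (or the platform
  // equivalent) resolves via java.library.path / LD_LIBRARY_PATH.
  static boolean faissAvailable() {
    try {
      System.loadLibrary("faiss_c");
      return true;
    } catch (UnsatisfiedLinkError e) {
      return false;
    }
  }

  @Before
  public void requireNativeFaiss() {
    Assume.assumeTrue("libfaiss_c not found, skipping", faissAvailable());
  }
}
```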
msokolov commented on issue #13647:
URL: https://github.com/apache/lucene/issues/13647#issuecomment-2698449439
I guess this 94GB comes from 33M*768*4 bytes? Frankly I never test with
indexes > ~2M docs, but maybe there is a call for the 33M-doc index in
nightlies?
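(For reference, the arithmetic: 33,000,000 docs × 768 dims × 4 bytes/float = 101,376,000,000 bytes ≈ 94.4 GiB, consistent with the 94 GB figure.)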
benwtrent commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2698684914
@lpld here is my Lucene util changes:
https://github.com/mikemccand/luceneutil/pull/348
> What exactly do the numbers in the description of this pull request mean?
When you say
benwtrent commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2698475184
The goal of connectComponents is to help graphs that have gaps in their
connectivity. However, when it's needed most (e.g. tons of gaps and poor
connectivity), it does more harm th
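As a conceptual illustration only (this is not Lucene's HnswGraphBuilder code), the connectivity problem connectComponents targets amounts to counting components of the neighbor graph: if there is more than one, some vectors are unreachable from the entry point.
```java
import java.util.ArrayDeque;
import java.util.Deque;

public class GraphConnectivity {
  // Count connected components of a graph given as adjacency lists,
  // via iterative depth-first traversal.
  static int countComponents(int[][] neighbors) {
    boolean[] seen = new boolean[neighbors.length];
    int components = 0;
    for (int start = 0; start < neighbors.length; start++) {
      if (seen[start]) continue;
      components++; // found a node no prior traversal reached
      Deque<Integer> stack = new ArrayDeque<>();
      stack.push(start);
      seen[start] = true;
      while (!stack.isEmpty()) {
        for (int neighbor : neighbors[stack.pop()]) {
          if (!seen[neighbor]) {
            seen[neighbor] = true;
            stack.push(neighbor);
          }
        }
      }
    }
    return components;
  }
}
```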