[I] SegmentDocValuesProducer checkIntegrity might open a dropped segment [lucene]

2024-01-16 Thread via GitHub
noAfraidStart opened a new issue, #13020: URL: https://github.com/apache/lucene/issues/13020 ### Description We are using HDFS for file storage and the softUpdateDocuments interface for writing data. We have found that during concurrent writes, the dvd files selected for merging

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-16 Thread via GitHub
mayya-sharipova commented on code in PR #12962: URL: https://github.com/apache/lucene/pull/12962#discussion_r1454345777 ## lucene/core/src/java/org/apache/lucene/util/hnsw/BlockingFloatHeap.java: ## @@ -0,0 +1,190 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Split taxonomy arrays across chunks [lucene]

2024-01-16 Thread via GitHub
msfroh commented on code in PR #12995: URL: https://github.com/apache/lucene/pull/12995#discussion_r1454324230 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java: ## @@ -68,25 +90,49 @@ public TaxonomyIndexArrays(IndexReader reader, Tax

Re: [PR] Split taxonomy arrays across chunks [lucene]

2024-01-16 Thread via GitHub
msfroh commented on code in PR #12995: URL: https://github.com/apache/lucene/pull/12995#discussion_r1454250264 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyCombined.java: ## @@ -669,20 +668,26 @@ public void testChildrenArraysInvariants() throws Excepti

Re: [PR] LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches [lucene]

2024-01-16 Thread via GitHub
github-actions[bot] commented on PR #12345: URL: https://github.com/apache/lucene/pull/12345#issuecomment-1894720893 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] `SynonymGraphFilter` should read FSTs off-heap? [lucene]

2024-01-16 Thread via GitHub
msfroh commented on issue #13005: URL: https://github.com/apache/lucene/issues/13005#issuecomment-1894635526 I was looking into how to implement this and I think I've mostly got it -- essentially, I would write the `SynonymMap` to a file (which could be an offline operation, basically "prec

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-16 Thread via GitHub
mayya-sharipova commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1894402548 I have done more experiments with different `interval` values: Cohere 786 dims: 1M vectors, k=10, fanout=90 | Interval | Avg visited nodes | QPS| Recall

Re: [I] join: repeat BytesRefHash.sort() in TermsQuery after TermsIncludingScoreQuery [lucene]

2024-01-16 Thread via GitHub
cpoerschke commented on issue #13018: URL: https://github.com/apache/lucene/issues/13018#issuecomment-1894343041 Just spotted that @gf2121's #13014 is also about this? Great, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

[PR] join: avoid repeat BytesRefHash.sort() in TermsQuery after TermsIncludingScoreQuery [lucene]

2024-01-16 Thread via GitHub
cpoerschke opened a new pull request, #13019: URL: https://github.com/apache/lucene/pull/13019 naive change for the #13018 issue. TODOs -- collaboration and pushes to the PR branch welcome * test coverage * confirmation that this would solve the `org.apache.solr.search.join.Scor

[I] join: repeat BytesRefHash.sort() in TermsQuery after TermsIncludingScoreQuery [lucene]

2024-01-16 Thread via GitHub
cpoerschke opened a new issue, #13018: URL: https://github.com/apache/lucene/issues/13018 ### Description * The `TermsIncludingScoreQuery` constructor as per https://github.com/apache/lucene/blob/releases/lucene/9.9.1/lucene/join/src/java/org/apache/lucene/search/join/TermsIncludingSc

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-16 Thread via GitHub
tveasey commented on code in PR #12962: URL: https://github.com/apache/lucene/pull/12962#discussion_r1453341457 ## lucene/core/src/java/org/apache/lucene/util/hnsw/BlockingFloatHeap.java: ## @@ -0,0 +1,190 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-16 Thread via GitHub
tveasey commented on code in PR #12962: URL: https://github.com/apache/lucene/pull/12962#discussion_r1453323532 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -27,25 +29,72 @@ */ public final class TopKnnCollector extends AbstractKnnCollector {

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-16 Thread via GitHub
tveasey commented on code in PR #12962: URL: https://github.com/apache/lucene/pull/12962#discussion_r1453309333 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -27,25 +29,72 @@ */ public final class TopKnnCollector extends AbstractKnnCollector {

Re: [PR] Modernize LineFileDocs. [lucene]

2024-01-16 Thread via GitHub
s1monw commented on PR #12929: URL: https://github.com/apache/lucene/pull/12929#issuecomment-1893520487 I was working on adding some new BWC tests for the parent field and it seems BWC index creation and testing is relying on some things that changed here. Just flagging this for now while I

Re: [PR] Backport #12829 to 9.x [lucene]

2024-01-16 Thread via GitHub
s1monw commented on PR #13013: URL: https://github.com/apache/lucene/pull/13013#issuecomment-1893514491 this doesn't work as I expected. I will close this one for now

Re: [PR] Backport #12829 to 9.x [lucene]

2024-01-16 Thread via GitHub
s1monw closed pull request #13013: Backport #12829 to 9.x URL: https://github.com/apache/lucene/pull/13013

Re: [PR] LUCENE-10366: Override #readVInt and #readVLong for ByteBufferDataInput to avoid the abstraction confusion of #readByte. [lucene]

2024-01-16 Thread via GitHub
uschindler commented on PR #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-1893411279 Thanks!

Re: [PR] Prevent parent field from added to existing index [lucene]

2024-01-16 Thread via GitHub
s1monw merged PR #13016: URL: https://github.com/apache/lucene/pull/13016

Re: [I] DV update files referenced by merge will be deleted by concurrent flush [lucene]

2024-01-16 Thread via GitHub
guojialiang92 commented on issue #13015: URL: https://github.com/apache/lucene/issues/13015#issuecomment-1893384217 I saw a similar issue [11751](https://github.com/apache/lucene/issues/11751).

Re: [PR] LUCENE-10366: Override #readVInt and #readVLong for ByteBufferDataInput to avoid the abstraction confusion of #readByte. [lucene]

2024-01-16 Thread via GitHub
gf2121 merged PR #592: URL: https://github.com/apache/lucene/pull/592

[PR] Fix DV update files referenced by merge will be deleted by concurrent flush [lucene]

2024-01-16 Thread via GitHub
guojialiang92 opened a new pull request, #13017: URL: https://github.com/apache/lucene/pull/13017 ### Description This PR aims to address issue #13015. A more detailed explanation of the issue and the reasoning behind the fix can be found in the report link above. ### Soluti

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-16 Thread via GitHub
jimczi commented on code in PR #12962: URL: https://github.com/apache/lucene/pull/12962#discussion_r1453088407 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -27,25 +29,72 @@ */ public final class TopKnnCollector extends AbstractKnnCollector {

[I] DV update files referenced by merge will be deleted by concurrent flush [lucene]

2024-01-16 Thread via GitHub
guojialiang92 opened a new issue, #13015: URL: https://github.com/apache/lucene/issues/13015 ### Description DV update files referenced by merge will be deleted by concurrent flush. For example, according to the following execution sequence, the DV update file referenced by merge w