Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2025-02-07 Thread via GitHub
jpountz commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2644276979 > The resistance to it then and still now surprises me because (at least in my mind) there's a simple selector mechanism. I agree with the value of routing to different segmen

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2025-02-06 Thread via GitHub
dsmiley commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2641899421 Addressing this need would be amazing! Many search architectures (including where I work) always filter to a specific field (say a doc type or tenant/user; it depends). That 50-60

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2025-01-23 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2609475069 Makes sense. I think we can extend MultiReader functionality to use it as a combined view if we can support a couple of read-side features of IndexWriter, like opening a reader from

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2025-01-22 Thread via GitHub
vigyasharma commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2608302325 > Having a Multi-Reader on all the child log-group directories still won't provide a unified view of all group level segments associated with a Lucene Index. Even now, OpenSearc

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2025-01-21 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2605229476 Or, to elaborate further: "Searching across the N separate shards as if they were a single index is also possible via MultiReader" will require separate Lucene indexes for differe

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-12-12 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2539436869 @vigyasharma @jpountz @mikemccand Any thoughts on the above approach of using multiple IndexWriters for different groups (tenants) with a read-only combined view? -- This is an

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-11-21 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2492296015 > Does the OpenSearch client directly work with 'n' different log-group specific IndexWriters? While writing logs, OpenSearch will interact with 'n' different log-group spe

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-11-12 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2470112408 After some more analysis, I figured out an approach that addresses all the above comments and obtains the same improvement with a different IndexWriter for each group as we got with usi

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-20 Thread via GitHub
vigyasharma commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2364348826 > 3\. Does require a new merge policy to merge the segments belonging to the same group. How do background index merges work with the original, separate DWPT based approa

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-19 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360645812 ## Approach 2: Using a physical directory for each group ![approach2](https://github.com/user-attachments/assets/223686c4-5c0c-49c1-b54c-1aee22a2d1bf) To segregate s

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-19 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360651201 ## Summary In summary, the problem can be broken down into three sub-problems. 1. Having an abstraction to write the data into different groups (Multiple Writers) 2.

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-19 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360649893 ## Approach 3: Combining group level IndexWriter with addIndexes ![approach3](https://github.com/user-attachments/assets/32ea3baa-0ae6-4a60-84e9-352a0e1e6a5e) In thi

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-19 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360641099 Thanks [mikemccand](https://github.com/mikemccand) and [vigyasharma](https://github.com/vigyasharma) for the suggestions. I evaluated different approaches to use different IndexWriter

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-16 Thread via GitHub
vigyasharma commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2354135495 I wonder if we can leverage IndexWriter's `addIndexes(Directory... dirs)` [API](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWrite

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-07-02 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2194731622 Thanks a lot for the suggestions, @jpountz and @mikemccand. As suggested above, we worked on a POC to explore using a separate IndexWriter for each group. Each IndexWriter

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-06-04 Thread via GitHub
jpountz commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2147657867 > do we have such a class already (that would distinguish the tenants via filename prefix or so)? That's a nice idea all by itself (separate from this use case) -- maybe open a spin

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-06-03 Thread via GitHub
mikemccand commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2145162839 I like @jpountz's idea of just using separate `IndexWriter`s for this use-case, instead of adding custom routing logic to the separate DWPTs inside a single `IndexWriter` and the

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-06-02 Thread via GitHub
jpountz commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2144402163 > However, implementing this approach would lead to significant overhead on the client side (such as OpenSearch), both in terms of code changes and operational overhead like meta

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-06-02 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2144396439 Attaching a [preliminary PR](https://github.com/apache/lucene/pull/13409) for the POC related to the above issue to share my understanding. Please note that this is not the final PR.

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-06-02 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2144392824 Attaching a [preliminary PR](https://github.com/apache/lucene/pull/13409) for the POC related to the above issue to share my understanding. Please note that this is not the final PR.

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-05-27 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2133354180 > I agree that better organizing data across segments yields significant benefits, I'm only advocating for doing this by maintaining a separate IndexWriter for each group instead

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-05-24 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2129984751 Thanks for the suggestion. The above suggestion for clustering within the segment does improve skipping of documents (especially when combined with [BKD optimisation](https://github

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-05-23 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2126884759 Thanks Mike and Adrian for the feedback. > You do not mention it explicitly in the issue description, but presumably this only makes sense if an index sort is configured, o

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-05-22 Thread via GitHub
jpountz commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2125281483 This is an interesting idea! You do not mention it explicitly in the issue description, but presumably this only makes sense if an index sort is configured, otherwise merges m

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-05-21 Thread via GitHub
mikemccand commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2122642094 I like this idea! I hope we can find a simple enough API exposed through IWC to enable the optional grouping. This also has nice mechanical sympathy / symmetry with the di

[I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-05-20 Thread via GitHub
RS146BIJAY opened a new issue, #13387: URL: https://github.com/apache/lucene/issues/13387 ### Description ## Issue Today, Lucene internally creates multiple DocumentsWriterPerThread (DWPT) instances per shard to facilitate concurrent indexing across different ingestion threads.