jpountz commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2644276979
> The resistance to it then and still now surprises me because (at least in
my mind) there's a simple selector mechanism.
I agree with the value of routing to different segmen
dsmiley commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2641899421
Addressing this need would be amazing! Many search architectures (including
where I work) always filter to a specific field (say a doc type or tenant/user;
it depends). That 50-60
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2609475069
Makes sense. I think we can extend MultiReader functionality to use it as a
combined view if we can support a couple of read-side features of IndexWriter
like opening a reader from
vigyasharma commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2608302325
> Having a Multi-Reader on all the child log-group directories still won't
provide a unified view of all group level segments associated with a Lucene
Index. Even now, OpenSearc
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2605229476
Or to elaborate more ```Searching across the N separate shards as if they
were a single index is also possible via MultiReader``` will require separate
Lucene indexes for differe
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2539436869
@vigyasharma @jpountz @mikemccand Any thoughts on the above approach of
using multiple IndexWriters for different groups (tenants) with a read-only
combined view?
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2492296015
> Does the OpenSearch client directly work with 'n' different log-group
specific IndexWriters?
While writing logs, OpenSearch will interact with 'n' different log-group
spe
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2470112408
Further analysis turned up an approach that addresses all the above
comments and obtains the same improvement with different IndexWriters for
different groups as we got with usi
vigyasharma commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2364348826
> 3\. Does require a new merge policy to merge the segments belonging to the
same group.
How do background index merges work with the original, separate DWPT based
approa
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360645812
## Approach 2: Using a physical directory for each group

To segregate s
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360651201
## Summary
In summary, the problem can be broken down into three sub-problems.
1. Having an abstraction to write the data into different groups (Multiple
Writers)
2.
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360649893
## Approach 3: Combining group level IndexWriter with addIndexes

In thi
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360641099
Thanks [mikemccand](https://github.com/mikemccand) and
[vigyasharma](https://github.com/vigyasharma) for the suggestions. Evaluated
different approaches to use different IndexWriter
vigyasharma commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2354135495
I wonder if we can leverage IndexWriter's `addIndexes(Directory... dirs)`
[API](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWrite
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2194731622
Thanks a lot for the suggestions, @jpountz and @mikemccand.
As suggested above, we worked on a POC to explore using separate IndexWriters
for different groups. Each IndexWriter
jpountz commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2147657867
> do we have such a class already (that would distinguish the tenants via
filename prefix or so)? That's a nice idea all by itself (separate from this
use case) -- maybe open a spin
mikemccand commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2145162839
I like @jpountz's idea of just using separate `IndexWriter`s for this
use-case, instead of adding custom routing logic to the separate DWPTs inside a
single `IndexWriter` and the
jpountz commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2144402163
> However, implementing this approach would lead to significant overhead on
the client side (such as OpenSearch), both in terms of code changes and
operational overhead like meta
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2144396439
Attaching a [preliminary PR](https://github.com/apache/lucene/pull/13409)
for the POC related to the above issue to share my understanding. Please note
that this is not the final PR.
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2144392824
Attaching a [preliminary PR](https://github.com/apache/lucene/pull/13409)
for the POC related to the above issue to share my understanding. Please note
that this is not the final PR.
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2133354180
> I agree that better organizing data across segments yields significant
benefits, I'm only advocating for doing this by maintaining a separate
IndexWriter for each group instead
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2129984751
Thanks for the suggestion. The above suggestion for clustering within the
segment does improve skipping of documents (especially when combined with [BKD
optimisation](https://github
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2126884759
Thanks Mike and Adrian for the feedback.
> You do not mention it explicitly in the issue description, but presumably
this only makes sense if an index sort is configured, o
jpountz commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2125281483
This is an interesting idea!
You do not mention it explicitly in the issue description, but presumably
this only makes sense if an index sort is configured, otherwise merges m
mikemccand commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2122642094
I like this idea! I hope we can find a simple enough API exposed through
IWC to enable the optional grouping.
This also has nice mechanical sympathy / symmetry with the di
RS146BIJAY opened a new issue, #13387:
URL: https://github.com/apache/lucene/issues/13387
### Description
## Issue
Today, Lucene internally creates multiple DocumentWriterPerThread (DWPT)
instances per shard to facilitate concurrent indexing across different
ingestion threads.