Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

via GitHub Fri, 07 Feb 2025 14:55:35 -0800


jpountz commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2644276979


   > The resistance to it then and still now surprises me because (at least in 
my mind) there's a simple selector mechanism.
   
   I agree with the value of routing to different segments based on the value 
of a field, e.g. it's probably a good idea for e-commerce catalogs to have 
their biggest categories extracted to their own Lucene indexes, but I still 
think that this should be implemented on top of `IndexWriter` rather than 
within `IndexWriter`. Doing it in `IndexWriter` is not simpler: it forces 
flushing to become aware of the routing (because flushing the largest DWPT is 
no longer the best approach when routing is at play), and it forces merging to 
become aware of the routing (to preserve the routing). I like that doing it on 
top of `IndexWriter` naturally decouples it from flushing/merging, which 
in turn makes it easier to do smarter things like learning a good grouping 
function along the way based on the data. I understand that Solr, luceneserver, 
Elasticsearch and OpenSearch are based on the assumption that a shard maps to a 
single Lucene index, and assumptions are hard to change, but it still looks to 
me like a better problem to solve.
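
   As a rough illustration of routing on top of `IndexWriter`, a minimal sketch 
could look like the following. It is not Lucene API: `DocSink` and 
`RoutingWriter` are hypothetical names, and `DocSink` only stands in for one 
Lucene index (a real implementation would presumably delegate to that index's 
own `IndexWriter`). Each index keeps its own flushing and merging, so neither 
has to know about the routing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for one Lucene index; a real sink would wrap an IndexWriter. */
interface DocSink<D> {
    void add(D doc);
}

/**
 * Hypothetical router that sits above the per-index writers.
 * Documents whose routing key has a dedicated sink go there;
 * everything else goes to a shared fallback sink.
 */
final class RoutingWriter<D> {
    private final Function<D, String> routingKey;
    private final Map<String, DocSink<D>> dedicated;
    private final DocSink<D> fallback;

    RoutingWriter(Function<D, String> routingKey,
                  Map<String, DocSink<D>> dedicated,
                  DocSink<D> fallback) {
        this.routingKey = routingKey;
        this.dedicated = new HashMap<>(dedicated); // defensive copy
        this.fallback = fallback;
    }

    /** Picks the target sink from the document's routing key and forwards the document. */
    void add(D doc) {
        DocSink<D> sink = dedicated.getOrDefault(routingKey.apply(doc), fallback);
        sink.add(doc);
    }
}
```

   Because the grouping function is just a parameter here, it could be swapped 
for one learned from the data without touching the writers underneath.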


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

