mikemccand opened a new issue, #14182:
URL: https://github.com/apache/lucene/issues/14182

   ### Description
   
   When trying to understand why a shard seems not to be merging well, it's 
surprisingly difficult to gain visibility / understanding, e.g. cases like 
https://github.com/apache/lucene/pull/14163 and 
https://github.com/apache/lucene/issues/13226.
   
   At Amazon Product Search, we are also trying to understand how our service 
behaves under update storms (many sudden real-time catalog updates), and its 
impact on merging / NRT segment replication.
   
   `IndexWriter` has an `InfoStream` which gives amazing verbosity on all that 
is happening, but it is too voluminous.
   
   I'd think we could make a small improvement to `InfoStream`.  Today, it 
writes under different components, e.g. `SM` for segment merging.  I'd like to 
add a new component, `ST` (for "segment tracing"), which provides a small 
amount of output about:
   
   - each flush: start and end, size, deletes
   - each merge: start and end, which segments, how many deletes at the start, 
     how many carryover deletes (deletes that happened while the merge was 
     running), when deletes are applied/written, and time to merge each index 
     section (doc values, postings, knn, etc.)
   
   IW/SM already write much of this to `InfoStream`, but it's too scattered / 
diffuse.  I'm hoping a new `ST` component can be lighter-weight while carrying 
the important debugging details that can help us understand issues like the 
ones linked/described above.  An application can set an `InfoStream` that 
captures just the `ST` messages ...
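
   A sketch of that filtering, relying only on the `isEnabled(component)` / 
`message(component, message)` shape of Lucene's 
`org.apache.lucene.util.InfoStream`. To stay self-contained it does not extend 
the real class. `ComponentFilterStream` is a hypothetical name, and `ST` is the 
component proposed here, not one Lucene emits today:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical InfoStream-like sink that keeps only one component's messages.
// Mirrors the isEnabled/message pair of org.apache.lucene.util.InfoStream,
// but is a standalone class so it compiles without Lucene on the classpath.
class ComponentFilterStream {
  private final String wanted;
  private final List<String> captured = new ArrayList<>();

  ComponentFilterStream(String wanted) {
    this.wanted = wanted;
  }

  // Only the wanted component (e.g. the proposed "ST") is enabled.
  boolean isEnabled(String component) {
    return wanted.equals(component);
  }

  // Records the message only if its component is enabled.
  void message(String component, String message) {
    if (isEnabled(component)) {
      captured.add(component + ": " + message);
    }
  }

  List<String> captured() {
    return captured;
  }
}
```

   In a real application the same two methods would go into a subclass of 
`org.apache.lucene.util.InfoStream` (which also needs `close()`), passed to 
`IndexWriterConfig.setInfoStream(...)`. Since `IndexWriter` typically guards 
each message with `isEnabled(...)`, returning `false` for `IW`, `SM`, etc. 
should also skip the cost of formatting those messages.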
   
   Once we have this, the 2nd part of this effort is a simple tool that can 
digest the `ST` `InfoStream` output and visualize it, e.g. producing [videos 
like this one](https://youtu.be/YOklKW9LJNY?si=huEMNO8S2uVw_Djn) and maybe a 2D 
interactive canvas/chart that lays out a graphical rendition of all segments 
and their lifetimes.
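
   As a possible first step toward such a tool, here is a minimal sketch that 
turns segment "born" / "merged away" events into per-segment lifetimes, which 
is the data a lifetime chart would plot. The `Event` shape is an assumption, 
since the `ST` output format is not defined yet, and `SegmentLifetimes` is a 
hypothetical name:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical digest step for ST traces: segment birth/death events -> lifetimes.
class SegmentLifetimes {
  // BORN: segment created by a flush or as a merge result.
  // MERGED_AWAY: segment replaced by a merge.
  enum Kind { BORN, MERGED_AWAY }

  // One parsed trace event (assumed format, not real ST output).
  record Event(long timeMillis, Kind kind, String segment) {}

  // end == -1 means the segment is still live at the end of the trace.
  record Lifetime(String segment, long start, long end) {}

  // Assumes events are sorted by time.
  static Map<String, Lifetime> compute(List<Event> events) {
    // LinkedHashMap keeps segments in birth order, handy for plotting one row each.
    Map<String, Lifetime> lifetimes = new LinkedHashMap<>();
    for (Event e : events) {
      switch (e.kind()) {
        case BORN -> lifetimes.put(e.segment(), new Lifetime(e.segment(), e.timeMillis(), -1));
        case MERGED_AWAY -> {
          Lifetime open = lifetimes.get(e.segment());
          // Skip segments that were born before the trace started.
          if (open != null) {
            lifetimes.put(e.segment(), new Lifetime(e.segment(), open.start(), e.timeMillis()));
          }
        }
      }
    }
    return lifetimes;
  }
}
```

   Feeding it a flush of `_0` at t=0, a flush of `_1` at t=10, and a merge at 
t=20 that replaces both with `_2` gives lifetimes [0, 20] for `_0`, [10, 20] 
for `_1`, and [20, still live] for `_2`, i.e. one horizontal bar per segment 
on a time axis.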

