Hi,

On Sat, Nov 29, 2014 at 2:27 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
> On 11/29/14 1:30 PM, Toke Eskildsen wrote:
>
>> Michael Sokolov [msoko...@safaribooksonline.com] wrote:
>>
>>> I wonder if there's any value in providing this metric (total index size
>>> - stored field size - term vector size) as part of the admin panel? Is
>>> it meaningful? It seems like there would be a lot of cases where it
>>> could give a good rule of thumb for memory sizing, and it would save
>>> having to root around in the index folder.
>>>
>> At Lucene/Solr Revolution, I talked with Alexandre Rafalovitch about
>> this. We know
>> (https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/)
>> that we cannot get the full picture of an index, but it is a weekly
>> occurrence on this mailing list that people ask questions where it helps
>> to have a gist of the index metrics and how the index is used.
>>
>> Some sort of "Copy the content of this concentrated metrics box, when you
>> need to talk with other people about your index"-functionality in the admin
>> panel might help with this. To get an idea of usage, it could also contain
>> a few non-filled fields, such as "peak queries per second" or "typical
>> queries".
>>
>> - Toke Eskildsen
>>
> Yes - the cautions about the need for prototyping are all very well, but
> even if you take that advice, and build a prototype, it's not clear how to
> tell whether your setup has enough memory or not. You can add more and
> measure response times, but even then you only have a gross measurement,
> and no way of knowing where, in detail, the memory is being used. Also,
> you might be able to improve your system to make better use of memory with
> more precise information. It seems like we ought to be able to monitor a
> running system, observe its memory requirements over time, and report on
> those.
>

+1 to that!
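For anyone who wants the "total index size - stored field size - term vector size" number today without rooting around by hand, a rough sketch follows. It is only an illustration under some assumptions: the function name `index_size_breakdown` is made up, and the file extensions (.fdt/.fdx for stored fields, .tvd/.tvx/.tvf for term vectors) are the ones I'd expect from the Lucene 4.x file formats. If segments are written as compound files (.cfs), those files are packed inside and the breakdown will not see them, so the remainder would be overstated.

```python
import os
import sys

# Assumed Lucene 4.x extensions; adjust for the codec actually in use.
STORED_FIELD_EXTS = {".fdt", ".fdx"}
TERM_VECTOR_EXTS = {".tvd", ".tvx", ".tvf"}


def index_size_breakdown(index_dir):
    """Sum file sizes in index_dir and split out stored fields and term vectors.

    Returns byte counts; "remainder" is total - stored_fields - term_vectors.
    """
    total = stored = vectors = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        size = os.path.getsize(path)
        ext = os.path.splitext(name)[1].lower()
        total += size
        if ext in STORED_FIELD_EXTS:
            stored += size
        elif ext in TERM_VECTOR_EXTS:
            vectors += size
    return {
        "total": total,
        "stored_fields": stored,
        "term_vectors": vectors,
        "remainder": total - stored - vectors,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python index_size.py /path/to/core/data/index
    for key, value in index_size_breakdown(sys.argv[1]).items():
        print(f"{key}: {value} bytes")
```

The "remainder" figure is at best the rule of thumb Michael describes for how much of the index one would like the OS to cache. It says nothing about JVM heap usage.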
I haven't been following this aspect of development super closely, but I believe there are memory/size estimators for various things at the Lucene level that Elasticsearch is nicely exposing via its stats API. I don't know the specifics of those estimators without digging in; otherwise I'd open a JIRA, because I think this is valuable information. At Sematext we regularly deal with hardware sizing, memory and CPU usage estimates, and so on, so the more of this info is surfaced, the easier it will be for people to work with Solr.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/