There are also the docValues files to consider (.dvd and .dvm with recent codecs), and their memory requirements depend on how they are set up: with the default format the data stays on disk and is served through the OS disk cache, while heap-resident formats trade disk cache for Java heap. Either way, if you sort or facet on docValues they add to the "hot" part of the index you want cached.
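
To put rough numbers on Toke's rule of thumb below (the heap split is my guess, it was not given in the thread):

  185 GB index x 50-100%        -> roughly 90-185 GB of disk cache for "ideal" speed
  18 GB RAM - JVM heap and OS   -> perhaps 10-12 GB left over for the cache

That leaves the cache covering well under 10% of the index, which fits the constant 40-60 MB/s of reads Po-Yu is seeing. A quick script for the per-extension breakdown Michael describes is at the bottom of this mail.
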
Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 29 November 2014 at 13:16, Michael Sokolov
<msoko...@safaribooksonline.com> wrote:
> Of course testing is best, but you can also get an idea of the size of the
> non-storage part of your index by looking in the Solr index folder and
> subtracting the size of the files containing the stored fields from the
> total size of the index. This depends of course on the internal storage
> strategy of Lucene and may change from release to release, but it is
> documented. The .fdt and .fdx files are the stored field files (currently,
> at least, and if you don't have everything in a compound file). If you are
> indexing term vectors (.tvd and .tvf files) as well, I think these may
> also be excluded from the index size when calculating the required memory,
> at least based on typical usage patterns for term vectors
> (i.e. highlighting).
>
> I wonder if there's any value in providing this metric (total index size -
> stored field size - term vector size) as part of the admin panel? Is it
> meaningful? It seems like there would be a lot of cases where it could
> give a good rule of thumb for memory sizing, and it would save having to
> root around in the index folder.
>
> -Mike
>
>
> On 11/29/14 12:16 PM, Erick Erickson wrote:
>>
>> bq: You should have memory to fit your whole database in disk cache and
>> then some more.
>>
>> I have to disagree here, if for no other reason than that stored data,
>> which is irrelevant for searching, may make up virtually none or
>> virtually all of your on-disk space. Saying it all needs to fit in the
>> disk cache is too broad-brush a statement; you have to test.
>>
>> In this case, though, I _do_ think that there's not enough memory here;
>> Toke's comments are spot on.
>>
>> On Sat, Nov 29, 2014 at 2:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
>> wrote:
>>>
>>> Po-Yu Chuang [ratbert.chu...@gmail.com] wrote:
>>>>
>>>> [...] Everything works fine now, but I noticed that the load
>>>> average of the server is high because there is constantly
>>>> heavy disk read access. Please point me in some direction.
>>>> RAM: 18G
>>>> Solr home: 185G
>>>> disk read access constantly 40-60M/s
>>>
>>> Solr search performance is tightly coupled to the speed of small random
>>> reads. There are two obvious ways of ensuring that these days:
>>>
>>> 1) Add more RAM to the server, so that the disk cache can hold a larger
>>> part of the index. If you add enough RAM (it depends on your index, but
>>> 50-100% of the index size is a rule of thumb), you get "ideal" storage
>>> speed, by which I mean that the bottleneck moves away from storage. If
>>> you are using spinning drives, 18GB of RAM is not a lot for a 185GB
>>> index.
>>>
>>> 2) Use SSDs instead of spinning drives (if you do not already do so).
>>> The speed-up depends a lot on what you are doing, but it is a cheap
>>> upgrade and it can later be coupled with extra RAM if it is not enough
>>> in itself.
>>>
>>> The Solr Wiki has this:
>>> https://wiki.apache.org/solr/SolrPerformanceProblems
>>> And I have this:
>>> http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/
>>>
>>> - Toke Eskildsen
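
P.S. Here is a rough sketch of Michael's "total - stored fields - term vectors" estimate, with docValues broken out as well. The extensions (.fdt/.fdx, .tvx/.tvd/.tvf, .dvd/.dvm) are the current defaults and may change between releases; if the segments use compound files (.cfs/.cfe) the breakdown is hidden and this will not tell you much. The script is just an illustration, not something shipped with Solr.

#!/usr/bin/env python
# Sum a Lucene/Solr index directory by file extension and subtract the
# stored-field and term-vector files to approximate the part of the index
# that benefits most from the OS disk cache. Extension lists are based on
# current default codecs and are not guaranteed to stay stable.
import os
import sys
from collections import defaultdict

STORED = {'.fdt', '.fdx'}                 # stored fields (data + index)
TERM_VECTORS = {'.tvx', '.tvd', '.tvf'}   # term vectors
DOC_VALUES = {'.dvd', '.dvm'}             # docValues (data + metadata)

def sizes_by_extension(index_dir):
    """Return {extension: total bytes} for a flat Lucene index directory."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            totals[os.path.splitext(name)[1]] += os.path.getsize(path)
    return totals

if __name__ == '__main__':
    totals = sizes_by_extension(sys.argv[1])   # e.g. the core's data/index
    total = sum(totals.values())
    stored = sum(s for ext, s in totals.items() if ext in STORED)
    tv = sum(s for ext, s in totals.items() if ext in TERM_VECTORS)
    dv = sum(s for ext, s in totals.items() if ext in DOC_VALUES)
    gb = 1024.0 ** 3
    print("total index        : %6.1f GB" % (total / gb))
    print("  stored fields    : %6.1f GB" % (stored / gb))
    print("  term vectors     : %6.1f GB" % (tv / gb))
    print("  docValues        : %6.1f GB" % (dv / gb))
    print("rough 'hot' size   : %6.1f GB" % ((total - stored - tv) / gb))

Run it against each core's data/index directory and compare the last line with the RAM you can actually spare for the disk cache.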