Of course testing is best, but you can also get an idea of the size of
the non-storage part of your index by looking in the solr index folder
and subtracting the size of the files containing the stored fields from
the total size of the index. This depends of course on the internal
storage strategy of Lucene and may change from release to release, but
it is documented. The .fdt and .fdx files are the stored field files
(currently, at least, and if you don't have everything in a compound
file). If you are indexing term vectors (.tvd and .tvf files) as well,
I think these may also be able to be excluded from the index size also
when calculating the required memory, at least based on typical usage
patterns for term vectors (ie highlighting).
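Something like this rough sketch (Python) could do the subtraction directly
against an index directory. The extension list is my assumption (I've added
.tvx alongside the term vector files mentioned above), the path in the
comment is just an example, and it ignores compound-file (.cfs) indexes:

#!/usr/bin/env python
# Rough estimate of the part of a Lucene/Solr index that benefits from the
# disk cache: total index size minus stored-field and term-vector files.
# Assumes a non-compound-file index and the usual codec extensions.
import os
import sys

# Extensions assumed excludable: stored fields (.fdt/.fdx) and
# term vectors (.tvd/.tvf, plus .tvx for the term vector index).
EXCLUDED = {'.fdt', '.fdx', '.tvd', '.tvf', '.tvx'}

def index_sizes(index_dir):
    total = excluded = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        total += size
        if os.path.splitext(name)[1] in EXCLUDED:
            excluded += size
    return total, excluded

if __name__ == '__main__':
    # Example: python index_size.py /var/solr/data/collection1/data/index
    total, excluded = index_sizes(sys.argv[1])
    print('total index size:           %.1f GB' % (total / 1e9))
    print('stored fields/term vectors: %.1f GB' % (excluded / 1e9))
    print('estimated size to cache:    %.1f GB' % ((total - excluded) / 1e9))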
I wonder if there's any value in providing this metric (total index size
- stored field size - term vector size) as part of the admin panel? Is
it meaningful? It seems like there would be a lot of cases where it
could give a good rule of thumb for memory sizing, and it would save
having to root around in the index folder.
-Mike
On 11/29/14 12:16 PM, Erick Erickson wrote:
bq: You should have memory to fit your whole database in disk cache and then
some more.
I have to disagree here, if for no other reason than that stored data,
which is irrelevant for searching, may make up virtually none or
virtually all of your on-disk space. Saying it all needs to fit in disk
cache is too broad-brush a statement; gotta test.
In this case, though, I _do_ think there's not enough memory here; Toke's
comments are spot on.
On Sat, Nov 29, 2014 at 2:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
Po-Yu Chuang [ratbert.chu...@gmail.com] wrote:
[...] Everything works fine now, but I noticed that the load
average of the server is high because there is constantly
heavy disk read access. Please point me in the right direction.
RAM: 18G
Solr home: 185G
disk read access constantly 40-60M/s
Solr search performance is tightly coupled to the speed of small random reads.
There are two obvious ways of ensuring that these days:
1) Add more RAM to the server, so that the disk cache can hold a larger
part of the index. If you add enough RAM (it depends on your index, but
50-100% of the index size is a rule of thumb; see the quick arithmetic
after this list), you get "ideal" storage speed, by which I mean that the
bottleneck moves away from storage. If you are using spinning drives,
18GB of RAM is not a lot for a 185GB index.
2) Use SSDs instead of spinning drives (if you do not already do so). The
speed-up depends a lot on what you are doing, but it is a cheap upgrade and
it can later be coupled with extra RAM if that is not enough in itself.
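A quick back-of-the-envelope check of that rule of thumb against the figures
quoted above (the 50-100% range and the sizes come from this thread; the
snippet itself is just illustrative arithmetic):

# Rule-of-thumb disk cache sizing, using the numbers from the original mail.
index_size_gb = 185   # on-disk index (Solr home) size
ram_gb = 18           # server RAM, the upper bound for the disk cache

low, high = 0.5 * index_size_gb, 1.0 * index_size_gb
print('suggested disk cache: %.0f-%.0f GB, available at most: %d GB'
      % (low, high, ram_gb))
# -> suggested disk cache: 92-185 GB, available at most: 18 GB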
The Solr Wiki has this: https://wiki.apache.org/solr/SolrPerformanceProblems
And I have this: http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/
- Toke Eskildsen