If you’re using Solr 8.2 or newer, there’s a built-in index analysis tool that
gives you a better understanding of what kind of data in your index occupies
the most disk space, so that you can tweak your schema accordingly:
https://lucene.apache.org/solr/guide/8_2/collection-management.html#colst
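For the archives, here is a rough Python sketch of calling that tool through
the Collections API. The host, port, collection name ("techproducts"), and the
rawSize flags are placeholders based on that page; check the guide for the
exact parameters your version supports.

# Sketch only: ask COLSTATUS for a raw-size breakdown of the index.
# localhost:8983 and the collection name are assumptions; adjust to taste.
import json
from urllib.request import urlopen
from urllib.parse import urlencode

params = urlencode({
    "action": "COLSTATUS",
    "collection": "techproducts",  # your collection name here
    "rawSize": "true",             # estimate index data size per field/type
    "rawSizeSummary": "true",      # include the per-field summary
    "wt": "json",
})
url = "http://localhost:8983/solr/admin/collections?" + params

with urlopen(url) as resp:
    report = json.load(resp)

# Pretty-print the report to see which fields and data structures
# take the most disk space.
print(json.dumps(report, indent=2))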
Yup, I find the right calculation to be as much RAM as the server can take and
as much SSD space as it will hold; when you run out, buy another server and
repeat. Machines, RAM, and SSDs are cheap. Just get as much as you can.
On Mon, Feb 3, 2020 at 11:59 AM Walter Underwood wrote:
What he said.
But if you must have a number, assume that the index will be as big as your
(text) data. It might be 2X bigger or 2X smaller, or even 3X or 4X, but that
is a starting point. Once you start updating, the index might get as much as
2X bigger before merges.
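If it helps to see the arithmetic, here is a back-of-the-envelope sketch in
Python. The 100 GB corpus is a made-up example; the factors come straight from
the advice above.

# Rough sizing estimate based on the rule of thumb above.
raw_text_gb = 100                     # hypothetical corpus size

low_estimate = raw_text_gb / 4        # optimistic: index 4X smaller than the text
start_point = raw_text_gb             # starting assumption: index about the same size
high_estimate = raw_text_gb * 4       # pessimistic: index 4X bigger
merge_headroom = high_estimate * 2    # transient growth before merges settle down

print(f"index size guess: {low_estimate:.0f}-{high_estimate:.0f} GB, "
      f"starting point {start_point} GB, "
      f"plan disk for up to ~{merge_headroom:.0f} GB during heavy updating")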
Do NOT try to get by with the
I’ve always had trouble with that advice, that RAM size should be JVM + index
size. I’ve seen 300G indexes (as measured by the size of the data/index
directory) run in 128G of memory.
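To be concrete about how I measure that, something like this does the trick
(the path is just an example; point it at your own core's data/index
directory):

# Sketch: total on-disk size of a core's data/index directory.
from pathlib import Path

index_dir = Path("/var/solr/data/mycore/data/index")  # hypothetical path

total_bytes = sum(f.stat().st_size for f in index_dir.glob("*") if f.is_file())
print(f"{index_dir}: {total_bytes / 1024**3:.1f} GiB")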
Here’s the long form:
https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitiv