bq: I am anticipating that this growth will slow down because there
will be repetitions
This will be true for your indexed data, but NOT for your stored data.
Each stored field is stored as-is per document. It'll be compressed, so
it won't take up the entire 250M, but it'll still be stored.
FWIW,
Er
Each day the index grows by ~250 MB; however, I am anticipating that this
growth will slow down because there will be repetitions (just a guess). It's
not the order of growth but the limitation of our infrastructure. Basically a
budgetary constraint :-)
Apparently there seems to be no problem than disk
By and large, stored fields are pretty irrelevant for resource
consumption _except_ for disk space consumed. Sharded systems work
fine; the stored data is kept in the index files (*.fdt and *.fdx)
in each segment on each shard.
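To see how much of the disk the stored data is actually taking, one rough approach is to add up the sizes of the *.fdt and *.fdx files under a core's index directory. The sketch below is only an illustration: the function name and the directory you pass in are hypothetical, and it assumes the segments are not packed into compound (.cfs) files, in which case the stored-field files would not show up separately.

```python
from pathlib import Path

# Stored-field file extensions named in the thread.
STORED_FIELD_SUFFIXES = {".fdt", ".fdx"}


def stored_field_bytes(index_dir):
    """Return the total size in bytes of *.fdt and *.fdx files under index_dir.

    index_dir is a hypothetical path to a core's (or shard's) data directory;
    subdirectories are scanned recursively.
    """
    total = 0
    for path in Path(index_dir).rglob("*"):
        if path.is_file() and path.suffix in STORED_FIELD_SUFFIXES:
            total += path.stat().st_size
    return total
```

Running this once a day against each shard and comparing the totals would show whether the stored data really keeps growing at the same rate while the rest of the index levels off.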
But you haven't told us anything about your data. How much are
Absolutely. Solr will return the reference along with the docs/results; those
references may be used to look up the actual stuff. Such use cases aren't
hard to solve.
If the use case demands returning the actual stuff alongside the results,
it becomes non-trivial, especially under high load.
To avoi
You have to index something with your Solr documents that
has meaning in _your_ system so you can find the
original record. You don't search this field, you just
return it with the search results and then use it to get
the original document.
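A minimal sketch of that pattern, assuming the reference is kept in a stored Solr field and the originals live in a SQLite table. The field name `record_id`, the table `records(id, body)`, and both function names are made up for illustration; `q`, `fl`, `rows` and `wt` are standard Solr select parameters, and the `response` / `docs` layout is Solr's usual JSON response shape.

```python
import sqlite3


def build_select_params(user_query, ref_field="record_id", rows=10):
    """Build Solr /select parameters that return only the reference field."""
    return {"q": user_query, "fl": ref_field, "rows": rows, "wt": "json"}


def fetch_originals(solr_response, conn, ref_field="record_id"):
    """Look up the original records for the references Solr returned.

    solr_response is the parsed JSON body of a Solr select request.
    conn is an open sqlite3 connection with a hypothetical table
    records(id, body). Results keep Solr's ranking order; references
    with no matching row are skipped.
    """
    ids = [doc[ref_field] for doc in solr_response["response"]["docs"]]
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, body FROM records WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = dict(rows)
    return [(i, by_id[i]) for i in ids if i in by_id]
```

The search request only carries the small reference field back, and the large original is fetched in a second step, so the index itself does not have to store it.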
If you're storing the original in a DB, this can be the