On 05/15/2011 11:48 AM, Erick Erickson wrote:
> Where are the documents coming from? Because storing them ONLY in
> Solr risks losing them if your index is somehow hosed.
In our case, we generally have source documents and can reproduce the index if need be, but that's a good point.
> Storing them externally only has the advantage that your index will be
> much smaller, which helps when replicating as you scale. The downside
> here is that highlighting will be more resource-intensive since you're
> re-analyzing text in order to highlight.
I had been imagining that the Highlighter could use stored term positions so as to avoid re-analysis. Is this incompatible with external storage?
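
To make the idea concrete, here is a rough sketch (not Solr's or Lucene's actual highlighter code) of highlighting from precomputed offsets. The function name and the (term, start, end) offset format are assumptions made up for illustration. Note that even with offsets in hand, the original text still has to be fetched from somewhere, whether a stored field or an external store.

```python
def highlight(text, offsets, query_terms, pre="<em>", post="</em>"):
    """Wrap query-term matches in markers using precomputed character offsets.

    offsets: list of (term, start, end) tuples, as might have been recorded
    at index time (hypothetical format for this sketch). No re-analysis of
    the text happens here; it is only sliced.
    """
    wanted = {t.lower() for t in query_terms}
    out = []
    cursor = 0
    for term, start, end in sorted(offsets, key=lambda o: o[1]):
        # Skip terms we are not looking for, and any overlapping spans.
        if term in wanted and start >= cursor:
            out.append(text[cursor:start])
            out.append(pre + text[start:end] + post)
            cursor = end
    out.append(text[cursor:])
    return "".join(out)


# The text could come from a stored field or an external store;
# the offsets would come from the index.
text = "Solr stores term vectors"
offsets = [("solr", 0, 4), ("stores", 5, 11), ("term", 12, 16), ("vectors", 17, 24)]
print(highlight(text, offsets, ["term"]))
```

In this sketch the cost of highlighting is a lookup plus some string slicing, rather than a full analysis pass over the document text.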

We might conceivably need to replicate the documents anyway, even if they are stored externally, in order to make them available to a farm of servers, although a SAN is another possibility here.

My main concern about storing internally was the cost of merging (optimizing) the index. Presumably that cost goes up if the docs are stored in the index.
> So, as usual, "it depends" (tm). What is the scale you need? What
> is the QPS you're thinking of supporting?
Things are working well at a small scale, and in that environment I think all of these solutions work more or less equally well. We're planning for tens of millions of documents and around 50 QPS, so I expect we will have some significant challenges in coordinating a cluster of servers, and we're trying to plan as well as we can for that. We expect updates to be performed in a "batch" mode - they don't have to be real-time, but they might need to be daily.

-Mike
