Travis - Whether the index is bigger than the original content depends on what you need to do with it in Solr. One of the primary deciding factors is whether you need highlighting, which currently requires the highlighted fields to be stored. Stored fields take up about the same space as the original documents, text-wise (likely a bit smaller than, say, the actual Word doc itself). If you don't need highlighting, or stored contents for other purposes, your index will be dramatically smaller than the original (generally roughly 35% of the size).
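To put rough numbers on that for the 300GB in the question, here is a back-of-the-envelope sketch. It assumes one reading of the rule of thumb above: the inverted index is about 35% of the original content size, and storing fields adds roughly the original content size again. Because the text extracted from PDF or Word files is usually smaller than the binary files, the stored case is best treated as an upper bound. The function and parameter names are hypothetical, for illustration only; real sizes depend on your schema and content, so index a sample and measure.

```python
def estimate_index_gb(source_gb, store_fields, index_ratio=0.35):
    """Rough Solr index size estimate (hypothetical helper, rule of thumb only).

    Assumptions, not measurements:
    - the inverted index is about index_ratio of the original content size;
    - stored fields take roughly as much space as the original content.
      Extracted text is usually smaller than binary PDF/Word files,
      so the stored figure is an upper bound.
    """
    indexed_gb = source_gb * index_ratio
    stored_gb = source_gb if store_fields else 0.0
    return indexed_gb + stored_gb


# Compare the two cases for 300GB of source documents.
for store in (False, True):
    print(f"store_fields={store}: ~{estimate_index_gb(300, store):.0f} GB")
```

So without stored fields you would be looking at roughly 105GB, and with stored fields (e.g. for highlighting) at most around 405GB, both well under the 1TB you were planning for.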
Erik

On Oct 11, 2011, at 08:36, Travis Low wrote:

> Greetings. I have a paltry 23,000 database records that point to a
> voluminous 300GB worth of PDF, Word, Excel, and other documents. We are
> planning on indexing the records and the documents they point to. I have no
> clue on how we can calculate what kind of server we need for this. I
> imagine the index isn't going to be bigger than the documents (is it?) so I
> suppose 1TB is a starting point for disk space. But what kind of processing
> power and memory might we need? Can anyone please point me in the right
> direction?