Thanks, Erik!  We probably won't use highlighting.  Also, documents are
added but *never* deleted.
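
For a rough sense of disk needs without stored content, here is a
back-of-the-envelope sketch using Erik's figures quoted below. The function
name, the treatment of his "roughly 35%" as a fixed ratio, and the pessimistic
"index plus a full stored copy" case are assumptions for illustration only;
real sizes depend on the documents and the schema.

```python
def estimate_index_gb(source_gb, store_content=False,
                      index_ratio=0.35, stored_ratio=1.0):
    """Rough index size in GB (hypothetical helper, not a Solr API).

    index_ratio:  Erik's rule of thumb for an index with no stored content.
    stored_ratio: assumed upper bound for stored fields (about the size of
                  the original documents, likely less in practice).
    """
    if source_gb < 0:
        raise ValueError("source_gb must be non-negative")
    ratio = index_ratio + (stored_ratio if store_content else 0.0)
    return source_gb * ratio


# 300GB of source documents, with and without stored content.
print(f"no stored content: ~{estimate_index_gb(300):.0f} GB")
print(f"with stored content (pessimistic): ~{estimate_index_gb(300, True):.0f} GB")
```

Under these assumptions the no-highlighting case comes to about 105GB, well
inside the 1TB starting point from my first message.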

Does anyone have comments about memory and CPU resources required for
indexing the 300GB of documents in a "reasonable" amount of time?  It's okay
if the initial indexing takes hours or maybe even days, but not too many
days.  Do we need 16GB of memory?  32GB?  8-core processor?  I have zero
sense of server requirements and I would appreciate any guidance.
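
To put "reasonable" into numbers, this small sketch computes the average
rate the whole pipeline (text extraction plus indexing) would have to sustain
to get through 300GB in a given time. It uses decimal units (1 GB = 1000 MB)
and the helper name is made up for illustration; it says nothing about which
hardware can actually reach these rates.

```python
def required_throughput_mb_s(total_gb, hours):
    """Average MB/s needed to process total_gb within the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    total_mb = total_gb * 1000          # decimal units: 1 GB = 1000 MB
    return total_mb / (hours * 3600)    # 3600 seconds per hour


# Candidate time budgets for the initial indexing run.
for hours in (8, 24, 72):
    rate = required_throughput_mb_s(300, hours)
    print(f"{hours:>3} h -> {rate:.1f} MB/s sustained")
```

Finishing in one day, for example, works out to roughly 3.5 MB/s sustained.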

Do I need to be concerned about performance/resources later, when adding
documents to an existing (large) index?

cheers,

Travis

On Tue, Oct 11, 2011 at 9:49 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:

> Travis -
>
> Whether the index is bigger than the original content depends on what you
> need to do with it in Solr.  One of the primary deciding factors is whether
> you need highlighting, which currently requires the highlighted fields to
> be stored.  Stored fields will take up about the same space as the
> original documents (text-wise, likely a bit smaller than, say, the actual
> Word doc itself).  If you don't need highlighting or the contents stored
> for other purposes, then you'll have a dramatically smaller index than the
> original (roughly 35% of the size, generally).
>
>        Erik
>
>
> On Oct 11, 2011, at 08:36 , Travis Low wrote:
>
> > Greetings.  I have a paltry 23,000 database records that point to a
> > voluminous 300GB worth of PDF, Word, Excel, and other documents.  We are
> > planning on indexing the records and the documents they point to.  I have
> > no clue on how we can calculate what kind of server we need for this.  I
> > imagine the index isn't going to be bigger than the documents (is it?) so
> > I suppose 1TB is a starting point for disk space.  But what kind of
> > processing power and memory might we need?  Can anyone please point me in
> > the right direction?
>
>


-- 

Travis Low, Director of Development
<t...@4centurion.com>
Centurion Research Solutions, LLC
14048 ParkEast Circle • Suite 100 • Chantilly, VA 20151
703-956-6276 • 703-378-4474 (fax)
http://www.centurionresearch.com

