The objects are spread across a number of filesystems, taking up 80TB of space. The MySQL database is about 128GB, 117GB of which is the table containing the metadata for the documents. We don't use all that metadata, just a subset, and I don't have any real way to calculate the subset's size.

With seven shards, six of which are about 16.5GB and one that's about 1GB, the entire Solr index takes up about 100GB. In the near future we should be able to drop a number of stored fields from the index, making it smaller.

We use the DataImportHandler to get all this into Solr, with a custom build system written in Perl.
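
For anyone who hasn't used the DataImportHandler: an import is started by sending an HTTP request to the handler on each core. Below is a minimal sketch in Python of building those requests for seven shards. It is only an illustration, not our Perl build system. The host names, port, and core names are made up, and the `/dataimport` path assumes the handler is registered under that name in solrconfig.xml.

```python
from urllib.parse import urlencode

# Hypothetical shard locations; NOT the real hosts or core names.
SHARDS = [f"http://solr{i}.example.com:8983/solr/shard{i}" for i in range(1, 8)]


def dih_url(core_url, command="full-import", **params):
    """Build a DataImportHandler request URL for one core.

    Assumes the handler is mapped to /dataimport in solrconfig.xml.
    """
    query = urlencode({"command": command, **params})
    return f"{core_url}/dataimport?{query}"


def full_import_urls(shards=SHARDS):
    """Return one full-import URL per shard.

    clean=true deletes the existing index contents before importing,
    commit=true commits once the import finishes.
    """
    return [dih_url(s, clean="true", commit="true") for s in shards]


if __name__ == "__main__":
    # Only prints the URLs; nothing is sent to any server.
    for url in full_import_urls():
        print(url)
```

To actually start an import you would fetch each URL (for example with `urllib.request.urlopen`) and then poll the same handler with `command=status` until it reports that it is done.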


On 5/13/2011 11:05 AM, Jack Repenning wrote:
> On May 13, 2011, at 7:59 AM, Shawn Heisey wrote:
>
>> The entire archive is about 80 terabytes, but we only index a subset of the
>> metadata, stored in a MySQL database, which is about 100GB or so in size.
>>
>> The Solr index (version 1.4.1) consists of six large shards, each about 16GB in
>> size,
>
> This is really useful data, Shawn, thanks! It's particularly interesting
> because the numbers are in the same ball-park as a project I'm considering.
>
> Can you clarify one thing? What's the relationship you're describing between
> MySQL and Solr? I think you're saying that there's a 80TB MySQL database, with
> a 100GB Solr system in front, is that right? Or is the entire 80TB accessible
> through Solr directly?
>
> -==-
> Jack Repenning
> Technologist
> Codesion Business Unit
> CollabNet, Inc.
> 8000 Marina Boulevard, Suite 600
> Brisbane, California 94005
> office: +1 650.228.2562
> twitter: http://twitter.com/jrep









