Re: Support for huge data set?

Shawn Heisey Fri, 13 May 2011 10:36:24 -0700

The objects are in a number of filesystems, taking up 80TB of space. The MySQL database is about 128GB, 117GB of which is the table containing the metadata for the documents. We don't use all that metadata, just a subset. I don't have any way to really calculate the subset's size.

With seven shards, six of which are about 16.5GB and one that's about 1GB, the entire Solr index takes up about 100GB. In the near future we should be able to drop a number of stored fields from the index, making it smaller.
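
As a quick sanity check on those shard figures, here is a small Python sketch that adds them up. The sizes are the rounded numbers quoted above, not measured values, so the total is only approximate.

```python
# Approximate shard sizes in GB, taken from the rounded figures quoted above.
large_shards = [16.5] * 6   # six large shards
small_shard = 1.0           # one small shard
shard_sizes = large_shards + [small_shard]

total_gb = sum(shard_sizes)
print(f"{len(shard_sizes)} shards, about {total_gb:.0f}GB total")
```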

We use the dataimporthandler to get all this into Solr, with a custom build system written in Perl.
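
For anyone unfamiliar with the DataImportHandler, a rough sketch of how a full import could be requested on each shard is below. This is not the Perl build system mentioned above, just an illustration in Python. The host names are hypothetical, and it assumes the handler is registered at /dataimport in each shard's solrconfig.xml, which may not match a given setup.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses; the real hosts are not given in this thread.
SHARDS = [f"idx{i}.example.com:8983/solr" for i in range(1, 8)]


def full_import_url(shard):
    """Build the URL that asks one shard's DataImportHandler for a full import."""
    # Assumption: the handler is mapped to /dataimport in solrconfig.xml.
    return f"http://{shard}/dataimport?" + urlencode({"command": "full-import"})


# Print one URL per shard; a build script would issue an HTTP GET to each.
for shard in SHARDS:
    print(full_import_url(shard))
```

The handler then pulls rows from the database using whatever queries are defined in its own data configuration file, so the script only has to kick off the import and check on its status afterwards.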



On 5/13/2011 11:05 AM, Jack Repenning wrote:

On May 13, 2011, at 7:59 AM, Shawn Heisey wrote:

The entire archive is about 80 terabytes, but we only index a subset of the 
metadata, stored in a MySQL database, which is about 100GB or so in size.

The Solr index (version 1.4.1) consists of six large shards, each about 16GB in 
size,

This is really useful data, Shawn, thanks! It's particularly interesting 
because the numbers are in the same ball-park as a project I'm considering.

Can you clarify one thing? What's the relationship you're describing between 
MySQL and Solr? I think you're saying that there's an 80TB MySQL database, with 
a 100GB Solr system in front, is that right? Or is the entire 80TB accessible 
through Solr directly?

-==-
Jack Repenning
Technologist
Codesion Business Unit
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
twitter: http://twitter.com/jrep
