Re: TB scale

2014-04-26 Thread Walter Underwood
I think Hathi Trust has a few terabytes of index. They do full-text search on 10 million books. http://www.hathitrust.org/blogs/Large-scale-Search wunder On Apr 26, 2014, at 8:36 AM, Toke Eskildsen wrote: >> Anyone with experience, suggestions or lessons learned in the 10 -100 TB >> scale th

RE: TB scale

2014-04-26 Thread Toke Eskildsen
> Anyone with experience, suggestions or lessons learned in the 10 -100 TB > scale they'd like to share? > Researching optimum design for a Solr Cloud with, say, about 20TB index. We're building a web archive with a projected index size of 20TB (distributed in 20 shards). Some test results and a

Re: TB scale

2014-04-25 Thread Shawn Heisey
On 4/25/2014 1:48 PM, Ed Smiley wrote: > Anyone with experience, suggestions or lessons learned in the 10 -100 TB > scale they'd like to share? > Researching optimum design for a Solr Cloud with, say, about 20TB index. You've gotten some good information already in the replies that have come your

Re: TB scale

2014-04-25 Thread Yonik Seeley
How many documents? That can be just as important (often more important) than total index size. Some other details, like the types of requests, would be helpful (i.e. what the index will be used for... the latency requirements of requests, if you will be faceting, etc). -Yonik http://heliosearch.

Re: TB scale

2014-04-25 Thread Ed Smiley
Not looking for a cookbook. Just curious to hear some war stories since this is relatively rare. -Ed :) -- Ed Smiley, Senior Software Architect, Ebooks ProQuest | 161 Evelyn Ave. | Mountain View, CA 94041 USA | +1 640 475 8700 ext. 3772 ed.smi...@proquest.com www.proquest.com

Re: TB scale

2014-04-25 Thread Otis Gospodnetic
Hi Ed, Unfortunately, there is no good *general* advice, so you'd need to provide a lot more detail to get useful help. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Fri, Apr 25, 2014 at 3:48 PM, Ed Smiley wrote: > Any

Re: TB scale

2014-04-25 Thread Jack Krupansky
Also take a look at using DataStax Enterprise for managing large distributed databases, using Cassandra for the system of record data storage and Solr for indexing and search. See: http://www.datastax.com/what-we-offer/products-services/datastax-enterprise How many documents is your 20TB? --