Hi all,

I have just got a Solr index working for the first time on a few hundred thousand records from a custom database dump, and the results are very impressive, both in indexing speed (even on my MacBook) and in response times.

If I want to index "what, where (grid-based to 0.1-degree cells), when, who" type information (let's say a schema of 10 strings, 2 dates, and 4 ints), what are the limitations going to be? Is there any documentation on whether indexes can be partitioned easily, so that scaling is somewhat linear?

My reason for asking is that our current searchable "index" is a MySQL database with two main fact tables of 150,000,000 and 15,000,000 records, which are joined for most queries. We are looking to grow to 10x that size, so I am looking at billions of records... How well is this likely to scale on Solr? What is the largest number of items people have indexed? How complicated do the queries have to get before things slow down?

This is the kind of thing I am looking for:

  (name:"Passer domesticus*" AND cell:[36543 TO 43324] AND mod360Cell:[45 TO 65] AND year:[1950 TO *])

If you care, this is a search for "house sparrows within a geographic bounding box, collected/observed after 1950".

I'm going to try it anyway, but any pointers are appreciated (Hadoop perhaps?).

Thanks,
Tim

PS - This is an open-source, open-access project to create an index of biodiversity data (http://data.gbif.org), so your help is going towards a worthwhile cause!
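
PPS - In case it helps to see what I'm doing, here is roughly how I run that query from a SolrJ client. This is just a minimal sketch: the core name "occurrences" and the localhost URL are placeholders, not our real setup, and the field names are the ones from the query above.

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  public class BiodiversitySearch {
      public static void main(String[] args) throws Exception {
          // "occurrences" is a placeholder core name on a local Solr instance
          SolrClient solr = new HttpSolrClient.Builder(
                  "http://localhost:8983/solr/occurrences").build();

          // The taxon + bounding-box + year query from the example above
          SolrQuery query = new SolrQuery(
                  "name:\"Passer domesticus*\" AND cell:[36543 TO 43324]"
                  + " AND mod360Cell:[45 TO 65] AND year:[1950 TO *]");
          query.setRows(20);

          QueryResponse response = solr.query(query);
          System.out.println("Found " + response.getResults().getNumFound() + " records");
          for (SolrDocument doc : response.getResults()) {
              System.out.println(doc.getFieldValue("name") + " / " + doc.getFieldValue("year"));
          }
          solr.close();
      }
  }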
If I want to index "what, where(grid based to 0.1 degree cells), when, who" type information (lets say a schema of 10 strings, 2 dates, 4 ints) what are the limitations going to be? Is there any documentation on whether indexes can be partitioned easily, so scaling is somewhat linear? My reasoning to look for this is our current searchable "index" is on a mysql database with 2 main fact tables of 150,000,000 records and 15,000,000 records which are normally joined for most queries. We are looking to increase to 10x that size so I am looking at Billions of records... How likely will this scale on SOLR? What's the biggest number of items people have indexed? How complicated do the queries have to get before things get slow? This is the kind of thing I am looking for: (name:"Passer domesticus*" AND cell:[36543 TO 43324] AND mod360Cell[45 TO 65] AND year:[1950 TO *]) - if you care, this is a search for "The bird of type Sparrows in a geo bounding box and collected/observed after 1950"... I'm going to be trying anyway, but any pointers appreciated (Hadoop perhaps?) Thanks, Tim PS - This is an open source open access project to create an index of biodiversity data (http://data.gbif.org) so your help is going towards a worthwhile cause!