Hi all,

I have just got a Solr index working for the first time on a few hundred thousand records from a custom database dump, and the results are very impressive, both in indexing speed (even on my MacBook) and in response times.

If I want to index "what, where (grid-based to 0.1-degree cells), when, who" type information (let's say a schema of 10 strings, 2 dates, and 4 ints), what are the limitations going to be? Is there any documentation on whether indexes can be partitioned easily, so that scaling is somewhat linear?

My reason for asking is that our current searchable "index" is a MySQL database with two main fact tables of 150,000,000 and 15,000,000 records, which are joined for most queries. We are looking to grow to 10x that size, so I am looking at billions of records... How well is this likely to scale on Solr? What is the largest number of items people have indexed? How complicated do the queries have to get before things slow down?

This is the kind of thing I am looking for:

  (name:"Passer domesticus*" AND cell:[36543 TO 43324] AND mod360Cell:[45 TO 65] AND year:[1950 TO *])

If you care, this is a search for "house sparrows within a geographic bounding box, collected/observed after 1950".

I'm going to try it anyway, but any pointers are appreciated (Hadoop perhaps?).

Thanks,
Tim

PS - This is an open-source, open-access project to create an index of biodiversity data (http://data.gbif.org), so your help is going towards a worthwhile cause!
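
PPS - In case it helps to see what I'm doing, here is roughly how I run that query from a SolrJ client. This is just a minimal sketch: the core name "occurrences" and the localhost URL are placeholders, not our real setup, and the field names are the ones from the query above.

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  public class BiodiversitySearch {
      public static void main(String[] args) throws Exception {
          // "occurrences" is a placeholder core name on a local Solr instance
          SolrClient solr = new HttpSolrClient.Builder(
                  "http://localhost:8983/solr/occurrences").build();

          // The taxon + bounding-box + year query from the example above
          SolrQuery query = new SolrQuery(
                  "name:\"Passer domesticus*\" AND cell:[36543 TO 43324]"
                  + " AND mod360Cell:[45 TO 65] AND year:[1950 TO *]");
          query.setRows(20);

          QueryResponse response = solr.query(query);
          System.out.println("Found " + response.getResults().getNumFound() + " records");
          for (SolrDocument doc : response.getResults()) {
              System.out.println(doc.getFieldValue("name") + " / " + doc.getFieldValue("year"));
          }
          solr.close();
      }
  }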
If I want to index "what, where(grid based to 0.1 degree cells), when, who" type information (lets say a schema of 10 strings, 2 dates, 4 ints) what are the limitations going to be? Is there any documentation on whether indexes can be partitioned easily, so scaling is somewhat linear? My reasoning to look for this is our current searchable "index" is on a mysql database with 2 main fact tables of 150,000,000 records and 15,000,000 records which are normally joined for most queries. We are looking to increase to 10x that size so I am looking at Billions of records... How likely will this scale on SOLR? What's the biggest number of items people have indexed? How complicated do the queries have to get before things get slow? This is the kind of thing I am looking for: (name:"Passer domesticus*" AND cell:[36543 TO 43324] AND mod360Cell[45 TO 65] AND year:[1950 TO *]) - if you care, this is a search for "The bird of type Sparrows in a geo bounding box and collected/observed after 1950"... I'm going to be trying anyway, but any pointers appreciated (Hadoop perhaps?) Thanks, Tim PS - This is an open source open access project to create an index of biodiversity data (http://data.gbif.org) so your help is going towards a worthwhile cause!