Sam,

These are big numbers you are throwing around, especially the query volume. How big are these records that you have 4 billion of -- or, put another way, how much space would they take up in a raw form like CSV? And should I assume the searches you are doing involve more than geospatial criteria? In any case, a Solr solution here is going to involve many machines. The biggest number you propose is 10k queries per second, which is hard to imagine.
I've seen some say Solr 4 might handle 100M records per shard, although there is a good deal of variability -- as usual, YMMV. But let's go with that for this paper-napkin calculation. You would need 40 shards of 100M documents each to get to 4,000M (4B) documents. That is a lot of shards, but people have done it, I believe. This scales out to your document collection, but not up to your query volume, which is extremely high.

I have some old benchmarks suggesting ~10ms for geo queries with SOLR-2155, which was rolled into the spatial code in Lucene 4 (Solr adapters are on the way). But to account for full query overhead, and for a safer estimate, let's say 50ms. At 50ms per query, a cluster might sustain roughly 20 queries per second (which seems high, but we'll go with it). But you require 10k/sec(!), which is 500 times 20 qps -- so you need 500 *times* the base hardware supporting the 40 shards I mentioned before. In other words, the 4B documents need to be replicated 500 times to support 10k queries/second. So theoretically we're talking 500 clusters, each cluster having 40 shards; at ~4 shards per machine, that's 10 machines per cluster: 5,000 machines in total. Wow. That doesn't seem realistic. If you have a reference to some system, or some person's experience with any system that can do this, Solr or not, then please share.

If you or anyone were to attempt to see whether Solr scales for their needs, a good approach is to start with just one non-replicated shard, or even better a handful that would all fit on one machine. Optimize it as much as you can, then see how much data you can put on that machine and at what query volume. From that point, it's basic math to see how many more such machines are required to scale out to your data size and up to your query volume.

Care to explain why so much data needs to be searched at such a volume?
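The napkin math above can be sketched in a few lines of Python. The figures (100M docs/shard, 50ms/query, 10k qps target, ~4 shards/machine) are the assumptions from this thread, not measured values, so treat the result as an order-of-magnitude estimate only:

```python
def machines_needed(total_docs, docs_per_shard, query_latency_s,
                    target_qps, shards_per_machine):
    """Order-of-magnitude Solr sizing: scale out for docs, replicate for qps."""
    shards = -(-total_docs // docs_per_shard)          # ceil: 40 shards for 4B docs
    qps_per_cluster = 1.0 / query_latency_s            # ~20 qps at 50ms/query
    clusters = -(-target_qps // qps_per_cluster)       # ceil: 500 full replicas
    machines_per_cluster = -(-shards // shards_per_machine)  # ceil: 10 machines
    return int(clusters) * machines_per_cluster

# Thread's assumptions: 4B docs, 100M/shard, 50ms latency, 10k qps, 4 shards/machine
print(machines_needed(4_000_000_000, 100_000_000, 0.050, 10_000, 4))  # → 5000
```

Changing any one assumption (say, 25ms latency after optimization) halves or doubles the total, which is exactly why benchmarking a single machine first, as suggested above, is worth the effort.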
Maybe you work for Google ;-)

To your question on scalability vs. PostGIS: I think Solr shines in its ability to scale out, if you have the resources to do it.

~ David Smiley

-----
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995197.html
Sent from the Solr - User mailing list archive at Nabble.com.