I don't know anything about trying to use map-reduce with Solr.

But I can tell you that with about 6 million entries in the result set, and around 10 million values to facet on (faceting on a multi-valued field) -- I still get fine performance in my application. In the worst case it can take maybe 800ms for my complete query when nothing useful is in the caches, which isn't great, but is FAR from 5 minutes!

Now, 100 million values is an order of magnitude more than 10 million -- but it still seems like it ought not to be that slow. I'm not sure what's making it so slow for you. Could you need more RAM allocated to the JVM? I have found that faceting sometimes gets pathologically slow when I don't have enough RAM, even though I'm not getting any OOM errors or anything. Of course, I'm not sure exactly what "enough RAM" is for your use case -- in my case I'm giving my JVM about 5G of heap.

I also make sure to use facet.method=fc for these high-cardinality fields (I forget whether that's the default in 1.4.1). I also do some warming queries at startup to try to fill the various caches that might be involved in faceting, but I don't entirely understand what I'm doing there. In any case that isn't your problem: warming would only affect the first time you ran such a faceting query, and you're getting the pathological 5-minute response times on subsequent queries too.
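
To make the facet.method=fc suggestion concrete, here is a minimal sketch of how such a request could be put together. It only builds the URL and does not contact a server. The host, port, and /select path are assumptions about a default single-core setup, the helper name build_facet_url is made up for illustration, and the field name "trigrams" is taken from the original question; adjust all of them to your own installation.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_facet_url(base_url, field, query="*:*", limit=20):
    """Build a Solr select URL that facets on `field` using facet.method=fc.

    Hypothetical helper for illustration only; it is not part of any Solr client.
    """
    params = [
        ("q", query),               # main query; *:* matches all documents
        ("rows", 0),                # return no documents, only facet counts
        ("facet", "true"),          # turn faceting on
        ("facet.field", field),     # the field to facet on
        ("facet.method", "fc"),     # field-cache based faceting
        ("facet.limit", limit),     # number of facet values to return
        ("wt", "json"),             # response format
    ]
    return base_url + "?" + urlencode(params)


# Assumed URL of a local Solr instance; replace with your own.
url = build_facet_url("http://localhost:8983/solr/select", "trigrams")
print(url)
```

Running the same URL twice and comparing the QTime values in the two responses is one rough way to see whether the caches are helping at all.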

I am definitely not an expert in the Solr internals that affect this stuff. I'm just reporting my experience, and my experience does not match yours.

Jonathan

On 3/16/2011 8:05 AM, Dmitry Kan wrote:
Hello guys. We are using sharded Solr 1.4 for heavy faceted search over the
trigrams field, with about 1 million entries in the result set and more
than 100 million entries to facet on in the index. Currently the faceted
search is very slow, taking about 5 minutes per query. Would running on a
cloud with Hadoop make it faster (down to seconds), as faceting seems to be a
natural map-reduce task?

Are there any other options to look into before stepping into the cloud?

Please let me know if you need specific details on the schema / solrconfig
setup or the like.
