On 6/13/2013 7:51 PM, Utkarsh Sengar wrote:
> Sure, I will reduce the count and see how it goes. The problem I have is,
> after such a change, I need to reindex everything again, which again is
> slow and takes time (40-60hours).

There should be no need to reindex after changing most things in solrconfig.xml. Changing cache sizes does not require it. Most of the time, reindexing is only required after changing schema.xml, but there are a few changes you can make to schema that don't require it.

> Some queries are really bad, like this one:
> http://explain.solr.pl/explains/bzy034qi
> How can this be improved? I understand that there is something horribly
> wrong here, but not sure what points to look at (Been using solr from the
> last 20 days).

You are using a *LOT* of query clauses against your allText field in that boost query. I assume that allText is your largest field. I'm not really sure, but based on what we're seeing here, I bet that a bq parameter doesn't get cached. With some additional RAM available, this might not be such a big problem.

> The query is simple, although it used edismax. I have shared an explain
> query above. Other than the query, this is my performance stats:
>
> iostat -m 5 result: http://apaste.info/hjNV
>
> top result: http://apaste.info/jlHN

You've got a pretty well-sustained iowait around ten percent. You are I/O bound. You need more total RAM. With indexing only happening once a day, that doesn't sound like it's a factor. If you are also having problems with garbage collection because your heap is a little bit too small, that makes all the other problems worse.

> For the initial training, I will hit solr 1.3M times and request 2000
> documents in each query. By the current speed (just one machine), it will
> take me ~20 days to do the initial training.

This is really mystifying. There is no need to send a million plus queries to warm your index. A few dozen or a few hundred queries should be all you need, and you don't need 2000 docs returned per query. Go with ten rows, or maybe a few dozen rows at most. Because you're using SSD, I'm not sure you need warming queries at all.

Thanks,
Shawn
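
For a sense of scale, here is what the figures quoted above imply per query. This is only a back-of-the-envelope sketch in Python: the variable names are mine, the three inputs are the numbers from the quoted message, and nothing in it is measured against the actual index.

```python
# Inputs taken from the quoted message; they are estimates, not measurements.
QUERIES = 1_300_000       # planned number of requests
ROWS_PER_QUERY = 2_000    # documents requested per query
ESTIMATED_DAYS = 20       # the poster's own estimate on one machine

SECONDS_PER_DAY = 24 * 60 * 60

# Total number of documents Solr would have to return over the whole run.
total_docs = QUERIES * ROWS_PER_QUERY

# Average time budget per query implied by the 20-day estimate.
total_seconds = ESTIMATED_DAYS * SECONDS_PER_DAY
seconds_per_query = total_seconds / QUERIES
queries_per_second = QUERIES / total_seconds

print(f"documents returned in total: {total_docs:,}")
print(f"implied seconds per query:   {seconds_per_query:.2f}")
print(f"implied queries per second:  {queries_per_second:.2f}")
```

That works out to well over a second per request on average and a couple of billion documents returned in total, which fits a box that is waiting on disk rather than serving from memory.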