James,

Could this be a case of not enough memory and heavy garbage collection? Connecting to Solr via JConsole should show it.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: James Brady <james.colin.br...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Saturday, July 18, 2009 5:02:42 PM
> Subject: Truncated XML responses from CoreAdminHandler
>
> The Solr application I'm working on has many concurrently active cores - of
> the order of 1000s at a time.
>
> The management application depends on being able to query Solr for the
> current set of live cores, a requirement I've been satisfying using the
> STATUS core admin handler method.
>
> However, once the number of active cores reaches a particular threshold
> (which I haven't determined exactly), the response to the STATUS method is
> truncated, resulting in malformed XML.
>
> My debugging so far has revealed:
>
> - when doing STATUS queries from the local machine, they succeed,
>   untruncated, >90% of the time
> - when local STATUS queries do fail, they are always truncated to the
>   same length: 73685 bytes in my case
> - when doing STATUS queries from a remote machine, they fail due to
>   truncation every time
> - remote STATUS queries are always truncated to the same length: 24704
>   bytes in my case
> - the failing STATUS queries take visibly longer to complete on the
>   client - a few seconds for a truncated result versus <1 second for an
>   untruncated result
> - all STATUS queries return a successful 200 HTTP code
> - all STATUS queries are logged as returning in ~700ms in Solr's info log
> - during failing (truncated) responses, Solr's CPU usage spikes to
>   saturation
> - behaviour seems the same whatever client I use: wget, curl, Python, ...
>
> Using Solr 1.3.0 694707, Jetty 6.1.3.
>
> At the moment, the main puzzles for me are that the local and remote
> behaviour is so different. It leads me to think that it is something to do
> with the network transmission speed. But the response really isn't that big
> (untruncated it's ~1MB), and the CPU spike seems to suggest that something
> in the process of serialising the core information is taking too long and
> causing a timeout?
>
> Any suggestions on settings to tweak, ways to get extra debug information,
> or ascertain the active core list in some other way would be much
> appreciated!
>
> James
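
One way to gather more data on the truncation described in the quoted message is to record, for each STATUS response, its byte length and whether it parses as XML. What follows is only a minimal sketch in Python, assuming the response body has already been fetched as bytes by whatever client is in use; the function name `check_status_response` is hypothetical and not part of Solr.

```python
import xml.etree.ElementTree as ET


def check_status_response(body: bytes) -> dict:
    """Report the byte length of a response and whether it is well-formed XML.

    `body` is assumed to be the raw bytes of a STATUS response; how it is
    fetched (wget, curl, urllib, ...) is left to the caller.
    """
    result = {"length": len(body), "well_formed": True, "error": None}
    try:
        # fromstring raises ParseError on truncated or otherwise malformed XML.
        ET.fromstring(body)
    except ET.ParseError as exc:
        result["well_formed"] = False
        result["error"] = str(exc)
    return result


if __name__ == "__main__":
    # Stand-in payloads for illustration only, not real Solr output.
    complete = b"<response><lst name='status'/></response>"
    truncated = complete[:20]  # cut off mid-tag to simulate truncation
    print(check_status_response(complete))
    print(check_status_response(truncated))
```

Logging these results for repeated local and remote requests would show whether the truncated lengths really are constant (73685 and 24704 bytes in the report above) and how often the local requests fail.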