James,

Could this be insufficient memory and heavy garbage collection?  Connecting to
Solr with JConsole should show whether that is the case.
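
While you watch the heap in JConsole, it may also help to record how often the
STATUS response is truncated and at what length. Below is a minimal Python
sketch, not a definitive tool. The URL is an assumption (host, port and admin
path depend on your solr.xml), and the function names are made up for
illustration.

```python
import http.client
import urllib.request
import xml.etree.ElementTree as ET

# Assumed URL: adjust host, port and admin path to match your solr.xml.
STATUS_URL = "http://localhost:8983/solr/admin/cores?action=STATUS"


def check_status_body(body: bytes) -> dict:
    """Return the byte length of a response body and whether it parses as XML."""
    try:
        ET.fromstring(body)
        well_formed = True
    except ET.ParseError:
        # A truncated response is cut off mid-document and fails to parse.
        well_formed = False
    return {"bytes": len(body), "well_formed": well_formed}


def fetch_status(url: str = STATUS_URL, timeout: float = 30.0) -> bytes:
    """Fetch the STATUS response, keeping whatever arrived if the read is cut short."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        try:
            return resp.read()
        except http.client.IncompleteRead as exc:
            # The server closed the connection early; keep the partial body.
            return exc.partial


def survey(attempts: int = 10) -> None:
    """Print the length and well-formedness of several consecutive responses."""
    for attempt in range(attempts):
        print(attempt, check_status_body(fetch_status()))
```

Calling survey() on the Solr host and again on a remote machine should show
whether the truncation lengths match the 73685 and 24704 bytes you reported.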


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: James Brady <james.colin.br...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Saturday, July 18, 2009 5:02:42 PM
> Subject: Truncated XML responses from CoreAdminHandler
> 
> The Solr application I'm working on has many concurrently active cores - of
> the order of 1000s at a time.
> 
> The management application depends on being able to query Solr for the
> current set of live cores, a requirement I've been satisfying using the
> STATUS core admin handler method.
> 
> However, once the number of active cores reaches a particular threshold
> (which I haven't determined exactly), the response to the STATUS method is
> truncated, resulting in malformed XML.
> 
> My debugging so far has revealed:
> 
>    - when doing STATUS queries from the local machine, they succeed,
>    untruncated, >90% of the time
>    - when local STATUS queries do fail, they are always truncated to the
>    same length: 73685 bytes in my case
>    - when doing STATUS queries from a remote machine, they fail due to
>    truncation every time
>    - remote STATUS queries are always truncated to the same length: 24704
>    bytes in my case
>    - the failing STATUS queries take visibly longer to complete on the
>    client - a few seconds for a truncated result versus <1 second for an
>    untruncated result
>    - all STATUS queries return a successful 200 HTTP code
>    - all STATUS queries are logged as returning in ~700ms in Solr's info log
>    - during failing (truncated) responses, Solr's CPU usage spikes to
>    saturation
>    - behaviour seems the same whatever client I use: wget, curl, Python, ...
> 
> Using Solr 1.3.0 694707, Jetty 6.1.3.
> 
> At the moment, the main puzzle for me is that the local and remote
> behaviour is so different. It leads me to think that it has something to do
> with the network transmission speed. But the response really isn't that big
> (untruncated it's ~1MB), and the CPU spike suggests that something in the
> process of serialising the core information may be taking too long and
> causing a timeout.
> 
> Any suggestions on settings to tweak, ways to get extra debug information,
> or ways to ascertain the active core list by some other means would be much
> appreciated!
> 
> James
