On 7/25/2013 11:39 AM, Tom Burton-West wrote:
Hello,
I am running Solr 4.2.1 on 3 shards and have about 365 million documents in
the index in total.
I sent a query asking for 1 million rows at a time, but I keep getting an
error claiming that there is an invalid version or that the data is not in
javabin format (see below).
If I lower the number of rows requested to 100,000, I have no problems.
Does Solr have a limit on the number of rows that can be requested, or is
this a bug?
That particular javabin error (expected 2, but 60) usually means that
the response it got was something other than javabin, typically HTML or XML.
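For what it's worth, a javabin response begins with a single version byte,
and current Solr writes 2 there, while 60 is the ASCII code for '<', the
first character of an XML or HTML error page. A minimal sketch of that
first-byte check (not Solr's actual code, just the idea):

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class JavabinFirstByte {
        public static void main(String[] args) throws IOException {
            // A javabin stream starts with a version byte; current Solr writes 2.
            // An XML/HTML error page starts with '<', which is byte 60.
            InputStream response = new ByteArrayInputStream("<html>...</html>".getBytes());
            int version = response.read();
            if (version != 2) {
                // Mirrors the log message: Invalid version (expected 2, but 60)
                System.out.println("Invalid version (expected 2, but " + version + ")");
            }
        }
    }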
I was going to say that you should hopefully get a more meaningful error
message from the server log, but it appears that what you included *IS*
the server log, so I'm really confused. The error message you're
getting is typically something you see on the *client* side.
After some testing on my server, I suspect that what's happening here is
that the initial shard query (the one with fl=uniqueKeyField,score) is
working, but that when Solr makes the HUGE follow-up requests for the
actual documents it is interested in, the list is too big to fit in the
server-side POST buffer, which defaults to 2MB. Those requests get so big
because they must include an "ids" parameter, a comma-separated list of
values from your uniqueKey field. In my case, each of those values can be
up to 32 characters, so the id list could reach roughly 33MB for a million
of them. Most of my ids are significantly shorter than that, so a 32MB
buffer would be big enough.
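The arithmetic behind that estimate, as a quick sketch (the 32-character
key length is specific to my index, so substitute your own):

    public class IdsParamSize {
        public static void main(String[] args) {
            long numIds = 1_000_000L;  // rows requested
            int idLength = 32;         // worst-case uniqueKey length in my index

            // Each id contributes its characters plus a separating comma.
            long bytes = numIds * idLength + (numIds - 1);

            System.out.printf("ids parameter: ~%.1f MB (default buffer: 2MB)%n",
                    bytes / 1_000_000.0);
        }
    }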
Either multipartUploadLimitInKB doesn't work properly, or there may be
some hard limits built into the servlet container, because I set
multipartUploadLimitInKB in the requestDispatcher config to 32768 and it
still didn't work. I wonder whether there is a client-side POST buffer
limit in addition to the servlet container limit, one that comes into play
because the Solr server is acting as a client for the distributed requests.
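For reference, the attribute I changed lives on the <requestParsers>
element inside <requestDispatcher> in solrconfig.xml; a sketch showing
only the attribute I touched, with everything else left at its existing
values:

    <requestDispatcher>
      <requestParsers multipartUploadLimitInKB="32768" />
    </requestDispatcher>

Worth noting: the same element also accepts a formdataUploadLimitInKB
attribute covering URL-encoded POST bodies, and since the internal shard
requests may well be sent as form data rather than multipart, that could
be the limit that actually matters here.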
Thanks,
Shawn