On 7/25/2013 11:39 AM, Tom Burton-West wrote:
> Hello,
>
> I am running Solr 4.2.1 on 3 shards and have about 365 million
> documents in the index total. I sent a query asking for 1 million rows
> at a time, but I keep getting an error claiming that there is an
> invalid version or that the data is not in javabin format (see below).
>
> If I lower the number of rows requested to 100,000, I have no problems.
>
> Does Solr have a limit on the number of rows that can be requested, or
> is this a bug?

That particular javabin error ("expected 2, but 60") usually means that the response received was something other than javabin, typically HTML or XML: a javabin response starts with a one-byte version marker of 2, and 60 is the ASCII code for '<'.
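
For context, here's a paraphrased sketch of what the decoder does with the first byte of a response. This is from memory, not the actual Solr source; the real check lives in JavaBinCodec:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class JavabinVersionCheck {
    // Paraphrased version check: a javabin stream begins with a version byte.
    static void checkVersion(InputStream in) throws IOException {
        int version = in.read();  // first byte of the response body
        if (version != 2) {       // 2 is the current javabin version
            throw new IOException("Invalid version (expected 2, but "
                + version + ") or the data is not in 'javabin' format");
        }
    }

    public static void main(String[] args) {
        try {
            // An XML or HTML error page begins with '<', which is byte 60.
            checkVersion(new ByteArrayInputStream("<html>".getBytes()));
        } catch (IOException e) {
            System.out.println(e.getMessage());  // "expected 2, but 60"
        }
    }
}

So whenever one side of the conversation replies with an XML or HTML error page instead of a javabin payload, this is the exception the other side sees.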

I was going to say that you would hopefully find a more meaningful error message in the server log, but it appears that what you included *IS* the server log, so I'm really confused: that error message is typically something you see on the *client* side.

After some testing on my server, I suspect that what's happening here is this: the initial shard query (the one with fl=uniqueKeyField,score) works, but the huge subsequent requests Solr makes for the actual documents it is interested in are too big to fit in the server-side POST buffer, which defaults to 2MB. Each of those requests carries an "ids" parameter that is a comma-separated list of values from your uniqueKey field. In my index, each value can be up to 32 characters, so with the separating comma that's 33 bytes per ID, or roughly 33MB for a million of them. Most of the values are significantly shorter, so in practice a 32MB buffer would probably be big enough.
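
To make that concrete, the two phases of a distributed query look roughly like this. This is simplified and from memory; "id" as the uniqueKey name and the doc values are made up for illustration:

  Phase 1, sent to each shard (IDs and scores only):
    q=...&rows=1000000&fl=id,score&isShard=true&distrib=false

  Phase 2, sent to each shard (fetch stored fields for the merged list):
    POST body containing
    ids=doc0000001,doc0000002,doc0000003,...&isShard=true&distrib=false

It's that phase-2 "ids" parameter, with up to a million comma-separated values, that overflows the 2MB buffer.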

Either multipartUploadLimitInKB doesn't work properly, or there are hard limits built into the servlet container, because I set multipartUploadLimitInKB in the requestDispatcher config to 32768 and it still didn't work. I wonder whether there is a client-side POST buffer limit in addition to the servlet container's limit, which would come into play because the Solr server acts as a client for the distributed requests.
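
For reference, this is the shape of the change I made in solrconfig.xml (a sketch following the stock example config; only multipartUploadLimitInKB was changed):

  <requestDispatcher handleSelect="false">
    <!-- 32768 KB = 32MB; the other attributes are the stock example values -->
    <requestParsers enableRemoteStreaming="true"
                    multipartUploadLimitInKB="32768"
                    formdataUploadLimitInKB="2048" />
  </requestDispatcher>

Note that formdataUploadLimitInKB sits right next to it and also defaults to 2MB; if the shard requests are sent as URL-encoded form data rather than multipart, that could be the limit that actually matters here.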

Thanks,
Shawn
