On 7/25/2013 3:09 PM, Tom Burton-West wrote:
Thanks Shawn,

I was confused by the error message: "Invalid version (expected 2, but 60)
or the data in not in 'javabin' format"

Your explanation makes sense.  I didn't think about what the shards have to
send back to the head shard.
Now that I look in my logs, I can see the posts that the shards are
sending to the head shard, and I can actually get a good measure of how
many bytes are being sent around.

I'll poke around and look at multipartUploadLimitInKB, and also see if
there is some servlet container limit config I might need to mess with.

I think I figured it out, after a peek at the source code. I upgraded to Solr 4.4 first; my 100,000-row query still didn't work. After setting formdataUploadLimitInKB (in addition to multipartUploadLimitInKB; I'm not sure whether both are required), I was able to get a 100,000-row query to work.
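For reference, both limits live on the requestParsers element inside requestDispatcher in solrconfig.xml. A sketch raising both (the values here are illustrative, not the ones I actually used):

```xml
<requestDispatcher handleSelect="false">
  <!-- Both limits are in kilobytes; 2097151 KB is roughly 2GB. -->
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2097151"
                  formdataUploadLimitInKB="2097151"/>
</requestDispatcher>
```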

A query for one million rows did finally return to my browser, but it took a REALLY REALLY long time (82 million docs across several shards, only 16GB RAM on the dev server), and the size of the response crashed Firefox. It also seemed to error out on some of the shard responses. My handler has shards.tolerant=true, so that didn't kill the whole query ... but because the response crashed Firefox, I couldn't tell for sure.

I repeated the query using curl so I could save the response. It's been running for several minutes without any server-side errors, but I still don't have any results.

Your servers are much more robust than my little dev server, so this might work for you, provided you aren't using the start parameter in addition to the rows parameter. You might need to sort ascending by your unique key field and use a range query ([* TO *] for the first one), find the highest value in the response, and then send a targeted range query ({max_from_last_run TO *] would work) asking for the next million records.
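To sketch that range-query walk: the field name "id" and the fetch helper below are placeholders, not something from this thread. The only real trick is the mixed brackets: '{' makes the lower bound exclusive so the last-seen key isn't fetched twice, while ']' keeps the upper bound open-ended.

```python
def next_range_query(field, last_seen=None):
    """Build the q parameter for the next batch of a keyset walk.

    The first batch matches every document; subsequent batches use
    an exclusive lower bound ('{') on the last key already seen.
    """
    if last_seen is None:
        return "%s:[* TO *]" % field
    return "%s:{%s TO *]" % (field, last_seen)

# The overall loop would look roughly like this, with fetch()
# standing in for an HTTP request to /select using
# sort=id asc, rows=<batch size>, and the q built above:
#
#   last = None
#   while True:
#       docs = fetch(q=next_range_query("id", last), sort="id asc")
#       if not docs:
#           break
#       last = docs[-1]["id"]
```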

Thanks,
Shawn
