On 7/25/2013 3:09 PM, Tom Burton-West wrote:
> Thanks Shawn,
> I was confused by the error message: "Invalid version (expected 2, but 60)
> or the data in not in 'javabin' format"
> Your explanation makes sense. I didn't think about what the shards have to
> send back to the head shard.
> Now that I look in my logs, I can see the posts that the shards are
> sending to the head shard and actually get a good measure of how many bytes
> are being sent around.
> I'll poke around and look at multipartUploadLimitInKB, and also see if
> there is some servlet container limit config I might need to mess with.
I think I figured it out, after a peek at the source code. I upgraded
to Solr 4.4 first; my 100,000-row query still didn't work. By setting
formdataUploadLimitInKB (in addition to multipartUploadLimitInKB; I'm not
sure whether both are required), I was able to get a 100,000-row query to
work.
A query for one million rows did finally return a response to my browser,
but it took a REALLY long time (82 million docs across several shards,
only 16GB RAM on the dev server), and it crashed Firefox due to the size
of the response. It also seemed to error out on some of the shard
responses. My handler has shards.tolerant=true, so that shouldn't have
killed the whole query ... but because the response crashed Firefox, I
couldn't tell.
I repeated the query using curl so I could save the response. It's been
running for several minutes without any server-side errors, but I still
don't have any results.
Your servers are much more robust than my little dev server, so this
might work for you, as long as you aren't using the start parameter in
addition to the rows parameter. You might need to sort ascending on your
unique key field and use a range query ([* TO *] for the first request),
find the highest value in the response, and then send a targeted range
query ({max_from_last_run TO *], where the curly brace excludes the
previous maximum) asking for the next million records.
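The paging idea above can be sketched as a loop. This is a hypothetical illustration that uses an in-memory list as a stand-in for the actual Solr HTTP request; fetch_batch, fetch_all, and the id values are made up for the sketch, not Solr APIs:

```python
def fetch_batch(all_ids, lower_exclusive, batch_size):
    """Stand-in for a Solr request like
    q=id:{LOWER TO *]&sort=id+asc&rows=batch_size.
    Returns ids strictly greater than lower_exclusive, ascending."""
    if lower_exclusive is None:
        # First request: the open range id:[* TO *]
        matching = sorted(all_ids)
    else:
        # Curly brace on the left of the Solr range: exclusive lower bound
        matching = sorted(i for i in all_ids if i > lower_exclusive)
    return matching[:batch_size]

def fetch_all(all_ids, batch_size):
    """Walk the whole result set batch by batch, carrying the highest
    id from the previous response as the exclusive lower bound."""
    results = []
    max_from_last_run = None
    while True:
        batch = fetch_batch(all_ids, max_from_last_run, batch_size)
        if not batch:
            break
        results.extend(batch)
        max_from_last_run = batch[-1]  # highest value in this response
    return results
```

Against a real Solr install, fetch_batch would be an HTTP request; the key point is that each follow-up request uses a {max TO *] range on the unique key instead of a large start offset, which is what makes deep result sets affordable.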
Thanks,
Shawn