Hi Shawn,

Thanks for your help. I found a workaround for this use case: avoid the shards query and instead ask each shard for a dump of its unique ids, i.e. run a *:* query and ask for 1 million rows at a time. This should be a non-scoring query, so I would think that it doesn't have to do any ranking or sorting.

What I am now seeing is that QTimes have gone up from about 5 seconds per request to nearly a minute as the start parameter gets higher. I don't know if this is actually because of the start parameter or if something is happening with memory use and/or caching that is just causing things to take longer. I'm at around 35 million out of 119 million for this shard.
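One way to check whether QTime really tracks the start parameter is to tabulate the two values from the request log. This is only a rough sketch, assuming log lines in the format shown in this message; the name parse_qtimes is made up for illustration.

```python
import re

# Matches the start value and the QTime (ms) within one request log entry.
LOG_RE = re.compile(r"start=(\d+)&.*?QTime=(\d+)")


def parse_qtimes(log_text):
    """Return (start, qtime_ms) pairs found in log_text, sorted by start."""
    return sorted((int(start), int(qtime)) for start, qtime in LOG_RE.findall(log_text))


def print_table(log_text):
    """Print one row per request: start offset and QTime in seconds."""
    for start, qtime_ms in parse_qtimes(log_text):
        print(f"{start:>12,d}  {qtime_ms / 1000:8.1f} s")
```

If QTime rises steadily with start while rows stays fixed, that points at the start parameter rather than at caching or memory pressure.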
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=36000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=52952

Tom

--------
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=7000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=9772
Jul 25, 2013 5:39:43 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=8000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=11274
Jul 25, 2013 5:41:44 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=9000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=13104
Jul 25, 2013 5:43:39 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=10000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=13568
...
...
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=13000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=26703
Jul 25, 2013 5:58:20 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=17000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=22607
Jul 25, 2013 6:00:31 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=18000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=24109
...
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=30000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=41034
Jul 25, 2013 6:31:36 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=31000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=42844
Jul 25, 2013 6:34:16 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=32000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=45046
Jul 25, 2013 6:36:57 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=33000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=49792
Jul 25, 2013 6:39:43 PM org.apache.solr.core.SolrCore execute
INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=34000000&q=*:*&rows=1000000} hits=119220943 status=0 QTime=58699

On Thu, Jul 25, 2013 at 6:18 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 7/25/2013 3:09 PM, Tom Burton-West wrote:
>
>> Thanks Shawn,
>>
>> I was confused by the error message: "Invalid version (expected 2, but 60)
>> or the data in not in 'javabin' format"
>>
>> Your explanation makes sense. I didn't think about what the shards have to
>> send back to the head shard.
>> Now that I look in my logs, I can see the posts that the shards are
>> sending to the head shard and actually get a good measure of how many bytes
>> are being sent around.
>>
>> I'll poke around and look at multipartUploadLimitInKB, and also see if
>> there is some servlet container limit config I might need to mess with.
>>
>
> I think I figured it out, after a peek at the source code. I upgraded to
> Solr 4.4 first, my 100,000 row query still didn't work. By setting
> formdataUploadLimitInKB (in addition to multipartUploadLimitInKB, not sure
> if both are required), I was able to get a 100,000 row query to work.
>
> A query for one million rows did finally respond to my browser query, but
> it took a REALLY REALLY long time (82 million docs in several shards, only
> 16GB RAM on the dev server) and it crashed firefox due to the size of the
> response. It also seemed to error out on some of the shard responses. My
> handler has shards.tolerant=true, so that didn't seem to kill the whole
> query ... but because the response crashed firefox, I couldn't tell.
>
> I repeated the query using curl so I could save the response. It's been
> running for several minutes without any server-side errors, but I still
> don't have any results.
>
> Your servers are much more robust than my little dev server, so this might
> work for you - if you aren't using the start parameter in addition to the
> rows parameter. You might need to sort ascending by your unique key field
> and use a range query ([* TO *] for the first one), find the highest value
> in the response, and then send a targeted range query (the value
> {max_from_last_run TO *] would work) asking for the next million records.
>
> Thanks,
> Shawn
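
The range-query paging suggested above could be sketched roughly like this. It is a minimal sketch under several assumptions, not tested against a real index: vol_id is taken to be the unique key and to be an indexed, single-valued, sortable field; responses are requested with wt=json; range endpoints are sent as quoted terms; and the names solr_fetch, quote_term and dump_ids are invented for illustration.

```python
import json
import urllib.parse
import urllib.request


def solr_fetch(base_url, params):
    """Run one /select request against base_url and return the list of docs."""
    query = urllib.parse.urlencode(dict(params, wt="json"))
    with urllib.request.urlopen(f"{base_url}/select?{query}") as resp:
        return json.load(resp)["response"]["docs"]


def quote_term(value):
    """Wrap a value in double quotes so it can be used as a range endpoint."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def dump_ids(fetch, field="vol_id", rows=1000000):
    """Yield every value of `field`, paging by range query instead of start.

    `fetch` is any callable that takes a dict of query parameters and
    returns a list of docs (dicts), e.g. a wrapper around solr_fetch.
    """
    lower = None
    while True:
        if lower is None:
            q = f"{field}:[* TO *]"  # first page: everything
        else:
            # Exclusive lower bound: skip the last id already seen.
            q = f"{field}:{{{quote_term(lower)} TO *]"
        docs = fetch({"q": q, "fl": field, "sort": f"{field} asc", "rows": rows})
        for doc in docs:
            yield doc[field]
        if len(docs) < rows:
            return  # short (or empty) page means we reached the end
        lower = docs[-1][field]
```

Something like `dump_ids(lambda p: solr_fetch(shard_url, p))` would then walk one shard, where shard_url is a placeholder for that shard's core URL. Every request uses start=0, so each page should cost about the same no matter how far into the index it is.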