I think we should add the suggestion about docValues to the cursorMark wiki (documentation); we ran into the same problem.
On Jan 18, 2017 5:52 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

> Is your ID field docValues? Making it a docValues field should reduce
> the amount of JVM heap you need.
>
> But the export is _much_ preferred, it'll be lots faster as well. Of
> course to export you need the values you're returning to be
> docValues...
>
> Erick
>
> On Wed, Jan 18, 2017 at 1:12 PM, Slomin, David <david.slo...@here.com> wrote:
> > The export feature sounds promising, although I'll have to talk with
> > our deployment folks here about enabling it.
> >
> > The query I'm issuing is:
> >
> > http://<host>:8983/solr/<collection>_shard1_replica1/select?q=*:*&sort=id+asc&rows=1000&cursorMark=<cursorMark>&fl=id&omitHeader=true&distrib=false&wt=json
> >
> > Thanks,
> > Div.
> >
> > On 1/18/17, 3:54 PM, "Jan Høydahl" <jan....@cominvent.com> wrote:
> >
> > Don't know why you have mem problems. Can you paste in examples of
> > full query strings during cursorMark querying? Sounds like you may
> > be using it wrong.
> >
> > Or try exporting:
> >
> > https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
> >
> > --
> > Jan Høydahl
> >
> > > On 18 Jan 2017, at 21:44, Slomin, David <david.slo...@here.com> wrote:
> > >
> > > Hi --
> > >
> > > I'd like to retrieve the ids of all the docs in my Solr 5.3.1
> > > index. In my query, I've set rows=1000, fl=id, and am using the
> > > cursorMark mechanism to split the overall traversal into multiple
> > > requests. Not because I care about the order, but because the
> > > documentation implies that it's necessary to make cursorMark work
> > > reliably, I've also set sort=id asc. While this does give me the
> > > data I need on a smaller index, it causes heap memory utilization
> > > to go through the roof; for our large indices, the Solr JVM throws
> > > an out-of-memory exception, and we've already configured the heap
> > > as large as is practical given the physical memory of the machine.
> > >
> > > For what it's worth, we do use SolrCloud to split each of our
> > > indices into multiple shards. However, for this query I'm
> > > addressing a single shard directly (connecting to the correct Solr
> > > server instance for one replica of that shard and setting
> > > distrib=false in my query) rather than relying on Solr to route
> > > and assemble the results.
> > >
> > > Thanks in advance,
> > > Div Slomin.
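
For anyone who lands on this thread later, here's a minimal sketch of the cursorMark loop discussed above (host and collection names are placeholders, as in David's query). The one non-obvious part is the stop condition: you're done when the nextCursorMark in the response equals the cursorMark you sent.

    import json
    import urllib.parse
    import urllib.request

    # Placeholder host/collection, matching the <host>/<collection> in the thread.
    BASE = "http://localhost:8983/solr/mycollection_shard1_replica1/select"

    def fetch_all_ids():
        cursor = "*"  # cursorMark always starts at "*"
        while True:
            params = urllib.parse.urlencode({
                "q": "*:*",
                "sort": "id asc",    # cursorMark requires a stable sort on a unique field
                "rows": 1000,
                "fl": "id",
                "cursorMark": cursor,
                "omitHeader": "true",
                "distrib": "false",  # query this shard's replica directly
                "wt": "json",
            })
            with urllib.request.urlopen(f"{BASE}?{params}") as resp:
                body = json.load(resp)
            for doc in body["response"]["docs"]:
                yield doc["id"]
            next_cursor = body["nextCursorMark"]
            if next_cursor == cursor:  # unchanged cursor => all docs consumed
                break
            cursor = next_cursor

And per Erick's point: if id is a docValues field, the /export handler (the "Exporting Result Sets" page Jan linked) streams the entire sorted result set in one request, so the paging loop above isn't needed at all.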