retrieve ids of all indexed docs efficiently

Slomin, David Wed, 18 Jan 2017 12:45:52 -0800

Hi --

I'd like to retrieve the ids of all the docs in my Solr 5.3.1 index.  In my
query, I've set rows=1000 and fl=id, and I'm using the cursorMark mechanism to
split the overall traversal into multiple requests.  I've also set sort=id asc,
not because I care about the order, but because the documentation implies that
it's necessary to make cursorMark work reliably.  While this does give me the
data I need on a smaller index, it causes heap memory utilization to go through
the roof.  On our large indices, the Solr JVM throws an out-of-memory error,
even though we've already configured the heap as large as is practical given
the physical memory of the machine.
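
For reference, the request loop described above can be sketched roughly as
follows in Python.  This is only an illustration, not the actual client code:
the helper names (fetch_page, iter_ids), the match-all query q=*:*, and the
JSON response format are assumptions; rows, fl, sort and cursorMark are the
parameters named above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, params):
    """Issue one /select request and return the decoded JSON response."""
    with urlopen(base_url + "/select?" + urlencode(params)) as resp:
        return json.load(resp)


def iter_ids(base_url, rows=1000, fetch=fetch_page):
    """Yield every document id, one cursorMark page at a time."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": "*:*",          # assumption: match all documents
            "fl": "id",          # return only the id field
            "rows": rows,
            "sort": "id asc",    # cursorMark needs a sort that includes the unique key
            "cursorMark": cursor,
            "wt": "json",
        }
        page = fetch(base_url, params)
        for doc in page["response"]["docs"]:
            yield doc["id"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means the traversal is done
            break
        cursor = next_cursor
```

It would be driven with something like
for doc_id in iter_ids("http://localhost:8983/solr/mycore"), where the host and
core name are placeholders.  The fetch argument exists only so the loop can be
exercised without a running Solr instance.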


For what it's worth, we do use SolrCloud to split each of our indices into
multiple shards.  However, for this query, I'm addressing a single shard
directly (connecting to the correct Solr server instance for one replica of
that shard and setting distrib=false in my query) rather than relying on Solr
to route the request and assemble the results.

Thanks in advance,
Div Slomin.
