Re: retrieve ids of all indexed docs efficiently

Erick Erickson Wed, 18 Jan 2017 14:52:55 -0800

Is your ID field docValues? Making it a docValues field should reduce
the amount of JVM heap you need.

But the export is _much_ preferred, it'll be lots faster as well. Of
course to export you need the values you're returning to be
docValues...
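For anyone following along, the export route Erick recommends can be sketched roughly as below. This is a minimal, hedged sketch rather than a reference client. It assumes a `/export` handler is enabled on the core, that `id` has docValues (the export handler needs docValues on every `fl` and `sort` field), and that the response has the usual `response.docs` JSON shape. Port 8983 and the per-core addressing come from Div's query. The helper names `build_export_url`, `ids_from_export`, and `fetch_all_ids` are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen


def build_export_url(host: str, core: str, fields: str = "id", sort: str = "id asc") -> str:
    """Build a /export URL for one core (hypothetical helper).

    Assumes /export is enabled and that every field in `fl` and `sort`
    has docValues, which the export handler requires.
    """
    params = {"q": "*:*", "sort": sort, "fl": fields}
    return f"http://{host}:8983/solr/{core}/export?{urlencode(params)}"


def ids_from_export(body: str) -> List[str]:
    """Extract id values from an /export JSON body (assumed response.docs shape)."""
    payload = json.loads(body)
    return [doc["id"] for doc in payload["response"]["docs"]]


def fetch_all_ids(host: str, core: str) -> List[str]:
    """Fetch every id from one core in a single streamed /export request."""
    # Reads the whole body into client memory; for very large cores an
    # incremental JSON parser would avoid holding it all at once.
    with urlopen(build_export_url(host, core)) as resp:
        return ids_from_export(resp.read().decode("utf-8"))
```

Unlike cursorMark paging, this makes one request per core, and Solr streams the sorted docValues instead of building pages. Run it once per shard, just as the `distrib=false` approach does.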

Erick

On Wed, Jan 18, 2017 at 1:12 PM, Slomin, David <david.slo...@here.com> wrote:
> The export feature sounds promising, although I'll have to talk with our 
> deployment folks here about enabling it.
>
> The query I'm issuing is:
>
> http://<host>:8983/solr/<collection>_shard1_replica1/select?q=*:*&sort=id+asc&rows=1000&cursorMark=<cursorMark>&fl=id&omitHeader=true&distrib=false&wt=json
>
> Thanks,
> Div.
>
>
> On 1/18/17, 3:54 PM, "Jan Høydahl" <jan....@cominvent.com> wrote:
>
>     Don't know why you have mem problems. Can you paste in examples of full
>     query strings during cursor mark querying? Sounds like you may be using
>     it wrong.
>
>     Or try exporting
>
>     https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
>
>     --
>     Jan Høydahl
>
>     > On 18 Jan 2017, at 21:44, Slomin, David <david.slo...@here.com> wrote:
>     >
>     > Hi --
>     >
>     > I'd like to retrieve the ids of all the docs in my Solr 5.3.1 index.
>     > In my query, I've set rows=1000, fl=id, and am using the cursorMark
>     > mechanism to split the overall traversal into multiple requests. Not
>     > because I care about the order, but because the documentation implies
>     > that it's necessary to make cursorMark work reliably, I've also set
>     > sort=id asc. While this does give me the data I need on a smaller
>     > index, it causes heap memory utilization to go through the roof; for
>     > our large indices, the Solr JVM throws an out-of-memory error, and
>     > we've already configured the heap as large as is practical given the
>     > physical memory of the machine.
>     >
>     > For what it's worth, we do use SolrCloud to split each of our indices
>     > into multiple shards. However, for this query I'm addressing a single
>     > shard directly (connecting to the correct Solr server instance for one
>     > replica of that shard and setting distrib=false in my query) rather
>     > than relying on Solr to route and assemble the results.
>     >
>     > Thanks in advance,
>     > Div Slomin.
>     >
>
>
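The cursorMark traversal Div describes can be sketched as the loop below. This is a hedged illustration, not his actual client. The parameters mirror his query string. The loop starts from cursorMark `*` and stops when Solr returns a `nextCursorMark` equal to the one sent, which is how the cursor API signals the end of results. The fetch function is injected so the loop can be exercised without a live Solr. `iter_ids_with_cursor` and `http_fetcher` are hypothetical names. This client-side loop does not change the server-side heap cost of sorting on `id`, which is what Erick's docValues suggestion targets.

```python
import json
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def iter_ids_with_cursor(fetch: Callable[[Dict], Dict], rows: int = 1000) -> Iterator[str]:
    """Yield every id from one core using cursorMark deep paging (sketch)."""
    cursor = "*"  # "*" starts a new cursor traversal
    while True:
        params = {
            "q": "*:*",
            "sort": "id asc",  # cursorMark requires the uniqueKey in the sort
            "rows": rows,
            "fl": "id",
            "cursorMark": cursor,
            "distrib": "false",  # query this core only, as in Div's request
            "omitHeader": "true",
            "wt": "json",
        }
        resp = fetch(params)
        for doc in resp["response"]["docs"]:
            yield doc["id"]
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor: no more results
            return
        cursor = next_cursor


def http_fetcher(base_url: str) -> Callable[[Dict], Dict]:
    """Return a fetch function that GETs <base_url>/select with the given params."""
    def fetch(params: Dict) -> Dict:
        with urlopen(f"{base_url}/select?{urlencode(params)}") as r:
            return json.load(r)
    return fetch
```

For example, `iter_ids_with_cursor(http_fetcher("http://<host>:8983/solr/<collection>_shard1_replica1"))` would walk one replica, with the placeholders filled in as in Div's URL.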
