Re: retrieve ids of all indexed docs efficiently

Erick Erickson Wed, 18 Jan 2017 18:20:58 -0800

Added a tip on the CursorMark CWiki page, thanks for the suggestion!


On Wed, Jan 18, 2017 at 5:21 PM, Pushkar Raste <pushkar.ra...@gmail.com> wrote:
> I think we should add the suggestion about docValues to the CursorMark wiki
> (documentation); we ran into the same problem too.
>
> On Jan 18, 2017 5:52 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
>> Is your ID field docValues? Making it a docValues field should reduce
>> the amount of JVM heap you need.
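Erick's docValues suggestion amounts to a one-attribute schema change. Below is a minimal sketch, assuming a managed schema and a `string` id field, that builds the Schema API `replace-field` command body. The function name `replace_field_payload` is hypothetical. If your Solr version refuses to modify the uniqueKey field this way, set `docValues="true"` on the field in schema.xml instead. Either way, existing documents only get docValues after a full reindex.

```python
import json


def replace_field_payload(field_name: str = "id", field_type: str = "string") -> str:
    """Build a Schema API 'replace-field' command body that enables docValues.

    Assumptions (adjust to your schema): the schema is managed, the id field
    uses the 'string' field type, and it is single-valued and required.
    'replace-field' swaps out the whole definition, so every attribute you
    want to keep must be listed here.
    """
    command = {
        "replace-field": {
            "name": field_name,
            "type": field_type,
            "indexed": True,
            "stored": True,
            "required": True,
            "multiValued": False,
            # docValues live on disk, so sorting no longer uninverts the field onto the heap
            "docValues": True,
        }
    }
    return json.dumps(command)


if __name__ == "__main__":
    # Hypothetical target: POST this body to http://<host>:8983/solr/<collection>/schema
    print(replace_field_payload())
```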
>>
>>
>> But the export handler is _much_ preferred; it'll be a lot faster as well.
>> Of course, to export, the fields you're returning need to be
>> docValues...
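The /export route Erick recommends takes the whole result set in one streamed response instead of paging. Below is a minimal sketch, assuming the /export handler is enabled and `id` has docValues. The helper names `export_url` and `ids_from_export` are hypothetical, and the parser assumes a JSON body shaped like `{"response": {"docs": [...]}}`, so check that against your Solr version.

```python
import json
from urllib.parse import urlencode


def export_url(base: str, collection: str, fields=("id",), sort: str = "id asc", q: str = "*:*") -> str:
    """Build an /export request URL.

    /export requires both sort and fl, and every field named in them
    must have docValues.
    """
    params = {"q": q, "sort": sort, "fl": ",".join(fields)}
    return f"{base}/solr/{collection}/export?{urlencode(params)}"


def ids_from_export(body: str) -> list:
    """Pull the id values out of an /export JSON response body.

    Assumes the response shape {"response": {"docs": [{"id": ...}, ...]}}.
    """
    data = json.loads(body)
    return [doc["id"] for doc in data["response"]["docs"]]


if __name__ == "__main__":
    print(export_url("http://localhost:8983", "mycollection"))
    sample = '{"responseHeader": {"status": 0}, "response": {"numFound": 2, "docs": [{"id": "a"}, {"id": "b"}]}}'
    print(ids_from_export(sample))
```

`json.loads` holds the entire body in client memory. For very large indices, a streaming JSON parser on the client side would be the safer choice.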
>>
>> Erick
>>
>> On Wed, Jan 18, 2017 at 1:12 PM, Slomin, David <david.slo...@here.com> wrote:
>> > The export feature sounds promising, although I'll have to talk with our deployment folks here about enabling it.
>> >
>> > The query I'm issuing is:
>> >
>> > http://<host>:8983/solr/<collection>_shard1_replica1/select?q=*:*&sort=id+asc&rows=1000&cursorMark=<cursorMark>&fl=id&omitHeader=true&distrib=false&wt=json
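The cursorMark traversal this query belongs to can be sketched as a loop. It starts with `cursorMark=*`, sends each response's `nextCursorMark` back, and stops when Solr returns the same mark that was sent. This is a minimal sketch. The `fetch` callable is a hypothetical stand-in for the HTTP call (for example, a GET against the replica's /select URL that decodes the JSON body). It is injected so the loop logic can be shown on its own.

```python
from typing import Callable, Dict, Iterator

# A fetch function takes the request parameters and returns the parsed JSON
# response (hypothetical; wrap your HTTP client of choice).
Fetch = Callable[[Dict[str, str]], dict]


def iter_ids(fetch: Fetch, rows: int = 1000) -> Iterator[str]:
    """Yield every id on one shard using cursorMark deep paging."""
    cursor = "*"  # '*' asks Solr to start a new cursor
    while True:
        params = {
            "q": "*:*",
            "sort": "id asc",  # the sort must include the uniqueKey field
            "rows": str(rows),
            "fl": "id",
            "cursorMark": cursor,
            "distrib": "false",
            "omitHeader": "true",
            "wt": "json",
        }
        resp = fetch(params)
        for doc in resp["response"]["docs"]:
            yield doc["id"]
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means no more results
            break
        cursor = next_cursor
```

This loop bounds the size of each response, not the server-side cost of the sort. Without docValues on `id`, Solr likely has to uninvert the field onto the heap to sort by it, which is why making `id` a docValues field, as Erick suggests, matters even when `rows` is small.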
>> >
>> > Thanks,
>> > Div.
>> >
>> >
>> > On 1/18/17, 3:54 PM, "Jan Høydahl" <jan....@cominvent.com> wrote:
>> >
>> >     I don't know why you have memory problems. Can you paste in examples of the full query strings you send during cursorMark querying? It sounds like you may be using it wrong.
>> >
>> >     Or try exporting
>> >
>> >     https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
>> >
>> >     --
>> >     Jan Høydahl
>> >
>> >     > On 18 Jan 2017, at 21:44, Slomin, David <david.slo...@here.com> wrote:
>> >     >
>> >     > Hi --
>> >     >
>> >     > I'd like to retrieve the ids of all the docs in my Solr 5.3.1 index. In my query, I've set rows=1000 and fl=id, and I'm using the cursorMark mechanism to split the overall traversal into multiple requests. I've also set sort=id asc, not because I care about the order but because the documentation implies it's necessary for cursorMark to work reliably. While this does give me the data I need on a smaller index, it drives heap memory utilization through the roof: on our large indices, the Solr JVM throws an out-of-memory error, and we've already configured the heap as large as is practical given the machine's physical memory.
>> >     >
>> >     > For what it's worth, we do use SolrCloud to split each of our indices into multiple shards. However, for this query I'm addressing a single shard directly (connecting to the Solr server instance hosting one replica of that shard and setting distrib=false in my query) rather than relying on Solr to route and assemble the results.
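Because each request in this setup sets `distrib=false` against one replica, getting a complete id list means repeating the traversal once per shard. Below is a hypothetical helper, `shard_select_urls`, that derives one /select URL per shard from a shard-number-to-host mapping. It follows the `<collection>_shardN_replicaM` core naming seen in the query above. Real core names can differ, so treat the naming rule as an assumption and prefer reading the actual names from the Collections API CLUSTERSTATUS response.

```python
from typing import Dict, List


def shard_select_urls(shard_hosts: Dict[int, str], collection: str, replica: int = 1) -> List[str]:
    """Return one /select URL per shard, ordered by shard number.

    shard_hosts maps a shard number to the host:port serving a replica of it
    (hypothetical input; in practice read it from the cluster state).
    """
    return [
        f"http://{host}/solr/{collection}_shard{n}_replica{replica}/select"
        for n, host in sorted(shard_hosts.items())
    ]


if __name__ == "__main__":
    for url in shard_select_urls({1: "solr1:8983", 2: "solr2:8983"}, "mycollection"):
        # Run the cursorMark traversal (with distrib=false) against each URL.
        print(url)
```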
>> >     > Thanks in advance,
>> >     > Div Slomin.
>> >     >
>> >
>> >
>>
