On 05/06/2013 06:03 AM, Michael Sokolov wrote:
On 5/5/13 7:48 PM, Mingfeng Yang wrote:
Dear Solr Users,
Does anyone know the best way to iterate through every document in a
Solr index with a billion entries?
I tried using select?q=*:*&start=xx&rows=500 to get 500 docs at a time,
increasing the start value on each request, but it got very slow after
about 10 million docs.
Thanks,
Ming-
You need to sort on a unique and stable sort key and fetch documents
whose key is greater than the last one you saw. For example, if you have
a unique key, retrieve documents ordered by that key, and for each batch
request documents with key > max(key) from the previous batch.
-Mike
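For illustration, the batching Mike describes might look roughly like the Python sketch below. It is only a sketch under assumptions, not a tested client: it assumes the unique key is a string field named "id", that the Solr version accepts the mixed-bracket range syntax id:{"last" TO *] for an exclusive lower bound, and that the response is requested with wt=json. The names build_params, iterate_docs and solr_fetcher are made up for this example.

```python
import json
import urllib.parse
import urllib.request


def build_params(last_key, key_field="id", rows=500):
    """Build the select parameters for one batch, sorted by the unique key."""
    params = {"q": "*:*", "sort": f"{key_field} asc", "rows": rows, "wt": "json"}
    if last_key is not None:
        # Exclusive lower bound: only documents with key > last_key.
        # Assumption: the key is a string field and this range syntax is supported.
        escaped = str(last_key).replace("\\", "\\\\").replace('"', '\\"')
        params["fq"] = f'{key_field}:{{"{escaped}" TO *]'
    return params


def solr_fetcher(base_url, key_field="id"):
    """Return a fetch_batch(last_key, rows) function that queries a Solr core."""
    def fetch_batch(last_key, rows):
        query = urllib.parse.urlencode(build_params(last_key, key_field, rows))
        with urllib.request.urlopen(f"{base_url}/select?{query}") as resp:
            return json.load(resp)["response"]["docs"]
    return fetch_batch


def iterate_docs(fetch_batch, key_field="id", rows=500):
    """Yield every document, always restarting after the last key seen."""
    last_key = None
    while True:
        docs = fetch_batch(last_key, rows)
        if not docs:
            return
        yield from docs
        # The batch is sorted ascending, so the last doc holds max(key).
        last_key = docs[-1][key_field]


# Hypothetical usage (URL and core name are placeholders):
# for doc in iterate_docs(solr_fetcher("http://localhost:8983/solr/collection1")):
#     process(doc)
```

Every request here uses start=0, so the cost per batch should not grow with how far into the index you already are, which is what made the start=xx approach slow down.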
There are more details on the wiki:
http://wiki.apache.org/solr/CommonQueryParameters#pageDoc_and_pageScore
--
André Bois-Crettez
Search technology, Kelkoo
http://www.kelkoo.com/