Re: Processing a lot of results in Solr

2013-07-25 Thread Otis Gospodnetic
Mikhail,

Yes, +1. This question comes up a few times a year. Grant created a JIRA issue for this many moons ago.

https://issues.apache.org/jira/browse/LUCENE-2127
https://issues.apache.org/jira/browse/SOLR-1726

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring

Re: Processing a lot of results in Solr

2013-07-24 Thread Chris Hostetter
: Subject: Processing a lot of results in Solr
: Message-ID:
: In-Reply-To: <1374612243070-4079869.p...@n3.nabble.com>

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message

Re: Processing a lot of results in Solr

2013-07-24 Thread Mikhail Khludnev
FWIW, I built a prototype with the following differences:
- it streams straight to the socket output stream
- it streams continuously during collection, without needing to store a bitset.

It may be useful only in a limited set of extreme cases. Is anyone interested?

On Wed, Jul 24, 2013 at 7:19 PM, Roman C
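To illustrate the idea of streaming during collection, here is a minimal Python sketch. It is not Mikhail's prototype (which is not shown in this thread); `stream_hits` and its parameters are hypothetical names. It writes each hit to an output stream as soon as it arrives, so the full hit set is never held in memory.

```python
import io
import json


def stream_hits(hits, out):
    """Write hits to `out` as a JSON array, one hit at a time.

    `hits` is any iterable (for example a generator fed by a collector);
    `out` is any text stream with a write() method, such as a socket
    wrapper. Nothing is buffered beyond the current hit.
    """
    out.write("[")
    written = 0
    for hit in hits:
        if written:
            out.write(",")  # separator between array elements
        out.write(json.dumps(hit))
        written += 1
    out.write("]")
    return written


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for the socket output stream
    count = stream_hits(({"id": i} for i in range(3)), buffer)
    print(count, buffer.getvalue())
```

The trade-off is that the total hit count is only known after the last hit has been written, so it cannot appear at the start of the response.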

Re: Processing a lot of results in Solr

2013-07-24 Thread Roman Chyla
On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber wrote:

> That sounds like a satisfactory solution for the time being -
> I am assuming you dump the data from Solr in a csv format?

JSON

> How did you implement the streaming processor ? (what tool did you use for
> this? Not familiar with that)

Re: Processing a lot of results in Solr

2013-07-24 Thread Roman Chyla
Mikhail,

It is a slightly hacked JSONWriter. Actually, while poking around, I discovered that dumping big hitsets would be possible. The main hurdle right now is that the writer expects to receive documents with fields loaded, but if it received something that loads docs lazily, you could

Re: Processing a lot of results in Solr

2013-07-24 Thread Mikhail Khludnev
Roman,

Can you disclose how that streaming writer works? What does it stream: docList or docSet?

Thanks

On Wed, Jul 24, 2013 at 5:57 AM, Roman Chyla wrote:

> Hello Matt,
>
> You can consider writing a batch processing handler, which receives a query
> and instead of sending results back, it

Re: Processing a lot of results in Solr

2013-07-23 Thread Matt Lieber
That sounds like a satisfactory solution for the time being - I am assuming you dump the data from Solr in CSV format? How did you implement the streaming processor? (What tool did you use for this? I am not familiar with that.) You say it takes only a few minutes to dump the data - how long does it t

Re: Processing a lot of results in Solr

2013-07-23 Thread Roman Chyla
Hello Matt,

You can consider writing a batch processing handler which receives a query and, instead of sending results back, writes them into a file which is then available for streaming (it has its own UUID). I am dumping many GBs of data from Solr in a few minutes - your query + streaming write
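The batch approach described here can be sketched roughly as follows. This is only an illustration in plain Python, not Roman's handler: `dump_results`, `stream_file`, the `.jsonl` extension, and the one-JSON-document-per-line layout are all assumptions made for the example.

```python
import json
import os
import uuid


def dump_results(docs, out_dir):
    """Write every document to a file named after a fresh UUID.

    `docs` is any iterable of JSON-serializable documents. Returns the
    job id, the file path, and the number of documents written.
    """
    job_id = str(uuid.uuid4())
    path = os.path.join(out_dir, job_id + ".jsonl")
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for doc in docs:
            handle.write(json.dumps(doc))
            handle.write("\n")  # one document per line
            count += 1
    return job_id, path, count


def stream_file(path, chunk_size=64 * 1024):
    """Yield the dumped file in fixed-size chunks for a later download."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

A client would first submit the query, receive the job id, and then fetch the file by that id once the dump has finished, which keeps the long-running write separate from the download.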

Re: Processing a lot of results in Solr

2013-07-23 Thread Timothy Potter
Hi Matt,

This feature is commonly known as deep paging, and Lucene and Solr have issues with it ... Take a look at http://solr.pl/en/2011/07/18/deep-paging-problem/ as a potential starting point; it uses filters to bucketize a result set into smaller sub result sets.

Cheers,
Tim

On Tue, Jul 23, 2013
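As a rough sketch of the bucketing idea, the Python below generates one range filter per bucket over a numeric field, so that each sub result set can be fetched from its first page instead of paging deeply into one huge result. The function names and the field name `price` are hypothetical, and the mixed-bracket range syntax (`[` inclusive, `}` exclusive) is assumed to be supported by the Solr version in use.

```python
def bucket_filters(field, start, end, bucket_size):
    """Yield range filters that split [start, end) into half-open buckets."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    lower = start
    while lower < end:
        upper = min(lower + bucket_size, end)
        # '[' makes the lower bound inclusive, '}' makes the upper bound
        # exclusive, so adjacent buckets never return the same document.
        yield f"{field}:[{lower} TO {upper}}}"
        lower = upper


def bucket_params(query, field, start, end, bucket_size, rows):
    """Build one request parameter dict per bucket, each starting at page 0."""
    return [
        {"q": query, "fq": fq, "start": 0, "rows": rows}
        for fq in bucket_filters(field, start, end, bucket_size)
    ]


if __name__ == "__main__":
    for params in bucket_params("*:*", "price", 0, 250, 100, rows=1000):
        print(params)
```

This only works well when the buckets are small enough that each one fits in a few shallow pages, so the bucket size has to be chosen from what is known about the field's value distribution.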