Mikhail,
Yes, +1.
This question comes up a few times a year. Grant created a JIRA issue
for this many moons ago:
https://issues.apache.org/jira/browse/LUCENE-2127
https://issues.apache.org/jira/browse/SOLR-1726
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring
: Subject: Processing a lot of results in Solr
: In-Reply-To: <1374612243070-4079869.p...@n3.nabble.com>
https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message; instead start a fresh email.
FWIW,
I did a prototype with the following differences:
- it streams straight to the socket output stream
- it streams continuously during collection, with no need to store a
bitset.
It might be useful in some limited, extreme cases. Is anyone interested?
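For anyone curious what "streaming during collecting" might look like, here
is a minimal sketch against the Lucene 4.x Collector API (class and field
names are mine, not Mikhail's actual code). Each hit is written to the
response stream the moment collect() sees it, so no bitset is ever built:

import java.io.IOException;
import java.io.Writer;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Hypothetical sketch: write each hit to the output stream as it is
// collected, instead of accumulating a DocSet/bitset first.
public class StreamingCollector extends Collector {

  private final Writer out; // e.g. the servlet/socket output stream
  private int docBase;      // offset of the current index segment

  public StreamingCollector(Writer out) {
    this.out = out;
  }

  @Override
  public void setScorer(Scorer scorer) {
    // scores are not needed for a raw dump
  }

  @Override
  public void setNextReader(AtomicReaderContext context) {
    this.docBase = context.docBase;
  }

  @Override
  public void collect(int doc) throws IOException {
    // nothing is buffered, so memory stays flat no matter
    // how many documents match
    out.write(Integer.toString(docBase + doc));
    out.write('\n');
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true; // order does not matter for a raw dump
  }
}

You would drive it with something like searcher.search(query, new
StreamingCollector(out)), flushing the writer as you go.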
On Wed, Jul 24, 2013 at 7:19 PM, Roman Chyla wrote:
On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber wrote:
> That sounds like a satisfactory solution for the time being -
> I am assuming you dump the data from Solr in a csv format?
>
JSON
> How did you implement the streaming processor? (What tool did you use for
> this? Not familiar with that.)
Mikhail,
It is a slightly hacked JSONWriter - actually, while poking around, I
discovered that dumping big hit sets would be possible - the main hurdle
right now is that the writer expects to receive documents with their fields
loaded, but if it received something that loads docs lazily, you could ...
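A hypothetical sketch of that lazy-loading idea (names invented, not
Roman's actual patch): wrap the DocList in an iterator that only calls
searcher.doc() when the writer consumes the next entry, so stored fields
for the whole hit set never sit in memory at once:

import java.io.IOException;
import java.util.Iterator;

import org.apache.lucene.document.Document;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;

// Hypothetical: documents are materialized one at a time, on demand.
public class LazyDocIterator implements Iterator<Document> {

  private final SolrIndexSearcher searcher;
  private final DocIterator ids;

  public LazyDocIterator(SolrIndexSearcher searcher, DocList docs) {
    this.searcher = searcher;
    this.ids = docs.iterator();
  }

  @Override
  public boolean hasNext() {
    return ids.hasNext();
  }

  @Override
  public Document next() {
    try {
      // stored fields are loaded here, only when the writer
      // actually asks for the next document
      return searcher.doc(ids.nextDoc());
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  @Override
  public void remove() {
    throw new UnsupportedOperationException();
  }
}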
Roman,
Can you disclose how that streaming writer works? What does it stream,
docList or docSet?
Thanks
On Wed, Jul 24, 2013 at 5:57 AM, Roman Chyla wrote:
> Hello Matt,
>
> You can consider writing a batch processing handler, which receives a query
> and instead of sending results back, it ...
That sounds like a satisfactory solution for the time being -
I am assuming you dump the data from Solr in a csv format?
How did you implement the streaming processor? (What tool did you use for
this? Not familiar with that.)
You say it takes a few minutes only to dump the data - how long does it take?
Hello Matt,
You can consider writing a batch processing handler, which receives a query
and, instead of sending results back, writes them into a file which is
then available for streaming (it has its own UUID). I am dumping many GBs
of data from Solr in a few minutes - your query + streaming write ...
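As a rough illustration of that batch-dump approach (not Roman's actual
handler - the dump directory, the "id" field, and the handler name are all
assumptions), a custom request handler against the Solr 4.x API could look
like this:

import java.io.File;
import java.io.FileWriter;
import java.io.Writer;
import java.util.UUID;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.Query;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.QParser;
import org.apache.solr.search.SolrIndexSearcher;

public class BatchDumpHandler extends RequestHandlerBase {

  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    SolrIndexSearcher searcher = req.getSearcher();
    Query query = QParser.getParser(req.getParams().get("q"), null, req).getQuery();

    // fetch all matching ids; a real version might stream during
    // collection instead of materializing a DocList first
    DocList hits = searcher.getDocList(query, (Query) null, null, 0, searcher.maxDoc());

    String uuid = UUID.randomUUID().toString();
    File dump = new File("/tmp/solr-dumps", uuid + ".json"); // assumed location
    dump.getParentFile().mkdirs();

    try (Writer out = new FileWriter(dump)) {
      DocIterator it = hits.iterator();
      while (it.hasNext()) {
        Document doc = searcher.doc(it.nextDoc());
        out.write(String.valueOf(doc.get("id"))); // dump whichever fields you need
        out.write('\n');
      }
    }

    // the client fetches/streams the file by this id later
    rsp.add("dumpId", uuid);
  }

  @Override
  public String getDescription() {
    return "Dumps query results to a file for later streaming";
  }

  @Override
  public String getSource() {
    return "";
  }
}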
Hi Matt,
This feature is commonly known as deep paging, and Lucene and Solr have
issues with it ... take a look at
http://solr.pl/en/2011/07/18/deep-paging-problem/ as a potential
starting point; it uses filters to bucketize a result set into
sub result sets.
Cheers,
Tim
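A minimal SolrJ sketch of the bucketizing idea from that post (the "price"
field and its ranges are purely illustrative): slice the result set with
range filter queries, so paging inside each slice never goes deep.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class BucketizedDump {

  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

    int bucketWidth = 100; // price range covered by one bucket
    for (int lower = 0; lower < 1000; lower += bucketWidth) {
      SolrQuery q = new SolrQuery("*:*");
      // the filter keeps each sub result set small; [a TO b} is
      // inclusive at the lower bound, exclusive at the upper
      q.addFilterQuery("price:[" + lower + " TO " + (lower + bucketWidth) + "}");
      q.setRows(1000);

      QueryResponse rsp = solr.query(q);
      for (SolrDocument doc : rsp.getResults()) {
        System.out.println(doc.getFieldValue("id"));
      }
    }
  }
}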