Re: paging vs streaming. spawn from (Processing a lot of results in Solr)

Mikhail Khludnev Sat, 27 Jul 2013 22:30:18 -0700

On Sun, Jul 28, 2013 at 1:25 AM, Yonik Seeley <yo...@lucidworks.com> wrote:


>
> Which part is problematic... the creation of the DocList (the search),
>
Literally, DocList is a copy of TopDocs. Creating TopDocs is not searching
but ranking.
The ranking cost is log(rows+start), on top of the numFound that the search
itself takes.
Interestingly, we still pay that log() even if we ask to collect docs
as-is with _docid_.


> or its memory requirements (an int per doc)?
>
TopXxxCollector as well as the XxxComparators allocate the same [rows+start].
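
To make the rows+start point concrete, here is a minimal sketch in plain
Java. It is not Lucene's collector code: the class HeapPagingSketch and its
page() method are hypothetical names, and hits are reduced to bare scores.
It only shows that a heap-based top-N must keep rows+start entries and pay a
log(rows+start) insertion per competitive hit, even though only rows entries
are returned.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class HeapPagingSketch {
    /**
     * Returns the scores of hits [start, start+rows) in descending order.
     * The heap holds up to rows+start entries, however few are returned.
     */
    public static float[] page(float[] scores, int start, int rows) {
        int heapSize = start + rows;
        // Min-heap: the head is the weakest hit currently kept.
        PriorityQueue<Float> heap = new PriorityQueue<>(Math.max(1, heapSize));
        for (float s : scores) {                 // one pass over all numFound hits
            if (heap.size() < heapSize) {
                heap.add(s);                     // O(log(rows+start))
            } else if (heapSize > 0 && s > heap.peek()) {
                heap.poll();                     // evict the weakest kept hit
                heap.add(s);
            }
        }
        // Drain in ascending order, filling from the back to get descending order.
        float[] top = new float[heap.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = heap.poll();
        }
        // Only now is the 'start' prefix thrown away.
        int from = Math.min(start, top.length);
        return Arrays.copyOfRange(top, from, top.length);
    }
}
```

With start=2 and rows=2 the sketch keeps four entries to hand back two, and
the gap grows linearly with start.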

It's clear that once we have deep paging, we only need to handle heaps
of size rows (without start).
That's fairly OK if we use Solr as a site navigation engine, but it's
'sub-optimal' for data analytics use cases, where we need something like
SELECT * FROM ... in an RDBMS. In this case any memory allocation on an
index with billions of docs is a bummer. That's why I'm asking about
removing the heap-based collector/comparator.
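
A rough sketch of the two alternatives, again in plain Java rather than
Lucene code. StreamingSketch, pageAfter() and streamAll() are hypothetical
names. pageAfter() assumes a cursor defined by the last score already
delivered and ignores ties (a real implementation would need a docid
tie-break). streamAll() assumes the matching doc ids are visited in index
order and simply hands each one to a consumer.

```java
import java.util.PriorityQueue;
import java.util.function.IntConsumer;

public class StreamingSketch {
    /**
     * Cursor-style deep paging: the next 'rows' scores strictly below
     * afterScore, in descending order. The heap never exceeds 'rows' entries.
     */
    public static float[] pageAfter(float[] scores, float afterScore, int rows) {
        // Min-heap: the head is the weakest hit currently kept.
        PriorityQueue<Float> heap = new PriorityQueue<>(Math.max(1, rows));
        for (float s : scores) {
            if (s >= afterScore) {
                continue;                        // delivered on an earlier page
            }
            if (heap.size() < rows) {
                heap.add(s);                     // O(log(rows)), no 'start' term
            } else if (rows > 0 && s > heap.peek()) {
                heap.poll();
                heap.add(s);
            }
        }
        float[] page = new float[heap.size()];
        for (int i = page.length - 1; i >= 0; i--) {
            page[i] = heap.poll();
        }
        return page;
    }

    /**
     * SELECT *-style export: every matching doc id goes straight to the
     * sink as it is visited. No heap and no per-request result buffer.
     * Returns the number of docs streamed.
     */
    public static int streamAll(int[] matchingDocIds, IntConsumer sink) {
        int count = 0;
        for (int docId : matchingDocIds) {
            sink.accept(docId);
            count++;
        }
        return count;
    }
}
```

The first method still ranks, but its memory is bounded by rows. The second
does no ranking at all, which is the behaviour I would expect when docs are
requested as-is.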


> -Yonik
> http://lucidworks.com
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
