On Sun, Jul 28, 2013 at 1:25 AM, Yonik Seeley <yo...@lucidworks.com> wrote:
>
> Which part is problematic... the creation of the DocList (the search),
>

Strictly speaking, DocList is a copy of TopDocs. Creating TopDocs is not
searching but ranking. The ranking cost is log(rows+start), in addition to
the numFound cost that the search itself incurs. Interestingly, we still pay
that log() even if we ask to collect docs as-is with _docid_.

> or its memory requirements (an int per doc)?
>

TopXxxCollector and the XxxComparators both allocate that same [rows+start].
Clearly, once we have deep paging, the heaps only need to be of size rows
(without start). That is fairly OK if we use Solr as a site navigation
engine, but it is 'sub-optimal' for data analytics use cases, where we need
something like SELECT * FROM ... in an RDBMS. In that case any memory
allocation on an index with billions of docs is a bummer. That's why I'm
asking about removing the heap-based collector/comparator.

> -Yonik
> http://lucidworks.com
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
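
P.S. To make the heap-size point concrete, here is a minimal sketch in plain
Java. It deliberately does not use the Lucene or Solr API: PagingSketch,
pageByScore and pageByDocId are hypothetical names, and the float[] / boolean[]
inputs are stand-ins for real scorers. It only illustrates the contrast, under
those assumptions, between a heap of size rows+start and a streaming pass in
docid order that needs no heap at all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class PagingSketch {

    // Heap-based page (hypothetical helper, not a Lucene collector):
    // keeps start + rows candidates, so memory is proportional to
    // start + rows and each competitive hit costs O(log(start + rows)).
    public static List<Integer> pageByScore(float[] scores, int start, int rows) {
        int size = start + rows;
        List<Integer> page = new ArrayList<>();
        if (size <= 0) {
            return page;
        }
        // Min-heap on score: the weakest retained doc sits at the head.
        PriorityQueue<Integer> heap = new PriorityQueue<>(size,
                (a, b) -> Float.compare(scores[a], scores[b]));
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < size) {
                heap.add(doc);
            } else if (scores[doc] > scores[heap.peek()]) {
                heap.poll();
                heap.add(doc);
            }
        }
        // Drain weakest-first, then reverse to get best-first order.
        List<Integer> ranked = new ArrayList<>();
        while (!heap.isEmpty()) {
            ranked.add(heap.poll());
        }
        Collections.reverse(ranked);
        // Only the last `rows` entries are returned; the first `start`
        // entries were held in the heap just to be thrown away.
        for (int i = start; i < ranked.size(); i++) {
            page.add(ranked.get(i));
        }
        return page;
    }

    // Streaming page in index (docid) order (hypothetical helper):
    // skip `start` matches, emit the next `rows`. No heap is allocated.
    public static List<Integer> pageByDocId(boolean[] matches, int start, int rows) {
        List<Integer> page = new ArrayList<>();
        int seen = 0;
        for (int doc = 0; doc < matches.length && page.size() < rows; doc++) {
            if (!matches[doc]) {
                continue;
            }
            if (seen++ >= start) {
                page.add(doc);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        float[] scores = {0.1f, 0.9f, 0.5f, 0.7f, 0.3f};
        boolean[] matches = {true, false, true, true, true};
        System.out.println(pageByScore(scores, 1, 2));
        System.out.println(pageByDocId(matches, 1, 2));
    }
}
```

In the first method the queue is sized by start + rows even though only rows
entries are returned, which is the allocation I would like to avoid; the
second method shows that, for docid order, a counter is enough.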