For a reasonable top-N, the space efficiency should still be the same,
since memory use is really just dominated by the FieldCache
representation (in-memory or disk-based docvalues).  Sorting directly on
that numeric field vs. deriving a score from the field and sorting on
that shouldn't really be that different.
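
To make the point above concrete, here is a minimal sketch in Python (not
Solr or Lucene code). It assumes a plain list `values` standing in for the
per-document field representation, and a hypothetical helper `top_n`. The
heap never holds more than N entries whether the sort key is the raw field
or a score derived from it, so in this toy model the per-document values
are the only thing that grows with the size of the index.

```python
import heapq
import random


def top_n(values, n, key=lambda v: v):
    """Return the ids of the n documents with the largest key(value).

    `values` is a stand-in for the per-document field values; `key`
    models either the raw field (the default) or a score derived from it.
    """
    heap = []  # min-heap of (key, doc_id); never grows past n entries
    for doc_id, value in enumerate(values):
        entry = (key(value), doc_id)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Better than the current worst of the top n: swap it in.
            heapq.heapreplace(heap, entry)
    return [doc_id for _, doc_id in sorted(heap, reverse=True)]


# Made-up data: 1000 distinct positive values, one per document.
values = random.Random(0).sample(range(1, 10**6), 1000)

by_field = top_n(values, 10)                          # sort on the field
by_score = top_n(values, 10, key=lambda v: v * 2.0)   # sort on a derived score
reversed_order = top_n(values, 10, key=lambda v: 1.0 / v)  # like div(1, field)
```

Both calls walk the same per-document values and keep the same bounded
heap; only the key function differs.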

-Yonik
http://heliosearch.com -- making solr shine


On Tue, Nov 12, 2013 at 12:00 PM, Erick Erickson
<erickerick...@gmail.com> wrote:
> Before I go and pat myself on the back, what do people think about this
> trick? The base problem is "Is there a space-efficient way to return the
> top N documents, sorted by a numeric field". The numeric field includes
> dates.
>
> It came to me in a vision in a flash! (The Pickle Song, Arlo Guthrie). If
> we could return the numeric field in question as the score of a document it
> should work without allocating the internal arrays for holding all the
> timestamps.
>
> So what about something like this?
> /select?q={!boost b=manufacturedate_dt}text:*
> and reverse order by
> /select?q={!boost b=div(1,manufacturedate_dt)}text:*
>
> It works on the test data. So let's assume that we're space constrained. It
> _seems_ like this would only allocate enough space for the top N documents
> in the result set which is insignificant in terms of memory consumption for
> a large number of documents in a core. Any obvious problems that people see?
>
> I see a couple of shortcomings:
>
> 1> You only get one field. Unless you can create a really clever function
> that incorporates all the values in multiple fields, this is going to be
> hard to use with more than one field.
>
> 2> The boost syntax doesn't allow for a *:*, so you have to specify an
> existing field. If there happen to be documents that don't have anything in
> the field, you'll miss them.
>
> 3> I'm not sure what the performance issues are, especially in the case
> where _every_ document scores better than the current top-N.
>
> Erick
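
For anyone trying the two queries quoted above from a script, here is a
small Python sketch that builds the request URLs with the local-params
syntax properly URL-encoded. The base URL (host, port, core name) and the
helper name `boost_query_url` are assumptions for illustration only; the
`q` values are the ones from the quoted message, and `rows` limits the
result to the top N.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and core name for your setup.
BASE = "http://localhost:8983/solr/collection1/select"


def boost_query_url(boost_fn, rows=10, base=BASE):
    """Build a /select URL whose q is {!boost b=<boost_fn>}text:*."""
    params = {"q": "{!boost b=%s}text:*" % boost_fn, "rows": rows}
    # urlencode percent-escapes the braces, '!', '=' and so on.
    return base + "?" + urlencode(params)


descending_url = boost_query_url("manufacturedate_dt")
reversed_url = boost_query_url("div(1,manufacturedate_dt)")
```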
