Sorting memory-efficiently by any numeric field (dates too?)

Erick Erickson Tue, 12 Nov 2013 09:01:22 -0800

Before I go and pat myself on the back, what do people think about this
trick? The base problem is "Is there a space-efficient way to return the
top N documents, sorted by a numeric field". The numeric field includes
dates.


It came to me in a vision in a flash! (The Pickle Song, Arlo Guthrie). If
we could return the numeric field in question as the score of a document it
should work without allocating the internal arrays for holding all the
timestamps.

So what about something like this?
/select?q={!boost b=manufacturedate_dt}text:*
and reverse order by
/select?q={!boost b=div(1,manufacturedate_dt)}text:*
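For anyone who wants to script this, here is a minimal sketch of building the two request paths above with proper URL encoding. The helper name boost_sort_query and its parameters are made up for illustration; only the {!boost b=...} syntax and the field name come from the queries above.

```python
from urllib.parse import urlencode, parse_qs


def boost_sort_query(field, descending=True, match="text:*"):
    """Build a /select path whose score is driven by a numeric field.

    Hypothetical helper: descending uses the raw field value as the boost,
    ascending uses div(1,field), as in the two example queries.
    """
    boost = field if descending else f"div(1,{field})"
    # Doubled braces in the f-string emit literal { and } for the local params.
    q = f"{{!boost b={boost}}}{match}"
    return "/select?" + urlencode({"q": q})


if __name__ == "__main__":
    print(boost_sort_query("manufacturedate_dt"))
    print(boost_sort_query("manufacturedate_dt", descending=False))
```

The encoded output looks noisier than the hand-typed versions, but it decodes back to the same q parameter.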

It works on the test data. So let's assume that we're space constrained. It
_seems_ like this would only allocate enough space for the top N documents
in the result set, which is insignificant in terms of memory consumption
compared to the number of documents in a large core. Any obvious problems
that people see?

I see a couple of shortcomings:

1>  You only get one field. Unless you can create a really clever function
that incorporates all the values in multiple fields, this is going to be
hard to use with more than one field.

2> The boost syntax doesn't allow for a *:*, so you have to specify an
existing field. If there happen to be documents that don't have anything in
the field, you'll miss them.

3> I'm not sure what the performance issues are, especially in the case
where _every_ document scores better than the current top-N.

Erick
