For a reasonable top-N, the space efficiency should still be the same, since it is really just dominated by the FieldCache representation (whether it is in-memory or disk-based docvalues). Sorting directly on that numeric field vs deriving a score from the field and sorting on that shouldn't really be that different.

-Yonik
http://heliosearch.com -- making solr shine

On Tue, Nov 12, 2013 at 12:00 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Before I go and pat myself on the back, what do people think about this
> trick? The base problem is "Is there a space-efficient way to return the
> top N documents, sorted by a numeric field". The numeric field includes
> dates.
>
> It come to me in a vision in a flash! (The Pickle Song, Arlo Guthrie). If
> we could return the numeric field in question as the score of a document it
> should work without allocating the internal arrays for holding all the
> timestamps.
>
> So what about something like this?
> /select?q={!boost b=manufacturedate_dt}text:*
> and reverse order by
> /select?q={!boost b=div(1,manufacturedate_dt)}text:*
>
> It works on the test data. So let's assume that we're space constrained. It
> _seems_ like this would only allocate enough space for the top N documents
> in the result set which is insignificant in terms of memory consumption for
> a large number of documents in a core. Any obvious problems that people see?
>
> I see a couple of shortcomings:
>
> 1> You only get one field. Unless you can create a really clever function
> that incorporates all the values in multiple fields, this is going to be
> hard to use with more than one field.
>
> 2> The boost syntax doesn't allow for a *:*, so you have to specify an
> existing field. If there happen to be documents that don't have anything in
> the field, you'll miss them.
>
> 3> I'm not sure what the performance issues are, especially in the case
> where _every_ document scores better than the current top-N
>
> Erick
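
To make the memory argument in this thread concrete, here is a minimal sketch in Python. It is not Lucene or Solr code; `top_n_by_value` and `field_values` are hypothetical names invented for illustration. It assumes that top-N collection keeps a bounded heap of N entries, and that the per-document value, whether used as a sort key or as a score, still has to be read from some structure with one slot per document.

```python
import heapq
from array import array


def top_n_by_value(matching_docs, value_of, n):
    """Return the n (value, doc_id) pairs with the largest values, best first."""
    heap = []  # min-heap that never holds more than n entries
    for doc_id in matching_docs:
        entry = (value_of(doc_id), doc_id)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # New entry beats the current worst of the top n: swap it in.
            heapq.heapreplace(heap, entry)
    return sorted(heap, reverse=True)


# Hypothetical stand-in for the per-document field values (one slot per doc).
# This array, not the heap, is what grows with the number of documents.
field_values = array("q", [50, 10, 40, 30, 20, 60])

top = top_n_by_value(range(len(field_values)), field_values.__getitem__, 3)
print(top)  # [(60, 5), (50, 0), (40, 2)]
```

In this sketch the heap stays at N entries whichever way the value is consumed, while `field_values` is needed in both cases. Under those assumptions, sorting on the field and scoring by the field have a similar footprint.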