Hi Erick,

I like your idea. FWIW, please also leave room for boosting by a function query that takes many numeric fields as input but results in a single value. I don't know if this counts as a really clever function, but here's one that I currently use:
{!boost b=pow(sum(log(sum(product(boosted,9000),product(product(image,stocked),300),product(product(image,taxonomyCategoryTypeId),300),product(product(image,sales),150),product(stocked,2),product(sales,2),views)),1),3)}

Note: image is an int/bool field (1 = has image, 0 = no image), hence all the product(product(image,...),...) terms above: they zero out those boosts if there isn't an image!

Thanks
Robi

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, November 12, 2013 9:01 AM
To: solr-user@lucene.apache.org
Subject: Sorting memory-efficiently by any numeric field (dates too?)

Before I go and pat myself on the back, what do people think about this trick?

The base problem is "Is there a space-efficient way to return the top N documents, sorted by a numeric field". The numeric field includes dates. It come to me in a vision in a flash! (The Pickle Song, Arlo Guthrie). If we could return the numeric field in question as the score of a document, it should work without allocating the internal arrays for holding all the timestamps.

So what about something like this?

/select?q={!boost b=manufacturedate_dt}text:*

and reverse order by

/select?q={!boost b=div(1,manufacturedate_dt)}text:*

It works on the test data.

So let's assume that we're space constrained. It _seems_ like this would only allocate enough space for the top N documents in the result set, which is insignificant in terms of memory consumption for a large number of documents in a core.

Any obvious problems that people see? I see a couple of shortcomings:

1> You only get one field. Unless you can create a really clever function that incorporates all the values in multiple fields, this is going to be hard to use with more than one field.

2> The boost syntax doesn't allow for a *:*, so you have to specify an existing field. If there happen to be documents that don't have anything in the field, you'll miss them.

3> I'm not sure what the performance issues are, especially in the case where _every_ document scores better than the current top-N.

Erick
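
For anyone who wants to sanity-check Robi's multi-field boost function outside Solr, here is a minimal Python sketch of the same arithmetic. It treats log as base 10, which is how Solr's log function query is documented. The function names (inner_sum, boost), the snake_case spelling of taxonomyCategoryTypeId, and the sample field values are made up for illustration, and the handling of a zero sum is an assumption of this sketch, not a statement about what Solr does.

```python
import math


def inner_sum(boosted, image, stocked, taxonomy_category_type_id, sales, views):
    """Weighted sum inside log(...), term for term as in the posted function."""
    return (
        boosted * 9000
        + image * stocked * 300
        + image * taxonomy_category_type_id * 300
        + image * sales * 150
        + stocked * 2
        + sales * 2
        + views
    )


def boost(**fields):
    """Mirror pow(sum(log(S),1),3), with log taken as base 10."""
    s = inner_sum(**fields)
    if s <= 0:
        # Assumption for this sketch only: math.log10 raises on 0 in Python,
        # so a non-positive sum is mapped to negative infinity here.
        return float("-inf")
    return (math.log10(s) + 1) ** 3


# Hypothetical field values for one document, with and without an image.
doc = dict(boosted=0, stocked=1, taxonomy_category_type_id=2, sales=10, views=100)
with_image = boost(image=1, **doc)
without_image = boost(image=0, **doc)

print(inner_sum(image=1, **doc))   # 2522
print(inner_sum(image=0, **doc))   # 122
print(with_image > without_image)  # True
```

With image=0 the three image-gated terms contribute nothing, so only the stocked, sales, and views terms remain, which is the effect Robi describes. The log and cube compress the raw sum into a single multiplier, so this is one way of folding several fields into the single value that shortcoming 1> asks for.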