RE: Sorting memory-efficiently by any numeric field (dates too?)

Petersen, Robert Tue, 12 Nov 2013 13:08:00 -0800

Hi Erick,

I like your idea. FWIW, please also leave room for boosting by a function query 
that takes many numeric fields as input but produces a single value. I don't 
know if this counts as a really clever function, but here's one I currently use:


{!boost b=pow(sum(log(sum(product(boosted,9000),product(product(image,stocked),300),product(product(image,taxonomyCategoryTypeId),300),product(product(image,sales),150),product(stocked,2),product(sales,2),views)),1),3)}

Note that image is an int field used as a boolean: 1 = has image, 0 = no image. 
That is why so many terms above are written as product(product(image,...),...): 
multiplying by image zeroes out those boosts if there isn't an image!
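
To make the image gating concrete, here is a minimal Python sketch of the
arithmetic in that boost. It assumes Solr's log() is base 10, and that the
inner sum is positive (math.log10 raises an error otherwise). boost_value is a
hypothetical helper, and the field values are made up for illustration.

```python
import math

def boost_value(boosted, image, stocked, taxonomyCategoryTypeId, sales, views):
    # Hypothetical helper mirroring the function query term by term.
    inner = (boosted * 9000
             + image * stocked * 300                 # gated by image
             + image * taxonomyCategoryTypeId * 300  # gated by image
             + image * sales * 150                   # gated by image
             + stocked * 2
             + sales * 2
             + views)
    # pow(sum(log(inner), 1), 3), assuming log() is base 10
    return (math.log10(inner) + 1) ** 3

# Same made-up document, with and without an image.
with_image = boost_value(boosted=0, image=1, stocked=1,
                         taxonomyCategoryTypeId=2, sales=10, views=78)
without_image = boost_value(boosted=0, image=0, stocked=1,
                            taxonomyCategoryTypeId=2, sales=10, views=78)
print(with_image, without_image)
```

With image=0 only the stocked, sales and views terms survive, so the document
without an image gets the smaller boost.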

Thanks
Robi

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, November 12, 2013 9:01 AM
To: solr-user@lucene.apache.org
Subject: Sorting memory-efficiently by any numeric field (dates too?)

Before I go and pat myself on the back, what do people think about this trick? 
The base problem is: "Is there a space-efficient way to return the top N 
documents, sorted by a numeric field?" Here, "numeric field" includes dates.

It come to me in a vision in a flash! (The Pickle Song, Arlo Guthrie). If we 
could return the numeric field in question as the score of a document it should 
work without allocating the internal arrays for holding all the timestamps.

So what about something like this?
/select?q={!boost b=manufacturedate_dt}text:*
and, for the reverse order,
/select?q={!boost b=div(1,manufacturedate_dt)}text:*
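
If these are sent from a client rather than typed into a browser, the braces,
spaces and punctuation in the q parameter need URL-encoding. A minimal sketch
using only Python's standard library; build_select_url is a hypothetical
helper, and only the q parameter is taken from the two requests above.

```python
from urllib.parse import urlencode

def build_select_url(boost_function, query="text:*"):
    # Hypothetical helper: wrap the query in {!boost b=...} and URL-encode it.
    q = "{!boost b=%s}%s" % (boost_function, query)
    return "/select?" + urlencode({"q": q})

# Larger field values score higher, so they come back first.
descending_url = build_select_url("manufacturedate_dt")
# The reciprocal reverses the order for positive field values.
ascending_url = build_select_url("div(1,manufacturedate_dt)")
print(descending_url)
print(ascending_url)
```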

It works on the test data. So let's assume that we're space constrained. It 
_seems_ like this would only allocate enough space for the top N documents in 
the result set which is insignificant in terms of memory consumption for a 
large number of documents in a core. Any obvious problems that people see?
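
The intuition can be sketched as a bounded min-heap: however many documents
stream past, only N (score, doc) pairs are ever held. This is an illustration
of the idea only, not Lucene's actual collector code; top_n_by_score and the
sample scores are made up.

```python
import heapq

def top_n_by_score(scored_docs, n):
    # Min-heap of (score, doc_id); it never grows beyond n entries.
    heap = []
    for doc_id, score in scored_docs:
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Beats the weakest of the current top n, so replace it.
            heapq.heapreplace(heap, (score, doc_id))
    # Highest score first.
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]

# 100,000 documents streamed from a generator; memory stays at 3 heap entries.
top3 = top_n_by_score(((i, float(i)) for i in range(100000)), 3)
print(top3)
```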

I see a couple of shortcomings:

1> You only get one field. Unless you can create a really clever function
that incorporates all the values in multiple fields, this is going to be hard 
to use with more than one field.

2> The boost syntax doesn't allow for a *:*, so you have to specify an
existing field. If there happen to be documents that don't have anything in the 
field, you'll miss them.

3> I'm not sure what the performance issues are, especially in the case
where _every_ document scores better than the current top-N.
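
A rough way to picture that worst case, assuming the top N is kept in a
bounded min-heap (an assumption for illustration, not a statement about
Solr's internals): count how often the heap has to be updated when scores
arrive in ascending versus descending order. count_replacements is a
hypothetical helper.

```python
import heapq

def count_replacements(scores, n):
    # Count how many times an incoming score evicts the current minimum.
    heap = []
    replacements = 0
    for score in scores:
        if len(heap) < n:
            heapq.heappush(heap, score)
        elif score > heap[0]:
            heapq.heapreplace(heap, score)
            replacements += 1
    return replacements

# Ascending input: every document beats the current top 10.
ascending = count_replacements(range(100_000), 10)
# Descending input: the first 10 documents are already the top 10.
descending = count_replacements(range(100_000, 0, -1), 10)
print(ascending, descending)
```

Memory stays bounded at N entries either way; what changes is how much heap
maintenance each incoming document costs.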

Erick
