Re: Tuning caching of geofilt queries

2012-08-10 Thread Lance Norskog
In other computations I found exactly zero performance difference between floats and doubles, even with long arrays of numbers, which you would expect to be sensitive to locality effects. On Fri, Aug 10, 2012 at 11:20 AM, David Smiley (@MITRE.org) wrote: > Yeah it is... I rather like this write-up: > ht

Re: Tuning caching of geofilt queries

2012-08-10 Thread David Smiley (@MITRE.org)
Yeah it is... I rather like this write-up: https://sites.google.com/site/trescopter/Home/concepts/required-precision-for-gps-calculations#TOC-Precision-of-Float-and-Double -- which also arrives at 2.37m worst case. Aside from RAM savings, I wonder if there is any noticeable performance differenc

Re: Tuning caching of geofilt queries

2012-08-10 Thread Yonik Seeley
On Fri, Aug 10, 2012 at 1:47 PM, David Smiley (@MITRE.org) wrote: > Information I've read varies on exactly what is the accuracy of float > vs double but at a kilometer there's no question a double is overkill. Back of the envelope: 23 mantissa bits + 1 implied bit == 24 effective mantissa bits in
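
The 24-effective-bit estimate above can be checked numerically. The following is a minimal sketch, not part of the original message: it measures the gap between adjacent single-precision values at a few coordinate magnitudes and converts that gap to meters along the equator. The helper `float32_spacing` and the constant `EQUATOR_METERS_PER_DEGREE` are hypothetical names introduced here, and the circumference figure is an assumed round value. The result should land in the same order of magnitude as the 2.37m worst case mentioned elsewhere in the thread, far below a 1km tolerance.

```python
import struct

# Assumption: equatorial circumference of roughly 40,075,017 m.
EQUATOR_METERS_PER_DEGREE = 40_075_017 / 360


def float32_spacing(x: float) -> float:
    """Gap between the float32 nearest to positive x and the next float32 above it."""
    # Reinterpret the float32 bit pattern as an unsigned int; for positive
    # finite values, adding 1 to the pattern yields the next representable value.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - here


for degrees in (1.0, 45.0, 90.0, 180.0):
    gap = float32_spacing(degrees)
    meters = gap * EQUATOR_METERS_PER_DEGREE
    print(f"{degrees:6.1f} deg: gap {gap:.3e} deg, about {meters:.2f} m at the equator")
```

The gap grows with the magnitude of the value, so longitudes near 180 degrees are the coarsest case for a float coordinate.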

Re: Tuning caching of geofilt queries

2012-08-10 Thread David Smiley (@MITRE.org)
Chris's response is quite good, and I have a couple of things to add: 1. Since you can tolerate 1km slop, try defining the dynamic field *_coordinate as tfloat instead of tdouble. This will halve your memory requirements, but I'm not sure if it will be any faster -- it's worth a shot since you've al
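
The "halve your memory" point is simple arithmetic, sketched below under loud assumptions: one latitude and one longitude value per document, held in flat in-memory arrays of 4-byte floats or 8-byte doubles, with all other index overhead ignored. The function `coordinate_bytes` and the document count are hypothetical, not measurements from any Solr installation.

```python
def coordinate_bytes(num_docs: int, bytes_per_value: int, fields: int = 2) -> int:
    """Rough size of per-document coordinate arrays (lat + lon by default)."""
    return num_docs * bytes_per_value * fields


docs = 10_000_000  # hypothetical corpus size
as_double = coordinate_bytes(docs, 8)  # 8 bytes per double value
as_float = coordinate_bytes(docs, 4)   # 4 bytes per float value

print(f"double: {as_double / 1e6:.0f} MB, float: {as_float / 1e6:.0f} MB")
```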

Re: Tuning caching of geofilt queries

2012-08-09 Thread Chris Hostetter
: My question is: Does it make sense to round these coordinates (a) while : indexing and/or (b) while querying to optimize cache hits? Our maximum : required resolution for geo queries is 1km and we can tolerate minor errors : so I could round to two decimal points for most of our queries. : fq=_

Re: Tuning caching of geofilt queries

2012-08-04 Thread Erick Erickson
I don't think rounding will affect cache hits in either case _unless_ the input point for different queries can be very close to each other. Think of the filter cache as being composed of a map where the key is the (raw) filter query and the value is the set of documents in your corpus that satisf
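
The map picture described above can be made concrete with a toy model. This is only a sketch of that description, not Solr's actual cache implementation: `filter_cache`, `geofilt_fq`, `lookup`, and the field name `store` are hypothetical names chosen for illustration. It shows that two nearby query points produce different raw keys, and therefore two cache entries, unless the point is rounded before the filter string is built.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

# Toy stand-in for a filter cache: raw filter string -> matching document ids.
filter_cache: Dict[str, FrozenSet[int]] = {}


def geofilt_fq(lat: float, lon: float, d_km: float, decimals: Optional[int] = None) -> str:
    """Build a geofilt filter string, optionally rounding the query point first."""
    if decimals is not None:
        lat, lon = round(lat, decimals), round(lon, decimals)
    # 'store' is a hypothetical spatial field name.
    return f"{{!geofilt sfield=store pt={lat},{lon} d={d_km}}}"


def lookup(fq: str, compute: Callable[[], FrozenSet[int]]) -> Tuple[FrozenSet[int], bool]:
    """Return (doc set, was_cache_hit), computing and storing the set on a miss."""
    hit = fq in filter_cache
    if not hit:
        filter_cache[fq] = compute()
    return filter_cache[fq], hit


# Two query points a short distance apart.
points = [(45.1512, -93.8512), (45.1498, -93.8494)]
for decimals in (None, 2):
    filter_cache.clear()
    hits = [lookup(geofilt_fq(lat, lon, 5, decimals), lambda: frozenset({1, 2, 3}))[1]
            for lat, lon in points]
    print(f"decimals={decimals}: hits={hits}, cache entries={len(filter_cache)}")
```

Rounding only helps when distinct queries actually land on the same rounded point; queries that are far apart still produce distinct keys either way.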