Hi Joe,
Thanks for the link, I'll check it out. I'm not sure it'll help in my
situation though, since the clustering has to happen at runtime because of
faceted browsing (unless I'm mistaken about what the preprocessing does).
As for my progress: I thought some more about using a Hilbert curve
mapping and it seems really well suited to what I want. I've just
added a Hilbert field to my schema (Trie Integer field) holding latitude
and longitude at 15 bits of precision each (I didn't use 16 bits, to avoid
the sign bit), so I have a 30-bit number in that field. Getting the facet
count for 0 to (2^30 - 1) should cover the entire map, while getting counts
for 0 to (2^28 - 1), 2^28 to (2^29 - 1), 2^29 to (2^29 + 2^28 - 1) and
(2^29 + 2^28) to (2^30 - 1) should give me counts for four equal quadrants,
and so on all the way down to 0 to 3, 4 to 7, 8 to 11, ..., (2^30 - 4) to
(2^30 - 1), and finally faceting on every separate term. Of course, if
you're zoomed in far enough to need such fine-grained clustering you'll be
looking at a small portion of the map, so only part of the whole range
needs to be counted, but that should be doable by calculating the Hilbert
numbers for the lower and upper bounds.
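A minimal Python sketch of the range arithmetic described above. It assumes the widely used iterative Hilbert mapping, a simple linear quantization of latitude/longitude onto a 2^15 x 2^15 grid, and a field named "hilbert". These are illustrative assumptions, as are the function names, and not necessarily what the actual schema uses.

```python
ORDER = 15  # bits per axis, as in the message; 2 * 15 = 30-bit key


def quantize(lat, lon, order=ORDER):
    """Map (lat, lon) in degrees to integer grid coordinates (assumed linear scaling)."""
    n = 1 << order
    x = min(n - 1, int((lon + 180.0) / 360.0 * n))
    y = min(n - 1, int((lat + 90.0) / 180.0 * n))
    return x, y


def hilbert_index(x, y, order=ORDER):
    """Distance along the Hilbert curve of grid cell (x, y), each in [0, 2**order)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-quadrant is traversed in the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def cell_range(cell, depth, order=ORDER):
    """Inclusive (low, high) Hilbert range of quadtree cell number `cell` at `depth`."""
    size = 1 << (2 * (order - depth))  # number of keys per cell at this depth
    return cell * size, (cell + 1) * size - 1


def facet_query(cell, depth, field="hilbert", order=ORDER):
    """Range query string for one cell; the field name is a placeholder."""
    low, high = cell_range(cell, depth, order)
    return "%s:[%d TO %d]" % (field, low, high)


if __name__ == "__main__":
    # The four quadrant ranges from the message (depth 1).
    for cell in range(4):
        print(facet_query(cell, 1))
```

Each string produced by facet_query could be passed as one facet.query parameter, so a given depth needs one such parameter per cell that intersects the visible part of the map.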
The only problem is the location of the clusters: with this method
I'll only have the Hilbert number and the number of items in that part
of what is essentially a quadtree. But I suppose I can calculate
the facet counts at one level of precision finer than requested and use
a count-weighted average of the centers of the four sub-cells as the
cluster position. I'll have to see whether that is accurate enough.
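A rough Python sketch of that weighted-average idea. It assumes the common iterative inverse Hilbert mapping and the same linear latitude/longitude scaling as one might use at index time. The function names are hypothetical, and the result is only as accurate as the assumption that items sit near the center of each sub-cell.

```python
def hilbert_to_xy(d, order):
    """Inverse Hilbert mapping: curve distance d -> grid cell (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Undo the rotation/flip applied to this quadrant.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def cell_center(cell, depth):
    """(lat, lon) of the center of quadtree cell number `cell` at `depth`."""
    x, y = hilbert_to_xy(cell, depth)
    n = 1 << depth
    lon = (x + 0.5) / n * 360.0 - 180.0
    lat = (y + 0.5) / n * 180.0 - 90.0
    return lat, lon


def estimate_cluster_position(cell, depth, child_counts):
    """Count-weighted average of the four child-cell centers.

    child_counts holds the facet counts of cells 4*cell .. 4*cell + 3
    at depth + 1, in Hilbert order.
    """
    total = sum(child_counts)
    if total == 0:
        return cell_center(cell, depth)  # nothing to weight; fall back to the cell center
    lat = lon = 0.0
    for i, count in enumerate(child_counts):
        child_lat, child_lon = cell_center(4 * cell + i, depth + 1)
        lat += child_lat * count
        lon += child_lon * count
    return lat / total, lon / total
```

For example, estimate_cluster_position(0, 0, [4, 0, 0, 0]) puts the whole-map cluster at the center of the first quadrant, while equal counts in all four quadrants put it at the center of the map.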
Hopefully I'll have the time to complete this today or tomorrow. I'll
report back on whether it worked.
Regards,
gwk
Joe Calderon wrote:
There are clustering libraries, like
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/, that have
bindings to Perl/Python. You can preprocess your results and create
clusters for each zoom level.
On Tue, Sep 8, 2009 at 8:08 AM, gwk<g...@eyefi.nl> wrote:
Hi,
I just completed a simple proof-of-concept clusterer component which
naively clusters with a specified bounding box around each position,
similar to what the JavaScript MarkerClusterer does. It's currently very
slow, as I loop over the entire docset and request the longitude and
latitude of each document (not to mention that my unfamiliarity with
Lucene/Solr isn't helping the implementation's performance any; most of
the code is copied from grepping the Solr source). Clustering a set of about
80,000 documents takes about 5-6 seconds. I'm currently looking into
storing the Hilbert curve mapping in Solr and clustering using facet
counts on numerical ranges of that mapping, but I'm not sure it will pan out.
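For illustration only, the naive bounding-box approach can be sketched in a few lines of Python. This is a reconstruction of the general idea, not the Solr component described above. The function name and the half_width / half_height parameters (in degrees) are made up for the example.

```python
def naive_cluster(points, half_width, half_height):
    """Greedy bounding-box clustering over (lat, lon) pairs.

    Each point joins the first existing cluster whose box (centered on the
    cluster's first point) contains it; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"lat": ..., "lon": ..., "count": ...}
    for lat, lon in points:
        for cluster in clusters:
            if (abs(lat - cluster["lat"]) <= half_height
                    and abs(lon - cluster["lon"]) <= half_width):
                cluster["count"] += 1
                break
        else:
            # No existing box contains the point, so it seeds a new cluster.
            clusters.append({"lat": lat, "lon": lon, "count": 1})
    return clusters
```

Every point is compared against every existing cluster, so the cost grows with the number of documents times the number of clusters, on top of the per-document coordinate lookup.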
Regards,
gwk
Grant Ingersoll wrote:
Not directly related to geo clustering, but
http://issues.apache.org/jira/browse/SOLR-769 is all about a pluggable
interface to clustering implementations. It currently has Carrot2
implemented, but the APIs are marked as experimental. I would definitely be
interested in hearing your experience with implementing your clustering
algorithm in it.
-Grant
On Sep 8, 2009, at 4:00 AM, gwk wrote:
Hi,
I'm working on a search-on-map interface for our website. I've created a
little proof of concept which uses the MarkerClusterer
(http://code.google.com/p/gmaps-utility-library-dev/) which clusters the
markers nicely. But because sending tens of thousands of markers over Ajax
is not quite as fast as I would like it to be, I'd prefer to do the
clustering on the server side. I've considered a few options, like storing
the Morton order and throwing away precision to cluster, assigning all
locations to a grid position, or simply clustering based on country/region/city
depending on zoom level by adding latitude and longitude fields for each zoom
level (so that for smaller countries you have to be zoomed in further to get
the next level of clustering).
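A small Python sketch of the Morton-order option, assuming integer grid coordinates with 15 bits per axis. The function names and the bit width are illustrative choices, not something fixed by the message.

```python
def morton_index(x, y, order=15):
    """Interleave the bits of x and y (x in even positions, y in odd positions)."""
    d = 0
    for bit in range(order):
        d |= ((x >> bit) & 1) << (2 * bit)
        d |= ((y >> bit) & 1) << (2 * bit + 1)
    return d


def grid_cell(morton, levels_dropped):
    """Throw away precision: dropping 2 bits per level merges 2 x 2 cells into one."""
    return morton >> (2 * levels_dropped)
```

Two locations whose grid_cell values match at a given level fall into the same grid square at that zoom level, so counting documents per truncated value yields one cluster count per square.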
I was wondering if anybody else has worked on something similar and if so
what their solutions are.
Regards,
gwk
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
Solr/Lucene:
http://www.lucidimagination.com/search