Hi, I have an index with 300K docs with lat,lon. I need to cluster the docs based on lat,lon for display in the UI. The user then needs to be able to click on any cluster and zoom in (up to 11 levels deep).
I'm using Solr 4.6 and I'm wondering how best to implement this efficiently. Some more specific questions are below.

I need to:
1) cluster data points at different zoom levels
2) click on a specific cluster and zoom in
3) be able to select a region (bounding box or polygon) and show clusters in the selected area

What's the best way to implement this so that queries are fast?

What I thought I would try, but maybe there are better ways:
* divide the world into NxM large squares and then each of these squares into 4 more squares, and so on - 11 levels deep
* at index time, figure out all squares (at all 11 levels) each data point belongs to and index that info into 11 different fields, e.g. <id=1 name=foo lat=x lon=y zoom1=square1_62 zoom2=square1_62_47 zoom3=square1_62_47_33 ....>
* at search time, use field collapsing on the zoomX field to get which docs belong to which square at a particular level
* calculate the center point of each square (the mean position of all points in that square) using StatsComponent (facet on the zoomX field, avg on the lat and lon fields) - I would treat each square as a separate cluster and the center point of each square as the center point of its cluster

I *think* the problems with this approach are:
* there will be many unique field values at the bigger zoom levels, which means field collapsing / StatsComponent maaay not work fast enough
* clusters will not look very natural, because I would have many clusters at each zoom level, and what are "real" geographical clusters would be displayed as multiple clusters, since their points would in some cases be spread across multiple squares. But that may be OK
* a lot will depend on how the squares are calculated - linearly dividing 360 degrees by N to get "equal" size squares in degrees would produce issues with "real" square sizes and with the counts of points in each of them

So I'm wondering if there is a better way?

Thanks,
Bojan
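
P.S. To make the index-time part of the idea concrete, here is a rough Python sketch of how the zoomX values could be computed for one point. Everything in it is an assumption for illustration only: the function names (zoom_cells, to_solr_fields), the "sq_<col>_<row>_<quadrant>..." naming (not the same as the square1_62_47 example above), the 8x4 top-level grid, and the plain linear split in degrees, which has exactly the unequal "real" size problem mentioned in the post.

```python
def zoom_cells(lat, lon, levels=11, n_cols=8, m_rows=4):
    """Return one cell ID per zoom level for a point, coarsest level first.

    Hypothetical scheme: the world is split linearly (in degrees) into
    n_cols x m_rows squares, and every square is split into 4 quadrants
    at each deeper level.
    """
    # Normalise to [0, 1); the clamp keeps lon=180 / lat=90 in the last cell.
    x = min((lon + 180.0) / 360.0, 1.0 - 1e-12)
    y = min((lat + 90.0) / 180.0, 1.0 - 1e-12)

    # Level 1: which top-level square the point falls into.
    fx, fy = x * n_cols, y * m_rows
    col, row = int(fx), int(fy)
    cell = "sq_%d_%d" % (col, row)
    cells = [cell]

    # Fractional position of the point inside the current square.
    fx, fy = fx - col, fy - row
    for _ in range(levels - 1):
        # Split the current square in 4 and pick the quadrant (0..3).
        fx, fy = fx * 2, fy * 2
        qx, qy = int(fx), int(fy)
        cell += "_%d" % (qy * 2 + qx)
        cells.append(cell)
        fx, fy = fx - qx, fy - qy
    return cells


def to_solr_fields(lat, lon, levels=11):
    """Map the cell IDs to the zoom1..zoomN field names used in the post."""
    cells = zoom_cells(lat, lon, levels)
    return {"zoom%d" % (i + 1): c for i, c in enumerate(cells)}
```

Because each cell ID is a prefix of the IDs below it, zooming into a cluster could be a filter on the parent's ID at level X plus grouping on the zoom(X+1) field. Whether that is fast enough at the deeper levels is exactly what I'm unsure about.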