Solr support for compound geospatial indexes?
Hello,

I've started to evaluate Solr and so far haven't seen any mention of support for compound indexes. I'm looking to do either radius- or shape-based geospatial proximity queries (find all documents that are within 20km of a given lat,lng). I would also at times be combining the geo query with another term (for example, "house rooms" = 5).

My aim is to do very fast queries against the indexed data; I have no real constraints on the time it would take to build this index. Does Solr support building an index on the two types of fields, lat,lng and "house rooms"?

Thank you,
Maxim.
Re: Solr support for compound geospatial indexes?
Hello Mikhail,

Thank you for the fast reply, please find my answers inline.

On Tue, Jan 3, 2012 at 11:00 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> Hello,
>
> Please find my thoughts below.
>
> On Wed, Jan 4, 2012 at 12:39 AM, Maxim Veksler wrote:
> > Hello,
> >
> > I've started to evaluate Solr and so far haven't seen any mention of
> > support for compound indexes.
>
> If I get you right, it doesn't. AFAIK it combines separate indexes
> based on the condensed internal ids, aka docNums.
>
> > I'm looking to do either radius- or shape-based geospatial proximity queries
> > (find all documents that are within 20km of a given lat,lng).
>
> http://wiki.apache.org/solr/SpatialSearch#geofilt_-_The_distance_filter
> Consider https://issues.apache.org/jira/browse/SOLR-2155 if you are
> dealing with multivalue coordinates.

Thank you for the reference to SOLR-2155. I've studied geohash [1] and the work David Smiley [2] is doing [3] for Solr 4 thoroughly. I think my problem is simpler - I don't need multivalue coordinate support, because my locations are represented by a single lat,lng point and I will be searching for all the points that fall within my defined radius. There is a 1:1 mapping between a document and the point it is categorized by.

> > I would also at times be combining the geo query with another term (for
> > example, "house rooms" = 5).
>
> Just add a separate ...&fq=H_ROOMS:5&...
> http://wiki.apache.org/solr/CommonQueryParameters#fq
>
> > My aim is to do very fast queries against the indexed data. I have no real
> > constraints on the time it would take to build this index.
> >
> > Does Solr support building an index on the two types of fields, lat,lng and
> > "house rooms"?
>
> Sure. It sounds like intersecting fqs.

Wonderful to hear this; I guess I'm not really understanding how Solr / Lucene works then. Could you please point me to, or explain, how Solr builds its index? I'm especially interested in how the search is implemented under the hood - given a geo term and a "regular" term, what would Lucene do? How would it do the actual searching, or perhaps what I need to be asking is what the "intersecting fqs" are and how they are implemented? I apologize for the messy question, I'm only starting to understand Lucene. Please let me know if it works for you.

> > Thank you,
> > Maxim.

[1] http://gis.stackexchange.com/questions/18330/would-it-be-possible-to-use-geohash-for-proximity-searches
[2] http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-smiley-geospatial-search.pdf
[3] http://code.google.com/p/lucene-spatial-playground/
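For reference, a minimal SolrJ sketch of the "intersecting fqs" approach Mikhail describes. The endpoint URL and the field names (loc, house_rooms) are illustrative assumptions, not taken from this thread; class names are from the SolrJ 3.x API.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GeoAndTermFilter {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; adjust to your own Solr URL.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        // Radius filter: all documents within 20 km of the given lat,lng.
        q.addFilterQuery("{!geofilt sfield=loc pt=39.738548,-73.130322 d=20}");
        // Second, independent filter; Solr intersects the two filter results.
        q.addFilterQuery("house_rooms:5");

        QueryResponse rsp = server.query(q);
        System.out.println("matches: " + rsp.getResults().getNumFound());
    }
}

Putting each condition in its own fq also lets Solr cache the two filters independently, which helps when the same geo radius or the same rooms value is queried repeatedly.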
Re: SolrJ Embedded
On Tue, Jan 17, 2012 at 3:13 AM, Erick Erickson wrote:
> I don't see why not. I'm assuming a *nix system here, so when Solr
> updated an index, any deleted files would hang around.
>
> But I have to ask why bother with the Embedded server in the
> first place? You already have a Solr instance up and running,
> why not just query that instead, perhaps using SolrJ?

Wouldn't querying the Solr server through the HTTP interface be slower?

> Best
> Erick
>
> On Mon, Jan 16, 2012 at 3:00 PM, wrote:
> > Hi,
> >
> > Is it possible to use the same index in a Solr webapp and additionally in an
> > EmbeddedSolrServer? The embedded one would be read-only.
> >
> > Thank you.
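For comparison, a rough SolrJ sketch of the two options being discussed: querying the running webapp over HTTP versus opening the same solr home read-only in-process. The URL and the solr home path are hypothetical, and the class names follow the SolrJ 3.x API.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.core.CoreContainer;

public class TwoWaysToQuery {
    public static void main(String[] args) throws Exception {
        // Option 1: query the already-running Solr webapp over HTTP.
        SolrServer http = new CommonsHttpSolrServer("http://localhost:8983/solr");
        System.out.println(http.query(new SolrQuery("*:*")).getResults().getNumFound());

        // Option 2: open the same solr home in-process (read-only usage).
        System.setProperty("solr.solr.home", "/path/to/solr/home"); // hypothetical path
        CoreContainer container = new CoreContainer.Initializer().initialize();
        SolrServer embedded = new EmbeddedSolrServer(container, "");
        System.out.println(embedded.query(new SolrQuery("*:*")).getResults().getNumFound());

        container.shutdown();
    }
}

This is only an API illustration; as Erick notes, the simpler route is usually to query the existing instance over HTTP.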
How to return the geo distance on Solr 3.5 with bbox filtering
Hello,

I'm querying with bbox, which should be faster than geodist. My queries look like this:

http://localhost:8983/solr/select?indent=true&fq={!bbox}&sfield=loc&pt=39.738548,-73.130322&d=100&sort=geodist()%20asc&q=trafficRouteId:235

The trouble is that with bbox Solr does not return the distance of each document; I couldn't get it to work even with the tips from http://wiki.apache.org/solr/SpatialSearch#Returning_the_distance

Is there something I'm missing?
Re: How to return the geo distance on Solr 3.5 with bbox filtering
Hello Mikhail,

Please see my reply inline.

On Wed, Jan 18, 2012 at 11:00 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> Maxim,
>
> Which version of Solr are you using?

As mentioned in the title, I'm using Solr 3.5.

> Why doesn't the second approach at the link work for you?
> Just move q=trafficRouteId:235 to fq=, because it's pretty much a filter, and use
> geodist() as a function query: &sort=score%20asc&q={!func}geodist()
> (as in http://localhost:8983/solr/select?indent=true&fl=name,store&sfield=store&pt=45.15,-93.85&sort=score%20asc&q=%7B%21func%7Dgeodist%28%29)

I'm not sure I'm following. I'm trying to use bbox instead of geodist. I use the fq fields to define the bbox filtering. I also need to query by another parameter (trafficRouteId). Optimally I would be happy to get the distance calculation from Solr, but that doesn't seem to work in any format of query I tried. Being new to the Solr query language, I'm not sure how to form the search terms to combine all of this with the score.

> What do you get in this case? Please provide logs, exceptions, and the debug
> response.
>
> Thanks
>
> On Tue, Jan 17, 2012 at 10:06 PM, Maxim Veksler wrote:
> > Hello,
> >
> > I'm querying with bbox, which should be faster than geodist. My queries look like this:
> >
> > http://localhost:8983/solr/select?indent=true&fq={!bbox}&sfield=loc&pt=39.738548,-73.130322&d=100&sort=geodist()%20asc&q=trafficRouteId:235
> >
> > The trouble is that with bbox Solr does not return the distance of each
> > document; I couldn't get it to work even with the tips from
> > http://wiki.apache.org/solr/SpatialSearch#Returning_the_distance
> >
> > Is there something I'm missing?
>
> --
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
>
> <http://www.griddynamics.com>
Re: How to return the geo distance on Solr 3.5 with bbox filtering
It works! The query

http://localhost:8983/solr/select?indent=true&fq={!bbox}&sfield=loc&pt=34.0415954,-118.298797&d=1000.0&sort=score%20asc&fq=trafficRouteId:887&q={!func}geodist()&fl=*,score&rows=1

works perfectly, doing all the filtering needed and returning the distance as the score. Thank you very much for the help.

As a newcomer to Lucene/Solr I must admit that the syntax is confusing. For example, why is the "function name" for the "fq" parameter *inside* the {} (as in fq={!bbox}), yet for the "q" parameter the "name of the function" is outside the {} (as in q={!func}geodist())? And why are the parameters to the bbox search function specified as separate query parameters (the sfield and pt parameters)? I would be happy to understand the logic behind it.

Thank you,
Maxim.

On Wed, Jan 18, 2012 at 9:11 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> Can you try to specify two fqs, geodist as a function query, and sort by score?
>
> fq={!bbox}&...&sort=score%20asc&fq=trafficRouteId:235&q={!func}geodist()&fl=*,score
>
> On Wed, Jan 18, 2012 at 4:46 PM, Maxim Veksler wrote:
> > Hello Mikhail,
> >
> > Please see my reply inline.
> >
> > On Wed, Jan 18, 2012 at 11:00 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> > > Maxim,
> > >
> > > Which version of Solr are you using?
> >
> > As mentioned in the title, I'm using Solr 3.5.
>
> I see
>
> > > Why doesn't the second approach at the link work for you?
> >
> > I'm not sure I'm following. I'm trying to use bbox instead of geodist.
> > I use the fq fields to define the bbox filtering.
> > I also need to query by another parameter (trafficRouteId).
>
> You can put trafficRouteId as a second fq, as I did above.
>
> > Optimally I would be happy to get the distance calculation from Solr,
> > but that doesn't seem to work in any format of query I tried.
>
> Please try my approach above and let me know what you get.
>
> > Being new to the Solr query language, I'm not sure how to form the search terms
> > to combine all of this with the score.
> >
> > > What do you get in this case? Please provide logs, exceptions, and the debug
> > > response.
> > >
> > > Thanks
> > >
> > > On Tue, Jan 17, 2012 at 10:06 PM, Maxim Veksler wrote:
> > > > Hello,
> > > >
> > > > I'm querying with bbox, which should be faster than geodist. My queries look like this:
> > > >
> > > > http://localhost:8983/solr/select?indent=true&fq={!bbox}&sfield=loc&pt=39.738548,-73.130322&d=100&sort=geodist()%20asc&q=trafficRouteId:235
> > > >
> > > > The trouble is that with bbox Solr does not return the distance of each
> > > > document; I couldn't get it to work even with the tips from
> > > > http://wiki.apache.org/solr/SpatialSearch#Returning_the_distance
> > > >
> > > > Is there something I'm missing?
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Lucid Certified
> > > Apache Lucene/Solr Developer
> > > Grid Dynamics
> > >
> > > <http://www.griddynamics.com>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
>
> <http://www.griddynamics.com>
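For anyone finding this thread later, the same working query expressed through SolrJ might look roughly like the sketch below. The host, field names, and values mirror the URL above; the Java class names are a SolrJ 3.x-era assumption, not something stated in the thread.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class BboxWithDistance {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery();
        // geodist() as the main (function) query, so the distance becomes the score.
        q.setQuery("{!func}geodist()");
        // bbox filter; sfield/pt/d are passed as top-level params, as in the URL above.
        q.addFilterQuery("{!bbox}");
        q.set("sfield", "loc");
        q.set("pt", "34.0415954,-118.298797");
        q.set("d", "1000.0");
        // Second filter, intersected with the bbox filter.
        q.addFilterQuery("trafficRouteId:887");
        q.set("fl", "*,score");
        q.addSortField("score", SolrQuery.ORDER.asc);
        q.setRows(1);

        for (SolrDocument doc : server.query(q).getResults()) {
            // With q={!func}geodist(), the score holds the distance in kilometers.
            System.out.println(doc.getFieldValue("score"));
        }
    }
}

Both d and the returned score are in kilometers, since geodist() reports distance in km.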
Solr Cluster - Is it wise to run optimize() on the master after each update
I'm planning on having 1 master and multiple slaves (cloud based; slaves go up and down randomly).

The slaves should be constantly available, meaning search performance should optimally not be affected by the updates at all. It's unclear to me how the cluster-based replication works: does it copy the files from the master and update in place? In that case, am I correct to assume that, except for the caches being emptied, the search performance is not affected?

Does optimize on the master somehow affect the performance of the slaves? Is it recommended to run optimize after each update, assuming I'm not concerned about locking the master for updates and it's OK if the optimize finishes in under 20 minutes?

Thank you,
Maxim.
Re: Solr Cluster - Is it wise to run optimize() on the master after each update
Wonderful input, thank you very much Erick.

One question: I've been told that Solr supports a multi-core mode of operation where you build the index on the master (optimized or not) and then push it to a "standby" core on the slaves. Once the synchronization is complete, you switch the slave between the active and passive core (an operation that is claimed to be atomic and can happen at run time). Have you or other members of this list had experience with this mode of operation?

Thank you.

On Mon, Jan 23, 2012 at 7:25 PM, Erick Erickson wrote:
> In general, do not optimize unless you
> 1> have a very static index
> 2> actually test the search performance afterwards.
>
> First, as Andrew says, optimizing will force a complete
> copy of the entire index at replication. If you do NOT
> optimize, only the most recent segments to be written
> are copied.
>
> Second, unless you have a quite large number of
> segments, optimizing, despite its cool-sounding name,
> doesn't buy you much. In fact there's a JIRA to
> rename it to something less good-sounding, precisely
> because people think "of course I want the index
> optimized".
>
> Third, under no circumstances should you optimize
> after every update. This will absolutely kill your
> indexing. Optimizing copies all segments into
> a single segment. In other words you'll spend a lot
> of time copying junk around for no good reason. Here
> I'm assuming by "update" you mean after every batch
> of documents is added. If you're talking about after an entire
> indexing run, it's not so bad.
>
> Fourth, one tangible result of optimizing is that the
> index is purged of all deleted documents (and remember
> that a document update is really a delete followed by
> an add). But the same thing happens on segment
> merges, which happen without optimizing.
>
> Bottom line: don't bother to optimize unless and until
> you demonstrate that optimizing provides enough of a
> performance boost to be worth it. Even then, re-check
> your assumptions. Look at the various merge policies
> to have more control over when merges occur and
> the number of segments you have, but try to forget
> that optimization even exists.
>
> There's some good info here:
> http://wiki.apache.org/solr/SolrPerformanceFactors
>
> Best
> Erick
>
> On Mon, Jan 23, 2012 at 12:22 AM, Andrew Harvey wrote:
> > We found that optimising too often killed our slave performance. An
> > optimise will cause you to merge and ship the whole index rather than just
> > the relevant portions when you replicate.
> >
> > The change on our slaves in terms of IO and CPU as well as RAM was
> > marked.
> >
> > Andrew
> >
> > Sent on the run.
> >
> > On 23/01/2012, at 19:03, Maxim Veksler wrote:
> >
> >> I'm planning on having 1 master and multiple slaves (cloud based; slaves
> >> go up and down randomly).
> >>
> >> The slaves should be constantly available, meaning search performance
> >> should optimally not be affected by the updates at all.
> >> It's unclear to me how the cluster-based replication works: does it copy
> >> the files from the master and update in place? In that case, am I correct
> >> to assume that, except for the caches being emptied, the search performance
> >> is not affected?
> >>
> >> Does optimize on the master somehow affect the performance of the slaves?
> >> Is it recommended to run optimize after each update, assuming I'm not
> >> concerned about locking the master for updates and it's OK if the optimize
> >> finishes in under 20 minutes?
> >>
> >> Thank you,
> >> Maxim.
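As an aside, in SolrJ terms Erick's advice boils down to something like the sketch below: commit after an update batch and keep optimize() out of the normal update path. The master URL and field names are made up for illustration; the add/commit/optimize calls are standard SolrJ.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexWithoutOptimize {
    public static void main(String[] args) throws Exception {
        // Hypothetical master URL.
        SolrServer master = new CommonsHttpSolrServer("http://master:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("trafficRouteId", 235);
        master.add(doc);

        // A commit makes the new segments visible and replicable; this is all
        // a normal update batch needs.
        master.commit();

        // master.optimize();  // merges everything into one segment; per Erick's
        //                     // advice, reserve this for a very static index, if at all.
    }
}

Skipping the optimize keeps replication incremental, since the slaves then only pull the newly written segments instead of the whole index.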