Using Solr Spatial in conjunction with HBASE/Hadoop
Hello, I have point data (lat/lon) stored in HBase/Hadoop and would like to query it spatially with polygons: if I pass in a few polygons, find all the records that fall within them. I need polygon support, not just bounding-box queries. I could not find much support in Hadoop for these types of queries. Could I leverage Solr 4 spatial to build spatial indexes on the HBase data and use them to query it? I need near-real-time answers (within a couple of seconds). If anyone has any thoughts on this, I would greatly appreciate them. Thank you
-- View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-Spatial-in-conjunction-with-HBASE-Hadoop-tp4034307.html
Sent from the Solr - User mailing list archive at Nabble.com.
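For reference, a minimal sketch of what a polygon filter against a Solr 4 spatial field might look like. It only builds the request parameters and does not contact a server. The field names `geo` and `uid` are hypothetical, and the sketch assumes the field uses Solr 4's SpatialRecursivePrefixTreeFieldType with JTS configured so that WKT polygons are accepted.

```python
from urllib.parse import urlencode


def polygon_wkt(ring):
    """Return WKT for a polygon given (lon, lat) vertices; closes the ring if needed."""
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])
    coords = ", ".join(f"{lon} {lat}" for lon, lat in pts)
    return f"POLYGON(({coords}))"


def spatial_filter(field, rings):
    """OR together one Intersects clause per polygon."""
    clauses = [f'{field}:"Intersects({polygon_wkt(r)})"' for r in rings]
    return " OR ".join(clauses)


def select_params(field, rings, id_field="uid", rows=100):
    """Parameters for a /select request that returns only the record IDs.

    `field` and `id_field` are hypothetical schema names.
    """
    return {"q": "*:*", "fq": spatial_filter(field, rings), "fl": id_field, "rows": rows}


if __name__ == "__main__":
    ring = [(-10, 30), (-40, 40), (-10, -20), (40, 20)]
    # Print the URL-encoded query string that would follow /select?
    print(urlencode(select_params("geo", [ring])))
```

Returning only the ID field keeps the Solr response small when the full records live elsewhere, such as in HBase.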
Re: Using Solr Spatial in conjunction with HBASE/Hadoop
Thanks for your response! I appreciate it. There will be cases where I want to AND or OR the query between HBase and Lucene. Would it make sense to write custom code that queries both repositories, either at the same time or sequentially? Or are there any tools out there that do this? Basically, I'm thinking that HBase will keep the majority of my data columns and Lucene will keep the index and a unique pointer to the HBase record, like this:
HBASE: UID = 12345, COL1, COL2, COL3, COL4, COL5, COL6
LUCENE: ID = 999, UID = 12345, INDEX columns (LAT/LON)
My query would be something like: where lat/lon in (Polygon) AND COL3 = 'ABC'
Would this kind of setup make sense? Is there a better way? I'll be working with terabytes of data. Thanks
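The sequential variant of the setup described in this message can be sketched as follows. This is an illustration under stated assumptions, not a Lucene or HBase implementation: two in-memory dictionaries stand in for the spatial index and the HBase table, the sample rows are invented, and a plain ray-casting test stands in for the spatial index lookup.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical stand-in for the spatial index: UID -> (lon, lat)
SPATIAL_INDEX = {12345: (1.0, 1.0), 12346: (5.0, 5.0), 12347: (2.0, 1.5)}

# Hypothetical stand-in for HBase rows keyed by UID
ROW_STORE = {
    12345: {"COL3": "ABC"},
    12346: {"COL3": "ABC"},
    12347: {"COL3": "XYZ"},
}


def query(ring, col3):
    """Step 1: spatial lookup returns UIDs. Step 2: fetch rows and filter on COL3."""
    uids = [uid for uid, (lon, lat) in SPATIAL_INDEX.items()
            if point_in_polygon(lon, lat, ring)]
    return [uid for uid in uids if ROW_STORE[uid].get("COL3") == col3]


if __name__ == "__main__":
    square = [(0, 0), (3, 0), (3, 3), (0, 3)]
    print(query(square, "ABC"))
```

The filter in the second step is the AND across the two stores. An OR would instead need the union of the UIDs matched on each side.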
Re: Using Solr Spatial in conjunction with HBASE/Hadoop
Thanks guys! David, in general and in your opinion, would Lucene Spatial be the way to go to index hundreds of terabytes of spatial data that continually grows? The data is mostly points and mostly structured, but it could include polygons. The searches would be "within" or "contains" against a polygon. Do you have any thoughts on using a NoSQL database (like MongoDB) or something comparable? I need response times in the seconds. My thinking is that I need some type of distributed system, and I was considering SolrCloud to solve this. I'm fairly new to Lucene/Solr. Most of the data is currently in HDFS/HBase. I've investigated sharding Oracle and Postgres databases, but that doesn't seem like the ideal solution. Since all the data already exists in HDFS, I'd like to build a solution that works on top of it but is real-time, or as near to it as I can get. Anyway, I've read some of your work in the past and appreciate your input. I don't mind putting in some development work; I'm just not sure of the right approach. Thanks for your time. I appreciate it!
Re: Using Solr Spatial in conjunction with HBASE/Hadoop
David, I appreciate your time. I'm going to take a crack at the Lucene sharded index approach and will let you know how I fare. Thanks again.
Re: Using Solr Spatial in conjunction with HBASE/Hadoop
Hi Otis, Yes, my data is in HBase, and I just need a fast spatial index where I can do lookups and then take the IDs back to HBase to retrieve the results. HBase doesn't support polygon searches that I'm aware of. You can do bounding-box queries, but that doesn't meet my requirements. I thought about using Lucene, Mongo, or sharded Postgres (I want to keep it open source). I believe Lucene might work the best. I don't think Mongo really supports polygon searches either. Postgres could possibly work. As I mentioned, my source data could be hundreds of terabytes. I guess I'll run some benchmarks with these, but if anyone has done something similar and would like to share results, I'd appreciate it.