Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-17 Thread oakstream
Hello,
I have point data (lat/lon) stored in HBase/Hadoop and would like to query
the data spatially with polygons: if I pass in a few polygons, find all the
records that fall within them. It needs to support arbitrary polygons, not
just box queries. Hadoop doesn't have much support that I could find for
these types of queries. Could I use Solr 4 spatial to build spatial indexes
on the HBase data and query against those? I need near real-time answers
(within a couple of seconds).
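
For illustration, here is a rough sketch of what such a polygon query might look like when sent to Solr 4. It assumes the points are indexed in a field named `geo` (a hypothetical name) of type `solr.SpatialRecursivePrefixTreeFieldType`, that JTS is on Solr's classpath so WKT polygons are accepted, and that a core is reachable at the placeholder URL shown. The `uid` field in `fl` is likewise an assumption. The code only builds the request URL; it does not contact a server.

```python
from urllib.parse import urlencode


def polygon_wkt(ring):
    """Build a WKT POLYGON from (lon, lat) pairs, closing the ring if needed."""
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # WKT rings must end where they start
    coords = ", ".join(f"{lon} {lat}" for lon, lat in pts)
    return f"POLYGON(({coords}))"


def spatial_filter(field, ring, predicate="Intersects"):
    """Format one spatial clause in Solr 4's field:"Predicate(shape)" syntax."""
    return f'{field}:"{predicate}({polygon_wkt(ring)})"'


def select_url(base_url, field, rings, rows=100):
    """Build a /select URL that matches points falling in ANY of the polygons."""
    fq = " OR ".join(spatial_filter(field, ring) for ring in rings)
    params = {"q": "*:*", "fq": fq, "fl": "id,uid", "rows": rows, "wt": "json"}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder core URL and field name; adjust to the actual deployment.
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    triangle = [(20, 20), (30, 20), (25, 30)]
    print(select_url("http://localhost:8983/solr/collection1/select",
                     "geo", [square, triangle]))
```

For indexed points, `Intersects` behaves like a "within the polygon" test, which is why it is used as the default predicate here.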

If anyone has any thoughts on this I would greatly appreciate them.

Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-Solr-Spatial-in-conjunction-with-HBASE-Hadoop-tp4034307.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-17 Thread oakstream
Thanks for your response!  I appreciate it.  

There will be cases where I want to AND or OR the query between HBase and
Lucene. Would it make sense to write custom code that queries both
repositories, either at the same time or sequentially? Or are there any
tools out there that do this?

Basically, I'm thinking that HBase will keep the majority of my data columns
and Lucene will keep the index and a unique pointer to the HBase record.

Like
HBASE

UID = 12345, COL1, COL2, COL3, COL4, COL5, COL6

LUCENE
ID = 999, UID = 12345 , INDEX Columns (LAT/LON)

My query would be something like where lat/lon in (Polygon) AND COL3 = 'ABC'
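
The sequential variant of this setup can be sketched as a two-phase lookup: the spatial index returns UIDs for points inside the polygon, and those UIDs are then fetched from the column store and filtered on COL3. In this sketch both stores are plain in-memory dictionaries standing in for Lucene and HBase, and every name (`SPATIAL_INDEX`, `HBASE_ROWS`, `spatial_lookup`, `query`) is hypothetical. The point-in-polygon check is a simple ray-casting test, not what Lucene does internally.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Stand-in for the Lucene index: doc ID -> (UID, lon, lat)
SPATIAL_INDEX = {
    999: (12345, 5.0, 5.0),
    1000: (12346, 20.0, 20.0),
    1001: (12347, 2.0, 8.0),
}

# Stand-in for the HBase table: row key (UID) -> remaining columns
HBASE_ROWS = {
    12345: {"COL3": "ABC"},
    12346: {"COL3": "ABC"},
    12347: {"COL3": "XYZ"},
}


def spatial_lookup(ring):
    """Phase 1: return the UIDs of indexed points inside the polygon."""
    return [uid for uid, lon, lat in SPATIAL_INDEX.values()
            if point_in_polygon(lon, lat, ring)]


def query(ring, col3):
    """Phase 2: fetch each matching row by UID and apply the column filter."""
    return [uid for uid in spatial_lookup(ring)
            if HBASE_ROWS[uid].get("COL3") == col3]


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(query(square, "ABC"))
```

One trade-off worth noting: if COL3 were also copied into the index, the AND could be evaluated in one place and fewer rows would be fetched from HBase, at the cost of a larger index.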

Would this kind of setup make sense?  Is there a better way?

I'll be working with terabytes of data.

Thanks





Re: Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-18 Thread oakstream
Thanks guys!
David,

In general, and in your opinion, would Lucene Spatial be the way to go to
index hundreds of terabytes of spatial data that grows continually? It is
mostly point data and mostly structured, though some of it could be polygons.
The searches would be "within" or "contains" tests against a polygon.
Do you have any thoughts on using a NoSQL database (like MongoDB) or
something comparable? I need response times in seconds. My thinking is
that I need some type of distributed system, and I was considering
SolrCloud for this. I'm fairly new to Lucene/Solr. Most of
the data is currently in HDFS/HBase.

I've investigated sharding Oracle and Postgres databases, but that doesn't
seem like the ideal solution. Since all the data already exists in HDFS, I'd
like to build a solution that works on top of it but is "real-time", or as
near to it as I can get.

Anyways, I've read some of your work in the past and appreciate your input.  
I don't mind putting in some development work; I'm just not sure of the
right approach.

Thanks for your time. I appreciate it!






Re: Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-21 Thread oakstream
David,
I appreciate your time.  I'm going to take a crack at the Lucene sharded
index approach and will let you know how I fare.  Thanks again.





Re: Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-22 Thread oakstream
Hi Otis,
Yes, my data is in HBase, and I just need a fast spatial index where I can do
lookups and then take the IDs back to HBase to retrieve the results. HBase
doesn't support polygon searches that I'm aware of. You can do bounding box
queries, but that doesn't meet my requirements. I thought about using
Lucene, Mongo, or sharded Postgres (I want to keep it open source). I
believe Lucene might work best. I don't think Mongo really supports
polygon searches either. Postgres could possibly work. As I mentioned, my
source data could be hundreds of terabytes. I guess I'll run some
benchmarks with these, but if anyone has done something similar and would
like to share results, I'd appreciate it.
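
To make the bounding-box limitation concrete, here is a small sketch comparing a box-only filter with an exact polygon test on the same points. The triangle and the sample points are made up for illustration, and the helper names are hypothetical. A box query can still serve as a cheap first pass, but it returns false positives that an exact test has to remove.

```python
def bounding_box(ring):
    """Return (min_lon, min_lat, max_lon, max_lat) for a ring of (lon, lat)."""
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    return min(lons), min(lats), max(lons), max(lats)


def in_box(lon, lat, box):
    """Cheap test: is the point inside the axis-aligned box?"""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def in_polygon(lon, lat, ring):
    """Exact test using ray casting over the ring's edges."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Made-up data: a right triangle and three sample points.
triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = {"a": (2.0, 2.0), "b": (8.0, 8.0), "c": (12.0, 1.0)}

box = bounding_box(triangle)
# Pass 1: the box filter keeps every point inside the triangle's extent.
box_hits = sorted(k for k, (lon, lat) in points.items() if in_box(lon, lat, box))
# Pass 2: the exact test drops points that are in the box but outside the shape.
exact_hits = sorted(k for k in box_hits if in_polygon(*points[k], triangle))

if __name__ == "__main__":
    print("box filter:", box_hits)
    print("polygon test:", exact_hits)
```

Point `b` sits inside the triangle's bounding box but outside the triangle itself, which is the kind of mismatch that rules out box-only queries for this use case.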




