Edward, I think for now we'll start by modeling how to store triples so that we can run real-time SPARQL queries over them, and later look at the Pregel model and how we can leverage it for bulk processing. The Bigtable data model doesn't lend itself directly to storing triples in a way that makes fast querying possible. Do you have any idea how Google stores linked data in Bigtable? We can build on that from there.
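One common way to make triple patterns fast over a Bigtable-style sorted table (not necessarily what Google does — this is an assumption for illustration) is to write each triple under several rotated row keys, SPO, POS and OSP, so that any combination of bound terms becomes a single prefix scan over one rotation. A toy in-memory sketch, with a sorted list standing in for the table and "/" as a key separator (real row keys would need escaping, since URIs contain "/"):

```python
import bisect

class TripleIndex:
    """Toy stand-in for a sorted Bigtable/HBase table of triple rows."""

    # Each rotation puts a different term first, so any subset of bound
    # terms is a contiguous key prefix in at least one rotation.
    ROTATIONS = {
        "spo": lambda s, p, o: (s, p, o),
        "pos": lambda s, p, o: (p, o, s),
        "osp": lambda s, p, o: (o, s, p),
    }

    def __init__(self):
        self.rows = []  # kept sorted, like Bigtable row keys

    def put(self, s, p, o):
        # Write the triple once per rotation.
        for name, rot in self.ROTATIONS.items():
            row = name + "/" + "/".join(rot(s, p, o))
            i = bisect.bisect_left(self.rows, row)
            if i == len(self.rows) or self.rows[i] != row:
                self.rows.insert(i, row)

    @staticmethod
    def _bound_prefix(terms):
        # Length of the leading run of bound (non-None) terms.
        n = 0
        for t in terms:
            if t is None:
                break
            n += 1
        return n

    def match(self, s=None, p=None, o=None):
        # Pick the rotation whose bound terms form the longest key prefix;
        # for three terms, some rotation always covers all bound terms,
        # so no post-filtering is needed.
        rotated = {n: rot(s, p, o) for n, rot in self.ROTATIONS.items()}
        name = max(rotated, key=lambda n: self._bound_prefix(rotated[n]))
        terms = rotated[name]
        k = self._bound_prefix(terms)
        prefix = name + "/" + "".join(t + "/" for t in terms[:k])
        out = []
        for row in self.rows[bisect.bisect_left(self.rows, prefix):]:
            if not row.startswith(prefix):
                break
            a, b, c = row.split("/")[1:]
            # Un-rotate back to (s, p, o).
            if name == "spo":
                out.append((a, b, c))
            elif name == "pos":
                out.append((c, a, b))
            else:
                out.append((b, c, a))
        return out
```

Storage triples, not just the forward direction, so a pattern like "all subjects with object X" is also one scan — at the cost of writing each triple three times.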
-ak

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Sun, Apr 4, 2010 at 10:50 PM, Edward J. Yoon <[email protected]> wrote:

> Hi, I'm a proposer/sponsor of the heart project.
>
> I have no doubt that RDF can be stored in HBase, because Google also
> stores linked data in their Bigtable.
>
> However, if you want to focus on large-scale (distributed) processing,
> I would recommend reading up on the Google Pregel project (Google's graph
> computing framework), because SPARQL is basically a graph query
> language for RDF graph data.
>
> On Fri, Apr 2, 2010 at 7:09 AM, Jürgen Jakobitsch <[email protected]> wrote:
> > hi again,
> >
> > i'm definitely interested.
> >
> > you probably heard of the heart project, but there's hardly anything
> > going on, so i think it's well worth the effort.
> >
> > for your discussion days i'd recommend taking a look at the openrdf sail api
> >
> > @ http://www.openrdf.org/doc/sesame2/system/
> >
> > the point is that there is already everything you need, like a query engine
> > and the like.
> > to make it clear: for a beginning quad store it's close to perfect, because
> > it actually comes down to implementing the getStatements method as
> > accurately as possible.
> >
> > the query engine does the same by parsing the sparql query and using the
> > getStatements method.
> >
> > now this method simply has five arguments:
> >
> > subject, predicate, object, includeInferred and contexts, where subject,
> > predicate and object can be null, includeInferred can be ignored for
> > starting, and contexts can also be null for a starter, or an array of uris.
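The getStatements contract described above can be sketched in a few lines, with None standing in for null as the wildcard. This is an illustrative stand-in for the Sail method's matching semantics over an in-memory list of quads, not the real OpenRDF API; a real implementation would turn each pattern into an HBase scan instead:

```python
def get_statements(quads, subj=None, pred=None, obj=None,
                   include_inferred=False, contexts=None):
    """Yield quads matching the pattern; None acts as a wildcard,
    mirroring the null arguments of the Sail getStatements method."""
    # include_inferred is accepted but ignored, as suggested for a start.
    for s, p, o, c in quads:
        if subj is not None and s != subj:
            continue
        if pred is not None and p != pred:
            continue
        if obj is not None and o != obj:
            continue
        # contexts: None means "any graph"; otherwise restrict to the list.
        if contexts is not None and c not in contexts:
            continue
        yield (s, p, o, c)
```

As Jürgen notes, the query engine drives everything through this one method: it parses the SPARQL query and issues one such pattern lookup per triple pattern in the WHERE clause.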
> > also note that the sail api is quite commonly used (virtuoso, openrdf
> > sesame, neo4j, bigdata, even oracle has an old version; we'll be having
> > an implementation for talis and 4store in the coming weeks, and of course
> > my quadstore "tuqs")
> >
> > if you find the way to retrieve the triples (quads) from hbase i could
> > implement a sail store in a day - et voila ...
> >
> > anyways it would be nice if you keep me informed .. i'd really like to
> > contribute...
> >
> > wkr www.turnguard.com
> >
> >
> > ----- Original Message -----
> > From: "Amandeep Khurana" <[email protected]>
> > To: [email protected]
> > Sent: Thursday, April 1, 2010 11:45:00 PM
> > Subject: Re: Using SPARQL against HBase
> >
> > Andrew and I just had a chat about exploring how we can leverage HBase for
> > a scalable RDF store, and we'll be looking at it in more detail over the
> > next few days. Is anyone of you interested in helping out? We are going to
> > be looking at what is required to build a triple store + query engine on
> > HBase, and how HBase can be used as-is or remodeled to fit the problem.
> > Depending on what we find out, we'll decide on taking the project further
> > and committing efforts towards it.
> >
> > -Amandeep
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> >
> > On Thu, Apr 1, 2010 at 1:12 PM, Jürgen Jakobitsch <[email protected]> wrote:
> >
> >> hi,
> >>
> >> this sounds very interesting to me, i'm currently fiddling
> >> around with a suitable row and column setup for triples.
> >>
> >> i'm about to implement openrdf's sail api for hbase (i just did
> >> a lucene quad store implementation which is superfast and scales
> >> to a couple of hundreds of millions of triples (http://turnguard.com/tuqs))
> >> but i'm in my first days of hbase encounters, so my experience
> >> in row/column design is modest.
> >> from my point of view the problem is to really efficiently store,
> >> besides the triples themselves, the contexts (named graphs) and
> >> languages of literals.
> >>
> >> by the way: i just did a small tablemanager (in beta) that lets
> >> you create htables -> from <- rdf (see
> >> http://sourceforge.net/projects/hbasetablemgr/)
> >>
> >> i'd be really happy to contribute on the rdf and sparql side,
> >> but could certainly use some help on the hbase table design side.
> >>
> >> wkr www.turnguard.com/turnguard
> >>
> >>
> >> ----- Original Message -----
> >> From: "Raffi Basmajian" <[email protected]>
> >> To: [email protected], [email protected]
> >> Sent: Thursday, April 1, 2010 9:45:59 PM
> >> Subject: RE: Using SPARQL against HBase
> >>
> >> This is an interesting article from a few guys over at BBN/Raytheon. By
> >> storing triples in flat files they used a custom algorithm, detailed in
> >> the article, to iterate over the WHERE clause of a SPARQL query and reduce
> >> the map into the desired result.
> >>
> >> This is very similar to what I need to do; the only difference being
> >> that our data is stored in HBase tables, not as triples in flat files.
> >>
> >> -----Original Message-----
> >> From: Amandeep Khurana [mailto:[email protected]]
> >> Sent: Wednesday, March 31, 2010 3:30 PM
> >> To: [email protected]; [email protected]
> >> Subject: Re: Using SPARQL against HBase
> >>
> >> Why do you need to build an in-memory graph that you would want to
> >> read/write to? You could store the graph in HBase directly. As pointed
> >> out, HBase might not be the best suited for SPARQL queries, but it's not
> >> impossible to do. Using the triples, you can form a graph that can be
> >> represented in HBase as an adjacency list. I've stored graphs with
> >> 16-17M nodes, which was equivalent to about 600M triples. And this
> >> was on a small cluster, which could certainly scale way beyond 16M
> >> graph nodes.
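The adjacency-list layout Amandeep describes can be sketched with plain dicts standing in for an HBase table: one row per subject node and one column per (predicate, object) edge, so a single row read returns all outgoing edges of a node. The column naming and separator here are purely illustrative assumptions, not his actual schema:

```python
class AdjacencyGraph:
    """Adjacency-list triple layout with a dict standing in for an
    HBase table: row key = subject, one column per (predicate, object)."""

    SEP = "\x00"  # separator unlikely to appear in terms; illustrative only

    def __init__(self):
        self.table = {}  # row key -> {column qualifier -> value}

    def add_triple(self, s, p, o):
        # The edge lives entirely in the column name; the cell value is unused.
        self.table.setdefault(s, {})[p + self.SEP + o] = b""

    def out_edges(self, s):
        # One row read returns every outgoing (predicate, object) pair.
        row = self.table.get(s, {})
        return sorted(tuple(col.split(self.SEP, 1)) for col in row)
```

The design choice is that graph traversal (follow all edges out of a node) costs one row lookup, which suits bulk processing; arbitrary triple patterns with an unbound subject still require a full scan unless extra indexes are added.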
> >> In case you are interested in working on SPARQL over HBase, we could
> >> collaborate on it...
> >>
> >> -ak
> >>
> >>
> >> Amandeep Khurana
> >> Computer Science Graduate Student
> >> University of California, Santa Cruz
> >>
> >>
> >> On Wed, Mar 31, 2010 at 11:56 AM, Andrew Purtell <[email protected]> wrote:
> >>
> >> > Hi Raffi,
> >> >
> >> > To read up on fundamentals I suggest Google's BigTable paper:
> >> > http://labs.google.com/papers/bigtable.html
> >> >
> >> > Detail on how HBase implements the BigTable architecture within the
> >> > Hadoop ecosystem can be found here:
> >> >
> >> > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
> >> > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
> >> > http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
> >> >
> >> > Hope that helps,
> >> >
> >> >   - Andy
> >> >
> >> > > From: Basmajian, Raffi <[email protected]>
> >> > > Subject: RE: Using SPARQL against HBase
> >> > > To: [email protected], [email protected]
> >> > > Date: Wednesday, March 31, 2010, 11:42 AM
> >> > >
> >> > > If HBase can't respond to SPARQL-like queries, then what type of query
> >> > > language can it respond to? In a traditional RDBMS one would use SQL;
> >> > > so what is the counterpart query language with HBase?
> >> --
> >> punkt. netServices
> >> ______________________________
> >> Jürgen Jakobitsch
> >> Codeography
> >>
> >> Lerchenfelder Gürtel 43 Top 5/2
> >> A - 1160 Wien
> >> Tel.: 01 / 897 41 22 - 29
> >> Fax: 01 / 897 41 22 - 22
> >>
> >> netServices http://www.punkt.at
>
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> [email protected]
> http://blog.udanax.org
