Aldo Bucchi wrote:
Kingsley,
On Thu, Feb 12, 2009 at 6:10 PM, Kingsley Idehen <kide...@openlinksw.com> wrote:
Aldo Bucchi wrote:
Kingsley,
[...]
Coming, but current cut is based on Physical Quad Store Triples.
The Virtual (RDF Views) variant is certainly coming too, but maybe not in
the first release of v6.0.
Roger that. So then the general strategy could be:
MySQL Production DB --> (?A) --> Ph. Quad Store --> FCT --> UI
For (?A) we have several options.
The critical step in the data route is RDF mapping ( and rule
materialization for labels, etc ). I want to do this using RDF Views
over internal physical tables to do it the Virtuoso way. So, one
possible ?A:
MySQL --> ( (?B) --> Virtuoso Physical Tables --> RDFViews -->
select/insert into Quad ) --> FCT --> UI
For ?B we can use an ETL tool, but if there's a way to do this a la
Virtuoso I would prefer that.
For now you map, and then use the mappings to trigger a physical Quad
Store bulk load. Basically, we used this approach to produce the MusicBrainz
dump that's making its way to EC2.
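To make the "map, then bulk load" idea concrete, here is a minimal Python sketch of what a row-to-triple mapping produces. It is not Virtuoso's RDF Views mechanism, just a plain illustration; the namespace, the table name "artist", the key column "id", and the function names are all hypothetical. The output is N-Triples text of the kind a bulk loader could consume.

```python
from typing import Dict, Iterator

BASE = "http://example.org/"  # hypothetical namespace, not from the thread


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples requires escaping in a literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_ntriples(table: str, key: str, row: Dict[str, object]) -> Iterator[str]:
    """Yield one N-Triples line per non-null, non-key column of a row."""
    subject = f"<{BASE}{table}/{row[key]}>"
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key is encoded in the subject IRI; NULLs yield no triple
        predicate = f"<{BASE}schema/{table}#{column}>"
        yield f'{subject} {predicate} "{escape_literal(str(value))}" .'


# Hypothetical sample row standing in for a table in the production DB.
rows = [{"id": 1, "name": "Some Artist", "country": None}]
lines = [t for r in rows for t in row_to_ntriples("artist", "id", r)]
```

Every value becomes a plain string literal here; a real mapping would also decide datatypes, foreign-key links between subjects, and labels.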
Virtuoso does offer ETL via replication options, and we plan to offer some
other options in line with SQL Server's offerings (re. SQL-SQL data transfers).
These are planned for the post-V6.0 release dev. cycle though. Short-term,
you can leverage Virtuoso's built-in replication functionality (but this is
in the commercial edition) and implement it for your specific use case
scenarios.
I think RDF Views to Quad Store is what you are looking for. Once the data
is in the Quad Store you can also leverage the reasoning capabilities via
inference rules and SPARQL pragmas (which is what is happening within the
"description.vsp" template).
Kingsley
Got it. Thanks!
Just to put this on the list for others to debate/see/etc.
This is the initial plan ( subject to change ).
= TBox (once) =
* Make SQL dump of MySQL/PG schema(s)
* Load schema(s) into Virtuoso ( Create Tables )
** Minor manual syntax changes might be needed. Manual intervention is
actually *good* as it forces us to understand the schema in depth (
and it is a reasonable amount of work ).
* Create RDF Views
Optionally, generate the "Data Source" ontology from the MySQL Schema
using our Automatic Data Source Ontology generator (this is part of the
Virtuoso Conductor). Basically, we generate an RDF View, Ontology data
(TBox), and Instance Data (ABox).
In your case, you could use the Views as the basis for physical triples
dumps into the Quad Store.
= ABox (daily) =
To load data from production DB ( batch ).
* Dump data as CSV
* Compress and upload to WebDAV folder on EC2 instance
* Virtuoso will automatically load/update/etc
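The producer side of these daily steps could be sketched in Python as below, using only the standard library. This is an illustration under assumptions: the host name, the DAV folder path, the file name, and the sample columns are hypothetical, and authentication is left out. WebDAV uploads are ordinary HTTP PUT requests, so the sketch only builds the request; sending it is a single `urllib.request.urlopen(request)` call.

```python
import csv
import gzip
import io
import urllib.request
from typing import Iterable, Sequence


def rows_to_csv_gz(header: Sequence[str], rows: Iterable[Sequence[object]]) -> bytes:
    """Serialize rows as CSV (header first) and gzip the result."""
    text = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.writer(text)
    writer.writerow(header)
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def build_webdav_put(url: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) the HTTP PUT that uploads the file to a DAV folder."""
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/gzip"},
    )


# Hypothetical data and target URL; neither comes from the thread.
payload = rows_to_csv_gz(["id", "name"], [[1, "Some Artist"]])
request = build_webdav_put(
    "http://ec2-host.example/DAV/incoming/artist.csv.gz", payload
)
# urllib.request.urlopen(request) would perform the actual upload.
```

How the Virtuoso side picks up and loads the uploaded file is not covered by this sketch.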
Yes, if you are going to replicate between MySQL and Virtuoso SQL, and
run the physical triples dump into the ABox daily.
Kingsley
This way we 1) pay careful attention to the mappings and 2) keep the
contract with the provider of the data simple: upload an agreed
CSV-formatted file via WebDAV to update instance data.
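Since the contract with the data provider is just "an agreed CSV format", one way to keep it honest is to check each incoming file against the agreed column list before loading. The sketch below is a hypothetical Python helper; the column names and the function name are assumptions made for illustration.

```python
import csv
import io
from typing import List, Sequence

AGREED_COLUMNS = ["id", "name", "country"]  # hypothetical contract


def check_csv_contract(text: str, expected: Sequence[str] = AGREED_COLUMNS) -> List[str]:
    """Return a list of problems; an empty list means the file honours the contract."""
    problems: List[str] = []
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header != list(expected):
        problems.append(f"header {header!r} does not match {list(expected)!r}")
        return problems  # field counts are meaningless if the header is wrong
    for row_no, row in enumerate(reader, start=2):
        if len(row) != len(expected):
            problems.append(
                f"row {row_no}: expected {len(expected)} fields, got {len(row)}"
            )
    return problems
```

For example, `check_csv_contract("id,name,country\n1,A,CL\n")` returns an empty list, while a file with a missing field or a different header returns one message per problem found.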
Comments appreciated
Thanks yet again,
A
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com