Re: [Virtuoso-users] Faceter service release dates?

Kingsley Idehen Fri, 13 Feb 2009 14:05:07 +0000

Aldo Bucchi wrote:

Kingsley,


On Thu, Feb 12, 2009 at 6:10 PM, Kingsley Idehen <kide...@openlinksw.com> wrote:

Aldo Bucchi wrote:

Kingsley,

[...]

Coming, but current cut is based on Physical Quad Store Triples.

The Virtual (RDF Views) variant is certainly coming too, but maybe not in
the first release of v6.0.

Roger that. So then the general strategy could be:

MySQL Production DB --> (?A) --> Ph. Quad Store --> FCT --> UI

For (?A) we have several options.

The critical step in the data route is RDF mapping (and rule
materialization for labels, etc.). I want to do this using RDF Views
over internal physical tables, i.e. the Virtuoso way. So, one
possible ?A:

MySQL --> ( (?B) --> Virtuoso Physical Tables --> RDFViews -->
select/insert into Quad ) --> FCT --> UI

For ?B we can use an ETL tool, but if there's a way to do this à la
Virtuoso, I would prefer that.
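Conceptually, an RDF View states how each row of a relational table becomes triples: the primary key yields a subject IRI, and each non-null column yields a predicate/object pair. The following plain-Python sketch illustrates only that idea. It is not Virtuoso's RDF View syntax, and the base IRI, the `person` table, and its columns are hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical namespace; a real RDF View declares its own IRI patterns.
BASE = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def literal(value: object) -> str:
    """Render a value as an N-Triples plain literal, escaping backslashes, quotes and newlines."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def row_to_triples(table: str, key: str, row: dict) -> Iterator[tuple[str, str, str]]:
    """Map one relational row to (subject, predicate, object) terms in N-Triples syntax."""
    subject = f"<{BASE}{table}/{row[key]}>"
    # The table itself becomes the rdf:type of every row-derived subject.
    yield (subject, RDF_TYPE, f"<{BASE}schema/{table}>")
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key is already encoded in the subject IRI; NULLs yield no triple
        yield (subject, f"<{BASE}schema/{table}#{column}>", literal(value))


def map_table(table: str, key: str, rows: Iterable[dict]) -> list[tuple[str, str, str]]:
    """Apply the row mapping to every row of a table."""
    return [triple for row in rows for triple in row_to_triples(table, key, row)]


if __name__ == "__main__":
    rows = [{"id": 1, "name": "Alice", "email": None}]
    for s, p, o in map_table("person", "id", rows):
        print(s, p, o, ".")
```

Skipping NULL columns mirrors the usual relational-to-RDF convention that a missing value produces no triple rather than an empty literal.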

For now, you map, and then use the mappings to trigger a physical Quad
Store bulk load. This is the approach we used to produce the MusicBrainz
dump that's making its way to EC2.
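The "map, then bulk load" approach ends with the mapped triples being written out in a form a quad store can bulk-load. One common form is N-Quads, where every triple carries its target named graph. The sketch below shows only that serialization step, assuming the terms are already in N-Triples syntax (`<iri>` or `"literal"`). The graph IRI is hypothetical, and invoking Virtuoso's own bulk loader is not shown.

```python
from typing import Iterable


def to_nquads(triples: Iterable[tuple[str, str, str]], graph_iri: str) -> str:
    """Serialize N-Triples-syntax (s, p, o) terms as N-Quads lines in one named graph."""
    graph = f"<{graph_iri}>"
    return "".join(f"{s} {p} {o} {graph} .\n" for s, p, o in triples)


if __name__ == "__main__":
    triples = [("<http://example.org/person/1>",
                "<http://example.org/schema/person#name>",
                '"Alice"')]
    # Hypothetical target graph; in practice the output would go to a .nq file.
    print(to_nquads(triples, "http://example.org/graph/people"), end="")
```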

Virtuoso does offer ETL via its replication options, and we plan to offer
other options in line with SQL Server's offerings (re. SQL-to-SQL data
transfers). These are planned for the post-v6.0 release dev. cycle, though.
Short-term, you can leverage Virtuoso's built-in replication functionality
(this is in the commercial edition) and implement it for your specific
use-case scenarios.

I think RDF Views to Quad Store is what you are looking for. Once the data
is in the Quad Store, you can also leverage the reasoning capabilities via
inference rules and SPARQL pragmas (which is what happens within the
"description.vsp" template).

Kingsley
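To make "rule materialization for labels" concrete: an inference rule such as rdfs:subPropertyOf lets a query for `rdfs:label` also match values stored under a more specific property. The sketch below materializes those inferred triples eagerly (forward chaining to a fixpoint) in plain Python, using bare strings for terms. It is a simplified stand-in, not Virtuoso's rule engine, which per the reply above works through inference rules and SPARQL pragmas. The example data and the use of `skos:prefLabel` are assumptions.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

Triple = tuple[str, str, str]


def materialize_subproperties(triples: set[Triple]) -> set[Triple]:
    """Forward-chain rdfs:subPropertyOf: for (s, p, o) and (p subPropertyOf q), add (s, q, o)."""
    inferred = set(triples)
    while True:
        # Collect the direct super-properties declared so far.
        parents: dict[str, set[str]] = {}
        for s, p, o in inferred:
            if p == SUB_PROPERTY_OF:
                parents.setdefault(s, set()).add(o)
        new = {(s, q, o) for s, p, o in inferred for q in parents.get(p, ())} - inferred
        if not new:  # fixpoint reached: no rule application adds anything
            return inferred
        inferred |= new


if __name__ == "__main__":
    SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
    data = {
        (SKOS_PREF_LABEL, SUB_PROPERTY_OF, RDFS_LABEL),
        ("http://example.org/person/1", SKOS_PREF_LABEL, "Alice"),
    }
    for triple in sorted(materialize_subproperties(data)):
        print(triple)
```

Repeating the pass until nothing new appears also handles chains (a sub b, b sub c) without a separate transitivity rule.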


Got it. Thanks!
Just to put this on the list for others to debate/see/etc.

This is the initial plan (subject to change).

= TBox (once) =
* Make an SQL dump of the MySQL/PG schema(s)
* Load the schema(s) into Virtuoso (CREATE TABLE)
** Minor manual syntax changes might be needed. Manual intervention is
actually *good*, as it forces us to understand the schema in depth
(and it is a reasonable amount of work).
* Create RDF Views
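The "minor manual syntax changes" in the TBox step are typically MySQL-specific constructs. Below is a hedged sketch of a hypothetical helper that applies only the purely mechanical fixes (backtick identifiers, trailing table options such as `ENGINE=`) and lists the lines that still need a human decision, which keeps the manual review the plan calls for. The set of constructs handled here is an assumption, not a complete account of what Virtuoso accepts.

```python
import re

# MySQL-only table options written as KEY=value after the closing parenthesis.
# This list is an assumption for illustration, not exhaustive.
TABLE_OPTIONS = re.compile(
    r"\s*\b(?:ENGINE|AUTO_INCREMENT|DEFAULT CHARSET|CHARSET|COLLATE)\s*=\s*\w+",
    re.IGNORECASE,
)
# Column-level constructs deliberately left for a human to decide on.
REVIEW = re.compile(r"\bAUTO_INCREMENT\b|\bUNSIGNED\b|\bENUM\s*\(", re.IGNORECASE)


def adapt_mysql_ddl(ddl: str) -> tuple[str, list[str]]:
    """Apply mechanical dialect fixes; return the new DDL and lines that need manual review."""
    converted = ddl.replace("`", '"')             # backtick identifiers -> SQL-standard quotes
    converted = TABLE_OPTIONS.sub("", converted)  # drop trailing table options
    needs_review = [line.strip() for line in converted.splitlines() if REVIEW.search(line)]
    return converted, needs_review


if __name__ == "__main__":
    ddl = (
        "CREATE TABLE `person` (\n"
        "  `id` INT NOT NULL AUTO_INCREMENT,\n"
        "  `name` VARCHAR(100),\n"
        "  PRIMARY KEY (`id`)\n"
        ") ENGINE=InnoDB DEFAULT CHARSET=utf8;"
    )
    converted, review = adapt_mysql_ddl(ddl)
    print(converted)
    print("Review:", review)
```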

Optionally, generate the "Data Source" ontology from the MySQL schema using our Automatic Data Source Ontology generator (this is part of the Virtuoso Conductor). Basically, we generate an RDF View, Ontology data (TBox), and Instance Data (ABox).

In your case, you could use the Views as the basis for physical triples dumps into the Quad Store.
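The Data Source ontology generator mentioned above derives TBox terms from the relational schema. Here is a rough illustration of the general idea in plain Python: each table becomes a class, and each column becomes a datatype property whose domain is that class. It does not reproduce the Conductor's actual output, and the base IRI and schema are hypothetical. Foreign keys, which would map to object properties, are left out for brevity.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def schema_to_tbox(schema: dict[str, list[str]], base: str) -> list[tuple[str, str, str]]:
    """Derive class and property declarations from a {table: [columns]} description."""
    tbox: list[tuple[str, str, str]] = []
    for table, columns in schema.items():
        cls = f"{base}{table}"
        tbox.append((cls, RDF_TYPE, OWL + "Class"))
        for column in columns:
            prop = f"{base}{table}#{column}"
            tbox.append((prop, RDF_TYPE, OWL + "DatatypeProperty"))
            tbox.append((prop, RDFS + "domain", cls))
    return tbox


if __name__ == "__main__":
    # Hypothetical schema and namespace.
    for triple in schema_to_tbox({"person": ["name", "email"]}, "http://example.org/schema/"):
        print(triple)
```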

= ABox (daily) =
To load data from the production DB (batch):
* Dump data as CSV
* Compress and upload to WebDAV folder on EC2 instance
* Virtuoso will automatically load/update/etc

Yes, if you are going to replicate between MySQL and Virtuoso SQL, and run the physical triples dump into the ABox daily.
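The daily ABox step in the plan (dump as CSV, compress, upload to a WebDAV folder) can be scripted on the data provider's side. A WebDAV upload is an ordinary HTTP PUT, so the Python standard library is enough. This sketch only builds the request; the folder URL, the credentials, and the use of HTTP Basic authentication are assumptions about the server setup.

```python
import base64
import gzip
import urllib.request


def build_upload(csv_bytes: bytes, dav_url: str, user: str, password: str) -> urllib.request.Request:
    """Gzip CSV content and wrap it in an HTTP PUT request addressed to a WebDAV resource."""
    request = urllib.request.Request(dav_url, data=gzip.compress(csv_bytes), method="PUT")
    request.add_header("Content-Type", "application/gzip")
    # Assumption: the target folder accepts HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Sending it is then `urllib.request.urlopen(request)` against a hypothetical URL such as `http://example.org:8890/DAV/home/loader/abox.csv.gz`; the automatic load on the Virtuoso side, as described in the plan, is not shown.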


Kingsley

This way we 1) pay careful attention to the mappings and 2) keep the
contract with the provider of the data simple: Upload an agreed CSV
formatted file via WebDAV to update instance data.
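Since the contract with the data provider is "an agreed CSV format", a small check on the receiving side can reject a malformed upload before anything is loaded. Here is a minimal sketch, assuming the agreed format is just a fixed header row plus a fixed number of fields per record; the column names are hypothetical.

```python
import csv
import io
from typing import Sequence

# Hypothetical agreed header; the real contract is whatever provider and consumer agree on.
EXPECTED_COLUMNS = ("id", "name", "email")


def validate_upload(text: str, expected: Sequence[str] = EXPECTED_COLUMNS) -> list[str]:
    """Return contract violations for one CSV upload; an empty list means it is acceptable."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header != list(expected):
        return [f"header {header!r} does not match {list(expected)!r}"]
    problems = []
    # Record 1 is the header, so data records are numbered from 2.
    for record_no, row in enumerate(reader, start=2):
        if len(row) != len(expected):
            problems.append(f"record {record_no}: expected {len(expected)} fields, got {len(row)}")
    return problems
```

Rejecting the whole file on any violation keeps the contract simple: the provider either fixes and re-uploads, or nothing changes in the ABox.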

Comments appreciated

Thanks yet again,
A



--


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen

President & CEO, OpenLink Software     Web: http://www.openlinksw.com
