Re: [RFC] Metadata access and storage

Anders Feder Thu, 08 Sep 2011 01:56:19 -0700

On 08-09-2011 07:18, Jürg Billeter wrote:

On Thu, 2011-09-08 at 01:04 +0200, Anders Feder wrote:

On 07-09-2011 23:17, Jürg Billeter wrote:

I'm currently favoring a more modular approach where we define a core
storage API that is based on the RDF model but is kept much simpler.
That is, I would no longer use SPARQL (or any other query language) on
the lowest level and instead provide a simple CRUD API on the level of
RDF resources.

Cool, that is very much what I've been aiming for - though in the
reverse direction. What I propose is capturing your use case (E-D-S
pushing CRUDs to Tracker) and my use case (Tracker pulling triples from
E-D-S) in a single, backend agnostic interface. Then one can apply this
interface to Tracker, E-D-S, Soprano or whatever backend one sees fit to
achieve compatibility with the rest of the desktop.


Here are the primitives I've been operating with:

InsertStatement (triple)
MatchStatements (pattern)
DeleteStatement (triple)

MatchStatements returns solutions satisfying a basic graph pattern, the
other two are self-explanatory.

What primitives do you imagine would be needed? Are the three above
low-level enough?

Basic graph patterns can be very complex if you allow arbitrary SPARQL
filters. I'd say, that's not something you want to implement as part of
your application's own data store. I'd start with something like this:

I wouldn't require applications to implement basic graph patterns in full (though it would be useful for some). Sequences of triple patterns should suffice for most things.
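
To make "sequences of triple patterns" concrete, here is a minimal in-memory sketch of the three primitives. It is only an illustration under stated assumptions: the class name `TripleStore`, the snake_case method names, and the convention that terms starting with "?" are variables are all hypothetical and not part of any proposal in this thread. Filters and other SPARQL features are deliberately left out.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> bound value


def is_var(term: str) -> bool:
    # Assumed convention for this sketch: "?name" denotes a variable.
    return term.startswith("?")


class TripleStore:
    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def insert_statement(self, triple: Triple) -> None:
        self._triples.add(triple)

    def delete_statement(self, triple: Triple) -> None:
        self._triples.discard(triple)

    def match_statements(self, patterns: List[Triple]) -> List[Binding]:
        # Start with one empty solution and extend it pattern by pattern.
        solutions: List[Binding] = [{}]
        for pattern in patterns:
            extended: List[Binding] = []
            for binding in solutions:
                for triple in self._triples:
                    candidate = self._unify(pattern, triple, binding)
                    if candidate is not None:
                        extended.append(candidate)
            solutions = extended
        return solutions

    @staticmethod
    def _unify(pattern: Triple, triple: Triple,
               binding: Binding) -> Optional[Binding]:
        # Return the binding extended to cover this triple, or None
        # if a constant or an already-bound variable disagrees.
        result = dict(binding)
        for term, value in zip(pattern, triple):
            if is_var(term):
                if result.setdefault(term, value) != value:
                    return None
            elif term != value:
                return None
        return result


store = TripleStore()
store.insert_statement(("urn:alice", "foaf:knows", "urn:bob"))
store.insert_statement(("urn:bob", "foaf:name", "Bob"))
store.insert_statement(("urn:alice", "foaf:name", "Alice"))

# Two patterns in sequence: who does Alice know, and what is their name?
friends = store.match_statements([
    ("urn:alice", "foaf:knows", "?friend"),
    ("?friend", "foaf:name", "?name"),
])
```

The nested loops make this quadratic at best, which is acceptable for a sketch; a real backend would presumably index by subject or predicate.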


# Insert/replace statements (predicate-object pairs) about uri (subject)
InsertResource (uri, statements)
# Delete all statements about uri
DeleteResource (uri)
# Return all statements about uri
GetResource (uri)
# Return all resources that have been touched since specified point
GetUpdates (since)
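
A rough sketch of how these four operations could behave, for discussion only. The class name `ResourceStore` and the method names are hypothetical. It also assumes that "since" is a monotonically increasing revision number rather than a timestamp; the list above does not say which, so treat that as a guess.

```python
from typing import Dict, Iterable, List, Set, Tuple

Statement = Tuple[str, str]  # (predicate, object)


class ResourceStore:
    def __init__(self) -> None:
        self._resources: Dict[str, Set[Statement]] = {}
        self._touched: Dict[str, int] = {}  # uri -> revision of last change
        self._revision = 0

    @property
    def revision(self) -> int:
        # Current revision; a client saves this as its "since" checkpoint.
        return self._revision

    def _touch(self, uri: str) -> None:
        self._revision += 1
        self._touched[uri] = self._revision

    def insert_resource(self, uri: str,
                        statements: Iterable[Statement]) -> None:
        # Insert or replace all statements about uri.
        self._resources[uri] = set(statements)
        self._touch(uri)

    def delete_resource(self, uri: str) -> None:
        # Deletions are recorded too, so that clients can notice them.
        if self._resources.pop(uri, None) is not None:
            self._touch(uri)

    def get_resource(self, uri: str) -> Set[Statement]:
        return set(self._resources.get(uri, set()))

    def get_updates(self, since: int) -> List[str]:
        # Every resource touched after the given revision.
        return sorted(u for u, rev in self._touched.items() if rev > since)


contacts = ResourceStore()
contacts.insert_resource("urn:contact:1", [("nco:fullname", "Alice")])
checkpoint = contacts.revision
contacts.insert_resource("urn:contact:2", [("nco:fullname", "Bob")])
contacts.delete_resource("urn:contact:1")
changed = contacts.get_updates(checkpoint)
```

In this sketch a deleted resource still shows up in `get_updates`, and a follow-up `get_resource` returns an empty set, which is how a client would learn about the deletion.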

What I would do is map the first three above on the API level (as Carlos suggested) to the three I suggested on the transport level. This way, greater granularity is available on the transport level (you can insert/delete individual triples), but the application developer is shielded from the complexity of it on the API level.
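
A minimal sketch of that mapping, assuming hypothetical names throughout (`TripleTransport`, `ResourceApi`, and `None` as a wildcard in a single triple pattern). The transport class is only a stand-in for whatever backend sits behind the interface.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class TripleTransport:
    """Stand-in for the transport level: individual triples only."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def insert_statement(self, triple: Triple) -> None:
        self.triples.add(triple)

    def delete_statement(self, triple: Triple) -> None:
        self.triples.discard(triple)

    def match_statements(self, pattern: Pattern) -> List[Triple]:
        # None acts as a wildcard in this sketch.
        return [t for t in self.triples
                if all(p is None or p == v for p, v in zip(pattern, t))]


class ResourceApi:
    """API level: whole resources, expressed via the triple primitives."""

    def __init__(self, transport: TripleTransport) -> None:
        self._transport = transport

    def get_resource(self, uri: str) -> Set[Tuple[str, str]]:
        matches = self._transport.match_statements((uri, None, None))
        return {(p, o) for _, p, o in matches}

    def delete_resource(self, uri: str) -> None:
        for triple in self._transport.match_statements((uri, None, None)):
            self._transport.delete_statement(triple)

    def insert_resource(self, uri: str,
                        statements: Iterable[Tuple[str, str]]) -> None:
        # "Insert/replace": drop the old statements, then add the new ones.
        self.delete_resource(uri)
        for predicate, obj in statements:
            self._transport.insert_statement((uri, predicate, obj))


transport = TripleTransport()
api = ResourceApi(transport)
api.insert_resource("urn:contact:1", [("nco:fullname", "Alice"),
                                      ("nco:nickname", "al")])
api.insert_resource("urn:contact:1", [("nco:fullname", "Alice B.")])
```

Note that `insert_resource` here is a delete followed by several inserts, so another client could observe the intermediate state. That is one place where the limited form of transactions mentioned in this thread would matter.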


The last point is important as it allows application databases/indices
to be kept in sync. The interface should also provide at least a very
limited form of transactions.

Can you describe your use case in more detail? If I understand you
correctly, you'd like an interface that could be used by an
application-independent Tracker miner. However, I don't see how you
could keep Tracker up to date with the above interface without retrieving
all data with MatchStatements on every sync - unless you're saying that
the signals you describe on semantk.org should be used for incremental
updates. The latter would imply that you have to guarantee that the
Tracker miner is running whenever the application is running (to not
miss any signals).

I had assumed the miner would query for changes when it is loaded and act on signals while it is running. (Is this not what miners normally do (indexing)?) You're right that it would make sense to amend the interface to support "GetUpdates (since)"-type operations though.
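
The catch-up-then-listen behaviour could be sketched roughly as follows. Everything here is hypothetical: `FakeSource` stands in for an application exposing the interface, plain callbacks stand in for the signals, and "since" is again assumed to be a revision counter. It does not reflect how actual Tracker miners are implemented.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Statement = Tuple[str, str]


class FakeSource:
    """Stand-in application store that emits a callback per change."""

    def __init__(self) -> None:
        self.resources: Dict[str, Set[Statement]] = {}
        self.revisions: Dict[str, int] = {}
        self.revision = 0
        self.listeners: List[Callable[[str], None]] = []

    def put(self, uri: str, statements: Iterable[Statement]) -> None:
        self.revision += 1
        self.resources[uri] = set(statements)
        self.revisions[uri] = self.revision
        for listener in list(self.listeners):
            listener(uri)  # plays the role of a change signal

    def get_resource(self, uri: str) -> Set[Statement]:
        return set(self.resources.get(uri, set()))

    def get_updates(self, since: int) -> List[str]:
        return sorted(u for u, r in self.revisions.items() if r > since)


class Miner:
    def __init__(self, source: FakeSource, last_seen: int = 0) -> None:
        self.source = source
        self.last_seen = last_seen  # would be persisted between runs
        self.index: Dict[str, Set[Statement]] = {}

    def start(self) -> None:
        # Subscribe first, then catch up on what was missed while stopped.
        self.source.listeners.append(self.on_changed)
        checkpoint = self.source.revision
        for uri in self.source.get_updates(self.last_seen):
            self.index[uri] = self.source.get_resource(uri)
        self.last_seen = checkpoint

    def stop(self) -> None:
        self.source.listeners.remove(self.on_changed)

    def on_changed(self, uri: str) -> None:
        # Incremental update while running.
        self.index[uri] = self.source.get_resource(uri)
        self.last_seen = self.source.revision


source = FakeSource()
source.put("urn:a", [("p", "1")])      # happens before the miner is loaded
miner = Miner(source)
miner.start()                          # catches up on urn:a
source.put("urn:b", [("p", "2")])      # delivered as a signal
miner.stop()
source.put("urn:c", [("p", "3")])      # missed while the miner is stopped
restarted = Miner(source, last_seen=miner.last_seen)
restarted.start()                      # GetUpdates (since) recovers urn:c
```

With this arrangement the miner does not need to be running whenever the application is, as long as it stores its last-seen checkpoint.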

Wouldn't it be much easier to push data from the application to the RDF
store as in my proposal? Or in other words, what advantages are you
seeing by encouraging applications to provide a generic pull interface
instead of encouraging applications to push their data to a service
using a generic interface?

There are some situations (outside the immediate scope of Tracker) where push is not ideal. I give the example of 'location' under 'Use cases <http://semantk.org/use-cases.php>'.

Another example might be DBpedia <http://dbpedia.org/About>. Services like this are very relevant sources of information for the desktop, but it doesn't make sense to push their whole database into one's local Tracker store.

Push is good for information that has to do with the past (e.g. "I received an e-mail yesterday") but it neglects the whole realm of information that has to do with the present (e.g. "My current location is 12.345 67.890").

Another disadvantage is that it duplicates information (e.g. one entry in E-D-S and one in Tracker).


Anders Feder

_______________________________________________
xdg mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/xdg
