Hi there,

I like this feature list. Reading it, I was thinking of a "(Web 3.0 | Linked Data | GGG | call-it-whatever) validation service" similar to [1]. Maybe another Google SoC project ;-) It would basically mean writing several more or less sophisticated checkers that each compute a score, summed up into a highscore. Then I would build a cool-looking highscore website that could motivate web developers and site maintainers (each of them may host only a small set of data, but the community is big -> the Long Tail ;-), and this could get people to jump on the bandwagon and expose data. I've added my thoughts below...
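To make the idea a bit more concrete, here is a minimal sketch of such a scoring framework, assuming each checker maps a site URL to a score in [0, 1]; all names, the two toy checkers, and the equal weighting are invented for illustration:

```python
# Minimal sketch of the proposed validation service: independent
# checkers, each scoring one best practice in [0, 1], summed up into
# a site's highscore. All names here are invented for illustration.
from typing import Callable, Dict

# A checker takes a site URL and returns a score between 0.0 and 1.0.
Checker = Callable[[str], float]

CHECKERS: Dict[str, Checker] = {}

def checker(name: str):
    """Register a scoring function under a human-readable name."""
    def register(fn: Checker) -> Checker:
        CHECKERS[name] = fn
        return fn
    return register

def highscore(url: str) -> float:
    """Run every registered checker and sum the individual scores."""
    return sum(fn(url) for fn in CHECKERS.values())

# Two toy checkers, standing in for the real ones discussed below.
@checker("sparql-endpoint")
def has_sparql_endpoint(url: str) -> float:
    return 1.0 if url.rstrip("/").endswith("/sparql") else 0.0

@checker("https")
def uses_https(url: str) -> float:
    return 1.0 if url.startswith("https://") else 0.0
```

For example, `highscore("https://example.org/sparql")` would score 2.0 here. The real checkers would of course be weighted, and the per-check breakdown shown on the highscore page.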
[1] http://web2.0validator.com/

> - There's a well-thought-out ontology for the dataset with smart
> mappings to existing popular ontologies and vocabularies.

Score based on the popularity of the vocabulary and the vocabularies it refers to (which could be evaluated via SWSE/Sindice/Swoogle/dbpedia/etc.), on consistency (checking of DL ontologies), and on quality measurement of ontologies in general - I'm sure some work has been done in this direction.

> - All items of interest in the dataset have been assigned unique URIs.
> - All URIs are dereferenceable, according to the recommendations given
> in [1].

Checking the first requirement is more or less impossible without knowing the underlying dataset. The second is easy but could take a long time (however, a "semantic link checker" would probably be very useful anyway, both to keep the GGG consistent and to check your own site).

> - Resolving the URIs returns information about the resource, ideally
> in RDF/XML and N3 and HTML, based on content negotiation

Can be checked; support for each format adds to the score.

> - The HTML pages where the data shows up are marked up with RDFa

Can be checked; score ~ ratio of RDFa tags to HTML tags.

> - There's a SPARQL endpoint that makes all the RDF data available

Look for the sitemap extension; a simple test for the request path "/sparql", or follow internal links looking for a SPARQL endpoint.

> - There's an RDF data dump that contains all the data

Look for the sitemap extension - are dumps available?

> - The dataset is richly interlinked internally, so you can use e.g.
> Tabulator to browse through the dataset, jumping from one node to the
> next

Score ~ ratio of triples with an IRI as object to all triples.

> - The project team engages with other dataset maintainers to create
> RDF links between resources that are described in multiple datasets,
> or that are related

Score ~ ratio of triples with an outgoing IRI as object to triples with an IRI as object.

> - The URIs and what kind of data is available is all clearly
> documented, to make it easy for people to e.g. link from their FOAF
> files into the dataset

Difficult - not possible to include in a score.

regards
Andy

_______________________________________________
Linking-open-data mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/linking-open-data
