On 09/11/10 16:22, Florent Guillaume wrote:
> On Tue, Nov 9, 2010 at 1:19 PM, Andy Seaborne
> <[email protected]> wrote:
>> Jeremy identified the IRI library as a potential contribution to a commons
>> area. It is free-standing, and does not use or call any Jena RDF code - it
>> depends only on ICU4J (and JUnit + Jflex in the build).
>
> Please note that Abdera already has an IRI library.
> http://svn.apache.org/repos/asf/abdera/java/trunk/dependencies/i18n/src/main/java/org/apache/abdera/i18n/iri/

Florent,
Thanks for pointing that out. I see it has a test suite as well and it
would be good to make sure we've got things right.
Illegal IRIs have been a bit of a plague in RDF data, and the IRI
library (written by Jeremy) is a response to that. It checks both the
rules for specific IRI schemes and the recommended forms, since IRIs are
often compared for equality. The library is quite picky. It includes
profiles for RDF URI references, for IRIs, and for the compromise we use
in Jena as a balance of legacy and spec exactness.
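
To illustrate why the recommended forms matter when IRIs are compared for
equality, here is a minimal Python sketch. It is not the IRI library's
code; the helper name has_default_port and the DEFAULT_PORTS table are
assumptions made up for this illustration.

```python
from urllib.parse import urlsplit

# Hypothetical table for this sketch: only the schemes used below.
DEFAULT_PORTS = {"http": 80, "https": 443}

def has_default_port(iri: str) -> bool:
    """True if the IRI spells out the default port for its scheme."""
    parts = urlsplit(iri)
    return parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme)

a = "http://example:80/"
b = "http://example/"

# Both name the same HTTP resource, but RDF compares IRIs as strings,
# so these count as two different identifiers.
print(a == b)               # False
print(has_default_port(a))  # True: the form a checker would warn about
print(has_default_port(b))  # False
```

Warning on the first form lets data publishers settle on one spelling, so
that simple string comparison gives the expected answer.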
There is an online test service for RDF data in non-RDF/XML formats at:
http://sparql.org/data-validator.html
The IRIs are checked with the IRI library.
Andy
A few examples:
http://example/a b
Code: 17/WHITESPACE in PATH: A single whitespace character. These match
no grammar rules of URIs/IRIs. These characters are permitted in RDF URI
References, XML system identifiers, and XML Schema anyURIs.
http://example/a[]b
Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar
rules for URIs/IRIs.
http://example:80/
Code: 13/DEFAULT_PORT_SHOULD_BE_OMITTED in PORT: If the port is the
default one for the scheme it should be omitted.
Code: 14/PORT_SHOULD_NOT_BE_WELL_KNOWN in PORT: Ports under 1024 should be
accessed using the appropriate scheme name.
urn:xyz
Code: 61/SCHEME_PATTERN_MATCH_FAILED in PATH: The scheme specific syntax
rules are violated.
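
The checks behind the examples above can be approximated with a rough
Python sketch. This is only an illustration under stated assumptions, not
how the IRI library is implemented: check_iri, the illegal-character set,
the default-port table and the simplified URN pattern are all invented
here. Only the violation names are taken from the messages above, and the
splitting regex is the one given in RFC 3986, Appendix B.

```python
import re

# Generic URI splitting regex from RFC 3986, Appendix B.
URI_PARTS = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Assumptions for this sketch, far from complete.
DEFAULT_PORTS = {"http": 80, "https": 443}
ILLEGAL_PATH_CHARS = set('[]{}<>"\\^`|')
# Simplified URN shape: a namespace identifier, ':', then a non-empty rest.
URN_PATH = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*:.+")

def check_iri(iri):
    """Return a list of (violation name, component) pairs for the IRI."""
    m = URI_PARTS.match(iri)
    scheme = (m.group(2) or "").lower()
    authority = m.group(4) or ""
    path = m.group(5)
    found = []

    if any(ch.isspace() for ch in path):
        found.append(("WHITESPACE", "PATH"))
    if any(ch in ILLEGAL_PATH_CHARS for ch in path):
        found.append(("ILLEGAL_CHARACTER", "PATH"))

    # A trailing ":digits" in the authority is taken to be the port.
    port_match = re.search(r":(\d+)$", authority)
    if port_match:
        port = int(port_match.group(1))
        if DEFAULT_PORTS.get(scheme) == port:
            found.append(("DEFAULT_PORT_SHOULD_BE_OMITTED", "PORT"))
        if port < 1024:
            found.append(("PORT_SHOULD_NOT_BE_WELL_KNOWN", "PORT"))

    if scheme == "urn" and not URN_PATH.fullmatch(path):
        found.append(("SCHEME_PATTERN_MATCH_FAILED", "PATH"))
    return found

for example in ["http://example/a b", "http://example/a[]b",
                "http://example:80/", "urn:xyz"]:
    print(example, check_iri(example))
```

Returning a list rather than stopping at the first problem mirrors the
http://example:80/ case, where one IRI triggers two separate reports.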