[ 
https://issues.apache.org/jira/browse/TAVERNA-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stian Soiland-Reyes updated TAVERNA-1044:
-----------------------------------------
    Description: 
When parsing a COMBINE archive from [JWS Online|http://jjj.mib.ac.uk/], such as
http://jjj.mib.ac.uk/models/experiments/adlung2017_fig2f/export/combinearchive?download=1
the metadata.rdf does not seem to be parsed.

h2. Error trace

{code}
stain@biggie:/tmp$ curl -fO --remote-header-name 'http://jjj.mib.ac.uk/models/experiments/adlung2017_fig2f/export/combinearchive?download=1'
curl: Saved to filename 'adlung2017_fig2f.sedx'

stain@biggie:/tmp$ java -jar ~/software/taverna-tavlang-tool-0.15.1-incubating.jar convert --robundle adlung2017_fig2f.sedx
..

May 10, 2018 10:35:43 AM org.apache.taverna.robundle.manifest.combine.CombineManifest findAnnotations
WARNING: Can't parse /metadata.rdf
org.apache.jena.riot.RiotException: [line: 6, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
        at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:128)
        at org.apache.jena.riot.lang.LangRDFXML$ErrorHandlerBridge.error(LangRDFXML.java:246)
        at org.apache.jena.rdfxml.xmlinput.impl.ARPSaxErrorHandler.error(ARPSaxErrorHandler.java:37)
        at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:196)
        at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:173)
        at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:168)
        at org.apache.jena.rdfxml.xmlinput.impl.ParserSupport.warning(ParserSupport.java:194)
        at org.apache.jena.rdfxml.xmlinput.states.Frame.warning(Frame.java:55)
        at org.apache.jena.rdfxml.xmlinput.states.Frame.characters(Frame.java:164)
        at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.characters(XMLHandler.java:137)
        at org.apache.xerces.parsers.AbstractSAXParser.characters(Unknown Source)
        at org.apache.xerces.impl.XMLNamespaceBinder.characters(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
        at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.jena.rdfxml.xmlinput.impl.RDFXMLParser.parse(RDFXMLParser.java:150)
        at org.apache.jena.rdfxml.xmlinput.ARP.load(ARP.java:118)
        at org.apache.jena.riot.lang.LangRDFXML.parse(LangRDFXML.java:142)
        at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:175)
        at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:905)
        at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:256)
        at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:242)
        at org.apache.taverna.robundle.manifest.combine.CombineManifest.parseRDF(CombineManifest.java:240)
        at org.apache.taverna.robundle.manifest.combine.CombineManifest.findAnnotations(CombineManifest.java:332)
        at org.apache.taverna.robundle.manifest.combine.CombineManifest.readCombineArchive(CombineManifest.java:465)
        at org.apache.taverna.robundle.Bundle.readOrPopulateManifest(Bundle.java:121)
        at org.apache.taverna.robundle.Bundle.getManifest(Bundle.java:87)
        at org.apache.taverna.tavlang.tools.convert.ToRobundle.convert(ToRobundle.java:60)
        at org.apache.taverna.tavlang.tools.convert.ToRobundle.<init>(ToRobundle.java:47)
        at org.apache.taverna.tavlang.CommandLineTool$CommandConvert.runcommand(CommandLineTool.java:226)
        at org.apache.taverna.tavlang.CommandLineTool$CommandConvert.execute(CommandLineTool.java:220)
        at org.apache.taverna.tavlang.CommandLineTool.parse(CommandLineTool.java:71)
        at org.apache.taverna.tavlang.TavernaCommandline.main(TavernaCommandline.java:26)

{code}

h2. Analysis

This seems to be caused by invalid RDF/XML in the metadata.rdf added by JWS 
Online:

{code:xml}
stain@biggie:/tmp$ unzip adlung2017_fig2f.sedx

stain@biggie:/tmp$ riot metadata.rdf
10:39:17 ERROR riot                 :: [line: 6, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
10:39:17 ERROR riot                 :: [line: 43, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
10:39:17 ERROR riot                 :: [line: 152, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
...
<file:///tmp/> <http://purl.org/dc/terms/description> "Built by JWS Online." .
_:B5145c9a4X2Df8feX2D4a36X2Daba1X2Dacab299dd7d7 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/terms/W3CDTF> .
<file:///tmp/> <http://purl.org/dc/terms/created> _:B5145c9a4X2Df8feX2D4a36X2Daba1X2Dacab299dd7d7 .
<file:///tmp/models/adlung1.sbml> <http://purl.org/dc/terms/description> "Exported by JWS Online from ..."
{code}

The broken RDF/XML follows this pattern:

{code:xml}
  <rdf:Description rdf:about=".">
    <dcterms:description>Built by JWS Online.</dcterms:description>
    <dcterms:created>
      <dcterms:W3CDTF>2018-05-10T02:38:51Z</dcterms:W3CDTF>
    </dcterms:created>
  </rdf:Description>
{code}

As Jena points out, this is not valid RDF/XML: it says that the property
dcterms:created points to a new anonymous W3CDTF resource, but a resource can't
directly wrap a literal. The literal then needs a new nested property like
<rdf:value>:

{code:xml}
    <dcterms:created>
      <dcterms:W3CDTF>
        <rdf:value>2018-05-10T02:38:51Z</rdf:value>
      </dcterms:W3CDTF>
    </dcterms:created>
{code}

This probably stems from confusion over the example in
http://identifiers.org/combine.specifications/omex.version-1 which, for some
reason, uses dcterms:W3CDTF as a property of an untyped anonymous resource
under dcterms:created:

{code:xml}
<dcterms:created rdf:parseType="Resource">
  <dcterms:W3CDTF>2014-06-26T10:29:00Z</dcterms:W3CDTF>
</dcterms:created>
{code}

This is semantically wrong as 
[dcterms:W3CDTF|http://dublincore.org/documents/dcmi-terms/#terms-W3CDTF] is 
defined as a Datatype (like int), not a Property. Similarly 
[dcterms:created|http://dublincore.org/documents/dcmi-terms/#terms-created] is 
defined with a range rdfs:Literal, which would not include a new W3CDTF 
Resource.

I believe dcterms:W3CDTF is meant as a grouping of the XSD datatypes like 
[xsd:dateTime|https://www.w3.org/TR/xmlschema11-2/#dateTime] but is listed in 
DCTerms for pure XML users. 
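
One way to picture that grouping is the rough sketch below, which maps a W3CDTF lexical form to the XSD datatype with the same shape. The class name W3cdtfSketch and the mapping itself are assumptions made for illustration; neither DCTerms nor Taverna defines them. The patterns only check the shape (not, say, month ranges), and the W3CDTF form without seconds (YYYY-MM-DDThh:mmTZD) is deliberately left unmapped because xsd:dateTime requires seconds.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; the datatype mapping is an assumption. */
final class W3cdtfSketch {
    // Lexical shapes from the W3CDTF note, paired with similarly shaped XSD datatypes.
    private static final Map<Pattern, String> FORMS = new LinkedHashMap<>();
    static {
        FORMS.put(Pattern.compile("\\d{4}"), "xsd:gYear");
        FORMS.put(Pattern.compile("\\d{4}-\\d{2}"), "xsd:gYearMonth");
        FORMS.put(Pattern.compile("\\d{4}-\\d{2}-\\d{2}"), "xsd:date");
        FORMS.put(Pattern.compile(
                "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d+)?(Z|[+-]\\d{2}:\\d{2})"),
                "xsd:dateTime");
    }

    /** Returns the XSD datatype name whose shape matches the value, or null if none does. */
    static String xsdTypeFor(String value) {
        for (Map.Entry<Pattern, String> form : FORMS.entrySet()) {
            if (form.getKey().matcher(value).matches()) {
                return form.getValue();
            }
        }
        return null;
    }
}
{code}

Under these assumptions, the timestamp from the JWS Online archive, 2018-05-10T02:38:51Z, falls into the xsd:dateTime shape.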

dcterms:created is more commonly used with a typed RDF literal rather than 
through some kind of anonymous "timestamp" resource. So normal use (outside 
COMBINE) would be:

{code:xml}
<dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-06-26T10:29:00Z</dcterms:created>
{code}

Our
[CombineManifest code|https://github.com/apache/incubator-taverna-language/blob/0.15.1-incubating/taverna-robundle/src/main/java/org/apache/taverna/robundle/manifest/combine/CombineManifest.java#L366]
supports both variants, as the {{parseType=Resource}} variant is commonly used
by COMBINE producers.

The example from JWS Online, however, is in between the two. I have let the
authors know and recommended that they use the rdf:value or rdf:datatype
variant. The tavlang converter should then also recognize rdf:value.
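
To make the shapes concrete, here is a minimal sketch that pulls the timestamp out of every dcterms:created element regardless of which variant wraps it. It is an illustration only: it uses the JDK's plain DOM parser rather than Jena, it is not the CombineManifest implementation, and the class and method names (CreatedDateSketch, createdDates) are made up. It is deliberately lenient and simply follows the first child element down to the text.

{code:java}
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for illustration; not part of taverna-robundle. */
final class CreatedDateSketch {
    static final String DCTERMS = "http://purl.org/dc/terms/";

    /** Returns the text found under each dcterms:created element, in document order. */
    static List<String> createdDates(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rdfXml)));
            NodeList created = doc.getElementsByTagNameNS(DCTERMS, "created");
            List<String> dates = new ArrayList<>();
            for (int i = 0; i < created.getLength(); i++) {
                dates.add(innermostText((Element) created.item(i)));
            }
            return dates;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Follows the first child element (dcterms:W3CDTF, rdf:value, ...) down to the text. */
    static String innermostText(Element element) {
        Element current = element;
        Element child;
        while ((child = firstChildElement(current)) != null) {
            current = child;
        }
        return current.getTextContent().trim();
    }

    /** Returns the first child that is an element, skipping whitespace text nodes. */
    static Element firstChildElement(Element parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }
}
{code}

Because this works on the XML tree rather than on RDF triples, it also accepts the in-between JWS Online form that Jena rejects. A real fix would more likely stay at the RDF level and add an rdf:value lookup next to the existing ones.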

While Jena's "riot" on the command line seems able to skip this syntax error
and parse the other triples, loading with Jena's RDFDataMgr.read() seems to
bail out on the first error, meaning we also lose the dcterms:creator
statements, which are correctly defined in the metadata.rdf.

This bug is to investigate whether it's possible to reduce this error to a
warning, and to add support for the rdf:value variant, which we can recommend
to JWS Online instead of the semantically broken parseType="Resource" pattern.

  was:
When parsing a COMBINE archive from [JWS Online|http://jjj.mib.ac.uk/] such as 
http://jjj.mib.ac.uk/models/experiments/adlung2017_fig2f/export/combinearchive?download=1
 - then the metadata.rdf does not seem to be parsed. 

h2. Error trace

{code}
stain@biggie:/tmp$ curl -fO --remote-header-name 'http://jjj.mib.ac.uk/models/experiments/adlung2017_fig2f/export/combinearchive?download=1'
curl: Saved to filename 'adlung2017_fig2f.sedx'

stain@biggie:/tmp$ java -jar ~/software/taverna-tavlang-tool-0.15.1-incubating.jar convert --robundle adlung2017_fig2f.sedx
..

May 10, 2018 10:35:43 AM org.apache.taverna.robundle.manifest.combine.CombineManifest findAnnotations
WARNING: Can't parse /metadata.rdf
org.apache.jena.riot.RiotException: [line: 6, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
        at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:128)
        at org.apache.jena.riot.lang.LangRDFXML$ErrorHandlerBridge.error(LangRDFXML.java:246)
        at org.apache.jena.rdfxml.xmlinput.impl.ARPSaxErrorHandler.error(ARPSaxErrorHandler.java:37)
        at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:196)
        at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:173)
        at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:168)
        at org.apache.jena.rdfxml.xmlinput.impl.ParserSupport.warning(ParserSupport.java:194)
        at org.apache.jena.rdfxml.xmlinput.states.Frame.warning(Frame.java:55)
        at org.apache.jena.rdfxml.xmlinput.states.Frame.characters(Frame.java:164)
        at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.characters(XMLHandler.java:137)
        at org.apache.xerces.parsers.AbstractSAXParser.characters(Unknown Source)
        at org.apache.xerces.impl.XMLNamespaceBinder.characters(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
        at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.jena.rdfxml.xmlinput.impl.RDFXMLParser.parse(RDFXMLParser.java:150)
        at org.apache.jena.rdfxml.xmlinput.ARP.load(ARP.java:118)
        at org.apache.jena.riot.lang.LangRDFXML.parse(LangRDFXML.java:142)
        at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:175)
        at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:905)
        at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:256)
        at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:242)
        at org.apache.taverna.robundle.manifest.combine.CombineManifest.parseRDF(CombineManifest.java:240)
        at org.apache.taverna.robundle.manifest.combine.CombineManifest.findAnnotations(CombineManifest.java:332)
        at org.apache.taverna.robundle.manifest.combine.CombineManifest.readCombineArchive(CombineManifest.java:465)
        at org.apache.taverna.robundle.Bundle.readOrPopulateManifest(Bundle.java:121)
        at org.apache.taverna.robundle.Bundle.getManifest(Bundle.java:87)
        at org.apache.taverna.tavlang.tools.convert.ToRobundle.convert(ToRobundle.java:60)
        at org.apache.taverna.tavlang.tools.convert.ToRobundle.<init>(ToRobundle.java:47)
        at org.apache.taverna.tavlang.CommandLineTool$CommandConvert.runcommand(CommandLineTool.java:226)
        at org.apache.taverna.tavlang.CommandLineTool$CommandConvert.execute(CommandLineTool.java:220)
        at org.apache.taverna.tavlang.CommandLineTool.parse(CommandLineTool.java:71)
        at org.apache.taverna.tavlang.TavernaCommandline.main(TavernaCommandline.java:26)

{code}

h2. Analysis

This seems to be caused by invalid RDF/XML in the metadata.rdf added by JWS 
Online:

{code:xml}
stain@biggie:/tmp$ unzip adlung2017_fig2f.sedx

stain@biggie:/tmp$ riot metadata.rdf
10:39:17 ERROR riot                 :: [line: 6, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
10:39:17 ERROR riot                 :: [line: 43, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
10:39:17 ERROR riot                 :: [line: 152, col: 43] {E202} Expecting XML start or end element(s). String data "2018-05-10T02:38:51Z" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
...
<file:///tmp/> <http://purl.org/dc/terms/description> "Built by JWS Online." .
_:B5145c9a4X2Df8feX2D4a36X2Daba1X2Dacab299dd7d7 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/terms/W3CDTF> .
<file:///tmp/> <http://purl.org/dc/terms/created> _:B5145c9a4X2Df8feX2D4a36X2Daba1X2Dacab299dd7d7 .
<file:///tmp/models/adlung1.sbml> <http://purl.org/dc/terms/description> "Exported by JWS Online from ..."
{code}

The broken RDF/XML follows this pattern:

{code:xml}
  <rdf:Description rdf:about=".">
    <dcterms:description>Built by JWS Online.</dcterms:description>
    <dcterms:created>
      <dcterms:W3CDTF>2018-05-10T02:38:51Z</dcterms:W3CDTF>
    </dcterms:created>
  </rdf:Description>
{code}

As Jena points out, this is not valid RDF/XML: it says that the property
dcterms:created points to a new anonymous W3CDTF resource, but a resource can't
directly wrap a literal. The literal then needs a new nested property like
<rdf:value>.

This is probably a confusion from 
http://identifiers.org/combine.specifications/omex.version-1 which in its 
example, for some reason, uses dcterms:W3CDTF as a property of an untyped 
anonymous resource under dcterms:created:

{code:xml}
<dcterms:created rdf:parseType="Resource">
  <dcterms:W3CDTF>2014-06-26T10:29:00Z</dcterms:W3CDTF>
</dcterms:created>
{code}

This is semantically wrong as 
[dcterms:W3CDTF|http://dublincore.org/documents/dcmi-terms/#terms-W3CDTF] is 
defined as a Datatype (like int), not a Property. Similarly 
[dcterms:created|http://dublincore.org/documents/dcmi-terms/#terms-created] is 
defined with a range rdfs:Literal, which would not include a new W3CDTF 
Resource.

I believe dcterms:W3CDTF is meant as a grouping of the XSD datatypes like 
[xsd:dateTime|https://www.w3.org/TR/xmlschema11-2/#dateTime] but is listed in 
DCTerms for pure XML users. 

dcterms:created is more commonly used with a typed RDF literal rather than 
through some kind of anonymous "timestamp" resource. So normal use (outside 
COMBINE) would be:

{code:xml}
<dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-06-26T10:29:00Z</dcterms:created>
{code}

Our
[CombineManifest code|https://github.com/apache/incubator-taverna-language/blob/0.15.1-incubating/taverna-robundle/src/main/java/org/apache/taverna/robundle/manifest/combine/CombineManifest.java#L366]
supports both variants as the {{parseType=Resource}} variant is commonly used
by COMBINE producers.

The example from JWS Online however is in-between - I have let the authors know 
and recommended they use rdf:value or rdf:datatype variant. However the tavlang 
converter should then recognize rdf:value 

While it seems Jena's "riot" on the command line can ignore this syntactic 
error and parse the other triples, loading with Jena's RDFDataMgr.read() seems 
to bail out on the first error, meaning we also lose dcterms:creator which are 
correctly defined in the metadata.rdf.

This bug is to investigate if it's possible to reduce this error to a warning, 
as well as add support for the rdf:value variant that we can recommend to 
JWSOnline instead of the semantically broken parseType="Resource" pattern.


> Parsing COMBINE archive from JWSOnline skips metadata.rdf
> ---------------------------------------------------------
>
>                 Key: TAVERNA-1044
>                 URL: https://issues.apache.org/jira/browse/TAVERNA-1044
>             Project: Apache Taverna
>          Issue Type: Bug
>          Components: Taverna Language
>    Affects Versions: language 0.15.1
>            Reporter: Stian Soiland-Reyes
>            Assignee: Stian Soiland-Reyes
>            Priority: Major
>             Fix For: language 0.16.0
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
