[Virtuoso-users] Regexps with UTF-8-problems

Kjetil Kjernsmo Fri, 03 Apr 2009 12:23:18 +0000

All (in practice, Ivan :-) ),

A colleague has experienced some problems with UTF-8 when using a regular 
expression. This is experienced on both the 5.0.10 release and a build with 
the latest patches.


He reports that he inserts the following:

 INSERT INTO
<http://msone.computas.no/graphs/inferred/tmp-classification> {
<http://example.com/test>  <http://www.w3.org/2000/01/rdf-schema#label>
"P\u00E5l"^^<http://www.w3.org/2001/XMLSchema#string> .
}

Then, the following query returns the inserted triple:

CONSTRUCT { ?s ?p ?o . }  
FROM <http://msone.computas.no/graphs/inferred/tmp-classification>
WHERE {
 ?s ?p ?o .
 FILTER regex(?o, "P", "i" )  
}

Whereas this does not:

CONSTRUCT { ?s ?p ?o . }  
FROM <http://msone.computas.no/graphs/inferred/tmp-classification>
WHERE {
 ?s ?p ?o .
 FILTER regex(?o, "P\u00E5", "i" )  
}

I'll add that s/P\u00E5/På/ does not change the behaviour.

However, we have seen that this problem is not universal: some of our queries 
do go through. It may have something to do with how the data has been 
read, but we haven't seen a pattern.
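
One way such a symptom could arise, offered purely as an untested assumption 
and not as a diagnosis of Virtuoso, is that the stored literal and the regex 
pattern reach the matcher in different byte encodings. The Python sketch 
below illustrates the idea; the variable names are hypothetical and nothing 
in it calls Virtuoso.

```python
import re

# The label from the INSERT above, as a Unicode string.
label = "P\u00e5l"

# Assumption for illustration: the literal is stored as UTF-8 bytes.
stored_utf8 = label.encode("utf-8")            # b'P\xc3\xa5l'

# The same pattern text, encoded in two different ways.
pattern_latin1 = "P\u00e5".encode("latin-1")   # b'P\xe5'
pattern_utf8 = "P\u00e5".encode("utf-8")       # b'P\xc3\xa5'

# An ASCII-only pattern matches regardless of the encoding mismatch.
ascii_only_matches = re.search(b"P", stored_utf8, re.IGNORECASE) is not None

# A Latin-1 encoded pattern cannot match the UTF-8 encoded data.
mismatched_matches = re.search(pattern_latin1, stored_utf8, re.IGNORECASE) is not None

# With the same encoding on both sides, the match succeeds.
consistent_matches = re.search(pattern_utf8, stored_utf8, re.IGNORECASE) is not None

print(ascii_only_matches, mismatched_matches, consistent_matches)
# prints: True False True
```

This would be consistent with "P" matching while "P\u00E5" does not, but it 
does not by itself explain why some queries with non-ASCII patterns go 
through.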

I suppose that, since these queries are so simple, the corresponding SQL isn't 
the interesting part here?

Kind regards 

Kjetil Kjernsmo
-- 
Senior Knowledge Engineer
Mobile: +47 986 48 234
Email: kjetil.kjern...@computas.com   
Web: http://www.computas.com/

|  SHARE YOUR KNOWLEDGE  |

Computas AS  PO Box 482, N-1327 Lysaker | Phone:+47 6783 1000 | Fax:+47 6783 1001
