Hello Rajeev,

I'm unable to read the text in question, but I see the codepoint numbers in
the sample, and I see that they are in the range that is described as
{ "Malayalam", UCP_ALPHA, 0x0D00, 0x0D7F, NULL, NULL, NULL      }
in the libsrc/langfunc/langfunc.c source file of Virtuoso. I've tested these
things using my favorite range
{ "Cyrillic", UCP_ALPHA , 0x0400, 0x04FF, NULL, NULL, NULL }
that has no fundamental differences in properties from Malayalam.
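
As a quick way to confirm that a search string falls inside the ranges
quoted above, here is a minimal Python sketch. The helper name in_range is
made up for this mail and is not part of Virtuoso; only the two pairs of
bounds come from the table entries quoted above.

```python
# Bounds copied from the two table entries quoted above.
MALAYALAM = (0x0D00, 0x0D7F)
CYRILLIC = (0x0400, 0x04FF)

def in_range(text, bounds):
    """Return True if every codepoint of text lies within bounds (inclusive)."""
    lo, hi = bounds
    return all(lo <= ord(ch) <= hi for ch in text)

# U+0D05, U+0D1A, U+0D1A: the letters of the sample prefix as it appears
# in this message, written as escapes so that no mailer can damage them.
prefix = "\u0d05\u0d1a\u0d1a"
print(in_range(prefix, MALAYALAM))

# Three Cyrillic letters (U+0430..U+0432), the range used for my own test.
print(in_range("\u0430\u0431\u0432", CYRILLIC))
```
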
The default encoding of free-text queries is the default SQL encoding of the
SQL connection; if it is not set explicitly in the connection, then the
default SQL encoding of the server; if that is not set in virtuoso.ini
either, then LATIN-1. UTF-8 cannot be a single-character SQL encoding
because it's, well, not single-character. So you should set the encoding
locally inside the free-text expression, as described in the documentation:

select ?s where { ?s rdfs:label ?o. FILTER bif:contains(?o, '[__enc "UTF-8"] "abc*"'). }

or (if the text is not killed by the mailer)

select ?s where { ?s rdfs:label ?o. FILTER bif:contains(?o, '[__enc "UTF-8"] "അചച*"'). }
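
To illustrate the two points above, here is a rough Python sketch. The
function names effective_encoding and contains_expr are hypothetical helpers
invented for this mail, not Virtuoso APIs; the fallback order and the
[__enc "UTF-8"] prefix are the ones described above. No escaping of quotes
inside the search text is attempted.

```python
def effective_encoding(connection_enc=None, server_enc=None):
    """Fallback order described above: connection, then server, then LATIN-1."""
    return connection_enc or server_enc or "LATIN-1"

def contains_expr(pattern, enc="UTF-8"):
    """Build a free-text expression that sets the encoding locally.

    Assumes pattern contains no double quotes (nothing is escaped here).
    """
    return '[__enc "%s"] "%s"' % (enc, pattern)

# The letters of the sample prefix, written as escapes (U+0D05 U+0D1A U+0D1A).
prefix = "\u0d05\u0d1a\u0d1a"

# Three characters, but more bytes than characters once encoded as UTF-8:
# one character does not map to one byte in this encoding.
print(len(prefix), len(prefix.encode("utf-8")))

# With nothing configured anywhere, the fallback ends at LATIN-1.
print(effective_encoding())

expr = contains_expr(prefix + "*")
query = ("select ?s where { ?s rdfs:label ?o. "
         "FILTER bif:contains(?o, '%s'). }" % expr)
print(query)
```
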

There are known issues with the internationalization of Virtuoso; most of
them concern non-LATIN-1 names of files and DAV resources. We're
working on them, but it's a large extension that will not be completed
before the next release.

Best Regards,
Ivan Mikhailov.




On Thu, 2008-01-31 at 20:08 +0530, Rajeev J Sebastian wrote:
> Hello Ivan and friends,
> 
> So far, I did not face any problems and found many things that made me happy 
> ;)
> 
> But today, I came upon another issue, and would like to have your
> suggestions on the same ...
> 
> 
> I am trying to make use of the free text index in virtuoso. E.g. in the 
> triple:
> 
> urn:blah:foo rdfs:label "abcd" .
> 
> For clarity i gave the literal above with latin characters. In my own
> graph though, I have a set of Malayalam codepoints in it.
> 
> The problem is, that a query like
> 
> select ?s where { ?s rdfs:label ?o. FILTER bif:contains(?o, '"abc*"'). }
> 
> fails with the following error:
> 
> 22023 Error FT370: Wildcard word needs at least 3 leading characters
> 
> 
> Hopefully, it wont get munged in transit, but the actual literal is
> 
> അച്ചന്‍
> 
> What I am searching for is:
> 
> "അചച*"
> 
> As you can examine and see, there are correctly 4 characters before the *.
> 
> 
> 
> Also, if I search for
> 
> അച്ചന്‍
> 
> ... i.e., same as the literal.
> 
> I get the following error:
> 37000 Error XM029: Free-text expression, line 0: Invalid character in
> free-text search expression, it may not appear outside quoted string
> at �
> 
> Both the above work for latin strings. I dont know about other scripts though.
> 
> Regards
> Rajeev J Sebastian