Hi David,

Can you contact me off the mailing list to arrange for the database to be 
provided, so that we can set up a local instance to look into this?

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc.      //              http://www.openlinksw.com/
Weblog   -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter  -- http://twitter.com/OpenLink
Google+  -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

> On 17 Feb 2015, at 14:54, Davide Alocci <davide.alo...@isb-sib.ch> wrote:
> 
> Dear Virtuoso Users,
> 
> I am Davide Alocci, a Ph.D. student at the Swiss Institute of Bioinformatics.
> We are currently working on software for substructure search in a database
> of glycan structures. You can find more information about glycans at
> http://en.wikipedia.org/wiki/Glycan, but the important point is that a
> glycan is a tree in which every node and edge can carry different
> information. Our goal is to design software that retrieves all the
> structures in the database that contain a specific motif.
> 
> From the beginning we decided to translate every structure into triples and
> use Virtuoso for the search.
> In our model each node becomes an entity, and we encode the edges linking
> the entities as predicates.
> Because an edge has several properties, we have multiple triples with
> different predicates for it.
> Moreover, we use self-loops to specify a node's properties.
> 
> In the end every structure is a long list of triples. Here is an example:
> 
> <http://mzjava.expasy.org/structureConnection/A>
>         <http://mzjava.expasy.org/predicate/has_components>
>                 <http://mzjava.expasy.org/component/A/4> ,
>                 <http://mzjava.expasy.org/component/A/3> ,
>                 <http://mzjava.expasy.org/component/A/2> ,
>                 <http://mzjava.expasy.org/component/A/1> ,
>                 <http://mzjava.expasy.org/component/A/0> .
> 
> <http://mzjava.expasy.org/component/A/0>
>         <http://mzjava.expasy.org/predicate/is_GlycosidicLinkage>
>                 <http://mzjava.expasy.org/component/A/3> ,
>                 <http://mzjava.expasy.org/component/A/2> ;
>         <http://mzjava.expasy.org/predicate/is_SubstituentLinkage>
>                 <http://mzjava.expasy.org/component/A/1> ;
>         <http://mzjava.expasy.org/predicate/is_a_Glc>
>                 <http://mzjava.expasy.org/component/A/0> ;
>         <http://mzjava.expasy.org/predicate/is_connected>
>                 <http://mzjava.expasy.org/component/A/3> ,
>                 <http://mzjava.expasy.org/component/A/2> ,
>                 <http://mzjava.expasy.org/component/A/1> ;
>         <http://mzjava.expasy.org/predicate/is_monosaccharide>
>                 <http://mzjava.expasy.org/component/A/0> .
> 
> <http://mzjava.expasy.org/component/A/1>
>         <http://mzjava.expasy.org/predicate/is_a_NAcetyl>
>                 <http://mzjava.expasy.org/component/A/1> ;
>         <http://mzjava.expasy.org/predicate/is_substituent>
>                 <http://mzjava.expasy.org/component/A/1> .
> 
> <http://mzjava.expasy.org/component/A/2>
>         <http://mzjava.expasy.org/predicate/is_a_Gal>
>                 <http://mzjava.expasy.org/component/A/2> ;
>         <http://mzjava.expasy.org/predicate/is_monosaccharide>
>                 <http://mzjava.expasy.org/component/A/2> .
> 
> <http://mzjava.expasy.org/component/A/4>
>         <http://mzjava.expasy.org/predicate/is_a_Fuc>
>                 <http://mzjava.expasy.org/component/A/4> ;
>         <http://mzjava.expasy.org/predicate/is_monosaccharide>
>                 <http://mzjava.expasy.org/component/A/4> .
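
The encoding described above (one entity per node, one triple per edge property, self-loops for node properties) could be sketched roughly as follows. This is only an illustration, not Davide's actual code: the function name `encode_structure` and the shape of its `nodes` and `edges` arguments are assumptions, and only the URI layout is taken from the example.

```python
# Minimal sketch of the glycan-to-triples encoding; helper names are hypothetical.
BASE = "http://mzjava.expasy.org"  # URI layout copied from the example data


def encode_structure(structure_id, nodes, edges):
    """Return a list of (subject, predicate, object) URI strings.

    nodes: {index: [node property name, ...]}
    edges: [(parent index, child index, [edge property name, ...]), ...]
    """
    def component(index):
        return f"<{BASE}/component/{structure_id}/{index}>"

    def predicate(name):
        return f"<{BASE}/predicate/{name}>"

    structure = f"<{BASE}/structureConnection/{structure_id}>"
    triples = []
    for index in sorted(nodes):
        # Link the structure to each of its components.
        triples.append((structure, predicate("has_components"), component(index)))
        # Node properties are stored as self-loops on the component.
        for prop in nodes[index]:
            triples.append((component(index), predicate(prop), component(index)))
    for parent, child, props in edges:
        # Every edge property becomes its own triple between the two components.
        for prop in props:
            triples.append((component(parent), predicate(prop), component(child)))
    return triples


# Two components of structure A from the example: a Glc carrying an NAcetyl.
example_nodes = {0: ["is_a_Glc", "is_monosaccharide"],
                 1: ["is_a_NAcetyl", "is_substituent"]}
example_edges = [(0, 1, ["is_connected", "is_SubstituentLinkage"])]
example_triples = encode_structure("A", example_nodes, example_edges)
for s, p, o in example_triples:
    print(s, p, o, ".")
```

The output is one triple per line rather than the grouped Turtle form shown above; both describe the same graph.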
> 
> 
> At the moment our endpoint contains around 30000 structures and has a size
> of 200 MB.
> For querying the endpoint we always use more or less the same strategy: we
> first translate the substructure into a SPARQL query, then we retrieve the
> IDs of the structures that contain it.
> Here is an example query:
> 
> SELECT DISTINCT ?structureConnection
> WHERE {
>     ?structureConnection predicate:has_components ?component0 .
>     {
>         SELECT * WHERE {
>             ?component0 predicate:is_a_Glc ?component0 .
>             ?component1 predicate:is_a_NAcetyl ?component1 .
>             ?component0 predicate:is_connected ?component1 .
>             ?component0 predicate:is_SubstituentLinkage ?component1 .
>             ?component0 predicate:has_linkedCarbon_2 ?component1 .
>             ?component2 predicate:is_a_Glc ?component2 .
>             ?component0 predicate:is_connected ?component2 .
>             ?component0 predicate:is_GlycosidicLinkage ?component2 .
>             ?component0 predicate:has_anomerConnection_beta ?component2 .
>             ?component0 predicate:has_linkedCarbon_4 ?component2 .
>             ?component0 predicate:has_anomerCarbon_1 ?component2 .
>             ?component3 predicate:is_a_NAcetyl ?component3 .
>             ?component2 predicate:is_connected ?component3 .
>             ?component2 predicate:is_SubstituentLinkage ?component3 .
>             ?component2 predicate:has_linkedCarbon_2 ?component3 .
>             ?component4 predicate:is_a_Man ?component4 .
>             ?component2 predicate:is_connected ?component4 .
>             ?component2 predicate:is_GlycosidicLinkage ?component4 .
>             ?component2 predicate:has_anomerConnection_beta ?component4 .
>             ?component2 predicate:has_linkedCarbon_4 ?component4 .
>             ?component2 predicate:has_anomerCarbon_1 ?component4 .
>             ?component5 predicate:is_a_Man ?component5 .
>             ?component4 predicate:is_connected ?component5 .
>             ?component4 predicate:is_GlycosidicLinkage ?component5 .
>             ?component4 predicate:has_anomerConnection_alpha ?component5 .
>             ?component4 predicate:has_linkedCarbon_3 ?component5 .
>             ?component4 predicate:has_anomerCarbon_1 ?component5 .
>             ?component6 predicate:is_a_Man ?component6 .
>             ?component4 predicate:is_connected ?component6 .
>             ?component4 predicate:is_GlycosidicLinkage ?component6 .
>             ?component4 predicate:has_anomerConnection_alpha ?component6 .
>             ?component4 predicate:has_linkedCarbon_6 ?component6 .
>             ?component4 predicate:has_anomerCarbon_1 ?component6 .
>             ?component7 predicate:is_a_Glc ?component7 .
>             ?component5 predicate:is_connected ?component7 .
>             ?component5 predicate:is_GlycosidicLinkage ?component7 .
>             ?component5 predicate:has_anomerConnection_beta ?component7 .
>             ?component5 predicate:has_anomerCarbon_1 ?component7 .
>             ?component5 predicate:has_linkedCarbon_2 ?component7 .
>             ?component8 predicate:is_a_NAcetyl ?component8 .
>             ?component7 predicate:is_connected ?component8 .
>             ?component7 predicate:is_SubstituentLinkage ?component8 .
>             ?component7 predicate:has_linkedCarbon_2 ?component8 .
>             ?component9 predicate:is_a_Glc ?component9 .
>             ?component6 predicate:is_connected ?component9 .
>             ?component6 predicate:is_GlycosidicLinkage ?component9 .
>             ?component6 predicate:has_anomerConnection_beta ?component9 .
>             ?component6 predicate:has_anomerCarbon_1 ?component9 .
>             ?component6 predicate:has_linkedCarbon_2 ?component9 .
>             ?component10 predicate:is_a_NAcetyl ?component10 .
>             ?component9 predicate:is_connected ?component10 .
>             ?component9 predicate:is_SubstituentLinkage ?component10 .
>             ?component9 predicate:has_linkedCarbon_2 ?component10 .
>         }
>     }
> }
> 
> As you can see, the length of the query depends on the size of the
> substructure.
> Substructures can have 30 components, which means more than 200 triples in
> the query.
> At the moment we are facing the problem of extremely long queries, which is
> possibly not a common one.
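
To make the link between motif size and query length concrete, here is a rough sketch of how such a query might be generated from a motif. It is not the code Davide's group uses: `build_patterns`, `build_query`, and the `nodes`/`edges` argument shapes are hypothetical, the `predicate:` prefix IRI is an assumption inferred from the example data, and the sketch assumes the motif has a component with index 0. Unlike the query above, it lists all node patterns first and edge patterns afterwards.

```python
# Minimal sketch: turn a motif (a small tree) into a SPARQL query string.
# Assumption: the prefix IRI below is inferred from the example triples.
PREFIX = "PREFIX predicate: <http://mzjava.expasy.org/predicate/>"


def build_patterns(nodes, edges):
    """Return one triple pattern per node property and per edge property."""
    patterns = []
    for index in sorted(nodes):
        var = f"?component{index}"
        for prop in nodes[index]:
            # Node properties are self-loops in the data model.
            patterns.append(f"{var} predicate:{prop} {var} .")
    for parent, child, props in edges:
        for prop in props:
            patterns.append(
                f"?component{parent} predicate:{prop} ?component{child} .")
    return patterns


def build_query(nodes, edges):
    """Wrap the patterns in the nested SELECT form used in the example query."""
    inner = "\n".join("            " + p for p in build_patterns(nodes, edges))
    return (
        f"{PREFIX}\n"
        "SELECT DISTINCT ?structureConnection\n"
        "WHERE {\n"
        "    ?structureConnection predicate:has_components ?component0 .\n"
        "    {\n"
        "        SELECT * WHERE {\n"
        f"{inner}\n"
        "        }\n"
        "    }\n"
        "}"
    )


# A two-component motif: a Glc with an NAcetyl substituent on carbon 2.
motif_nodes = {0: ["is_a_Glc"], 1: ["is_a_NAcetyl"]}
motif_edges = [(0, 1, ["is_connected", "is_SubstituentLinkage",
                       "has_linkedCarbon_2"])]
patterns = build_patterns(motif_nodes, motif_edges)
query = build_query(motif_nodes, motif_edges)
print(query)
print(len(patterns) + 1, "triple patterns in total")
```

Every additional component adds its node patterns plus one pattern per property of the edge that attaches it, so the pattern count grows linearly with the number of components.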
> 
> We are trying to optimize Virtuoso for our goal. So far the problem does not
> seem to be related to RAM or CPU; it seems more connected with the size of
> the query and the time needed to parse it.
> Switching from version 7.1 to 7.2 we saw a good performance improvement: in
> our tests the new version is twice as fast as 7.1 (great job :) ).
> We also tested graph databases like Neo4j, but the performance was really
> poor.
> 
> At the moment, for a substructure with 25 components we have a query time of
> 29 seconds whereas, with few components, the query time is under one second.
> I am keen to share our little database and some test queries because I think
> this is not a very common use case for Virtuoso.
> Any ideas for optimizing our model or our queries are welcome.
> 
> Best regards,
> Davide
> 
> 
> 
> 
> 
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users

