Hi Davide,
Can you contact me off the mailing list to arrange for the database to be
provided, so that we can set up a local instance to look into this?
Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc. // http://www.openlinksw.com/
Weblog -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter -- http://twitter.com/OpenLink
Google+ -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers
> On 17 Feb 2015, at 14:54, Davide Alocci <davide.alo...@isb-sib.ch> wrote:
>
> Dear Virtuoso Users,
>
> I am Davide Alocci, a Ph.D. student at the Swiss Institute of Bioinformatics.
> We are currently working on software for substructure search in a
> database of glycan structures. You can find more information about
> glycans here (http://en.wikipedia.org/wiki/Glycan), but what really matters is
> that a glycan is a tree in which every node and edge can carry different
> information. Our goal is to design software that can retrieve all the
> structures in the database that contain a specific motif.
>
> From the beginning we decided to translate every structure into triples and
> use Virtuoso for the search.
> In our model each node becomes an entity and we encode the edges linking the
> entities as predicates.
> Because an edge has several properties, we have multiple triples with
> different predicates.
> Moreover, we use self-loops to specify a node's properties.
>
> In the end every structure is a long list of triples. Here is an
> example:
>
> <http://mzjava.expasy.org/structureConnection/A>
>     <http://mzjava.expasy.org/predicate/has_components>
>         <http://mzjava.expasy.org/component/A/4> ,
>         <http://mzjava.expasy.org/component/A/3> ,
>         <http://mzjava.expasy.org/component/A/2> ,
>         <http://mzjava.expasy.org/component/A/1> ,
>         <http://mzjava.expasy.org/component/A/0> .
>
> <http://mzjava.expasy.org/component/A/0>
>     <http://mzjava.expasy.org/predicate/is_GlycosidicLinkage>
>         <http://mzjava.expasy.org/component/A/3> ,
>         <http://mzjava.expasy.org/component/A/2> ;
>     <http://mzjava.expasy.org/predicate/is_SubstituentLinkage>
>         <http://mzjava.expasy.org/component/A/1> ;
>     <http://mzjava.expasy.org/predicate/is_a_Glc>
>         <http://mzjava.expasy.org/component/A/0> ;
>     <http://mzjava.expasy.org/predicate/is_connected>
>         <http://mzjava.expasy.org/component/A/3> ,
>         <http://mzjava.expasy.org/component/A/2> ,
>         <http://mzjava.expasy.org/component/A/1> ;
>     <http://mzjava.expasy.org/predicate/is_monosaccharide>
>         <http://mzjava.expasy.org/component/A/0> .
>
> <http://mzjava.expasy.org/component/A/1>
>     <http://mzjava.expasy.org/predicate/is_a_NAcetyl>
>         <http://mzjava.expasy.org/component/A/1> ;
>     <http://mzjava.expasy.org/predicate/is_substituent>
>         <http://mzjava.expasy.org/component/A/1> .
>
> <http://mzjava.expasy.org/component/A/2>
>     <http://mzjava.expasy.org/predicate/is_a_Gal>
>         <http://mzjava.expasy.org/component/A/2> ;
>     <http://mzjava.expasy.org/predicate/is_monosaccharide>
>         <http://mzjava.expasy.org/component/A/2> .
>
> <http://mzjava.expasy.org/component/A/4>
>     <http://mzjava.expasy.org/predicate/is_a_Fuc>
>         <http://mzjava.expasy.org/component/A/4> ;
>     <http://mzjava.expasy.org/predicate/is_monosaccharide>
>         <http://mzjava.expasy.org/component/A/4> .
>
>
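For readers following the thread, the encoding described above (one entity per node, self-loops for node properties, one triple per edge property) can be sketched roughly as follows. This is only an illustration under assumptions: the function names `iri`, `structure_to_triples` and `to_ntriples` are hypothetical, and only the IRIs and predicate names are taken from the example. The input is a fragment of structure A (components 0, 1 and 2), not the whole structure.

```python
# Hypothetical helpers; only the IRIs and predicate names come from the example above.
BASE = "http://mzjava.expasy.org"


def iri(kind, *parts):
    """Build an IRI such as <http://mzjava.expasy.org/component/A/0>."""
    return "<{}/{}/{}>".format(BASE, kind, "/".join(str(p) for p in parts))


def structure_to_triples(structure_id, nodes, edges):
    """Translate one tree into (subject, predicate, object) triples.

    nodes: {component index: [node property names]}, emitted as self-loops
    edges: [(parent index, child index, [edge property names])]
    """
    triples = []
    root = iri("structureConnection", structure_id)
    for index in sorted(nodes):
        component = iri("component", structure_id, index)
        # Link the structure to each of its components.
        triples.append((root, iri("predicate", "has_components"), component))
        # Node properties are self-loops: subject and object are the same node.
        for name in nodes[index]:
            triples.append((component, iri("predicate", name), component))
    for parent, child, names in edges:
        subject = iri("component", structure_id, parent)
        obj = iri("component", structure_id, child)
        # One triple per edge property, all between the same two nodes.
        for name in names:
            triples.append((subject, iri("predicate", name), obj))
    return triples


def to_ntriples(triples):
    """Serialize the triples one per line, N-Triples style."""
    return "\n".join("{} {} {} .".format(s, p, o) for s, p, o in triples)


# Fragment of structure A: components 0, 1 and 2 and the two edges between them.
nodes = {
    0: ["is_a_Glc", "is_monosaccharide"],
    1: ["is_a_NAcetyl", "is_substituent"],
    2: ["is_a_Gal", "is_monosaccharide"],
}
edges = [
    (0, 1, ["is_SubstituentLinkage", "is_connected"]),
    (0, 2, ["is_GlycosidicLinkage", "is_connected"]),
]
triples = structure_to_triples("A", nodes, edges)
print(to_ntriples(triples))
```

The output is the same information as the grouped example, written with one triple per line instead of predicate and object lists.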
> At the moment our endpoint contains around 30000 structures and has a size
> of 200 MB.
> For querying the endpoint we use more or less the same strategy: we first
> translate the substructure into a SPARQL query and then retrieve the IDs of
> the structures that contain it.
> Here is an example query:
>
> SELECT DISTINCT ?structureConnection
> WHERE {
>   ?structureConnection predicate:has_components ?component0 . {
>     SELECT * WHERE {
>       ?component0 predicate:is_a_Glc ?component0 .
>       ?component1 predicate:is_a_NAcetyl ?component1 .
>       ?component0 predicate:is_connected ?component1 .
>       ?component0 predicate:is_SubstituentLinkage ?component1 .
>       ?component0 predicate:has_linkedCarbon_2 ?component1 .
>       ?component2 predicate:is_a_Glc ?component2 .
>       ?component0 predicate:is_connected ?component2 .
>       ?component0 predicate:is_GlycosidicLinkage ?component2 .
>       ?component0 predicate:has_anomerConnection_beta ?component2 .
>       ?component0 predicate:has_linkedCarbon_4 ?component2 .
>       ?component0 predicate:has_anomerCarbon_1 ?component2 .
>       ?component3 predicate:is_a_NAcetyl ?component3 .
>       ?component2 predicate:is_connected ?component3 .
>       ?component2 predicate:is_SubstituentLinkage ?component3 .
>       ?component2 predicate:has_linkedCarbon_2 ?component3 .
>       ?component4 predicate:is_a_Man ?component4 .
>       ?component2 predicate:is_connected ?component4 .
>       ?component2 predicate:is_GlycosidicLinkage ?component4 .
>       ?component2 predicate:has_anomerConnection_beta ?component4 .
>       ?component2 predicate:has_linkedCarbon_4 ?component4 .
>       ?component2 predicate:has_anomerCarbon_1 ?component4 .
>       ?component5 predicate:is_a_Man ?component5 .
>       ?component4 predicate:is_connected ?component5 .
>       ?component4 predicate:is_GlycosidicLinkage ?component5 .
>       ?component4 predicate:has_anomerConnection_alpha ?component5 .
>       ?component4 predicate:has_linkedCarbon_3 ?component5 .
>       ?component4 predicate:has_anomerCarbon_1 ?component5 .
>       ?component6 predicate:is_a_Man ?component6 .
>       ?component4 predicate:is_connected ?component6 .
>       ?component4 predicate:is_GlycosidicLinkage ?component6 .
>       ?component4 predicate:has_anomerConnection_alpha ?component6 .
>       ?component4 predicate:has_linkedCarbon_6 ?component6 .
>       ?component4 predicate:has_anomerCarbon_1 ?component6 .
>       ?component7 predicate:is_a_Glc ?component7 .
>       ?component5 predicate:is_connected ?component7 .
>       ?component5 predicate:is_GlycosidicLinkage ?component7 .
>       ?component5 predicate:has_anomerConnection_beta ?component7 .
>       ?component5 predicate:has_anomerCarbon_1 ?component7 .
>       ?component5 predicate:has_linkedCarbon_2 ?component7 .
>       ?component8 predicate:is_a_NAcetyl ?component8 .
>       ?component7 predicate:is_connected ?component8 .
>       ?component7 predicate:is_SubstituentLinkage ?component8 .
>       ?component7 predicate:has_linkedCarbon_2 ?component8 .
>       ?component9 predicate:is_a_Glc ?component9 .
>       ?component6 predicate:is_connected ?component9 .
>       ?component6 predicate:is_GlycosidicLinkage ?component9 .
>       ?component6 predicate:has_anomerConnection_beta ?component9 .
>       ?component6 predicate:has_anomerCarbon_1 ?component9 .
>       ?component6 predicate:has_linkedCarbon_2 ?component9 .
>       ?component10 predicate:is_a_NAcetyl ?component10 .
>       ?component9 predicate:is_connected ?component10 .
>       ?component9 predicate:is_SubstituentLinkage ?component10 .
>       ?component9 predicate:has_linkedCarbon_2 ?component10 .
>     }
>   }
> }
>
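To make the translation step concrete, here is a minimal sketch of how a substructure might be turned into a query of the shape shown above. It is not the original poster's generator: `build_query` and its argument layout are hypothetical, and the sketch emits all type patterns first rather than interleaving them with the edge patterns as the posted query does. The `predicate:` prefix is left undeclared, as in the posted query.

```python
def build_query(node_types, edges):
    """Return (query text, number of triple patterns) for a substructure.

    node_types: node_types[i] is the type suffix of component i, e.g. "Glc"
    edges: [(parent index, child index, [edge predicate names])]
    """
    patterns = []
    # One self-loop pattern per component to constrain its type.
    for index, type_name in enumerate(node_types):
        patterns.append(
            "?component{0} predicate:is_a_{1} ?component{0} .".format(index, type_name)
        )
    # One pattern per edge property between parent and child.
    for parent, child, names in edges:
        for name in names:
            patterns.append(
                "?component{} predicate:{} ?component{} .".format(parent, name, child)
            )
    inner = "\n".join("      " + pattern for pattern in patterns)
    query = "\n".join([
        "SELECT DISTINCT ?structureConnection",
        "WHERE {",
        "  ?structureConnection predicate:has_components ?component0 . {",
        "    SELECT * WHERE {",
        inner,
        "    }",
        "  }",
        "}",
    ])
    # +1 for the has_components pattern in the outer query.
    return query, len(patterns) + 1


# The first three components of the posted query.
substituent = ["is_connected", "is_SubstituentLinkage", "has_linkedCarbon_2"]
glycosidic = [
    "is_connected",
    "is_GlycosidicLinkage",
    "has_anomerConnection_beta",
    "has_linkedCarbon_4",
    "has_anomerCarbon_1",
]
query, pattern_count = build_query(
    ["Glc", "NAcetyl", "Glc"],
    [(0, 1, substituent), (0, 2, glycosidic)],
)
print(query)
print(pattern_count)
```

In the posted query each component contributes one type pattern, each substituent link three patterns and each glycosidic link five, so its 11 components and 10 links add up to 54 triple patterns including the has_components pattern.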
> As you can see, the length of the query depends on the size of the
> substructure.
> Substructures can have 30 components, which means more than 200 triples in
> the query.
> At the moment we are facing the problem of having an extremely long query,
> which is possibly not a common problem.
>
> We are trying to optimize Virtuoso for our goal, and so far the problem does
> not seem to be related to RAM or CPU; it seems more connected with the size
> of the query and the time to parse it.
> Switching from version 7.1 to 7.2 we saw a good improvement in
> performance; in our tests the new version is twice as fast as 7.1
> (great job :) ).
> We also tested graph databases like Neo4j, but the performance was really poor.
>
> At the moment, for a substructure with 25 components we have a query time of
> 29 seconds, whereas with few components the query time is under a second.
> I am keen to share our little database and some test queries because I think
> this is not a really common use case for Virtuoso.
> Any ideas for optimizing our model or our queries are welcome.
>
> Best regards,
> Davide
>
>
>
>
>
> ------------------------------------------------------------------------------
> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
> with Interactivity, Sharing, Native Excel Exports, App Integration & more
> Get technology previously reserved for billion-dollar corporations, FREE
> http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users