Hi Denny,
On 04.06.19 19:40, Denny Vrandečić wrote:
Thank you, I have now read through the paper - great work, congratulations!
I am glad to see DBpedia evolve. I am very much looking forward to
seeing it move out of beta.
I am also very glad (and quietly amused) to see you moving to opaque
identifiers. I think that is the right decision.
We see them as Data Management URIs. You can connect to DBpedia Global
ID Clusters and thus gain access to everything else. For semantics and
validation it is strongly advised to keep a local URI from the cluster.
A few questions on the paper:
- the releases for the chapters - the enriched sets - aren't they
basically just a subset of the complete fused set? What is the point
of these? Why would anyone prefer the Catalan release over the whole
fused set? I am missing something here.
There are minor variations, but you could say it is a subset. However,
the thing is called FlexiFusion for a reason. Chapters can choose how
they want the data:
1. See Fig. 2, the 'Rocket Ice' chart. One can safely assume that
quality is better if you choose only the blue and green parts.
2. Chapters can easily tailor their ID space. So let's say they use
their DBpedia language edition plus some well-linked, authoritative
national datasets, and then enrich with everything else. We can tailor
national information infrastructures with it.
- why would you load the English chapter into the main SPARQL endpoint
instead of the fused set?
We are moving towards integration of LOD, so at some point it doesn't
make sense to host ALL data in one place for free. Potentially, we can
FlexiFuse anything that has links - e.g. throw Geonames, GND, the
World Factbook, and MusicBrainz into the mix. If it gets really
popular, there is no funding to add 10 or 100 servers to the cluster.
On the other hand, maybe you know some organisation that would donate
big lump sums whenever 'free' hosting hits its limits due to an
order-of-magnitude increase in requests. That would make a lot of
people very happy.
Otherwise it is free, unlimited dump downloads for self-hosting or paid
dedicated hosting.
- in Table 2, the Fusion dataset boasts 66M entities over Wikidata's
45M entities. Where do the 21M extra entities come from? Shouldn't
most of the individual Wikipedias' articles already be matched to
Wikidata IDs and thus fused together?
Wikidata only has an ID if the article exists in at least two
languages. The extra entities are the singletons with no equivalent in
other languages.
My understanding is that if I want to compare Wikidata to the fused
dataset, I need to download both the mappings and the fused dataset,
and then translate the latter using the former. Or is there some way
for the Databus to create a fused dataset for me using Wikidata IDs
("canonicalized", as it used to be called) instead of the new DBpedia
IDs? I thought I read something like that in your previous answer, but
I couldn't find it, and am now thinking of just doing it myself.
That sounds a lot like the enriched Wikidata. Yes, we can produce that.
It would look like:
<http://wikidata.dbpedia.org/resource/Q64> dbo:postalCode "" .
Catalan is a prototype for now. All other sources will come soon,
including Wikidata. Maybe we will also produce datasets with
<wikidata.org> URIs. What you actually want is the enriched
Wikidata-DBpedia; there is no way to rewrite all 66M entities to
WKD URIs.
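If you do build that translation yourself, it boils down to joining
the sameAs mappings with the fused dump. A minimal, untested sketch in
Python (file names are hypothetical; both inputs are assumed to be
N-Triples, the mapping file with URI objects only):

    import bz2

    # hypothetical mapping file: <globalId> owl:sameAs <wikidataUri> .
    mapping = {}  # DBpedia Global ID -> Wikidata URI
    with bz2.open("sameas-wikidata.nt.bz2", "rt", encoding="utf-8") as f:
        for line in f:
            parts = line.split(" ")
            if len(parts) >= 3:
                mapping[parts[0]] = parts[2]

    with bz2.open("fusion.nt.bz2", "rt", encoding="utf-8") as src, \
         open("fusion-wkd.nt", "w", encoding="utf-8") as dst:
        for line in src:
            subj, sep, rest = line.partition(" ")
            if not sep:
                continue
            # keep the Global ID when no Wikidata equivalent exists
            dst.write(mapping.get(subj, subj) + " " + rest)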
We also have two funded projects, Energy-Databus and
SupplyChainManagement-Databus, which will tackle bi-directional
mappings of properties and classes. They start in August.
Note:
If you build something to remap properties and classes of the data, we
would be happy if you could share it via the Databus again:
http://dev.dbpedia.org/Databus_Upload_User_Manual
It could also just be the reverse mapping - that is what we need. Or,
even better, we need infoboxes and their properties mapped to
Wikidata, so it would save us some work. Or you wait some months.
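The property side of such a remap is symmetric to rewriting subjects.
A rough sketch (the single mapping entry, dbo:birthDate to Wikidata's
P569, is only an illustrative example; a real tool would load the full
mapping from a file):

    # hypothetical one-entry mapping table
    PROP_MAP = {
        "<http://dbpedia.org/ontology/birthDate>":
            "<http://www.wikidata.org/prop/direct/P569>",
    }

    def remap_line(line: str) -> str:
        # rewrite the predicate of one N-Triples line, keep the rest
        subj, pred, rest = line.split(" ", 2)
        return " ".join((subj, PROP_MAP.get(pred, pred), rest))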
Finally, as Table 4 shows, the Fused dataset has an appreciable win
of +2.16% over Wikidata on a property such as birth date. Would you
consider publishing these diffs under a CC0 license so they could be
provided to Wikidata for consideration and to enrich the source itself?
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE
-> a WMF grant to build the application to assist in this. I don't
think we need to republish the dump. The app will hopefully carry the
references too, and then it should be easy to import/sync. FlexiFusion
was built with syncing Wikipedia and Wikidata in mind.
Getting traction here is the problem. How would you approach it? Can
you mentor the project?
Cheers,
Sebastian
Cheers,
Denny
On Mon, Jun 3, 2019 at 10:48 PM Sebastian Hellmann
<[email protected]> wrote:
Hi Denny,
On 03.06.19 23:31, Denny Vrandečić wrote:
Thank you a lot for the answer, that is super useful.
I'll see if I can get the canonicalized version recreated :)
One question though: is there a cleaned version of the DBpedia
ontology mapping-based data? I only found the uncleaned version.
Two scripts are still missing: the type consistency check and
redirect resolution. We will run them once the Unicode bug is fixed.
Do you have any plans for when the next release of DBpedia is going
to be available?
These are signed with the public key from
http://webid.dbpedia.org/webid.ttl, so they are the next releases.
The structure will stay the same, and each release should be a bit
better than the previous one. We are just working on the handful
of issues and on a better way to comment on mistakes before we
announce them on all channels.
It is an open platform now. We will have core dataset releases
(including raw data), and then the community can create their own
additions. https://propi.github.io/webid.ttl will add the LHD
dataset, Heiko Paulheim will add DBkWik, etc.
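Schematically, such a webid.ttl is a small Turtle profile carrying
the publisher's public key, based on the standard WebID cert
vocabulary (the values below are placeholders, not a real key):

    @prefix cert: <http://www.w3.org/ns/auth/cert#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    <#this> a foaf:Person ;
        foaf:name "Jane Publisher" ;                    # placeholder
        cert:key [ a cert:RSAPublicKey ;
                   cert:modulus "ABCD1234"^^xsd:hexBinary ;  # placeholder
                   cert:exponent 65537 ] .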
If you do any analysis, you can get an account and publish the
data on the bus. Links like
https://databus.dbpedia.org/denny/analysis are stable redirects to
files, just like purl.org.
-- Sebastian
On Mon, Jun 3, 2019 at 2:19 PM Sebastian Hellmann
<[email protected]> wrote:
Hi Denny,
you didn't really find them, because they are not yet
publicly released. Please consider them a beta.
The main reason is that there are a handful of missing
features and a handful of stupid bugs.
One example:
- we discovered a Unicode issue in URIs which still allows
valid analysis, but would not allow loading the data into
dbpedia.org/sparql
- we built the Databus to have a group changelog and a
dataset/artifact changelog; however, these can only be
changed at release time, so we cannot update reported errors,
like the one above, after a release is published.
It is not hard, and marvin has already done new extractions:
https://databus.dbpedia.org/marvin - there is just a bit
missing.
i.e. files such as
http://downloads.dbpedia.org/2016-10/core-i18n/de/mappingbased_objects_wkd_uris_de.ttl.bz2
- can you point me to where I can find the canonicalized
versions in the new files?
These are discontinued. Instead there is:
https://databus.dbpedia.org/dbpedia/id-management/global-ids
loaded into this webservice:
https://global.dbpedia.org/same-thing/lookup/?uri=http://www.wikidata.org/entity/Q8087
where you can resolve many URIs against clusters.
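Programmatically, a lookup is a single GET request. A small Python
sketch (the JSON field names "global" and "locals" are an assumption
about the response layout and may differ):

    import json
    import urllib.parse
    import urllib.request

    uri = "http://www.wikidata.org/entity/Q8087"
    url = ("https://global.dbpedia.org/same-thing/lookup/?uri="
           + urllib.parse.quote(uri, safe=""))
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    print(data.get("global"))              # cluster ID (assumed field)
    for member in data.get("locals", []):  # member URIs (assumed field)
        print("  ", member)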
and the fused and enriched versions as described in
https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf
FlexiFusion is more systematic and can rewrite any dataset's
subjects with any other subject from the ID management, so we
could produce these datasets in any variant.
Thanks for these pointers! I have run a few analyses, and
now can rerun them again with the actual current data :) I
expect this to improve DBpedia numbers by quite a bit.
You could also try the fused version:
https://databus.dbpedia.org/dbpedia/fusion - this is the one we
are working on most, and it will aggregate a lot more data in
the future.
I find it all a bit hard to navigate (although Databus has a
few really neat features, thanks for that).
Any feedback is welcome; the issue tracker is linked at the top
of the website.
Yes, another missing feature. However, we thought that the
pros would just look at the dataid files and then write SPARQL
queries at https://databus.dbpedia.org/yasgui/
-- Sebastian
On 03.06.19 19:49, Denny Vrandečić wrote:
Oh, wow, thanks Sebastian, thanks Kingsley for the answers!
I was entirely unaware of the DBpedia datasets over at
databus.dbpedia.org <http://databus.dbpedia.org> - when I
search for "dbpedia downloads" that's not where I get to.
Also, when I go to dbpedia.org <http://dbpedia.org> and then
click on "Downloads", I get to the 2016 datasets.
https://wiki.dbpedia.org/Datasets
https://wiki.dbpedia.org/develop/datasets
I honestly thought that the 2016 dataset was the latest one,
and I was rather disappointed. Thank you for showing me that I
was just looking in the wrong place - but I would really
suggest that you update your websites to point to the Databus.
I am sure I am not the only one who believes that there has
been no DBpedia update since 2016.
Thanks for these pointers! I have run a few analyses, and
now can rerun them again with the actual current data :) I
expect this to improve DBpedia numbers by quite a bit.
One question: I used to use the canonicalized versions from
https://wiki.dbpedia.org/downloads-2016-10, i.e. files such as
http://downloads.dbpedia.org/2016-10/core-i18n/de/mappingbased_objects_wkd_uris_de.ttl.bz2
- can you point me to where I can find the canonicalized
versions in the new files? I find it all a bit hard to
navigate (although Databus has a few really neat features,
thanks for that).
Cheers,
Denny
On Sat, Jun 1, 2019 at 9:43 AM Kingsley Idehen
<[email protected]> wrote:
On 6/1/19 5:45 AM, Sebastian Hellmann wrote:
Hi Denny,
* the old system was like this:
we load from here:
http://downloads.dbpedia.org/2016-10/core/
metadata is in
http://downloads.dbpedia.org/2016-10/core/2016-10_dataid_core.ttl
with void:sparqlEndpoint <http://dbpedia.org/sparql> ;
Hi Sebastian,
I will also have the TTL referenced above loaded into a
named graph so that it becomes accessible from the query
solution I shared in my prior post.
* the new system is here:
https://databus.dbpedia.org/dbpedia
There are 6 new releases, and the metadata is in the
endpoint https://databus.dbpedia.org/repo/sparql
Once the collection-saving feature is finished, we
will build a collection of datasets on the bus, which
will then be loaded. It is basically a SPARQL query
retrieving the download URLs, like this:
http://dev.dbpedia.org/Data#example-application-virtuoso-docker
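In that spirit, such a query against
https://databus.dbpedia.org/repo/sparql could look roughly like this
(the artifact URI and the exact DataId property names are
assumptions; check the dataid files for the current schema):

    PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
    PREFIX dcat:   <http://www.w3.org/ns/dcat#>

    # download URLs of all distributions of one (hypothetical) artifact
    SELECT ?file WHERE {
      ?dataset dataid:artifact
          <https://databus.dbpedia.org/dbpedia/fusion/fusion> .
      ?dataset dcat:distribution ?dist .
      ?dist dcat:downloadURL ?file .
    }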
Okay.
Please install the Faceted Browser so that URIs like
http://dev.dbpedia.org/Data#example-application-virtuoso-docker
can also be looked up.
As an aside, here's an Entity Type overview query
results page
<https://databus.dbpedia.org/repo/sparql?default-graph-uri=&query=SELECT+%28SAMPLE%28%3Fs%29+AS+%3Fsample%29+%28COUNT%281%29+AS+%3Fcount%29++%28%3Fo+AS+%3FentityType%29%0D%0AWHERE+%7B%0D%0A++++++++%3Fs+a+%3Fo.+%0D%0A%09%09FILTER+%28isIRI%28%3Fs%29%29+%0D%0A++++++++++++++++FILTER+%28%21+contains%28str%28%3Fs%29%2C%22virt%22%29%29%0D%0A++++++%7D+%0D%0AGROUP+BY+%3Fo%0D%0AORDER+BY+DESC+%28%3Fcount%29&format=text%2Fhtml&timeout=0&debug=on>
for future use, etc.
Kingsley
On 31.05.19 21:59, Denny Vrandečić wrote:
Thank you for the answer!
I don't see how the query solution page that you
linked indicates that this is the English Wikipedia
extraction. Where does it say that? How can I tell? I
am trying to understand, thanks.
Also, when I download the set of English extractions
from here,
http://downloads.dbpedia.org/2016-10/core-i18n/en/
particularly this one,
http://downloads.dbpedia.org/2016-10/core-i18n/en/mappingbased_objects_en.ttl.bz2
it contains only about 17,467 people with parents, not
20,120, so that dataset seems to be out of sync with the one
in the SPARQL endpoint.
I am curious: where do you load the dataset from?
Thank you!
On Fri, May 31, 2019 at 11:49 AM Kingsley Idehen
<[email protected]> wrote:
On 5/31/19 2:23 PM, Denny Vrandečić wrote:
When I query the dbpedia.org/sparql endpoint asking
"how many people with a parent do you know?", i.e.
select (count(distinct ?s) as ?c) where { ?s dbo:parent ?o },
I get 20,120 as the answer.
Where among the downloads on
wiki.dbpedia.org/downloads-2016-10 can I
find the dataset that the SPARQL endpoint
actually serves? Is it the English
Wikipedia-based "Mappingbased" one? Or is it the
"Infobox Properties Mapped" one?
Cheers,
Denny
The query solution page
<http://dbpedia.org/sparql?default-graph-uri=&query=prefix+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E+%0D%0A%0D%0Aselect+%3Fg+%28count+%28distinct+%3Fs%29+as+%3Fc%29%0D%0Awhere+%7B+%0D%0A+++++++%0D%0A+++++++++graph+%3Fg+%7B%3Fs+dbo%3Aparent+%3Fo.%7D%0D%0A%0D%0A+++++%7D%0D%0Agroup+by+%3Fg&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on&run=+Run+Query+>
indicates this is the English Wikipedia dataset.
That's what we've always loaded into the Virtuoso
instance from which DBpedia Linked Data and its
associated SPARQL endpoint are deployed.
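For reference, the query behind that link, decoded from the URL, is:

    prefix dbo: <http://dbpedia.org/ontology/>

    select ?g (count (distinct ?s) as ?c)
    where {
        graph ?g { ?s dbo:parent ?o . }
    }
    group by ?g

It counts dbo:parent subjects per named graph, which is how the page
shows which loaded dataset contributes the 20,120.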
--
All the best,
Sebastian Hellmann
Director of Knowledge Integration and Linked Data Technologies (KILT)
Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org,
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
_______________________________________________
DBpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion