Hi Denny,
On 04.06.19 19:40, Denny Vrandečić wrote:
Thank you, I have now read through the paper - great work, congratulations!
I am glad to see DBpedia evolve. I am very much looking forward to
seeing it move out of beta.
I am also very glad (and quietly amused) to see you moving to opaque
identifiers. I think that is the right decision.
We see them as Data Management URIs. You can connect to DBpedia Global
ID Clusters and thus gain access to everything else. For semantics and
validation it is strongly advised to keep a local URI from the cluster.
A few questions on the paper:
- the releases for the chapters - the enriched sets - aren't they
basically just a subset of the complete fused set? What is the point
of these? Why would anyone prefer the Catalan release over the whole
fused set? I am missing something here.
There are minor variations, but you could say it is a subset. However,
the thing is called FlexiFusion for a reason. Chapters can choose how
they want the data:
1. See Fig. 2, the 'Rocket Ice' chart. One can safely assume that
quality is better if you choose only the blue and green parts.
2. Chapters can easily tailor their ID space. So let's say they use
their DBpedia language edition plus some well-linked, authoritative
national datasets, and then enrich with everything else. We can tailor
national information infrastructures with it.
- why would you load the English chapter into the main SPARQL endpoint
instead of the fused set?
We are moving towards integration of LOD, so at some point it doesn't
make sense to host ALL data in one place for free. Potentially, we can
FlexiFuse anything that has links - e.g. throw Geonames, GND, the
World Factbook, and MusicBrainz into the mix. If it gets really
popular, there is no funding to add 10 or 100 servers to the cluster.
On the other hand, maybe you know some organisation that would donate
big lump sums whenever 'free' hosting hits its limits due to an
order-of-magnitude increase in requests. That would make a lot of
people very happy.
Otherwise it is free, unlimited dump downloads for self-hosting or paid
dedicated hosting.
- in Table 2, the Fusion dataset boasts 66M entities over Wikidata's
45M entities. Where do the 21M extra entities come from? Shouldn't
most of the individual Wikipedias' articles already be matched to
Wikidata IDs and thus fused together?
Wikidata only has an ID if the article exists in at least two
languages. The extra entities are the singletons with no equivalent in
other languages.
My understanding is that if I want to compare Wikidata to the fused
dataset, I need to download both the mappings and the fused dataset,
and then translate the latter using the former. Or is there some way
for the Databus to create a fused dataset for me using Wikidata IDs
("canonicalized", as it used to be called) instead of the new DBpedia
IDs? I thought I read something like that in your previous answer, but
I couldn't find it, and am now thinking of just doing it myself.
That sounds a lot like the enriched Wikidata. Yes, we can produce that.
It would look like:
<http://wikidata.dbpedia.org/resource/Q64> dbo:postalCode "" .
Catalan is a prototype for now. All other sources will come soon,
including Wikidata. Maybe we will also produce datasets with
<wikidata.org> URIs. What you actually want is the enriched
Wikidata-DBpedia; there is no way to rewrite all 66M entities to
WKD URIs.
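If you do build that translation yourself, it boils down to joining
the sameAs mappings with the fused dump. A minimal, untested sketch in
Python (file names are hypothetical; both inputs are assumed to be
N-Triples, the mapping file with URI objects only):

    import bz2

    # hypothetical mapping file: <globalId> owl:sameAs <wikidataUri> .
    mapping = {}  # DBpedia Global ID -> Wikidata URI
    with bz2.open("sameas-wikidata.nt.bz2", "rt", encoding="utf-8") as f:
        for line in f:
            parts = line.split(" ")
            if len(parts) >= 3:
                mapping[parts[0]] = parts[2]

    with bz2.open("fusion.nt.bz2", "rt", encoding="utf-8") as src, \
         open("fusion-wkd.nt", "w", encoding="utf-8") as dst:
        for line in src:
            subj, sep, rest = line.partition(" ")
            if not sep:
                continue
            # keep the Global ID when no Wikidata equivalent exists
            dst.write(mapping.get(subj, subj) + " " + rest)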
We also have two funded projects, Energy-Databus and
SupplyChainManagement-Databus, which will tackle bi-directional
mappings of properties and classes. They start in August.
Note:
If you build something to remap properties and classes of the data, we
would be happy if you could share it via the Databus again:
http://dev.dbpedia.org/Databus_Upload_User_Manual
It could also just be the reverse mapping - that is what we need. Or,
even better, we need infoboxes and their properties mapped to
Wikidata, so it would save us some work. Or you wait some months.
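The property side of such a remap is symmetric to rewriting subjects.
A rough sketch (the single mapping entry, dbo:birthDate to Wikidata's
P569, is only an illustrative example; a real tool would load the full
mapping from a file):

    # hypothetical one-entry mapping table
    PROP_MAP = {
        "<http://dbpedia.org/ontology/birthDate>":
            "<http://www.wikidata.org/prop/direct/P569>",
    }

    def remap_line(line: str) -> str:
        # rewrite the predicate of one N-Triples line, keep the rest
        subj, pred, rest = line.split(" ", 2)
        return " ".join((subj, PROP_MAP.get(pred, pred), rest))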
Finally, as Table 4 shows, the Fused dataset has an appreciable win
of +2.16% over Wikidata on a property such as birth date. Would you
consider publishing these diffs under a CC0 license so they could be
provided to Wikidata for consideration and to enrich the source itself?
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE
-> a WMF grant to build the application to assist in this. I don't
think we need to republish the dump. The app will hopefully carry the
references too, and then it should be easy to import/sync. FlexiFusion
was built with syncing Wikipedia and Wikidata in mind.
Getting traction here is the problem. How would you approach it? Can
you mentor the project?
Cheers,
Sebastian
Cheers,
Denny
On Mon, Jun 3, 2019 at 10:48 PM Sebastian Hellmann
<[email protected]> wrote:
Hi Denny,
On 03.06.19 23:31, Denny Vrandečić wrote:
Thank you a lot for the answer, that is super useful.
I'll see if I can get the canonicalized version recreated :)
One question though: is there a cleaned version of the DBpedia
ontology mapping-based data? I only found the uncleaned version.
Two scripts are still missing: the type consistency check and
redirect resolution. We will run them once the Unicode bug is fixed.
Do you have any plans for when the next release of DBpedia is going
to be available?
These are signed with the public key from
http://webid.dbpedia.org/webid.ttl, so they are the next releases.
The structure will stay the same, and each release should be a bit
better than the previous one. We are just working on the handful
of issues and on a better way to comment on mistakes before we
announce them on all channels.
It is an open platform now. We will have core dataset releases
(including raw data), and then the community can create their own
additions. https://propi.github.io/webid.ttl will add the LHD
dataset, Heiko Paulheim will add DBkWik, etc.
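Schematically, such a webid.ttl is a small Turtle profile carrying
the publisher's public key, based on the standard WebID cert
vocabulary (the values below are placeholders, not a real key):

    @prefix cert: <http://www.w3.org/ns/auth/cert#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    <#this> a foaf:Person ;
        foaf:name "Jane Publisher" ;                    # placeholder
        cert:key [ a cert:RSAPublicKey ;
                   cert:modulus "ABCD1234"^^xsd:hexBinary ;  # placeholder
                   cert:exponent 65537 ] .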
If you do any analysis, you can get an account and publish the
data on the bus. Links like
https://databus.dbpedia.org/denny/analysis are stable redirects to
files, just like purl.org.
-- Sebastian
On Mon, Jun 3, 2019 at 2:19 PM Sebastian Hellmann
<[email protected]> wrote:
Hi Denny,
you didn't really find them, because they are not yet
publicly released. Please consider them a beta.
The main reason is that there are a handful of missing
features and a handful of stupid bugs.
One example:
- we discovered a Unicode issue in URIs which still allows
valid analysis, but would not allow loading the data into
dbpedia.org/sparql
- we built the Databus to have a group changelog and a
dataset/artifact changelog; however, these can only be
changed at release time, so we cannot update reported errors,
like the one above, after a release is published.
It is not hard, and marvin has already done new extractions:
https://databus.dbpedia.org/marvin - there is just a bit
missing.
i.e. files such as
http://downloads.dbpedia.org/2016-10/core-i18n/de/mappingbased_objects_wkd_uris_de.ttl.bz2
- can you point me to where I can find the canonicalized
versions in the new files?
These are discontinued. Instead there is:
https://databus.dbpedia.org/dbpedia/id-management/global-ids
loaded into this webservice:
https://global.dbpedia.org/same-thing/lookup/?uri=http://www.wikidata.org/entity/Q8087
where you can resolve many URIs against clusters.
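Programmatically, a lookup is a single GET request. A small Python
sketch (the JSON field names "global" and "locals" are an assumption
about the response layout and may differ):

    import json
    import urllib.parse
    import urllib.request

    uri = "http://www.wikidata.org/entity/Q8087"
    url = ("https://global.dbpedia.org/same-thing/lookup/?uri="
           + urllib.parse.quote(uri, safe=""))
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    print(data.get("global"))              # cluster ID (assumed field)
    for member in data.get("locals", []):  # member URIs (assumed field)
        print("  ", member)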
and the fused and enriched versions as described in
https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf
FlexiFusion is more systematic and can rewrite any dataset's
subjects with any other subject from the ID management, so we
could produce these datasets in any variant.
Thanks for these pointers! I have run a few analyses, and
now can rerun them again with the actual current data :) I
expect this to improve DBpedia numbers by quite a bit.
You could also try the fused version:
https://databus.dbpedia.org/dbpedia/fusion - this is the one we
are working on most, and it will aggregate a lot more data in
the future.
I find it all a bit hard to navigate (although Databus has a
few really neat features, thanks for that).
Any feedback is welcome; the issue tracker is linked at the top
of the website.
Yes, another missing feature. However, we thought that the
pros would just look at the dataid files and then write SPARQL
queries at https://databus.dbpedia.org/yasgui/
-- Sebastian
On 03.06.19 19:49, Denny Vrandečić wrote:
Oh, wow, thanks Sebastian, thanks Kingsley for the answers!
I was entirely unaware of the DBpedia datasets over at
databus.dbpedia.org <http://databus.dbpedia.org> - when I
search for "dbpedia downloads" that's not where I get to.
Also, when I go to dbpedia.org <http://dbpedia.org> and then
click on "Downloads", I get to the 2016 datasets.
https://wiki.dbpedia.org/Datasets
https://wiki.dbpedia.org/develop/datasets
I honestly thought that the 2016 dataset was the latest one,
and I was rather disappointed. Thank you for showing me that I
was just looking in the wrong place - but I would really
suggest that you update your websites to point to the Databus.
I am sure I am not the only one who believes that there has
been no DBpedia update since 2016.
Thanks for these pointers! I have run a few analyses, and
now can rerun them again with the actual current data :) I
expect this to improve DBpedia numbers by quite a bit.
One question: I used to use the canonicalized versions from
https://wiki.dbpedia.org/downloads-2016-10, i.e. files such as
http://downloads.dbpedia.org/2016-10/core-i18n/de/mappingbased_objects_wkd_uris_de.ttl.bz2
- can you point me to where I can find the canonicalized
versions in the new files? I find it all a bit hard to
navigate (although Databus has a few really neat features,
thanks for that).
Cheers,
Denny
On Sat, Jun 1, 2019 at 9:43 AM Kingsley Idehen
<[email protected]> wrote:
On 6/1/19 5:45 AM, Sebastian Hellmann wrote:
Hi Denny,
* the old system was like this:
we load from here:
http://downloads.dbpedia.org/2016-10/core/
metadata is in
http://downloads.dbpedia.org/2016-10/core/2016-10_dataid_core.ttl
with void:sparqlEndpoint <http://dbpedia.org/sparql> ;
Hi Sebastian,
I will also have the TTL referenced above loaded into a
named graph so that it becomes accessible from the query
solution I shared in my prior post.
* the new system is here:
https://databus.dbpedia.org/dbpedia
There are 6 new releases, and the metadata is in the
endpoint https://databus.dbpedia.org/repo/sparql
Once the collection-saving feature is finished, we
will build a collection of datasets on the bus, which
will then be loaded. It is basically a SPARQL query
retrieving the download URLs, like this:
http://dev.dbpedia.org/Data#example-application-virtuoso-docker
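In that spirit, such a query against
https://databus.dbpedia.org/repo/sparql could look roughly like this
(the artifact URI and the exact DataId property names are
assumptions; check the dataid files for the current schema):

    PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
    PREFIX dcat:   <http://www.w3.org/ns/dcat#>

    # download URLs of all distributions of one (hypothetical) artifact
    SELECT ?file WHERE {
      ?dataset dataid:artifact
          <https://databus.dbpedia.org/dbpedia/fusion/fusion> .
      ?dataset dcat:distribution ?dist .
      ?dist dcat:downloadURL ?file .
    }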
Okay.
Please install the Faceted Browser so that URIs like
http://dev.dbpedia.org/Data#example-application-virtuoso-docker
can also be looked up.
As an aside, here's an Entity Type overview query
results page
<https://databus.dbpedia.org/repo/sparql?default-graph-uri=&query=SELECT+%28SAMPLE%28%3Fs%29+AS+%3Fsample%29+%28COUNT%281%29+AS+%3Fcount%29++%28%3Fo+AS+%3FentityType%29%0D%0AWHERE+%7B%0D%0A++++++++%3Fs+a+%3Fo.+%0D%0A%09%09FILTER+%28isIRI%28%3Fs%29%29+%0D%0A++++++++++++++++FILTER+%28%21+contains%28str%28%3Fs%29%2C%22virt%22%29%29%0D%0A++++++%7D+%0D%0AGROUP+BY+%3Fo%0D%0AORDER+BY+DESC+%28%3Fcount%29&format=text%2Fhtml&timeout=0&debug=on>
for future use, etc.
Kingsley
On 31.05.19 21:59, Denny Vrandečić wrote:
Thank you for the answer!
I don't see how the query solution page that you
linked indicates that this is the English Wikipedia
extraction. Where does it say that? How can I tell? I
am trying to understand, thanks.
Also, when I download the set of English extractions
from here,
http://downloads.dbpedia.org/2016-10/core-i18n/en/
particularly this one,
http://downloads.dbpedia.org/2016-10/core-i18n/en/mappingbased_objects_en.ttl.bz2
it contains only about 17,467 people with parents, not
20,120, so that dataset seems to be out of sync with the one
in the SPARQL endpoint.
I am curious: where do you load the dataset from?
Thank you!
On Fri, May 31, 2019 at 11:49 AM Kingsley Idehen
<[email protected]> wrote:
On 5/31/19 2:23 PM, Denny Vrandečić wrote:
When I query the dbpedia.org/sparql endpoint asking
"how many people with a parent do you know?", i.e.
select (count(distinct ?s) as ?c) where { ?s dbo:parent ?o },
I get 20,120 as the answer.
Where among the downloads on
wiki.dbpedia.org/downloads-2016-10 can I
find the dataset that the SPARQL endpoint
actually serves? Is it the English
Wikipedia-based "Mappingbased" one? Or is it the
"Infobox Properties Mapped" one?
Cheers,
Denny
The query solution page
<http://dbpedia.org/sparql?default-graph-uri=&query=prefix+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E+%0D%0A%0D%0Aselect+%3Fg+%28count+%28distinct+%3Fs%29+as+%3Fc%29%0D%0Awhere+%7B+%0D%0A+++++++%0D%0A+++++++++graph+%3Fg+%7B%3Fs+dbo%3Aparent+%3Fo.%7D%0D%0A%0D%0A+++++%7D%0D%0Agroup+by+%3Fg&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on&run=+Run+Query+>
indicates this is the English Wikipedia dataset.
That's what we've always loaded into the Virtuoso
instance from which DBpedia Linked Data and its
associated SPARQL endpoint are deployed.
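For reference, the query behind that link, decoded from the URL, is:

    prefix dbo: <http://dbpedia.org/ontology/>

    select ?g (count (distinct ?s) as ?c)
    where {
        graph ?g { ?s dbo:parent ?o . }
    }
    group by ?g

It counts dbo:parent subjects per named graph, which is how the page
shows which loaded dataset contributes the 20,120.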
--
All the best,
Sebastian Hellmann
Director of Knowledge Integration and Linked Data Technologies (KILT)
Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org,
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
_______________________________________________
DBpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion