Re: [DBpedia-discussion] DBpedia Open Text Extraction Challenge - TextExt

John Flynn Mon, 06 Mar 2017 13:29:52 -0800

I applaud this initiative to extract triples from Wikipedia open text. However, 
it would be useful to initiate a parallel challenge/effort to represent a 
limited portion of the current Wikipedia article content directly in a semantic 
representation, eliminating the text altogether. In this approach, the 
Wikipedia information would be originally encoded in semantic form, as opposed 
to using text to represent the information. A small subset of Wikipedia subject 
matter could be used for this experiment. After the limited Wikipedia domain of 
interest was fully represented semantically, tools could be developed to 
translate the semantic representation into human-readable text. It seems that, 
over the long run, creating the original knowledge as a semantic representation 
instead of text would result in a Wikipedia knowledge base that, when queried by 
humans, could automatically perform the necessary translation into text in 
whichever human language the user desired. This concept would also facilitate 
machine-to-machine use of the Wikipedia knowledge base, which is currently 
difficult, if not impossible, due to the textual nature of the information. You 
could also envision tools that would eventually make it easy for authors to 
source the article information directly in semantic representation. The end 
result would be a DBpedia on steroids and the eventual elimination of Wikipedia, 
as the original article text sources would no longer be needed.
 
John Flynn
http://semanticsimulations.com
 
From: Sebastian Hellmann [mailto:[email protected]] 
Sent: Monday, March 06, 2017 5:56 AM
To: DBpedia
Subject: [DBpedia-discussion] DBpedia Open Text Extraction Challenge - TextExt
 
 
DBpedia Open Text Extraction Challenge - TextExt
Website: http://wiki.dbpedia.org/textext
Disclaimer: The call is under constant development; please refer to the news 
section. We also acknowledge the initial engineering effort required: we will be 
lenient on technical requirements for the first submissions, will focus 
evaluation on the extracted triples, and will allow late submissions if they are 
coordinated with us.


Background

DBpedia and Wikidata currently focus primarily on representing factual 
knowledge as contained in Wikipedia infoboxes. A vast amount of information, 
however, is contained in the unstructured Wikipedia article texts. With the 
DBpedia Open Text Extraction Challenge, we aim to spur knowledge extraction 
from Wikipedia article texts in order to dramatically broaden and deepen the 
amount of structured DBpedia/Wikipedia data and provide a platform for 
benchmarking various extraction tools.

Mission

Wikipedia has become the ubiquitous source of knowledge for the world, enabling 
humans to look up definitions, quickly become familiar with new topics, read up 
on background information for news events and much more, even settling 
coffee-house arguments via a quick search on a mobile phone. The mission of 
DBpedia in general is to harvest Wikipedia’s knowledge, refine and structure it, 
and then disseminate it on the web, in a free and open manner, for IT users and 
businesses.

News and next events

Twitter: Follow @dbpedia <https://twitter.com/dbpedia> , Hashtag: #dbpedianlp 
<https://twitter.com/search?f=tweets&q=%23dbpedianlp&src=typd> 
·         LDK <http://ldk2017.org/>  conference joined the challenge (deadlines 
March 19th and April 24th) 
·         SEMANTiCS <http://2017.semantics.cc/>  joined the challenge (deadlines 
June 11th and July 17th) 
·         Feb 20th, 2017: Full example added to this website 
·         March 1st, 2017: Docker image (beta) 
https://github.com/NLP2RDF/DBpediaOpenDBpediaTextExtractionChallenge 
Coming soon:
·         beginning of March: full example within the docker image 
·         beginning of March: DBpedia full article text and tables (currently 
only abstracts) http://downloads.dbpedia.org/2016-10/core-i18n/ 

Methodology

The DBpedia Open Text Extraction Challenge differs significantly from other 
challenges in language technology and other areas in that it is not a one-time 
call, but a continuously growing and expanding challenge focused on sustainably 
advancing the state of the art and systematically transcending boundaries. The 
DBpedia Association and the people behind this challenge are committed to 
providing the necessary infrastructure and driving the challenge for an 
indefinite time, as well as potentially extending the challenge beyond 
Wikipedia.
We provide the extracted and cleaned full text for all Wikipedia articles from 
9 different languages at regular intervals, both for download and as a Docker 
image, in the machine-readable NIF-RDF <http://persistence.uni-leipzig.org/nlp2rdf/>  format 
(example for Barack Obama in English 
<https://github.com/NLP2RDF/DBpediaOpenDBpediaTextExtractionChallenge/blob/master/BO.ttl>). 
Challenge participants are asked to wrap their NLP and extraction engines 
in Docker images and submit them to us. We will run participants’ tools at 
regular intervals in order to extract:
1.      Facts, relations, events, terminology, ontologies as RDF triples 
(Triple track)
2.      Useful NLP annotations such as POS tags, dependencies, and co-reference 
(Annotation track)
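To make the two tracks a little more concrete, here is a minimal sketch of what machine-readable output could look like: one NIF-style entity annotation (Annotation track) and one extracted fact (Triple track), serialized as N-Triples using only the Python standard library. It assumes the NIF 2.0 core vocabulary (nif:Context, nif:Phrase, nif:isString, nif:anchorOf, nif:beginIndex, nif:endIndex, nif:referenceContext), the ITS 2.0 property itsrdf:taIdentRef for entity links, and the `#char=begin,end` offset URI convention. The helper names (`literal`, `triple`, `nif_annotation`), the document IRI and the sample sentence are hypothetical; this is an illustration, not the challenge's official submission format.

```python
from typing import List, Optional

# Namespace IRIs (assumed: NIF 2.0 core, ITS 2.0 RDF, XML Schema, RDF).
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(value: str, datatype: Optional[str] = None) -> str:
    """Serialize a string literal, escaping characters N-Triples forbids raw."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def triple(s: str, p: str, o: str) -> str:
    """One N-Triples statement, terminated by ' .'."""
    return f"{s} {p} {o} ."


def nif_annotation(doc_iri: str, text: str, mention: str,
                   entity_iri: str) -> List[str]:
    """Hypothetical helper: link the first occurrence of `mention` to an entity.

    Offsets are Python string indices (Unicode code points), which we assume
    matches the NIF offset convention.
    """
    begin = text.find(mention)
    if begin < 0:
        raise ValueError(f"mention {mention!r} not found in text")
    end = begin + len(mention)
    context = iri(f"{doc_iri}#char=0,{len(text)}")
    phrase = iri(f"{doc_iri}#char={begin},{end}")
    nonneg = XSD + "nonNegativeInteger"
    return [
        triple(context, iri(RDF_TYPE), iri(NIF + "Context")),
        triple(context, iri(NIF + "isString"), literal(text)),
        triple(phrase, iri(RDF_TYPE), iri(NIF + "Phrase")),
        triple(phrase, iri(NIF + "referenceContext"), context),
        triple(phrase, iri(NIF + "anchorOf"), literal(mention)),
        triple(phrase, iri(NIF + "beginIndex"), literal(str(begin), nonneg)),
        triple(phrase, iri(NIF + "endIndex"), literal(str(end), nonneg)),
        triple(phrase, iri(ITSRDF + "taIdentRef"), iri(entity_iri)),
    ]


if __name__ == "__main__":
    doc = "http://example.org/wiki/Barack_Obama"  # hypothetical document IRI
    text = "Barack Obama was born in Honolulu, Hawaii."
    # Annotation track: entity link for "Honolulu" (characters 25 to 33).
    lines = nif_annotation(doc, text, "Honolulu",
                           "http://dbpedia.org/resource/Honolulu")
    # Triple track: a fact a relation extractor might derive from the sentence.
    lines.append(triple(iri("http://dbpedia.org/resource/Barack_Obama"),
                        iri("http://dbpedia.org/ontology/birthPlace"),
                        iri("http://dbpedia.org/resource/Honolulu")))
    print("\n".join(lines))
```

Emitting plain N-Triples lines keeps the sketch dependency-free; a real extractor would more likely build a graph with an RDF library and serialize it as Turtle, like the BO.ttl example.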
We accept submissions up to 2 months prior to selected conferences (currently 
http://ldk2017.org/ and http://2017.semantics.cc/ ). Participants who fulfil 
the technical requirements and provide a sufficient description will be able to 
present at the conference and be included in the yearly proceedings. For each 
conference, the challenge committee will select a winner among the challenge 
participants, who will receive 1000€. 

Results

Every December, we will publish a summary article and proceedings of 
participants’ submissions at http://ceur-ws.org/ . The first proceedings are 
planned to be published in Dec 2017. We will try to briefly summarize any 
intermediate progress online in this section.

Acknowledgements

We would like to thank the Computer Center of Leipzig University for giving us 
access to their 6TB RAM server Sirius to run all extraction tools.
The project was created with the support of the H2020 EU project HOBBIT 
<https://project-hobbit.eu/>  (GA-688227) and ALIGNED 
<http://aligned-project.eu/>  (GA-644055) as well as the BMWi project Smart 
Data Web <http://smartdataweb.de/>  (GA-01MD15010B).

Challenge Committee

·         Sebastian Hellmann, AKSW, DBpedia Association, KILT Competence 
Center, InfAI, Leipzig
·         Sören Auer, Fraunhofer IAIS, University of Bonn
·         Ricardo Usbeck, AKSW, Simba Competence Center, Leipzig University
·         Dimitris Kontokostas, AKSW, DBpedia Association, KILT Competence 
Center, InfAI, Leipzig
·         Sandro Coelho, AKSW, DBpedia Association, KILT Competence Center, 
InfAI, Leipzig
Contact Email: [email protected]


_______________________________________________
DBpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
