Hello!
My name is Andrey Pechenezhskiy. I have been studying at Perm State
University, Russia, for six years. My research interests lie in the fields
of NLP and web mining for Competitive Intelligence tasks.
I am interested in the table extractor project
<http://wiki.dbpedia.org/ideas/idea/59/the-table-extractor/>, which aims to
extract data hidden in tables, because I have experience in web content
mining and Scala. I have studied the code of the soccer extractor
<https://bitbucket.org/tsiteam/soccer-extractor>, which parses the Wikipedia
template “CarrieraSportivo” and composes an RDF graph. I decided to
continue working with the football domain as a first step because this
domain contains many different tables. I have been researching the
Wikipedia templates that format tables. I have also worked with the
extraction-framework and found some infobox mappings for the table
templates that could be useful.
I think the project will be based on systematizing the results of
hypothesis testing. So, should the final result of the project be a new
table extractor in the extraction-framework, and can the intermediate work
be standalone Scala scripts that do not use the extraction-framework?
I will continue to explore the extraction-framework, Wikipedia tables, and
articles about table extraction. Then I will write a draft of the
proposal and implement an extractor for some table templates in Scala. I
would appreciate it if you could invite me to the DBpedia #gsoc Slack
channel or give me some suggestions.
Thanks in advance!
_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc