Connected Intelligence

Master's internship – Creating Semantic Web data from a rich and structured knowledge base

 

Advisors: Antoine Zimmermann, Pierre Maret, José M. Giménez-García
Send application to: antoine.zimmermann@emse.fr; pierre.maret@univ-st-etienne.fr; jose.gimenez.garcia@univ-st-etienne.fr
Location: Laboratoire Hubert Curien, Saint-Étienne, France
Team: Connected Intelligence

 

Summary

The topic of this master's thesis lies in the domain of knowledge extraction and representation. We are currently working with Carnegie Mellon University (CMU, United States) and the Federal University of São Carlos (UFSCar, Brazil) on the extension of their NELL initiative [1] to the Semantic Web domain. NELL generates a knowledge base by automatically reading and understanding web pages. The aim of the thesis is to contribute to the design and implementation of the translation of this knowledge base into the RDF format (Semantic Web data), following a number of data and meta-data representations [2].

Challenges related to this “knowledge format translation” are numerous. Which data and meta-data from NELL should be considered? How can the meta-data be represented in the RDF format [3]? How should multilingualism [4], versioning, trust, and data provenance be handled? Our team tackles such challenges. The aim of the internship is (1) to understand the environment and the existing proposals, and (2) to design, implement, and compare solutions.
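To make the meta-data question concrete, here is a minimal sketch of one possible representation: a single NELL-style belief serialized as N-Triples, with its confidence score attached through standard RDF reification. The namespace `http://example.org/nell/`, the `confidence` property, and the field names of the input dictionary are assumptions chosen for illustration only; they are not the vocabulary used by NELL or by Nell2RDF [2], and reification is only one of the candidate models to be compared.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration; not the actual NELL vocabulary.
NELL_NS = "http://example.org/nell/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"


def iri(namespace, local_name):
    """Build an N-Triples IRI term, percent-encoding the local name."""
    return "<" + namespace + quote(local_name, safe="/") + ">"


def belief_to_ntriples(belief, belief_id):
    """Return N-Triples lines for one belief and its reified confidence.

    `belief` is assumed to be a dict with the (hypothetical) keys
    "entity", "relation", "value" and "confidence".
    """
    subject = iri(NELL_NS, belief["entity"])
    predicate = iri(NELL_NS, belief["relation"])
    obj = iri(NELL_NS, belief["value"])
    # Resource standing for the statement itself, so meta-data can attach to it.
    statement = iri(NELL_NS, "belief/" + str(belief_id))
    score = str(belief["confidence"])
    confidence = '"' + score + '"^^<' + XSD_NS + "double>"
    return [
        # The belief itself, as a plain triple.
        f"{subject} {predicate} {obj} .",
        # Standard RDF reification of that triple.
        f"{statement} {iri(RDF_NS, 'type')} {iri(RDF_NS, 'Statement')} .",
        f"{statement} {iri(RDF_NS, 'subject')} {subject} .",
        f"{statement} {iri(RDF_NS, 'predicate')} {predicate} .",
        f"{statement} {iri(RDF_NS, 'object')} {obj} .",
        # Meta-data about the belief (hypothetical property name).
        f"{statement} {iri(NELL_NS, 'confidence')} {confidence} .",
    ]


if __name__ == "__main__":
    example = {
        "entity": "pittsburgh",
        "relation": "cityLocatedInState",
        "value": "pennsylvania",
        "confidence": 0.96,
    }
    for line in belief_to_ntriples(example, 1):
        print(line)
```

Reification costs five extra triples per belief here, which is one reason alternative models such as named graphs or contextualized entities [3] are worth comparing on the same data.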

The approach will be generic, while the proof of concept may target a specific domain of knowledge and a subset of the meta-data from the NELL knowledge base. The work should result in the publication of one or several Linked Open Data (LOD) sets generated from NELL, each using a given representation model. The ultimate objective is an online application (and/or an API or Web service) where users could (i) select a domain and some meta-data, (ii) select a representation model for Semantic Web data, and (iii) generate and publish their Semantic Web data sets. If successful, the results will be presented at an international conference and published.

 

Expected results

• Theoretical:
- Ontologies to represent NELL data and meta-data in several RDF representation models.
- Algorithms to transform NELL data and meta-data to the representation models using those ontologies.
• Practical:
- Implementation of the aforementioned algorithms.
- RDF datasets with NELL data and meta-data using different meta-data representation models.

 

Keywords: Linked Open Data, Meta-Data, Semantic Web, NELL, Knowledge extraction, RDF, Java

 

References:

[1] Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. Toward an Architecture for Never-Ending Language Learning. In Maria Fox and David Poole, editors, Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, July 11-15, 2010. AAAI Press, July 2010.

[2] Antoine Zimmermann, Christophe Gravier, Julien Subercaze, and Quentin Cruzille. Nell2RDF: Read the Web, and turn it into RDF. In KNOW@LOD 2013, pages 2-8.

[3] José M. Giménez-García, Antoine Zimmermann, and Pierre Maret. NdFluents: A Multi-Dimensional Contexts Ontology. arXiv:1609.07102.

[4] Maisa Duarte and Pierre Maret. A new instance of NELL: French NELL. Internal report, 2016.