DATAI Seminars. Course 2023-2024
Monitoring research and innovation from heterogeneous sources using knowledge graphs
11/22/23 / Vanni Zaravella
Knowledge Graphs are machine-readable representations of information via predicative triples, typically defined by an underlying ontology schema. The recent rise of the Open Science paradigm and advances in Natural Language Processing models have led to the creation of Information Extraction pipelines that can generate large-scale scholarly Knowledge Graphs from scientific publications and patents, enabling advanced 'semantic' services such as fine-grained document classification, retrieval, question answering, and innovation tracking. However, tracking the complex research-industry dynamics of a target technological domain also requires incorporating alternative text sources such as news and micro-blogging posts, from which conventional NLP methods and models typically struggle to extract information accurately and with high recall. In this talk, we present an enhanced information extraction pipeline tailored to the generation of a knowledge graph comprising open-domain entities from micro-blogging posts. It leverages dependency parsing and classifies entity relations in an unsupervised manner through hierarchical clustering over non-contextual word embeddings. We provide a use case that demonstrates the extraction of semantic triples within the domain of Digital Transformation from X/Twitter.
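As a rough illustration of the relation-classification step described in the abstract, the sketch below groups relation verbs by agglomerative (hierarchical) clustering over word vectors and rewrites each triple with a cluster representative. It is not the speaker's pipeline: the triples are assumed to have already been extracted by a dependency parser, the three-dimensional vectors are hand-made toy values standing in for pretrained non-contextual embeddings, and the names `TOY_VECTORS`, `cluster_verbs`, and `label_triples` as well as the average-linkage and threshold choices are all hypothetical.

```python
import math
from itertools import combinations

# Toy, hand-made vectors for relation verbs (illustrative values only).
# A real pipeline would load pretrained non-contextual word embeddings.
TOY_VECTORS = {
    "adopt": (0.9, 0.1, 0.0),
    "deploy": (0.8, 0.2, 0.1),
    "use": (0.85, 0.15, 0.05),
    "acquire": (0.1, 0.9, 0.1),
    "buy": (0.15, 0.85, 0.0),
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def average_linkage(c1, c2, vectors):
    """Mean pairwise cosine distance between the members of two clusters."""
    total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster_verbs(vectors, threshold):
    """Merge the closest pair of clusters until none is closer than threshold."""
    clusters = [frozenset([word]) for word in vectors]
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], vectors),
        )
        if average_linkage(c1, c2, vectors) > threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


def label_triples(triples, clusters):
    """Replace each verb with its cluster representative (alphabetically first)."""
    representative = {word: min(cluster) for cluster in clusters for word in cluster}
    return [(s, representative.get(v, v), o) for s, v, o in triples]


clusters = cluster_verbs(TOY_VECTORS, threshold=0.2)
# Hypothetical triples, assumed to come from an upstream dependency parser.
triples = [("bank", "deploy", "chatbot"), ("startup", "buy", "platform")]
normalized = label_triples(triples, clusters)
```

With these toy vectors the verbs fall into two groups, so "deploy" and "buy" are mapped to the representatives "adopt" and "acquire". The distance threshold controls how coarse the resulting relation types are and would need tuning on real data.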
Resource-Constrained Project Scheduling Problem: A bi-objective approach with time-dependent resource costs
10/25/23 / Laura Anton Sanchez
This talk provides new insights on bi-criteria resource-constrained project scheduling problems. We define a realistic problem in which the two objectives are the makespan and the total cost of resource usage. Resource costs are assumed to be time-dependent, i.e., they depend on when a resource is used. An optimization model is presented, followed by the development of an algorithm aimed at finding the set of Pareto-optimal solutions. The intractability of the optimization models underlying the problem also justifies the development of a metaheuristic for approximating the Pareto front. We design a bi-objective evolutionary algorithm that includes problem-specific knowledge and is based on the Non-dominated Sorting Genetic Algorithm (NSGA-II). The results demonstrate the efficiency of the proposed metaheuristic. In more recent work, six further multi-objective evolutionary algorithms have been implemented to solve this problem, and their performance has been exhaustively compared with that of the NSGA-II-based algorithm. A computational and statistically supported study is conducted, using instances built from those available in the literature and applying a set of performance measures to the solution sets obtained by each methodology.
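To make the two objectives concrete, the sketch below evaluates a few candidate schedules on makespan and time-dependent resource cost and keeps only the non-dominated ones. It is an illustration under simplifying assumptions, not the model or the NSGA-II-based algorithm from the talk: a schedule is a list of hypothetical `(start, duration, units)` activities on a single resource, `price[t]` is an invented per-unit cost for period `t`, and precedence and capacity constraints are not checked.

```python
def makespan(schedule):
    """Completion time of the last activity; schedule holds (start, duration, units)."""
    return max(start + duration for start, duration, _ in schedule)


def resource_cost(schedule, price):
    """Total cost when each resource unit used in period t costs price[t]."""
    return sum(
        units * price[t]
        for start, duration, units in schedule
        for t in range(start, start + duration)
    )


def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (both objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented prices: the resource is expensive in periods 0-1 and cheap afterwards.
price = [5, 5, 1, 1, 1, 1]

# Three hypothetical schedules for two activities needing one unit each.
schedules = {
    "early": [(0, 2, 1), (2, 2, 1)],
    "late": [(2, 2, 1), (4, 2, 1)],
    "split": [(0, 2, 1), (4, 2, 1)],
}

objectives = {
    name: (makespan(s), resource_cost(s, price)) for name, s in schedules.items()
}
front = pareto_front(list(objectives.values()))
```

Here "early" finishes at time 4 with cost 12 and "late" finishes at time 6 with cost 4, so neither dominates the other, while "split" (time 6, cost 12) is dominated by both and drops out of the front. This pairwise filter is quadratic in the number of points, which is fine for a small illustration.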