Data management: virtual information
Surely everyone knows this situation: you enter a search term into Google and are flooded with vast amounts of data on the topic. Whether the information is of high quality, whether it is complete, and how best to classify the data is not always easy to find out. Fariz Darari, a doctoral student at the Faculty of Computer Science of the Free University of Bozen-Bolzano, investigated this question of data quality and data completeness. With his thesis Managing and Consuming Completeness Information for RDF Data Sources, he won this year's prize for the world's best doctoral dissertation in the research field of the Semantic Web. In this field, computer scientists study how to get computers to understand web pages and process them intelligently.
Supervised by Professor Werner Nutt, the work was carried out as part of the project MAGIC (Managing the Completeness of Data), which was funded by the Province. Staff in the Province's IT department had drawn Professor Nutt's attention to the problem of basing decisions on incomplete data. The idea of extending the project to data from web pages arose when Peter Patel-Schneider, a leading figure in the field of the Semantic Web, visited Bozen from the USA and pointed out that Internet data is often incomplete as well. “Fariz attended one of my lectures on the foundations of databases and was very interested in the subject, which is why I took him into a research team as a Master's student,” recalls Werner Nutt. “For his Master's thesis I then suggested the topic of data completeness on the Semantic Web. He eventually pursued it further for his doctorate and explored it in greater depth there.”
The doctorate (PhD) that Fariz completed was a joint project between the University of Bozen-Bolzano and TU Dresden in Germany. Today Fariz works as an assistant professor at Universitas Indonesia in Jakarta, the most renowned university in Indonesia. In this interview with Salto.bz, Fariz Darari talks about the relevance of his research results, his time at unibz, and the research process.
Dear Mr. Darari, you won the Semantic Web Science Association award 2018. Congratulations on that! What exactly does this prize recognize, and what does it mean for you?
The award recognizes the PhD dissertation in the Semantic Web (that is, a branch of Computer Science that deals with how to best present, organize, and integrate data on the Web) from 2017 with the highest originality, significance, and impact. I feel honored and blessed to be awarded the 2018 SWSA Distinguished Dissertation Award. To me, the award also recognizes the invaluable support from people around me during my PhD studies, especially my supervisors, Prof. Werner Nutt and Prof. Sebastian Rudolph, and my colleague, Dr. Simon Razniewski.
You did your PhD at the University of Bozen-Bolzano. Why did you choose it and how would you assess the years spent here?
Before my PhD, I did a joint Master’s programme in Bolzano and Dresden, Germany. The vast learning experience I gained during my Master’s there was the most important reason why I chose Bolzano as a joint university for my PhD studies. To me, the Free University of Bozen-Bolzano provides an ideal environment for both studying and conducting research. The city of Bolzano and its mountainous surroundings are a great bonus!
How did you develop your interest in the field of data management? Can you briefly explain what this field covers?
Data is everywhere, and on the Web, it is growing faster than ever before. IBM reported that an enormous 2.5 quintillion bytes of data are being created every day. Now the question arises as to how we can manage such data. One aspect of data management is data quality, that is, how to distinguish between good data and poor data. Data quality has become increasingly important, in particular due to the growing culture of data-driven decision making. When the quality of data is vague and unreliable, any decisions based on the data become unreliable, too. In other words, to make informed decisions, one also has to be well-informed about the (quality of) data. This situation motivates me to conduct research in the field of (Semantic) Web data quality management.
The title of your winning dissertation is “Managing and Consuming Completeness Information for RDF Data Sources”. What exactly is it about?
If someone googles for the children of Joko Widodo (the current President of Indonesia), the information returned is: Kaesang Pangarep, Gibran Rakabuming Raka, and Kahiyang Ayu. To people who are not familiar with Indonesia, it might be unclear whether the Google query has already returned all the children, or not. Yet to most of the Indonesian people, it is clear that those three are all the children of Joko Widodo. To the best of my knowledge, such knowledge about data completeness is actually a thing that even Google has not addressed yet. My dissertation deals with how to annotate (Semantic) Web data with completeness information, and how to consume such information in a novel way. When data is stated to be complete, any conclusions drawn using that data will become more reliable, since now we are certain that we have considered all required data in making the conclusions. So, in the above case of the children of Joko Widodo, when the data has been stated to be complete, we can now answer questions like “How many children does Joko Widodo have?” in a more reliable way (that is, there are three children, and there cannot be other children of Joko Widodo, as guaranteed by the completeness annotation). Additionally, my dissertation investigates how to efficiently implement data completeness management in practice, and studies how proof-of-concept systems for completeness management can be developed, one of which is COOL-WD (http://cool-wd.inf.unibz.it/).
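The idea of a completeness annotation from the answer above can be illustrated with a small Python sketch. It is a deliberate simplification and not the formalism of the dissertation or the implementation of COOL-WD: the `Triple` type, the predicate names `child` and `position`, the `COMPLETE` set, and the function `count_values` are all hypothetical names chosen for illustration. The data source is modelled as a set of subject-predicate-object triples, and a completeness statement simply declares that all values for a given subject and predicate are present.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A simplified RDF-style statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# Example data from the interview; the predicate names are assumptions.
DATA = {
    Triple("Joko Widodo", "child", "Kaesang Pangarep"),
    Triple("Joko Widodo", "child", "Gibran Rakabuming Raka"),
    Triple("Joko Widodo", "child", "Kahiyang Ayu"),
    Triple("Joko Widodo", "position", "President of Indonesia"),
}

# Hypothetical completeness annotations: (subject, predicate) pairs for
# which the source is declared to contain every existing value.
COMPLETE = {("Joko Widodo", "child")}


def count_values(data, complete, subject, predicate):
    """Count the values stored for (subject, predicate).

    Returns (count, guaranteed). 'guaranteed' is True only if a
    completeness annotation covers the pair, so the count can be read
    as the true number rather than a lower bound.
    """
    values = {
        t.obj for t in data
        if t.subject == subject and t.predicate == predicate
    }
    return len(values), (subject, predicate) in complete


count, guaranteed = count_values(DATA, COMPLETE, "Joko Widodo", "child")
label = "exactly" if guaranteed else "at least"
print(f"Joko Widodo has {label} {count} children according to the data.")
```

Without the annotation, the same count could only be reported as a lower bound ("at least"), which is the difference in reliability described in the answer.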
Did you face significant difficulties during your research process? What was the biggest challenge and how did you overcome it?
One of the biggest challenges during my PhD studies was the work-life balance. Time management was also one of the most important lessons I learned during my PhD. My self-advice back then was to focus, focus, and focus while at work, but don’t forget also to enjoy the (results of the) work, and life.
Can you briefly explain the significant results that you have obtained in your work and how they can be developed further?
The results obtained serve as a small but crucial step toward a Web with not only a huge amount of data, but also a huge amount of quality data. I foresee that more and more decision makers will become more aware of, and more demanding about, the quality of the data needed for their tasks. I would also be interested to see how our data management framework can be applied more widely in real-world applications in various domains such as healthcare, digital libraries, and e-commerce.
Now you work as an assistant professor at the University of Indonesia in Jakarta. What are your plans for the future? Do you plan to come back to Bolzano?
I plan to be a better lecturer, and a better researcher. I hope to connect and collaborate more with researchers around the world (and yes, including researchers in Bolzano). I would love to visit Bolzano again someday. I miss the pizzas and going for a hike in the Dolomites.