This year saw the 2nd International Keystone Conference. Together with Francesco Guerra and Joel Azzopardi, we presented a report on experiments triggered by the discussions at the Marseille meeting in February, namely on combining explicit semantic information with probabilistic IR methods. Our paper is available here and the slides I used are here.
Also at the conference, my colleague Serwah Sabetghadam presented part of her PhD thesis, focusing on information retrieval and search in graph-based models. This is work we did together with Prof. Rauber here at the TU Wien. The paper is available here. The abstracts of the two papers are below.
Last week I was teaching in Saratov, Russia, at the 10th edition of the Russian Summer School on Information Retrieval (RuSSIR). As usual, my favourite topics were part of the set of 5 lectures I gave there: patent search, eHealth, and evaluation. Most of the slides were, of course, updates of ones I had used before, but there was also something completely new: a lecture on credibility in IR, based on the recent survey I wrote with my colleagues at CEA. The organisers made all materials available on the school's website, so you can find not only my slides but also those of all the other lecturers.
Immediately after SIGIR, the main conference in our field, which took place this year in Pisa, Emine Yilmaz and Kal Järvelin organised a workshop on Task Based Information Interaction, to which they kindly invited me. Its aim:
“bringing together researchers with different expertise that are necessary for devising methodologies for evaluating the quality of task based information access systems (ranging from human computer interaction to algorithms for designing task based information access systems) in order to discuss the challenges in the design of task based information access systems.”
This week I attended the Keystone Keyword Search in Big Linked Data Training School in Santiago de Compostela, Spain.
Excellently organised by our colleagues at the University of Santiago de Compostela, the school focused on a relatively wide range of topics, including Big Data, NLP, the Semantic Web, IR, and other related areas. During the sessions, the speakers explored a broad spectrum of current research, development, and innovation, related both to various research areas and to society itself. I spoke about the evaluation procedures we have in IR and considered, together with the attendees, how these may be applied or extended to these other fields. My slides are here, together with those of all the other lectures.
I was particularly impressed by Prof. Asunción Gómez-Pérez's presentation on the opportunities and challenges of Linked Data. As she put it, we should not focus only on Linked Open Data, but also work on licensing models for not-so-open data. To some extent, the technological solutions are already here; business models need to be developed in parallel in order to create a positive cycle. Prof. Gómez-Pérez is Vice-Rector for Research, Innovation and Doctoral Studies and Full Professor at Universidad Politécnica de Madrid (UPM).
Repeatedly in our evaluation campaigns we run up against the problem of limited resources when generating ground truth. Pooling helps (in fact, pooling makes the whole thing possible), but the best way to pool is not always obvious. Together with our colleagues at the Queensland University of Technology (Guido Zuccon) and at the Australian e-Health Research Centre (Bevan Koopman), we looked at the best way to generate our pool of documents for the CLEF eHealth 2016 track. Here is our pre-print draft. At the TUW, the paper is co-authored by Aldo Lipani, myself, and Allan Hanbury.
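For readers less familiar with pooling, here is a minimal Python sketch of two ways to build a pool from the submitted runs. The first is classic fixed-depth pooling (the union of every run's top-k documents); the second is a simple round-robin variant that fills a fixed judging budget one rank at a time. Both are generic illustrations, not the strategy we ended up using for CLEF eHealth 2016, and the run names and document IDs are hypothetical.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Fixed-depth pooling: the union of the top-k documents of every run."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def round_robin_pool(runs: Dict[str, List[str]], budget: int) -> List[str]:
    """Budget-limited pooling: walk down the rankings one rank at a time,
    taking each run's document at that rank in turn, until `budget`
    distinct documents have been collected (generic illustration)."""
    pool: List[str] = []  # keeps the order in which documents entered the pool
    seen: Set[str] = set()
    max_depth = max((len(r) for r in runs.values()), default=0)
    for depth in range(max_depth):
        for name in sorted(runs):  # fixed run order for reproducibility
            ranking = runs[name]
            if depth < len(ranking) and ranking[depth] not in seen:
                seen.add(ranking[depth])
                pool.append(ranking[depth])
                if len(pool) >= budget:
                    return pool
    return pool


# Hypothetical runs for a single topic.
runs = {
    "runA": ["d1", "d2", "d3", "d4"],
    "runB": ["d2", "d5", "d1", "d6"],
}
print(sorted(depth_k_pool(runs, 2)))  # ['d1', 'd2', 'd5']
print(round_robin_pool(runs, 4))      # ['d1', 'd2', 'd5', 'd3']
```

The two functions make the trade-off visible: fixed depth gives every run the same treatment but the pool size depends on how much the runs overlap, whereas a budget fixes the assessment cost but makes the effective depth vary across runs.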
This week the 38th edition of the TU Ilmenau Colloquium on patent information (PatInfo) took place in Ilmenau. Together with our partners at Uppdragshuset, I had the honour of talking about our collaboration and about the importance of evaluation in IR. Our slides are available here.
This week the Multimedia Systems conference (MMSys) took place in Klagenfurt, Austria. For me, this was obviously much easier to reach than MMSys 2015.
Of course, I’m here because we have a paper in the dataset track. You can get the PDF of the paper, as well as the slides.
Overall, what is really great about this conference is the wide range of application domains. I mean, where do we not find multimedia systems today? Just look at the dataset track: besides our own domain, with its tourism use case, there were talks about in-flight movies, football games, and many videos, together with observations of what people look at when they watch them (in some cases using eye-tracking, in others using cellphone sensors to identify what the camera is actually pointing at).
Together with Symeon Papadopoulos, Kalina Bontcheva, Eva Jaho, and Carlos Castillo.
From a business and government point of view, there is an increasing need to interpret and act upon information from large-volume media, such as Twitter, Facebook and Web news. However, knowledge gathered from such online sources comes with a major caveat: it cannot always be trusted, nor is it always factual or of high quality. Rumors tend to spread rapidly through social networks, and their veracity is hard to establish in a timely fashion. For instance, during an earthquake in Chile, rumors spread through Twitter that a volcano became active and there was a tsunami warning in Valparaiso [Castillo et al. 2013]. Later, these reports were found to be false…
Together with Aldo Lipani and Allan Hanbury.
Recently, it has been discovered that it is possible to mitigate the Pool Bias of Precision at cut-off (P@n) when used with the fixed-depth pooling strategy, by measuring the effect of the tested run against the pooled runs. In this paper we extend this analysis and test the existing methods on different pooling strategies, simulated on a selection of 12 TREC test collections. We observe how the different methodologies to correct the pool bias behave, and provide guidelines about which pooling strategy should be chosen.
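To see where the pool bias of P@n comes from, consider a minimal Python sketch of how P@n is usually computed: documents absent from the relevance judgments count as non-relevant. A run that did not contribute to the pool is therefore likely to have unjudged documents in its top n, and its score is pushed down. The sketch only illustrates the bias; it does not implement any of the correction methods discussed in the paper, and the qrels and run below are hypothetical.

```python
from typing import Dict, List


def precision_at_n(ranking: List[str], qrels: Dict[str, int], n: int) -> float:
    """P@n: fraction of the top-n documents judged relevant.
    Unjudged documents (absent from qrels) are treated as non-relevant,
    which is exactly what penalises runs that were not pooled."""
    top = ranking[:n]
    relevant = sum(1 for doc in top if qrels.get(doc, 0) > 0)
    return relevant / n  # divide by n even if fewer than n docs were retrieved


def unjudged_at_n(ranking: List[str], qrels: Dict[str, int], n: int) -> float:
    """Fraction of the top-n documents that were never judged."""
    top = ranking[:n]
    return sum(1 for doc in top if doc not in qrels) / n


# Hypothetical judgments built from a pool of other systems' runs.
qrels = {"d1": 1, "d2": 0, "d5": 1}
# A new run that did not contribute to the pool.
new_run = ["d7", "d1", "d8", "d5"]

print(precision_at_n(new_run, qrels, 4))  # 0.5
print(unjudged_at_n(new_run, qrels, 4))   # 0.5
```

Half of this run's top 4 was never assessed, so its P@4 of 0.5 is a lower bound: if d7 or d8 were in fact relevant, the true score would be higher. That gap is what the bias-correction methods try to estimate.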
Today I started preparing some materials for my classes and for a workshop to be held later in April. Since January, after the Dagstuhl seminar, I have been working with Jupyter and Python 3, connecting to my Solr server via HTTP requests. This worked perfectly well, so I started by making some slides directly out of my notebooks.
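For anyone wanting to try a similar notebook setup, here is a minimal sketch of querying Solr over HTTP from Python 3 using only the standard library. It assumes a Solr instance on the default port 8983 and a core with the hypothetical name `mycore`; it targets Solr's standard `/select` handler with `wt=json`. This is an illustration of the general approach, not the exact code I use in my notebooks.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's standard /select request handler (JSON output)."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"


def extract_docs(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pull the list of hit documents out of a parsed Solr JSON response."""
    return response.get("response", {}).get("docs", [])


def search(base_url: str, core: str, query: str, rows: int = 10) -> List[Dict[str, Any]]:
    """Send the query over HTTP and return the matching documents."""
    with urlopen(build_select_url(base_url, core, query, rows)) as resp:
        return extract_docs(json.load(resp))


# Building the request needs no running server:
print(build_select_url("http://localhost:8983", "mycore", "patent search"))
# With a server running (hypothetical core name):
# for doc in search("http://localhost:8983", "mycore", "patent search"):
#     print(doc.get("id"))
```

Splitting URL construction and response parsing out of `search` keeps the network call thin, which makes the pieces easy to inspect cell by cell in a notebook.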