2nd Edition of Current Challenges in Patent Information Retrieval


Finally, after four years of effort, the second edition of the Current Challenges in Patent Information Retrieval book has arrived. I got my hard copy yesterday. What a pleasure to hold! And to read, of course.
Before writing more about it, a big thank you to my co-editors, in particular to Katja Mayer, and to all the contributing authors.

EPO Patent Information Conference


EPOPIC, the Patent Information Conference organised by the European Patent Office, is the premier patent information event held yearly in Europe. It draws just over 300 participants, the vast majority from industry (actually, I only saw three participants from universities, including myself). Nevertheless, it was extremely interesting to see what current systems can already offer their users.

Keystone Conference


This year saw the 2nd International Keystone Conference. Together with Francesco Guerra and Joel Azzopardi, we presented a report on experiments triggered by the discussions during the Marseille meeting in February, namely on the combination of explicit semantic information and probabilistic IR methods. Our paper is available here and the slides I used are here.

Also at the conference, my colleague Serwah Sabetghadam presented part of her PhD thesis, focusing on information retrieval and search in graph-based models. This is work we did together with Prof. Rauber here at the TU Wien. The paper is available here. The abstracts of the two papers are below.


RuSSIR 2016


Last week I was teaching in Saratov, Russia, at the 10th edition of the Russian Summer School on Information Retrieval (RuSSIR). As usual, my favourite topics were part of the set of 5 lectures I gave there: patent search, eHealth, evaluation. The slides were of course updates of ones I had used before, but there was also something completely new: a lecture on credibility in IR, based on the recent survey written with my colleagues at CEA. All materials were made available by the organisers on the website of the school, so you have not only my slides, but also those of all the other lecturers.

Task Based Information Interaction Workshop

Immediately after the main conference in our field, SIGIR, which took place this year in Pisa, Emine Yilmaz and Kal Järvelin organised a workshop on Task Based Information Interaction, to which they kindly invited me. Its aim:

“bringing together researchers with different expertise that are necessary for devising methodologies for evaluating the quality of task based information access systems (ranging from human computer interaction to algorithms for designing task based information access systems) in order to discuss the challenges in the design of task based information access systems.”


Keyword search in Big Linked Data Training School


This week I attended the Keystone Keyword search in Big Linked Data Training School in Santiago de Compostela, Spain.

Excellently organised by our colleagues at the University of Santiago de Compostela, the school focused on a relatively wide range of topics, including Big Data, NLP, Semantic Web, IR, and other related areas. During the sessions the speakers explored a large spectrum of current research, development and innovation related to these areas and to society itself. I spoke about the evaluation procedures we have in IR and considered, together with the attendees, how these may be applied or extended to these other fields. My slides are here, together with those of all the other lectures.

I was particularly impressed by Prof. Asunción Gómez-Pérez’s presentation on the opportunities and challenges of Linked Data. As she put it, we should not focus only on Linked Open Data, but also work on licensing models for the not-so-open data. To some extent the technology solutions are already here. Business models need to be developed in parallel in order to create a positive cycle. Prof. Gómez-Pérez is Vice-Rector for Research, Innovation and Doctoral Studies and Full Professor at Universidad Politécnica de Madrid (UPM).

The Impact of Fixed-Cost Pooling Strategies on Test Collection Bias

In our evaluation campaigns we repeatedly run up against limited resources when generating ground truth. Pooling helps (actually, pooling makes the whole thing possible), but the best way to do pooling is not always obvious. Together with our colleagues at the Queensland University of Technology (Guido Zuccon) and at the Australian e-Health Research Centre (Bevan Koopman) we looked at the best way to generate our pool of documents for the CLEF eHealth 2016 track. Here is our pre-print draft. At the TUW, the paper is co-authored by Aldo Lipani, myself, and Allan Hanbury.
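
For readers unfamiliar with pooling, here is a minimal sketch of classic depth-k pooling under a fixed assessment budget. It is only an illustration of the general idea, not necessarily any of the strategies examined in the paper; the function names and the toy runs are made up for the example.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Classic depth-k pooling: union of the top-k documents of every run."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def fixed_cost_depth_pool(runs: Dict[str, List[str]], budget: int) -> Set[str]:
    """Deepest depth-k pool whose size does not exceed the budget.

    `budget` is the number of documents the assessors can afford to judge
    (a hypothetical parameter for this sketch).
    """
    best: Set[str] = set()
    max_depth = max((len(r) for r in runs.values()), default=0)
    for k in range(1, max_depth + 1):
        pool = depth_k_pool(runs, k)
        if len(pool) > budget:
            break  # pools only grow with k, so deeper ones will not fit either
        best = pool
    return best


# Toy example: two runs (ranked lists of document ids) for one topic.
runs = {
    "run_a": ["d1", "d2", "d3"],
    "run_b": ["d2", "d4", "d5"],
}
pool = fixed_cost_depth_pool(runs, budget=3)
```

With a budget of three judgements, the toy example stops at depth 2, because depth 3 would need five documents to be judged.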

MMSys 2016


This week the conference on Multimedia Systems took place in Klagenfurt, Austria. This was obviously significantly more convenient for me to reach than MMSys 2015.

Of course, I’m here because we have a paper in the dataset track. You can get the PDF of the paper, as well as the slides.

Overall, what is really great about this conference is the wide range of application domains. I mean, where do we not find multimedia systems today? Just looking at the dataset track: we have our domain, with the use-case in tourism, but there were also talks about movies on flights, football games, and a lot of videos, along with observations on what people look at when they watch them (in some cases using eye-tracking, in other cases using cellphone sensors to identify what the camera is actually pointing at).

Overview of the Special Issue on Trust and Veracity of Information in Social Media

Together with Symeon Papadopoulos, Kalina Bontcheva, Eva Jaho, and Carlos Castillo.

From a business and government point of view, there is an increasing need to interpret and act upon information from large-volume media, such as Twitter, Facebook and Web news. However, knowledge gathered from such online sources comes with a major caveat: it cannot always be trusted, nor is it always factual or of high quality. Rumors tend to spread rapidly through social networks, and their veracity is hard to establish in a timely fashion. For instance, during an earthquake in Chile, rumors spread through Twitter that a volcano became active and there was a tsunami warning in Valparaiso [Castillo et al. 2013]. Later, these reports were found to be false…