Immediately after the main conference in our field, SIGIR, which took place this year in Pisa, Emine Yilmaz and Kal Järvelin organised a workshop on Task Based Information Interaction, to which they kindly invited me. Its aim:
“bringing together researchers with different expertise that are necessary for devising methodologies for evaluating the quality of task based information access systems (ranging from human computer interaction to algorithms for designing task based information access systems) in order to discuss the challenges in the design of task based information access systems.”
This week I attended the Keystone Keyword search in Big Linked Data Training School in Santiago de Compostela, Spain.
Excellently organised by our colleagues at the University of Santiago de Compostela, the school focused on a relatively wide range of topics, including Big Data, NLP, Semantic Web, IR, and other related areas. During the sessions the speakers explored a large spectrum of current exciting research, development and innovation related to various research areas and to society itself. For my part, I spoke about the evaluation procedures we have in IR and considered, together with the attendees, how these may be applied or extended to these other fields. My slides are here, together with those of all the other lecturers.
I was particularly impressed by Prof. Asunción Gómez-Pérez’s presentation, on the opportunities and challenges of Linked Data. As she put it, we need not focus only on Linked Open Data, but also work on licensing models for the not-so-open data. To some extent the technology solutions are already here. Business models need to be developed in parallel in order to create a positive cycle. Prof. Gómez-Pérez is Vice-Rector for Research, Innovation and Doctoral Studies and Full Professor at Universidad Politécnica de Madrid (UPM).
Repeatedly in our evaluation campaigns we run up against the problem of resources when generating ground truth. Pooling helps (actually, pooling makes the whole thing possible), but the best way to do pooling is not always obvious. Together with our colleagues at the Queensland University of Technology (Guido Zuccon) and at the Australian e-Health Research Centre (Bevan Koopman), we looked at the best way to generate our pool of documents for the CLEF eHealth 2016 track. Here is our pre-print draft. At TUW, the paper is co-authored by Aldo Lipani, myself, and Allan Hanbury.
This week the 38th edition of the Colloquium of the TU Ilmenau on patent information (PatInfo) took place in Ilmenau. Together with our partners Uppdragshuset, I had the honour of talking about our collaboration and about the importance of evaluation in IR. Our slides are available here.
This week the conference on Multimedia Systems took place in Klagenfurt, Austria. This was obviously significantly more convenient to reach for me than MMSys 2015.
Of course, I’m here because we have a paper in the dataset track. You can get the PDF of the paper, as well as the slides.
Overall, what's really great about this conference is the wide range of application domains. I mean, where do we not find multimedia systems today? Just looking at the dataset track – we have our domain, with the use-case in tourism, but there were also talks about in-flight movies, football games, and a lot of videos along with observations on what people look at when they watch them (in some cases using eye-tracking, in other cases using cellphone sensors to identify what the camera is actually pointing at).
Together with Symeon Papadopoulos, Kalina Bontcheva, Eva Jaho, and Carlos Castillo.
From a business and government point of view, there is an increasing need to interpret and act upon information from large-volume media, such as Twitter, Facebook and Web news. However, knowledge gathered from such online sources comes with a major caveat—it cannot always be trusted, nor is it always factual or of high quality. Rumors tend to spread rapidly through social networks, and their veracity is hard to establish in a timely fashion. For instance, during an earthquake in Chile, rumors spread through Twitter that a volcano became active and there was a tsunami warning in Valparaiso [Castillo et al. 2013]. Later, these reports were found to be false…
Together with Aldo Lipani and Allan Hanbury.
Recently, it has been discovered that it is possible to mitigate the Pool Bias of Precision at cut-off (P@n) when used with the fixed-depth pooling strategy, by measuring the effect of the tested run against the pooled runs. In this paper we extend this analysis and test the existing methods on different pooling strategies, simulated on a selection of 12 TREC test collections. We observe how the different methodologies to correct the pool bias behave, and provide guidelines about which pooling strategy should be chosen.
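To make the setting concrete, here is a minimal sketch of the fixed-depth (depth-k) pooling strategy the abstract refers to. This is an illustration only, not the paper's code; the run names, document IDs, and depth are invented for the example.

```python
def fixed_depth_pool(runs, k):
    """Build the judgment pool as the union of the top-k documents
    from each submitted run's ranking."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool

# Two toy runs, each ranking a handful of documents for one topic.
runs = {
    "run_a": ["d1", "d2", "d3", "d4", "d5"],
    "run_b": ["d3", "d1", "d6", "d7", "d2"],
}

pool = fixed_depth_pool(runs, k=2)
# → {"d1", "d2", "d3"}
```

Only the pooled documents are judged; anything outside the pool is treated as non-relevant. That assumption is what penalises a run that was not among the pooled runs (the pool bias), and it is the effect the methods compared in the paper try to correct for.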
Today I started preparing some materials for my classes and for a workshop to be held later in April. Since January, after the Dagstuhl seminar, I had worked with Jupyter and Python3, connecting to my Solr server via HTTP requests. This was all perfectly fine, so I started by making some slides directly out of my notebooks.
Last week was the 5th Management Committee meeting of the Keystone COST action, as well as the winter Working Groups meeting. It was a good opportunity to catch up with my colleagues working in related areas (some in IR, but many in databases and semantic representations). Of the 28 member countries (coincidentally also the number of EU member states), I think almost 20 were present, including new members, like Albania and Ukraine.
Last week I attended the Dagstuhl seminar on Reproducibility. I'll let you read the objectives on the main website of the seminar, rather than repeating them here. Instead, I'll simply share my own impressions.