3rd international conference
on similarity search and applications
Istanbul, Turkey - September 18-19, 2010
3rd International Conference on SImilarity Search and APplications (SISAP 2010)
Tomáš Skopal, Charles University in PragueWhere Are You Heading, Metric Access Methods? A Provocative Survey.
In this talk the impact of the metric indexing paradigm on the real-world applications will be discussed. We pose questions whether the priorities and measures of metric access methods established in the past two decades reflect the actual needs of practitioners. In particular, we will ask and investigate the following questions: Are the established performance measures - the number of distance computations and IO cost - still relevant? Isn't the entire metric space model too general when the majority of real-world applications use vector spaces under Lp distances? On the other hand, isn't the metric model too restrictive with respect to the growing community of practitioners using non-metric distances? Isn't the assumption on separation of the model from the retrieval method (i.e., separation of distance function) too artificial and annoying for the practitioners? Are the established similarity queries - range and kNN queries - competitive to other means of content-based retrieval? Have the real-world similarity search engines ever used a general metric access method, or do they use distance-specific indexing? Is there even a real demand for content-based similarity search or will the annotations and keyword search win? We will try to give answers to these questions, investigating relevant literature and search engines. Finally, we try to transform the answers into research challenges and perspectives of the similarity search methods.
Vladimir Pestov, University of OttawaIndexability, Concentration, and VC Theory.
We will discuss links between similarity search in metric spaces, the phenomenon of concentration of measure on high-dimensional structures, and the Vapnik-Chervonenkis theory of statistical learning. On the negative side, we will derive lower bounds on the performance of some well-known indexing schemes such as pivots and metric trees used to index into datasets randomly drawn from high-dimensional distributions. Then we will discuss the curse of dimensionality conjecture for the cell probe model, which is still an open problem. On the positive side, we will outline some approaches to intrinsic dimension, and then show how statistical learning methods can be used to build indexing schemes in datasets drawn from distributions of lower intrinsic dimension. A major example here is the approximate NN search algorithm by Kushilevitz, Ostrovsky and Rabani. We further discuss the fundamental importance of pivots, and present some results on efficient pivot selection using dissipation of measure and/or VC theory.