Invited speakers

Santosh S. Vempala (Georgia Tech)

Effective Principal Component Analysis

Principal Component Analysis (PCA) is arguably the most widely used technique on high-dimensional or large data sets. This is despite the fact that for standard applications (finding nearest neighbors, clustering, learning, etc.), it is easy to build examples on which PCA *fails*. In this lecture, we discuss some problems where the performance of PCA is provably near-optimal, and no other method is known to have similar guarantees. The problems include: (a) unraveling a mixture of unknown Gaussians and (b) learning a function of an unknown subspace. Along the way, we describe extensions of standard PCA that are noise-resistant, affine-invariant and use higher moments, thus addressing some of the known drawbacks of standard PCA.

Pavel Zezula (Masaryk University in Brno, Czech Republic)

Future Trends in Similarity Searching

Similarity searching has been a research issue for many years, and searching has probably become the most important web application today. As the complexity of data objects grows, it is increasingly difficult to reason about digital objects other than through their similarity. In this article, we first discuss the concepts of similarity and searching in light of future perspectives, and then present a concise history of similarity searching technology. We use this historical knowledge to extrapolate the trends into the future. We analyze the bottlenecks of application development and discuss perspectives of search computing for future applications. We also present a model of search technology and its position in computer clouds for application development. Finally, execution platforms for multi-modal findability and security for outsourced similarity searching are suggested as important research challenges.