Special Sessions – SISAP 2021

For SISAP 2021, we call for contributions for the following two special sessions:

Special session papers will supplement the regular research papers and be included in the proceedings of SISAP 2021, which will be published by Springer as a volume in the Lecture Notes in Computer Science (LNCS) series.

Special session submissions may include vision/position papers, which will be evaluated based on the quality of the arguments and ideas proposed in the papers. In order to ensure high quality of the conference papers, all papers submitted to special sessions will be peer-reviewed, including papers solicited by the special session chairs. If a special session has many high-quality submissions, some of the submissions may potentially be moved to regular sessions; likewise, relevant accepted submissions may be moved into special sessions.

Similarity Search in Graph-Structured Data (SISEG)

Organized by Nils Kriege, University of Vienna, Austria

Graphs are a versatile data structure for representing structured objects such as molecules in computational drug discovery [4, 1], proteins and their interaction networks in bioinformatics [6], scene graphs in computer vision, social networks, and knowledge graphs. Analyzing the large amounts of data in these domains often involves defining a similarity measure on graphs (or nodes). The complex structure and heterogeneity of the graphs appearing in real-world applications makes their comparison challenging. Various concepts for this have been proposed including general-purpose and domain-specific approaches. These can be divided into techniques based on graph matching and graph embedding. Graph matching refers to methods finding a mapping between the nodes of two graphs that preserves the adjacency structure. Widely used methods are based on the graph edit distance [5, 3], the maximum common subgraph problem [1], and network alignment [6]. Similarity search regarding measures from graph matching is challenging, since the underlying graph problems are typically NP-hard, and the properties of a metric are not necessarily satisfied but depend on subtle differences in definition. Graph embedding methods map graphs into a vector space, such that similar graphs are mapped to close vectors. This renders standard approaches possible for graph data and significantly simplifies the problem of similarity search. Various general-purpose embedding techniques exist including graph kernels [2] and graph neural networks [7] developed for machine learning with graphs. Moreover, application-specific techniques have a long history, e.g., chemical fingerprints for representing molecules [4]. With the growing amount of available heterogeneous graph-structured data, similarity search becomes increasingly important.

Topics of interest for this special session include, but are not limited to:

Submission

Papers submitted to this special session must follow the regular paper submission and author guidelines of SISAP 2021 (please check out the submission guidelines). Papers will be submitted in PDF format through EasyChair; please be sure to select “Special Session: Similarity Search in Graph-Structured Data” in the appropriate field of the submission form.

References

[1] O. Koch, N. M. Kriege, and L. Humbeck. Chemical similarity and substructure searches. In Encyclopedia of Bioinformatics and Computational Biology - Volume 2, pages 640–649. Elsevier, 2019. doi: 10.1016/ b978-0-12-809633-8.20195-7.

[2] N. M. Kriege, F. D. Johansson, and C. Morris. A survey on graph kernels. Applied Network Science, 5 (1):6, 2020. doi: 10.1007/s41109-019-0195-3.

[3] Z. Qin, Y. Bai, and Y. Sun. Ghashing: Semantic graph hashing for approximate similarity search in graph databases. In R. Gupta, Y. Liu, J. Tang, and B. A. Prakash, editors, KDD ’20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, pages 2062–2072. ACM, 2020. doi: 10.1145/3394486.3403257.

[4] D. Rogers and M. Hahn. Extended-connectivity fingerprints. J. Chem. Inf. Model., 50(5):742–754, May 2010. doi: 10.1021/ci100050t.

[5] A. Sanfeliu and K. S. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, 13(3), 1983.

[6] R. Singh, J. Xu, and B. Berger. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences, 105 (35):12763–12768, 2008. ISSN 0027-8424. doi: 10.1073/pnas.0806627105.

[7] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu. A comprehensive survey on graph neural networks. IEEE Trans. Neural Networks Learn. Syst., 32(1):4–24, 2021. doi: 10.1109/TNNLS.2020. 2978386.


Semantics-Based Search

Organized by Richard Connor, University of St Andrews, Scotland, UK Alan Dearle, University of St Andrews, Scotland, UK, Lia Morra, Politecnico di Torino, Turin, Italy, Lucia Vadicamo, CNR Pisa, Pisa, Italy

Most work in similarity search takes place in a domain far distant from the actual semantics of a space. Given a metric and a set of values, there is a wealth of research in how to efficiently perform search. However almost without exception, the aspect of mapping the results to the “real” closeness semantic the metric is intended to model is ignored. In part, this is often due to the lack of an unbiased ground truth, which is extremely difficult to establish in large collections. Within this context, large scale near-duplicate detection provides a realistic and challenging task on which different techniques can be compared. On one hand, it entails a subtle differentiation between actual near duplicates, and image pairs which are visually similar, but not semantically related. On the other hand, since the number of image pairs grows quadratically with the size of the collection, it requires both effective and computationally efficient search techniques.

MirFlickr1M is a collection of one million images. Collected for research purposes as a benchmark for image tagging, the original selection of images was guided by certain factors, but no checking for similarity was performed at the time. By chance, however, a large number of near-duplicate clusters do occur, along with a small set of identical image clusters. These similar images exist for “natural” reasons. As an example, some are images of the same highly predictable subject and context (for example the moon); some are alterations of others within the collection after cropping, re-hueing etc., some are subsequent shots taken in quick succession from a single camera, etc.

In the MirFlickr Near Duplicate (MFND) dataset, around 10,000 known similar clusters have been identified and checked by the proposers of this Session. There is strong statistical evidence that this is the large majority of all the near-duplicate images within the set. The remainder of the collection therefore contains over 10 11 visually similar, but not semantically related, image pairs. Hence, differentiating actual near-duplicate images from visually similar images provides a subtle and challenging task.

Purpose of the Session

The main purpose of this session is to allow fair and immediate comparisons among different techniques with respect to the well-defined semantic ground truth. Near-duplicate image detection is an important similarity task in its own right. More generally however the provision of a semantic ground truth over a large collection allows comparison of techniques in a domain where scalability is crucial, as judged by the semantic outcome. For example one aspect, usually unmentioned in similarity research, is the importance of false positive results: the number of objects in a collection which are similar to a given query is very unlikely to be uniform, and most research completely fails to address this issue.

We are not proposing a challenge to find the best technique, but only a framework in which different techniques may be meaningfully compared. For example, a very efficient technique with relatively low recall but a low false positive rate may, for some tasks, be more useful than a less scalable one with better recall, or indeed than a faster outcome with more false positives. However any of these three may be most appropriate for a given context.

Research topics that could benefit or be applied to the MFND benchmark include, but are not limited, to:

Submissions providing novel insights on the population characteristics of the MFND collection, or existing similar collections, are also welcome.

Facilities available

The MirFlickr image collection is published and available, along with a set of image representations such as MPEG7 features, edge and texture histograms, GIST and SIFT. The identities of duplicate and two classes of near-duplicate images are available from the proposers. We also have representations of the FC6 layer derived from AlexNet, as well as the last convolutional layers of ResNet152 and VGG16, all of which will be made available for authors who wish to contribute to the session. Java, MatLab and Python code to produce ROC analyses based on distances can also be provided.

Submission

Papers submitted to this special session must follow the regular paper submission and author guidelines of SISAP 2021 (please check out the submission guidelines). Papers will be submitted in PDF format through EasyChair; please be sure to select “Special Session: Semantics-Based Search” in the appropriate field of the submission form.

Important Dates