Optimizing Fair Approximate Nearest Neighbor Searches using Threaded B+-Trees
Omid Jafari, Preeti Maurya, Khandker Mushfiqul Islam and Parth Nagarkar
Similarity search in high-dimensional spaces is an important primitive operation in many diverse application domains. Locality Sensitive Hashing (LSH) is a popular technique for solving the Approximate Nearest Nearest Neighbor (ANN) problem in high-dimensional spaces. Along with creating fair machine learning models, there is also a need for creating data structures that target different types of fairness. In this paper, we propose a fair variant of the ANN problem that targets Equal opportunity in group fairness in the ANN domain. We formally introduce the notion of fair ANN for Equal opportunity in group fairness. Additionally, we present an efficient disk-based index structure for finding Fair approximate nearest neighbors using Locality Sensitive Hashing (FairLSH ). Moreover, we present an advanced version of FairLSH that uses cost models to further balance the trade-off between I/O cost and processing time. Finally, we experimentally show that FairLSH returns fair results with a very low I/O cost and processing time when compared with the state-of-the-art LSH techniques.
