Embedding Earth Mover's Distance in L1 metric space for LSH indexing
Background
" The low-distortion embedding of EMD given in [1] pro-
vides a way to map weighted point sets A and B from the
metric space into the normed space L1 , such that the L1
distance between the resulting embedded vectors is compa-
rable to the EMD distance between A and B themselves.
Working in a normed space is desirable since it allows the
use of fast approximate NN search techniques such as LSH.
The general idea of the embedding is to compute and con-
catenate several weighted histograms of decreasing resolu-
tion for a given point set. " from [2]
The dissimilarity metric (distance) of EMD can be embedded into a simple metric space (L1) with little loss.
Approach
Code sketches
Command-line options
AudioDB requires a method to identify a feature set as statistical (e.g. GMM):
How about:
- audioDB -d foo.db --INSERT -f feature1 --gmm
A database tagged as GMM then follows a different code path, it must:
- not do power normalization on the passed features
- not do L2 normalization on the passed features
- ignore the sequence flag (I think, any ideas?)
- use EMD (exact) or L1-emdedding (approximate)
- enable indexing via L1-embedding for LSH retrieval
Result formats should mirror those for L2 norm spaces
--
MichaelCasey - 17 Sep 2008
- Embedding EMD into L1 Algorithm from [2]: