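7 - Similarity Measures and Performance Evaluation
Text-based retrieval relies on exact text matching, whereas content-based image retrieval works by computing the similarity between the visual features of the query image and those of each candidate image.
Similarity Measures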
Suppose we have four star objects, as shown in the figure below. Which of them are similar? Which of them are different?
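Assignment: build a retrieval system
Use at least two of the three feature types (color, texture, shape); you are not limited to the features taught in class. Work in groups of at most three.
Dataset
About 500 images per folder
airplane
cheetah
tiger
grass
Query image
Deliverables
Source code and a program that can be run directly
Report: the features used for retrieval, the fusion method, the similarity measure, retrieval performance, analysis of the results, and your reflections on building the retrieval system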
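The Chebyshev distance between two vectors p and q is computed as:
$$D_{\mathrm{Chebyshev}}(p, q) = \max_i |p_i - q_i|$$
This equals the limit of the Lp metrics (written here with exponent k):
$$\lim_{k \to \infty} \left( \sum_{i=1}^{n} |p_i - q_i|^k \right)^{1/k}$$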
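The limit can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names `chebyshev` and `minkowski` and the two sample vectors are illustrative assumptions.

```python
def chebyshev(p, q):
    """Largest absolute coordinate difference between p and q."""
    return max(abs(pi - qi) for pi, qi in zip(p, q))


def minkowski(p, q, k):
    """L_k distance: (sum_i |p_i - q_i|^k)^(1/k)."""
    return sum(abs(pi - qi) ** k for pi, qi in zip(p, q)) ** (1.0 / k)


# Illustrative vectors (assumed for this sketch, not taken from the slides).
p = [1.0, 5.0, 2.0]
q = [4.0, 1.0, 2.5]

print(chebyshev(p, q))  # the largest coordinate difference, |5 - 1|
for k in (1, 2, 8, 32, 128):
    # The L_k distance shrinks toward the Chebyshev distance as k grows.
    print(k, minkowski(p, q, k))
```

As k increases, the largest coordinate difference dominates the sum, which is why the L_k distance approaches the maximum.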
5. Chebyshev distance (2)
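Manhattan distance between SU 1 and SU 2:

Species    1   2   3   4   5   6   7   8   9  10  TOTAL
SU 1      10   0   0  35  10   0   0   0  30   2     87
SU 2       0  15   0   5  50   0   3   0  10   0     83
|X1-X2|   10  15   0  30  40   0   3   0  20   2    120

$$D_{M\,1,2} = \sum_{i=1}^{s} |X_{i1} - X_{i2}| = 120$$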
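The Manhattan distance example for SU 1 and SU 2 can be reproduced in a few lines. This is a minimal sketch: the abundance values are those of the example, while the names `su1`, `su2`, and `manhattan` are assumptions made here.

```python
# Abundances of Species 1..10 in the two sampling units of the example.
su1 = [10, 0, 0, 35, 10, 0, 0, 0, 30, 2]
su2 = [0, 15, 0, 5, 50, 0, 3, 0, 10, 0]


def manhattan(x, y):
    """Sum of absolute differences over all species."""
    return sum(abs(a - b) for a, b in zip(x, y))


print(sum(su1), sum(su2))   # row totals: 87 83
print(manhattan(su1, su2))  # 120
```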
3. Euclidean Distance (1)

Species       1    2   3    4     5   6   7   8    9  10  TOTAL
SU 1         10    0   0   35    10   0   0   0   30   2
SU 2          0   15   0    5    50   0   3   0   10   0
(X1-X2)^2   100  225   0  900  1600   0   9   0  400   4   3238

$$D_{E\,1,2} = \sqrt{\sum_{i=1}^{s} (X_{i1} - X_{i2})^2} = \sqrt{3238} \approx 56.9$$
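The Euclidean distance for the same two sampling units can be verified as follows. This is a minimal sketch: the data come from the example, and the names `su1`, `su2`, `squared_diffs`, and `euclidean` are assumptions made here.

```python
import math

# Abundances of Species 1..10 in the two sampling units of the example.
su1 = [10, 0, 0, 35, 10, 0, 0, 0, 30, 2]
su2 = [0, 15, 0, 5, 50, 0, 3, 0, 10, 0]

# Squared per-species differences, the middle row of the calculation.
squared_diffs = [(a - b) ** 2 for a, b in zip(su1, su2)]


def euclidean(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


print(sum(squared_diffs))              # 3238
print(round(euclidean(su1, su2), 1))   # 56.9
```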
[Figure: SU 1 and SU 3 plotted in species space, with the legs x and y marked; axis label: Species 1 Abundance]
3. Euclidean distance (3)
The Euclidean distance easily generalizes to an s-dimensional species space:
$$D_{E\,jk} = \sqrt{\sum_{i=1}^{s} (X_{ij} - X_{ik})^2}$$
2. Example calculation of Manhattan Distance (4)
[Table: abundances of Species 1 through Species 10 in SU 1]
The Chebyshev distance between two squares on a chessboard gives the minimum number of moves a king needs to travel between them. This is because a king can move diagonally, so the steps needed to cover the smaller distance along a rank or file are absorbed into the steps covering the larger one. The board below shows the Chebyshev distance of each square from the square f6 (K marks the king on f6).

   a b c d e f g h
8  5 4 3 2 2 2 2 2  8
7  5 4 3 2 1 1 1 2  7
6  5 4 3 2 1 K 1 2  6
5  5 4 3 2 1 1 1 2  5
4  5 4 3 2 2 2 2 2  4
3  5 4 3 3 3 3 3 3  3
2  5 4 4 4 4 4 4 4  2
1  5 5 5 5 5 5 5 5  1
   a b c d e f g h
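The king-move interpretation can be sketched in code by treating each square as a (file, rank) pair and taking the larger of the two coordinate differences. The helper name `king_distance` and the square notation such as "f6" are assumptions made for this illustration.

```python
FILES = "abcdefgh"


def king_distance(a, b):
    """Chebyshev distance between two squares written like 'f6'."""
    file_diff = abs(FILES.index(a[0]) - FILES.index(b[0]))
    rank_diff = abs(int(a[1]) - int(b[1]))
    return max(file_diff, rank_diff)


# Print the distances from f6, with rank 8 on the top row.
for rank in range(8, 0, -1):
    row = [king_distance(f"{f}{rank}", "f6") for f in FILES]
    print(rank, *row)
```

The printed grid has 0 at f6 itself and 5 along the whole a-file and first rank, the squares farthest from f6.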
1. Hamming distance (L0 norm)
Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. It measures the minimum number of substitutions required to change one string into the other, or the number of errors that transformed one string into the other.
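A minimal sketch of this definition follows. The function name `hamming` and the sample strings are illustrative assumptions, not taken from the slides.

```python
def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(a != b for a, b in zip(s, t))


print(hamming("karolin", "kathrin"))  # 3 substitutions: r->t, o->h, l->r
print(hamming("1011101", "1001001"))  # 2 differing bits
```

The length check matters: for inputs of different lengths the Hamming distance is undefined, so the sketch raises an error rather than silently truncating.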
Distance and similarity measures
1. Hamming distance (0-norm)
2. Manhattan (city block) distance (1-norm)
3. Euclidean distance (2-norm)
4. Minkowski distance (p-norm distance)
5. Chebyshev distance (infinity norm)
6. Histogram intersection
7. Chi-square distance
8. Cosine distance (cosine similarity)
9. Correlation coefficient (Pearson's correlation)
10. K-L divergence
11. Quadratic-form distance
12. Mahalanobis distance
13. Earth mover's distance
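Three of the listed measures are often applied to normalized feature histograms in image retrieval. The sketch below is illustrative only: the slides do not give these formulas in this section, so the definitions are common textbook forms assumed here (in particular, the chi-square variant without a factor of 1/2), and the two histograms are made-up data.

```python
import math


def histogram_intersection(h1, h2):
    """Similarity: sum of bin-wise minima (1.0 for identical normalized histograms)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def chi_square(h1, h2):
    """Assumed variant: sum of (a - b)^2 / (a + b), skipping bins empty in both."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def cosine_similarity(p, q):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


# Hypothetical 4-bin normalized histograms (each sums to 1).
h1 = [0.5, 0.25, 0.25, 0.0]
h2 = [0.25, 0.25, 0.0, 0.5]

print(histogram_intersection(h1, h2))
print(chi_square(h1, h2))
print(cosine_similarity(h1, h2))
```

Note that histogram intersection and cosine similarity grow as the histograms become more alike, while the chi-square value is a distance and shrinks.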
$$D_{M\,jk} = \sum_{i=1}^{s} |X_{ij} - X_{ik}|$$
for binary (presence) data:
$$D_{M\,jk} = b + c$$

                 SU 2
                 Present  Absent
SU 1  Present       a        b
      Absent        c        d
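For presence/absence data, the count of mismatches b + c coincides with the Manhattan distance computed on the 0/1 vectors. The sketch below illustrates this with the SU 1 and SU 2 abundances from the worked example, converted to presence data; the conversion rule (abundance > 0 means present) and the convention that b counts species present only in SU 1 are assumptions made here.

```python
# Abundances of Species 1..10 from the worked example.
su1 = [10, 0, 0, 35, 10, 0, 0, 0, 30, 2]
su2 = [0, 15, 0, 5, 50, 0, 3, 0, 10, 0]

# Assumed conversion: a species is "present" if its abundance is positive.
p1 = [int(x > 0) for x in su1]
p2 = [int(x > 0) for x in su2]

# Cells of the 2x2 presence/absence table.
a = sum(1 for u, v in zip(p1, p2) if u and v)          # present in both
b = sum(1 for u, v in zip(p1, p2) if u and not v)      # only in SU 1
c = sum(1 for u, v in zip(p1, p2) if not u and v)      # only in SU 2
d = sum(1 for u, v in zip(p1, p2) if not u and not v)  # absent from both

manhattan_binary = sum(abs(u - v) for u, v in zip(p1, p2))

print(a, b, c, d)               # 3 2 2 3
print(b + c, manhattan_binary)  # 4 4
```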
You may say that star A is similar to star C. Stars A, B, and C have the same size, while stars A, C, and D have the same color. Size and color are examples of features that can be measured.
[Figure: SU 1, SU 2, and SU 3 plotted in species space; axes: Species 1 Abundance and Species 2 Abundance]
3. Pythagorean theorem (2)
[Figure: SU 2 in species space with the legs x and y marked; axis label: Species 2 Abundance]
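Why do we need to measure similarity?
Distinguish one object from another
Cluster and group objects
Analyze the behavior and characteristics of each group
Image retrieval and object classification
Simplify data representation
Discover structure in the data
Types of distance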
Similarity and dissimilarity between two objects can be measured over several feature variables. How the similarity or dissimilarity (distance) is computed for each variable depends on that variable's measurement scale. Once the distance or similarity for each variable has been determined, all feature variables can be aggregated into a single similarity (or dissimilarity) value.
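One possible way to aggregate per-feature distances is a weighted sum after rescaling each feature's distances to a common range. The sketch below is only an illustration under assumptions: the slides do not prescribe an aggregation rule, and the min-max rescaling, the equal weights, the helper names `min_max` and `fuse`, and the distance values are all hypothetical.

```python
def min_max(values):
    """Rescale a list of distances to [0, 1]; constant lists map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse(per_feature, weights):
    """Weighted sum of rescaled distances.

    per_feature maps a feature name to a list of distances, one per candidate.
    """
    n = len(next(iter(per_feature.values())))
    total = [0.0] * n
    for name, dists in per_feature.items():
        for i, v in enumerate(min_max(dists)):
            total[i] += weights[name] * v
    return total


# Hypothetical distances from one query to three candidates, per feature.
per_feature = {
    "color": [0.2, 0.8, 0.5],
    "texture": [10.0, 30.0, 20.0],
}
weights = {"color": 0.5, "texture": 0.5}  # assumed equal weights

fused = fuse(per_feature, weights)
ranking = sorted(range(len(fused)), key=lambda i: fused[i])
print(fused)
print(ranking)  # candidate indices, most similar first
```

Rescaling before summing keeps a feature with a large numeric range (here, texture) from dominating one with a small range (color); other normalizations or learned weights would be equally valid choices.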