
Deep Transfer Metric Learning


The gradient of the objective function J with respect to the parameters W^(m) is

$$
\frac{\partial J}{\partial W^{(m)}}
= \frac{2}{N k_1} \sum_{i=1}^{N} \sum_{j=1}^{N} P_{ij}\left( L_{ij}^{(m)} h_i^{(m-1)T} + L_{ji}^{(m)} h_j^{(m-1)T} \right)
- \frac{2\alpha}{N k_2} \sum_{i=1}^{N} \sum_{j=1}^{N} Q_{ij}\left( L_{ij}^{(m)} h_i^{(m-1)T} + L_{ji}^{(m)} h_j^{(m-1)T} \right)
+ 2\beta \left( \frac{1}{N_t} \sum_{i=1}^{N_t} L_{ti}^{(m)} h_{ti}^{(m-1)T} + \frac{1}{N_s} \sum_{i=1}^{N_s} L_{si}^{(m)} h_{si}^{(m-1)T} \right)
+ 2\gamma\, W^{(m)}.
$$
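A minimal numpy sketch of this gradient for one layer, assuming the L terms (their updating equations are given below) and the layer inputs h^(m-1) have already been computed; all names (grad_W, h_prev, ...) are illustrative, not from the paper's code.

```python
import numpy as np

def grad_W(Wm, P, Q, L, L_t, L_s, h_prev, ht_prev, hs_prev,
           alpha, beta, gamma, k1, k2):
    """Gradient of J w.r.t. one layer's weight matrix W^(m).

    L[i][j]   : L_ij^(m) for the labeled pairs (vectors of layer-m size)
    L_t, L_s  : L_ti^(m), L_si^(m) for target/source samples
    h_prev[i] : h_i^(m-1); ht_prev/hs_prev likewise for the two domains
    """
    N = P.shape[0]
    g = 2.0 * gamma * Wm                                    # regularizer term
    for i in range(N):
        for j in range(N):
            pair = np.outer(L[i][j], h_prev[i]) + np.outer(L[j][i], h_prev[j])
            g += (2.0 / (N * k1)) * P[i, j] * pair          # compactness S_c
            g -= (2.0 * alpha / (N * k2)) * Q[i, j] * pair  # separability S_b
    Nt, Ns = len(ht_prev), len(hs_prev)
    g += 2.0 * beta * (
        sum(np.outer(L_t[i], ht_prev[i]) for i in range(Nt)) / Nt
        + sum(np.outer(L_s[i], hs_prev[i]) for i in range(Ns)) / Ns)  # MMD term
    return g
```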
The updating equations for the top layer (m = M) and for all lower layers (1 ≤ m ≤ M − 1) are computed as follows. For the top layer:

$$
L_{ij}^{(M)} = \left( h_i^{(M)} - h_j^{(M)} \right) \odot \varphi'\!\left( z_i^{(M)} \right)
$$
Deep Transfer Metric Learning

Outline:
- Introduction
- Problems
- Metric learning
- Deep learning
- Deep metric learning
- DSTML

Conventional metric learning operates in a linear feature space, whereas deep learning provides explicit nonlinear mapping functions.

A deep network with three layers (M = 2); the numbers of neural nodes from bottom to top layer are set as 500→400→300. The nonlinear activation function is the tanh function.

DTML
Then, W^(m) and b^(m) can be updated by using the gradient descent algorithm as follows until convergence:

$$
W^{(m)} \leftarrow W^{(m)} - \lambda \frac{\partial J}{\partial W^{(m)}}, \qquad
b^{(m)} \leftarrow b^{(m)} - \lambda \frac{\partial J}{\partial b^{(m)}},
$$

where λ is the learning rate.
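A sketch of this update loop, assuming callables grad_J_W(m) and grad_J_b(m) that evaluate the two gradients above (both names are placeholders); a fixed iteration budget stands in for the convergence test.

```python
def gradient_descent(W, b, grad_J_W, grad_J_b, lam=0.2, max_iters=100):
    """Update every layer's W^(m) and b^(m) by plain gradient descent.
    lam is the learning rate lambda (0.2 in the experiments reported here)."""
    for _ in range(max_iters):
        for m in range(len(W)):
            W[m] = W[m] - lam * grad_J_W(m)
            b[m] = b[m] - lam * grad_J_b(m)
    return W, b
```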
DTML, step by step. First compute the layer outputs for each sample:

$$
z_i^{(m)} = W^{(m)} h_i^{(m-1)} + b^{(m)}, \qquad h_i^{(m)} = \varphi\!\left( z_i^{(m)} \right).
$$
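As a sketch, one forward step with the tanh activation used here (layer_step is an illustrative name):

```python
import numpy as np

def layer_step(h_prev, Wm, bm):
    """One DTML layer: z^(m) = W^(m) h^(m-1) + b^(m), h^(m) = tanh(z^(m))."""
    z = Wm @ h_prev + bm
    return z, np.tanh(z)
```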
$$
L_{ji}^{(M)} = \left( h_j^{(M)} - h_i^{(M)} \right) \odot \varphi'\!\left( z_j^{(M)} \right)
$$

For all the other layers (1 ≤ m ≤ M − 1), the terms are back-propagated as

$$
L_{ij}^{(m)} = \left( W^{(m+1)T} L_{ij}^{(m+1)} \right) \odot \varphi'\!\left( z_i^{(m)} \right), \qquad
L_{ji}^{(m)} = \left( W^{(m+1)T} L_{ji}^{(m+1)} \right) \odot \varphi'\!\left( z_j^{(m)} \right).
$$
DSTML exploits discriminative information from the output of all layers as much as possible, not only from the top layer. It is formulated as

$$
\min_{f^{(M)}} J = J^{(M)} + \sum_{m=1}^{M-1} w^{(m)}\, h\!\left( J^{(m)} - \tau^{(m)} \right),
$$

where J^(m) is the same criterion evaluated on the output of the m-th layer, h(x) = max(x, 0) truncates negative losses, w^(m) weights the loss of layer m, and τ^(m) is a threshold.
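A small sketch of this combined loss, assuming the per-layer values J^(m) are already computed:

```python
def dstml_objective(J_layers, w, tau):
    """J = J^(M) + sum_{m=1}^{M-1} w^(m) * h(J^(m) - tau^(m)), h(x) = max(x, 0).

    J_layers : [J^(1), ..., J^(M)]
    w, tau   : per-layer weights and thresholds for layers 1..M-1
    """
    return J_layers[-1] + sum(
        wm * max(Jm - tm, 0.0)
        for wm, Jm, tm in zip(w, J_layers[:-1], tau))
```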
$$
D_{ts}^{(m)}(X_t, X_s) = \left\| \frac{1}{N_t} \sum_{i=1}^{N_t} f^{(m)}(x_{ti}) - \frac{1}{N_s} \sum_{i=1}^{N_s} f^{(m)}(x_{si}) \right\|_2^2
$$
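In code this is just the squared distance between the two mean embeddings (a sketch; F_t and F_s are assumed to hold the layer-m outputs row-wise):

```python
import numpy as np

def mmd(F_t, F_s):
    """D_ts^(m)(X_t, X_s): squared L2 distance between the mean layer-m
    outputs of the target samples (rows of F_t) and source samples (F_s)."""
    diff = F_t.mean(axis=0) - F_s.mean(axis=0)
    return float(diff @ diff)
```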
We formulate DTML as the following optimization problem:

$$
\min_{f^{(M)}} J = S_c^{(M)} - \alpha S_b^{(M)} + \beta D_{ts}^{(M)}(X_t, X_s) + \gamma \sum_{m=1}^{M} \left( \| W^{(m)} \|_F^2 + \| b^{(m)} \|_2^2 \right)
$$
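Assembled in code (a sketch; the four terms are assumed precomputed by routines such as those sketched in this document):

```python
import numpy as np

def dtml_objective(S_c, S_b, D_ts, W, b, alpha, beta, gamma):
    """J = S_c - alpha*S_b + beta*D_ts + gamma * sum_m(||W^(m)||_F^2 + ||b^(m)||^2)."""
    reg = sum(np.linalg.norm(Wm) ** 2 + np.linalg.norm(bm) ** 2
              for Wm, bm in zip(W, b))
    return S_c - alpha * S_b + beta * D_ts + gamma * reg
```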
The nonlinear activation function is the tanh function.

Problems with conventional metric learning:
- It learns feature representations in a linear feature space; the kernel trick can capture a nonlinear relationship, but it yields no explicit mapping (no h, W, b are obtained).
- It assumes the training and test samples are drawn from the same distribution, so it does not address transfer learning.

DTML addresses both limitations; DML is its variant without the transfer term.
The nonlinear mapping function can be explicitly obtained:

$$
f^{(m)}(x) = h^{(m)} = \varphi\!\left( W^{(m)} h^{(m-1)} + b^{(m)} \right) \in \mathbb{R}^{p^{(m)}},
$$

where φ is a nonlinear activation function which operates component-wise. For the first layer, we assume h^(0) = x.
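A minimal sketch of this mapping with φ = tanh (the function name f simply mirrors the notation):

```python
import numpy as np

def f(x, W, b):
    """Explicit nonlinear mapping: h^(0) = x, then
    h^(m) = tanh(W^(m) h^(m-1) + b^(m)) for m = 1..M; returns f^(M)(x)."""
    h = x
    for Wm, bm in zip(W, b):
        h = np.tanh(Wm @ h + bm)
    return h
```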
Experiments: Face Verification and Person Re-Identification.

For person re-identification, samples from both the source domain and the target domain are represented by color and texture histograms.
DTML learns a deep metric network in which intra-class variations are minimized and inter-class variations are maximized, combining transfer learning with deep learning; DSTML extends it by supervising all layers.
For exploiting discriminative information from the output of all layers, each layer's criterion J^(m) takes the same form as the top-layer objective:

$$
J^{(M)} = S_c^{(M)} - \alpha S_b^{(M)} + \beta D_{ts}^{(M)}(X_t, X_s) + \gamma \sum_{m=1}^{M} \left( \| W^{(m)} \|_F^2 + \| b^{(m)} \|_2^2 \right).
$$
DTML is optimized with the stochastic sub-gradient descent method. The gradients of the objective function J with respect to the parameters W^(m) and b^(m) are computed as given above; the remaining updating equations for the MMD term are:
$$
L_{ti}^{(M)} = \left( \frac{1}{N_t} \sum_{j=1}^{N_t} h_{tj}^{(M)} - \frac{1}{N_s} \sum_{j=1}^{N_s} h_{sj}^{(M)} \right) \odot \varphi'\!\left( z_{ti}^{(M)} \right)
$$

$$
L_{si}^{(M)} = \left( \frac{1}{N_s} \sum_{j=1}^{N_s} h_{sj}^{(M)} - \frac{1}{N_t} \sum_{j=1}^{N_t} h_{tj}^{(M)} \right) \odot \varphi'\!\left( z_{si}^{(M)} \right)
$$
For the lower layers (1 ≤ m ≤ M − 1):

$$
L_{ti}^{(m)} = \left( W^{(m+1)T} L_{ti}^{(m+1)} \right) \odot \varphi'\!\left( z_{ti}^{(m)} \right), \qquad
L_{si}^{(m)} = \left( W^{(m+1)T} L_{si}^{(m+1)} \right) \odot \varphi'\!\left( z_{si}^{(m)} \right).
$$
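All four recursions share the same shape, so a single back-propagation helper covers them (a sketch; φ = tanh is assumed, and names are illustrative):

```python
import numpy as np

def tanh_prime(z):
    """phi'(z) for phi = tanh."""
    return 1.0 - np.tanh(z) ** 2

def backprop_L(L_top, zs, W):
    """Propagate a top-layer term L^(M) (any of L_ij, L_ji, L_ti, L_si) down:
    L^(m) = (W^(m+1)^T L^(m+1)) * phi'(z^(m)), where zs holds the z's of the
    sample the term belongs to. Returns [L^(1), ..., L^(M)]."""
    M = len(W)
    Ls = [None] * M
    Ls[M - 1] = L_top
    for m in range(M - 2, -1, -1):          # 0-based: Ls[m] stores L^(m+1)
        Ls[m] = (W[m + 1].T @ Ls[m + 1]) * tanh_prime(zs[m])
    return Ls
```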
DML (deep metric learning on a single domain, i.e., without the transfer term) is formulated as:

$$
\min_{f^{(M)}} J = S_c^{(M)} - \alpha S_b^{(M)} + \gamma \sum_{m=1}^{M} \left( \| W^{(m)} \|_F^2 + \| b^{(m)} \|_2^2 \right)
$$
so that intra-class variations are minimized and inter-class variations are maximized. The intra-class compactness at layer m is defined as

$$
S_c^{(m)} = \frac{1}{N k_1} \sum_{i=1}^{N} \sum_{j=1}^{N} P_{ij}\, d^2_{f^{(m)}}\!\left( x_i, x_j \right)
$$
Parameters: α = 0.1, β = 10, γ = 0.1, w^(1) = 1, τ^(1) = 0, k₁ = 3, k₂ = 10, learning rate λ = 0.2.

Deep network with three layers (M = 2); the numbers of neural nodes from bottom to top layer are set as 200→200→100.
The full DTML objective, restated:

$$
\min_{f^{(M)}} J = S_c^{(M)} - \alpha S_b^{(M)} + \beta D_{ts}^{(M)}(X_t, X_s) + \gamma \sum_{m=1}^{M} \left( \| W^{(m)} \|_F^2 + \| b^{(m)} \|_2^2 \right)
$$
Then compute the gradient.
Face Verification: the target domain is the LFW dataset and the source domain is the WDRef dataset; faces in both are represented by LBP features, and DTML transfers the metric from WDRef to LFW.
Parameters: α = 0.1, β = 10, γ = 0.1, w^(1) = 1, τ^(1) = 0, k₁ = 5, k₂ = 10, learning rate λ = 0.2.
The inter-class separability at layer m is defined as

$$
S_b^{(m)} = \frac{1}{N k_2} \sum_{i=1}^{N} \sum_{j=1}^{N} Q_{ij}\, d^2_{f^{(m)}}\!\left( x_i, x_j \right),
$$

where P_ij = 1 if x_j is one of the k₁ intra-class nearest neighbors of x_i (0 otherwise), and Q_ij = 1 if x_j is one of the k₂ inter-class nearest neighbors of x_i (0 otherwise).
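A vectorized sketch of both quantities; P and Q are assumed to be precomputed 0/1 neighbor matrices as defined above:

```python
import numpy as np

def compactness_separability(F, P, Q, k1, k2):
    """S_c^(m) and S_b^(m) from the layer-m outputs F (one row per sample)."""
    N = F.shape[0]
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)  # all pairwise d^2
    S_c = (P * d2).sum() / (N * k1)
    S_b = (Q * d2).sum() / (N * k2)
    return S_c, S_b
```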
DTML

Given target domain data X_t and source domain data X_s, to reduce the distribution difference we apply the Maximum Mean Discrepancy (MMD) criterion.

For each pair of samples x_i and x_j, their distance metric is
$$
d^2_{f^{(m)}}\!\left( x_i, x_j \right) = \left\| f^{(m)}(x_i) - f^{(m)}(x_j) \right\|_2^2
$$
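A direct sketch of this metric, reusing the tanh network sketched earlier:

```python
import numpy as np

def dist2(x_i, x_j, W, b):
    """d^2_{f^(m)}(x_i, x_j) = ||f^(m)(x_i) - f^(m)(x_j)||_2^2."""
    def f(x):                                # same mapping as sketched above
        for Wm, bm in zip(W, b):
            x = np.tanh(Wm @ x + bm)
        return x
    diff = f(x_i) - f(x_j)
    return float(diff @ diff)
```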
Enforce the marginal fisher analysis criterion on the output of all the training samples at the top layer.