
Deep Transfer Metric Learning

The gradient of the objective J with respect to $W^{(m)}$ is computed by back-propagation:

$$\frac{\partial J}{\partial W^{(m)}} = \frac{2}{Nk_1}\sum_{i=1}^{N}\sum_{j=1}^{N} P_{ij}\big(L_{ij}^{(m)} h_i^{(m-1)T} + L_{ji}^{(m)} h_j^{(m-1)T}\big) - \frac{2\alpha}{Nk_2}\sum_{i=1}^{N}\sum_{j=1}^{N} Q_{ij}\big(L_{ij}^{(m)} h_i^{(m-1)T} + L_{ji}^{(m)} h_j^{(m-1)T}\big) + \frac{2\beta}{N_t}\sum_{i=1}^{N_t} L_{ti}^{(m)} h_{ti}^{(m-1)T} - \frac{2\beta}{N_s}\sum_{i=1}^{N_s} L_{si}^{(m)} h_{si}^{(m-1)T} + 2\gamma W^{(m)}$$

For the top layer $M$:

$$L_{ij}^{(M)} = \big(h_i^{(M)} - h_j^{(M)}\big) \odot \varphi'\big(z_i^{(M)}\big), \qquad L_{ji}^{(M)} = \big(h_j^{(M)} - h_i^{(M)}\big) \odot \varphi'\big(z_j^{(M)}\big)$$

$$L_{ti}^{(M)} = \Big(\frac{1}{N_t}\sum_{j=1}^{N_t} h_{tj}^{(M)} - \frac{1}{N_s}\sum_{j=1}^{N_s} h_{sj}^{(M)}\Big) \odot \varphi'\big(z_{ti}^{(M)}\big), \qquad L_{si}^{(M)} = \Big(\frac{1}{N_s}\sum_{j=1}^{N_s} h_{sj}^{(M)} - \frac{1}{N_t}\sum_{j=1}^{N_t} h_{tj}^{(M)}\Big) \odot \varphi'\big(z_{si}^{(M)}\big)$$

For the other layers $1 \le m \le M-1$:

$$L_{ij}^{(m)} = \big(W^{(m+1)T} L_{ij}^{(m+1)}\big) \odot \varphi'\big(z_i^{(m)}\big), \qquad L_{ji}^{(m)} = \big(W^{(m+1)T} L_{ji}^{(m+1)}\big) \odot \varphi'\big(z_j^{(m)}\big)$$

where $z_i^{(m)} = W^{(m)} h_i^{(m-1)} + b^{(m)}$.

The Maximum Mean Discrepancy between the target domain and the source domain at layer $m$ is

$$D_{ts}^{(m)}(X_t, X_s) = \Big\| \frac{1}{N_t}\sum_{i=1}^{N_t} f^{(m)}(x_{ti}) - \frac{1}{N_s}\sum_{i=1}^{N_s} f^{(m)}(x_{si}) \Big\|_2^2$$
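As a minimal NumPy sketch of the MMD term above (the function name mmd_squared and the row-per-sample array layout are my own assumptions, not from the slides):

```python
import numpy as np

def mmd_squared(H_t, H_s):
    """Squared MMD between target features H_t (N_t x p) and source features
    H_s (N_s x p): the squared L2 distance between the two feature means."""
    diff = H_t.mean(axis=0) - H_s.mean(axis=0)
    return float(diff @ diff)
```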
We formulate DTML as the following optimization problem:
min J S (M )
f
(M ) c
Experiments: Face Verification and Person Re-Identification. For person re-identification, color and texture histograms are used as features in both the source domain and the target domain.
Deep Transfer Metric Learning
Introduction

Problems

Metric learning: most existing methods learn a linear transformation in the original feature space, so they cannot capture the nonlinear relationship among samples; the kernel trick can model such nonlinearity, but explicit nonlinear mapping functions cannot be obtained.
DML
Enforce the marginal fisher analysis criterion on the output of all the training samples at the top layer:
$$\min_{f^{(M)}} J = S_c^{(M)} - \alpha S_b^{(M)} + \gamma \sum_{m=1}^{M}\big(\|W^{(m)}\|_F^2 + \|b^{(m)}\|_2^2\big)$$
Face Verification

Datasets: LFW (target domain) and WDRef (source domain).
Features: LBP; DTML is applied on top of the LBP features to produce the output representation.
Parameters: α = 0.1, β = 10, γ = 0.1, w^(1) = 1, τ^(1) = 0, k1 = 5, k2 = 10, learning rate 0.2.
Deep network with three layers (M = 2); the numbers of units from bottom to top layer are set to 500→400→300. The nonlinear activation function is the tanh function.
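A minimal sketch of setting up this 500→400→300 tanh network; the random initialization scheme is my own assumption, since the slides do not specify one:

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [500, 400, 300]  # bottom -> top, as stated above

# One (W, b) pair per mapping layer (M = 2); small random weights assumed.
Ws = [rng.normal(scale=0.01, size=(layer_sizes[m + 1], layer_sizes[m]))
      for m in range(len(layer_sizes) - 1)]
bs = [np.zeros(layer_sizes[m + 1]) for m in range(len(layer_sizes) - 1)]
```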
Deep learning
DSTML
To exploit the discriminative information from the outputs of all layers as much as possible, DSTML solves:
$$\min_{f^{(M)}} J = J^{(M)} + \sum_{m=1}^{M-1} w^{(m)}\, h\big(J^{(m)} - \tau^{(m)}\big)$$

where $w^{(m)}$ and $\tau^{(m)}$ are the weight and threshold for the $m$-th layer, and the per-layer objective exploits the discriminative information from the output of that layer:

$$J^{(m)} = S_c^{(m)} - \alpha S_b^{(m)} + \beta\, D_{ts}^{(m)}(X_t, X_s) + \gamma \sum_{m'=1}^{M}\big(\|W^{(m')}\|_F^2 + \|b^{(m')}\|_2^2\big)$$
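A sketch of this layer-weighted objective, assuming h(·) is a hinge-style function max(·, 0) (the slides do not define h, so this is an assumption) and that the per-layer values J^(m) are already computed:

```python
def dstml_objective(J_layers, w, tau):
    """J_layers: [J^(1), ..., J^(M)]; w, tau: weights and thresholds for layers 1..M-1.
    Returns J^(M) + sum_m w^(m) * h(J^(m) - tau^(m)) with h(x) = max(x, 0) assumed."""
    h = lambda x: max(x, 0.0)
    return J_layers[-1] + sum(w[m] * h(J_layers[m] - tau[m])
                              for m in range(len(J_layers) - 1))
```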
Then the gradient is computed, where the top-layer term is

$$J^{(M)} = S_c^{(M)} - \alpha S_b^{(M)} + \beta\, D_{ts}^{(M)}(X_t, X_s) + \gamma \sum_{m=1}^{M}\big(\|W^{(m)}\|_F^2 + \|b^{(m)}\|_2^2\big)$$
We employ the stochastic sub-gradient descent method to obtain the parameters of DTML. The gradients of the objective function J with respect to the parameters $W^{(m)}$ and $b^{(m)}$ are computed by back-propagation (see the gradient expressions above), and the parameters are updated with the following gradient descent algorithm until convergence:

$$W^{(m)} \leftarrow W^{(m)} - \lambda \frac{\partial J}{\partial W^{(m)}}, \qquad b^{(m)} \leftarrow b^{(m)} - \lambda \frac{\partial J}{\partial b^{(m)}}$$

where $\lambda$ is the learning rate.
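A minimal sketch of this update step; grad_Ws and grad_bs stand for the per-layer gradients above and are assumed to be supplied by the caller, and the learning rate 0.2 is taken from the parameter lists in these slides:

```python
def sgd_step(Ws, bs, grad_Ws, grad_bs, lr=0.2):
    """In-place descent step: W^(m) -= lr * dJ/dW^(m), b^(m) -= lr * dJ/db^(m)."""
    for m in range(len(Ws)):
        Ws[m] -= lr * grad_Ws[m]
        bs[m] -= lr * grad_bs[m]
```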
DTML: at the top layer, the intra-class variations are minimized and the inter-class variations are maximized, while transfer learning reduces the distribution difference between the source and target domains.
The MMD and regularization terms contribute

$$\frac{2\beta}{N_t}\sum_{i=1}^{N_t} L_{ti}^{(m)} h_{ti}^{(m-1)T} - \frac{2\beta}{N_s}\sum_{i=1}^{N_s} L_{si}^{(m)} h_{si}^{(m-1)T} + 2\gamma W^{(m)}$$

to $\partial J / \partial W^{(m)}$, where the updating equations for all layers $1 \le m \le M-1$ are computed by back-propagation as given above.
For each pair of samples $x_i$ and $x_j$, their distance metric is

$$d^2_{f^{(m)}}(x_i, x_j) = \big\| f^{(m)}(x_i) - f^{(m)}(x_j) \big\|_2^2$$
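A one-line sketch of this squared distance in the layer-m feature space; f_m stands for the learned mapping and is passed in as a Python callable (an assumption about how the mapping is represented):

```python
import numpy as np

def dist_sq(f_m, x_i, x_j):
    """d^2_{f^(m)}(x_i, x_j) = ||f^(m)(x_i) - f^(m)(x_j)||_2^2."""
    diff = np.asarray(f_m(x_i)) - np.asarray(f_m(x_j))
    return float(diff @ diff)
```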
Enforce the marginal fisher analysis criterion on the output of all the training samples at the top layer
Person Re-Identification

Features: color and texture histograms; DTML is applied on top of them to produce the output representation.
Parameters: α = 0.1, β = 10, γ = 0.1, w^(1) = 1, τ^(1) = 0, k1 = 3, k2 = 10, learning rate 0.2.
Deep network with three layers (M = 2); the numbers of units from bottom to top layer are set to 200→200→100. The nonlinear activation function is the tanh function.
DTML
Given target domain data X t and source domain data X s , to reduce the distribution difference, we apply the Maximum Mean Discrepancy (MMD) criterion
For the $m$-th layer, the output is computed as

$$f^{(m)}(x) = h^{(m)} = \varphi\big(W^{(m)} h^{(m-1)} + b^{(m)}\big) \in \mathbb{R}^{p^{(m)}}$$

where $\varphi$ is a nonlinear activation function which operates component-wise, and for the first layer we assume $h^{(0)} = x$.
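A sketch of this recursive mapping with the tanh activation used in the experiments (the function name forward and the list-of-layers representation are my own):

```python
import numpy as np

def forward(x, Ws, bs):
    """Return [h^(0), h^(1), ..., h^(M)], where h^(0) = x and
    h^(m) = tanh(W^(m) @ h^(m-1) + b^(m))."""
    hs = [np.asarray(x)]
    for W, b in zip(Ws, bs):
        hs.append(np.tanh(W @ hs[-1] + b))
    return hs
```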
Deep learning: learn feature representations h through the parameters W and b.
Deep Metric Learning (DML): conventional DML assumes that the training and test samples are drawn from the same distribution.
Transfer learning: handles the case where the source domain and the target domain differ.
DTML and DSTML combine deep metric learning with transfer learning.
Unlike the kernel trick, the nonlinear mapping function can be explicitly obtained.
Similarly, for the domain terms at layers $1 \le m \le M-1$:

$$L_{ti}^{(m)} = \big(W^{(m+1)T} L_{ti}^{(m+1)}\big) \odot \varphi'\big(z_{ti}^{(m)}\big), \qquad L_{si}^{(m)} = \big(W^{(m+1)T} L_{si}^{(m+1)}\big) \odot \varphi'\big(z_{si}^{(m)}\big)$$

with $L_{ti}^{(M)}$ and $L_{si}^{(M)}$ at the top layer as given above.
DTML

Then, $W^{(m)}$ and $b^{(m)}$ can be updated by using the gradients until convergence; the regularization term $\|W^{(m)}\|_F^2 + \|b^{(m)}\|_2^2$ contributes $2\gamma W^{(m)}$ and $2\gamma b^{(m)}$ to these gradients, respectively.
The intra-class compactness $S_c^{(m)}$ and the inter-class separability $S_b^{(m)}$ are defined so that the intra-class variations are minimized and the inter-class variations are maximized:
$$S_c^{(m)} = \frac{1}{Nk_1}\sum_{i=1}^{N}\sum_{j=1}^{N} P_{ij}\, d^2_{f^{(m)}}(x_i, x_j), \qquad S_b^{(m)} = \frac{1}{Nk_2}\sum_{i=1}^{N}\sum_{j=1}^{N} Q_{ij}\, d^2_{f^{(m)}}(x_i, x_j)$$
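A sketch of these two quantities, assuming the usual marginal fisher analysis construction for P and Q (P_ij = 1 if x_j is among the k1 same-class nearest neighbors of x_i, Q_ij = 1 if x_j is among the k2 different-class nearest neighbors; that construction is not spelled out in these slides):

```python
import numpy as np

def compactness_separability(H, P, Q, k1, k2):
    """H: N x p top-layer features; P, Q: N x N 0/1 neighbor indicator matrices.
    Returns (S_c, S_b) as defined above."""
    N = H.shape[0]
    # Pairwise squared distances d^2(x_i, x_j) in the learned feature space.
    D = np.sum((H[:, None, :] - H[None, :, :]) ** 2, axis=-1)
    S_c = np.sum(P * D) / (N * k1)
    S_b = np.sum(Q * D) / (N * k2)
    return S_c, S_b
```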