1. Berclaz, J., Fleuret, F., Turetken, E., and Fua, P. Multiple Object Tracking using
K-Shortest Paths Optimization. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2011.
2. Carroll, J. D. and Chang, J.-J. Analysis of individual differences in multidimensional
scaling via an N-way generalization of “Eckart–Young” decomposition. Psychometrika, 35
(3):283–319, Sep 1970.
3. El-Yaniv, R. and Pechyony, D. Transductive rademacher complexity and its applications.
In Learning Theory, volume 4539, pp. 157–171. 2007.
4. Fazel, M., Hindi, H., and Boyd, S. P. A rank minimization heuristic with application
to minimum order system approximation. In Proceedings of the 2001 American Control
Conference. (Cat. No.01CH37148), volume 6, pp. 4734–4739, June 2001.
5. Guo, X., Yao, Q., and Kwok, J. T. Efficient sparse low-rank tensor completion using the
Frank–Wolfe algorithm. In Proceedings of the Thirty-First AAAI Conference on Artificial
Intelligence, February 4-9, 2017, San Francisco, California, USA, pp. 1948–1954, 2017.
6. Hackbusch, W. Tensor Spaces and Numerical Tensor Calculus. Springer Series in
Computational Mathematics. Springer Berlin Heidelberg, 2012. ISBN 9783642280276.
7. Harshman, R. A. Foundations of the PARAFAC procedure: models and conditions for an
explanatory multimodal factor analysis. UCLA Working Papers in Phonetics, 16:1–84,
1970.
8. Hillar, C. J. and Lim, L.-H. Most tensor problems are NP-hard. J. ACM, 60(6), November
2013. ISSN 0004-5411.
9. Hitchcock, F. L. The expression of a tensor or a polyadic as a sum of products. J. Math.
Phys., 6(1):164–189, 1927.
10. Imaizumi, M., Maehara, T., and Hayashi, K. On tensor train rank minimization:
Statistical efficiency and scalable algorithm. In NIPS, pp. 3933–3942, 2017.
11. Karatzoglou, A., Amatriain, X., Baltrunas, L., and Oliver, N. Multiverse recommendation:
N-dimensional tensor factorization for context-aware collaborative filtering. In RecSys, pp.
79–86. ACM, 2010.
12. Kolda, T. G. and Bader, B. W. Tensor Decompositions and Applications. SIAM Review,
51(3):455–500, 2009.
13. Latała, R. Some estimates of norms of random matrices. Proc. Amer. Math. Soc., 133
(5):1273–1282, 2005.
14. Lim, L.-H. and Comon, P. Blind multilinear identification. IEEE Trans. Information
Theory, 60(2):1260–1280, 2014.
15. Liu, J., Musialski, P., Wonka, P., and Ye, J. Tensor completion for estimating missing
values in visual data. In ICCV, pp. 2114–2121, 2009.
16. Mu, C., Huang, B., Wright, J., and Goldfarb, D. Square deal: Lower bounds and
improved relaxations for tensor recovery. In ICML, pp. 73–81, 2014.
17. Oseledets, I. V. Tensor-train decomposition. SIAM J. Sci. Comput., 33(5):2295–2317,
September 2011. ISSN 1064-8275.
18. Rai, P., Hu, C., Harding, M., and Carin, L. Scalable probabilistic tensor factorization for
binary and count data. In IJCAI, pp. 3770–3776. AAAI Press, 2015. ISBN 978-1-57735-738-4.
19. Raskutti, G., Chen, H., and Yuan, M. Convex regularization for high-dimensional
multi-response tensor regression. CoRR, abs/1512.01215v2, 2015.
20. Shamir, O. and Shalev-Shwartz, S. Matrix completion with the trace norm: Learning,
bounding, and transducing. Journal of Machine Learning Research, 15:3401–3423, 2014.
21. Song, Q., Ge, H., Caverlee, J., and Hu, X. Tensor completion algorithms in big data
analytics. CoRR, abs/1711.10105, 2017.
22. Tomioka, R. and Suzuki, T. Convex tensor decomposition via structured schatten norm
regularization. In NIPS, 2013.
23. Wimalawarne, K., Sugiyama, M., and Tomioka, R. Multitask learning meets tensor
factorization: task imputation via convex optimization. In NIPS, 2014.
24. Yang, Y., Feng, Y., and Suykens, J. A. K. A rank-one tensor updating algorithm for
tensor completion. IEEE Signal Processing Letters, 22(10):1633–1637, Oct 2015. ISSN
1070-9908.
25. Yuan, M. and Zhang, C.-H. On tensor completion via nuclear norm minimization.
Foundations of Computational Mathematics, 16(4):1031–1068, Aug 2016.
26. Zheng, V. W., Cao, B., Zheng, Y., Xie, X., and Yang, Q. Collaborative filtering meets
mobile recommendation: A user-centered approach. In AAAI, AAAI’10, pp. 236–241.
AAAI Press, 2010.
A Dual Norms of Reshaped Tensor Nuclear Norms

In this section, we discuss the dual norm of the proposed reshaped tensor nuclear norm. The
dual norm is useful in developing optimization procedures and proving theoretical bounds.
The dual norm of the tensor nuclear norm [24, 25] of a $K$-mode tensor $\mathcal{T} \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ is
given by
$$\|\mathcal{T}\|_{\mathrm{op}} = \max_{\|y_i\|_2 = 1,\ 1 \le i \le K} \langle \mathcal{T},\, y_1 \otimes y_2 \otimes \cdots \otimes y_K \rangle. \tag{6}$$
This definition applies to all the tensor nuclear norms, including the reshaped norms.
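As a numerical companion to (6), the following is a minimal NumPy sketch (the function name is ours, not the paper's) that estimates $\|\mathcal{T}\|_{\mathrm{op}}$ by alternating rank-1 power iteration. Since this maximization is non-convex (and NP-hard in general [8]), the iteration may only reach a local maximum, so it should be read as a heuristic rather than an exact solver.

```python
import numpy as np

def tensor_op_norm(T, iters=200, seed=0):
    """Estimate ||T||_op = max <T, y_1 x ... x y_K> over unit vectors y_i
    by alternating (higher-order) power iteration. A heuristic: it can stop
    at a local maximum, so several random restarts may be needed."""
    rng = np.random.default_rng(seed)
    ys = [rng.standard_normal(n) for n in T.shape]
    ys = [y / np.linalg.norm(y) for y in ys]
    for _ in range(iters):
        for k in range(T.ndim):
            v = T
            # contract every mode except k, highest axis first so indices stay valid
            for j in range(T.ndim - 1, -1, -1):
                if j != k:
                    v = np.tensordot(v, ys[j], axes=(j, 0))
            ys[k] = v / np.linalg.norm(v)
    val = T
    for j in range(T.ndim - 1, -1, -1):
        val = np.tensordot(val, ys[j], axes=(j, 0))
    return float(val)

# Sanity check: for K = 2 this approximates the largest singular value.
A = np.random.default_rng(1).standard_normal((5, 7))
print(tensor_op_norm(A), np.linalg.svd(A, compute_uv=False)[0])
```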
The next lemma provides the dual norm for the reshaped latent tensor nuclear norm.

Lemma 1 The dual norm of the reshaped latent tensor nuclear norm of a tensor $\mathcal{W} \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ for a collection of $G$ reshaping sets $D_L = (D^{(1)}, \ldots, D^{(G)})$ is
$$\|\mathcal{W}\|_{r\,\mathrm{latent}(D_L)*} = \max_{g \in \{1, \ldots, G\}} \|W_{(D^{(g)})}\|_{\mathrm{op}}.$$
Proof Using the standard formulation of the dual norm, we write the dual norm $\|\mathcal{W}\|_{r\,\mathrm{latent}(D_L)*}$ as
$$\|\mathcal{W}\|_{r\,\mathrm{latent}(D_L)*} = \sup_{\mathcal{X}} \sum_{k=1}^{G} \big\langle \mathcal{X}^{(k)}, \mathcal{W} \big\rangle \quad \text{s.t.} \quad \inf_{\mathcal{X}^{(1)} + \cdots + \mathcal{X}^{(G)} = \mathcal{X}} \sum_{k=1}^{G} \big\| X^{(k)}_{(D^{(k)})} \big\|_\star \le 1. \tag{7}$$
The solution to (7) resides on the simplex defined by $\inf_{\mathcal{X}^{(1)} + \cdots + \mathcal{X}^{(G)} = \mathcal{X}} \sum_{k=1}^{G} \|X^{(k)}_{(D^{(k)})}\|_\star \le 1$, and one of the extreme points of this simplex is a solution. Then, we can take any $g \in \{1, \ldots, G\}$ such that $\mathcal{X}^{(g)} = \mathcal{X}$ and $\mathcal{X}^{(k)} = 0$ for all $k \ne g$, and rearrange (7) as
$$\|\mathcal{W}\|_{r\,\mathrm{latent}(D_L)*} = \sup_{g \in \{1, \ldots, G\}} \big\langle X_{(D^{(g)})}, W_{(D^{(g)})} \big\rangle \quad \text{s.t.} \quad \|X_{(D^{(g)})}\|_\star \le 1,$$
which results in the following:
$$\|\mathcal{W}\|_{r\,\mathrm{latent}(D_L)*} = \max_{g \in \{1, \ldots, G\}} \|W_{(D^{(g)})}\|_{\mathrm{op}}.$$
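Lemma 1 suggests a direct way to evaluate the dual norm numerically: reshape $\mathcal{W}$ by each reshaping set and take the largest operator norm. Below is a hedged sketch (reusing the heuristic `tensor_op_norm` from the earlier sketch; `reshape_by_sets` and `latent_dual_norm` are our illustrative names), so the result is a numerical estimate rather than an exact value.

```python
import numpy as np

def reshape_by_sets(W, D):
    """Reshape W by the reshaping sets D = (D_1, ..., D_M): permute the modes
    so each group D_j is contiguous, then flatten each group into one mode."""
    order = [i for Dj in D for i in Dj]
    shape = [int(np.prod([W.shape[i] for i in Dj])) for Dj in D]
    return np.transpose(W, order).reshape(shape)

def latent_dual_norm(W, DL):
    """max_g ||W_(D^(g))||_op from Lemma 1, estimated with the heuristic
    tensor_op_norm above."""
    return max(tensor_op_norm(reshape_by_sets(W, Dg)) for Dg in DL)

# Example: a 4-mode tensor with two candidate reshapings into matrices.
W = np.random.default_rng(0).standard_normal((3, 4, 3, 4))
print(latent_dual_norm(W, [([0, 1], [2, 3]), ([0, 2], [1, 3])]))
```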
B Proofs of theoretical analysis

In this section, we provide proofs of the theoretical analysis in Section 4.
First, we prove the following useful lemmas. These lemmas bound the tensor nuclear
norm and the reshaped tensor nuclear norms with respect to the multilinear rank and the CP rank of a tensor.
Lemma 2 Let $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ be a $K$-mode tensor with a multilinear rank of $(r_1, \ldots, r_K)$. Let $r_{\mathrm{cp}}$ be the CP rank of $\mathcal{X}$. Then
$$\|\mathcal{X}\|_\star = \min\left\{ \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j \,\middle|\, \mathcal{X} = \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j\, u_{1j} \otimes u_{2j} \otimes \cdots \otimes u_{Kj},\ \|u_{kj}\|_2 = 1,\ \gamma_j \ge \gamma_{j+1} > 0 \right\} \le \frac{\prod_{k=1}^{K} r_k}{\max_{k=1,\ldots,K} r_k}\, \gamma_1,$$
where $\gamma_i$ is the $i$th singular value of $\mathcal{X}$.
Proof Let us consider the Tucker decomposition of $\mathcal{X}$:
$$\mathcal{X} = \sum_{j_1=1}^{r_1} \sum_{j_2=1}^{r_2} \cdots \sum_{j_K=1}^{r_K} C_{j_1,\ldots,j_K}\, u^{(1)}_{j_1} \otimes u^{(2)}_{j_2} \otimes \cdots \otimes u^{(K)}_{j_K},$$
where $C \in \mathbb{R}^{r_1 \times \cdots \times r_K}$ is the core tensor and $u^{(j)}_i \in \mathbb{R}^{n_j}$, $\|u^{(j)}_i\|_2 = 1$, $i = 1, \ldots, r_j$, $j = 1, \ldots, K$, are component vectors.
Following Chapter 8 of [6], we can express the above Tucker decomposition as
$$\mathcal{X} = \sum_{j_2=1}^{r_2} \cdots \sum_{j_K=1}^{r_K} \underbrace{\left( \sum_{j_1=1}^{r_1} C_{j_1,\ldots,j_K}\, u^{(1)}_{j_1} \right)}_{\hat{u}^{(1)}[j_2,\ldots,j_K]\, \in\, \mathbb{R}^{n_1}} \otimes\, u^{(2)}_{j_2} \otimes \cdots \otimes u^{(K)}_{j_K}, \tag{8}$$
where we have taken the summation over the multiplications of core tensor elements with
the component vectors of mode 1. It is easy to see that we can instead take the summation over the
component vectors of any other mode.
By considering $\hat{u}^{(1)}[j_2,\ldots,j_K] = \gamma[j_2,\ldots,j_K]\, \frac{\hat{u}^{(1)}[j_2,\ldots,j_K]}{\|\hat{u}^{(1)}[j_2,\ldots,j_K]\|_2}$, where $\gamma[j_2,\ldots,j_K] = \|\hat{u}^{(1)}[j_2,\ldots,j_K]\|_2$, this leads to a CP decomposition with at most
$$\frac{\prod_{k=1}^{K} r_k}{\max_{k=1,\ldots,K} r_k}$$
rank-one terms (summing over the mode with the largest rank removes the largest factor from the count), so $r_{\mathrm{cp}}$ is at most this number. By arranging the $\gamma[j_2,\ldots,j_K]$ in descending order along with the component
vectors $\hat{u}^{(1)}[j_2,\ldots,j_K]$, and renaming them as $\gamma_1 \ge \gamma_2 \ge \ldots$ and $u_{1j}$, respectively, we obtain
$$\|\mathcal{X}\|_\star = \min\left\{ \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j \,\middle|\, \mathcal{X} = \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j\, u_{1j} \otimes u_{2j} \otimes \cdots \otimes u_{Kj},\ \|u_{kj}\|_2 = 1,\ \gamma_j \ge \gamma_{j+1} > 0 \right\},$$
where $u_{kj} \in \{u^{(k)}_1, \ldots, u^{(k)}_{r_k}\}$ are component vectors from (8) for each $k = 2, \ldots, K$.
The final bound is then immediate, since each $\gamma_j \le \gamma_1$ and there are at most $\frac{\prod_{k=1}^{K} r_k}{\max_{k=1,\ldots,K} r_k}$ terms:
$$\|\mathcal{X}\|_\star = \min\left\{ \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j \,\middle|\, \mathcal{X} = \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j\, u_{1j} \otimes u_{2j} \otimes \cdots \otimes u_{Kj},\ \|u_{kj}\|_2 = 1,\ \gamma_j \ge \gamma_{j+1} > 0 \right\} \le \frac{\prod_{k=1}^{K} r_k}{\max_{k=1,\ldots,K} r_k}\, \gamma_1.$$
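The counting step of this proof (absorbing the core along the mode with the largest rank) can be made concrete. The following is a hedged NumPy sketch of that Tucker-to-CP rewriting; `tucker_to_cp` is our illustrative name, and the returned list has at most $\prod_k r_k / \max_k r_k$ rank-one terms, matching the bound above.

```python
import itertools
import numpy as np

def tucker_to_cp(C, U):
    """Rewrite a Tucker decomposition (core C, factors U[k] of shape (n_k, r_k))
    as a weighted sum of rank-1 terms by absorbing the core along the mode
    with the largest rank, as in the proof of Lemma 2."""
    ranks = C.shape
    k_star = int(np.argmax(ranks))                 # mode with the largest rank
    other = [k for k in range(C.ndim) if k != k_star]
    gammas, factors = [], []
    for idx in itertools.product(*[range(ranks[k]) for k in other]):
        sl = [0] * C.ndim
        for k, j in zip(other, idx):
            sl[k] = j
        sl[k_star] = slice(None)                   # core fiber along mode k_star
        u_hat = U[k_star] @ C[tuple(sl)]           # \hat{u} vector; gamma = its norm
        gamma = np.linalg.norm(u_hat)
        if gamma > 0:
            vecs = [U[k][:, j] for k, j in zip(other, idx)]
            vecs.insert(k_star, u_hat / gamma)
            gammas.append(gamma)
            factors.append(vecs)
    return gammas, factors                         # at most prod(r)/max(r) terms
```

Summing `gammas[j]` times the outer product of `factors[j]` recovers the input tensor exactly, and `len(gammas)` equals $\prod_{k \ne k^*} r_k$, which is $\prod_k r_k / \max_k r_k$.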
Lemma 3 Let $\mathcal{X} \in \mathbb{R}^{n \times \cdots \times n}$ be a $K$-mode tensor with a multilinear rank of $(r_1, \ldots, r_K)$. We consider a set of $M$ reshaping sets $D_i$, $i = 1, \ldots, M$. Let $r_{\mathrm{cp}}$ be the CP rank of $\mathcal{X}$. Then
$$\|X_{(D_1,\ldots,D_M)}\|_\star = \min\left\{ \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j \,\middle|\, X_{(D_1,\ldots,D_M)} = \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j\, u_{1j} \otimes u_{2j} \otimes \cdots \otimes u_{Mj},\ \|u_{kj}\|_2 = 1,\ \gamma_j \ge \gamma_{j+1} > 0 \right\} \le \frac{\prod_{k=1}^{K} r_k}{\max_{j=1,\ldots,M} \prod_{i \in D_j} r_i}\, \gamma_1,$$
where $\gamma_i$ is the $i$th singular value of $X_{(D_1,\ldots,D_M)}$.
Proof Let us consider the Tucker decomposition of $\mathcal{X}$:
$$\mathcal{X} = \sum_{j_1=1}^{r_1} \sum_{j_2=1}^{r_2} \cdots \sum_{j_K=1}^{r_K} C_{j_1,\ldots,j_K}\, u^{(1)}_{j_1} \otimes u^{(2)}_{j_2} \otimes \cdots \otimes u^{(K)}_{j_K},$$
where $C \in \mathbb{R}^{r_1 \times \cdots \times r_K}$ is the core tensor, $(r_1, \ldots, r_K)$ is the multilinear rank, and $u^{(j)}_i \in \mathbb{R}^{n}$, $\|u^{(j)}_i\|_2 = 1$, $i = 1, \ldots, r_j$, $j = 1, \ldots, K$, are component vectors. We rearrange the Tucker
decomposition for the reshaped tensor $X_{(D_1,\ldots,D_M)}$ as
$$X_{(D_1,\ldots,D_M)} = \sum_{j_{a'}, j_{b'}, \ldots \in D_2} \cdots \sum_{j_{a''}, j_{b''}, \ldots \in D_M} \underbrace{\left( \sum_{j_a, j_b, \ldots \in D_1} C_{j_1,\ldots,j_K}\, \Pi_{D_1}\big( u^{(a)}_{j_a} \otimes u^{(b)}_{j_b} \otimes \cdots \big) \right)}_{\hat{u}^{1}[D'_2,\ldots,D'_M]\, \in\, \mathbb{R}^{\mathrm{prod}(D_1)}} \otimes\, \Pi_{D_2}\big( u^{(a')}_{j_{a'}} \otimes u^{(b')}_{j_{b'}} \otimes \cdots \big) \otimes \cdots \otimes \Pi_{D_M}\big( u^{(a'')}_{j_{a''}} \otimes u^{(b'')}_{j_{b''}} \otimes \cdots \big),$$
where $\Pi_{D_j}$ vectorizes the outer product of the component vectors whose modes belong to $D_j$, with $a, b, \ldots \in D_1$, $a', b', \ldots \in D_2$, and so on.
Taking $\hat{u}^{1}[D'_2,\ldots,D'_M] = \gamma[D'_2,\ldots,D'_M]\, \frac{\hat{u}^{1}[D'_2,\ldots,D'_M]}{\|\hat{u}^{1}[D'_2,\ldots,D'_M]\|_2}$, where $\gamma[D'_2,\ldots,D'_M] = \|\hat{u}^{1}[D'_2,\ldots,D'_M]\|_2$, we can consider the above summation over any reshaping set, and it is easy
to see that this arrangement gives a CP decomposition with at most
$$\frac{\prod_{k=1}^{K} r_k}{\max_{j=1,\ldots,M} \prod_{i \in D_j} r_i}$$
rank-one terms, so $r_{\mathrm{cp}}$ is at most this number.
By arranging the $\gamma[D'_2,\ldots,D'_M]$ in descending order along with the component vectors
$\hat{u}^{1}[D'_2,\ldots,D'_M]$, and renaming them as $\gamma_1 \ge \gamma_2 \ge \ldots$ and $u_{1j}$, respectively, we obtain
$$\|X_{(D_1,\ldots,D_M)}\|_\star = \min\left\{ \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j \,\middle|\, X_{(D_1,\ldots,D_M)} = \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j\, u_{1j} \otimes u_{2j} \otimes \cdots \otimes u_{Mj},\ \|u_{kj}\|_2 = 1,\ \gamma_j \ge \gamma_{j+1} > 0 \right\},$$
where $u_{kj} \in \big\{ \Pi_{D_k}\big( u^{(a')}_{1} \otimes u^{(b')}_{1} \otimes \cdots \big), \ldots, \Pi_{D_k}\big( u^{(a')}_{r_{a'}} \otimes u^{(b')}_{r_{b'}} \otimes \cdots \big) \big\}$ are components for each
$k = 2, \ldots, M$ and $a', b', \ldots \in D_k$.
The following inequality is trivial:
$$\|X_{(D_1,\ldots,D_M)}\|_\star = \min\left\{ \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j \,\middle|\, X_{(D_1,\ldots,D_M)} = \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j\, u_{1j} \otimes u_{2j} \otimes \cdots \otimes u_{Mj},\ \|u_{kj}\|_2 = 1,\ \gamma_j \ge \gamma_{j+1} > 0 \right\} \le \frac{\prod_{k=1}^{K} r_k}{\max_{j=1,\ldots,M} \prod_{i \in D_j} r_i}\, \gamma_1.$$
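When $M = 2$, the reshaped tensor is a matrix, so both sides of the lemma are computable and the bound can be spot-checked numerically. A small sketch under the assumption of an order-4 tensor with equal mode size $n$ (all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, ranks = 8, (2, 3, 2, 3)
C = rng.standard_normal(ranks)
U = [np.linalg.qr(rng.standard_normal((n, r)))[0] for r in ranks]
X = np.einsum('abcd,ia,jb,kc,ld->ijkl', C, U[0], U[1], U[2], U[3])

D = ([0, 1], [2, 3])                               # reshaping sets D_1, D_2
s = np.linalg.svd(X.reshape(n * n, n * n), compute_uv=False)
nuclear = s.sum()                                  # ||X_(D1,D2)||_*
bound = np.prod(ranks) / max(np.prod([ranks[i] for i in Dj]) for Dj in D) * s[0]
print(nuclear <= bound + 1e-9, nuclear, bound)     # here the bound is 6 * gamma_1
```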
Lemma 4 Let $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ be a $K$-mode tensor with a CP rank of $r_{\mathrm{cp}}$. We
consider a set of $M$ reshaping sets $D_i$, $i = 1, \ldots, M$. Then
$$\|X_{(D_1,\ldots,D_M)}\|_\star \le r_{\mathrm{cp}}\, \gamma_1,$$
where $\gamma_i$ is the $i$th singular value of $X_{(D_1,\ldots,D_M)}$.
Proof Let us consider $\mathcal{X}$ as
$$\mathcal{X} = \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j\, u_{1j} \otimes u_{2j} \otimes \cdots \otimes u_{Kj},$$
with $\|u_{kj}\|_2 = 1$ and $\gamma_j \ge \gamma_{j+1} > 0$. For the reshaping sets $(D_1, \ldots, D_M)$, we rearrange $\mathcal{X}$ as
$$X_{(D_1,\ldots,D_M)} = \sum_{j=1}^{r_{\mathrm{cp}}} \gamma_j\, (\circ_{i_1 \in D_1} u_{i_1 j}) \otimes (\circ_{i_2 \in D_2} u_{i_2 j}) \otimes \cdots \otimes (\circ_{i_M \in D_M} u_{i_M j}),$$
where $a \circ b = [a_1 b^\top, a_2 b^\top, \ldots, a_n b^\top]^\top$ is the Khatri–Rao product [12]. It is easy to verify that
$\mathrm{vec}((a \circ b) \otimes (c \circ d)) = \mathrm{vec}(a \otimes b \otimes c \otimes d)$, which indicates that $\mathrm{vec}(\mathcal{X}) = \mathrm{vec}(X_{(D_1,\ldots,D_M)})$.
Using the fact that $\mathrm{Rank}(a \otimes b) \le \mathrm{Rank}(a)\mathrm{Rank}(b)$ from [12], we have
$$\mathrm{Rank}(X_{(D_1,\ldots,D_M)}) \le \mathrm{Rank}(\mathcal{X}) = r_{\mathrm{cp}}.$$
This leads to the final observation:
$$\|X_{(D_1,\ldots,D_M)}\|_\star \le r_{\mathrm{cp}}\, \gamma_1.$$
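The vectorization identity used here is easy to check numerically. A minimal sketch, with $\circ$ realized as `np.kron` and row-major vectorization (the identity as stated presumably assumes a compatible vectorization order):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c, d = (rng.standard_normal(n) for n in (2, 3, 4, 5))

T = np.einsum('i,j,k,l->ijkl', a, b, c, d)   # a (x) b (x) c (x) d
M = np.outer(np.kron(a, b), np.kron(c, d))   # (a o b) (x) (c o d)

# vec((a o b) (x) (c o d)) == vec(a (x) b (x) c (x) d) under row-major vec
print(np.allclose(T.ravel(), M.ravel()))     # True
```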
In order to prove the Rademacher complexities in Theorems 1 and 2, we use the following
lemma from [19].

Lemma 5 (Raskutti, Chen, and Yuan, 2015) Consider a $K$-mode tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_K}$
with entries drawn from an i.i.d. Gaussian tensor ensemble. Then
$$\mathbb{E}\|\mathcal{X}\|_{\mathrm{op}} \le 4 \log(4K) \sqrt{\sum_{k=1}^{K} n_k}.$$
Given a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ with Gaussian entries, we can write
$$\mathbb{E}\|\mathcal{X}\|_{\mathrm{op}} = \mathbb{E}\Big\| \sum_{i_1,i_2,\ldots,i_K} X_{i_1,i_2,\ldots,i_K}\, e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_K} \Big\|_{\mathrm{op}},$$
where $e_{i_k}$ is the vector with 1 at the $i_k$th element and zeros elsewhere. Due to
each $X_{i_1,i_2,\ldots,i_K}$ being a symmetric Gaussian entry, we have
$$\mathbb{E}\|\mathcal{X}\|_{\mathrm{op}} = \mathbb{E}_g \mathbb{E}_\epsilon \Big\| \sum_{i_1,i_2,\ldots,i_K} \epsilon_{i_1,i_2,\ldots,i_K} |X_{i_1,i_2,\ldots,i_K}|\, e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_K} \Big\|_{\mathrm{op}},$$
where $\epsilon_{i_1,i_2,\ldots,i_K} \in \{-1, 1\}$. Using Jensen's inequality, we have
$$\mathbb{E}_g \mathbb{E}_\epsilon \Big\| \sum_{i_1,i_2,\ldots,i_K} \epsilon_{i_1,i_2,\ldots,i_K} |X_{i_1,i_2,\ldots,i_K}|\, e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_K} \Big\|_{\mathrm{op}} \ge \mathbb{E}_\epsilon \Big\| \sum_{i_1,i_2,\ldots,i_K} \epsilon_{i_1,i_2,\ldots,i_K}\, \mathbb{E}_g|X_{i_1,i_2,\ldots,i_K}|\, e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_K} \Big\|_{\mathrm{op}} = \sqrt{\frac{2}{\pi}}\, \mathbb{E}_\epsilon \Big\| \sum_{i_1,i_2,\ldots,i_K} \epsilon_{i_1,i_2,\ldots,i_K}\, e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_K} \Big\|_{\mathrm{op}}.$$
This shows that we can use Lemma 5 to bound the operator norm of tensors with Bernoulli random variables.
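This comparison can be illustrated by Monte Carlo on small tensors, reusing the heuristic `tensor_op_norm` from the sketch in Appendix A (so the values are approximate); the last printed number is the Lemma 5 bound $4\log(4K)\sqrt{\sum_k n_k}$ for reference.

```python
import numpy as np

rng = np.random.default_rng(2)
shape, trials = (6, 7, 8), 20
gauss = np.mean([tensor_op_norm(rng.standard_normal(shape)) for _ in range(trials)])
rade = np.mean([tensor_op_norm(rng.choice([-1.0, 1.0], size=shape))
                for _ in range(trials)])
K = len(shape)
print(gauss >= np.sqrt(2 / np.pi) * rade,          # the Jensen comparison above
      4 * np.log(4 * K) * np.sqrt(sum(shape)))     # Lemma 5 bound
```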
Next, we give the detailed proof of Theorem 1.

Proof of Theorem 1: We expand the Rademacher complexity in (5) as
$$\mathcal{R}_S(l \circ \mathcal{W}) = \mathbb{E}_\sigma \frac{1}{|S|} \sup_{W \in \mathcal{W}} \sum_{i_1,\ldots,i_K} \Sigma_{i_1,\ldots,i_K}\, l(X_{i_1,\ldots,i_K}, W_{i_1,\ldots,i_K}),$$
where $\Sigma_{i_1,\ldots,i_K} = \sigma_j$ when $(i_1,\ldots,i_K)$ is the $j$th sample in $S$, and $\Sigma_{i_1,\ldots,i_K} = 0$ otherwise.
We analyze the Rademacher complexity as
$$\begin{aligned} \mathcal{R}_S(l \circ \mathcal{W}) &= \mathbb{E}_\sigma \frac{1}{|S|} \sup_{W \in \mathcal{W}} \sum_{i_1,\ldots,i_K} \Sigma_{i_1,\ldots,i_K}\, l(X_{i_1,\ldots,i_K}, W_{i_1,\ldots,i_K}) \\ &\le \mathbb{E}_\sigma \frac{1}{|S|} \sup_{W \in \mathcal{W}} \sum_{i_1,\ldots,i_K} \Sigma_{i_1,\ldots,i_K}\, W_{i_1,\ldots,i_K} \quad \text{(Rademacher contraction)} \\ &\le \frac{1}{|S|}\, \mathbb{E}_\sigma \sup_{W \in \mathcal{W}} \|W_{(D_1,\ldots,D_M)}\|_\star\, \|\Sigma_{(D_1,\ldots,D_M)}\|_{\star*} \quad \text{(duality relationship)}. \end{aligned} \tag{9}$$
(a) Given that the tensor has a multilinear rank of $(r_1, \ldots, r_K)$, using Lemma 3 we
know that
$$\|W_{(D_1,\ldots,D_M)}\|_\star \le \frac{\prod_{k=1}^{K} r_k}{\max_{j=1,\ldots,M} \prod_{i \in D_j} r_i}\, \gamma_1(W_{(D_1,\ldots,D_M)}). \tag{10}$$
Using Lemma 5, we can bound $\mathbb{E}_\sigma \|\Sigma_{(D_1,\ldots,D_M)}\|_{\star*}$ as
$$\mathbb{E}_\sigma \|\Sigma_{(D_1,\ldots,D_M)}\|_{\star*} \le 4 \log(4M) \sqrt{\sum_{j=1}^{M} \prod_{p \in D_j} n_p}. \tag{11}$$
By substituting (10) and (11) into (9), we obtain the following bound:
$$\mathcal{R}_S(l \circ \mathcal{W}) \le \frac{c\Lambda}{|S|}\, \frac{\prod_{k=1}^{K} r_k}{\max_{j=1,\ldots,M} \prod_{i \in D_j} r_i}\, \gamma_1(W_{(D_1,\ldots,D_M)})\, \log(4M) \sqrt{\sum_{j=1}^{M} \prod_{p \in D_j} n_p}. \tag{12}$$
(b) Given that the tensor has a CP rank of $r_{\mathrm{cp}}$, using Lemma 4 we have
$$\|W_{(D_1,\ldots,D_M)}\|_\star \le r_{\mathrm{cp}}\, \gamma_1(W_{(D_1,\ldots,D_M)}). \tag{13}$$
From Lemma 5, we have
$$\mathbb{E}_\sigma \|\Sigma_{(D_1,\ldots,D_M)}\|_{\star*} \le 4 \log(4M) \sqrt{\sum_{j=1}^{M} \prod_{p \in D_j} n_p}. \tag{14}$$
By substituting (13) and (14) into (9), we obtain the desired bound:
$$\mathcal{R}_S(l \circ \mathcal{W}) \le \frac{c\Lambda}{|S|}\, r_{\mathrm{cp}}\, \gamma_1(W_{(D_1,\ldots,D_M)})\, \log(4M) \sqrt{\sum_{j=1}^{M} n^{|D_j|}}. \tag{15}$$
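For reference, the right-hand sides of (12) and (15) are straightforward to evaluate numerically. The sketch below does so with the constants $c$ and $\Lambda$ folded into a single illustrative argument (`theorem1_bound` is our name, not the paper's).

```python
import numpy as np

def theorem1_bound(n, ranks, D, S_size, gamma1, c_lambda=1.0, r_cp=None):
    """Right-hand side of (12) (multilinear-rank case) or, if r_cp is given,
    of (15) (CP-rank case). n maps each mode index to its dimension."""
    M = len(D)
    size = np.sqrt(sum(np.prod([n[p] for p in Dj]) for Dj in D))
    if r_cp is None:
        rank = np.prod(ranks) / max(np.prod([ranks[i] for i in Dj]) for Dj in D)
    else:
        rank = r_cp
    return c_lambda / S_size * rank * gamma1 * np.log(4 * M) * size

# Example: order-4 tensor, n_k = 10, reshaped into a 100 x 100 matrix.
print(theorem1_bound([10] * 4, (2, 3, 2, 3), ([0, 1], [2, 3]), 500, 1.0))
```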
Next, we give the proof of Theorem 2.

Proof of Theorem 2: We expand the Rademacher complexity in (5) using the latent tensors
$W^{(1)}, \ldots, W^{(G)}$ for the reshaped latent tensor nuclear norm as
$$\mathcal{R}_S(l \circ (W^{(1)} + \cdots + W^{(G)})) = \mathbb{E}_\sigma \frac{1}{|S|} \sup_{W^{(1)} + \cdots + W^{(G)} = W \in \mathcal{W}_{rl}} \sum_{i_1,\ldots,i_K} \Sigma_{i_1,\ldots,i_K}\, l(X_{i_1,\ldots,i_K}, W_{i_1,\ldots,i_K}),$$
where $\Sigma_{i_1,\ldots,i_K} = \sigma_j$ when $(i_1,\ldots,i_K)$ is the $j$th sample in $S$, and $\Sigma_{i_1,\ldots,i_K} = 0$ otherwise.
We can analyze the Rademacher complexity as
$$\begin{aligned} \mathcal{R}_S(l \circ (W^{(1)} + \cdots + W^{(G)})) &= \mathbb{E}_\sigma \frac{1}{|S|} \sup_{W^{(1)} + \cdots + W^{(G)} = W \in \mathcal{W}_{rl}} \sum_{i_1,\ldots,i_K} \Sigma_{i_1,\ldots,i_K}\, l(X_{i_1,\ldots,i_K}, W_{i_1,\ldots,i_K}) \\ &\le \mathbb{E}_\sigma \frac{1}{|S|} \sup_{W^{(1)} + \cdots + W^{(G)} = W \in \mathcal{W}_{rl}} \sum_{i_1,\ldots,i_K} \Sigma_{i_1,\ldots,i_K}\, W_{i_1,\ldots,i_K} \quad \text{(Rademacher contraction)} \\ &\le \frac{1}{|S|}\, \mathbb{E}_\sigma \sup_{W^{(1)} + \cdots + W^{(G)} = W \in \mathcal{W}_{rl}} \|W\|_{r\,\mathrm{latent}}\, \|\Sigma\|_{r\,\mathrm{latent}*} \quad \text{(duality relationship)}. \end{aligned} \tag{16}$$
(a) For a tensor with multilinear rank, using Lemma 3, we obtain
$$\|W\|_{r\,\mathrm{latent}} = \inf_{W^{(1)} + \cdots + W^{(G)} = W} \sum_{g=1}^{G} \big\| W^{(g)}_{(D^{(g)})} \big\|_\star \le \min_{g \in \{1,\ldots,G\}} \frac{\prod_{k=1}^{K} r_k}{\max_{j=1,\ldots,M} \ldots}$$