Bashtannyk, D. M. and Hyndman, R. J. (2001). Bandwidth selection for kernel conditional
density estimation, Computational Statistics & Data Analysis, 36(3), 279–298.
Burman, P. and Chen, K. W. (1989). Nonparametric estimation of a regression function, The Annals of Statistics, 17(4), 1567–1596.
Cwik, J. and Mielniczuk, J. (1989). Estimating density ratio with application to discriminant analysis, Communications in Statistics – Theory and Methods, 18(8), 3057–3069.
Faugeras, O. P. (2009). A quantile-copula approach to conditional density estimation,
Journal of Multivariate Analysis, 100(9), 2083–2099.
Georgiev, A. A. (1988). Asymptotic properties of the multivariate Nadaraya–Watson regression function estimate: the fixed design case, Statistics & Probability Letters, 7(1), 35–40.
Hall, P., Wolff, R. C., and Yao, Q. (1999). Methods for estimating a conditional distribution function, Journal of the American Statistical Association, 94(445), 154–163.
Hall, P., Racine, J., and Li, Q. (2004). Cross-validation and the estimation of conditional probability densities, Journal of the American Statistical Association, 99(468),
1015–1026.
Hyndman, R. J., Bashtannyk, D. M., and Grunwald, G. K. (1996). Estimating and visualizing conditional densities, Journal of Computational and Graphical Statistics, 5(4), 315–336.
Lei, J. and Wasserman, L. (2014). Distribution-free prediction bands for nonparametric regression, Journal of the Royal Statistical Society: Series B (Statistical
Methodology), 76(1), 71–96.
Otneim, H. and Tjøstheim, D. (2017). Conditional density estimation using the local Gaussian correlation, Statistics and Computing, 28(2), 301–321.
Rosenblatt, M. (1969). Conditional probability density and regression estimators, in
Multivariate analysis II (P. R. Krishnaiah, ed.), 25–31. Academic Press, New York.
Ruppert, D. and Cline, D. B. H. (1994). Bias reduction in kernel density estimation by
smoothed empirical transformations, The Annals of Statistics, 22(1), 185–210.
Vieu, P. (1991). Nonparametric regression: optimal local bandwidth choice, Journal of the Royal Statistical Society: Series B (Methodological), 53(2), 453–464.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing, Chapman and Hall, London.
Appendix: Proofs of Theorems
Proof of Theorem 1
Suppose that the sample size $n$ is large enough, which ensures
$$\min\left\{\frac{G(x_0)}{h},\ \frac{1-G(x_0)}{h}\right\} > d.$$
The following expansion holds:
$$
\begin{aligned}
\widehat{f}^{\,\star}_{Y|x_0}(y_0)
={}& \frac{1}{nh^2}\sum_{i=1}^{n} K\!\left(\frac{G(X_i)-G(x_0)}{h}\right) K\!\left(\frac{Y_i-y_0}{h}\right) \\
&+ \frac{1}{nh^3}\sum_{i=1}^{n} K'\!\left(\frac{G(X_i)-G(x_0)}{h}\right) K\!\left(\frac{Y_i-y_0}{h}\right)\{[G_n(X_i)-G(X_i)]-[G_n(x_0)-G(x_0)]\} \\
&+ \frac{1}{2nh^4}\sum_{i=1}^{n} K''\!\left(\frac{G(X_i)-G(x_0)}{h}\right) K\!\left(\frac{Y_i-y_0}{h}\right)\{[G_n(X_i)-G(X_i)]-[G_n(x_0)-G(x_0)]\}^2 \\
&+ \frac{1}{6nh^5}\sum_{i=1}^{n} K'''\!\left(\frac{G_n^{*}(X_i)-G_n^{*}(x_0)}{h}\right) K\!\left(\frac{Y_i-y_0}{h}\right)\{[G_n(X_i)-G(X_i)]-[G_n(x_0)-G(x_0)]\}^3 \qquad (3)\\
=:{}& J_1 + J_2 + J_3 + J_4,
\end{aligned}
$$
where $G_n^{*}(X_i)$ is a random variable between $G_n(X_i)$ and $G(X_i)$ with probability 1, and $G_n^{*}(x_0)$ is between $G_n(x_0)$ and $G(x_0)$ with probability 1.
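To make the origin of $J_1,\dots,J_4$ explicit, (3) is a third-order Taylor expansion of $K$ at the point $(G(X_i)-G(x_0))/h$. Writing $\Delta_i := [G_n(X_i)-G(X_i)]-[G_n(x_0)-G(x_0)]$ (a shorthand used only in this display), we have $G_n(X_i)-G_n(x_0) = [G(X_i)-G(x_0)] + \Delta_i$, and hence
$$
K\!\left(\frac{G_n(X_i)-G_n(x_0)}{h}\right)
= K\!\left(\frac{G(X_i)-G(x_0)}{h}\right)
+ K'\!\left(\frac{G(X_i)-G(x_0)}{h}\right)\frac{\Delta_i}{h}
+ \frac{1}{2}K''\!\left(\frac{G(X_i)-G(x_0)}{h}\right)\frac{\Delta_i^2}{h^2}
+ \frac{1}{6}K'''\!\left(\frac{G_n^{*}(X_i)-G_n^{*}(x_0)}{h}\right)\frac{\Delta_i^3}{h^3};
$$
multiplying by $(nh^2)^{-1}K((Y_i-y_0)/h)$ and summing over $i$ gives the four terms above.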
First, we evaluate $J_1$, which is a sum of i.i.d. random variables. Define $\psi(u) := G^{-1}(G(x_0)+hu)$, and note that $\psi'(u) = h/g(\psi(u))$, which supplies the change of variables $w = \psi(u)$ below. The expectation of $J_1$ is obtained as follows:
$$
\begin{aligned}
E[J_1] &= \frac{1}{h^2}\iint K\!\left(\frac{G(w)-G(x_0)}{h}\right) K\!\left(\frac{z-y_0}{h}\right) f(w,z)\,dz\,dw \\
&= \frac{1}{h}\iint K(u)\, K\!\left(\frac{z-y_0}{h}\right) \frac{f(\psi(u),z)}{g(\psi(u))}\,du\,dz \\
&= \frac{1}{h}\int K\!\left(\frac{z-y_0}{h}\right)\Bigg[\frac{f(x_0,z)}{g(x_0)}
+\Bigg\{-f(x_0,z)\,\frac{g(x_0)g''(x_0)-3(g'(x_0))^2}{2g^5(x_0)}
- f^{(1,0)}(x_0,z)\,\frac{g'(x_0)}{g^4(x_0)} \\
&\qquad\qquad + \frac{g(x_0)f^{(2,0)}(x_0,z)-g'(x_0)f^{(1,0)}(x_0,z)}{2g^4(x_0)}\Bigg\}\, h^2 A_{1,2}\Bigg]\,dz + O(h^4) \\
&= \int K(v)\Bigg[\frac{1}{g(x_0)}\left\{f(x_0,y_0)+f^{(0,2)}(x_0,y_0)\frac{(hv)^2}{2}\right\}
+\Bigg\{-f(x_0,y_0)\,\frac{g(x_0)g''(x_0)-3(g'(x_0))^2}{2g^5(x_0)} \\
&\qquad\qquad - f^{(1,0)}(x_0,y_0)\,\frac{g'(x_0)}{g^4(x_0)}
+ \frac{g(x_0)f^{(2,0)}(x_0,y_0)-g'(x_0)f^{(1,0)}(x_0,y_0)}{2g^4(x_0)}\Bigg\}\, h^2 A_{1,2}\Bigg]\,dv \\
&\quad + O(h^4).
\end{aligned}
$$
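Carrying out the remaining $v$-integration (a spelled-out intermediate step, using only $\int K(v)\,dv=1$ and $\int v^2K(v)\,dv=A_{1,2}$) collects the bias into one expression:
$$
E[J_1] = f_{Y|x_0}(y_0) + h^2A_{1,2}\left[\frac{f^{(0,2)}(x_0,y_0)}{2g(x_0)} - f(x_0,y_0)\,\frac{g(x_0)g''(x_0)-3(g'(x_0))^2}{2g^5(x_0)} - f^{(1,0)}(x_0,y_0)\,\frac{g'(x_0)}{g^4(x_0)} + \frac{g(x_0)f^{(2,0)}(x_0,y_0)-g'(x_0)f^{(1,0)}(x_0,y_0)}{2g^4(x_0)}\right] + O(h^4),
$$
with $f_{Y|x_0}(y_0)=f(x_0,y_0)/g(x_0)$, so the bias of $J_1$ is $O(h^2)$.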
From the calculation of the following second moment:
$$
\begin{aligned}
\frac{1}{nh^4}\iint K^2\!\left(\frac{G(w)-G(x_0)}{h}\right) K^2\!\left(\frac{z-y_0}{h}\right) f(w,z)\,dz\,dw
&= \frac{1}{nh^3}\iint K^2(u)\,K^2\!\left(\frac{z-y_0}{h}\right)\frac{f(\psi(u),z)}{g(\psi(u))}\,dz\,du \\
&= \frac{A_{2,0}^2}{nh^2}\, f_{Y|x_0}(y_0) + O\!\left(\frac{1}{nh}\right),
\end{aligned}
$$
we can see that
$$
V[J_1] = E[(J_1)^2] - E[J_1]^2 = \frac{A_{2,0}^2}{nh^2}\, f_{Y|x_0}(y_0) + O\!\left(\frac{1}{nh}\right).
$$
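As a quick numerical sanity check of this leading variance term (not part of the proof), the following Python sketch simulates the oracle statistic $J_1$, i.e. the sum with the true $G$ plugged in. The bivariate normal model, the Gaussian kernel, and all parameter values are assumptions made purely for this illustration.

```python
import numpy as np
from scipy.stats import norm

# Sketch: Monte Carlo check of V[J1] ~ A_{2,0}^2 * f_{Y|x0}(y0) / (n h^2).
# Illustrative assumptions: X ~ N(0,1), Y = X + N(0,1), Gaussian kernel K,
# and the true marginal CDF G = Phi used in place of G_n (so this is J1).
rng = np.random.default_rng(0)
n, h = 2000, 0.1
x0, y0 = 0.0, 0.0
K = norm.pdf                          # Gaussian kernel
A20 = 1.0 / (2.0 * np.sqrt(np.pi))    # A_{2,0} = int K^2(u) du for this K

def J1():
    """One draw of J1 = (n h^2)^{-1} sum_i K((G(X_i)-G(x0))/h) K((Y_i-y0)/h)."""
    x = rng.standard_normal(n)
    y = x + rng.standard_normal(n)
    return np.mean(K((norm.cdf(x) - norm.cdf(x0)) / h) * K((y - y0) / h)) / h**2

draws = np.array([J1() for _ in range(500)])
f_cond = norm.pdf(y0 - x0)            # f_{Y|x0}(y0): Y | X = x0 ~ N(x0, 1)
print("empirical V[J1]:", draws.var())
print("A20^2 f/(n h^2):", A20**2 * f_cond / (n * h**2))
```

The two printed values should agree up to the $O((nh)^{-1})$ remainder.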
By applying a conditional expectation to $J_2$, we can obtain
$$
\begin{aligned}
E[J_2] &= E\left[\frac{1}{n^2h^3}\sum_{i=1}^{n}\sum_{j=1}^{n} K'\!\left(\frac{G(X_i)-G(x_0)}{h}\right) K\!\left(\frac{Y_i-y_0}{h}\right)\{[I(X_j\le X_i)-G(X_i)]-[I(X_j\le x_0)-G(x_0)]\}\right] \\
&= \frac{1}{nh^3}\,E\left[K'\!\left(\frac{G(X_i)-G(x_0)}{h}\right) K\!\left(\frac{Y_i-y_0}{h}\right)\sum_{j=1}^{n} E\big[[I(X_j\le X_i)-G(X_i)]-[I(X_j\le x_0)-G(x_0)]\,\big|\,Y_i,X_i\big]\right] \\
&\approx \frac{1}{nh^3}\,E\left[K'\!\left(\frac{G(X_i)-G(x_0)}{h}\right) K\!\left(\frac{Y_i-y_0}{h}\right)\{[1-G(X_i)]-[I(X_i\le x_0)-G(x_0)]\}\right] \\
&= \frac{1}{nh^2}\iint K'(u)\,K\!\left(\frac{z-y_0}{h}\right)\{[1-G(\psi(u))]-[I(\psi(u)\le x_0)-G(x_0)]\}\,\frac{f(\psi(u),z)}{g(\psi(u))}\,dz\,du \\
&= O\!\left(\frac{1}{nh}\right),
\end{aligned}
$$
where the second step uses $G_n(x) = n^{-1}\sum_{j=1}^{n} I(X_j\le x)$, and the approximation holds because, for $j\ne i$, $E[I(X_j\le X_i)\,|\,Y_i,X_i]=G(X_i)$ and $E[I(X_j\le x_0)]=G(x_0)$, so only the $j=i$ term survives in the inner sum.
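The final order assessment can be spelled out (our own intermediate step): assuming $\int |K'(u)|\,du < \infty$, the inner $z$-integration contributes a factor of $h$,
$$
\int K\!\left(\frac{z-y_0}{h}\right)\frac{f(\psi(u),z)}{g(\psi(u))}\,dz
= h\int K(v)\,\frac{f(\psi(u),y_0+hv)}{g(\psi(u))}\,dv = O(h)
$$
uniformly in $u$, and the factor in braces is bounded by $2$, so the double integral is $O(h)$ and $E[J_2]=O((nh)^{-1})$.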
The squared value of $J_2$ is given by
$$
\begin{aligned}
(J_2)^2 &= \frac{1}{n^4h^6}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{m=1}^{n}
K'\!\left(\frac{G(X_i)-G(x_0)}{h}\right) K\!\left(\frac{Y_i-y_0}{h}\right)
K'\!\left(\frac{G(X_k)-G(x_0)}{h}\right) K\!\left(\frac{Y_k-y_0}{h}\right) \\
&\qquad\times \{[I(X_j\le X_i)-G(X_i)]-[I(X_j\le x_0)-G(x_0)]\}\,\{[I(X_m\le X_k)-G(X_k)]-[I(X_m\le x_0)-G(x_0)]\} \\
&=: \frac{1}{n^4h^6}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{m=1}^{n} \Xi(i,j,k,m).
\end{aligned}
$$
We can find the following:
if all of $(i, j, k, m)$ are different, $E[\Xi(i,j,k,m)] = E[E[\Xi(i,j,k,m)\,|\,X_i, X_k]] = 0$;
if $i = j$ and all of $(i, k, m)$ are different, $E[\Xi(i,j,k,m)] = E[E[\Xi(i,j,k,m)\,|\,X_i, X_k]] = 0$;
if $i = k$ and all of $(i, j, m)$ are different, $E[\Xi(i,j,k,m)] = E[E[\Xi(i,j,k,m)\,|\,X_i]] = 0$;
if $i = m$ and all of $(i, j, k)$ are different, $E[\Xi(i,j,k,m)] = E[E[\Xi(i,j,k,m)\,|\,X_i]] = 0$.
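As a one-line verification of the first case (our own spelled-out step): for pairwise distinct $(i, j, k, m)$, conditioning on $(X_i, X_k)$ gives
$$
E\big[[I(X_j\le X_i)-G(X_i)]-[I(X_j\le x_0)-G(x_0)]\,\big|\,X_i,X_k\big]
= [G(X_i)-G(X_i)]-[G(x_0)-G(x_0)] = 0,
$$
and given $(X_i, X_k)$ this factor is independent of the remaining factors of $\Xi(i,j,k,m)$, so $E[E[\Xi(i,j,k,m)\,|\,X_i,X_k]]=0$; the other cases follow in the same way.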
From the above results, we can see that the terms in which $j = m$ and all of $(i, j, k)$ are different form the dominant part of $E[(J_2)^2]$; there are $n(n-1)(n-2)$ such index tuples. For the terms in which $j = m$ and all of $(i, j, k)$ are different, the expectation is given by
$$
\begin{aligned}
\sum E[\Xi(i, j, k, m)]
&= \frac{n(n-1)(n-2)}{n^4h^6}\,E\bigg[K'\!\left(\frac{G(X_i)-G(x_0)}{h}\right) K\!\left(\frac{Y_i-y_0}{h}\right)
K'\!\left(\frac{G(X_k)-G(x_0)}{h}\right) K\!\left(\frac{Y_k-y_0}{h}\right) \\
&\qquad\times \{[I(X_j\le X_i)-G(X_i)]-[I(X_j\le x_0)-G(x_0)]\}\,\{[I(X_j\le X_k)-G(X_k)]-[I(X_j\le x_0)-G(x_0)]\}\bigg].
\end{aligned}
$$
It holds that
$$
E\Big[E\big[\{[I(X_j \le X_i) - G(X_i)] - [I(X_j \le x_0) - G(x_0)]\} \ldots
$$