Asymptotic properties of Cucconi test statistics
Overview
Statistical hypothesis testing is one of the most important techniques in nonparametric statistics, and many nonparametric test statistics have been proposed and discussed over the years. A nonparametric test evaluates a statistical hypothesis without assuming a specific distribution function. Researchers often assume a normal distribution when they analyze experimental data; however, normality is difficult to justify when the data contain many outliers. Nonparametric tests are therefore useful when neither normality nor any other specific distribution can clearly be assumed. Most nonparametric test statistics are based on ranks. Following Wilcoxon (1945), researchers studied rank-based methods, and from the late 1940s to the 1950s many of them investigated the relationship between nonparametric and parametric test statistics. These studies showed that, in terms of asymptotic efficiency, nonparametric test statistics can be as effective as their parametric counterparts. Since the 1960s, nonparametric test statistics have become an important and highly valued research topic, and in the late 1960s much of their underlying theory was established. With the development of information technology, nonparametric methods have advanced remarkably. They remain an active area of research, and their importance continues to be recognized today.
If the population distributions are assumed to differ only in location, many nonparametric tests can be used, such as those of Wilcoxon (1945) and Mann and Whitney (1947). There are also many tests for the scale problem, such as those of Mood (1954) and Ansari and Bradley (1960). However, a test for the location parameter is not useful when the scale parameters also change, and likewise a test for the scale parameter is not useful when the location parameters also change. To resolve this dilemma, tests for the two-sample location-scale problem have been developed; the best known is the Lepage test (Lepage, 1971), which combines the Wilcoxon (1945) and Ansari and Bradley (1960) test statistics. After the Lepage test was developed, many researchers studied combinations of test statistics, for example Pettitt (1976). In addition, many researchers investigated Lepage-type tests, such as Büning and Thadewald (2000), Neuhäuser (2000), Büning (2002), and Murakami (2007). A different test for the location-scale problem was proposed by Cucconi (1968). The Cucconi test is based on the Mahalanobis distance between two rank-based statistics, the standardized sums of squared ranks and of squared contrary ranks. Although the Cucconi test was developed earlier than the Lepage test, little is known about it, because it was described only in Italian (Cucconi, 1968). Marozzi (2009) revisited the Cucconi test and identified its advantages. First, when the sample sizes are nearly equal, critical values from its limiting distribution agree closely with the exact critical values. Second, the Cucconi test is more powerful than the Lepage test and four Podgor-Gastwirth tests (Podgor and Gastwirth, 1994). Third, it requires less computation than the Lepage test. Recently, the Cucconi test has been applied in various fields, including hydrology (Rutkowska and Banasik, 2016) and psychology (Marmolejo-Ramos et al., 2017).
Moreover, the Cucconi test is highly valued in industrial quality control, and several control charts have been based on this test statistic; see Chowdhury et al. (2014), Mukherjee and Marozzi (2017a), and Mukherjee and Marozzi (2017b).
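To make the structure of the Cucconi statistic concrete, the following Python sketch computes it in the form popularized by Marozzi (2009). It assumes continuous data with no ties, so plain ranks 1, ..., N are used; the helper names cucconi_statistic and cucconi_asymptotic_pvalue are hypothetical and chosen for illustration. With S_i the pooled-sample ranks of the first sample, U and V standardize the sums of squared ranks and squared contrary ranks (N + 1 - S_i), rho is their correlation under the null hypothesis, and C = (U^2 + V^2 - 2 rho U V) / (2 (1 - rho^2)) grows when location, scale, or both differ.

```python
import math


def cucconi_statistic(x, y):
    """Two-sample Cucconi statistic C (Marozzi, 2009 form).

    Assumption: continuous data without ties, so plain ranks 1..N are used.
    """
    n1, n2 = len(x), len(y)
    N = n1 + n2
    pooled = list(x) + list(y)
    # Rank every pooled observation (1 = smallest); ties are not handled.
    order = sorted(range(N), key=lambda i: pooled[i])
    ranks = [0] * N
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    s = ranks[:n1]  # ranks of the first sample within the pooled sample

    # Null mean of 6 * sum(S_i^2) and its standard deviation.
    center = n1 * (N + 1) * (2 * N + 1)
    sd = math.sqrt(n1 * n2 * (N + 1) * (2 * N + 1) * (8 * N + 11) / 5)
    # U: standardized sum of squared ranks.
    u = (6 * sum(r * r for r in s) - center) / sd
    # V: standardized sum of squared contrary ranks N + 1 - S_i.
    v = (6 * sum((N + 1 - r) ** 2 for r in s) - center) / sd
    # Null correlation between U and V (tends to -7/8 as N grows).
    rho = 2 * (N * N - 4) / ((2 * N + 1) * (8 * N + 11)) - 1
    return (u * u + v * v - 2 * rho * u * v) / (2 * (1 - rho * rho))


def cucconi_asymptotic_pvalue(c):
    # Under H0, 2C is asymptotically chi-square with 2 df, so P(C >= c) ~ exp(-c).
    return math.exp(-c)
```

Because (U, V) is asymptotically bivariate normal under the null hypothesis, the asymptotic p-value reduces to exp(-c) and the 5% critical value is -ln 0.05, about 2.996. For small samples the exact permutation distribution is the safer reference.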
Because the data we analyze sometimes take cumbersome forms, we require techniques suited to these situations. Censored data are one of the most significant such categories and are frequently observed in survival analysis. Censoring can be classified in two ways. First, by which part of the value is unobserved: if the data record only that the event of interest has already occurred (or will occur later), the observation is called left-censored (or right-censored). Second, by the mechanism that generates the censoring: Type-I censored data are obtained by running the units for a fixed time to determine whether they survive or fail, whereas Type-II censoring occurs when the number of failures to be observed is fixed in advance. We focus on hypothesis testing for Type-I left-censored and right-censored data. Epstein (1954) established test statistics for small samples from exponential distributions right-censored at a fixed point, based on maximum likelihood estimation. For nonparametric tests, several researchers established two-sample significance tests for censored samples. Halperin (1960) proposed a test statistic for right-censored data based on the Mann-Whitney test, and Sugiura (1963) suggested Wilcoxon-type test statistics for left-censored data. Focusing on the kernel function used to compare the magnitudes of two observations, Gehan (1965) proposed a test statistic for singly censored samples. Other tests include the log-rank test (Peto and Peto, 1972) and a class of distance tests based on weighted Kaplan-Meier statistics (Pepe and Fleming, 1989).
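The idea of comparing two observations through a kernel can be illustrated with a short Python sketch of a Gehan-type pairwise score for right-censored data. Each observation is a pair (t, event), where event is True for an observed failure time and False when the unit is right-censored at t, which this sketch takes to mean the true value is at least t. The kernel returns -1 when the first value is known to be smaller, +1 when it is known to be larger, and 0 when the order cannot be determined; summing it over all cross-sample pairs gives a Gehan-type statistic. The names gehan_kernel and gehan_statistic are hypothetical, and the standardization by the null variance is omitted.

```python
from typing import Sequence, Tuple

# (time, event): event=False means the value is right-censored at `time`.
Obs = Tuple[float, bool]


def gehan_kernel(a: Obs, b: Obs) -> int:
    """-1 if a is known to be smaller than b, +1 if known larger, 0 otherwise."""
    a_t, a_event = a
    b_t, b_event = b
    # a is an observed failure before b's (observed or censored) time,
    # or exactly at b's censoring time (b's true value is at least b_t).
    if a_event and (a_t < b_t or (a_t == b_t and not b_event)):
        return -1
    # Mirror case: b is an observed failure known to precede a.
    if b_event and (b_t < a_t or (b_t == a_t and not a_event)):
        return 1
    # Tied failures, two censored values, or a censored value below a failure.
    return 0


def gehan_statistic(x: Sequence[Obs], y: Sequence[Obs]) -> int:
    """Sum of the kernel over all (x_i, y_j) pairs; unstandardized."""
    return sum(gehan_kernel(a, b) for a in x for b in y)
```

With no censoring the kernel reduces to the sign comparison behind the Mann-Whitney statistic, which is why this approach is described as a generalized Wilcoxon test.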
Additionally, the nonparametric one-way layout analysis of variance (ANOVA) plays an important role in biometry. Marozzi (2014) extended the Cucconi test to multisample location-scale problems and showed that the multisample Cucconi test was more powerful than the multisample Lepage test suggested by Rublík (2005). Because the critical values of the multisample Cucconi test are derived by the permutation method, the amount of computation required is enormous, and its asymptotic and limiting distributions had not been derived. More recently, Murakami (2016a) presented test statistics based on all-pairs Cucconi tests for multiple comparisons. To relax the assumption that the population distribution functions are continuous, many researchers have applied methods for handling tied ranks to various nonparametric tests; see Hemelrijk (1952), Putter (1955), and Paul and Mielke (1967).
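A common way to handle tied ranks is to assign mid-ranks (average ranks), so that tied observations share the mean of the positions they occupy. The Python sketch below computes them; mid_ranks is a hypothetical helper written for illustration, not a function from the cited papers, and the cited works may use other tie treatments.

```python
from typing import List, Sequence


def mid_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based mid-ranks: tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # 0-based positions i..j correspond to ranks i+1..j+1; use their mean.
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks
```

For example, mid_ranks([10, 20, 20, 30]) gives [1.0, 2.5, 2.5, 4.0]. Mid-ranks preserve the total rank sum N(N + 1)/2, so a linear rank-sum statistic keeps its null mean, but its null variance, and the moments of nonlinear statistics such as sums of squared ranks, must be recomputed when ties are present.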
In this paper, we focus on the versatility of the Cucconi test in (i) the two-sample case and (ii) the multisample case. The results of this paper are based on Nishino and Murakami (2018), Nishino and Murakami (2019a), Nishino and Murakami (2019b), and Nishino and Murakami (2020). The paper is organized as follows. First, we discuss the two-sample case. In Chapter 2, we propose a generalized two-sample Cucconi test and investigate its properties, based on Nishino and Murakami (2019b). In Chapter 3, we propose the Cucconi test for specific types of censored data and derive its limiting distribution. We also examine its empirical power and analyze real data; this chapter is based on Nishino and Murakami (2019a). We then discuss the multisample case. In Chapter 4, we derive the null and non-null limiting distributions of the multisample Cucconi test, based on Nishino and Murakami (2018). In Chapter 5, we propose generalized multisample Cucconi test statistics for both continuous and discrete populations, based on Nishino and Murakami (2020). Finally, Chapter 6 concludes the paper.