On the T-test

The $T$-test is probably the most popular statistical test; it is routinely recommended by textbooks. The applicability of the test relies on the validity of the normal or Student's approximation to the distribution of Student's statistic $t_n$. However, the latter assumption is not valid as often as commonly assumed. We show that the normal or Student's approximation to ${\cal L}(t_n)$ does not hold uniformly even in the class ${\cal P}_n$ of samples from zero-mean unit-variance bounded distributions. We present lower bounds for the corresponding error. The fact that a non-parametric test is not applicable uniformly to samples from the class ${\cal P}_n$ appears to be established here for the first time. It means the $T$-test can be misleading and should not be recommended in its present form. We suggest a generalisation of the test that allows for variability of the possible limiting/approximating distributions of ${\cal L}(t_n)$.


Introduction
Testing a hypothesis concerning the mean $\mathbb{E}X$ of an unknown distribution is one of the major tasks of statistical hypothesis testing. In particular, one can be interested in testing the hypothesis $H_0 = \{\mathbb{E}X = a\}$ (see, e.g., Lehmann [6]).
Throughout the paper we assume that $\mathbb{E}X^2 < \infty$, and denote $\sigma^2 = \mathrm{var}\,X$. Below, a bar over a random variable means that it is centred at its mathematical expectation. The natural estimator of $\mathbb{E}X$ is the sample mean $\bar X = S_n/n$, where $S_n = X_1 + \dots + X_n$, leading to the test with test statistic $(\bar X - a)\sqrt{n}/\sigma$ if $\sigma$ is known (the so-called $Z$-test), or to the test with test statistic $(\bar X - a)\sqrt{n}/\hat\sigma_n$, where $\hat\sigma_n$ is an estimator of the standard deviation (the $T$-test).
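As a minimal illustrative sketch (not from the paper), the two test statistics above can be computed as follows; the function names are ours:

```python
# Z- and T-test statistics for H0: E X = a, from a sample x_1, ..., x_n.
import math

def z_statistic(xs, a, sigma):
    """(X̄ - a)·√n / σ for a known standard deviation σ (the Z-test)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (xbar - a) * math.sqrt(n) / sigma

def t_statistic(xs, a):
    """(X̄ - a)·√n / σ̂_n, with σ̂_n the sample standard deviation (the T-test)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # unbiased variance estimator
    return (xbar - a) * math.sqrt(n) / math.sqrt(s2)

sample = [0.2, -0.5, 1.1, 0.4, -0.3, 0.7]
print(z_statistic(sample, 0.0, 1.0), t_statistic(sample, 0.0))
```

The only difference between the two statistics is the denominator: the known $\sigma$ versus its estimator $\hat\sigma_n$; the entire discussion below concerns the consequences of that substitution.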
The $T$-test is arguably the most popular statistical test. In view of the law of large numbers (LLN) and the central limit theorem (CLT), it appears perfectly justified whenever $\mathbb{E}X^2 < \infty$.
However, we show below that the $T$-test has problems even in the simplest case, where the main hypothesis is $H_0 = \{\mathbb{E}X = a\}$, the alternative hypothesis is $H_A = \{\mathbb{E}X = b\}$ $(a < b)$, and $\mathrm{var}\,X$ is known. We argue that the $T$-test is not automatically applicable, and requires prior checks.
Here $[a - c_\varepsilon\sigma/\sqrt{n};\; a + c_\varepsilon\sigma/\sqrt{n}\,]$ is the asymptotic confidence interval based on the CLT. A more robust approach suggests using estimates of the accuracy of the normal approximation, i.e., replacing the asymptotic confidence interval with a sub-asymptotic confidence interval (see [12], ch. 9), in the case of a "one-sided" $Z$-test (in the case of the $T$-test, $\sigma$ is replaced by $\hat\sigma_n$ and $\Phi$ by Student's d.f.). Set $\delta = (a-b)/\sigma$. The probability of the type-II error in the case of (2), as well as in the case of (2*), is a large deviations probability. The need to approximate the probability of the type-II error led to the rise of the theory of large deviations (see, e.g., [8,14,15] and references therein). In view of (3), one needs to approximate the asymptotics of the probabilities $\mathbb{P}(\zeta_n \ge x_n)$, where $\{x_n\}\!\uparrow$ is a sequence of positive numbers, $x_n = O(\sqrt{n}\,)$ as $n \to \infty$. Under certain assumptions on ${\cal L}(X)$ and the rate of growth of $x_n$, the normal approximation (4) applies. In more general situations (in particular, if $x_n \asymp \sqrt{n}$), the asymptotics of $\mathbb{P}(\zeta_n \ge x_n)$ can be expressed in terms of the so-called "rate function" $\Lambda$, the Legendre transform of the function $\psi(t) = \ln \mathbb{E}e^{tX}$ (see, e.g., Petrov [14] or [12], ch. 14.5).
One would prefer the normal approximation (4), as the rate function is typically unknown and the task of estimating it can be demanding. Note that one can use the Erdős-Rényi maximum of partial sums as an estimator of $\Lambda^{-1}$ (cf. [12], ch. 2); however, the accuracy of such estimation is likely to be poor.
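The Legendre transform defining the rate function can be computed numerically. A small sketch (our illustration, not the paper's): for a standard normal $X$ one has $\psi(t) = t^2/2$, so $\Lambda(x) = \sup_t\,(tx - \psi(t)) = x^2/2$ in closed form, which makes it a convenient sanity check.

```python
# Numerical Legendre transform of the cumulant generating function
# psi(t) = ln E exp(tX), giving the rate function
#   Lambda(x) = sup_t [ t*x - psi(t) ].
from scipy.optimize import minimize_scalar

def rate_function(psi, x):
    """Compute Lambda(x) = sup_t (t*x - psi(t)) numerically."""
    res = minimize_scalar(lambda t: -(t * x - psi(t)))
    return -res.fun

psi_normal = lambda t: t ** 2 / 2      # cumulant generating function of N(0,1)
print(rate_function(psi_normal, 1.0))  # close to 0.5 = 1^2/2
```

For distributions whose cumulant generating function is unknown, $\psi$ itself would have to be estimated from the data, which is precisely the difficulty noted above.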
Thus, the question appears answered in the case of known $\sigma$ and "close alternatives" (i.e., $\delta \equiv \delta(n) \to 0$ and $1 \ll x_n \ll \sqrt{n}$ as $n \to \infty$). However, we argue that the use of the normal approximation is not properly justified. The reason is that the test is effectively applied as a non-parametric one: textbooks implicitly assume that the $T$-test "works" uniformly over the non-parametric class ${\cal P}_\sigma(a_1, a_2)$ of distributions with mean $\mathbb{E}X \in [a_1; a_2]$ and standard deviation $\sigma$.
We show below that weak convergence of ${\cal L}(S_n/\sqrt{n})$ to the normal law cannot hold uniformly in the class ${\cal P}_1(0,0)$ of zero-mean unit-variance distributions (the issue with uniform convergence is known in the literature, though not in the context of the $T$-test; see [13] and references therein concerning weak convergence uniformly in a class of distributions). Note that this problem does not arise if one deals with a typical parametric family of distributions, since a typical parametric family $\{P_\theta,\ \theta \in \Theta\}$ has a one-to-one correspondence between a parameter and a distribution.
In applications the standard deviation is usually unknown and has to be replaced by its estimator, e.g., $\hat\sigma_n$, where $\hat\sigma_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$; $\zeta_n$ is then replaced with Student's statistic $t_n \equiv t_n(X_1, \dots, X_n)$, where $t_n = (\bar X - a)\sqrt{n}/\hat\sigma_n$. The test of the hypothesis $H_0 = \{\mathbb{E}X = a\}$ involving the test statistic $t_n$ is called the $T$-test. It is one of the most widely used statistical tests.
where $c_\varepsilon$ is given as in (1) with Student's d.f. in place of $\Phi$, so that the probability of the type-I error is asymptotically $\varepsilon$ (assuming the normal or Student's approximation to ${\cal L}(t_n)$, cf. (7)). Set $\hat\delta = (a-b)/\hat\sigma_n$. The probability of the type-II error equals the corresponding tail probability in the case of a two-sided test (say, $a > b$), or $\mathbb{P}(t_n \ge \hat\delta\sqrt{n} - c_\varepsilon)$ in the case of a one-sided test. These are large deviations probabilities.
Since the test involves $\bar S_n$, we may assume in the sequel that $\mathbb{E}X = 0$. Denote $T_n = \sum_{i=1}^n X_i^2$. The self-normalised sum $t_n^* = S_n/\sqrt{T_n}$ is closely related to Student's statistic $t_n$: $t_n = t_n^*\sqrt{(n-1)/(n-(t_n^*)^2)}$. Thus, probabilities of the events involving $t_n$ that appear in the test can be presented as probabilities of the events involving $t_n^*$. In particular, the limiting distributions of $t_n$ and $t_n^*$ coincide. In the sequel we will mainly speak about $t_n^*$. Student's statistic converges weakly to the standard normal law if and only if ${\cal L}(X)$ is in the domain of attraction of a normal law and $\mathbb{E}X = 0$ (Giné et al. [4]). The class ${\cal L}_S$ of limiting distributions of Student's statistic in the case of a triangular array of r.v.s that are i.i.d. in each row has been described by Mason [9]. An estimate of the accuracy of the normal approximation to the distribution of Student's statistic with explicit constants has been given by Novak [10,11] in the case of identically distributed r.v.s and by Shao [20] in the case of non-identically distributed r.v.s (see also [12,17] and references therein).
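The close relation between $t_n$ and $t_n^*$ can be verified numerically; the sketch below (ours, for illustration) checks the standard identity $t_n = t_n^*\sqrt{(n-1)/(n-(t_n^*)^2)}$ on a simulated sample, with $a = 0$:

```python
# Numerical check of the relation between Student's statistic t_n and the
# self-normalised sum t_n^* = S_n / sqrt(T_n), where T_n = sum of X_i^2.
import math
import random

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(20)]
n = len(xs)

s_n = sum(xs)
t_star = s_n / math.sqrt(sum(x * x for x in xs))   # self-normalised sum

xbar = s_n / n
sd = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
t_n = xbar * math.sqrt(n) / sd                     # Student's statistic (a = 0)

# t_n is a monotone function of t_n^*:
assert abs(t_n - t_star * math.sqrt((n - 1) / (n - t_star ** 2))) < 1e-9
```

Since the map $t_n^* \mapsto t_n$ is monotone, events of the form $\{t_n \ge c\}$ translate directly into events involving $t_n^*$, which is why the two statistics share their limiting distributions.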
A (6)-type result for the self-normalised sum $t_n^*$ has been suggested as well [5]: there exists an absolute constant $A$ such that (8) holds. In view of (26), bound (8) is equivalent to (8*). As in the case of known $\sigma$, one can easily get the impression that it is safe to apply the $T$-test to any sample of i.i.d. observations with a finite second moment.
Recall that the $T$-test was originally formulated for samples of i.i.d. normal r.v.s. In most applications the observations are not normally distributed. Nonetheless, textbooks suggest that the normal approximation is applicable if the sample size is large: "the size of the one- and two-sample $T$-tests is relatively insensitive to nonnormality (at least for large samples). Power values of the $T$-tests obtained under normality are asymptotically valid also for all other distributions with finite variance. This is a useful result..." ([6], p. 207).
This opinion appears widely accepted, suggesting that the probabilities of the type-I and type-II errors can be accurately approximated using the normal or Student's d.f.
The intuition behind such a suggestion is obvious: by the CLT, $S_n/\sigma\sqrt{n}$ is expected to converge weakly to a standard normal r.v., while by the LLN, $T_n/n\sigma^2$ is expected to converge to 1 as $n \to \infty$ (cf. [6], p. 205). The purpose of this article is to show that such an impression can be misleading. What textbooks miss is that the weak convergence of ${\cal L}(t_n)$ and ${\cal L}(t_n^*)$ to the normal law is not uniform. Theorems 1 and 2 show that the $T$-test cannot be applied uniformly even in the class of bounded zero-mean unit-variance distributions.
Let $\Psi_n$ denote the distribution function of Student's distribution with $n$ degrees of freedom. The use of Student's distribution instead of the normal one has been inherited from the case of normally distributed observations. However, it is easy to check that $\Psi_n$ is close to $\Phi$: $\sup_x |\Psi_n(x) - \Phi(x)| = O(1/n)$ (cf. Pinelis [16]). The table of Student's distribution function shows little difference between $\Psi_n(\cdot)$ and $\Phi(\cdot)$ if $n \ge 60$. Thus, the preference for $\Psi_n$ over $\Phi$ appears questionable.
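The closeness of $\Psi_n$ to $\Phi$ for moderate $n$ is easy to check numerically; a short sketch (our illustration) evaluates the Kolmogorov distance between the two d.f.s on a grid for $n = 60$:

```python
# Kolmogorov distance between Student's d.f. with 60 degrees of freedom
# and the standard normal d.f., evaluated on a fine grid.
import numpy as np
from scipy.stats import norm, t

xs = np.linspace(-6, 6, 2001)
gap = np.max(np.abs(t.cdf(xs, df=60) - norm.cdf(xs)))
print(gap)  # a small number, well below 0.01
```

This matches the observation that tables of $\Psi_n$ and $\Phi$ barely differ for $n \ge 60$, so the choice between them is immaterial compared with the non-uniformity issue discussed below.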
We argue that normal or Student's approximation to the distribution of Student's statistic is not automatically applicable. We suggest performing prior checks in order to find out if a particular (not necessarily normal) approximation to the distribution of the test statistic is applicable. This leads to a generalisation of the T -test that allows for non-conventional approximating distributions. We discuss implications for the choice of critical levels.
Section 2 addresses the question of validity of the T -test uniformly over a class of distributions with a finite variance. Section 3 presents an example of non-normal approximation as well as an estimate of the accuracy of such approximation in terms of the total variation distance. The approximating distribution appears new in the literature on the topic. Section 4 suggests a generalisation of the T -test. Proofs are postponed to section 5.
The aim of this article is to show that normal or Student's approximation to the distribution of Student's statistic is not automatically applicable, and the test can be misleading.
Textbooks effectively suggest applying the $T$-test as a non-parametric test; the class of distributions considered applicable is effectively the class of all distributions with finite variances. We show below that the use of the $T$-test is not justified in such generality, even in the case of testing a simple hypothesis $H_0 = \{\mathbb{E}X = a\}$ against a simple alternative $H_A = \{\mathbb{E}X = b\}$ under the assumption that $\mathrm{var}\,X < \infty$.
W.l.o.g. we may assume in the sequel that $a = 0$. Let ${\cal P}_n$ denote the class of distributions ${\cal L}(X_1, \dots, X_n)$ of random vectors $(X_1, \dots, X_n)$ such that $X, X_1, \dots, X_n$ are independent and identically distributed bounded random variables with $\mathbb{E}X = 0$, $\mathbb{E}X^2 = 1$. The use of the normal approximation in the $T$-test would be justified if the normal approximation held uniformly in the class ${\cal P}_n$.
We show below that the normal approximation is not applicable uniformly in the class ${\cal P}_n$. In particular, there exists an absolute constant $c > 0$ such that (10) holds for any $n > 3$. A similar result holds if $\Phi$ in (10) is replaced with $\Psi_n$ or $\Psi_{n-k}$, where $k \in \mathbb{N}$. A comparison of (8*) with Linnik's result (6) suggests that ${\cal L}(t_n^*)$ has "better" asymptotic properties than ${\cal L}(\zeta_n)$. However, it has been noticed in [11] that ${\cal L}(t_n^*)$ has certain disadvantages compared with ${\cal L}(\zeta_n)$. In particular, a non-uniform Berry-Esseen-type inequality does not hold for Student's statistic as $n \to \infty$ for a particular sequence $\{x_n\}$ such that $0 \le x_n \ll \sqrt{n}$ (though a modified non-uniform Berry-Esseen-type inequality is valid; see Theorem 12.24 in [12]).
Theorems 1 and 2 below answer that question. In particular, we show that the $T$-test is not applicable uniformly over ${\cal P}_n$ regardless of the size of the sample. In other words, the outcome of the test can be misleading even for large samples.

Theorem 1 For any $n > 3$, (12) holds. If $\{x_n\}$ is a non-decreasing sequence of positive numbers such that $1 \ll x_n \le \sqrt{n}$ as $n \to \infty$, then (13) holds as well.

A similar result is valid if the normal approximation to ${\cal L}(t_n^*)$ is replaced with Student's approximation. Denote $\Psi_n^c = 1 - \Psi_n$, $\psi_n = \Psi_n'$.
Theorem 2 If $\{x_n\}$ is a non-decreasing sequence of positive numbers such that $1 \ll x_n \ll \sqrt{n}$ as $n \to \infty$, then (14) holds. The result remains valid if $\Psi_n$ in (14) is replaced with $\Psi_{n-k}$, where $k$ is a fixed natural number.

An example of non-normal approximation
It may be counter-intuitive to expect that the Poisson (or binomial) distribution plays any role in the study of the properties of the $T$-test, but Proposition 3 below states that it may.
In this section we present an example of a non-normal/non-Student's approximation to ${\cal L}(t_n)$ and ${\cal L}(t_n^*)$, and evaluate the accuracy of that approximation. The example highlights the fact that the limiting distribution of Student's statistic may take the value $\infty$ with positive probability.
Given r.v.s $Y$ and $Z$, we denote by $d_{TV}(Y;Z) \equiv d_{TV}({\cal L}(Y);{\cal L}(Z))$ the total variation distance between ${\cal L}(Y)$ and ${\cal L}(Z)$.
Weak convergence (18) holds in this example. In situations where ${\cal L}(t_n^*)$ can be approximated by ${\cal L}(Y)$ or ${\cal L}(Y_\lambda)$, the "asymptotic approach" suggests that the critical values $c_-$, $c_+$ be chosen according to the corresponding equations with $\lambda = np$ replaced by its consistent estimator (the "two-sided" test); the "sub-asymptotic approach" suggests incorporating estimate (17).
A possible alternative to the distribution ${\cal L}(X)$ given by (16) is a two-valued distribution with probabilities $p$ and $q$; in that case the probability $\mathbb{P}(c_- \le t_n^*(X_1', \dots, X_n') \le c_+)$ of the type-II error is a probability of large deviations for the binomial distribution.
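The phenomenon behind this section can be seen in a short simulation. The sketch below is ours and the specific distribution is our assumption (a standardised Bernoulli law of the kind commonly used with self-normalised sums, not necessarily the paper's (16)): with $X = (\xi - p)/\sqrt{pq}$, $\xi \sim \mathrm{Bernoulli}(p)$, $p = \lambda/n$, the event "no successes in the sample" has probability close to $e^{-\lambda}$, and on that event all $X_i$ coincide and $t_n^* = -\sqrt{n}$, so ${\cal L}(t_n^*)$ keeps an atom far from $N(0;1)$ however large $n$ is.

```python
# Monte Carlo sketch: L(t_n^*) has an atom at -sqrt(n) of probability
# roughly e^{-lambda} when X is a standardised Bernoulli(lambda/n) r.v.
import math
import random

random.seed(2)
lam, n, trials = 1.0, 500, 5000
p = lam / n
q = 1 - p
atom = 0
for _ in range(trials):
    xs = [((1 if random.random() < p else 0) - p) / math.sqrt(p * q)
          for _ in range(n)]
    t_star = sum(xs) / math.sqrt(sum(x * x for x in xs))
    if t_star <= -math.sqrt(n) + 1e-6:   # sample with no successes
        atom += 1
print(atom / trials, math.exp(-lam))     # both close to e^{-1}
```

On the atom event Student's statistic $t_n$ is undefined (zero sample variance), which illustrates how the limiting distribution can charge the point $\infty$.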

A generalised test
The $T$-test relies on the validity of the normal (or Student's) approximation to ${\cal L}(t_n)$. The common impression is that ${\cal L}(t_n)$ is close to the standard normal distribution if the sample size $n$ is large (see, e.g., Lehmann [6], p. 205).
It is known that the limiting distribution of $t_n$ is not always normal (see Mason [9]). In this section we suggest a generalised $T$-test. The idea is to check first whether a particular approximation (not necessarily normal or Student's) is applicable. The latter can be done using sharp estimates of the accuracy of approximation.
Thus, the generalised $T$-test requires (1) a list of possible limiting/approximating distributions; (2) sharp estimates of the accuracy of approximation of ${\cal L}(t_n)$ by the corresponding distributions; (3) estimation of certain quantities involved in those accuracy estimates (e.g., estimation of $\sigma$ and $\mathbb{E}|X|^3$ in the case of the normal approximation).
Traditionally, the obvious candidate for the approximating distribution is the standard normal law. One can employ the following approximate bound on the uniform distance between ${\cal L}(t_n^*)$ and $N(0;1)$ (cf. [11], Corollary 2): (19), where $\hat\mu_k$ denotes a consistent estimator of $\mu_k := \mathbb{E}|X - \mathbb{E}X|^k$, $k \ge 1$; we denote $\hat\sigma^2 := \hat\mu_2$. Bound (19) is based on the estimate of the accuracy of the normal approximation to ${\cal L}(t_n^*)$ from [11], which seems to be the sharpest available in the case of i.i.d. observations (cf. the discussion in [17], Remarks 4.16-4.17).
The bound in [11] involves a term (say, $\gamma_n$) of order $o(n^{-1/2})$. In applications the moments $\{\mu_k\}$ have to be replaced by their consistent estimators, generating an extra error. Therefore, it is reasonable to omit the term $\gamma_n$, arriving at (19).
The use of normal approximation can be justified if the right-hand side (r.h.s.) of (19) is less than a certain small number (say, ε) specified by a statistician (e.g., ε = 0.01).
Since the limiting distribution of t n may differ from N (0; 1) (cf. Proposition 3), we suggest that one first checks if a particular (not necessarily normal) approximation to the distribution of the test statistic t n is applicable.
One may have a number of bounds of this type, where $\{F_k\}$ are the d.f.s of certain candidate distributions and $r_n(k)$ is the corresponding accuracy estimate. It is natural to choose $k = k^*$ such that $r_n(k^*) = \min_k r_n(k)$. Obviously, one needs a list of possible limiting/approximating distributions together with the corresponding estimates of the accuracy of approximation with explicit constants. Such a list will always be finite, but until recently only the normal and Student's distributions were on the list. Proposition 3 adds another candidate to the list. Note that one can face a situation where no distribution from the list has its accuracy estimate $r_n(k)$ below the specified threshold level $\varepsilon$ (i.e., $\min_k r_n(k) > \varepsilon$). That would mean the $T$-test is not applicable (either because of a small sample size or because the list is too short).
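The selection rule just described can be sketched schematically. In the sketch below (ours; the numerical bounds are hypothetical placeholders, not the paper's estimates), each candidate distribution comes with its accuracy bound $r_n(k)$, and the procedure either picks the best candidate or reports that the test is not applicable:

```python
# Schematic selection rule for the generalised T-test: given accuracy
# bounds r_n(k) for candidate approximating distributions F_k, pick the
# candidate with the smallest bound, or report inapplicability when even
# the best bound exceeds the threshold eps.

def choose_approximation(bounds, eps):
    """bounds: dict mapping a candidate's name to its accuracy bound r_n(k).
    Returns the best candidate's name, or None if no candidate is acceptable."""
    best = min(bounds, key=bounds.get)
    return best if bounds[best] <= eps else None

# Hypothetical numbers purely for illustration:
bounds = {"normal": 0.04, "student": 0.035, "poisson-type": 0.008}
print(choose_approximation(bounds, eps=0.01))   # "poisson-type"
print(choose_approximation(bounds, eps=0.001))  # None: test not applicable
```

In practice each bound would itself be computed from estimated moments (cf. (19)), so the returned verdict depends on the sample as well as on the list of candidates.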

Proofs
Since $t_n$ and $t_n^*$ are scale-invariant, w.l.o.g. we may assume in the sequel that $\mathrm{var}\,X = 1$. Below, multiplication takes precedence over division (i.e., $a/bc$ means $a/(bc)$).
The proofs of Theorems 1 and 2 use the fact that ${\cal L}(t_n)$ and ${\cal L}(t_n^*)$ are not stochastically bounded uniformly in ${\cal P}_n$.
It suffices to find i.i.d. bounded r.v.s $X, X_1, \dots, X_n$ such that $\mathbb{E}X = 0$, $\mathbb{E}X^2 = 1$, and (12) holds. We will employ distribution (16), which seems to play the role of a touchstone when one deals with self-normalised sums and Student's statistic.
Note that $\sqrt{2\pi/e} > 1.52$. Thus, (12) holds, in particular, if $n > 3$. Relation (13) follows from (27). The proof is complete. ✷

Remark 1. The statement of Theorem 1 can be reformulated for negative $x$ by switching from $\{X_i\}$ to $\{-X_i\}$: (12) then yields the corresponding bound for any $n > 3$. The statement of Theorem 2 is reformulated similarly as $n \to \infty$. Distribution (16) is not the only one that can be used in order to establish (10). For instance, let independent r.v.s $\tau$ and $\eta$ be independent of $\xi$, with ${\cal L}(\tau) = B(c/n)$, where $c \ge 0$, $\mathbb{E}\eta = 0$, $\mathbb{E}\eta^2 = 1$.

Conclusion.
We have shown that the $T$-test in its present form can be misleading even if the sample size is arbitrarily large: the normal or Student's approximation to the distribution of Student's statistic $t_n$ is not automatically applicable.
Note that the sample size is always finite; in applications it often cannot be increased either due to physical restrictions or because of cost considerations.
The paper suggests a generalisation of the T -test that involves checking for the appropriate approximating distribution, and requires estimates with explicit constants of the accuracy of approximation to L(t n ).
The list of possible limiting/approximating distributions may include, beyond the normal law, functions of Poisson, compound Poisson, and some other infinitely divisible laws (cf. (15)).
The problem of deriving estimates with explicit constants of the accuracy of the normal approximation to the distribution of a sum of r.v.s goes back to Tchebychef [22] and Liapunov [7]. It led to a vast literature with contributions from many renowned authors (see, e.g., references in [1,12,15,21]). The task of evaluating the accuracy of Poisson and compound Poisson approximation has also been addressed by many distinguished authors (see, e.g., references in [1,3,18,23]).
In the 1950s, Kolmogorov formulated the problem of evaluating the accuracy of approximation of the distribution of a sum of independent r.v.s by infinitely divisible laws.