# Complete Sufficient Statistics

## Sufficient Statistics

Suppose that $$\bs X = (X_1, X_2, \ldots, X_n)$$ is a random sample from a distribution with probability density function $$f_\theta$$, where $$\theta$$ is a parameter taking values in a set $$T$$. The parameter $$\theta$$ may be vector-valued. In statistics, sufficiency is the property possessed by a statistic, with respect to a parameter, "when no other statistic which can be calculated from the same sample provides any additional information as to the value of the parameter". This concept was introduced by R. A. Fisher in 1922.

Here is the formal definition: a statistic $$U = u(\bs X)$$ taking values in a set $$R$$ is sufficient for $$\theta$$ if the conditional distribution of $$\bs X$$ given $$U$$ does not depend on $$\theta \in T$$. Intuitively, $$U$$ contains all of the information about $$\theta$$ that is available in the entire data variable $$\bs X$$. If we can find a sufficient statistic $$\bs U$$ that takes values in $$\R^j$$, then we can reduce the original data vector $$\bs X$$ (whose dimension $$n$$ is usually large) to the vector of statistics $$\bs U$$ (whose dimension $$j$$ is usually much smaller) with no loss of information about the parameter. Typically, the sufficient statistic is a simple function of the data, such as the sum of the data points; however, a sufficient statistic does not have to be any simpler than the data itself.

The basic tool for finding sufficient statistics is the Fisher–Neyman factorization theorem: $$U = u(\bs X)$$ is sufficient for $$\theta$$ if and only if the probability density function of $$\bs X$$ factors as
$f_\theta(\bs x) = G[u(\bs x), \theta] \, r(\bs x), \quad \bs x \in S, \; \theta \in T$
for some functions $$G$$ and $$r$$. That is, the density depends on the data $$\bs x$$ only through the value $$u(\bs x)$$, apart from a factor that does not involve $$\theta$$. One consequence is that if $$U$$ is sufficient for $$\theta$$, then there exists a maximum likelihood estimator $$V$$ that is a function of $$U$$, since the likelihood depends on the data only through $$U$$.
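As a quick numerical illustration of the factorization criterion (a sketch; the sample values and the grid of $$p$$ values are arbitrary choices, not from the text), two Bernoulli samples with the same sum have identical likelihoods for every value of $$p$$, so the likelihood depends on the data only through $$y = \sum_i x_i$$:

```python
def bernoulli_likelihood(x, p):
    """Joint pmf p^y (1 - p)^(n - y) of an i.i.d. Bernoulli(p) sample x."""
    y = sum(x)
    return p ** y * (1 - p) ** (len(x) - y)

# Two distinct samples of size 5 with the same sufficient statistic y = 2:
x1 = (1, 1, 0, 0, 0)
x2 = (0, 0, 1, 0, 1)

for p in (0.1, 0.37, 0.5, 0.9):
    assert bernoulli_likelihood(x1, p) == bernoulli_likelihood(x2, p)
```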
## Completeness

In statistics, completeness is a property of a statistic in relation to a model for a set of observed data. A statistic $$T$$ is complete for the family $$\{P_\theta : \theta \in T\}$$ if, for every measurable function $$g$$,
${\text{if }} \E_{\theta}[g(T)] = 0 {\text{ for all }} \theta {\text{ then }} \P_{\theta}[g(T) = 0] = 1 {\text{ for all }} \theta$
Informally, completeness means that the family of distributions of $$T$$ is rich enough that the only function of $$T$$ with identically zero expected value is the function that is zero with probability 1. An important consequence is uniqueness of unbiased estimation: two unbiased estimators of the same parameter that are functions of the same complete statistic are equal almost everywhere (i.e., except on a set of probability 0).

Completeness depends very much on the parameter space. For example, for a single Bernoulli observation $$T = X$$ with the parameter space reduced to the single point $$\{0.5\}$$, the function $$g(t) = t - 0.5$$ has expected value 0 but is not zero with probability 1, so $$T$$ is not complete. For a genuinely incomplete family, suppose that $$X$$ has the uniform distribution on $$(\theta, \theta + 1)$$ with $$\theta \in \R$$. Then
$\E_{\theta}(\sin 2 \pi X) = \int_{\theta}^{\theta+1} \sin (2 \pi x) \, \mathrm{d}x = 0 \quad \text{for all } \theta$
yet $$\sin 2 \pi X$$ is not zero with probability 1, so $$X$$ is not complete for $$\theta$$. The importance of completeness for estimation comes from the Lehmann–Scheffé theorem, discussed in a later section: a statistic that is a function of a complete sufficient statistic is the UMVUE of its expected value.
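The vanishing integral in the uniform counterexample can be checked numerically. Here is a throwaway sketch (the midpoint rule and the grid of $$\theta$$ values are arbitrary choices):

```python
import math

def expected_sin(theta, panels=4096):
    """Midpoint-rule approximation of E[sin(2*pi*X)] for X ~ Uniform(theta, theta + 1)."""
    h = 1.0 / panels
    return h * sum(math.sin(2 * math.pi * (theta + (i + 0.5) * h))
                   for i in range(panels))

# The integral of sin over one full period vanishes for every theta:
for theta in (-1.3, 0.0, 0.25, 2.7):
    assert abs(expected_sin(theta)) < 1e-9
```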
## Minimal Sufficiency

Naturally, we would like to find the sufficient statistic $$U$$ that has the smallest dimension possible. A sufficient statistic $$T$$ is minimal if, for every sufficient statistic $$T'$$ and every $$\bs x, \bs y \in S$$, $$T(\bs x) = T(\bs y)$$ whenever $$T'(\bs x) = T'(\bs y)$$. In other words, $$S(\bs X)$$ is minimal sufficient if and only if $$S(\bs X)$$ is sufficient, and if $$T(\bs X)$$ is sufficient then there exists a function $$f$$ such that $$S(\bs X) = f[T(\bs X)]$$. A minimal sufficient statistic can thus be represented as a function of any other sufficient statistic.

The definition precisely captures the notion of minimal sufficiency, but it is hard to apply directly. The following criterion is more useful: $$U = u(\bs X)$$ is minimal sufficient if
$\frac{f_\theta(\bs x)}{f_\theta(\bs{y})} \text{ is independent of } \theta \text{ if and only if } u(\bs x) = u(\bs{y})$
Minimal sufficiency is preserved under equivalence: if $$U$$ and $$V$$ are equivalent statistics (each a function of the other) and $$U$$ is minimally sufficient for $$\theta$$, then $$V$$ is minimally sufficient for $$\theta$$; the same is true of completeness. A minimal sufficient statistic exists under mild conditions; a case in which there is no minimal sufficient statistic was given by Bahadur in 1957.
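The likelihood-ratio criterion can be probed numerically. Here is a sketch for a Poisson sample (the particular samples and the grid of $$\theta$$ values are arbitrary choices): the ratio of joint densities is free of $$\theta$$ exactly when the two samples have the same sum, reflecting that $$Y = \sum_i X_i$$ is minimal sufficient.

```python
import math

def poisson_joint(x, theta):
    """Joint pmf of an i.i.d. Poisson(theta) sample x."""
    return math.prod(math.exp(-theta) * theta ** xi / math.factorial(xi)
                     for xi in x)

def ratio_free_of_theta(x, y, thetas=(0.5, 1.0, 2.0, 5.0)):
    """True when f_theta(x) / f_theta(y) takes the same value for every theta."""
    ratios = [poisson_joint(x, t) / poisson_joint(y, t) for t in thetas]
    return max(ratios) - min(ratios) < 1e-12 * max(ratios)

assert ratio_free_of_theta((3, 1, 0), (0, 2, 2))       # equal sums: 4 = 4
assert not ratio_free_of_theta((3, 1, 0), (1, 1, 1))   # unequal sums: 4 vs 3
```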
## Bounded Completeness

A statistic $$T$$ is boundedly complete if the defining implication of completeness holds when only bounded functions $$g$$ are considered. Bounded completeness is a weaker property: a complete statistic is boundedly complete, but not conversely. Bounded completeness occurs in Basu's theorem (see Basu, 1955), which states that a statistic that is both boundedly complete and sufficient is independent of any ancillary statistic (a statistic whose distribution does not depend on $$\theta$$). Bounded completeness also occurs in Bahadur's theorem: if a sufficient statistic is boundedly complete, then it is minimal sufficient. The concept is perhaps best understood through this last result, since it shows that completeness forces a sufficient statistic to achieve the maximal reduction of the data. For some parametric families, however, a complete sufficient statistic does not exist (for example, see Galili and Meilijson 2016); this happens often when the dimension of the minimal sufficient statistic exceeds the dimension of the parameter.
## Example: Sampling Without Replacement

Suppose that a population of $$N$$ objects contains $$r$$ objects of type 1, and that $$\bs X = (X_1, X_2, \ldots, X_n)$$ records the types in an ordered sample of size $$n$$ drawn without replacement, so that each $$X_i \in \{0, 1\}$$. The statistic $$Y = \sum_{i=1}^n X_i$$ is the number of type 1 objects in the sample, and $$Y$$ has the hypergeometric distribution with parameters $$N$$, $$r$$, and $$n$$. Specifically, for $$y \in \{\max\{0, n + r - N\}, \ldots, \min\{n, r\}\}$$, the conditional distribution of $$\bs X$$ given $$Y = y$$ is uniform on the set of points
$D_y = \left\{(x_1, x_2, \ldots, x_n) \in \{0, 1\}^n: x_1 + x_2 + \cdots + x_n = y\right\}$
Indeed, writing $$a^{(k)} = a (a - 1) \cdots (a - k + 1)$$ for the falling power,
\[ \P(\bs X = \bs x \mid Y = y) = \frac{\P(\bs X = \bs x)}{\P(Y = y)} = \frac{r^{(y)} (N - r)^{(n-y)}/N^{(n)}}{\binom{n}{y} r^{(y)} (N - r)^{(n - y)} / N^{(n)}} = \frac{1}{\binom{n}{y}}, \quad \bs x \in D_y \]
Since this conditional distribution does not depend on $$r$$, the statistic $$Y$$ is sufficient for $$r$$. The hypergeometric distribution is studied in more detail in the chapter on Finite Sampling Models.
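The conditional uniformity can be verified exactly by enumeration. Here is a sketch (the values $$N = 10$$, $$r = 4$$, $$n = 5$$, $$y = 2$$ are arbitrary choices) that computes the conditional probability of every arrangement in $$D_y$$ and checks that each equals $$1 / \binom{n}{y}$$:

```python
from itertools import permutations
from math import comb

def falling(a, k):
    """Falling power a^(k) = a (a - 1) ... (a - k + 1)."""
    out = 1
    for i in range(k):
        out *= a - i
    return out

def ordered_pmf(x, N, r):
    """P(X = x) for an ordered sample without replacement (type-1 indicator vector x)."""
    n, y = len(x), sum(x)
    return falling(r, y) * falling(N - r, n - y) / falling(N, n)

N, r, n, y = 10, 4, 5, 2
D_y = set(permutations((1,) * y + (0,) * (n - y)))    # all arrangements with sum y
p_y = sum(ordered_pmf(x, N, r) for x in D_y)          # P(Y = y)
for x in D_y:
    assert abs(ordered_pmf(x, N, r) / p_y - 1 / comb(n, y)) < 1e-12
```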
## Example: Bernoulli Trials

Suppose that $$\bs X = (X_1, X_2, \ldots, X_n)$$ is a random sample of size $$n$$ from the Bernoulli distribution with parameter $$p \in (0, 1)$$, so that the common probability density function is
$g(x) = p^x (1 - p)^{1-x}, \quad x \in \{0, 1\}$
With $$y = x_1 + x_2 + \cdots + x_n$$, the joint density is
$f(\bs x) = g(x_1) g(x_2) \cdots g(x_n) = p^y (1 - p)^{n-y}, \quad \bs x = (x_1, x_2, \ldots, x_n) \in \{0, 1\}^n$
which depends on $$\bs x$$ only through $$y$$, so $$Y = \sum_{i=1}^n X_i$$ is sufficient for $$p$$ by the factorization theorem. Recall that $$Y$$ has the binomial distribution with parameters $$n$$ and $$p$$, with probability density function
$h(y) = \binom{n}{y} p^y (1 - p)^{n - y}, \quad y \in \{0, 1, \ldots, n\}$
In fact $$Y$$ is complete and minimally sufficient, and the sample mean $$M = Y / n$$ is the UMVUE of $$p$$. Recall also that the sample variance can be written as
$S^2 = \frac{1}{n - 1} \sum_{i=1}^n X_i^2 - \frac{n}{n - 1} M^2$
Since $$X_i^2 = X_i$$ for indicator variables, this reduces to $$S^2 = \frac{n}{n-1} M (1 - M)$$, a function of $$Y$$. Since $$S^2$$ is an unbiased estimator of the distribution variance $$p (1 - p)$$, it is the UMVUE of $$p (1 - p)$$ for $$p \in (0, 1)$$.

In Bayesian analysis, the usual approach is to model $$p$$ with a random variable $$P$$ that has a prior beta distribution with left parameter $$a \in (0, \infty)$$ and right parameter $$b \in (0, \infty)$$. The posterior distribution of $$P$$ given $$\bs X$$ is beta with left parameter $$a + Y$$ and right parameter $$b + (n - Y)$$, so the posterior depends on the data only through the sufficient statistic $$Y$$, as the factorization theorem guarantees.
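The identity $$S^2 = \frac{n}{n-1} M (1 - M)$$ for indicator data is easy to verify numerically. A minimal sketch (the 0/1 sample is an arbitrary choice):

```python
def sample_mean(x):
    return sum(x) / len(x)

def sample_variance(x):
    """Unbiased sample variance with divisor n - 1."""
    n, m = len(x), sample_mean(x)
    return sum((xi - m) ** 2 for xi in x) / (n - 1)

x = [1, 0, 0, 1, 1, 0, 1, 1]          # Bernoulli data: x_i^2 = x_i
n, m = len(x), sample_mean(x)
assert abs(sample_variance(x) - n / (n - 1) * m * (1 - m)) < 1e-12
```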
## Example: The Poisson Distribution

Suppose that $$\bs X = (X_1, X_2, \ldots, X_n)$$ is a random sample from the Poisson distribution with parameter $$\theta \in (0, \infty)$$. The joint density is
$f_\theta(\bs x) = e^{-n \theta} \frac{\theta^y}{x_1! \, x_2! \cdots x_n!}, \quad y = x_1 + x_2 + \cdots + x_n$
so $$Y = \sum_{i=1}^n X_i$$ is sufficient for $$\theta$$ by the factorization theorem. Recall that $$Y$$ also has the Poisson distribution, but with parameter $$n \theta$$. Moreover, $$Y$$ is complete: if $$r: \{0, 1, 2, \ldots\} \to \R$$, then
$\E_\theta\left[r(Y)\right] = \sum_{y=0}^\infty e^{-n \theta} \frac{(n \theta)^y}{y!} \, r(y)$
If this is zero for all $$\theta \in (0, \infty)$$, then the power series $$\sum_{y=0}^\infty \frac{n^y r(y)}{y!} \theta^y$$ vanishes identically, so every coefficient is zero and hence $$r(y) = 0$$ for all $$y$$. Thus the sample mean $$M = Y / n$$ is the UMVUE of $$\theta$$. The Poisson distribution is studied in more detail in the chapter on the Poisson process.

## Example: The Gamma Distribution

Suppose that $$\bs X = (X_1, X_2, \ldots, X_n)$$ is a random sample from the gamma distribution with shape parameter $$k$$ and scale parameter $$b$$. The joint density is
$f(\bs x) = g(x_1) g(x_2) \cdots g(x_n) = \frac{1}{\Gamma^n(k) b^{nk}} (x_1 x_2 \cdots x_n)^{k-1} e^{-(x_1 + x_2 + \cdots + x_n) / b}, \quad \bs x = (x_1, x_2, \ldots, x_n) \in (0, \infty)^n$
which depends on the data only through the sum and the product. Hence, with $$M = \frac{1}{n} \sum_{i=1}^n X_i$$ the sample mean and $$U = (X_1 X_2 \cdots X_n)^{1/n}$$ the sample geometric mean, the pair $$(M, U)$$ is minimally sufficient for $$(k, b)$$. The sum $$Y = \sum_{i=1}^n X_i$$ has the gamma distribution with shape parameter $$n k$$ and scale parameter $$b$$. If the shape parameter $$k$$ is known, $$\frac{1}{k} M$$ is both the method of moments estimator of $$b$$ and the maximum likelihood estimator on the parameter space $$(0, \infty)$$; it is a function of the sufficient statistics, as it must be. Run the gamma estimation experiment 1000 times with various values of the parameters and the sample size $$n$$, and compare the estimates of the parameters.

## Example: The Uniform Distribution

Recall that the continuous uniform distribution on the interval $$[a, a + h]$$, where $$a \in \R$$ is the location parameter and $$h \in (0, \infty)$$ is the scale parameter, has probability density function
$g(x) = \frac{1}{h}, \quad x \in [a, a + h]$
For a random sample of size $$n$$, the joint density is $$h^{-n}$$ on the event $$\left\{a \le x_{(1)} \le x_{(n)} \le a + h\right\}$$, where $$x_{(1)}$$ and $$x_{(n)}$$ are the minimum and maximum of the sample. It then follows from the factorization theorem that the pair of order statistics $$\left(X_{(1)}, X_{(n)}\right)$$ is sufficient for $$(a, h)$$.
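To see concretely that the gamma likelihood depends on the data only through the sum and product, here is a sketch that constructs two different samples with matching sum and product (the target values 7 and 8 and the grid of $$(k, b)$$ pairs are arbitrary choices) and compares their joint densities:

```python
import math

def gamma_pdf(x, k, b):
    """Density of the gamma distribution with shape k and scale b."""
    return x ** (k - 1) * math.exp(-x / b) / (math.gamma(k) * b ** k)

def joint(xs, k, b):
    return math.prod(gamma_pdf(x, k, b) for x in xs)

# Build a second sample with the same sum (7) and product (8) as x1:
x1 = (1.0, 2.0, 4.0)
a = 1.5
s, p = 7.0 - a, 8.0 / a               # remaining two values: sum s, product p
d = math.sqrt(s * s - 4 * p)
x2 = (a, (s - d) / 2, (s + d) / 2)

for k, b in ((0.5, 1.0), (2.0, 3.0), (5.0, 0.7)):
    assert abs(joint(x1, k, b) / joint(x2, k, b) - 1) < 1e-9
```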
## The Rao–Blackwell and Lehmann–Scheffé Theorems

Rao–Blackwell theorem: suppose that $$W$$ is an unbiased estimator of a real parameter $$\lambda = \lambda(\theta)$$ and that $$U$$ is sufficient for $$\theta$$. Then $$\E_\theta(W \mid U)$$ is a valid statistic (by sufficiency, it does not depend on $$\theta$$), is unbiased for $$\lambda$$, and has variance no larger than that of $$W$$. Conditioning an unbiased estimator on a sufficient statistic can therefore never make it worse.

Lehmann–Scheffé theorem: suppose that $$U$$ is sufficient and complete for $$\theta$$ and that $$V = r(U)$$ is an unbiased estimator of a real parameter $$\lambda = \lambda(\theta)$$. Then $$V$$ is a uniformly minimum variance unbiased estimator (UMVUE) of $$\lambda$$, and it is the unique UMVUE. Indeed, if $$W$$ is any unbiased estimator of $$\lambda$$, then $$\E_\theta(W \mid U)$$ is a function of $$U$$ that is unbiased for $$\lambda$$; by completeness it must agree with $$V$$ with probability 1, and by the Rao–Blackwell theorem its variance is no larger than that of $$W$$. In particular, every statistic that is a function of a complete sufficient statistic is the UMVUE of its expected value, while an unbiased estimator that is not a function of the complete sufficient statistic (the sample median, in many models) suffers from a loss of information and cannot be a UMVUE. It can also be shown that a complete and sufficient statistic is minimal sufficient (Theorem 6.2.28); conversely, if a minimal sufficient statistic exists, then any complete sufficient statistic is also minimal sufficient.

As an example, suppose that $$\bs X = (X_1, X_2, \ldots, X_n)$$ is a random sample from the Pareto distribution with shape parameter $$a$$ and scale parameter $$b$$, so that the joint density is
$f(\bs x) = g(x_1) g(x_2) \cdots g(x_n) = \frac{a^n b^{n a}}{(x_1 x_2 \cdots x_n)^{a + 1}} \bs{1}\left(x_{(1)} \ge b\right), \quad (x_1, x_2, \ldots, x_n) \in (0, \infty)^n$
Then $$\left(P, X_{(1)}\right)$$ is minimally sufficient for $$(a, b)$$, where $$P = \prod_{i=1}^n X_i$$ is the product of the sample variables and $$X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}$$ is the first order statistic. The proof also shows that $$P$$ is sufficient for $$a$$ if $$b$$ is known (which is often the case), and that $$X_{(1)}$$ is sufficient for $$b$$ if $$a$$ is known (much less likely).

For the beta distribution with left parameter $$a$$ and right parameter $$b$$, the sufficient statistics are $$(P, Q)$$ where $$P = \prod_{i=1}^n X_i$$ and $$Q = \prod_{i=1}^n (1 - X_i)$$. If $$b$$ is known, the method of moments estimator of $$a$$ is $$U_b = b M / (1 - M)$$, while if $$a$$ is known, the method of moments estimator of $$b$$ is $$V_a = a (1 - M) / M$$. Neither of these estimators is a function of the sufficient statistics $$(P, Q)$$, so both suffer from a loss of information. On the other hand, if $$b = 1$$, the maximum likelihood estimator of $$a$$ on the interval $$(0, \infty)$$ is $$W = -n / \sum_{i=1}^n \ln X_i$$, which is a function of $$P$$ (as it must be).
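The variance reduction promised by the Rao–Blackwell theorem is easy to see in simulation. In the sketch below (the parameter values, sample size, and replication count are arbitrary choices), the naive unbiased estimator $$V = X_1$$ of a Poisson mean $$\theta$$ is Rao–Blackwellized to $$\E(X_1 \mid Y) = Y / n = M$$:

```python
import math
import random

random.seed(2)

def poisson_draw(theta):
    """Knuth's simple Poisson generator."""
    limit, k, p = math.exp(-theta), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

theta, n, reps = 3.0, 10, 4000
naive, rb = [], []
for _ in range(reps):
    x = [poisson_draw(theta) for _ in range(n)]
    naive.append(x[0])        # V = X_1: unbiased but ignores most of the data
    rb.append(sum(x) / n)     # E(V | Y) = Y / n = M: the Rao-Blackwellized version

def variance(a):
    m = sum(a) / len(a)
    return sum((v - m) ** 2 for v in a) / len(a)

assert variance(rb) < variance(naive)   # conditioning never increases variance
```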
## Ancillary Statistics and Exponential Families

A statistic is ancillary if its distribution does not depend on the parameter $$\theta$$, and first-order ancillary if its expected value does not depend on $$\theta$$. In a sense, an ancillary statistic is the opposite of a sufficient statistic: by itself, it contains no information about the parameter. Recall Basu's theorem from above: a boundedly complete sufficient statistic is independent of every ancillary statistic.

In an exponential family of full rank, the natural sufficient statistic is complete, so complete sufficient statistics exist for most of the standard parametric models treated above (Bernoulli, Poisson, gamma, beta, normal). Statistics that are minimal sufficient but not complete are also plentiful. For example, suppose that $$\bs X = (X_1, X_2, \ldots, X_n)$$ is a random sample from the uniform distribution on $$(\theta, \theta + 1)$$, where $$\theta \in \mathbb R$$. The pair of order statistics $$\left(X_{(1)}, X_{(n)}\right)$$ is minimal sufficient, but the range $$X_{(n)} - X_{(1)}$$ is ancillary, so the minimal sufficient statistic is not complete, and estimation procedures based on it do not enjoy the uniqueness guarantees of the Lehmann–Scheffé theorem.

In the estimation experiments above, try the problems yourself before looking at the solutions, and compare the method of moments and maximum likelihood estimates in terms of empirical bias and mean square error for different values of the parameters and the sample size.
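As a concrete illustration of the exponential-family form (a sketch using the Poisson family; the grids of $$\theta$$ and $$x$$ values are arbitrary choices), the Poisson pmf can be written as $$h(x) \exp[\eta \, T(x) - A(\eta)]$$ with natural parameter $$\eta = \log \theta$$, natural sufficient statistic $$T(x) = x$$, and $$A(\eta) = e^\eta = \theta$$:

```python
import math

def poisson_pmf(x, theta):
    return math.exp(-theta) * theta ** x / math.factorial(x)

def expfam_pmf(x, theta):
    """h(x) * exp(eta * T(x) - A(eta)) with eta = log(theta), T(x) = x, A(eta) = theta."""
    eta = math.log(theta)
    return (1 / math.factorial(x)) * math.exp(eta * x - theta)

# The two parameterizations agree on a grid of values:
for theta in (0.5, 2.0, 7.3):
    for x in range(12):
        assert math.isclose(poisson_pmf(x, theta), expfam_pmf(x, theta), rel_tol=1e-12)
```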