Talk:Sufficient statistic

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Jheald (talk | contribs) at 19:35, 16 October 2006 (Notation; and the difference between T and θ). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Hello everybody,

I want to discuss sufficient statistics here. If anybody has good information on sufficient statistics and knows how they are used, please discuss it here. Thanks.

Do you have specific questions? Michael Hardy 19:53, 7 May 2005 (UTC)

Examples

The two examples have h(x)=1. It might be better if at least one of them did not. --Henrygb 17:43, 23 May 2006 (UTC)

I agree. An i.i.d. sample from a Poisson distribution would do it. I'll be back.... Michael Hardy 23:40, 23 May 2006 (UTC)
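A minimal Python sketch of the Poisson case suggested above, assuming an i.i.d. sample X_1, …, X_n from Poisson(λ) and T = ΣX_i. Here the factorisation is f(x|λ) = h(x)·g(T(x), λ) with h(x) = ∏ 1/x_i!, which is not identically 1, and g(t, λ) = λ^t·e^(−nλ). The helper names (joint_pmf, h, g, conditional_given_sum) are hypothetical, chosen only for illustration. The last function checks numerically that Pr(X = x | T = t, λ) does not depend on λ, using the fact that T is Poisson(nλ).

```python
import math

def poisson_pmf(k, lam):
    """Pr(K = k) for K ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def joint_pmf(xs, lam):
    """Pr(X = xs | lam) for an i.i.d. Poisson(lam) sample xs."""
    p = 1.0
    for k in xs:
        p *= poisson_pmf(k, lam)
    return p

def h(xs):
    """Hypothetical name for the lam-free factor h(x) = prod 1/x_i! (not identically 1)."""
    out = 1.0
    for k in xs:
        out /= math.factorial(k)
    return out

def g(t, n, lam):
    """Hypothetical name for the factor g(T(x), lam) = lam**t * exp(-n*lam)."""
    return lam**t * math.exp(-n * lam)

def conditional_given_sum(xs, lam):
    """Pr(X = xs | T = sum(xs), lam); T = sum of n i.i.d. Poissons is Poisson(n*lam)."""
    n, t = len(xs), sum(xs)
    return joint_pmf(xs, lam) / poisson_pmf(t, n * lam)

xs = (2, 0, 3)
for lam in (0.5, 1.0, 4.0):
    # The factorisation holds, and the conditional probability is the same for every lam.
    print(lam, joint_pmf(xs, lam), h(xs) * g(sum(xs), len(xs), lam),
          conditional_given_sum(xs, lam))
```

For xs = (2, 0, 3) the conditional probability should be the multinomial value 5!/(2!·0!·3!)·3^(−5) = 10/243 for every λ, up to floating-point rounding, which is what makes T sufficient even though h(x) ≠ 1.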

Confusing notation

The notation of the conditional probabilities in section Mathematical definition is confusing or confused. You'd expect that Pr(A|B,C) = Pr(A|C,B). So why is it the case that Pr(x|t,θ) = Pr(x|t), but not Pr(x|t,θ) = Pr(x|θ)? The non-standard notation Pr(X=x|T(X)=t,θ) is not explained. Is θ the parameter itself (a variable), or is it the value of the parameter? The example does not help. It suggests that the parameter is p. But, clearly, the joint probability distribution (given as a density) depends on p, so if we take the "precise definition" literally, this is not a sufficient statistic. Whatever is gained by the shorthand notation is nullified, or worse, by the lack of explanation. --Lambiam (talk) 21:47, 13 October 2006 (UTC)

I've reverted the cut material, because IMO the derivation that was cut makes it much clearer where the factorisation criterion comes from.
While some may frown on it, it's not exactly unusual for lower case letters to be used for random variables, with starred variables (e.g. θ*) used to indicate a variable that is being held at some 'special' value. This notation is usefully succinct and readable; IMO it's no bad thing for WP readers to come across it from time to time. Jheald 13:43, 16 October 2006 (UTC)
Strictly speaking, I suppose the use of the lower case letters implies that the (lower case) values of the (upper case) random variables are themselves being treated as variables; in practice I suspect the prevalence of lower case (and it is prevalent) may be as much because the lower case forms are easier on the eye and less out of the ordinary, and therefore make the equations quicker to read and easier to assimilate (not negligible advantages).
But it seems to me the real point for Lambiam is elsewhere. Pr(x|t,θ) is indeed completely equivalent to Pr(x|θ,t), because of the conventions of the notation. However, Pr(x|t,θ) = Pr(x|t) does not reflect any such general necessity. It holds for all values of x, t and θ if and only if T is a sufficient statistic for θ. And "T is a sufficient statistic for θ" is different from saying "θ is a sufficient statistic for T": requiring Pr(x|t,θ) = Pr(x|t) is very different from requiring Pr(x|θ,t) = Pr(x|θ).
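To make the asymmetry concrete, here is a small Python sketch under an assumed model: an i.i.d. Bernoulli(p) sequence with T = ΣX_i (a hypothetical choice for illustration; the article's own example may be set up differently). It computes Pr(x | t, p) as Pr(x | p) / Pr(t | p), which is valid because t = T(x) is determined by x. The function names are illustrative only.

```python
from math import comb

def pr_x_given_p(x, p):
    """Pr(X = x | p) for a sequence x of i.i.d. Bernoulli(p) outcomes."""
    t = sum(x)
    return p**t * (1 - p)**(len(x) - t)

def pr_t_given_p(t, n, p):
    """Pr(T = t | p), where T = sum of the X_i is Binomial(n, p)."""
    return comb(n, t) * p**t * (1 - p)**(n - t)

def pr_x_given_t_and_p(x, p):
    """Pr(X = x | T = sum(x), p); a plain ratio, since x already fixes t."""
    return pr_x_given_p(x, p) / pr_t_given_p(sum(x), len(x), p)

x = (1, 0, 1, 1)  # n = 4, t = 3
for p in (0.2, 0.5, 0.9):
    # Columns: p, Pr(x | t, p), Pr(x | p)
    print(p, pr_x_given_t_and_p(x, p), pr_x_given_p(x, p))
```

The middle column should equal 1/C(4,3) = 0.25 for every p (up to floating-point rounding), so Pr(x|t,p) = Pr(x|t). The last column, Pr(x|p), changes with p and differs from Pr(x|t,p), so the reverse requirement Pr(x|θ,t) = Pr(x|θ) fails.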
The difference is perhaps clearest in the Fisher decomposition, P(x,t|θ) = P(x|t)·P(t|θ), which holds if T is a sufficient statistic. This tells us that t contains everything that x has to tell us about θ. But θ does not necessarily contain everything that x has to tell us about t. Jheald 19:35, 16 October 2006 (UTC)
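The factorisation can be derived in a few lines from the definition, assuming discrete x and writing t = T(x); because t is a function of x, the event X = x already fixes T = t. The last line adds a prior P(θ), an assumption introduced here only to show what "t contains everything that x has to tell us about θ" means (use an integral instead of the sum for continuous θ):

```latex
\begin{aligned}
P(x \mid \theta) &= P(x, t \mid \theta)
  && \text{because } t = T(x) \text{ is determined by } x \\
&= P(x \mid t, \theta)\, P(t \mid \theta)
  && \text{chain rule} \\
&= P(x \mid t)\, P(t \mid \theta)
  && \text{sufficiency of } T \text{ for } \theta \\[4pt]
P(\theta \mid x) &= \frac{P(x \mid t)\, P(t \mid \theta)\, P(\theta)}
                        {\sum_{\theta'} P(x \mid t)\, P(t \mid \theta')\, P(\theta')}
  = \frac{P(t \mid \theta)\, P(\theta)}{P(t)} = P(\theta \mid t)
\end{aligned}
```

Reading h(x) = P(x|t) and g(T(x), θ) = P(t|θ) recovers the familiar form P(x|θ) = h(x)·g(T(x), θ). Nothing in the derivation makes P(t|x, θ) collapse to P(t|θ) in the other direction, which is the asymmetry described above.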