# Two-parameter exponential family

Exponential families are important in Bayesian statistics. The terms "distribution" and "family" are often used loosely: properly, an exponential family is a set of distributions, where the specific distribution varies with the parameter; however, a parametric family of distributions is often referred to as "a distribution" (like "the normal distribution", meaning "the family of normal distributions"), and the set of all exponential families is sometimes loosely referred to as "the" exponential family.

A one-parameter exponential family has a monotone non-decreasing likelihood ratio in the sufficient statistic T(x), provided that η(θ) is non-decreasing. An exponential family fails to be identifiable if there are two distinct canonical parameter values such that the density of one with respect to the other is equal to one with probability one. The log-partition function is the cumulant generating function of the sufficient statistic. The parameters of a conjugate prior are hyperparameters (parameters controlling parameters).

Exercises: Characterize all bivariate distributions with Pareto conditionals. Assume that x1, x2, …, xn is a sample of n observations coming from an m-parameter exponential family. Use Theorem 4.11 to solve the functional equation. Use Theorem 4.12 to solve the functional equation.

The variational Bayes (VB) methodology itself does not stray far from techniques already developed in the literature [25,37]. For standard problems, typical software packages exist, so there would be no motivation to discuss them in the current work. The models we consider in this chapter largely fall under the umbrella of semiparametric regression.
Examples of nonstandard situations include, but are not limited to, outliers, heteroscedastic noise, overdispersed count data and missing data. In this chapter we give a tutorial-style introduction to VB for fitting nonstandard flexible regression methods in the above cases.

Here ψ is the digamma function (the derivative of log gamma), and the reverse substitutions were used in the last step. To convert between the representations involving the two types of parameter, one writes one type of parameter in terms of the other. If the parameters of a two-parameter exponential family of distributions may be taken to be location and scale parameters, then the distributions must be normal.

Exponential families form the basis for the distribution functions used in generalized linear models, a class of models that encompasses many of the commonly used regression models in statistics. Next, consider the case of a normal distribution with unknown mean and unknown variance.

Let (X1, X2, …, Xn) be the order statistics of an independent and identically distributed sample of size n coming from a given population. If F is discrete, then H is a step function (with steps on the support of F).

The log-partition function is written in various forms in the table, to facilitate differentiation and back-substitution. Many of the standard results for exponential families do not apply to curved exponential families.
The binomial distribution is a one-parameter exponential family in the success parameter $$p \in [0, 1]$$ for a fixed value of the trial parameter $$n \in \mathbb{N}_+$$.

We believe that VB can still be useful in the context of statistical prediction and exploratory data analysis, and where decisions need to be made within a short time frame. Semiparametric regression is a rich field which combines traditional parametric regression models with more modern nonparametric regression methods.

Examples: The normal distribution, N(μ, σ²), treating σ² as a nuisance parameter, belongs to the exponential family. Its sample space is X = ℝ. With ϕ known, we have A1 = 0 and A2 = A3 = 45μ/ϕ.

An important subclass of exponential families are the natural exponential families, which have a similar form for the moment-generating function for the distribution of x. Alternative forms parameterize the log-partition function in terms of the normal parameter θ rather than the natural parameter.

Multiparameter exponential families. Not surprisingly, a multi-parameter exponential family F is a multi-parameter family of distributions of the form

P_η(dx) = exp{ηᵗ t(x) − A(η)} m0(dx), η ∈ ℝ^p,

for some reference measure m0 on the sample space.
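The claim that the binomial family with fixed n is a one-parameter exponential family can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes the usual choices η = log(p / (1 − p)), T(k) = k, h(k) = C(n, k) and A(η) = n log(1 + e^η), which require p strictly between 0 and 1 so that η is finite.

```python
import math


def binomial_pmf(k, n, p):
    """Textbook binomial pmf: C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_pmf_expfam(k, n, p):
    """Same pmf written as h(k) * exp(eta * T(k) - A(eta)).

    Assumed (illustrative) parameterization:
      eta    = log(p / (1 - p))      natural parameter
      T(k)   = k                     sufficient statistic
      h(k)   = C(n, k)               base measure
      A(eta) = n * log(1 + e^eta)    log-partition function
    """
    eta = math.log(p / (1 - p))
    log_partition = n * math.log1p(math.exp(eta))
    return math.comb(n, k) * math.exp(eta * k - log_partition)


if __name__ == "__main__":
    n, p = 10, 0.3
    for k in range(n + 1):
        print(k, binomial_pmf(k, n, p), binomial_pmf_expfam(k, n, p))
```

The two columns printed by the script should agree up to floating-point rounding for every k from 0 to n.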
(However, a form of this sort is a member of a curved exponential family, which allows multiple factorized terms in the exponent.) The vector-parameter form over a single scalar-valued random variable can be trivially expanded to cover a joint distribution over a vector of random variables.

We assume that each component of Y has a distribution in the exponential family, taking the form

f_Y(y; θ, ϕ) = exp{(yθ − b(θ))/a(ϕ) + c(y, ϕ)}    (2.4)

for some specific functions a(·), b(·) and c(·). The function g is automatically determined once the other functions have been chosen, so that the entire distribution is normalized. Note that an exponential family is determined by the pair (t(X), m0). The integrals involved are taken with respect to the reference measure of the exponential family generated by H.

The multinomial pmf of (X, Y, Z) can be rewritten, using Y = S − X, Z = T − X and the formulas for pAB, pABc, pAcB and pAcBc, as the pmf of (X, S, T). This gives the UMP unbiased level α test for H0: Δ ≥ 1 vs H1: Δ < 1 (i.e., for H0: θ ≤ 0 vs H1: θ > 0). The UMP unbiased level α test for H0: pAB = pApB (independence) vs H1: pAB ≠ pApB (i.e., H0: θ = 0 vs H1: θ ≠ 0) is obtained by the same approach. This is known as the Fisher-Irwin test (also called the "Fisher exact test"), which is formally the same as the test obtained in Example 6.9.2. We want the UMP unbiased level α test for H0: π1 = π2 vs H1: π1 ≠ π2.

The interest lies in testing the null hypothesis H0: ϕ = ϕ0 against Ha: ϕ ≠ ϕ0, where ϕ0 is a fixed value.
(Hint: Consider the joint probability density function, g(u, v), of X1 and X2 − X1.)

(Paul Vos, Qiang Wu, in Handbook of Statistics, 2018.)

The first two raw moments and all mixed second moments can be recovered from these two identities. For example, if one is estimating the success probability of a binomial distribution, then if one chooses to use a beta distribution as one's prior, the posterior is another beta distribution. Consider now a collection of observable quantities (random variables) Ti.

Now, for η2, we first need to expand the part of the log-partition function that involves the multivariate gamma function. This latter formula is listed in the Wishart distribution article.

In the special case that η(θ) = θ and T(x) = x, the family is called a natural exponential family.

However, when the complications above arise, standard VB methodology is not straightforward to apply. Typically VB methods underestimate posterior variances, and as such, their use in the context of inference is sometimes questionable.

The A's, the first three approximate moments, and the Bartlett-type corrected statistic coincide with those obtained for the Pareto distribution.

In general, distributions that result from a finite or infinite mixture of other distributions, such as mixture model densities and compound probability distributions, are not exponential families. As a counterexample if these conditions are relaxed, the family of uniform distributions (either discrete or continuous, with either or both bounds unknown) has a sufficient statistic, namely the sample maximum, sample minimum, and sample size, but does not form an exponential family, as the domain varies with the parameters.

The parameter space is ℝ × ℝ₊, where ℝ₊ = {x : x > 0}.
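The beta-binomial conjugacy mentioned above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the prior is assumed to be Beta(a, b) in the standard shape parameterization, so that observing s successes in n trials gives the posterior Beta(a + s, b + n − s).

```python
import math


def beta_binomial_update(a, b, successes, trials):
    """Posterior Beta parameters after observing binomial data.

    Assumes a Beta(a, b) prior on the success probability.
    """
    return a + successes, b + (trials - successes)


def log_beta_pdf(p, a, b):
    """Log density of Beta(a, b) at p, for 0 < p < 1."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_norm


def log_binomial_likelihood(p, successes, trials):
    """Log of the binomial likelihood of p given the observed counts."""
    return (
        math.log(math.comb(trials, successes))
        + successes * math.log(p)
        + (trials - successes) * math.log(1 - p)
    )


if __name__ == "__main__":
    a, b, s, n = 2.0, 3.0, 7, 10
    post_a, post_b = beta_binomial_update(a, b, s, n)
    # log(prior * likelihood) - log(posterior) should not depend on p:
    for p in (0.2, 0.5, 0.8):
        gap = log_beta_pdf(p, a, b) + log_binomial_likelihood(p, s, n)
        gap -= log_beta_pdf(p, post_a, post_b)
        print(p, gap)
```

The printed gap is the log of the normalizing constant; it being the same at every p is what "the posterior is another beta distribution" means in practice.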
By using the constraint on the natural parameters, the formula for the normal parameters in terms of the natural parameters can be written in a way that is independent of the constant that is added. Variant 3 shows how to make the parameters identifiable in a convenient way.

Let X be a random variable or vector with sample space X ⊂ ℝ^q and probability model P_θ. If η(θ) = θ, then the exponential family is said to be in canonical form.

Examples: It is critical, when considering the examples in this section, to remember the discussion above about what it means to say that a "distribution" is an exponential family, and in particular to keep in mind that the set of parameters that are allowed to vary is critical in determining whether a "distribution" is or is not an exponential family. Examples include the normal distribution, Normal(μ, σ²), and the truncated extreme value distribution (ϕ > 0, x > 0).

According to the Pitman–Koopman–Darmois theorem, among families of probability distributions whose domain does not vary with the parameter being estimated, only in exponential families is there a sufficient statistic whose dimension remains bounded as sample size increases.

The dimension k of the random variable need not match the dimension d of the parameter vector, nor (in the case of a curved exponential family) the dimension s of the natural parameter.

Now, the moment-generating function of T(x) is

M_T(u) = E[exp(uᵗ T(x)) | η] = exp(A(η + u) − A(η)),

where t means transpose, proving the earlier statement that the cumulant generating function of T is K(u | η) = A(η + u) − A(η).
For example, the Pareto distribution has a pdf which is defined for x ≥ x_m, where x_m is the scale parameter. Because the support depends on the value of the parameter, the family of Pareto distributions does not form an exponential family unless x_m is held fixed.

Hence, by "nonstandard" we mean semiparametric regression models which deal with some modelling complication and as such fall outside the conventional setup in which the response distributions are in the one-parameter exponential family and all data are cleanly observed. We show here how some simple tricks can be brought to bear to handle these complications.

The functions T(x), η(θ) and A(η) play a significant role in the resulting probability distribution.

The two-parameter exponential family has the form

p(y | θ, ϕ) = a(y) exp{ϕ[θ d1(y) + d2(y)] − ρ(θ, ϕ)}, y ∈ Υ ⊂ ℝ,    (1)

where a(·) is a non-negative function, d1(·) and d2(·) are known real functions, (θ, ϕ) ∈ Θ × Φ ⊆ ℝ × ℝ₊ and exp{ρ(θ, ϕ)} = ∫ a(y) exp{ϕ[θ d1(y) + d2(y)]} dy < ∞.

The families of binomial and multinomial distributions with fixed number of trials n but unknown probability parameter(s) are exponential families. Exponential distributions are used extensively in the field of life-testing.

The canonical form is non-unique, since η(θ) can be multiplied by any nonzero constant, provided that T(x) is multiplied by that constant's reciprocal, or a constant c can be added to η(θ) and h(x) multiplied by exp[−c · T(x)] to offset it.

The relative entropy (Kullback–Leibler divergence, KL divergence) of two distributions in an exponential family has a simple expression as the Bregman divergence between the natural parameters with respect to the log-normalizer.
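The statement about relative entropy can be illustrated with the Poisson family, which is not discussed in the text and is used here only as a convenient assumed example. With natural parameter η = log λ and log-normalizer A(η) = e^η, the Bregman form is KL(p1 ‖ p2) = A(η2) − A(η1) − (η2 − η1) A′(η1). The sketch below, with hypothetical function names, compares that expression with a direct (truncated) summation of the KL divergence.

```python
import math


def poisson_log_pmf(k, lam):
    """Log pmf of Poisson(lam) at the non-negative integer k."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def kl_direct(lam1, lam2, k_max=200):
    """KL(Poisson(lam1) || Poisson(lam2)) by summing over k = 0..k_max.

    The truncation at k_max is an approximation; it is negligible for
    small rates such as those used in the example below.
    """
    total = 0.0
    for k in range(k_max + 1):
        log_p1 = poisson_log_pmf(k, lam1)
        total += math.exp(log_p1) * (log_p1 - poisson_log_pmf(k, lam2))
    return total


def kl_bregman(lam1, lam2):
    """Same divergence as a Bregman divergence of the log-normalizer.

    For the Poisson family, A(eta) = exp(eta) and A'(eta) = exp(eta).
    """
    eta1, eta2 = math.log(lam1), math.log(lam2)
    return math.exp(eta2) - math.exp(eta1) - (eta2 - eta1) * math.exp(eta1)


if __name__ == "__main__":
    print(kl_direct(3.0, 5.0), kl_bregman(3.0, 5.0))
```

Both numbers should agree closely; the Bregman form needs only the two natural parameters and the log-normalizer, with no summation over the sample space.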
The actual data points themselves are not needed, and all sets of data points with the same sufficient statistic will have the same distribution.

The entropy of dF(x) relative to dH(x) is

S[dF | dH] = −∫ (dF/dH) log(dF/dH) dH = ∫ log(dH/dF) dF,

where dF/dH and dH/dF are Radon–Nikodym derivatives.

In this section, we will study a two-parameter family of distributions that has special importance in reliability. The posterior will then have to be computed by numerical methods.

Despite this shortcoming, VB has been shown to be an effective approach to several practical problems, including document retrieval, functional magnetic resonance imaging [11,23] and cluster analysis for gene expression data.

Applications of the Gamma distribution appear in many fields including insurance claims and genetics. Additional applications come from the fact that the exponential distribution and chi-squared distributions are special cases of the Gamma distribution. The sample space is X = ℝ₊. Applications of the Beta distribution include the theory of order statistics and Bayesian inference. This distribution describes many types of data and plays a central role in statistical inference.

Examples of common distributions that are not exponential families are Student's t, most mixture distributions, and even the family of uniform distributions when the bounds are not fixed. In closing this section, we remark that other notable distributions that are not exponential families include the Cauchy distributions and their generalizations, the Student's t-distributions.

As in Example 6.9.2, we reparametrize and transform the data, then reparametrize further to put the problem in the framework of a two-parameter exponential family. This yields the UMP unbiased level α test for H0: θ = θ0 vs H1: θ ≠ θ0 (τ being a nuisance parameter).
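The remark that the exponential and chi-squared distributions are special cases of the Gamma distribution can be verified pointwise. The sketch below assumes a shape/rate parameterization of the Gamma density (the text writes Gamma(α, β) without saying which convention it uses), and the function names are illustrative. Under that assumption, Gamma(1, λ) is the exponential distribution with rate λ and Gamma(ν/2, 1/2) is the chi-squared distribution with ν degrees of freedom.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density at x > 0 in the (assumed) shape/rate parameterization."""
    log_pdf = (
        shape * math.log(rate)
        + (shape - 1) * math.log(x)
        - rate * x
        - math.lgamma(shape)
    )
    return math.exp(log_pdf)


def exponential_pdf(x, rate):
    """Exponential density with the given rate at x > 0."""
    return rate * math.exp(-rate * x)


def chi_squared_pdf(x, dof):
    """Chi-squared density with dof degrees of freedom at x > 0."""
    half = dof / 2.0
    log_pdf = (
        (half - 1) * math.log(x)
        - x / 2.0
        - half * math.log(2.0)
        - math.lgamma(half)
    )
    return math.exp(log_pdf)


if __name__ == "__main__":
    x = 1.7
    print(gamma_pdf(x, 1.0, 0.8), exponential_pdf(x, 0.8))
    print(gamma_pdf(x, 2.5, 0.5), chi_squared_pdf(x, 5))
```

Each printed pair should match up to rounding.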
We have A1 = 0, A2 = 18, A3 = 20, E(ST) = 1, VAR(ST) = 2 + 6/n, and μ3(ST) = 8 + 112/n. The first three approximate moments of ST are E(ST) = 1, VAR(ST) = 2 + 15μ/(nϕ), and μ3(ST) = 8 + 270μ/(nϕ). From Eqs. (3.4)–(3.6), A1, A2 and A3 can be written in terms of α′, α″, α‴, β′, β″ and β‴.

(P.K. Bhattacharya, Prabir Burman, in Theory and Methods of Statistics, 2016.)

Note in particular that the univariate Gaussian distribution is a two-parameter distribution and that its sufficient statistic is a vector.

The frequencies of AB, AcB, ABc, and AcBc in n trials are given in Table 6.1, known as a 2 × 2 contingency table.

A one-parameter exponential family has densities of the form f(x | θ) = h(x) exp[η(θ) T(x) − A(θ)], where T(x), h(x), η(θ), and A(θ) are known functions. That is, the value of the sufficient statistic is sufficient to completely determine the posterior distribution. In general, any non-negative function f(x) that serves as the kernel of a probability distribution (the part encoding all dependence on x) can be made into a proper distribution by normalizing.

The family of negative binomial distributions with fixed number of failures (a.k.a. stopping-time parameter) r is an exponential family. Other parameters present in the distribution that are not of any interest, or that are already calculated in advance, are called nuisance parameters. Another example is the Laplace distribution (θ > 0, k ∈ ℝ, k known, x ∈ ℝ). The parameter space is ℝ₊ × ℝ₊.

Characterize all bivariate distributions such that one family of conditionals is gamma and the other is normal.

In standard exponential families, the derivatives of the log-partition function correspond to the moments (more technically, the cumulants) of the sufficient statistics, e.g. the mean and variance.
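Two of the statements above (the Gaussian has a vector-valued sufficient statistic, and derivatives of the log-partition function give moments) can be checked together. The sketch assumes the common parameterization T(x) = (x, x²), η = (μ/σ², −1/(2σ²)), h(x) = 1/√(2π) and A(η) = −η1²/(4η2) − ½ log(−2η2); the function names and the finite-difference step are illustrative choices, not part of the text.

```python
import math


def natural_params(mu, sigma2):
    """Map (mu, sigma^2) to the assumed natural parameters (eta1, eta2)."""
    return mu / sigma2, -1.0 / (2.0 * sigma2)


def log_partition(eta1, eta2):
    """A(eta) for the normal family with T(x) = (x, x^2); needs eta2 < 0."""
    return -(eta1**2) / (4.0 * eta2) - 0.5 * math.log(-2.0 * eta2)


def normal_pdf_expfam(x, mu, sigma2):
    """Normal density written as h(x) * exp(eta . T(x) - A(eta))."""
    eta1, eta2 = natural_params(mu, sigma2)
    h = 1.0 / math.sqrt(2.0 * math.pi)
    return h * math.exp(eta1 * x + eta2 * x * x - log_partition(eta1, eta2))


def normal_pdf(x, mu, sigma2):
    """Textbook normal density, for comparison."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def grad_log_partition(eta1, eta2, step=1e-6):
    """Central-difference gradient of A; approximates (E[x], E[x^2])."""
    d1 = (log_partition(eta1 + step, eta2) - log_partition(eta1 - step, eta2)) / (2 * step)
    d2 = (log_partition(eta1, eta2 + step) - log_partition(eta1, eta2 - step)) / (2 * step)
    return d1, d2


if __name__ == "__main__":
    mu, sigma2 = 1.5, 2.0
    print(normal_pdf(0.7, mu, sigma2), normal_pdf_expfam(0.7, mu, sigma2))
    # Expected to be close to (mu, mu^2 + sigma^2):
    print(grad_log_partition(*natural_params(mu, sigma2)))
```

The gradient of A recovers the first two raw moments of x, which is the vector analogue of "differentiate the log-partition function to get the mean".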
The function A is important in its own right, because the mean, variance and other moments of the sufficient statistic T(x) can be derived simply by differentiating A(η).

Suppose X1, ..., Xn are independent, identically distributed random variables. Only if their distribution is one of the exponential family of distributions is there a sufficient statistic T(X1, ..., Xn) whose number of scalar components does not increase as the sample size n increases; the statistic T may be a vector or a single scalar number, but whatever it is, its size will neither grow nor shrink when more data are obtained.

Exponential families arise naturally as the answer to the following question: what is the maximum-entropy distribution consistent with given constraints on expected values?

In the notation of Eqs. (3.4)–(3.6), κϕϕϕ = −(2α″β′ + α′β″), where primes denote derivatives with respect to ϕ.

Sections 6.4 to 6.7 describe our approaches for handling outliers, heteroscedastic noise, overdispersed count data and missing data, respectively.