The simplest and most important distribution in survival studies is the exponential distribution. In the late 1940s, researchers began to choose the exponential distribution to describe the life pattern of electronic systems. The exponential distribution has since continued to play a role in lifetime studies analogous to that of the normal distribution in other areas of statistics. The exponential distribution is often referred to as a purely random failure pattern. It is famous for its unique "lack of memory," which requires that the age of the animal or person does not affect future survival. Although many survival data cannot be described adequately by the exponential distribution, an understanding of it facilitates the treatment of more general situations.
The exponential distribution is characterized by a constant hazard rate 𝜆, its only parameter. A high 𝜆 value indicates high risk and short survival; a low 𝜆 value indicates low risk and long survival. When the survival time T follows the exponential distribution with parameter 𝜆, the probability density function is defined as
f(t) = 𝜆exp(-𝜆t), t ≥ 0, 𝜆 > 0
The cumulative distribution function is
F(t) = 1 - exp(-𝜆t), t ≥ 0
and the survivorship function is then
S(t) = exp(-𝜆t), t ≥ 0
and the hazard function is
h(t) = 𝜆, t ≥ 0
Note that the hazard function is a constant, 𝜆, independent of t. Because the exponential distribution is characterized by a constant hazard rate, independent of the age of the person, there is no aging or wearing out, and failure or death is a random event independent of time. Taking natural logarithms of the survivorship function gives log S(t) = -𝜆t, which is a linear function of t.
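The constant hazard, the linearity of log S(t), and the lack of memory can all be checked numerically. The following is a minimal sketch, assuming the standard exponential forms f(t) = 𝜆exp(-𝜆t) and S(t) = exp(-𝜆t) implied by the constant hazard; the function names and the value 𝜆 = 0.5 are illustrative choices, not taken from the text.

```python
import math


def exp_pdf(t, lam):
    """Density f(t) = lam * exp(-lam * t) for t >= 0."""
    return lam * math.exp(-lam * t)


def exp_survival(t, lam):
    """Survivorship function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def exp_hazard(t, lam):
    """Hazard h(t) = f(t) / S(t); for the exponential this equals lam."""
    return exp_pdf(t, lam) / exp_survival(t, lam)


def conditional_survival(s, t, lam):
    """P(T > s + t | T > s): chance of surviving t more units at age s."""
    return exp_survival(s + t, lam) / exp_survival(s, lam)


lam = 0.5  # illustrative hazard rate (assumption)

for t in (0.5, 2.0, 10.0):
    # The hazard stays at lam, and log S(t) falls on the line -lam * t.
    print(t, exp_hazard(t, lam), math.log(exp_survival(t, lam)))

# Lack of memory: surviving 2 more units does not depend on current age.
print(conditional_survival(3.0, 2.0, lam), exp_survival(2.0, lam))
```

The last line prints the same probability twice: a subject who has already survived 3 time units has the same chance of surviving 2 more as a new subject has of surviving 2.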
The Weibull distribution is a generalization of the exponential distribution. However, unlike the exponential distribution, it does not assume a constant hazard rate and therefore has broader application. The Weibull distribution is characterized by two parameters, 𝛾 and 𝜆. The value of 𝛾 determines the shape of the distribution curve and the value of 𝜆 determines its scaling. Consequently, 𝛾 and 𝜆 are called the shape and scale parameters, respectively. When 𝛾 = 1, the hazard rate remains constant as time increases; this is the exponential case. As t increases, the hazard rate increases when 𝛾 > 1 and decreases when 𝛾 < 1. Thus, the Weibull distribution may be used to model the survival distribution of a population with increasing, decreasing, or constant risk.
The probability density function, cumulative distribution function, survivorship function, and hazard function are, respectively:
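Several Weibull parametrizations are in use. The form sketched here is an assumption, chosen because 𝜆 then acts as a pure rescaling of time and the expressions reduce to the exponential case when 𝛾 = 1:

```latex
\begin{align*}
f(t) &= \lambda\gamma(\lambda t)^{\gamma-1}\exp\!\left[-(\lambda t)^{\gamma}\right],
      \qquad t \ge 0,\ \gamma > 0,\ \lambda > 0,\\
F(t) &= 1-\exp\!\left[-(\lambda t)^{\gamma}\right],\\
S(t) &= \exp\!\left[-(\lambda t)^{\gamma}\right],\\
h(t) &= \frac{f(t)}{S(t)} = \lambda\gamma(\lambda t)^{\gamma-1}.
\end{align*}
```

Under this form the hazard is proportional to t raised to the power 𝛾 - 1, so it increases for 𝛾 > 1, decreases for 𝛾 < 1, and equals 𝜆 for 𝛾 = 1.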
The Weibull distribution is named after the Swedish mathematician Waloddi Weibull, who described it in detail in 1951, although it was first identified by Fréchet and first applied by Rosin and Rammler to describe a particle size distribution.
In its simplest form the lognormal distribution can be defined as the distribution of a variable whose logarithm follows the normal distribution. Its origin may be traced as far back as 1879, when McAlister explicitly described a theory of the distribution. Most of its aspects have since been under study. Gaddum gave a review of its application in biology, followed by Boag's applications in cancer research. Its history, properties, estimation problems, and uses in economics have been discussed in detail by Aitchison and Brown. Later, other investigators also observed that the age at onset of Alzheimer's disease and the distribution of survival time of several diseases such as Hodgkin's disease and chronic leukemia could be rather closely approximated by a lognormal distribution, since they are markedly skewed to the right and the logarithms of survival times are approximately normally distributed.
Consider the survival time T such that log T is normally distributed with mean 𝜇 and variance 𝜎². We then say that T is lognormally distributed and write T as 𝛬(𝜇, 𝜎²). It should be noted that 𝜇 and 𝜎² are not the mean and variance of the lognormal distribution; they are the mean and variance of log T. The hazard function of the lognormal distribution increases initially to a maximum and then decreases (almost as soon as the median is passed) to zero as time approaches infinity. Therefore, the lognormal distribution is suitable for survival patterns with an initially increasing and then decreasing hazard rate. By a central limit theorem, it can be shown that the distribution of the product of n independent positive variates approaches a lognormal distribution under very general conditions: for example, the distribution of the size of an organism whose growth is subject to many small impulses, the effect of each of which is proportional to the momentary size of the organism.
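The rise and fall of the lognormal hazard, and the difference between 𝜇 and the mean of T, can be illustrated with a short sketch. It assumes natural logarithms and uses the standard lognormal results that the median is exp(𝜇) and the mean is exp(𝜇 + 𝜎²/2); the function names and the values 𝜇 = 0, 𝜎 = 1 are illustrative assumptions.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_sf(z):
    """Upper-tail probability of the standard normal, via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lognormal_hazard(t, mu, sigma):
    """Hazard h(t) = f(t) / S(t) when log T is Normal(mu, sigma^2)."""
    z = (math.log(t) - mu) / sigma
    density = normal_pdf(z) / (t * sigma)
    survival = normal_sf(z)
    return density / survival


def lognormal_median(mu):
    """Median of T is exp(mu)."""
    return math.exp(mu)


def lognormal_mean(mu, sigma):
    """Mean of T is exp(mu + sigma^2 / 2), not mu."""
    return math.exp(mu + 0.5 * sigma ** 2)


mu, sigma = 0.0, 1.0  # illustrative values (assumption)

# The hazard first rises, then falls toward zero as t grows.
for t in (0.1, 0.6, 1.0, 5.0, 50.0):
    print(t, lognormal_hazard(t, mu, sigma))

print("median:", lognormal_median(mu), "mean:", lognormal_mean(mu, sigma))
```

For very large t the ratio in `lognormal_hazard` divides two tiny numbers, so a production implementation would want a more careful tail computation; the sketch is only meant for moderate t.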
The gamma distribution, which includes the exponential and chi-square distributions as special cases, was used long ago by Brown and Flood to describe the life of glass tumblers circulating in a cafeteria and by Birnbaum and Saunders as a statistical model for the life length of materials. Since then, this distribution has been used frequently as a model for industrial reliability problems and human survival.
Suppose that failure or death takes place in n stages or as soon as n subfailures have happened. At the end of the first stage, after time T1, the first subfailure occurs; after that the second stage begins and the second subfailure occurs after time T2; and so on. Total failure or death occurs at the end of the nth stage, when the nth subfailure happens. The survival time, T, is then T1 + T2 + … + Tn. The times T1, T2, …, Tn spent in each stage are assumed to be independently exponentially distributed with probability density function 𝜆exp(-𝜆ti), i = 1, …, n. That is, the subfailures occur independently at a constant rate 𝜆. The distribution of T is then called the Erlangian distribution. There is no need for the stages to have physical significance since we can always assume that death occurs in the n-stage process just described. This idea, introduced by A. K. Erlang in his study of congestion in telephone systems, has been used widely in queuing theory and life processes.
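The n-stage construction above can be simulated directly. The sketch below sums n independent exponential stage times and compares the result with the standard closed-form Erlang survivorship function, S(t) = exp(-𝜆t) Σ (𝜆t)^k / k! for k = 0, …, n - 1. The function names, the seed, and the values n = 3 and 𝜆 = 0.5 are illustrative assumptions.

```python
import math
import random


def erlang_survival(t, n, lam):
    """P(T > t) for the sum of n independent Exponential(lam) stage times."""
    terms = ((lam * t) ** k / math.factorial(k) for k in range(n))
    return math.exp(-lam * t) * sum(terms)


def simulate_survival_time(n, lam, rng):
    """Draw T = T1 + ... + Tn, each stage time exponential with rate lam."""
    return sum(rng.expovariate(lam) for _ in range(n))


rng = random.Random(12345)   # fixed seed so the run is repeatable
n_stages, lam = 3, 0.5       # illustrative values (assumption)

times = [simulate_survival_time(n_stages, lam, rng) for _ in range(20000)]

sample_mean = sum(times) / len(times)
t0 = 6.0
empirical = sum(1 for x in times if x > t0) / len(times)

# The theoretical mean of the sum is n / lam.
print("sample mean:", sample_mean, "theoretical mean:", n_stages / lam)
print("empirical S(t0):", empirical,
      "closed form S(t0):", erlang_survival(t0, n_stages, lam))
```

With n = 1 the closed form reduces to exp(-𝜆t), the exponential survivorship function.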
The gamma distribution is characterized by two parameters, 𝛾 and 𝜆. When 0 < 𝛾 < 1, there is negative aging and the hazard rate decreases monotonically from infinity to 𝜆 as time increases from 0 to infinity. When 𝛾 > 1, there is positive aging and the hazard rate increases monotonically from 0 to 𝜆 as time increases from 0 to infinity. When 𝛾 = 1, the hazard rate equals 𝜆, a constant, as in the exponential case.
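For reference, a density consistent with the limiting hazard 𝜆 described above is the rate parametrization of the gamma distribution. It is written here as an assumption about the intended form rather than as a quotation from the text:

```latex
\begin{align*}
f(t) &= \frac{\lambda}{\Gamma(\gamma)}(\lambda t)^{\gamma-1}e^{-\lambda t},
      \qquad t > 0,\ \gamma > 0,\ \lambda > 0,\\
\Gamma(\gamma) &= \int_{0}^{\infty} x^{\gamma-1}e^{-x}\,dx,\\
S(t) &= \int_{t}^{\infty} f(x)\,dx, \qquad h(t) = \frac{f(t)}{S(t)}.
\end{align*}
```

When 𝛾 is a positive integer n, Γ(n) = (n - 1)! and the density is that of the sum of n independent exponential stage times with rate 𝜆; for noninteger 𝛾 the survivorship function has no elementary closed form and is evaluated numerically.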
The survival time T has a log-logistic distribution if log(T) has a logistic distribution. The density, survivorship, hazard, and cumulative hazard functions of the log-logistic distribution are, respectively,
The log-logistic distribution is characterized by two parameters, 𝛼 and 𝛾. The median of the log-logistic distribution is 𝛼^(-1/𝛾). When 𝛾 > 1, the log-logistic hazard has the value 0 at time 0, increases to a peak at a specific t, and then declines, which is similar to the lognormal hazard. When 𝛾 = 1, the hazard starts at 𝛼^(1/𝛾) and then declines monotonically. When 𝛾 < 1, the hazard starts at infinity and then declines, which is similar to the Weibull hazard with a shape parameter below 1. In every case the hazard function declines toward 0 as t approaches infinity. Thus, the log-logistic distribution may be used to describe a first increasing and then decreasing hazard or a monotonically decreasing hazard.
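The four functions named above can be written out under a parametrization that reproduces both the median and the 𝛾 = 1 starting hazard just quoted. The specific form is an assumption inferred from those two properties:

```latex
\begin{align*}
f(t) &= \frac{\alpha\gamma t^{\gamma-1}}{\left(1+\alpha t^{\gamma}\right)^{2}},
      \qquad t \ge 0,\ \alpha > 0,\ \gamma > 0,\\
S(t) &= \frac{1}{1+\alpha t^{\gamma}},\\
h(t) &= \frac{f(t)}{S(t)} = \frac{\alpha\gamma t^{\gamma-1}}{1+\alpha t^{\gamma}},\\
H(t) &= -\log S(t) = \log\!\left(1+\alpha t^{\gamma}\right).
\end{align*}
```

Setting S(t) = 1/2 gives 𝛼t^𝛾 = 1, which recovers the stated median, and setting 𝛾 = 1 and t = 0 in h(t) gives a starting hazard of 𝛼.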
Other Survival Distributions
Many other distributions can be used as models of survival time, three of which we discuss briefly in this section: the linear-exponential, the Gompertz, and a distribution whose hazard rate is a step function. The linear-exponential model and the Gompertz distribution are extensions of the exponential distribution. Both describe survival patterns that have a constant initial hazard rate. The hazard rate varies as a linear function of time or age in the linear-exponential model and as an exponential function of time or age in the Gompertz distribution.
In demonstrating the use of the linear-exponential model, Broadbent uses as an example the service of milk bottles that are filled in a dairy, circulated to customers, and returned empty to the dairy. The model was also used by Carbone et al. to describe the survival pattern of patients with plasmacytic myeloma. The hazard function of the linear-exponential distribution is
h(t) = 𝜆 + 𝛾t
where 𝜆 and 𝛾 can be values such that h(t) is nonnegative. The hazard rate increases from 𝜆 with time if 𝛾 > 0, decreases if 𝛾 < 0, and remains constant (an exponential case) if 𝛾 = 0. The probability density function and the survivorship function are, respectively,
f(t) = (𝜆 + 𝛾t) exp[-(𝜆t + 𝛾t²/2)]
S(t) = exp[-(𝜆t + 𝛾t²/2)]
The Gompertz distribution is also characterized by two parameters, 𝜆 and 𝛾. The hazard function, survivorship function, and probability density function are, respectively,
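The Gompertz hazard is parametrized in more than one way in the literature. The sketch here assumes the log-linear form, in which the initial hazard is exp(𝜆) and 𝛾 controls how quickly the hazard grows or shrinks; it is one common convention, not necessarily the one intended above:

```latex
\begin{align*}
h(t) &= \exp(\lambda+\gamma t), \qquad t \ge 0,\ \gamma \ne 0,\\
S(t) &= \exp\!\left[-\int_{0}^{t} h(u)\,du\right]
      = \exp\!\left[-\frac{e^{\lambda}}{\gamma}\left(e^{\gamma t}-1\right)\right],\\
f(t) &= h(t)\,S(t)
      = \exp\!\left[(\lambda+\gamma t)-\frac{e^{\lambda}}{\gamma}\left(e^{\gamma t}-1\right)\right].
\end{align*}
```

As 𝛾 approaches 0 the hazard tends to the constant exp(𝜆) and S(t) tends to the exponential survivorship function with that rate.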
Finally, we consider a distribution where the hazard rate is a step function. The hazard rate, survivorship function, and probability density function are, respectively,
One application of this distribution is life-table analysis. In a life-table analysis, time is divided into intervals and the hazard rate is assumed to be constant within each interval. However, the overall hazard rate is not necessarily constant.
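A step-function hazard can be sketched in a few lines. The code below assumes left-closed intervals starting at time 0, with interior cut points `cuts` and one constant rate per interval in `rates`; these names, the interval convention, and the example values are illustrative assumptions. The survivorship function follows from S(t) = exp[-H(t)], where H(t) is the hazard accumulated over the intervals up to t.

```python
import math
from bisect import bisect_right


def step_hazard(t, cuts, rates):
    """Constant hazard of the interval that contains t."""
    return rates[bisect_right(cuts, t)]


def step_cumulative_hazard(t, cuts, rates):
    """H(t): sum of rate * (time spent in the interval) up to time t."""
    if len(rates) != len(cuts) + 1:
        raise ValueError("need exactly one rate per interval")
    total, left = 0.0, 0.0
    for right, rate in zip(list(cuts) + [math.inf], rates):
        if t <= right:
            return total + rate * (t - left)
        total += rate * (right - left)
        left = right
    return total


def step_survival(t, cuts, rates):
    """S(t) = exp(-H(t))."""
    return math.exp(-step_cumulative_hazard(t, cuts, rates))


def step_pdf(t, cuts, rates):
    """f(t) = h(t) * S(t)."""
    return step_hazard(t, cuts, rates) * step_survival(t, cuts, rates)


cuts = [1.0, 3.0]          # intervals [0, 1), [1, 3), [3, infinity)
rates = [0.1, 0.3, 0.2]    # one illustrative hazard rate per interval

for t in (0.5, 2.0, 5.0):
    print(t, step_hazard(t, cuts, rates), step_survival(t, cuts, rates))
```

With a single interval (an empty `cuts` list) the functions reduce to the exponential case, which matches the idea that the hazard is constant within each interval but not overall.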
The nine distributions described above are, among others, reasonable models for survival time distributions. All have been designed by considering a biological failure, a death process, or an aging property. They may or may not be appropriate for many practical situations, but the objective here is to illustrate the various possible techniques, assumptions, and arguments that can be used to choose the most appropriate model. If none of these distributions fits the data, investigators might have to derive an original model to suit the particular data, perhaps by using some of the ideas presented here.