Dr Ari Samaranayaka
BSc, MPhil, PhD
Senior Research Fellow and Biostatistician
Centre for Biostatistics
Division of Health Sciences
University of Otago

Associate Professor Robin Turner
BSc, MBiostat, PhD
Unit Director
Centre for Biostatistics
Division of Health Sciences
University of Otago

Dr Claire Cameron
BSc(Hons), DipGrad, MSc, PhD
Senior Research Fellow and Biostatistician
Centre for Biostatistics
Division of Health Sciences
University of Otago

Survival analysis investigates the time until the occurrence of an event. Often the event of interest is death; however, it can equally be the time to any other event, such as recovery from a condition, wait time for elective surgery, first recurrence of cancer after surgery, the intervals between successive births, or the time until equipment failure. Why is time-to-event analysis different from other analyses? Why can we not use other standard statistical tools like a t-test or least squares regression to analyse it? First, most of those analyses rely on the assumption of normally distributed residuals, which does not hold with time-to-event data because these data are always positive and sometimes highly skewed. For example, consider time to death after high-risk surgery: many patients may die shortly after surgery, and those who do survive may then live for a long time. Second, it is common, when measuring time-to-event outcomes, for events not to happen during the study period for some individuals, which means some observations are ‘censored’. Survival analysis using the Kaplan-Meier (K-M) survival estimator does not assume a specific distribution; it is therefore a nonparametric method. This means that neither the normality assumption nor the assumption that all outcomes are observed is required.

Before explaining the K-M estimator, let us look at some terminology. Imagine measuring the time to an event among a cohort of individuals. Not everyone will enter the study on the same date, so time zero for each individual is the day the person entered the study. The study period might end before all participants experience the event, and some people might drop out during the study. We will use the following terminology:

Censored: the event has not occurred, or the subject was not under observation when the event occurred.

Interval censoring: rather than observing the exact time of the event, we only observe that the event occurred between two known time points.

Followup period: the period during which the subject was under observation. Followup starts when the person enters the study and ends when the event occurs, the person leaves the study, or the study ends, whichever comes first. This period can therefore be shorter than the study period.

Survival function: the probability of surviving up to a particular time point, expressed as a function of time. This is different to the instantaneous probability of survival as a function of time.

Hazard rate: the instantaneous rate of an event occurring. This is also known as the failure rate, conditional failure rate, or hazard function. Unlike a probability, this rate has no upper bound.

Hazard ratio: the ratio of hazard rates corresponding to two levels of an explanatory variable. For example, in a drug study, the ratio of hazard rates among treated and control populations is used as a measure of the effect of the treatment.
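For readers who prefer symbols, the last three definitions can be written compactly. This is a sketch in conventional notation rather than notation used elsewhere in this article: T (the time to event for an individual), S(t), h(t), and the subscripts 1 and 0 (treated and control groups) are all labels assumed here for illustration.

$$
S(t) = \Pr(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad
\mathrm{HR} = \frac{h_1(t)}{h_0(t)}
$$

Because h(t) is a probability divided by a length of time, it is a rate and can exceed 1, which is why it has no upper bound.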

Presenting a survival function as a K-M plot is one way to describe a cohort’s survival time graphically. The focus of this article is to describe K-M plots, the circumstances in which they can be used, and how to interpret them correctly.

Table 1 gives a hypothetical example of survival times, in days and in ascending order, for two groups of people treated with two different procedures for the same condition. All 21 people in Group 1 and 11 of 20 people in Group 2 died during the followup period of 36 days. The other nine people in Group 2 were either lost to followup or alive at the end of the study, so their survival times are censored. The question is, how do we compare survival in these two groups? Comparing the mean survival times in the two groups (ignoring censoring), Group 1 (8.4 days) has about half the survival time of Group 2 (16.3 days). Alternatively, comparing the risk of dying, or the hazard, in the two groups (again, ignoring censoring), the mean hazard in Group 1 (21 deaths in 177 days of followup, or 0.119 deaths per day) is about 3.5 times that of Group 2 (11 deaths over 326 days of followup, or 0.034 deaths per day). Neither of these methods is satisfactory because they ignore the censored observations.
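The arithmetic behind these two naive comparisons can be reproduced from the totals quoted above. The short Python sketch below uses only those totals (group sizes, deaths, and total days of followup); the individual survival times in Table 1 are not reproduced, and the function and variable names are illustrative.

```python
# Totals quoted in the text; individual survival times are not reproduced here.
group1 = {"n": 21, "deaths": 21, "followup_days": 177}
group2 = {"n": 20, "deaths": 11, "followup_days": 326}


def crude_summary(group):
    """Return (mean followup time, deaths per day of followup), ignoring censoring."""
    mean_time = group["followup_days"] / group["n"]
    crude_rate = group["deaths"] / group["followup_days"]
    return mean_time, crude_rate


mean1, rate1 = crude_summary(group1)
mean2, rate2 = crude_summary(group2)

print(round(mean1, 1), round(rate1, 3))  # 8.4 0.119
print(round(mean2, 1), round(rate2, 3))  # 16.3 0.034
print(round(rate1 / rate2, 1))           # 3.5
```

Both summaries treat a censored followup time as if it were a complete survival time, which is the weakness the K-M estimator is designed to avoid.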

The K-M curve compares survival over time in the two groups, rather than reducing each group to a single summary figure. The K-M curve is defined as the probability of surviving a given length of time (treating time as many small intervals). There are three assumptions in this analysis: (1) at each time interval, censored individuals have the same survival prospects as those who continue to be followed during the interval; (2) survival probabilities are the same for those recruited earlier and later in the study; and (3) the events happen at the times specified, rather than between two time points. The K-M estimate involves first computing the probability of survival during each time interval as the number who survived the interval divided by the number at risk at the start of the interval. The cumulative probability of survival to the end of each time interval is then calculated by multiplying the probability of survival for that interval by the probabilities for all earlier time intervals. This calculation is shown in Table 2 for Group 2, the group with censored observations. The table begins at time zero (start of followup) to allow for the possibility of censoring before the earliest failure time.
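The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name is hypothetical, and the seven survival times are made-up toy data, not the values in Table 1 or Table 2. When a death and a censoring share the same time, the sketch assumes the censored person was still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate.

    times  -- followup time for each individual
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of rows (time, number at risk, deaths, cumulative survival),
    starting at time zero and adding one row per distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    rows = [(0, n_at_risk, 0, survival)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        censored = 0
        # Count everyone whose followup ended at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths > 0:
            # Probability of surviving this interval, multiplied by
            # the probabilities for all earlier intervals.
            survival *= (n_at_risk - deaths) / n_at_risk
            rows.append((t, n_at_risk, deaths, survival))
        # Deaths and censored individuals both leave the risk set.
        n_at_risk -= deaths + censored
    return rows


# Toy data (hypothetical): 4 deaths and 3 censored observations.
toy_times = [2, 3, 3, 5, 8, 8, 10]
toy_events = [1, 1, 0, 1, 0, 1, 0]

rows = kaplan_meier(toy_times, toy_events)
for t, n, d, s in rows:
    print(t, n, d, round(s, 3))
```

In this toy example three of the seven people (43%) were never observed to die, yet the estimated survival after the last death is 5/14, or about 36%, because each censored person stops contributing to the risk set once their followup ends.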

Note that although 11 out of the 20 in Group 2 (55%) died over the 36 days (and 45% did not), the K-M estimate for the survival at 36 days is 24%. That is because the K-M estimator does not count censored individuals as either deaths or survivors beyond their followup; each person contributes to the estimate only while under observation. The K-M survival plot displays the first and last columns of this table. Figure 1 shows the K-M plot for both groups.

Figure 1 shows that estimated survival is lower in Group 1 than in Group 2. The steeper slope of the Group 1 curve shows that the rate of events is higher, i.e. events occurred faster. If we repeated the experiment, we would be unlikely to get the same two curves because there is uncertainty associated with these estimates. The logrank test is often used to assess whether the observed difference between the curves could be expected if the two corresponding populations had the same survival rates [1]. In other words, the logrank test is used to test the hypothesis that there is no difference in survival between individuals in the two groups. Another commonly used method to compare survival between groups is the Cox proportional hazards model. This model allows for adjustment for potential confounding. More information on this method is given by Bewick and colleagues [2].
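To make the idea of the logrank test concrete, the sketch below compares observed and expected deaths in one group at each event time. It is an illustration under stated assumptions, not the calculation used for Figure 1: the function name and the toy data are hypothetical, and the variance formula is the common hypergeometric form, which may differ slightly from simplified hand-calculation versions of the test. The p-value uses the chi-squared distribution with one degree of freedom.

```python
import math


def logrank(times1, events1, times2, events2):
    """Two-group logrank test; returns (chi-squared statistic, p-value).

    events are 1 for an observed event and 0 for a censored observation.
    Assumes at least one event time at which both groups are still at risk.
    """
    pooled = [(t, e, 1) for t, e in zip(times1, events1)]
    pooled += [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    observed1 = 0.0
    expected1 = 0.0
    variance = 0.0
    for t in event_times:
        # Numbers still at risk in each group just before time t.
        n1 = sum(1 for u, _, g in pooled if u >= t and g == 1)
        n2 = sum(1 for u, _, g in pooled if u >= t and g == 2)
        # Deaths at time t in group 1 and in both groups combined.
        d1 = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 1)
        d = sum(1 for u, e, _ in pooled if u == t and e == 1)
        n = n1 + n2
        observed1 += d1
        # Deaths expected in group 1 if both groups share the same hazard.
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)

    chi2 = (observed1 - expected1) ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical): group A has no censoring, group B has two censored times.
times_a, events_a = [2, 3, 5, 7], [1, 1, 1, 1]
times_b, events_b = [4, 6, 9, 10], [1, 0, 1, 0]

chi2, p_value = logrank(times_a, events_a, times_b, events_b)
print(round(chi2, 2), round(p_value, 3))
```

A small p-value suggests the observed difference between the curves would be unusual if the two populations had the same survival; with only a handful of observations, as in this toy example, the chi-squared approximation is rough.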

**References**

1. Bland JM, Altman DG. The logrank test. BMJ. 2004 May 1;328(7447):1073.

2. Bewick V, Cheek L, Ball J. Statistics review 12: survival analysis. Crit Care. 2004 Oct;8(5):389–394.