Survival analysis with competing risks

Nov 22, 2023

Survival analysis is a family of statistical methods that model how the probability of an outcome of interest changes over time, rather than at a single time point. For example, instead of asking only for the probability that a patient develops a malignant cancer within 5 years, which summarizes a fairly long window in a single number, it is often more informative to know how that probability evolves across the 5 years.

Kaplan-Meier analysis: a classical survival method

One of the most common survival methods is the Kaplan-Meier (KM) estimator. The math behind it is out of the scope of this post, since I want to focus on competing risks here; hopefully I will get some time to write about this and other classical methods in the future :)

Figure 1. An example of Kaplan-Meier plot. Source: Goel MK, Khanna P, Kishore J. Understanding survival analysis: Kaplan-Meier estimate. Int J Ayurveda Res. 2010 Oct;1(4):274–8. doi: 10.4103/0974–7788.76794. PMID: 21455458; PMCID: PMC3059453.

The KM plot has time on the x axis and the cumulative survival probability on the y axis. We can relate this to our hypothetical scenario about the development of a malignant cancer. Using Figure 1, we can say that at time 40 the probability of surviving, that is, of not yet having developed the cancer, is 63%. The interpretation of a KM plot often focuses on the median survival time, which is the time at which the estimated survival probability drops to 50%.
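To make the reading of a KM curve concrete, here is a minimal sketch of the estimator in plain Python. The data are made up for illustration (they are not the data behind Figure 1), and the function names `kaplan_meier` and `median_survival` are my own, not from any library.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up time of each patient
    events: 1 if the event was observed at that time, 0 if censored
    """
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    surv, curve = 1.0, []
    for t in sorted(events_at):
        # Everyone whose follow-up reaches t is still at risk at t
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1 - events_at[t] / at_risk
        curve.append((t, surv))
    return curve


def median_survival(curve):
    """First time at which the estimated survival is 50% or lower."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 50%


# Hypothetical follow-up data for 8 patients
times = [5, 8, 12, 12, 20, 25, 30, 40]
events = [1, 0, 1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
print("median survival time:", median_survival(curve))
```

Censored patients never trigger a drop in the curve, but they do count in the number at risk until the time they leave the study.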

Introduction to competing risks (CR)

In Figure 1, we also see the red cross symbol marking censored observations. Censoring typically means that a patient had not experienced the event of interest (cancer) by the time they dropped out of the study or the study ended. Sometimes we are interested in the development of a disease, but some patients die before it occurs. Here, death is a competing risk (CR), since it precludes the occurrence of the event of interest. In other words, for a given patient, the CR and the event of interest are mutually exclusive. KM analysis treats patients who experience a competing event as censored, just like any other censored patient and regardless of the reason. This implicitly assumes they could still experience the event of interest later, which is impossible after death, so the complement of the KM estimate tends to overestimate the probability of the event. When CR are present, statisticians account for them by using different definitions of the “number at risk”, and thus different definitions of the “hazard”.

I would also like to highlight that competing risks do not always cause “informative censoring”. These two concepts should be considered separately (and then together if necessary). A CR such as death due to deteriorating health, where the deteriorating health may also raise the probability of the event of interest, may cause informative censoring, since those who died had a higher chance of experiencing the event. A CR such as death due to a car accident may act as uninformative censoring, since car accidents are (hypothetically) unrelated to health status. Whether censoring is informative depends heavily on the study design: how you define the end of study, loss to follow-up, causes of death, and other factors. Identifying CR requires domain knowledge about other possible health outcomes that can prevent the event of interest from happening.

Statistical theory of CR

Cumulative incidence function (CIF)

So the KM plot is not appropriate when we have CR in our data. What can we use instead? Instead of the “survival” perspective, which can only distinguish those who survive from those who do not, we can take the “incidence” perspective, which lets us examine the occurrence of each type of event. This leads to the cumulative incidence function (CIF), which can be estimated for each event (the event of interest and the CR) in the presence of the other events. Figure 2 shows two CIF curves, one for cardiovascular death (the event of interest) and one for non-cardiovascular death (the CR).
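As a rough illustration of the difference, the sketch below computes a nonparametric CIF (the Aalen-Johansen form, where each increment is the event-free survival just before time t multiplied by the fraction of at-risk patients who have the event at t) and compares it with the naive "1 minus KM" estimate that censors the competing event. The data and the function names `cumulative_incidence` and `one_minus_km` are hypothetical and only meant to show the mechanics.

```python
def cumulative_incidence(times, causes, cause):
    """Nonparametric CIF for one cause.

    causes: 0 = censored, 1, 2, ... = type of event observed
    Returns (curve, event_free): curve is [(t, CIF(t))] and event_free is
    the all-cause event-free survival after the last event time.
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    event_free, cif, curve = 1.0, 0.0, []
    for t in event_times:
        n = sum(1 for u in times if u >= t)  # event-free and under follow-up
        d_any = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        d_k = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        cif += event_free * d_k / n   # uses survival just BEFORE t
        event_free *= 1 - d_any / n   # then update all-cause survival
        curve.append((t, cif))
    return curve, event_free


def one_minus_km(times, causes, cause):
    """Naive estimate: treat every other outcome as ordinary censoring."""
    surv = 1.0
    for t in sorted({t for t, c in zip(times, causes) if c == cause}):
        n = sum(1 for u in times if u >= t)
        d = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        surv *= 1 - d / n
    return 1 - surv


# Hypothetical data: 1 = event of interest, 2 = competing event, 0 = censored
times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
causes = [1, 2, 0, 1, 2, 1, 0, 2, 1, 0]

cif1, event_free = cumulative_incidence(times, causes, cause=1)
cif2, _ = cumulative_incidence(times, causes, cause=2)
naive1 = one_minus_km(times, causes, cause=1)

print(f"CIF, event of interest: {cif1[-1][1]:.3f}")
print(f"CIF, competing event:   {cif2[-1][1]:.3f}")
print(f"event-free survival:    {event_free:.3f}")
print(f"naive 1 - KM:           {naive1:.3f}")
```

In this toy data set the two CIFs and the event-free survival add up to 1, while the naive estimate for the event of interest comes out larger than its CIF.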

Figure 2. Source: Austin PC, Lee DS, Fine JP. Introduction to the Analysis of Survival Data in the Presence of Competing Risks. *Circulation*. 2016;133(6):601–609. doi:10.1161/CIRCULATIONAHA.115.017719

Regression models

In classical survival methods, Cox regression models are used to estimate the effect of covariates on the hazard, expressed as a hazard ratio, which is the ratio of the hazards of two groups. (The hazard is defined as “the instantaneous rate of occurrence of the event of interest in subjects who are still at risk of the event”; for example, the hazard in the treatment group is the rate of developing the event among all treated subjects who have not yet developed it.)
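For readers who prefer notation, the standard textbook definitions can be written as follows. The symbols are my own choice for illustration: T is the event time, x a single binary covariate (for example, 1 = treatment, 0 = control), h_0(t) the baseline hazard, and \beta the regression coefficient.

```latex
% Hazard: instantaneous event rate among subjects still at risk at time t
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% Cox proportional hazards model with baseline hazard h_0(t)
h(t \mid x) = h_0(t)\, \exp(\beta x)

% Hazard ratio comparing x = 1 with x = 0
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \exp(\beta)
```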

In the presence of CR, there are two regression models that use slightly different definitions of the hazard (Figure 3). Subdistribution hazard models define the number at risk as the people who have not yet experienced the event of interest, whether they are still alive or have already died from the competing event. In contrast, the cause-specific Cox model defines the number at risk as the people who are event-free, and therefore only those who are still alive. As for which model to use, some researchers suggest that subdistribution models are suitable when the goal of the analysis is to predict prognosis, and the cause-specific Cox model is suitable when the goal is to understand etiology [1].
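The two definitions of "number at risk" can be sketched in a few lines of Python. This is a simplified illustration on made-up data with my own function names. In particular, it ignores the censoring weights that actual subdistribution hazard estimation applies to patients kept in the risk set after a competing event.

```python
def cause_specific_risk_set(times, causes, t):
    """Patients who are still event-free and under follow-up at time t."""
    return [i for i, u in enumerate(times) if u >= t]


def subdistribution_risk_set(times, causes, t, cause=1):
    """Patients who have not yet had the event of interest at time t.

    Besides the event-free patients, this keeps everyone who already had
    a competing event before t (code other than 0 and `cause`).
    """
    return [
        i
        for i, (u, c) in enumerate(zip(times, causes))
        if u >= t or (u < t and c not in (0, cause))
    ]


# Hypothetical data: 1 = event of interest, 2 = competing event (death),
# 0 = censored
times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
causes = [1, 2, 0, 1, 2, 1, 0, 2, 1, 0]

cs = cause_specific_risk_set(times, causes, t=7)
sd = subdistribution_risk_set(times, causes, t=7)
print("cause-specific risk set at t=7: ", cs)
print("subdistribution risk set at t=7:", sd)
```

At t=7 the subdistribution risk set is larger because the two patients who died from the competing event at times 3 and 6 stay in it.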

Figure 3. Regression models for CR. Source: Austin PC, Lee DS, Fine JP. Introduction to the Analysis of Survival Data in the Presence of Competing Risks. *Circulation*. 2016;133(6):601–609. doi:10.1161/CIRCULATIONAHA.115.017719

Packages

I looked for packages in both Python and R. I also need to use propensity score weighting in my analysis. Figure 4 summarizes what I have found so far. In the packages I checked (lifelines for Python; cmprsk and survival for R), Python does not support subdistribution models but has functions for the CIF and cause-specific Cox regression with weights. R supports all three methods but only includes weighting in the Cox model. Hopefully, in the future, there will be more functions that can fill up the table!

References

[1] Austin PC, Lee DS, Fine JP. Introduction to the Analysis of Survival Data in the Presence of Competing Risks. Circulation. 2016;133(6):601–609. doi:10.1161/CIRCULATIONAHA.115.017719
[2] lifelines: https://lifelines.readthedocs.io/en/latest/
[3] cmprsk: https://cran.r-project.org/web/packages/cmprsk/index.html
[4] survival: https://cran.r-project.org/web/packages/survival/index.html