CausalFlow: Causal Discovery Methods with Observational and Interventional Data from Time-series

CausalFlow is a Python library for causal analysis from time-series data. It comprises two causal discovery methods recently published in the literature:

| Acronym | Full name |
|---|---|
| F-PCMCI | Filtered-PCMCI |
| CAnDOIT | CAusal Discovery with Observational and Interventional data from Time-series |

Coming soon..

F-PCMCI

F-PCMCI extends the state-of-the-art causal discovery method PCMCI with a feature-selection step based on Transfer Entropy. Starting from a predefined set of variables, the algorithm first identifies the relevant subset of features and a hypothetical causal model between them. It then runs the causal discovery on the selected features only, using the hypothetical causal model as the list of candidate links. Restricting both the variables and the candidate links makes the causal discovery faster and more accurate.
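
To make the feature-selection idea concrete, the following is a minimal sketch of a Transfer Entropy filter, not the F-PCMCI implementation. It assumes a linear-Gaussian estimator and pairwise (rather than multivariate) scoring at a single lag; the function names `gaussian_te` and `select_features`, the threshold of 0.01, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_te(x, y, lag=1):
    """Transfer entropy x -> y at one lag, under a linear-Gaussian assumption.

    TE = 0.5 * ln( Var(y_t | y_{t-lag}) / Var(y_t | y_{t-lag}, x_{t-lag}) ),
    where both conditional variances are estimated from least-squares residuals.
    """
    target, y_past, x_past = y[lag:], y[:-lag], x[:-lag]
    ones = np.ones_like(target)

    def resid_var(*regressors):
        # Residual variance of the target after regressing on an intercept
        # plus the given regressors.
        A = np.column_stack((ones,) + regressors)
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        return np.var(target - A @ coef)

    return 0.5 * np.log(resid_var(y_past) / resid_var(y_past, x_past))


def select_features(data, threshold=0.01, lag=1):
    """Return (kept variable indices, candidate links) from pairwise TE scores."""
    n_vars = data.shape[1]
    links = [(i, j) for i in range(n_vars) for j in range(n_vars)
             if i != j and gaussian_te(data[:, i], data[:, j], lag) > threshold]
    kept = sorted({v for link in links for v in link})
    return kept, links


# Toy data (assumption): x drives y at lag 1, z is independent of both.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = rng.normal(size=n)
y[1:] += 0.8 * x[:-1]
z = rng.normal(size=n)

kept, links = select_features(np.column_stack([x, y, z]))
print("kept variables:", kept)
print("candidate links (source, target):", links)
```

On this toy data only the link from variable 0 to variable 1 is expected to pass the threshold, so the independent variable 2 is filtered out before any causal discovery is run.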

The following example demonstrates the main functionality of F-PCMCI and compares the causal models obtained by PCMCI and F-PCMCI on the same data. The dataset consists of a 7-variable system defined as follows:

$$
\begin{cases}
X_0(t) = 2X_1(t-1) + 3X_3(t-1) + \eta_0 \\
X_1(t) = \eta_1 \\
X_2(t) = 1.1(X_1(t-1))^2 + \eta_2 \\
X_3(t) = X_3(t-1)X_2(t-1) + \eta_3 \\
X_4(t) = X_4(t-1) + X_5(t-1)X_0(t-1) + \eta_4 \\
X_5(t) = \eta_5 \\
X_6(t) = \eta_6
\end{cases}
$$

import numpy as np

min_lag = 1
max_lag = 1
np.random.seed(1)
nsample = 1500
nfeature = 7

# Noise terms eta_i, drawn uniformly in [0, 1)
d = np.random.random(size = (nsample, nfeature))
# Add the lagged dependencies of the system on top of the noise
for t in range(max_lag, nsample):
    d[t, 0] += 2 * d[t-1, 1] + 3 * d[t-1, 3]
    d[t, 2] += 1.1 * d[t-1, 1]**2
    d[t, 3] += d[t-1, 3] * d[t-1, 2]
    d[t, 4] += d[t-1, 4] + d[t-1, 5] * d[t-1, 0]

| Causal Model by PCMCI | Causal Model by F-PCMCI |
|---|---|
| Execution time ~ 8min 40sec | Execution time ~ 3min 00sec |

F-PCMCI removes the variable $X_6$ from the causal graph, since it is isolated, and generates the correct causal model. In contrast, PCMCI retains $X_6$, which leads to the wrong causal structure: a spurious link $X_6$ → $X_5$ appears in the causal graph derived by PCMCI.
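
The notion of an isolated variable can be checked directly against the 7-variable system of equations. The sketch below encodes the ground-truth links as (source, target) pairs read off the equations and lists the variables that take part in no cross-variable link; the helper name `isolated_variables` is hypothetical and the code is independent of CausalFlow.

```python
# Ground-truth links (source, target) of the 7-variable system, all at lag 1.
true_links = [
    (1, 0), (3, 0),          # X_0 depends on X_1 and X_3
    (1, 2),                  # X_2 depends on X_1
    (3, 3), (2, 3),          # X_3 depends on itself and X_2
    (4, 4), (5, 4), (0, 4),  # X_4 depends on itself, X_5 and X_0
]
n_vars = 7


def isolated_variables(links, n_vars):
    """Variables that appear in no link to or from another variable."""
    connected = {v for src, dst in links if src != dst for v in (src, dst)}
    return [v for v in range(n_vars) if v not in connected]


isolated = isolated_variables(true_links, n_vars)
print("isolated variables:", isolated)

# The link X_6 -> X_5 reported for PCMCI is not part of the ground truth.
print("X_6 -> X_5 in ground truth:", (6, 5) in set(true_links))
```

Only $X_6$ is isolated here: $X_1$ and $X_5$ are pure noise as well, but they still act as causes of other variables and therefore stay in the graph.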

CAnDOIT

CAnDOIT extends F-PCMCI by allowing interventional data to be incorporated in the causal discovery process alongside observational data.

The following example demonstrates CAnDOIT's capability to incorporate and exploit interventional data. The dataset consists of a 5-variable system defined as follows:

$$
\begin{cases}
X_0(t) = \eta_0 \\
X_1(t) = 2.5X_0(t-1) + \eta_1 \\
X_2(t) = 0.5X_0(t-2) \cdot 0.75X_3(t-1) + \eta_2 \\
X_3(t) = 0.7X_3(t-1)X_4(t-2) + \eta_3 \\
X_4(t) = \eta_4
\end{cases}
$$

This system of equations generates the time-series data in the observational case. In the interventional case, the equation $X_1(t) = 2.5X_0(t-1) + \eta_1$ is replaced by the hard intervention $X_1(t) = 15$.

import numpy as np

min_lag = 1
max_lag = 2
np.random.seed(1)
nsample_obs = 1000
nsample_int = 300
nfeature = 5

# observational data: noise terms plus the lagged dependencies of the system
d = np.random.random(size = (nsample_obs, nfeature))
for t in range(max_lag, nsample_obs):
    d[t, 1] += 2.5 * d[t-1, 0]
    d[t, 2] += 0.5 * d[t-2, 0] * 0.75 * d[t-1, 3]
    d[t, 3] += 0.7 * d[t-1, 3] * d[t-2, 4]

# hard intervention on X_1: its equation is replaced by the constant 15
d_int1 = np.random.random(size = (nsample_int, nfeature))
d_int1[:, 1] = 15 * np.ones(shape = (nsample_int,))
for t in range(max_lag, nsample_int):
    # X_2 depends on X_3(t-1), as in the observational case
    d_int1[t, 2] += 0.5 * d_int1[t-2, 0] * 0.75 * d_int1[t-1, 3]
    d_int1[t, 3] += 0.7 * d_int1[t-1, 3] * d_int1[t-2, 4]

| Ground-truth Causal Model | Causal Model by F-PCMCI | Causal Model by CAnDOIT |
|---|---|---|
| $X_0$ observable | $X_0$ hidden | |
| observation samples 1000 | observation samples 1000 | |
| intervention samples ✗ | intervention samples ✗ | |

By using interventional data, CAnDOIT removes the spurious link $X_1$ → $X_2$ generated by the hidden confounder $X_0$.
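
Why interventional data helps can be illustrated numerically. The sketch below is not CAnDOIT itself: it regenerates the 5-variable system with a hypothetical helper `simulate`, ignores $X_0$ as if it were hidden, and compares two simple statistics. In the observational data, $X_1(t-1)$ and $X_2(t)$ are correlated through $X_0(t-2)$; under the hard intervention $X_1 = 15$, the mean of $X_2$ stays essentially unchanged, which is what one would expect if $X_1$ is not a cause of $X_2$.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(n, intervene_x1=None):
    """Generate the 5-variable system; optionally clamp X_1 to a constant."""
    d = rng.random(size=(n, 5))
    if intervene_x1 is not None:
        d[:, 1] = intervene_x1  # hard intervention replaces the X_1 equation
    for t in range(2, n):
        if intervene_x1 is None:
            d[t, 1] += 2.5 * d[t-1, 0]
        d[t, 2] += 0.5 * d[t-2, 0] * 0.75 * d[t-1, 3]
        d[t, 3] += 0.7 * d[t-1, 3] * d[t-2, 4]
    return d


obs = simulate(1000)
itv = simulate(300, intervene_x1=15.0)

# Observational dependence between X_1(t-1) and X_2(t), with X_0 ignored.
corr = np.corrcoef(obs[2:-1, 1], obs[3:, 2])[0, 1]

# Shift in the mean of X_2 when X_1 is forced from roughly 1.75 up to 15.
shift = itv[3:, 2].mean() - obs[3:, 2].mean()

print(f"corr(X_1(t-1), X_2(t)) in observational data: {corr:.3f}")
print(f"change in mean of X_2 under do(X_1 = 15):     {shift:.3f}")
```

A clearly positive correlation together with a negligible mean shift is the signature of a confounded, non-causal link, which is the kind of evidence interventional samples add on top of observational ones.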

Other Causal Discovery Algorithms

Although the main contribution of this repository is to present the CAnDOIT and F-PCMCI algorithms, other causal discovery methods have been included for benchmarking purposes. As a consequence, CausalFlow provides a collection of causal discovery methods, beyond F-PCMCI and CAnDOIT, that output time-series DAGs (DAGs that include the lag specification for each link).

Some algorithms are imported from other languages, such as R and Java, and are wrapped in Python. Integrating the majority of causal discovery methods into a single framework, which handles various types of inputs and outputs causal models, facilitates the use of these algorithms. The available methods are listed below:

| Algorithm | Feature Selection | Observations | Interventions |
|---|---|---|---|
| DYNOTEARS | | | |
| PCMCI | | | |
| TCDF | | | |
| tsFCI | | | |
| VarLiNGAM | | | |
| F-PCMCI | | | |
| CAnDOIT | | | |
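
A time-series DAG can be stored as a mapping from each variable to its lagged parents. The sketch below encodes the ground-truth model of the 5-variable CAnDOIT example in that form; the dictionary layout and the helpers `max_lag` and `children` are illustrative assumptions, not the data structure used by CausalFlow.

```python
# target -> list of (source, lag) pairs, read off the 5-variable system
ts_dag = {
    "X_0": [],
    "X_1": [("X_0", 1)],
    "X_2": [("X_0", 2), ("X_3", 1)],
    "X_3": [("X_3", 1), ("X_4", 2)],
    "X_4": [],
}


def max_lag(dag):
    """Largest lag used by any link in the graph (0 if there are no links)."""
    return max((lag for parents in dag.values() for _, lag in parents), default=0)


def children(dag, source):
    """All (target, lag) pairs that the given source variable points to."""
    return [(target, lag)
            for target, parents in dag.items()
            for src, lag in parents if src == source]


print("max lag:", max_lag(ts_dag))
print("children of X_0:", children(ts_dag, "X_0"))
```

Keeping the lag on every link is what distinguishes this representation from a plain DAG: the same pair of variables can be connected at several lags, and autodependencies such as $X_3(t-1)$ → $X_3(t)$ do not create cycles.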

Citation

Please consider citing the following papers depending on which method you use:

  • F-PCMCI:
    L. Castri, S. Mghames, M. Hanheide and N. Bellotto (2023). Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios, Proceedings of the Conference on Causal Learning and Reasoning (CLeaR).
    @inproceedings{castri2023fpcmci,
      title={Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios},
      author={Castri, Luca and Mghames, Sariah and Hanheide, Marc and Bellotto, Nicola},
      booktitle={Conference on Causal Learning and Reasoning (CLeaR)},
      year={2023},
    }

  • CAnDOIT:
    L. Castri, S. Mghames, M. Hanheide and N. Bellotto (2024). CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series, under review at Advanced Intelligent Systems.

Requirements

  • pandas>=1.5.2
  • netgraph>=4.10.2
  • networkx>=2.8.6
  • ruptures>=1.1.7
  • scikit_learn>=1.1.3
  • torch>=1.11.0
  • gpytorch>=1.4
  • dcor>=0.5.3
  • h5py>=3.7.0
  • jpype1>=1.5.0
  • mpmath>=1.3.0
  • causalnex>=0.12.1
  • lingam>=1.8.2
  • tigramite>=5.1.0.3

Installation

Before installing CausalFlow, you need to install Java and the IDTxl package used for the feature-selection process, following the guide described here. Once complete, you can install the current release of CausalFlow with:

# COMING SOON: pip install causalflow

For a complete installation (Java, IDTxl, CausalFlow), follow the procedure below.

1 - Java installation

Check whether Java is already installed:

java -version

If this returns Command 'java' not found, ..., install Java with the following commands; otherwise, skip to the IDTxl installation.

# Java
sudo apt-get update
sudo apt install default-jdk

Then, you need to add JAVA_HOME to the environment:

sudo nano /etc/environment
# Paste the following JAVA_HOME assignment at the bottom of the file
JAVA_HOME="/lib/jvm/java-11-openjdk-amd64/bin/java"
source /etc/environment

2 - IDTxl installation

# IDTxl
git clone https://github.com/pwollstadt/IDTxl.git
cd IDTxl
pip install -e .
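
Before moving on, the prerequisites can be sanity-checked from Python. This sketch uses only the standard library; the function name `check_environment` is hypothetical, and it assumes IDTxl is importable under the module name `idtxl`.

```python
import importlib.util
import os
import shutil


def check_environment():
    """Report whether Java and IDTxl look available in the current environment."""
    return {
        # True if a `java` executable can be found on PATH
        "java_on_path": shutil.which("java") is not None,
        # Value of JAVA_HOME, or None if the variable is not set
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        # True if the `idtxl` package can be located by the import system
        "idtxl_importable": importlib.util.find_spec("idtxl") is not None,
    }


report = check_environment()
for name, value in report.items():
    print(f"{name}: {value}")
```

If `JAVA_HOME` is reported as None in a fresh terminal, the environment file has probably not been reloaded yet; logging out and back in is one way to pick up the change.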

3 - CausalFlow installation

# COMING SOON: pip install causalflow

Recent changes

| Version | Changes |
|---|---|
| 4.0.0 | package published |