Overview
CausalFlow: a Collection of Methods for Causal Discovery from Time-series
CausalFlow is a Python library for causal analysis from time-series data. It comprises:
- F-PCMCI - Filtered-PCMCI
- CAnDOIT - CAusal Discovery with Observational and Interventional data from Time-series
- RandomGraph - Random-model and synthetic time-series generator
- Other causal discovery methods all within the same framework
Useful links
- F-PCMCI:
L. Castri, S. Mghames, M. Hanheide and N. Bellotto (2023).
Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios,
Proceedings of the Conference on Causal Learning and Reasoning (CLeaR).
```bibtex
@inproceedings{castri2023enhancing,
  title={Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios},
  author={Castri, Luca and Mghames, Sariah and Hanheide, Marc and Bellotto, Nicola},
  booktitle={Conference on Causal Learning and Reasoning},
  pages={243--258},
  year={2023},
  organization={PMLR}
}
```
- CAnDOIT:
L. Castri, S. Mghames, M. Hanheide and N. Bellotto (2024).
CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series,
Advanced Intelligent Systems.
BibTeX coming soon!
- Tutorials [Coming soon..]
F-PCMCI
An extension of the state-of-the-art causal discovery method PCMCI, augmented with a feature-selection step based on Transfer Entropy. Starting from a predefined set of variables, the algorithm identifies the relevant subset of features and a hypothetical causal model between them. Causal discovery is then performed on the selected features only, guided by the hypothetical model. This refined set of variables and the list of potential causal links between them make the causal discovery faster and more accurate.
The following example demonstrates the main functionality of F-PCMCI, along with a comparison between the causal models obtained by PCMCI and F-PCMCI on the same data. The dataset consists of a 7-variable system defined as follows:
```python
import numpy as np

min_lag = 1
max_lag = 1
np.random.seed(1)
nsample = 1500
nfeature = 7

# Generate the 7-variable system (note that X_6 never appears in any equation)
d = np.random.random(size = (nsample, nfeature))
for t in range(max_lag, nsample):
    d[t, 0] += 2 * d[t-1, 1] + 3 * d[t-1, 3]
    d[t, 2] += 1.1 * d[t-1, 1]**2
    d[t, 3] += d[t-1, 3] * d[t-1, 2]
    d[t, 4] += d[t-1, 4] + d[t-1, 5] * d[t-1, 0]
```
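A minimal sketch of how this data might then be passed to F-PCMCI is shown below. The import paths, the `FPCMCI` constructor parameters, and the Transfer Entropy selection method are assumptions based on the package layout, not a verbatim excerpt of the CausalFlow API; adjust them to the installed version.

```python
# Hypothetical usage sketch (import paths and parameter names are assumptions)
from causalflow.preprocessing.data import Data
from causalflow.causal_discovery.FPCMCI import FPCMCI
from causalflow.selection_methods.TE import TE, TEestimator
from tigramite.independence_tests.parcorr import ParCorr

df = Data(d)   # wrap the generated time-series
fpcmci = FPCMCI(df,
                f_alpha = 0.05,   # significance level for the feature-selection step
                alpha = 0.05,     # significance level for the causal discovery step
                min_lag = min_lag,
                max_lag = max_lag,
                sel_method = TE(TEestimator.Gaussian),   # Transfer Entropy-based selection
                val_condtest = ParCorr(significance = 'analytic'))
fpcmci_cm = fpcmci.run()
```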
Causal Model by PCMCI | Causal Model by F-PCMCI |
---|---|
Execution time ~ 8min 40sec | Execution time ~ 3min 00sec |
F-PCMCI removes the isolated variable X_6 from the causal graph and recovers the correct causal model. In contrast, PCMCI retains X_6, leading to a wrong causal structure: a spurious link involving X_6 appears in the causal graph derived by PCMCI.
CAnDOIT
CAnDOIT extends LPCMCI, allowing the incorporation of interventional data into the causal discovery process alongside observational data. Like its predecessor, CAnDOIT can handle both lagged and contemporaneous dependencies, as well as latent variables.
Example
In the following example, taken from one of the tigramite tutorials, we demonstrate CAnDOIT's ability to incorporate and leverage interventional data to improve the accuracy of the causal analysis. The example involves a system of equations with four variables:
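The structural equations, reconstructed here from the data-generating code below (with $\eta_i^t$ denoting the random noise term of each variable), are:

```math
\begin{aligned}
X_0^{t} &= 0.9\,X_0^{t-1} + 0.6\,X_1^{t} + \eta_0^{t}\\
X_1^{t} &= \eta_1^{t}\\
X_2^{t} &= 0.9\,X_2^{t-1} + 0.4\,X_1^{t-1} + \eta_2^{t}\\
X_3^{t} &= 0.9\,X_3^{t-1} - 0.5\,X_2^{t-2} + \eta_3^{t}
\end{aligned}
```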
Note that X_1 is a latent confounder of X_0 and X_2. This system of equations generates the time-series data in the observational domain, which is then used by LPCMCI for the causal discovery analysis.
```python
import numpy as np
# Import paths below follow the CausalFlow package layout; adjust if needed.
from causalflow.preprocessing.data import Data
from causalflow.causal_discovery.baseline.LPCMCI import LPCMCI
from tigramite.independence_tests.parcorr import ParCorr

tau_max = 2
pc_alpha = 0.05
np.random.seed(19)
nsample_obs = 500
nfeature = 4

# Generate the observational data (X_1 acts as a latent confounder of X_0 and X_2)
d = np.random.random(size = (nsample_obs, nfeature))
for t in range(tau_max, nsample_obs):
    d[t, 0] += 0.9 * d[t-1, 0] + 0.6 * d[t, 1]
    d[t, 2] += 0.9 * d[t-1, 2] + 0.4 * d[t-1, 1]
    d[t, 3] += 0.9 * d[t-1, 3] - 0.5 * d[t-2, 2]

# Remove the unobserved component time series (X_1)
data_obs = d[:, [0, 2, 3]]
var_names = ['X_0', 'X_2', 'X_3']
d_obs = Data(data_obs, vars = var_names)
d_obs.plot_timeseries()

lpcmci = LPCMCI(d_obs,
                min_lag = 0,
                max_lag = tau_max,
                val_condtest = ParCorr(significance = 'analytic'),
                alpha = pc_alpha)

# Run LPCMCI
lpcmci_cm = lpcmci.run()
lpcmci_cm.ts_dag(node_size = 4, min_width = 1.5, max_width = 1.5,
                 x_disp = 0.5, y_disp = 0.2, font_size = 10)
```
Observational Data | Causal Model by LPCMCI |
---|---|
As you can see from LPCMCI's result, the method correctly identifies the bidirected link (indicating the presence of a latent confounder) between X_0 and X_2. However, the final causal model leaves uncertainty regarding the link X_2 o-> X_3, where the circle denotes an undetermined edge mark. Specifically, the final causal model is a PAG that represents two MAGs: the first with X_2 <-> X_3, and the second with X_2 -> X_3.
Now, let's introduce interventional data and examine its benefits. In this case, we perform a hard intervention on the variable X_2, meaning we replace its equation with a constant value corresponding to the intervention (in this case, X_2 = 3).
```python
nsample_int = 150
int_data = dict()

# Hard intervention on X_2: its equation is replaced by the constant value 3
d_int = np.random.random(size = (nsample_int, nfeature))
d_int[0:tau_max, :] = d[len(d)-tau_max:, :]   # initialise with the last observational samples
d_int[:, 2] = 3 * np.ones(shape = (nsample_int,))
for t in range(tau_max, nsample_int):
    d_int[t, 0] += 0.9 * d_int[t-1, 0] + 0.6 * d_int[t, 1]
    d_int[t, 3] += 0.9 * d_int[t-1, 3] - 0.5 * d_int[t-2, 2]

data_int = d_int[:, [0, 2, 3]]
df_int = Data(data_int, vars = var_names)
int_data['X_2'] = df_int
```
```python
# Import path for CAnDOIT follows the package layout; adjust if needed.
from causalflow.causal_discovery.CAnDOIT import CAnDOIT

candoit = CAnDOIT(d_obs,
                  int_data,
                  alpha = pc_alpha,
                  min_lag = 0,
                  max_lag = tau_max,
                  val_condtest = ParCorr(significance = 'analytic'))

candoit_cm = candoit.run()
candoit_cm.ts_dag(node_size = 4, min_width = 1.5, max_width = 1.5,
                  x_disp = 0.5, y_disp = 0.2, font_size = 10)
```
Observational & Interventional Data | Causal Model by CAnDOIT |
---|---|
CAnDOIT, like LPCMCI, correctly detects the bidirected link X_0 <-> X_2. Additionally, by incorporating interventional data, CAnDOIT resolves the uncertainty regarding the link X_2 o-> X_3, reducing the size of the PAG. Specifically, the PAG found by CAnDOIT is the representation of only one MAG.
Robotics application of CAnDOIT
In this section, we discuss an application of CAnDOIT in a robotic scenario. We designed an experiment to learn the causal model in a hypothetical robot arm application equipped with a camera. For this application, we utilised Causal World, which models a TriFinger robot, a floor, and a stage.
In our case, we use only one finger of the robot, with the finger's end-effector equipped with a camera. The scenario consists of a cube placed at the centre of the floor, surrounded by a white stage. The colour brightness of the cube and the floor is modelled as a function of the end-effector height, its absolute velocity, and the distance between the end-effector and the cube. This model captures the shading and blurring effects on the cube. In contrast, the floor, being darker and larger than the cube, is affected only by the end-effector height.
Note that the end-effector height, velocity, and distance to the cube are obtained directly from the simulator and are not explicitly modelled, while the floor and cube colours follow a known ground-truth structural causal model (shown in the ground-truth graph below).
This model is used to generate the observational data, which is then used by LPCMCI and CAnDOIT to reconstruct the causal model. For the interventional domain, instead, we substitute the equation modelling one of the two colours with a constant colour (green) and collect the data for the causal analysis conducted by CAnDOIT. Note that, for both the observational and interventional domains, the end-effector height is treated as a latent confounder between the floor and cube colours.
Observational dataset | Interventional dataset |
---|---|
Ground-truth Causal Model | Causal Model by LPCMCI | Causal Model by CAnDOIT |
---|---|---|
Also in this experiment, we can see the benefit of using interventional data alongside observations. LPCMCI is unable to orient the contemporaneous (spurious) link between the floor and cube colours caused by the hidden confounder, i.e. the end-effector height. This results in an ambiguous o-o link, which does not encode the correct bidirected (<->) link. CAnDOIT, instead, using the interventional data, correctly identifies the bidirected link between the two colour variables, once again decreasing the uncertainty and increasing the accuracy of the reconstructed causal model.
RandomGraph
RandomGraph is a random-model generator capable of creating random systems of equations with various properties: linear, nonlinear, lagged and/or contemporaneous dependencies, and hidden confounders. This tool offers several adjustable parameters, listed as follows:
- time-series length;
- number of observable variables;
- number of observable parents per variable (link density);
- number of hidden confounders;
- number of confounded variables per hidden confounder;
- noise configuration, e.g. Gaussian noise with given mean and variance;
- minimum and maximum time delay to consider in the equations;
- coefficient range of the equations' terms;
- functional forms applied to the equations' terms: ['', sin, cos, exp, abs, pow], where '' stands for none (identity);
- operators used to link the various equation terms: [+, -, *, /].
RandomGraph outputs a graph, the associated system of equations, and observational data. Additionally, it provides the option to generate interventional data.
Example - Linear Random Graph
```python
import random
# Import paths follow the CausalFlow package layout; adjust if needed.
from causalflow.random_system.RandomGraph import RandomGraph, NoiseType

# Noise configurations: (noise type, parameter 1, parameter 2)
noise_uniform = (NoiseType.Uniform, -0.5, 0.5)
noise_gaussian = (NoiseType.Gaussian, 0, 1)
noise_weibull = (NoiseType.Weibull, 2, 1)

RG = RandomGraph(nvars = 5,
                 nsamples = 1000,
                 link_density = 3,
                 coeff_range = (0.1, 0.5),
                 max_exp = 2,
                 min_lag = 0,
                 max_lag = 3,
                 noise_config = random.choice([noise_uniform, noise_gaussian, noise_weibull]),
                 functions = [''],        # linear terms only ('' = identity)
                 operators = ['+', '-'],
                 n_hidden_confounders = 2)
RG.gen_equations()
RG.ts_dag(withHidden = True)
```
Example - Nonlinear Random Graph
```python
noise_uniform = (NoiseType.Uniform, -0.5, 0.5)
noise_gaussian = (NoiseType.Gaussian, 0, 1)
noise_weibull = (NoiseType.Weibull, 2, 1)

RG = RandomGraph(nvars = 5,
                 nsamples = 1000,
                 link_density = 3,
                 coeff_range = (0.1, 0.5),
                 max_exp = 2,
                 min_lag = 0,
                 max_lag = 3,
                 noise_config = random.choice([noise_uniform, noise_gaussian, noise_weibull]),
                 functions = ['', 'sin', 'cos', 'exp', 'abs', 'pow'],
                 operators = ['+', '-', '*', '/'],
                 n_hidden_confounders = 2)
RG.gen_equations()
RG.ts_dag(withHidden = True)
```
Linear Random Graph | Nonlinear Random Graph |
---|---|
Linear model | Nonlinear model |
Lagged dependencies | Lagged dependencies |
Contemporaneous dependencies | Contemporaneous dependencies |
2 hidden confounders | 2 hidden confounders |
Example - Random Graph with Interventional Data
```python
noise_gaussian = (NoiseType.Gaussian, 0, 1)

RS = RandomGraph(nvars = 5,
                 nsamples = 1500,
                 link_density = 3,
                 coeff_range = (0.1, 0.5),
                 max_exp = 2,
                 min_lag = 0,
                 max_lag = 3,
                 noise_config = noise_gaussian,
                 functions = ['', 'sin', 'cos', 'exp', 'abs', 'pow'],
                 operators = ['+', '-', '*', '/'],
                 n_hidden_confounders = 2)
RS.gen_equations()

# Observational data, with and without the hidden confounders
d_obs_wH, d_obs = RS.gen_obs_ts()
d_obs.plot_timeseries()

# Interventional data: X_4 held at a random constant value for 250 samples
d_int = RS.intervene('X_4', 250, random.uniform(5, 10), d_obs.d)
d_int['X_4'].plot_timeseries()
```
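The generated observational and interventional data can then be fed directly to a discovery method. Below is a minimal sketch reusing the `CAnDOIT` constructor shown in the earlier example; the `alpha` value and the conditional-independence test here are illustrative choices, not values prescribed by the library.

```python
# Sketch: run CAnDOIT on the synthetic observational + interventional data,
# mirroring the constructor used in the CAnDOIT example above.
from causalflow.causal_discovery.CAnDOIT import CAnDOIT
from tigramite.independence_tests.parcorr import ParCorr

candoit = CAnDOIT(d_obs,
                  d_int,             # dict of interventional Data, keyed by variable
                  alpha = 0.05,
                  min_lag = 0,
                  max_lag = 3,
                  val_condtest = ParCorr(significance = 'analytic'))
candoit_cm = candoit.run()
candoit_cm.ts_dag(node_size = 4, min_width = 1.5, max_width = 1.5,
                  x_disp = 0.5, y_disp = 0.2, font_size = 10)
```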
Observational Data | Interventional Data |
---|---|
Other Causal Discovery Algorithms
Although the main contribution of this repository is to present the CAnDOIT and F-PCMCI algorithms, other causal discovery methods have been included for benchmarking purposes. Consequently, CausalFlow offers a collection of causal discovery methods, beyond F-PCMCI and CAnDOIT, that output time-series graphs (graphs that specify the lag for each link). These methods are listed as follows:
- DYNOTEARS - from the causalnex package;
- PCMCI - from the tigramite package;
- PCMCI+ - from the tigramite package;
- LPCMCI - from the tigramite package;
- J-PCMCI+ - from the tigramite package;
- TCDF - from the causal_discovery_for_time_series package;
- tsFCI - from the causal_discovery_for_time_series package;
- VarLiNGAM - from the lingam package.
Some algorithms are implemented in other languages, such as R and Java, and are wrapped in Python. Integrating most causal discovery methods into a single framework, which handles various types of input and outputs causal models in a consistent format, facilitates the use of these algorithms.
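For illustration, here is a hypothetical sketch of running one of the wrapped baselines. The import path and constructor below are assumed to mirror the LPCMCI wrapper used in the CAnDOIT example above; they may differ in the installed version.

```python
# Hypothetical sketch (assumed API): run the PCMCI baseline through CausalFlow,
# mirroring the LPCMCI wrapper usage shown earlier.
import numpy as np
from causalflow.preprocessing.data import Data
from causalflow.causal_discovery.baseline.PCMCI import PCMCI
from tigramite.independence_tests.parcorr import ParCorr

data = Data(np.random.random((500, 3)), vars = ['X_0', 'X_1', 'X_2'])
pcmci = PCMCI(data,
              min_lag = 0,
              max_lag = 2,
              val_condtest = ParCorr(significance = 'analytic'),
              alpha = 0.05)
pcmci_cm = pcmci.run()
```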
Algorithm | Observations | Feature Selection | Interventions
---|---|---|---
DYNOTEARS | ✅ | ❌ | ❌
PCMCI | ✅ | ❌ | ❌
PCMCI+ | ✅ | ❌ | ❌
LPCMCI | ✅ | ❌ | ❌
J-PCMCI+ | ✅ | ❌ | ❌
TCDF | ✅ | ❌ | ❌
tsFCI | ✅ | ❌ | ❌
VarLiNGAM | ✅ | ❌ | ❌
F-PCMCI | ✅ | ✅ | ❌
CAnDOIT | ✅ | ❌ | ✅
Citation
Please consider citing the following papers depending on which method you use:
- F-PCMCI:
L. Castri, S. Mghames, M. Hanheide and N. Bellotto (2023).
Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios,
Proceedings of the Conference on Causal Learning and Reasoning (CLeaR).
```bibtex
@inproceedings{castri2023enhancing,
  title={Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios},
  author={Castri, Luca and Mghames, Sariah and Hanheide, Marc and Bellotto, Nicola},
  booktitle={Conference on Causal Learning and Reasoning},
  pages={243--258},
  year={2023},
  organization={PMLR}
}
```
- CAnDOIT:
L. Castri, S. Mghames, M. Hanheide and N. Bellotto (2024).
CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series,
Advanced Intelligent Systems.
BibTeX coming soon!
Requirements
- pandas>=1.5.2
- numba>=0.58.1
- scipy>=1.3.3
- networkx>=2.8.6
- ruptures>=1.1.7
- scikit_learn>=1.1.3
- torch>=1.11.0
- gpytorch>=1.4
- dcor>=0.5.3
- h5py>=3.7.0
- jpype1>=1.5.0
- mpmath>=1.3.0
- causalnex
- lingam
- pyopencl>=2024.1
- matplotlib>=3.7.0
- numpy
- pgmpy>=0.1.19
- tigramite>=5.1.0.3
- rectangle-packer
- grandalf
Installation
Before installing CausalFlow, you need to install Java and the IDTxl package, used for the feature-selection process, following the guide described here. Once done, you can install the current release of CausalFlow with:
```bash
pip install py-causalflow
```
For a complete installation (Java, IDTxl, and CausalFlow), follow the procedure below.
1 - Java installation
Verify that you have not already installed Java:
```bash
java -version
```
If this returns `Command 'java' not found, ...`, you can install Java with the commands below; otherwise, jump to the IDTxl installation.
```bash
# Java
sudo apt-get update
sudo apt install default-jdk
```
Then, you need to add `JAVA_HOME` to the environment:
```bash
sudo nano /etc/environment
JAVA_HOME="/lib/jvm/java-11-openjdk-amd64/bin/java"  # paste this assignment at the bottom of the file
source /etc/environment
```
2 - IDTxl installation
```bash
# IDTxl
git clone -b v1.4 https://github.com/pwollstadt/IDTxl.git
cd IDTxl
pip install -e .
```
3 - CausalFlow installation
```bash
pip install py-causalflow
```
Recent changes
Version | Changes |
---|---|
4.0.4 | IDTxl v1.4 |
4.0.3 | numba version fix; DAG `dag()` fix; CAnDOIT fix: `min_lag` must be equal to 0 |
4.0.2 | PyPI fixes; `rectangle-packer` and `grandalf` added to requirements; numba version fix; `causal_discovery/baseline/pkgs` fix |
4.0.1 | PyPI |
4.0.0 | package published |