Preprocessing
This module provides the Data class.
Classes
Data: public class for handling data used for the causal discovery.
Data
The Data class manages the preprocessing of the data before the causal analysis.
Source code in causalflow/preprocessing/data.py
N
property
Number of features.
Returns:
Type | Description |
---|---|
int | number of features. |
T
property
Dataframe length.
Returns:
Type | Description |
---|---|
int | dataframe length. |
features
property
Return list of features.
Returns:
Type | Description |
---|---|
list(str) | list of feature names. |
pretty_features
property
Return list of features with LaTeX symbols.
Returns:
Type | Description |
---|---|
list(str) | list of feature names. |
__init__(data, vars=None, fill_nan=True, stand=False, subsampling=None, show_subsampling=False)
Class constructor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str / DataFrame / np.array | Path of a CSV file to load, a pandas.DataFrame, or a numpy.array. | required |
vars | list(str) | List of variable names. If unset, the names are taken from the data column names when data is a str or DataFrame, and set to [X_0 .. X_N] when data is a np.array. Defaults to None. | None |
fill_nan | bool | If True, fill NaNs. Defaults to True. | True |
stand | bool | If True, standardize the data. Defaults to False. | False |
subsampling | SubsamplingMethod | Subsampling method. If None, subsampling is not active. Defaults to None. | None |
show_subsampling | bool | If True, shows the subsampling result. Defaults to False. | False |
Raises:
Type | Description |
---|---|
TypeError | if data is not a str, DataFrame, or ndarray. |
Source code in causalflow/preprocessing/data.py
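The constructor options above can be illustrated with a small standalone sketch. It does not reproduce the library code: the 0-based placeholder names, the forward-fill strategy for NaNs, and the z-score standardization are all assumptions chosen for illustration, and the helper names `default_names`, `fill_nan_forward`, and `standardize` are hypothetical.

```python
import math


def default_names(n_columns):
    """Placeholder names X_0, X_1, ... for unnamed columns (assumed 0-based)."""
    return [f"X_{i}" for i in range(n_columns)]


def fill_nan_forward(column):
    """Replace each NaN with the most recent valid value (assumed strategy)."""
    filled, last = [], None
    for value in column:
        if isinstance(value, float) and math.isnan(value):
            value = last  # stays None if the column starts with NaN
        filled.append(value)
        last = value
    return filled


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]  # fails on constant columns (std == 0)


names = default_names(3)
clean = fill_nan_forward([1.0, float("nan"), float("nan"), 4.0])
scaled = standardize([1.0, 2.0, 3.0])
```

A constant column has zero standard deviation, so a real implementation would need to guard against the division by zero that this sketch leaves unhandled.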
plot_timeseries(savefig=None)
Plot timeseries data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
savefig | str | figure path. | None |
Source code in causalflow/preprocessing/data.py
save_csv(csvpath)
Save timeseries data into a CSV file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
csvpath | str | CSV path. | required |
Source code in causalflow/preprocessing/data.py
shrink(selected_features)
Shrink the dataframe to the selected features.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
selected_features | list(str) | list of variables. | required |
Source code in causalflow/preprocessing/data.py
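As a rough illustration of what shrinking means for the N and T properties, the sketch below models the data as a plain dictionary of columns. This is a stand-in rather than the library's DataFrame-based code, and the helpers `shrink_columns` and `shape` are hypothetical names.

```python
def shrink_columns(columns, selected_features):
    """Return a new column mapping restricted to `selected_features`, in that order."""
    missing = [name for name in selected_features if name not in columns]
    if missing:
        raise KeyError(f"unknown features: {missing}")
    return {name: columns[name] for name in selected_features}


def shape(columns):
    """Return (T, N): the number of samples and the number of features."""
    n_features = len(columns)
    n_samples = len(next(iter(columns.values()))) if columns else 0
    return n_samples, n_features


table = {"X_0": [1, 2, 3], "X_1": [4, 5, 6], "X_2": [7, 8, 9]}
reduced = shrink_columns(table, ["X_2", "X_0"])
```

In this sketch, shrinking changes the number of features (N) while leaving the number of samples (T) untouched.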
This module provides the Subsampler class.
Classes
Subsampler: public class for subsampling.
Subsampler
Subsampler class.
It subsamples the data using one of the following methods:
- Static - subsamples data by taking one sample every step samples
- WSDynamic - entropy-based method with dynamic window size computed by breakpoint analysis
- WSFFTStatic - entropy-based method with fixed window size computed by FFT analysis
- WSStatic - entropy-based method with predefined window size
Source code in causalflow/preprocessing/Subsampler.py
__init__(df, ss_method)
Class constructor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | pd.DataFrame | dataframe to subsample. | required |
ss_method | SubsamplingMethod | subsampling method. | required |
Source code in causalflow/preprocessing/Subsampler.py
plot_subsampled_data(dpi=100, show=True)
Plot dataframe sub-sampled data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dpi | int | image dpi. Defaults to 100. | 100 |
show | bool | if True, it shows the figure and blocks the process. Defaults to True. | True |
Source code in causalflow/preprocessing/Subsampler.py
subsample()
Run the subsampling algorithm and return the subsampled ndarray.
Returns:
Type | Description |
---|---|
ndarray | Subsampled dataframe values. |
Source code in causalflow/preprocessing/Subsampler.py
This module provides the EntropyBasedMethod class.
Classes
EntropyBasedMethod: EntropyBasedMethod abstract class.
EntropyBasedMethod
Bases: ABC
EntropyBasedMethod abstract class.
Source code in causalflow/preprocessing/subsampling_methods/EntropyBasedMethod.py
__init__(threshold)
Class constructor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
threshold | float | entropy threshold. | required |
Source code in causalflow/preprocessing/subsampling_methods/EntropyBasedMethod.py
__normalization()
Normalize entropy for each moving window.
Source code in causalflow/preprocessing/subsampling_methods/EntropyBasedMethod.py
create_rounded_copy()
Create deepcopy of the dataframe but with rounded values.
Returns:
Type | Description |
---|---|
pd.DataFrame | rounded dataframe. |
Source code in causalflow/preprocessing/subsampling_methods/EntropyBasedMethod.py
dataset_segmentation()
abstractmethod
Abstract method.
Source code in causalflow/preprocessing/subsampling_methods/EntropyBasedMethod.py
extract_indexes()
Extract a list of indexes corresponding to the samples selected by the subsampling procedure.
Source code in causalflow/preprocessing/subsampling_methods/EntropyBasedMethod.py
moving_window_analysis()
Compute dataframe entropy on moving windows.
Source code in causalflow/preprocessing/subsampling_methods/EntropyBasedMethod.py
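To make the moving-window idea concrete, here is a minimal standalone sketch of computing one entropy value per window and then normalizing the results. It is not the library's code: the non-overlapping windows, the rounding to one decimal before counting (which echoes the rounded copy described above), and the division by the largest entropy are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, decimals=1):
    """Shannon entropy in bits of the values after rounding to `decimals` places."""
    counts = Counter(round(v, decimals) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_entropies(series, window):
    """Entropy of each consecutive, non-overlapping window (the last may be shorter)."""
    return [shannon_entropy(series[i:i + window])
            for i in range(0, len(series), window)]


def normalize(entropies):
    """Scale entropies to [0, 1] by dividing by the largest one (assumed scheme)."""
    peak = max(entropies)
    return [e / peak if peak > 0 else 0.0 for e in entropies]


series = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
raw = window_entropies(series, window=4)
scaled = normalize(raw)
```

A flat window carries no information and gets entropy 0, while a window of four distinct values reaches the maximum of 2 bits, which is the kind of contrast an entropy threshold can act on.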
This module provides subsampling methods for data preprocessing.
Classes
SSMode: An enumerator containing all the supported subsampling methods.
SubsamplingMethod: A class for implementing various subsampling techniques.
SSMode
Bases: Enum
Enumerator containing all the supported subsampling methods.
Source code in causalflow/preprocessing/subsampling_methods/SubsamplingMethod.py
SubsamplingMethod
Bases: ABC
SubsamplingMethod abstract class.
Source code in causalflow/preprocessing/subsampling_methods/SubsamplingMethod.py
__init__(ssmode)
Class constructor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ssmode | SSMode | Subsampling method. | required |
Source code in causalflow/preprocessing/subsampling_methods/SubsamplingMethod.py
initialise(dataframe)
Initialise class by setting the dataframe to subsample.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | pd.DataFrame | Pandas DataFrame to subsample. | required |
Source code in causalflow/preprocessing/subsampling_methods/SubsamplingMethod.py
run()
abstractmethod
Run subsampler.
Source code in causalflow/preprocessing/subsampling_methods/SubsamplingMethod.py
This module provides the MovingWindow class to facilitate the entropy-based subsampling methods.
Classes
MovingWindow: A class used by the entropy-based subsampling methods.
MovingWindow
Moving window class used by the entropy-based subsampling methods.
Source code in causalflow/preprocessing/subsampling_methods/moving_window.py
__init__(window)
Class constructor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window | int | moving window size. | required |
Source code in causalflow/preprocessing/subsampling_methods/moving_window.py
get_entropy()
Compute the entropy based on probability distribution function.
Source code in causalflow/preprocessing/subsampling_methods/moving_window.py
get_pdf()
Compute the probability distribution function from an array of data.
Source code in causalflow/preprocessing/subsampling_methods/moving_window.py
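The two steps above, estimating a probability distribution from an array and then computing its entropy, can be sketched as follows. This is an illustrative version only: the equal-width histogram with a fixed number of bins is an assumption, and `empirical_pdf` and `entropy_bits` are hypothetical names rather than the class methods.

```python
import math


def empirical_pdf(data, bins=10):
    """Estimate a discrete PDF by counting samples in `bins` equal-width bins."""
    low, high = min(data), max(data)
    if low == high:
        return [1.0]  # a constant signal puts all the mass in one bin
    width = (high - low) / bins
    counts = [0] * bins
    for value in data:
        # Clamp so that the maximum value falls into the last bin.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return [c / len(data) for c in counts]


def entropy_bits(pdf):
    """Shannon entropy in bits; empty bins contribute nothing."""
    return -sum(p * math.log2(p) for p in pdf if p > 0)


pdf = empirical_pdf([0, 1, 2, 3], bins=4)
h = entropy_bits(pdf)
```

With four samples spread evenly over four bins the distribution is uniform, so the entropy equals log2(4) = 2 bits.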
optimal_sampling(thres)
Find the optimal number of samples for a particular moving window.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
thres | float | stopping criterion threshold. | required |
Source code in causalflow/preprocessing/subsampling_methods/moving_window.py
samples_selector(step)
Select the samples to be taken from a moving window.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
step | int | subsampling frequency. | required |
Returns:
Type | Description |
---|---|
list | list of indexes corresponding to the samples to be taken. |
Source code in causalflow/preprocessing/subsampling_methods/moving_window.py
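A minimal sketch of how per-window sample selection could combine into one global index list is shown below. It assumes each window is a half-open range `[start, stop)` of dataset positions with its own step; the function names and this representation are hypothetical and not taken from the library.

```python
def select_indexes(start, stop, step):
    """Indexes kept inside the window [start, stop) when taking one sample every `step`."""
    return list(range(start, stop, step))


def collect_indexes(windows, steps):
    """Concatenate the indexes kept in each window, in dataset order."""
    kept = []
    for (start, stop), step in zip(windows, steps):
        kept.extend(select_indexes(start, stop, step))
    return kept


windows = [(0, 6), (6, 12)]  # two adjacent windows of six samples each
steps = [3, 2]               # a coarser step for the first window
indexes = collect_indexes(windows, steps)
```

Allowing a different step per window is what lets an entropy-based method keep more samples where the signal varies and fewer where it is flat.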
This module provides the Static class.
Classes
Static: Subsamples data by taking one sample every step samples.
Static
Bases: SubsamplingMethod
Subsample data by taking one sample every step samples.
Source code in causalflow/preprocessing/subsampling_methods/Static.py
__init__(step)
Class constructor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
step | int | integer subsampling step. | required |
Raises:
Type | Description |
---|---|
ValueError | if step == None. |
Source code in causalflow/preprocessing/subsampling_methods/Static.py
run()
Run subsampler.
Source code in causalflow/preprocessing/subsampling_methods/Static.py
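The pattern of an abstract subsampling method with a static-step subclass can be sketched in isolation. The classes below are local stand-ins with hypothetical names, not the library's classes, and the choice to return kept indexes from `run` is an assumption made for consistency with the entropy-based methods.

```python
from abc import ABC, abstractmethod


class SubsamplingSketch(ABC):
    """Stand-in for an abstract subsampling method: hold the data, then run."""

    def __init__(self):
        self.data = None

    def initialise(self, data):
        """Store the sequence of samples to subsample."""
        self.data = data

    @abstractmethod
    def run(self):
        """Return the indexes of the samples to keep."""


class StaticSketch(SubsamplingSketch):
    """Keep one sample every `step` samples."""

    def __init__(self, step):
        super().__init__()
        if step is None:
            raise ValueError("step not specified")
        self.step = step

    def run(self):
        return list(range(0, len(self.data), self.step))


method = StaticSketch(step=3)
method.initialise([10, 11, 12, 13, 14, 15, 16])
kept = method.run()
```

Separating `initialise` from the constructor lets the method be configured first and bound to the data later, which matches the two-step interface documented above.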
This module provides the WSDynamic class.
Classes
WSDynamic: Subsampling method with dynamic window size based on entropy analysis.
WSDynamic
Bases: SubsamplingMethod, EntropyBasedMethod
Subsampling method with dynamic window size based on entropy analysis.
Source code in causalflow/preprocessing/subsampling_methods/WSDynamic.py
__init__(window_min_size, entropy_threshold)
Class constructor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_min_size | int | minimum window size. | required |
entropy_threshold | float | entropy threshold. | required |
Raises:
Type | Description |
---|---|
ValueError | if window_min_size == None. |
Source code in causalflow/preprocessing/subsampling_methods/WSDynamic.py
dataset_segmentation()
Segment dataset based on breakpoint analysis and a min window size.
Source code in causalflow/preprocessing/subsampling_methods/WSDynamic.py
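One way to combine breakpoints with a minimum window size is sketched below. The breakpoint detection itself is out of scope here: the candidate breakpoints are assumed to come from some change-point detector, and the rule of skipping any cut that would create a too-short window is an assumption, as is the function name `segment`.

```python
def segment(n_samples, breakpoints, window_min_size):
    """Split [0, n_samples) at `breakpoints`, skipping any cut that would
    leave a window shorter than `window_min_size` on either side."""
    windows, start = [], 0
    for cut in sorted(breakpoints):
        long_enough_before = cut - start >= window_min_size
        long_enough_after = n_samples - cut >= window_min_size
        if long_enough_before and long_enough_after:
            windows.append((start, cut))
            start = cut
    windows.append((start, n_samples))  # the final window runs to the end
    return windows


windows = segment(100, breakpoints=[10, 15, 50, 97], window_min_size=10)
```

In this example the cuts at 15 and 97 are dropped because they would produce windows of 5 and 3 samples respectively.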
run()
Run subsampler.
Returns:
Type | Description |
---|---|
list[int] | indexes of the remaining samples. |
Source code in causalflow/preprocessing/subsampling_methods/WSDynamic.py
This module provides the WSFFTStatic class.
Classes
WSFFTStatic: Subsampling method with static window size based on Fourier analysis.
WSFFTStatic
Bases: SubsamplingMethod, EntropyBasedMethod
Subsampling method with static window size based on Fourier analysis.
Source code in causalflow/preprocessing/subsampling_methods/WSFFTStatic.py
__fourier_window()
Compute window size based on Fourier analysis performed on dataframe.
Returns:
Type | Description |
---|---|
int | window size |
Source code in causalflow/preprocessing/subsampling_methods/WSFFTStatic.py
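As a hedged illustration of deriving a window size from Fourier analysis, the sketch below picks the dominant frequency of a single series and returns the number of samples in one period. It uses a naive discrete Fourier transform from the standard library for self-containment; how the library handles multiple columns or selects the frequency is not documented here, so the single-series treatment and the name `fourier_window` are assumptions.

```python
import cmath
import math


def fourier_window(signal, sampling_time):
    """Window size (in samples) spanning one period of the dominant frequency."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant (DC) component
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        # Naive DFT coefficient for frequency bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    frequency = best_k / (n * sampling_time)  # dominant frequency in Hz
    period = 1.0 / frequency                  # seconds per cycle
    return round(period / sampling_time)      # samples per cycle


# A sine wave that repeats every 20 samples, sampled every 0.1 s.
signal = [math.sin(2 * math.pi * t / 20) for t in range(200)]
window = fourier_window(signal, sampling_time=0.1)
```

The sampling time cancels out when converting the period back to samples, but keeping it explicit makes the intermediate frequency meaningful in physical units.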
__init__(sampling_time, entropy_threshold)
Class constructor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sampling_time | float | timeseries sampling time. | required |
entropy_threshold | float | entropy threshold. | required |
Source code in causalflow/preprocessing/subsampling_methods/WSFFTStatic.py
dataset_segmentation()
Segment dataset with a fixed window size.
Source code in causalflow/preprocessing/subsampling_methods/WSFFTStatic.py
run()
Run subsampler.
Returns:
Type | Description |
---|---|
list[int] | indexes of the remaining samples. |
Source code in causalflow/preprocessing/subsampling_methods/WSFFTStatic.py
This module provides the WSStatic class.
Classes
WSStatic: Entropy based subsampling method with static window size.
WSStatic
Bases: SubsamplingMethod, EntropyBasedMethod
Entropy based subsampling method with static window size.
Source code in causalflow/preprocessing/subsampling_methods/WSStatic.py
__init__(window_size, entropy_threshold)
Class constructor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_size | int | minimum window size. | required |
entropy_threshold | float | entropy threshold. | required |
Raises:
Type | Description |
---|---|
ValueError | if window_size == None. |
Source code in causalflow/preprocessing/subsampling_methods/WSStatic.py
dataset_segmentation()
Segment dataset with a fixed window size.
Source code in causalflow/preprocessing/subsampling_methods/WSStatic.py
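Fixed-size segmentation can be sketched in a few lines. The representation of windows as half-open `(start, stop)` pairs, the shorter trailing window, and the name `fixed_windows` are assumptions made for illustration rather than details of the library.

```python
def fixed_windows(n_samples, window_size):
    """(start, stop) pairs covering [0, n_samples) in windows of `window_size`.

    The last window is shorter when n_samples is not a multiple of window_size.
    """
    if window_size is None or window_size <= 0:
        raise ValueError("window_size must be a positive integer")
    return [(start, min(start + window_size, n_samples))
            for start in range(0, n_samples, window_size)]


windows = fixed_windows(10, 4)
```

Each window can then be analysed independently, for example by computing its entropy and choosing how many of its samples to keep.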
run()
Run subsampler.
Returns:
Type | Description |
---|---|
list[int] | indexes of the remaining samples. |
Source code in causalflow/preprocessing/subsampling_methods/WSStatic.py