*author: niplav, created: 2022-02-04, modified: 2024-04-15, language: english, status: in progress, importance: 4, confidence: unlikely*

Causation comes without correlation ≤5% of the time in sparse-ish linear causal networks, but not never. The proportion of causal non-correlations first grows and then shrinks with the number of nodes in the causal network, with the maximum at ≈30. I can't explain why.

"Correlation ⇏ Causation" is trite by now. And we know that the inverse is also false: "¬Correlation ⇏ ¬Causation".

Spencer Greenberg summarizes:

> All of this being said, while causation does not NECESSARILY imply correlation, causation USUALLY DOES imply correlation. Some software that attempts to discover causation in observational data even goes so far as to make this assumption of causation implying correlation.
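
A minimal worked example of causation without correlation, assuming a hypothetical four-variable linear SEM (the symbols `$X_1, X_2, Y, Z$` and the coefficients `$a, b, c, d$` are made up for illustration and do not appear in the simulations below). Let `$X_1$` and `$X_2$` be independent inputs with unit variance:

```latex
Y = a X_1, \qquad Z = b X_1 + c Y + d X_2 = (b + ac) X_1 + d X_2
\operatorname{Cov}(X_1, Z) = (b + ac) \operatorname{Var}(X_1) = b + ac
```

The covariance vanishes whenever `$b = -ac$`, even though `$X_1$` causes `$Z$` both directly and via `$Y$`. With continuously distributed coefficients such exact cancellation has probability zero, so in practice the interesting question is how often causal correlations are merely very small.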

I, however, have an inner computer scientist.

And he demands answers.

He will not rest until he knows *how often* ¬Correlation ⇒ ¬Causation,
and how often it doesn't.

This can be tested by creating a Monte-Carlo simulation over random linear structural equation models with `$n$` variables, computing the correlations between the different variables for random inputs, and checking whether the correlations being very small implies that there is no causation.

So we start by generating a random linear SEM with `$n$` variables (code in Julia). The parameters are normally distributed with mean 0 and variance 1, but for now we'll assume there is no noise.

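```
# Assumed imports: Graphs.jl (formerly LightGraphs.jl) for the graph types,
# LinearAlgebra for diagind/tril, Statistics for cor/mean
using Graphs, LinearAlgebra, Statistics

struct LinearSEM
    g::SimpleDiGraph{Int64}
    coefficients::Dict
end
```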

We can decide how dense/sparse we want the SEM to be via the `threshold` parameter, the probability that two different nodes have an edge between them. The higher the threshold, the more edges in the SEM.

```
function random_linear_sem(n::Int, threshold=0.5)
    g=DiGraph(n)
    # Edges only go from lower to higher indices, so the graph is acyclic
    for i in 1:n
        for j in (i+1):n
            if rand() < threshold
                add_edge!(g, i, j)
            end
        end
    end
    # One standard-normal coefficient per edge
    coefficients=Dict()
    for edge in edges(g)
        coefficients[edge]=randn()
    end
    return LinearSEM(g, coefficients)
end
```
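
As a quick sanity check on `threshold`: the double loop visits `$n(n-1)/2$` ordered pairs, so the expected number of edges should be `threshold` times that. A minimal sketch, assuming only the standard library; `count_random_edges` is a hypothetical helper that mimics the loop above without building a graph:

```julia
using Random, Statistics

# Count the edges the double loop would create, without building a graph.
# `count_random_edges` is a hypothetical helper, not part of the essay's code.
function count_random_edges(rng::AbstractRNG, n::Int, threshold::Float64)
    return count(rand(rng) < threshold for i in 1:n for j in (i+1):n)
end

rng = MersenneTwister(1)
n, threshold = 20, 0.25
observed = mean([count_random_edges(rng, n, threshold) for _ in 1:2000])
expected = threshold * n * (n - 1) / 2   # 0.25 * 190 = 47.5
println("observed mean edges: ", observed, ", expected: ", expected)
```

So a threshold of 0.25 with 20 nodes should give roughly 47 to 48 edges on average.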

We can then run a bunch of inputs through that model, and compute their correlations:

```
function correlations(sem::LinearSEM, inner_samples::Int)
    n=size(vertices(sem.g), 1)
    # Nodes without parents are the inputs of the SEM
    input_nodes=[node for node in vertices(sem.g) if indegree(sem.g, node) == 0]
    results=Matrix{Float64}(undef, inner_samples, n) # Preallocate results matrix
    for i in 1:inner_samples
        input_values=Dict([node => randn() for node in input_nodes])
        sem_values=calculate_sem_values(sem, input_values)
        # One row per sample, columns ordered by node index
        sem_value_row=reshape(collect(values(sort(sem_values))), 1, :)
        results[i, :]=sem_value_row
    end
    cor_matrix=cor(results)
    # Zero the diagonal: self-correlations are not of interest
    cor_matrix[diagind(cor_matrix)].=0
    return abs.(cor_matrix)
end
```
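
The helper `calculate_sem_values` is not shown here. A minimal sketch of what such a function might do, assuming it propagates the input values through the DAG in topological order and sums the weighted parent values (this is a guess at the behaviour, not the original implementation; `propagate_values` is a hypothetical name, and it assumes Graphs.jl is installed):

```julia
using Graphs

# Hypothetical stand-in for `calculate_sem_values`: a guess at its behaviour,
# not the author's implementation.
function propagate_values(g::SimpleDiGraph, coefficients::Dict, input_values::Dict)
    node_values = Dict{Int, Float64}()
    # Visit parents before children so every parent value is already known.
    for node in topological_sort_by_dfs(g)
        if indegree(g, node) == 0
            node_values[node] = input_values[node]
        else
            node_values[node] = sum(coefficients[Edge(parent, node)] * node_values[parent]
                                    for parent in inneighbors(g, node))
        end
    end
    return node_values
end

# Toy SEM: 1 → 2 → 3 plus a direct edge 1 → 3.
g = SimpleDiGraph(3)
add_edge!(g, 1, 2); add_edge!(g, 2, 3); add_edge!(g, 1, 3)
coefficients = Dict(Edge(1, 2) => 2.0, Edge(2, 3) => 0.5, Edge(1, 3) => -1.0)
toy_values = propagate_values(g, coefficients, Dict(1 => 3.0))
println(toy_values)
```

In this toy SEM the indirect path (2.0 · 0.5) and the direct edge (−1.0) cancel exactly, so node 3 ends up at zero no matter what the input is.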

We can then check how many correlations are "incorrectly small".

Let's take all the correlations between variables which don't have any causal relationship. The largest of those is the "largest uncaused correlation". Correlations between two variables where one causes the other, but which are smaller than the largest uncaused correlation, are "too small": There is a causation but it's not detected.

We can now take the correlations and separate them into correlations between pairs of variables with a causal relationship (i.e., a directed path through the DAG representing the SEM), and pairs of variables without causal relationships.

```
correlation=correlations(sem, inner_samples)
influence=Matrix(Bool.(transpose(adjacency_matrix(transitiveclosure(sem.g)))))
not_influence=tril(.!(influence), -1)
non_causal_cors=not_influence.*correlation
causal_cors=influence.*correlation
```
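
To see what the two masks look like, here is a minimal sketch on a hand-built toy DAG (the graph is made up for illustration, and the snippet assumes Graphs.jl is installed). It uses a dense conversion followed by `.!= 0`, which should be equivalent to the `Bool.(...)` construction above:

```julia
using Graphs, LinearAlgebra

# Toy DAG with a common cause: 1 → 2, 1 → 3 and 2 → 4.
g = SimpleDiGraph(4)
add_edge!(g, 1, 2)
add_edge!(g, 1, 3)
add_edge!(g, 2, 4)

# Entry [j, i] is true iff there is a directed path from i to j.
influence = Matrix(transpose(adjacency_matrix(transitiveclosure(g)))) .!= 0
# Strict lower triangle: pairs with no directed path between them.
not_influence = tril(.!(influence), -1)

println(influence)
println(not_influence)
```

Node 1 influences node 4 only through the transitive closure, while the pairs (2, 3) and (3, 4) have no directed path between them, although both share node 1 as a common cause.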

This gives us two distributions, the distribution of `non_causal_cors` and the distribution of `causal_cors`, e.g. for SEMs with 48 variables:

One may notice that some variables that are not causing each other still
have high correlations with each other; this is because they have a
common cause. So we have to decide *what it means* for a correlation
to be too small to be relevant.

I can see three different salient options to decide whether a correlation is small:

- A causal correlation is "small" iff it is smaller than the largest non-causal correlation.
    - I used this criterion in an earlier version of this essay, but I now think that it's too lax, since due to common causes a pair of variables without direct causation can have a very high correlation.
- A causal correlation is "small" iff it is smaller than the average/median non-causal correlation.
    - This is stronger than the previous condition, and I vibe with it, but I don't have specific good reasons *why* I choose it. If you think this is stupid for reasons, I'd be interested in hearing the reasons (or even suggestions for improvement).
- A causal correlation is "small" iff it falls under a certain constant value.
    - Wikipedia states that a correlation of 0.1 is small. I think it's basically just an arbitrary cutoff, so I don't use it.

```
function misclassifications(sem::LinearSEM, inner_samples::Int)
    # causal_cors and non_causal_cors are computed from sem as in the previous snippet
    return sum((causal_cors .!= 0) .& (causal_cors .< mean(non_causal_cors)))
end
```
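
For comparison, the three criteria from the list above can be sketched on toy numbers. The correlation values and the function names are made up for illustration; only the mean-based criterion corresponds to what `misclassifications` uses:

```julia
using Statistics

# Toy absolute correlations; the numbers are invented for illustration.
causal = [0.02, 0.15, 0.4, 0.7]
non_causal = [0.01, 0.05, 0.3]

# Criterion 1: smaller than the largest non-causal correlation.
small_by_max(causal, non_causal) = count(<(maximum(non_causal)), causal)
# Criterion 2: smaller than the mean non-causal correlation.
small_by_mean(causal, non_causal) = count(<(mean(non_causal)), causal)
# Criterion 3: smaller than a fixed cutoff.
small_by_constant(causal; cutoff=0.1) = count(<(cutoff), causal)

println(small_by_max(causal, non_causal))   # 0.02 and 0.15 are below 0.3
println(small_by_mean(causal, non_causal))  # only 0.02 is below 0.12
println(small_by_constant(causal))          # only 0.02 is below 0.1
```

On these numbers the maximum-based criterion flags two causal correlations as small, while the other two flag one each.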

And, in the outermost loop, we compute the number of misclassifications for a number of linear SEMs (with a threshold of 0.25, since 0.5 usually produces SEMs which are too dense):

```
function misclassified_absence_mc(n::Int, outer_samples::Int, inner_samples::Int)
    return [misclassifications(random_linear_sem(n, 0.25), inner_samples) for i in 1:outer_samples]
end
```

So we collect a bunch of samples. SEMs with one, two and three variables are ignored because when running the code, they never give me any causal non-correlations. (I'd be interested in seeing examples to the contrary).

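```
results=Dict{Int, Array{Int, 1}}()
results_lock=ReentrantLock()
sem_samples=400
inputs_samples=10000
upperlim=52
stepsize=4
Threads.@threads for i in 4:stepsize:upperlim
    misclassified=misclassified_absence_mc(i, sem_samples, inputs_samples)
    # Dict is not thread-safe, so guard the write with a lock
    lock(results_lock) do
        results[i]=misclassified
    end
end
```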

We can now first calculate the mean number of small
causal correlations and the *proportion* of small causal
correlations, using the formula for the triangular
number:

```
result_means=[mean(values) for (key, values) in sort(results)]
result_props=[mean(values)/((key^2+key)/2) for (key, values) in sort(results)]
```
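
For reference, the denominator `(key^2+key)/2` is the `$n$`-th triangular number. A small worked comparison with the number of unordered pairs of distinct variables (my own arithmetic, not part of the analysis above):

```latex
T_n = \frac{n^2 + n}{2} = \frac{n(n+1)}{2},
\qquad
\binom{n}{2} = \frac{n(n-1)}{2} = T_{n-1}
% Example for n = 4: T_4 = 10, while \binom{4}{2} = 6
```

Since a DAG on `$n$` nodes has at most `$\binom{n}{2}$` causally related pairs, dividing by `$T_n$` gives a somewhat smaller proportion than dividing by the number of pairs would. The ratio between the two denominators is `$(n-1)/(n+1)$`, so the gap shrinks as `$n$` grows.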

So it *looks like* at most 5% of causal relationships have a correlation
that is "too small", which happens if the SEM has ≈30 variables,
and that number shrinks with a larger SEM.

I find this *very surprising*: Is there a specific mathematical
reason why SEMs with 30 variables have the highest number of causal
non-correlations? The mean number of causal relationships with small
correlations seems to grow quite steadily, so I'm not sure what's going
on here.

Is the issue with the number of inner samples, are we
simply *not checking enough*? But 10k samples ought to be enough for
anybody; if that's not sufficient, I don't know what is.
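
A rough way to gauge this: for two independent normal variables, the sample correlation over `$N$` samples has a standard deviation of about `$1/\sqrt{N-1}$`, i.e. ≈0.01 for 10k samples. A minimal sketch that checks this rule of thumb by simulation, using only the standard library (`null_correlation_spread` is a hypothetical helper, and a smaller `$N$` is used to keep it fast):

```julia
using Random, Statistics

# Spread of the sample correlation between two independent standard normals.
# `null_correlation_spread` is a hypothetical helper for illustration only.
function null_correlation_spread(rng::AbstractRNG, n_samples::Int, repetitions::Int)
    return std([cor(randn(rng, n_samples), randn(rng, n_samples)) for _ in 1:repetitions])
end

rng = MersenneTwister(1)
n_samples = 400
spread = null_correlation_spread(rng, n_samples, 2000)
println("simulated: ", spread, ", 1/sqrt(N-1): ", 1 / sqrt(n_samples - 1))
```

This only covers the sampling noise of a single truly-zero correlation; it says nothing about how the mean non-causal correlation used as the threshold behaves.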

But let's better go and write some code to check:

```
more_samples=Dict{Int, Array{Int, 1}}()
samples_test_size=20
sem_samples=400
inputs_samples=2 .^(6:16)
for inputs_sample in inputs_samples
    println(inputs_sample)
    more_samples[inputs_sample]=misclassified_absence_mc(samples_test_size, sem_samples, inputs_sample)
end
```

Plotting the number of causal non-correlations reveals that 10k samples
*ought* to be enough, at least for small numbers of variables:

The densities fluctuate, sure, but not so much that I'll throw out the baby with the bathwater. If I was a better person, I'd make a statistical test here, but alas, I am not.

I don't know why causal non-correlations are so common, especially in linear SEMs.

- What a Tangled Net We Weave When First We Practice to Believe (Gwern, 2019): Where I probably got this from, subconsciously.
- How Often Does Correlation=Causality? (Gwern, 2019)