
This post shows four ways to calculate the conditional probability
that I have an umbrella, given that it’s raining, \(P(umbrella|raining)\). The first way uses
the ratio formula, \(P(raining \land
umbrella)/P(raining)\). The second uses a three-valued logical
connective, the conditional event, which can be traced back to the work
of Bruno de Finetti in the 1930s (see, e.g., Baratgin, 2021). The third
filters the dataset to rows where it’s raining before counting the
proportion of those where I had an umbrella. The fourth uses logistic
regression.
Here’s a simulated dataset about rain and umbrellas, where each row
is a day.
library(conflicted)
library(tidyverse)
conflicts_prefer(dplyr::filter)
days_n <- 20
set.seed(5)
sim_dat <- data.frame(day = 1:days_n, raining = rbinom(days_n, 1, .5)) |>
mutate(umbrella = rbinom(days_n, 1, ifelse(raining, .8, .2))) |>
mutate(across(raining:umbrella, as.logical))
sim_dat
The variable raining denotes whether it was raining and
umbrella denotes whether I had an umbrella with me.
Here are the joint probabilities:
joints <- sim_dat |>
group_by(raining, umbrella) |>
tally() |>
ungroup() |>
mutate(p = n / sum(n)) |>
select(-n) |>
arrange(desc(raining), desc(umbrella))
joints
Then
\[P(umbrella|raining) = \frac{P(raining
\land umbrella)}{P(raining)}\]
\(P(raining \land umbrella)\), i.e.,
the probability that it’s raining and I have an umbrella,
is:
raining_and_umbrella_p <- joints |>
filter(raining & umbrella) |>
pull(p)
raining_and_umbrella_p
[1] 0.45
\(P(raining)\) is:
raining_p <- joints |>
filter(raining) |>
pull(p) |>
sum()
raining_p
[1] 0.5
So, \(P(umbrella|raining)\) is:
raining_and_umbrella_p / raining_p
[1] 0.9
Another way to calculate this is using the conditional
event, \(B \dashv A\) (“B given
A”), here using Hailperin’s (1996) formulation (he calls it the
suppositional):
dont_care <- function(x)
ifelse(x, NA, x)
`%-|%` <- function(y, x)
dont_care(!x) | (x & y)
The don’t_care function has one parameter and returns the
missing value, NA, if that parameter is TRUE. \(B \dashv A\) is then defined using this so
it gets the value NA if \(A\)
is FALSE, otherwise it’s equal to \(A
\land B\). Here’s a table:
vals <- c(TRUE, FALSE)
expand.grid(A = vals, B = vals) |>
mutate(`B -| A` = B %-|% A) |>
arrange(desc(A), desc(B))
We can apply this to the original dataset:
sim_dat <- sim_dat |>
mutate(`umbrella -| raining` = umbrella %-|% raining)
sim_dat
Now for \(P(umbrella|raining)\) we
drop the NAs and count the proportion of the time that \(umbrella \dashv raining\) is TRUE:
sim_dat |>
pull(`umbrella -| raining`) |>
mean(na.rm = TRUE)
[1] 0.9
An easier, though less fun, way would be just:
sim_dat |>
filter(raining) |>
pull(umbrella) |>
mean()
[1] 0.9
Finally, we could use logistic regresion:
logistic <- glm(umbrella ~ raining, data = sim_dat, family = binomial)
predict(logistic, newdat = data.frame(raining = TRUE), type = "response")
1
0.9
LS0tDQp0aXRsZTogRm91ciB3YXlzIHRvIGNhbGN1bGF0ZSB0aGUgcHJvYmFiaWxpdHkgdGhhdCBpZiBpdCByYWlucywgSSBoYXZlIGFuIHVtYnJlbGxhLCB1c2luZyBSDQphdXRob3I6ICJBbmRpIEZ1Z2FyZCINCmRhdGU6ICIzIE5vdiAyMDI0Ig0Kb3V0cHV0Og0KICBodG1sX25vdGVib29rOg0KICAgIGNvZGVfZm9sZGluZzogbm9uZQ0KICBodG1sX2RvY3VtZW50Og0KICAgIGRmX3ByaW50OiBwYWdlZA0KLS0tDQoNCmBgYHtyIGVjaG89RkFMU0UsIGZpZy5hbHQ9IlR3byBwZW9wbGUgc2l0dGluZyBvbiBhIGJlbmNoLCBzaGVsdGVyZWQgZnJvbSB0aGUgcmFpbiBieSBhIHRyYWluIHBsYXRmb3JtIHJvb2Y7IGEgdGhpcmQgcGVyc29uIHNoZWx0ZXJpbmcgdW5kZXIgYSB5ZWxsb3cgdW1icmVsbGEuIiwgb3V0LndpZHRoPSIxMDAlIn0NCmtuaXRyOjppbmNsdWRlX2dyYXBoaWNzKCJGYXJoYW1wdG9uX3VtYnJlbGxhLmpwZyIpDQpgYGANCg0KDQpUaGlzIHBvc3Qgc2hvd3MgZm91ciB3YXlzIHRvIGNhbGN1bGF0ZSB0aGUgY29uZGl0aW9uYWwgcHJvYmFiaWxpdHkgdGhhdCBJIGhhdmUgYW4gdW1icmVsbGEsIGdpdmVuIHRoYXQgaXQncyByYWluaW5nLCAkUCh1bWJyZWxsYXxyYWluaW5nKSQuIFRoZSBmaXJzdCB3YXkgdXNlcyB0aGUgcmF0aW8gZm9ybXVsYSwgJFAocmFpbmluZyBcbGFuZCB1bWJyZWxsYSkvUChyYWluaW5nKSQuIFRoZSBzZWNvbmQgdXNlcyBhIHRocmVlLXZhbHVlZCBsb2dpY2FsIGNvbm5lY3RpdmUsIHRoZSBjb25kaXRpb25hbCBldmVudCwgd2hpY2ggY2FuIGJlIHRyYWNlZCBiYWNrIHRvIHRoZSB3b3JrIG9mIEJydW5vIGRlIEZpbmV0dGkgaW4gdGhlIDE5MzBzIChzZWUsIGUuZy4sIEJhcmF0Z2luLCAyMDIxKS4gVGhlIHRoaXJkIGZpbHRlcnMgdGhlIGRhdGFzZXQgdG8gcm93cyB3aGVyZSBpdCdzIHJhaW5pbmcgYmVmb3JlIGNvdW50aW5nIHRoZSBwcm9wb3J0aW9uIG9mIHRob3NlIHdoZXJlIEkgaGFkIGFuIHVtYnJlbGxhLiBUaGUgZm91cnRoIHVzZXMgbG9naXN0aWMgcmVncmVzc2lvbi4NCg0KSGVyZSdzIGEgc2ltdWxhdGVkIGRhdGFzZXQgYWJvdXQgcmFpbiBhbmQgdW1icmVsbGFzLCB3aGVyZSBlYWNoIHJvdyBpcyBhIGRheS4NCg0KYGBge3IgbWVzc2FnZT1GQUxTRSwgd2FybmluZz1GQUxTRX0NCmxpYnJhcnkoY29uZmxpY3RlZCkNCmxpYnJhcnkodGlkeXZlcnNlKQ0KY29uZmxpY3RzX3ByZWZlcihkcGx5cjo6ZmlsdGVyKQ0KYGBgDQoNCmBgYHtyfQ0KZGF5c19uIDwtIDIwDQpzZXQuc2VlZCg1KQ0KYGBgDQoNCmBgYHtyfQ0Kc2ltX2RhdCA8LSBkYXRhLmZyYW1lKGRheSA9IDE6ZGF5c19uLCByYWluaW5nID0gcmJpbm9tKGRheXNfbiwgMSwgLjUpKSB8Pg0KICBtdXRhdGUodW1icmVsbGEgPSByYmlub20oZGF5c19uLCAxLCBpZmVsc2UocmFpbmluZywgLjgsIC4yKSkpIHw+DQogIG11dGF0ZShhY3Jvc3MocmFpbmluZzp1bWJyZWxsYSwgYXMubG9naWNhbCkpDQpzaW1fZGF0DQpgYGANCg0KVGhlIHZhcmlhYmxlIF9yYWluaW5nXyBkZW5vdGVzIHdoZXRoZXIgaXQgd2FzIHJhaW5pbmcgYW5kIF91bWJyZWxsYV8gZGVub3RlcyB3aGV0aGVyIEkgaGFkIGFuIHVtYnJlbGxhIHdpdGggbWUuDQoNCkhlcmUgYXJlIHRoZSBqb2ludCBwcm9iYWJpbGl0aWVzOg0KDQpgYGB7cn0NCmpvaW50cyA8LSBzaW1fZGF0IHw+DQogIGdyb3VwX2J5KHJhaW5pbmcsIHVtYnJlbGxhKSB8Pg0KICB0YWxseSgpIHw+DQogIHVuZ3JvdXAoKSB8Pg0KICBtdXRhdGUocCA9IG4gLyBzdW0obikpIHw+DQogIHNlbGVjdCgtbikgfD4NCiAgYXJyYW5nZShkZXNjKHJhaW5pbmcpLCBkZXNjKHVtYnJlbGxhKSkNCmpvaW50cw0KYGBgDQoNClRoZW4gDQoNCiQkUCh1bWJyZWxsYXxyYWluaW5nKSA9IFxmcmFje1AocmFpbmluZyBcbGFuZCB1bWJyZWxsYSl9e1AocmFpbmluZyl9JCQNCg0KJFAocmFpbmluZyBcbGFuZCB1bWJyZWxsYSkkLCBpLmUuLCB0aGUgcHJvYmFiaWxpdHkgdGhhdCBpdCdzIHJhaW5pbmcgX2FuZF8gSSBoYXZlIGFuIHVtYnJlbGxhLCBpczoNCg0KYGBge3J9DQpyYWluaW5nX2FuZF91bWJyZWxsYV9wIDwtIGpvaW50cyB8Pg0KICBmaWx0ZXIocmFpbmluZyAmIHVtYnJlbGxhKSB8Pg0KICBwdWxsKHApDQpyYWluaW5nX2FuZF91bWJyZWxsYV9wIA0KYGBgDQoNCiRQKHJhaW5pbmcpJCBpczoNCg0KYGBge3J9DQpyYWluaW5nX3AgPC0gam9pbnRzIHw+DQogIGZpbHRlcihyYWluaW5nKSB8Pg0KICBwdWxsKHApIHw+DQogIHN1bSgpDQpyYWluaW5nX3ANCmBgYA0KDQpTbywgJFAodW1icmVsbGF8cmFpbmluZykkIGlzOg0KDQpgYGB7cn0NCnJhaW5pbmdfYW5kX3VtYnJlbGxhX3AgLyByYWluaW5nX3ANCmBgYA0KDQpBbm90aGVyIHdheSB0byBjYWxjdWxhdGUgdGhpcyBpcyB1c2luZyB0aGUgX2NvbmRpdGlvbmFsIGV2ZW50XywgJEIgXGRhc2h2IEEkICgiQiBnaXZlbiBBIiksIGhlcmUgdXNpbmcgSGFpbHBlcmluJ3MgKDE5OTYpIGZvcm11bGF0aW9uIChoZSBjYWxscyBpdCB0aGUgX3N1cHBvc2l0aW9uYWxfKToNCg0KYGBge3J9DQpkb250X2NhcmUgPC0gZnVuY3Rpb24oeCkNCiAgaWZlbHNlKHgsIE5BLCB4KQ0KDQpgJS18JWAgPC0gZnVuY3Rpb24oeSwgeCkNCiAgZG9udF9jYXJlKCF4KSB8ICh4ICYgeSkNCmBgYA0KDQpUaGUgX2Rvbid0X2NhcmVfIGZ1bmN0aW9uIGhhcyBvbmUgcGFyYW1ldGVyIGFuZCByZXR1cm5zIHRoZSBtaXNzaW5nIHZhbHVlLCBfTkFfLCBpZiB0aGF0IHBhcmFtZXRlciBpcyBfVFJVRV8uICRCIFxkYXNodiBBJCBpcyB0aGVuIGRlZmluZWQgdXNpbmcgdGhpcyBzbyBpdCBnZXRzIHRoZSB2YWx1ZSBfTkFfIGlmICRBJCBpcyBfRkFMU0VfLCBvdGhlcndpc2UgaXQncyBlcXVhbCB0byAkQSBcbGFuZCBCJC4gSGVyZSdzIGEgdGFibGU6DQoNCmBgYHtyfQ0KdmFscyA8LSBjKFRSVUUsIEZBTFNFKQ0KZXhwYW5kLmdyaWQoQSA9IHZhbHMsIEIgPSB2YWxzKSB8Pg0KICBtdXRhdGUoYEIgLXwgQWAgPSBCICUtfCUgQSkgfD4NCiAgYXJyYW5nZShkZXNjKEEpLCBkZXNjKEIpKQ0KYGBgDQoNCg0KV2UgY2FuIGFwcGx5IHRoaXMgdG8gdGhlIG9yaWdpbmFsIGRhdGFzZXQ6DQoNCmBgYHtyfQ0Kc2ltX2RhdCA8LSBzaW1fZGF0IHw+DQogIG11dGF0ZShgdW1icmVsbGEgLXwgcmFpbmluZ2AgPSB1bWJyZWxsYSAlLXwlIHJhaW5pbmcpDQpzaW1fZGF0DQpgYGANCg0KDQpOb3cgZm9yICRQKHVtYnJlbGxhfHJhaW5pbmcpJCB3ZSBkcm9wIHRoZSBOQXMgYW5kIGNvdW50IHRoZSBwcm9wb3J0aW9uIG9mIHRoZSB0aW1lIHRoYXQgJHVtYnJlbGxhIFxkYXNodiByYWluaW5nJCBpcyBUUlVFOg0KDQpgYGB7cn0NCnNpbV9kYXQgfD4NCiAgcHVsbChgdW1icmVsbGEgLXwgcmFpbmluZ2ApIHw+DQogIG1lYW4obmEucm0gPSBUUlVFKQ0KYGBgDQoNCkFuIGVhc2llciwgdGhvdWdoIGxlc3MgZnVuLCB3YXkgd291bGQgYmUganVzdDoNCg0KYGBge3J9DQpzaW1fZGF0IHw+DQogIGZpbHRlcihyYWluaW5nKSB8Pg0KICBwdWxsKHVtYnJlbGxhKSB8Pg0KICBtZWFuKCkNCmBgYA0KDQpGaW5hbGx5LCB3ZSBjb3VsZCB1c2UgbG9naXN0aWMgcmVncmVzaW9uOg0KDQpgYGB7cn0NCmxvZ2lzdGljIDwtIGdsbSh1bWJyZWxsYSB+IHJhaW5pbmcsIGRhdGEgPSBzaW1fZGF0LCBmYW1pbHkgPSBiaW5vbWlhbCkNCnByZWRpY3QobG9naXN0aWMsIG5ld2RhdCA9IGRhdGEuZnJhbWUocmFpbmluZyA9IFRSVUUpLCB0eXBlID0gInJlc3BvbnNlIikNCmBgYA0KDQoNCiMjIFJlZmVyZW5jZXMNCg0KQmFyYXRnaW4sIEouICgyMDIxKS4gW0Rpc2NvdmVyaW5nIGVhcmx5IGRlIEZpbmV0dGnigJlzIHdyaXRpbmdzIG9uIHRyaXZhbGVudCB0aGVvcnkgb2YgY29uZGl0aW9uYWxzXShodHRwczovL2RvaS5vcmcvMTAuMTQyNzUvMjQ2NS0yMzM0LzIwMjExMi5iYXIpLiBfQXJndW1lbnRhXywgXzZfLCAyNjctLTI5MS4NCg0KSGFpbHBlcmluLCBULiAoMTk5NikuIF9TZW50ZW50aWFsIFByb2JhYmlsaXR5IExvZ2ljLiBPcmlnaW5zLCBEZXZlbG9wbWVudCwgQ3VycmVudCBTdGF0dXMsIGFuZCBUZWNobmljYWwgQXBwbGljYXRpb25zXy4gTGVoaWdoIFVuaXZlcnNpdHkgUHJlc3MuDQoNCg==