I was curious if there’s a positive correlation between the total number of users on a server and how many followers I have from that server.
Include some packages…
library(rtoot)
library(stringr)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ purrr     1.0.1
✔ forcats   1.0.0     ✔ readr     2.1.4
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
── Conflicts ────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Get {rtoot} authorised to talk to my server:
auth_setup(instance = "tech.lgbt", type = "user")
I’ll need my ID: who am I?
acc <- search_accounts("@Andi@tech.lgbt")
acc |>
  select(id, acct, display_name) |>
  head(1)
It me!
whoami <- "109273348690338129"
There’s a handy function in {rtoot} for getting all followers; however, it doesn’t (or didn’t, end of 2022) support auto-pagination. After reading the friendly manual, here’s a workaround:
really_get_all_followers <- function(id, sure = "No!") {
  # Require explicit confirmation before fetching every page
  stopifnot(sure == "Yes, I know what I am doing")
  followers <- c()
  still_working <- TRUE
  max_id <- NULL
  while (still_working) {
    # Fetch the next page, starting from the previous page's cursor
    next_lot <- get_account_followers(id,
                                      max_id = max_id)
    followers <- bind_rows(followers, next_lot)
    # The cursor for the following page, if any, is in the headers
    attrs <- attr(next_lot, "headers")
    if ("max_id" %in% names(attrs))
      max_id <- attrs$max_id
    else
      still_working <- FALSE
  }
  followers
}
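The loop follows a cursor pattern: request a page, read the cursor for the next page from the response, and stop when no cursor comes back. Here is a minimal offline sketch of the same idea. Note that fetch_page(), fetch_all() and the fake IDs are hypothetical stand-ins I made up for illustration, not part of {rtoot}; the mock only imitates the "headers" attribute that the real function returns.

```r
# Hypothetical stand-in for an API that returns 2 rows per page.
# The cursor for the next page is attached as a "headers" attribute,
# mimicking how the loop above reads max_id.
fake_ids <- c("105", "104", "103", "102", "101")

fetch_page <- function(max_id = NULL, page_size = 2) {
  # Start from the top, or just after the row the cursor points at
  start <- if (is.null(max_id)) 1 else match(max_id, fake_ids) + 1
  end <- min(start + page_size - 1, length(fake_ids))
  page <- data.frame(id = fake_ids[start:end], stringsAsFactors = FALSE)
  # Only advertise a cursor when there is more to fetch
  headers <- if (end < length(fake_ids)) list(max_id = fake_ids[end]) else list()
  attr(page, "headers") <- headers
  page
}

fetch_all <- function() {
  pages <- list()
  max_id <- NULL
  repeat {
    page <- fetch_page(max_id = max_id)
    pages[[length(pages) + 1]] <- page
    # No max_id in the headers means this was the last page
    max_id <- attr(page, "headers")$max_id
    if (is.null(max_id)) break
  }
  do.call(rbind, pages)
}

all_rows <- fetch_all()
nrow(all_rows)
```

Collecting the pages in a list and binding once at the end avoids growing a data frame inside the loop, though for a few hundred followers either approach is fine.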
Get my followers:
my_followers <- really_get_all_followers(
  whoami,
  sure = "Yes, I know what I am doing"
)
This number is correct: it worked!
nrow(my_followers)
[1] 494
What servers are they from?
get_servers <- function(followers) {
  servers <- followers$acct |> str_split_fixed("@", 2)
  servers[,2]
}
followers_servers <- my_followers |>
  mutate(server = get_servers(my_followers)) |>
  mutate(server = ifelse(server == "", "tech.lgbt", server))
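To see what the split does: accounts on my home server come back without an @server part, so the server piece is empty and gets filled in with tech.lgbt. Here is a small base R sketch of the same logic on made-up handles. The handles and the server_of() helper are illustrative only, not taken from my follower list.

```r
# Made-up handles: local accounts have no "@server" suffix
acct <- c("alice", "bob@example.social", "carol@another.example")

# Hypothetical helper: take everything after the first "@",
# falling back to the home server for local accounts
server_of <- function(acct, home = "tech.lgbt") {
  ifelse(grepl("@", acct, fixed = TRUE), sub("^[^@]*@", "", acct), home)
}

server_of(acct)
```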
Here are the counts:
server_count <- followers_servers |>
  group_by(server) |>
  summarise(n = n()) |>
  arrange(desc(n))
server_count
Check everything adds up:
server_count$n |> sum()
[1] 494
So far so good.
Next up, how many users are there on each of those servers? Note the exception handling…
get_user_count <- function(server) {
  res <- NA
  # This will catch problems like missing servers
  tryCatch(
    res <- get_instance_general(server)$stats$user_count,
    error = function(e) {
      cat("***")
      cat(server)
      cat("***")
      cat("\n")
      print(e)
      cat("\n")
    }
  )
  ifelse(is.numeric(res), ifelse(length(res) == 1, res, NA), NA)
}
server_count$server_user_n <- map(server_count$server, get_user_count)
***bbs.crumplab.com***
<simpleError in curl::curl_fetch_memory(url, handle = handle): SSL peer certificate or SSH remote key was not OK: [bbs.crumplab.com] schannel: SEC_E_UNTRUSTED_ROOT (0x80090325) - The certificate chain was issued by an authority that is not trusted.>
***bv.umbrellix.org***
<simpleError: something went wrong. Status code: 503>
***bytebuilders.uk***
<simpleError in curl::curl_fetch_memory(url, handle = handle): SSL peer certificate or SSH remote key was not OK: [bytebuilders.uk] schannel: SEC_E_UNTRUSTED_ROOT (0x80090325) - The certificate chain was issued by an authority that is not trusted.>
***calckey.social***
<simpleError: something went wrong. Status code: 500>
***fedi.astrid.tech***
<simpleError in curl::curl_fetch_memory(url, handle = handle): Timeout was reached: [fedi.astrid.tech] Connection timeout after 10012 ms>
***firefish.social***
<simpleError: something went wrong. Status code: 500>
***iscurrently.live***
<simpleError: something went wrong. Status code: 522>
No encoding supplied: defaulting to UTF-8.
***mythago.space***
<simpleError in curl::curl_fetch_memory(url, handle = handle): SSL peer certificate or SSH remote key was not OK: [mythago.space] schannel: SEC_E_UNTRUSTED_ROOT (0x80090325) - The certificate chain was issued by an authority that is not trusted.>
***social.ebusinessworkshop.co.uk***
<simpleError in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: social.ebusinessworkshop.co.uk>
***toot.theresnotime.io***
<simpleError in curl::curl_fetch_memory(url, handle = handle): schannel: next InitializeSecurityContext failed: SEC_E_ILLEGAL_MESSAGE (0x80090326) - This error usually occurs when a fatal SSL/TLS alert is received (e.g. handshake failed). More detail may be available in the Windows System event log.>
server_count
Hmmm, something went wrong… map() returns a list, so server_user_n has ended up as a list column rather than a numeric one. Quick fix:
server_count$server_user_n2 <-
  server_count$server_user_n |> sapply(\(x) ifelse(length(x) == 1, x[[1]], NA))
server_count
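The clean-up step is needed because map() always returns a list. One way to sidestep it, sketched below, is to let tryCatch() return the fallback value from its error handler and to collect results with vapply(), which insists on exactly one number per server (map_dbl() from {purrr} behaves similarly). The lookup() function here is a hypothetical stand-in for the network call, and the server names are made up.

```r
# Hypothetical stand-in for a network lookup that sometimes fails
lookup <- function(server) {
  if (server == "down.example") stop("something went wrong. Status code: 503")
  nchar(server) * 100
}

safe_lookup <- function(server) {
  tryCatch(
    as.numeric(lookup(server)),
    # The handler's return value becomes the result of tryCatch()
    error = function(e) {
      message("*** ", server, " *** ", conditionMessage(e))
      NA_real_
    }
  )
}

servers <- c("a.example", "down.example", "bb.example")
# vapply() returns a plain numeric vector and errors if any
# result is not a single number
user_n <- vapply(servers, safe_lookup, numeric(1), USE.NAMES = FALSE)
user_n
```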
A couple of histograms:
server_count |>
  select(n, server_user_n2) |>
  na.omit() |>
  pivot_longer(cols = everything(),
               names_to = "key",
               values_to = "value") |>
  mutate(nice_name = case_when(key == "n" ~ "Followers",
                               key == "server_user_n2" ~ "Users on server")) |>
  ggplot(aes(value)) +
  facet_wrap(~ nice_name, scales = "free") +
  geom_histogram(bins = 40) +
  labs(x = "Users", y = "Freq")
A scatterplot:
server_count |>
  na.omit() |>
  mutate(home = ifelse(server == "tech.lgbt",
                       "My home server",
                       "Elsewhere"),
         home = factor(home,
                       c("My home server", "Elsewhere"))) |>
  ggplot(aes(y = log(n, 10),
             x = log(server_user_n2, 10),
             colour = home)) +
  geom_point() +
  scale_colour_manual(values = c("magenta", "black")) +
  #theme_bw() +
  labs(y = expression(log[10]~(followers)),
       x = expression(log[10]~(total~server~users)),
       title = "Follower count by server",
       colour = "")
There is indeed a correlation:
cor.test(~ n + server_user_n2, data = server_count, method = "kendall")
Kendall's rank correlation tau
data: n and server_user_n2
z = 6.0364, p-value = 1.576e-09
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.3970694
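For intuition, Kendall's tau compares every pair of servers: a pair is concordant if the server with more users also has more of my followers, and discordant if the order flips. When there are no ties, tau is (concordant minus discordant) divided by the number of pairs. Here is a sketch on made-up numbers (not my data), checked against cor():

```r
# Made-up values with no ties
followers <- c(1, 3, 2, 5, 4)
users     <- c(10, 200, 50, 3000, 40)

# All pairs of observations (i < j), one pair per column
pairs <- combn(length(followers), 2)
# +1 for concordant pairs, -1 for discordant pairs
signs <- sign(followers[pairs[1, ]] - followers[pairs[2, ]]) *
         sign(users[pairs[1, ]] - users[pairs[2, ]])

tau_by_hand <- sum(signs) / ncol(pairs)
tau_by_hand
cor(followers, users, method = "kendall")
```

Because it only uses the ordering of the values, tau is unaffected by the log transform used in the scatterplot.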
Last run (or at least knitted) Mon Aug 7 21:42:09 2023.