author: niplav, created: 2023-12-21, modified: 2024-04-18, language: english, status: finished, importance: 3, confidence: certain
Subscripts in text can be used to attach explicit probabilities to claims$_{99\%}$.
Gwern has wondered about a use-case for subscripts in hypertext. While they have settled on a specific use-case, namely years for citations, I propose a different one: reporting explicit probabilities.
Explicitly giving probabilities in day-to-day English text is usually quite clunky: "I assign 35% to North Korea testing an intercontinental ballistic missile before the end of this year" reads far less smoothly than "I don't think North Korea will test an intercontinental ballistic missile this year".
And since subscripts are a solution in need of a problem, one can wonder how well those two fit together: Quite well, I claim.
In short, I propose to append probabilities in subscript after a statement using standard HTML subscript notation (or $\LaTeX$ as a fallback if it's available), with the probability possibly also being a link to a relevant forecasting platform with the same question:
I think Donald Trump is going to be incarcerated before 2030$_{65\%}$.
This is almost as readable as the sentence without the probability.
There are some complications with negations in sentences or multiple statements. For the most part, I'll simply avoid such cases ("Doctor, it hurts when I do this!" "Don't do that, then."), but if I had to, I'd solve the first problem by declaring that the probability applies to the literal meaning of the previous sentence, including all negations; the problem with multiple statements is solved by delimiters.
As an example of the different kinds of negation: "The train won't come more than 5 minutes late$_{90\%}$" would (arguendo) mean the same as "I don't think the train will come more than 5 minutes late$_{90\%}$", which means the same as "The train will take more than 5 minutes to arrive$_{10\%}$", which in turn is equivalent to "I assign 90% probability to the train arriving within the next 5 minutes".
With multiple statements, my favorite way of delimiting is currently half brackets: "I think ⸤it'll rain tomorrow⸥$_{55\%}$, but ⸤Tuesday is going to be sunny⸥$_{80\%}$, but I don't think ⸤your uncle is going to be happy about that⸥$_{15\%}$."
The probabilities in this context aren't quite evidentials, but neither are they veridicals or miratives; I propose the word "credal" for this category.
The exact place of insertion is subtle: In sentences with a single central statement, there are multiple places where one could put the probability.
This becomes trickier in sentences with multiple statements.
A variant of the notation could use decimal notation instead of percentages, and leave out the leading and trailing zeroes. "I think it'll rain tomorrow$_{50\%}$" would then become the more compact "I think it'll rain tomorrow$_{.5}$". This has the advantage of being compatible with plain text through the combining dot below diacritic, which would yield "I think it'll rain tomorroẉ₅". However, the meaning of the combining dot can be ambiguous to uninformed readers.
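As a rough illustration of the decimal variant and its plain-text form, here is a Python sketch. The helper names `compact_decimal` and `plain_text_credal` are made up for this example; the sketch assumes the claim ends in a letter that can carry the combining dot below (U+0323), and it rounds to two decimal places by default.

```python
import unicodedata

# Map ASCII digits to the Unicode subscript digits U+2080..U+2089.
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "".join(chr(0x2080 + i) for i in range(10)))
COMBINING_DOT_BELOW = "\u0323"


def compact_decimal(p: float, digits: int = 2) -> str:
    """Format a probability without leading or trailing zeroes, e.g. 0.5 -> '.5'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    text = f"{p:.{digits}f}".rstrip("0").rstrip(".")
    # Drop the leading zero ("0.5" -> ".5"), but keep a bare "0" for p == 0.
    return text[1:] if text.startswith("0.") else text


def plain_text_credal(claim: str, p: float, digits: int = 2) -> str:
    """Append p in plain text: the decimal point becomes a dot below the claim's last letter."""
    decimal = compact_decimal(p, digits)
    if decimal.startswith("."):
        marked = claim + COMBINING_DOT_BELOW + decimal[1:].translate(SUBSCRIPT_DIGITS)
        # NFC normalization composes e.g. 'w' + U+0323 into the precomposed 'ẉ' where one exists.
        return unicodedata.normalize("NFC", marked)
    return claim + decimal.translate(SUBSCRIPT_DIGITS)


print(compact_decimal(0.5))                                   # .5
print(plain_text_credal("I think it'll rain tomorrow", 0.5))  # I think it'll rain tomorroẉ₅
```

Since the whole string is NFC-normalized, other characters in the claim may also be composed; this only changes their encoding, not their appearance.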
On LessWrong, one can also use reacts signifying probabilities on one's own text. While it's restricted to LessWrong, it also allows other people to easily assign different probabilities to your statements.
Since the people writing texts that report probabilities are probably logically non-omniscient bounded agents, it might also be useful to report the time or effort one has spent on refining the reported probability: "I reckon humanity will survive the 21st century$_{55\%:20h}$", indicating that the speaker has reflected on this question for 20 hours to arrive at their current probability (something akin to reporting an "epistemic effort" for a piece of information). I fear that this notation is getting into cumbersome territory, and I won't be using it.
There are three available options: Either one's writing platform supports HTML, in which case one can use the `<sub>18%</sub>` tags (giving $_{18\%}$), or it supports $\LaTeX$, which creates a slightly fancier-looking but also more fragile notation using `_{18\%}` (resulting in $_{18\%}$), or one's platform directly supports subscripting, such as pandoc with `~18%~`, but not Reddit Markdown (which does support superscript). More info about other platforms here.
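For generating such subscripts programmatically, a minimal Python sketch of the three options might look like the following. The function `credal` and its `fmt` and `link` parameters are hypothetical; `link` stands in for an optional URL to a forecasting platform's question, and the claim text is inserted as-is (not HTML-escaped), on the assumption that it may already contain markup.

```python
import html
from typing import Optional


def credal(claim: str, p: float, fmt: str = "html", link: Optional[str] = None) -> str:
    """Append probability p (in [0, 1]) to a claim as a subscripted percentage."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    pct = round(p * 100)
    if fmt == "html":
        label = f"{pct}%"
        if link is not None:
            # Optionally link the probability to a question on a forecasting platform.
            label = f'<a href="{html.escape(link, quote=True)}">{label}</a>'
        return f"{claim}<sub>{label}</sub>"
    if fmt == "latex":
        # The percent sign has to be escaped inside LaTeX math.
        return f"{claim}$_{{{pct}\\%}}$"
    if fmt == "pandoc":
        # Pandoc Markdown writes subscripts as ~text~.
        return f"{claim}~{pct}%~"
    raise ValueError(f"unknown format: {fmt!r}")


print(credal("I think it'll rain tomorrow", 0.18))                # ...tomorrow<sub>18%</sub>
print(credal("I think it'll rain tomorrow", 0.18, fmt="latex"))   # ...tomorrow$_{18\%}$
print(credal("I think it'll rain tomorrow", 0.18, fmt="pandoc"))  # ...tomorrow~18%~
```

With a (made-up) question URL such as `https://example.com/q?id=1&x=2`, the HTML variant wraps the percentage in a link and escapes the `&` in the attribute.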
Ideally one would simply use Unicode subscripts, which are available for all digits, but tragically not for the percentage sign '%' or a simple dot '.'. Perhaps a project for the future: After all, they did include a subscript '+'₊, a subscript '-'₋, an equality sign '='₌ and parentheses '()'₍₎, but many subscript letters (b, c, d, f, g, q, w, y and z) are still missing…
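Which characters have Unicode subscript forms can be checked against Python's `unicodedata` character-name database. This sketch assumes the subscript code points follow the standard's naming pattern (e.g. "SUBSCRIPT FIVE", "LATIN SUBSCRIPT SMALL LETTER A"), with the one irregular name for the minus sign special-cased; the helper names are made up for illustration.

```python
import string
import unicodedata
from typing import Optional


def to_subscript(ch: str) -> Optional[str]:
    """Return the Unicode subscript form of ch, or None if there is none."""
    try:
        name = unicodedata.name(ch)
        if name.startswith("DIGIT "):            # "DIGIT FIVE" -> "SUBSCRIPT FIVE"
            target = name.replace("DIGIT", "SUBSCRIPT", 1)
        elif name.startswith("LATIN SMALL LETTER "):
            target = name.replace("LATIN", "LATIN SUBSCRIPT", 1)
        elif ch == "-":                          # U+208B is named "SUBSCRIPT MINUS"
            target = "SUBSCRIPT MINUS"
        else:                                    # "PLUS SIGN" -> "SUBSCRIPT PLUS SIGN"
            target = "SUBSCRIPT " + name
        return unicodedata.lookup(target)
    except (KeyError, ValueError):
        return None


def subscript_text(text: str) -> Optional[str]:
    """Subscript a whole string, or return None if any character has no subscript form."""
    chars = [to_subscript(ch) for ch in text]
    return None if None in chars else "".join(chars)


missing = [c for c in string.ascii_lowercase if to_subscript(c) is None]
print(missing)                  # ['b', 'c', 'd', 'f', 'g', 'q', 'w', 'y', 'z']
print(subscript_text("(1+2)"))  # ₍₁₊₂₎
print(subscript_text("65%"))    # None
```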
I've used this notation sparingly but increasingly; a good example of a first exploration is here, and it's also interspersed in the text here.
Fischer 2023 uses a different notation:
- Given hedonism and conditional on sentience, we think (credence: 0.7) that none of the vertebrate nonhuman animals of interest have a welfare range that’s more than double the size of any of the others. While carp and salmon have lower scores than pigs and chickens, we suspect that’s largely due to a lack of research.
- Given hedonism and conditional on sentience, we think (credence: 0.65) that the welfare ranges of humans and the vertebrate animals of interest are within an order of magnitude of one another.
- Given hedonism and conditional on sentience, we think (credence 0.6) that all the invertebrates of interest have welfare ranges within two orders of magnitude of the vertebrate nonhuman animals of interest. Invertebrates are so diverse and we know so little about them; hence, our caution.
With the notation proposed here, the text would read:
- Given hedonism and conditional on sentience, we think that none of the vertebrate nonhuman animals of interest have a welfare range that’s more than double the size of any of the others$_{70\%}$. While carp and salmon have lower scores than pigs and chickens, we suspect that’s largely due to a lack of research.
- Given hedonism and conditional on sentience, we think that the welfare ranges of humans and the vertebrate animals of interest are within an order of magnitude of one another$_{65\%}$.
- Given hedonism and conditional on sentience, we think that all the invertebrates of interest have welfare ranges within two orders of magnitude of the vertebrate nonhuman animals of interest$_{60\%}$. Invertebrates are so diverse and we know so little about them; hence, our caution.
"Likelihood ratios are good! Likelihood ratios are the only good thing!"
"I agree that likelihood ratios are good! In fact, I think we have a moral responsibility to look for clever strategies to make the likelihood ratios bigger! But at the same time, you know, priors."
"Priors?! How dare you?! Priors are bad!"
—Mark Taylor Saotome-Westlake, “Interlude X”, 2017
For sharing a likelihood ratio, we need to talk about both the hypothesis $H$ and the evidence $E$. If I then want to say that $E$ updates $H$ by $k$ shannons, how could I write that?
- "$E$ provides $k$ bits for/against $H$" is enough.
- $E⥌_{k}H$, specifically $E⥜_{k}H$ if $E$ is evidence for $H$, and $E⥝_{k}H$ if $E$ is evidence against $H$.
- $E⥣_{k}H$ and $E⥥_{k}H$ in cases where $E$ is strong evidence.
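To make the quantity concrete: the number of shannons (bits) of evidence is the base-2 logarithm of the likelihood ratio, $k = \log_2 \frac{P(E \mid H)}{P(E \mid \neg H)}$, and it adds directly to the prior's log-odds measured in bits. The Python sketch below computes $k$ and picks between the harpoon notations. The function names are made up, and since the text doesn't say what counts as strong evidence, `strong_bits` is a hypothetical threshold the caller has to supply.

```python
import math
from typing import Optional


def evidence_bits(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bits (shannons) of evidence E provides for H: log2 of the likelihood ratio."""
    return math.log2(p_e_given_h / p_e_given_not_h)


def update(prior: float, bits: float) -> float:
    """Posterior P(H|E): add `bits` to the prior's log2-odds, then map back to a probability."""
    log_odds = math.log2(prior / (1 - prior)) + bits
    return 1 / (1 + 2 ** -log_odds)


def notate(e: str, h: str, bits: float, strong_bits: Optional[float] = None) -> str:
    """Write the update in LaTeX-style notation, e.g. 'E⥜_{2}H'."""
    k = abs(bits)
    # Hypothetical cutoff: the text leaves "strong evidence" undefined.
    strong = strong_bits is not None and k >= strong_bits
    if bits > 0:
        arrow = "⥣" if strong else "⥜"   # evidence for H
    elif bits < 0:
        arrow = "⥥" if strong else "⥝"   # evidence against H
    else:
        arrow = "⥌"                      # direction-neutral form
    return f"{e}{arrow}_{{{round(k, 2):g}}}{h}"


k = evidence_bits(0.8, 0.2)   # likelihood ratio 4:1, i.e. 2 bits
print(notate("E", "H", k))    # E⥜_{2}H
print(update(0.5, k))         # 0.8
```

Working in bits keeps the update additive: starting from even odds, 2 bits for $H$ multiply the odds by 4, giving 4:1, or 80%.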