Bayes' Theorem: Updating Beliefs with Evidence

Riguz Lee · 2026-01-15 · math, probability, statistics

A deep dive into Bayes' theorem, from its 18th-century origins to modern applications in spam filters, medical diagnosis, and machine learning.

Contents

- The Problem of Inverse Probability
- Bayes' Theorem
  - A Simple Example
  - Visualizing the Update
- The Odds Form
- Implementation
- Applications
  - Spam Filtering
  - A/B Testing
  - Bayesian Neural Networks
- The Philosophy of Priors
- Conclusion
- Bibliography

"Your ability to do good science depends on your ability to update your beliefs in the face of evidence." (E. T. Jaynes [1])

The Problem of Inverse Probability

Suppose you test positive for a rare disease. The test is 99% accurate. Should you panic?

The intuitive answer is yes, but as we'll see, the correct answer depends critically on how rare the disease is.

This is a question of inverse probability: given an observed effect (positive test), what is the probability of the underlying cause (having the disease)?
For centuries, this kind of reasoning was considered impossible or meaningless. It took a Presbyterian minister named Thomas Bayes to crack the problem.

Bayes' Theorem

Bayes' theorem relates the conditional probability of a hypothesis $H$ given evidence $E$ to the reverse:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} \tag{1}$$

where:

- $P(H \mid E)$ is the posterior: our updated belief after seeing evidence
- $P(E \mid H)$ is the likelihood: how probable the evidence is if the hypothesis is true
- $P(H)$ is the prior: our belief before seeing evidence
- $P(E)$ is the marginal likelihood: the total probability of the evidence

A Simple Example

Consider a medical test for a disease that affects 1 in 1,000 people:

- Prevalence: $P(\text{disease}) = 0.001$
- Sensitivity: $P(\text{positive} \mid \text{disease}) = 0.99$ (true positive rate)
- Specificity: $P(\text{negative} \mid \text{no disease}) = 0.95$ (true negative rate)

Note: This base rate is crucial. Many real-world errors in reasoning come from ignoring it, a phenomenon called base rate neglect.

What is $P(\text{disease} \mid \text{positive})$?

First, compute the marginal probability of testing positive. The false positive rate is $P(\text{positive} \mid \text{no disease}) = 1 - 0.95 = 0.05$, so:

\begin{align}
P(\text{positive}) &= P(\text{positive} \mid \text{disease})\,P(\text{disease}) + P(\text{positive} \mid \text{no disease})\,P(\text{no disease}) \tag{2} \\
&= 0.99 \times 0.001 + 0.05 \times 0.999 = 0.05094 \tag{3}
\end{align}

Then apply Equation 1:

$$P(\text{disease} \mid \text{positive}) = \frac{0.99 \times 0.001}{0.05094} \approx 0.0194 \tag{4}$$

The result is shocking: despite the test being 99% sensitive, a positive result means only about a 2% chance of actually having the disease.

This counterintuitive result arises because the false positives from the large healthy population (5% of 99.9% of people) vastly outnumber the true positives from the tiny sick population (99% of 0.1% of people). In a population of 100,000, that is about 4,995 false positives against 99 true positives.

Visualizing the Update

The shift from prior to posterior is best seen visually. Figure 1 shows how the distribution concentrates around the true value as evidence accumulates:

Figure 1: Prior vs posterior distribution. The posterior concentrates around the true value as evidence accumulates.

The following table tracks how our belief updates through each piece of evidence:

Table 1: Sequential Bayesian updates.
Each independent positive test dramatically shifts the probability.

| Stage                      | P(disease) | Odds     |
|----------------------------|------------|----------|
| Prior (before test)        | 0.001      | 1 : 999  |
| After positive test        | 0.0194     | 1 : 50.5 |
| After second positive test | 0.282      | 1 : 2.5  |
| After third positive test  | 0.886      | 7.8 : 1  |

This is why doctors order confirmatory tests. A single positive is weak evidence; multiple independent positives compound into strong evidence.

The Odds Form

Bayes' theorem has an elegant equivalent in odds form that makes sequential updating trivial:

$$\text{odds}(H \mid E) = \text{LR} \times \text{odds}(H) \tag{5}$$

where the likelihood ratio $\text{LR} = P(E \mid H) / P(E \mid \neg H)$ and $\text{odds}(H) = P(H) / (1 - P(H))$.

This is the form used in practice because it factors out $P(E)$ and, when the pieces of evidence are conditionally independent, lets us chain updates:

$$\text{odds}_n = \text{LR}_1 \times \text{LR}_2 \times \cdots \times \text{LR}_n \times \text{odds}_0 \tag{6}$$

Note: The odds form avoids computing the marginal probability $P(E)$ entirely; the likelihood ratio handles it implicitly.

Figure 2 illustrates the likelihood ratio concept: when the evidence is much more likely under the hypothesis than under the alternative, the LR is large and the update is strong.

Figure 2: The likelihood ratio compares how probable the evidence is under competing hypotheses. A large LR means strong evidence for the hypothesis.

Implementation

The code is straightforward.
Here's a Python implementation that handles both single and sequential updates:

```python
class BayesUpdater:
    """Bayesian belief updater using the odds form."""

    def __init__(self, prior: float):
        # The odds are undefined at prior = 1 and stuck at 0 for prior = 0.
        if not 0 < prior < 1:
            raise ValueError("prior must be strictly between 0 and 1")
        self.odds = prior / (1 - prior)
        self.history = [prior]  # probability after each update, prior first

    def update(self, lr: float) -> float:
        """Update belief with a new likelihood ratio."""
        self.odds *= lr  # Equation 5: posterior odds = LR x prior odds
        p = self.odds / (1 + self.odds)  # convert odds back to a probability
        self.history.append(p)
        return p

    @property
    def probability(self) -> float:
        return self.odds / (1 + self.odds)


# Medical diagnosis example
belief = BayesUpdater(prior=0.001)

# First positive test (LR = sensitivity / (1 - specificity))
belief.update(lr=0.99 / 0.05)  # P ≈ 0.0194
print(f"After 1st positive: {belief.probability:.4f}")

# Second independent positive test
belief.update(lr=0.99 / 0.05)  # P ≈ 0.282
print(f"After 2nd positive: {belief.probability:.4f}")

# Third independent positive test
belief.update(lr=0.99 / 0.05)  # P ≈ 0.886
print(f"After 3rd positive: {belief.probability:.4f}")
```

Output:

```
After 1st positive: 0.0194
After 2nd positive: 0.2818
After 3rd positive: 0.8860
```

Visualizing the sequence: Figure 3 shows how the probability climbs with each independent positive test, starting from near zero and crossing 50% by the third test.

Figure 3: Sequential Bayesian updates in the medical diagnosis example. Each positive test dramatically shifts the posterior probability.

Applications

Spam Filtering

Email spam filters were one of the first killer apps of Bayesian reasoning. Paul Graham's 2002 essay "A Plan for Spam" [2] showed that a simple Bayesian classifier could achieve >99.5% accuracy.

The idea: each word has a spam probability. Given the words in an email, combine them using Bayes' theorem:

$$P(\text{spam} \mid w_1, w_2, \ldots, w_n) = \frac{1}{1 + \dfrac{1 - s}{s} \displaystyle\prod_{i=1}^{n} \frac{1 - p_i}{p_i}} \tag{7}$$

where $p_i = P(\text{spam} \mid w_i)$ and $s$ is the prior spam rate. This combination treats the words as conditionally independent and each $p_i$ as estimated under an even prior, so that $p_i / (1 - p_i)$ acts as a per-word likelihood ratio.

Modern spam filters (Gmail, etc.)
use more sophisticated models, but the core insight remains Bayesian.

A/B Testing

In product development, Bayesian A/B testing provides a principled way to decide whether variant A or variant B is better:

$$P(\text{B} > \text{A} \mid \text{data}) = \int_0^1 \int_0^1 \mathbb{1}[p_B > p_A]\, P(p_A \mid \text{data})\, P(p_B \mid \text{data})\, dp_A\, dp_B \tag{8}$$

Unlike frequentist hypothesis testing, this directly answers the question product managers actually care about: what is the probability that B is better?

Bayesian Neural Networks

Deep learning has embraced Bayesian ideas. Instead of learning a single weight vector $w$, a Bayesian neural network learns a distribution over weights $P(w \mid \text{data})$, enabling principled uncertainty estimates.

Note: Uncertainty quantification is critical for safety-sensitive applications like autonomous driving and medical diagnosis.

Figure 4 compares traditional and Bayesian approaches to neural network weights:

Figure 4: Traditional neural networks learn a single point estimate (left). Bayesian networks learn a distribution over weights (right), capturing uncertainty.

Predictions then average over that weight distribution:

$$P(y \mid x, \text{data}) = \int P(y \mid x, w)\, P(w \mid \text{data})\, dw \tag{9}$$

The Philosophy of Priors

The choice of prior is the most controversial aspect of Bayesian reasoning. Critics argue it introduces subjectivity; proponents counter that all reasoning involves assumptions, and Bayes just makes them explicit.

There are several schools of thought:

- Objective Bayes: use uninformative priors (Jeffreys, maximum entropy) that let the data speak
- Subjective Bayes: encode genuine prior knowledge; this is a feature, not a bug
- Empirical Bayes: estimate the prior from the data itself

Note: Empirical Bayes is widely used in genomics and other fields with many similar hypotheses; see Efron's work on large-scale inference [3].

The pragmatic view: the prior matters most with little data. With enough evidence, the posteriors from all reasonable priors converge.

Note: This is a consequence of the "washing out" of priors: Bayesian posteriors are asymptotically dominated by the likelihood.

Conclusion

Bayes' theorem is not just a formula; it is a framework for rational thought.
It tells us how to update our beliefs when new evidence arrives, how much weight to give that evidence, and when to be skeptical of our own intuitions.

From spam filters to medical diagnosis, from A/B testing to artificial intelligence, Bayesian reasoning is everywhere, often hiding in plain sight. The next time you encounter surprising evidence, ask yourself: what is the likelihood ratio, and what was my prior?

Bibliography

[1] E. T. Jaynes, Probability Theory: The Logic of Science. Cambridge University Press, 2003.

[2] P. Graham, "A Plan for Spam," 2002. [Online]. Available: http://www.paulgraham.com/spam.html

[3] B. Efron, Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge University Press, 2010.
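
As a supplementary sketch for the A/B Testing section, the double integral in Equation 8 can be approximated by sampling instead of integrating. The version below is one possible approach, not the article's method. It assumes uniform Beta(1, 1) priors and binomial conversion data, so each variant's posterior is a Beta distribution. The function name `prob_b_beats_a` and the conversion counts are hypothetical, chosen only for illustration.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(p_B > p_A | data), as in Equation 8.

    Assumption (not from the article): uniform Beta(1, 1) priors with
    binomial data, so each posterior is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate per variant from its posterior.
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        # Indicator 1[p_B > p_A]; its average approximates the integral.
        wins += p_b > p_a
    return wins / draws


if __name__ == "__main__":
    # Hypothetical counts: 100 of 1000 visitors convert on A, 150 of 1000 on B.
    estimate = prob_b_beats_a(100, 1000, 150, 1000)
    print(f"P(B > A | data) is approximately {estimate:.3f}")
```

The fraction of draws in which B's sampled rate exceeds A's is the estimate. More draws tighten it at the cost of run time, and a different prior would change the two Beta parameters.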
