The Born Rule is a Feature of Superposition

Finite probability theory is enriched by introducing the mathematical notion (no physics involved) of a superposition event $\Sigma S$, in addition to the usual discrete event $S$ (a subset of the outcome space $U = \{u_1, \ldots, u_n\}$). Mathematically, the two types of events are distinguished using $n \times n$ density matrices. The density matrix $\rho(S)$ for a discrete event is diagonal, and the density matrix $\rho(\Sigma S)$ is obtained as an outer product $|s\rangle\langle s|$ of a normalized vector $|s\rangle \in \mathbb{R}^n$. Probabilities are defined using density matrices as $\Pr(T|\rho) = \operatorname{tr}[P_T \rho]$ where $T \subseteq U$ and $P_T$ is the diagonal projection matrix with diagonal entries $\chi_T(u_i)$. Then for the singleton $\{u_i\} \subseteq U$, the probability of the outcome $u_i$ conditioned by the superposition event $\Sigma S$ is $\Pr(\{u_i\}|\Sigma S) = \langle u_i|s\rangle^2$, the Born Rule. Thus the Born Rule arises naturally from the mathematics of superposition when superposition events are added to ordinary finite probability theory. No further explanation is required when the mathematics uses $\mathbb{C}^n$ instead of $\mathbb{R}^n$ except that the square $\langle u_i|s\rangle^2$ is the absolute square $|\langle u_i|s\rangle|^2$.


Introduction
The purpose of this paper is to expand ordinary finite probability theory by introducing superposition events in addition to the usual discrete events (subsets of the outcome space) and then to show that the Born Rule arises naturally from the mathematics of superposition events. A superposition event or subset is a purely mathematical notion, although obviously inspired by the notion of a superposition state in quantum mechanics (QM). As a mathematical notion, it could have been (but was not) introduced centuries before QM. The thesis is that the Born Rule is not a bug that needs to be "explained" or "justified"; it is just a feature of the notion of a superposition event in this expanded probability theory.

Superposition events
The outcome (or sample) space is a set $U = \{u_1, \ldots, u_n\}$ with point probabilities $p = (p_1, \ldots, p_n)$. An event $S$ is a subset $S \subseteq U$ with the probability $\Pr(S) = \sum_{u_i \in S} p_i$. For $T \subseteq U$, the conditional probability of $T$ given $S$ is $\Pr(T|S) = \frac{\Pr(T \cap S)}{\Pr(S)}$, so the conditional probability of a singleton event $\{u_i\}$ is $\Pr(\{u_i\}|S) = \frac{p_i}{\Pr(S)}$ if $u_i \in S$, else 0.

In an (ordinary) event $S$, the atomic outcomes or elements of $S$ are considered as perfectly discrete and distinguished from each other; in each run of the "experiment" or trial conditioned on $S$, one of the discrete outcomes in $S$ is chosen. The intuitive idea of the corresponding superposition state, denoted $\Sigma S$, is that the outcomes in the state are not distinguished but are blobbed or cohered together as an indefinite event. In each run of the "experiment" or trial conditioned on $\Sigma S$, the indefinite state is sharpened to a less indefinite state, which at the maximum is one of the outcomes in $S$ [4]. In the case of a singleton event $S = \{u_i\}$, the ordinary event is the same as the superposition event: $\Sigma S = \Sigma\{u_i\} = \{u_i\} = S$.

For a suggestive visual example, consider the outcome set $U$ as a pair of isosceles triangles that are distinguished by the labels on the equal sides and the opposing angles.

Figure 1: Set of distinct isosceles triangles
The superposition event $\Sigma U$ is definite on the properties that are common to the elements of $U$, i.e., the angle $a$ and the opposing side $A$, but is indefinite where the two triangles are distinct, i.e., the two equal sides and their opposing angles are not distinguished by labels. It might be noted that this notion of superposition and the notion of abstraction are essentially flip-side viewpoints of the same idea of extracting from a set an entity that is definite on the commonalities of the elements of the set and indefinite on where the elements differ [3]. The two flip-side viewpoints are like seeing a glass half-empty (superposition) or seeing a glass half-full (abstraction).

What is a mathematical model that will distinguish between the ordinary event $S$ and the superposition event $\Sigma S$? Using $n$-ary column vectors in $\mathbb{R}^n$, the ordinary event $S$ could be represented by the column vector, denoted $|S\rangle$, with the $i$th entry $\chi_S(u_i)$, where $\chi_S : U \to \{0, 1\}$ is the characteristic function for $S$, i.e., $\chi_S(u_i) = 1$ if $u_i \in S$, else 0. But to represent the superposition event $\Sigma S$, we need to add a dimension and use two-dimensional $n \times n$ matrices to represent the blobbing together or cohering of the elements of $S$ in the superposition event $\Sigma S$.

An incidence matrix for a binary relation $R \subseteq U \times U$ is the $n \times n$ matrix $\mathrm{In}(R)$ where $\mathrm{In}(R)_{jk} = 1$ if $(u_j, u_k) \in R$, else 0. The diagonal $\Delta S$ is the binary relation consisting of the ordered pairs $\{(u_i, u_i) : u_i \in S\}$, and its incidence matrix $\mathrm{In}(\Delta S)$ is the diagonal matrix with the diagonal elements $\chi_S(u_i)$. The superposition state $\Sigma S$ could then be represented as $\mathrm{In}(S \times S)$, the incidence matrix of the binary relation $S \times S \subseteq U \times U$, where the non-zero off-diagonal elements represent the cohering or blobbing together of the corresponding diagonal elements. That incidence matrix could be constructed as the outer product $|S\rangle(|S\rangle)^t = |S\rangle\langle S| = \mathrm{In}(S \times S)$ (where $\langle S| = (|S\rangle)^t$ is the transpose).
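The two incidence matrices can be built directly from their definitions. The following is a minimal sketch in plain Python; the three-element outcome space and the helper names `ket`, `outer`, and `incidence` are illustrative assumptions, not part of the paper.

```python
def ket(S, U):
    """Column vector |S> as a list: the i-th entry is chi_S(u_i)."""
    return [1 if u in S else 0 for u in U]

def outer(s, t):
    """Outer product |s><t| as an n x n list of lists."""
    return [[si * tj for tj in t] for si in s]

def incidence(R, U):
    """In(R): entry (j, k) is 1 if (u_j, u_k) is in R, else 0."""
    return [[1 if (uj, uk) in R else 0 for uk in U] for uj in U]

U = ["a", "b", "c"]   # hypothetical outcome space
S = {"a", "b"}        # hypothetical event

diag_S = {(u, u) for u in S}                 # the diagonal relation on S
S_times_S = {(u, v) for u in S for v in S}   # the relation S x S

in_diag = incidence(diag_S, U)      # diagonal matrix with entries chi_S(u_i)
in_prod = incidence(S_times_S, U)   # has non-zero off-diagonal entries

# In(S x S) coincides with the outer product |S><S|.
same_as_outer = in_prod == outer(ket(S, U), ket(S, U))
print(in_diag, in_prod, same_as_outer)
```

The off-diagonal ones in `in_prod` are what the text calls the cohering of the corresponding diagonal elements; `in_diag` has none.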
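If we divide $\mathrm{In}(\Delta S)$ and $\mathrm{In}(S \times S)$ through by their trace (sum of diagonal elements) $|S|$, then we obtain two density matrices:

$$\rho(S) = \frac{\mathrm{In}(\Delta S)}{|S|} \quad \text{and} \quad \rho(\Sigma S) = \frac{\mathrm{In}(S \times S)}{|S|}.$$

In general, a density matrix $\rho$ over the reals $\mathbb{R}$ (or the complex numbers $\mathbb{C}$) is a symmetric matrix $\rho = \rho^t$ (or conjugate symmetric matrix $\rho = (\rho^*)^t$ in the case of $\mathbb{C}$) with trace $\operatorname{tr}[\rho] = 1$ and all non-negative eigenvalues. A density matrix $\rho$ is pure if $\rho^2 = \rho$, otherwise a mixture.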
Intuitively, the interpretation of the superposition event represented by $\rho(\Sigma B_1) = \rho(\Sigma\{♦, ♥\})$ is that it is definite on the properties common to its elements, e.g., in this case, being a red suit, but indefinite on where the elements differ. The indefiniteness is indicated by the non-zero off-diagonal elements, which indicate that the diamond suit ♦ is blurred, cohered, or superposed with the hearts suit ♥ in the superposition state $\Sigma\{♦, ♥\}$.
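As a concrete check of the pure/mixture distinction, the sketch below divides the two incidence matrices by their trace $|S|$ for the red-suit event. It assumes, purely for illustration, that the outcome space is the four card suits in the order clubs, diamonds, hearts, spades; the function names are hypothetical.

```python
from fractions import Fraction

# Assumption: U is the four suits in this order (not stated in the text).
U = ["clubs", "diamonds", "hearts", "spades"]
B1 = {"diamonds", "hearts"}   # the red suits

def rho_discrete(S, U):
    """rho(S) = In(Delta S) / |S|: entries 1/|S| on the diagonal for S."""
    return [[Fraction(1, len(S)) if uj == uk and uj in S else Fraction(0)
             for uk in U] for uj in U]

def rho_super(S, U):
    """rho(Sigma S) = In(S x S) / |S|: entries 1/|S| on all of S x S."""
    return [[Fraction(1, len(S)) if uj in S and uk in S else Fraction(0)
             for uk in U] for uj in U]

def matmul(A, B):
    """Plain matrix product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def is_pure(rho):
    """A density matrix is pure exactly when rho squared equals rho."""
    return matmul(rho, rho) == rho

print(trace(rho_discrete(B1, U)), is_pure(rho_discrete(B1, U)))
print(trace(rho_super(B1, U)), is_pure(rho_super(B1, U)))
```

Exact fractions are used so that the test `rho^2 == rho` is not disturbed by floating-point rounding.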
The next step is to bring in the point probabilities $p = (p_1, \ldots, p_n)$; the two real density matrices $\rho(S)$ and $\rho(\Sigma S)$ defined so far correspond to the special case of the equiprobable distribution on $S$ with 0 probabilities outside of $S$. In general, the conditional probability of an outcome given $S$ is $\Pr(\{u_i\}|S) = \frac{\Pr(\{u_i\} \cap S)}{\Pr(S)} = \frac{p_i}{\Pr(S)}$ if $u_i \in S$, else 0. The density matrix $\rho(S)$ for the classical or discrete event $S$ is then the diagonal matrix with the entries $\Pr(\{u_i\}|S)\,\chi_S(u_i) = \frac{p_i}{\Pr(S)}\,\chi_S(u_i)$.

Given two column vectors $|s\rangle = (s_1, \ldots, s_n)^t$ and $|t\rangle = (t_1, \ldots, t_n)^t$ in $\mathbb{R}^n$, their inner product is the sum of the products of the corresponding entries and is denoted $\langle t|s\rangle = (|t\rangle)^t|s\rangle = \sum_{i=1}^{n} t_i s_i$. Their outer product is the $n \times n$ matrix denoted as $|s\rangle\langle t| = |s\rangle(|t\rangle)^t$. A vector $|s\rangle$ is normalized if $\langle s|s\rangle = 1$. Let $|u_i\rangle$ be the $n$-ary column vector with the $i$th entry 1 and otherwise 0.
To construct the density matrix $\rho(\Sigma S)$ for the superposition event $\Sigma S$, we may first construct the normalized column vector $|s\rangle$ where the $i$th entry, denoted $\langle u_i|s\rangle = (|u_i\rangle)^t|s\rangle$, is the "amplitude" $\sqrt{p_i/\Pr(S)}$ if $u_i \in S$, else 0. In the case of the equiprobable distribution on $S$, each conditional probability is $\frac{1}{|S|}$, so $|s\rangle = \frac{1}{\sqrt{|S|}}|S\rangle$. Then the density matrix for the superposition event $\Sigma S$ is the outer product:

$$\rho(\Sigma S) = |s\rangle\langle s|.$$

The density matrices for the classical event $S$ and for the superposition event $\Sigma S$ have quite different properties. The density matrix $\rho(\Sigma S)$ is pure since $\rho(\Sigma S)^2 = |s\rangle\langle s|s\rangle\langle s| = |s\rangle\langle s| = \rho(\Sigma S)$, while the density matrix $\rho(S)$ is a mixture (except in the special case when $S = \{u_i\}$ is a singleton so that $\rho(\{u_i\}) = \rho(\Sigma\{u_i\})$). For $\rho(S)$, the eigenvalues are just the conditional probabilities $\Pr(\{u_i\}|S) = \frac{\Pr(\{u_i\} \cap S)}{\Pr(S)} = \frac{p_i}{\Pr(S)}\,\chi_S(u_i)$. But for the pure density matrix $\rho(\Sigma S)$, there is one eigenvalue of 1 with the rest of the eigenvalues being zeros (since $\rho^2 = \rho$ forces each eigenvalue to be 0 or 1, and the sum of the eigenvalues is the trace, which is 1). Given just $\rho(\Sigma S)$, the vector $|s\rangle$ is obtained as the normalized eigenvector associated with the eigenvalue of 1, and $\rho(\Sigma S) = |s\rangle\langle s|$.

A partition $\pi = \{B_1, \ldots, B_m\}$ on $U$ is a set of non-empty subsets, called blocks, $B_j \subseteq U$ that are disjoint and whose union is $U$. Taking each block $B_j$ as the event $S$, there is the normalized column vector $|b_j\rangle$ whose $i$th entry is $\sqrt{p_i/\Pr(B_j)}\,\chi_{B_j}(u_i)$ and the density matrix $\rho(\Sigma B_j) = |b_j\rangle\langle b_j|$ for the superposition subset $\Sigma B_j$. Then the density matrix $\rho(\pi)$ for the partition $\pi$ is just the probability sum of those pure density matrices for the superposition blocks:

$$\rho(\pi) = \sum_{j=1}^{m} \Pr(B_j)\,\rho(\Sigma B_j).$$

The eigenvalues for $\rho(\pi)$ are the $m$ probabilities $\Pr(B_j)$ with the remaining $n - m$ values of 0.

Given two partitions $\pi = \{B_1, \ldots, B_m\}$ and $\sigma = \{C_1, \ldots, C_{m'}\}$, the join is the partition $\sigma \vee \pi$ whose blocks are the non-empty intersections $B_j \cap C_{j'}$ of the blocks of $\pi$ and $\sigma$. To construct the meet $\sigma \wedge \pi$, form the undirected graph on $U$ where there is a link between $u_j$ and $u_k$ if they are in the same block of $\pi$ or in the same block of $\sigma$. Then the blocks of the meet are the connected components of that graph. The partition $\pi$ refines the partition $\sigma$, written $\sigma \preceq \pi$, if for each block $B_j \in \pi$, there is a block $C_{j'} \in \sigma$ such that $B_j \subseteq C_{j'}$. Then the partitions on $U$ form a lattice $\Pi(U)$ with the refinement partial order. The maximal partition or top of the lattice is the discrete partition $\mathbf{1}_U = \{\{u_i\}\}_{i=1}^{n}$ where all the blocks are singletons, and the minimal partition or bottom is the indiscrete partition $\mathbf{0}_U = \{U\}$ with only one block $U$. Then the density matrices for these top and bottom partitions are just the density matrices for the discrete set $U$ and the superposition set $\Sigma U$:

$$\rho(\mathbf{1}_U) = \rho(U) \quad \text{and} \quad \rho(\mathbf{0}_U) = \rho(\Sigma U).$$
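The partition density matrix, read as the probability-weighted sum of the pure block matrices, can be checked numerically for the top and bottom partitions. This is a minimal sketch with hypothetical point probabilities; it assumes the amplitudes are the square roots of the conditional probabilities, so that each block vector is normalized, and it represents blocks as sets of indices into $U$.

```python
import math

p = [0.5, 0.25, 0.125, 0.125]   # hypothetical point probabilities, sum to 1
n = len(p)

def rho_super(block):
    """rho(Sigma B) = |b><b| with amplitudes sqrt(p_i / Pr(B)) on B."""
    pr_b = sum(p[i] for i in block)
    b = [math.sqrt(p[i] / pr_b) if i in block else 0.0 for i in range(n)]
    return [[bj * bk for bk in b] for bj in b]

def rho_partition(blocks):
    """rho(pi): sum over blocks of Pr(B_j) * rho(Sigma B_j)."""
    rho = [[0.0] * n for _ in range(n)]
    for block in blocks:
        pr_b = sum(p[i] for i in block)
        r = rho_super(block)
        for j in range(n):
            for k in range(n):
                rho[j][k] += pr_b * r[j][k]
    return rho

def close(A, B):
    """Entrywise comparison up to floating-point rounding."""
    return all(math.isclose(A[j][k], B[j][k], abs_tol=1e-12)
               for j in range(n) for k in range(n))

top = rho_partition([{0}, {1}, {2}, {3}])    # discrete partition
bottom = rho_partition([set(range(n))])      # indiscrete partition
rho_U = [[p[j] if j == k else 0.0 for k in range(n)] for j in range(n)]

print(close(top, rho_U))                        # top equals diagonal rho(U)
print(close(bottom, rho_super(set(range(n)))))  # bottom equals rho(Sigma U)
```

Intermediate partitions such as `[{0, 1}, {2, 3}]` give block-diagonal matrices that sit between these two extremes.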

Probabilities using density matrices
A (real-valued) random variable on the outcome space $U$ is a function $f : U \to \mathbb{R}$ with distinct values $\{\phi_1, \ldots, \phi_m\}$. The inverse image of $f$ is a partition $\pi = \{B_j\}_{j=1}^{m}$ where $B_j = f^{-1}(\phi_j)$. In ordinary classical probability theory, the conditional probability of getting the value $\phi_j$ given the event $S$ in a trial is $\Pr(\phi_j|S) = \frac{\Pr(B_j \cap S)}{\Pr(S)}$. But now we have two versions of $S$, the discrete event and the superposition event. Since they have different density matrices, we can take the given conditioning event as a density matrix $\rho$. Let $P_T$ for $T \subseteq U$ be the diagonal projection matrix with the diagonal entries $(P_T)_{ii} = \chi_T(u_i)$. Projection matrices are idempotent, i.e., $P_T P_T = P_T$, and equal their transpose, $P_T = P_T^t$. The usual conditional probability of the classical event $T$ given the classical event $S$ can be computed as:

$$\Pr(T|\rho(S)) = \operatorname{tr}[P_T\,\rho(S)] = \sum_{u_i \in T \cap S} \frac{p_i}{\Pr(S)} = \frac{\Pr(T \cap S)}{\Pr(S)} = \Pr(T|S).$$

Starting with the conditioning event being the superposition event $\Sigma S$, that probability is defined as:

$$\Pr(T|\rho(\Sigma S)) = \operatorname{tr}[P_T\,\rho(\Sigma S)].$$
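It is particularly notable that the probabilities for the values of a random variable (or any given event $T$) are the same whether the conditioning event is the classical event $S$ represented by the mixed $\rho(S)$ or the superposition event $\Sigma S$ represented by the pure $\rho(\Sigma S)$. Since $P_{B_j}$ is diagonal, the trace involves only the diagonal entries, and $\rho(S)$ and $\rho(\Sigma S)$ have the same diagonal entries $\frac{p_i}{\Pr(S)}\,\chi_S(u_i)$:

$$\Pr(\phi_j|\rho(\Sigma S)) = \operatorname{tr}[P_{B_j}\,\rho(\Sigma S)] = \operatorname{tr}[P_{B_j}\,\rho(S)] = \Pr(\phi_j|\rho(S)).$$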
But the interpretation is quite different. The classical trial starting with the subset $S$ represented by $\rho(S)$ picks out the subset $B_j \cap S$ with probability $\Pr(\phi_j|S) = \operatorname{tr}[P_{B_j}\,\rho(S)] = \frac{\Pr(B_j \cap S)}{\Pr(S)}$. However, the 'measurement' or trial conditioned by the superposition event $\Sigma S$ represented by $\rho(\Sigma S)$ 'sharpens' or projects that indefinite event to the more definite superposition event $\Sigma(B_j \cap S)$ with probability $\Pr(\phi_j|\rho(\Sigma S)) = \operatorname{tr}[P_{B_j}\,\rho(\Sigma S)]$.

In either case, the follow-up trial or 'measurement' returns the same value $\phi_j$ with probability 1, i.e., $\Pr(\phi_j|B_j \cap S) = \operatorname{tr}[P_{B_j}\,\rho(B_j \cap S)] = 1 = \operatorname{tr}[P_{B_j}\,\rho(\Sigma(B_j \cap S))]$. In the classical case, all the elements of a non-empty $B_j \cap S$ have the value $\phi_j$, so $\phi_j$ occurs conditioned on the classical event $B_j \cap S$ with probability 1. In the superposition case, the property of having the value $\phi_j$ is a commonality, i.e., is definite, on the superposition event $\Sigma(B_j \cap S)$ represented by $\rho(\Sigma(B_j \cap S))$, so no 'sharpening' occurs and the probability of a trial returning that definite value $\phi_j$ is 1.
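The agreement between the two conditioning events can be checked numerically by computing the trace of the projection times the density matrix for each value of a random variable. The sketch below uses made-up point probabilities and a made-up random variable `f`; the amplitudes are assumed to be the square roots of the conditional probabilities.

```python
import math

# Hypothetical data, not from the paper.
p = [0.1, 0.2, 0.3, 0.4]   # point probabilities on U = {u_1, ..., u_4}
f = [10, 10, 20, 30]       # values f(u_i) of a random variable
S = {0, 1, 2}              # conditioning event, as indices into U
n = len(p)
pr_S = sum(p[i] for i in S)

# rho(S): diagonal matrix of conditional probabilities p_i / Pr(S).
rho_S = [[p[j] / pr_S if j == k and j in S else 0.0 for k in range(n)]
         for j in range(n)]

# rho(Sigma S) = |s><s| with amplitudes sqrt(p_i / Pr(S)) on S.
s = [math.sqrt(p[i] / pr_S) if i in S else 0.0 for i in range(n)]
rho_sigma_S = [[sj * sk for sk in s] for sj in s]

def prob(value, rho):
    """Pr(value | rho) = tr[P_B rho], where B is the inverse image of value."""
    P = [[1.0 if j == k and f[j] == value else 0.0 for k in range(n)]
         for j in range(n)]
    product = [[sum(P[j][m] * rho[m][k] for m in range(n)) for k in range(n)]
               for j in range(n)]
    return sum(product[i][i] for i in range(n))

for value in sorted(set(f)):
    print(value, prob(value, rho_S), prob(value, rho_sigma_S))
```

The two columns of printed probabilities agree up to rounding, even though `rho_sigma_S` has non-zero off-diagonal entries and `rho_S` does not.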
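Let us illustrate this result with the case of flipping a fair coin. The classical set of outcomes $U = \{H, T\}$ is represented by the density matrix:

$$\rho(U) = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{bmatrix}.$$

Figure 3: Classical event: trial picks out heads or tails

The superposition event $\Sigma U$, which blends or superposes heads and tails, is represented by the density matrix:

$$\rho(\Sigma U) = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}.$$

Figure 4: Superposition event: trial sharpens to heads or tails

The probability of getting heads in each case is:

$$\Pr(H|\rho(\Sigma U)) = \operatorname{tr}[P_{\{H\}}\,\rho(\Sigma U)] = \operatorname{tr}\left(\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}\right) = \frac{1}{2} = \operatorname{tr}\left(\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{bmatrix}\right) = \operatorname{tr}[P_{\{H\}}\,\rho(U)] = \Pr(H|\rho(U))$$

and similarly for tails. Thus the two conditioning events $U$ and $\Sigma U$ cannot be distinguished by performing an experiment or trial that distinguishes heads and tails. This is a feature, not a bug, since the same thing occurs in quantum mechanics. For instance, a spin measurement along, say, the $z$-axis of an electron cannot distinguish between the superposition state $\frac{1}{\sqrt{2}}(|{\uparrow}\rangle + |{\downarrow}\rangle)$ with a density matrix like $\rho(\Sigma U)$ and a statistical mixture of half electrons with spin up and half with spin down with a density matrix like $\rho(U)$ [1, p. 176].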
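The fair-coin case is small enough to verify by direct arithmetic. The following sketch builds both density matrices (the superposition one from the amplitude $1/\sqrt{2}$) and compares the heads and tails probabilities; the helper `tr_prod` is a hypothetical name for the trace of a product of two 2 × 2 matrices.

```python
import math

rho_U = [[0.5, 0.0], [0.0, 0.5]]         # classical event U = {H, T}

a = 1 / math.sqrt(2)                     # amplitude of each outcome
s = [a, a]                               # normalized vector |s>
rho_sigma_U = [[sj * sk for sk in s] for sj in s]   # |s><s|, entries ~ 1/2

P_H = [[1, 0], [0, 0]]                   # projection onto heads
P_T = [[0, 0], [0, 1]]                   # projection onto tails

def tr_prod(P, rho):
    """tr[P rho] for 2 x 2 matrices."""
    return sum(P[i][k] * rho[k][i] for i in range(2) for k in range(2))

heads = (tr_prod(P_H, rho_U), tr_prod(P_H, rho_sigma_U))
tails = (tr_prod(P_T, rho_U), tr_prod(P_T, rho_sigma_U))
print(heads, tails)

# The matrices themselves still differ in their off-diagonal entries.
print(rho_U[0][1], rho_sigma_U[0][1])
```

A trial that only distinguishes heads from tails sees the diagonal entries alone, which is why the two events give the same statistics here.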

Conclusion: The Born Rule
The Born Rule does not occur in ordinary classical probability theory because that theory does not include superposition events. When superposition events are introduced into the purely mathematical theory [2], the Born Rule naturally emerges as a feature of the mathematical treatment of superposition.
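The pure density matrix $\rho(\Sigma S)$ can be constructed as the outer product $\rho(\Sigma S) = |s\rangle(|s\rangle)^t = |s\rangle\langle s|$ where $|s\rangle$ is the $n$-ary column vector with the $i$th entry $\langle u_i|s\rangle = \sqrt{p_i/\Pr(S)}$ if $u_i \in S$, else 0.

Or, starting with the pure density matrix $\rho(\Sigma S)$, the vector $|s\rangle$ such that $\rho(\Sigma S) = |s\rangle\langle s|$ is obtained as the normalized eigenvector associated with the eigenvalue of 1.

The probability of $u_i$ conditioned on the superposition event $\Sigma S$ is:

$$\Pr(\{u_i\}|\rho(\Sigma S)) = \operatorname{tr}[P_{\{u_i\}}\,\rho(\Sigma S)] = \frac{p_i}{\Pr(S)}\,\chi_S(u_i).$$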
The point is that this same probability, obtained by conditioning on the two-dimensional density matrix $\rho(\Sigma S)$, could also be obtained from the one-dimensional vector $|s\rangle$ as:

$$\langle u_i|s\rangle^2 = \Pr(\{u_i\}|\rho(\Sigma S)) = \operatorname{tr}[P_{\{u_i\}}\,\rho(\Sigma S)]. \qquad \text{(The Born Rule)}$$

In the case of the random variable $f : U \to \mathbb{R}$, $\Pr(\phi_j|\rho(\Sigma S)) = \sum_{f(u_i) = \phi_j} \langle u_i|s\rangle^2$.

The Born Rule does not occur in classical finite probability theory since the events $S$ are all discrete sets that can be represented by $n$-ary column vectors. The associated two-dimensional diagonal density matrix $\rho(S)$ is not the outer product of a one-dimensional vector with itself (except when $S$ is a singleton). To accommodate the notion of a superposition event $\Sigma S$, it is necessary to use two-dimensional density matrices $\rho(\Sigma S)$ where the non-zero off-diagonal elements indicate the blobbing or cohering together in superposition of the elements associated with the corresponding diagonal entries. And mathematically those density matrices $\rho(\Sigma S)$ can be constructed as the outer product $|s\rangle(|s\rangle)^t = |s\rangle\langle s|$ of a one-dimensional vector $|s\rangle$ with itself. Then the probability of the individual outcomes $u_i$ conditioned by the superposition event $\Sigma S$ is given by the Born Rule: $\Pr(\{u_i\}|\rho(\Sigma S)) = \langle u_i|s\rangle^2$.

Thus the Born Rule arises naturally out of the mathematics of probability theory enriched by superposition events. It does not need any more-exotic or physics-based explanation. No physics was used in the making of this paper. The Born Rule is just a feature of the mathematics of superposition.
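The passage from the density matrix back to the vector, and from the vector to the Born Rule probabilities, can be sketched as follows. The data are hypothetical. Instead of a general eigenvector routine, this sketch normalizes a non-zero column of the matrix, a shortcut that works only because a pure density matrix $|s\rangle\langle s|$ has rank one and the amplitudes here are non-negative.

```python
import math

# Hypothetical data, not from the paper.
p = [0.1, 0.2, 0.3, 0.4]   # point probabilities
S = {1, 2, 3}              # event, as indices into U
n = len(p)
pr_S = sum(p[i] for i in S)

# Build rho(Sigma S) = |s><s| from amplitudes sqrt(p_i / Pr(S)).
s = [math.sqrt(p[i] / pr_S) if i in S else 0.0 for i in range(n)]
rho = [[sj * sk for sk in s] for sj in s]

def recover_vector(rho):
    """Recover |s> from a pure rho = |s><s| by normalizing one column.

    Column k of |s><s| equals s_k * |s>, so any column with s_k != 0 is
    proportional to |s>. The column with the largest diagonal entry is
    used to avoid dividing by a tiny norm.
    """
    m = len(rho)
    k = max(range(m), key=lambda i: rho[i][i])
    col = [rho[j][k] for j in range(m)]
    norm = math.sqrt(sum(c * c for c in col))
    return [c / norm for c in col]

s_rec = recover_vector(rho)
born = [x * x for x in s_rec]              # <u_i|s>^2 for each outcome
from_trace = [rho[i][i] for i in range(n)] # tr[P_{u_i} rho] is entry (i, i)

print(born)
print(from_trace)
```

Both lists agree with the conditional probabilities $p_i/\Pr(S)$ on $S$ and are zero outside $S$, up to floating-point rounding.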