Highschool is usually the first place people encounter any form of mathematics, and along with this setting comes some inappropriate expectations about what lies ahead in mathematics. I might go as far as to say that highschool mathematics and even first year undergraduate mathematics approaches the topic primarily from the perspective of an engineer or experimental physicist; that is mainly to say that you wouldn't call highschool mathematics even particularly 'mathematical' in the ways that a mathematician is actually concerned with the topics studied.
I myself only discovered the seriousness of this discrepency in my first real analysis class. I had lamented at the time that it felt like I was being 'expected to do math as a lawyer who writes computer compileable legalese'. However, as time has gone on, not only have I grown to embrace that description, I've also come to understand that the other classes I was taking at the time (with which I did not have this grievance) were more about applying a physicist's reasoning to geometry and calculus than true mathematics.
This 'true mathematics' is the programmatic legalese. In mathematics, we care very deeply about strict conditions and labels that tell us with certainty under what circumstances our tools will always work. A peer once even said to me that the mathematics was not in symbols, but in all of the words around them.
To that end, we will not spend this section discussing mathematics either; recalling the notion that mathematics sits in between logic and physics on an axis of rigor, we will swing all the way to the left and discuss a description of formal logic. In particular we are going to discuss formal logic in the context of type theory, which will allow us draw on programming metaphors which our society's culture has become much more familiar with in recent years.
However I must confess that the programming metaphor has a special value to me, and I hope also soon to you. As I was first coming to understand that mathematics was inseparable from its legalese, the pattern of theorems and proofs in particular (and also due in part to my double major in physics), I had a notion that one does not truly understand a proof unless one sees clearly how the steps of the proof intuitively mean that the theorem must be obvious. I now see this as wrong, or at least far too physicist-brained. There was a period in my education when I was very ill, and although feeling like I had lost half my brain, I realised that as long as you are doing pure mathematics, you only need half a brain.
The job in solving the puzzle is to connect all of the hypotheses, like pipes or belts in a factory, to the relevant machines (other theorems) that transform them into whatever they need to be to satisfy the proof obligations. Proving a theorem is then no different than playing Factorio or programming. For most jobs, the high level understanding of the whole system is completely subserviant to the low level understanding where you merely know how to correctly plug pipes into ports. If you can do that, then with sufficient effort and an open mind to understand your objects/items/constructions and use your tools/machines/theorems, you can do mathematics. It is this, among other experiences, that makes me so confident that anyone can follow this text, provided I have explained myself well enough.
First I must belabor about some of the basic constructions of programming. You may of course see fit to skip this part, and that would be reasonable if you are familiar with the languages Rust or Haskell, or a similar language with a robust type system. Otherwise, or even in spite of this, it will be valuable to examine what a type is or what we might want it to be, along with what we might expect of functions or arguments.
First, what is a function? There are so many things we could say about this, but first, let us say that a function is a black box that takes an input (or an argument) and gives us back an output, in such a way that we will always get the same output if we use the same input. This could be a function like $x \mapsto x + 1$ which takes a number $x$ and adds to it one before returning it; this notation with the $\mapsto$ symbol (read as 'maps to', and distinct from $\to$) defines a function in terms of what an output looks like assuming a given input.
But a function need not be defined by the procedure we take to calculate its output; if you draw a circle of radius one, draw a straight line through its center, and then for some angle off of that straight line draw a ray, the distance perpendicular to our line to where the ray intersects the circle would be the sine of the angle. While methods to calculate this exist, what the function is here is the relationship between an input to an output, not necessarily a sequence of arithmetic operations. A function could be a pair of lists, $[1,2,3]$ and $[5,2,3]$ such that for an input, we find its position in the first list and take the number at the matching position in the second list, $1 \mapsto 5$, $2 \mapsto 2$ and $3 \mapsto 3$; we just as soon make the lists uncountably infinite, the outputs seemingly random, and we still have a function, and for this reason we also call a function a map. The function is defined by its repeatability and no more. This ambiguity about how we take a function is not a failure of the abstraction but a feature: much later on it will allow us to study functions for which we only have partial information, and perhaps even discover from that information how to calculate them.
But in each of these cases, even the most arbitrary, we have declared either implicitly or explicitly, a prerequisite condition for an input. For instance, with possible exception to the final example, we could not give any of these functions as an input, say, a letter of the alphabet. The procedure $x \mapsto x + 1$ does not admit addition of a letter, the relationship between an angle and its sine does not admit non-angles, and yet we can just as well define a function that takes a letter and sends it to its opposite letter in the alphabet $a \mapsto z$ or a word to itself interspersed with some symbol $``hello" \mapsto ``h.e.l.l.o"$, so the fault is not in the concept of a letter or a word. In fact, one might even conceive of a function which we define to take both numbers and letters, which maps a number $x \mapsto x + 1$ or a letter to its next letter $a \mapsto b$, $b \mapsto c$, selecting whether to be a function of letters or numbers be based on its input. In this case one might even say that we have extended the function to letters by treating them as numbers, using $+1$ to move from one number to the next. But what of $+27$? What is the sine of a letter? Our naive extensions break down.
It is a simple pragmatism that we decide (perhaps only as an arbitrary choice we make now then cling to dearly) that operations such as addition, or relationships that define things like the sine or cosine, fall into classes or types of what they take as inputs. We can have the type of counting numbers $\mathbb{N}$, the type of letters, etc. or the type of functions such as $f\colon n \mapsto n + 1$ which we might write as $f\colon \mathbb{N} \to \mathbb{N}$ to symbolize that it is a function-type with input type $\mathbb{N}$ and output type $\mathbb{N}$, and that this type $\mathbb{N} \to \mathbb{N}$ has as a member $f$. We would equally write $5 \colon \mathbb{N}$ in this scheme, to say that $5$, as a counting number, has type $\mathbb{N}$, of the counting number.
But we have a problem that must be examined closely if we want to make this concept robust. What is it exactly that populates our types? Where does a counting number come from, and if we have two counting numbers, how should we know that they are distinct or the same? Are complex numbers really numbers? If them then why not every clifford algebra? This may seem pedantic but in fact it is the result of carelessness; we cannot say that a type is of numbers without being stuck in the problem of 'what is a number' with its own risk of sophistry. There is a simple solution at the cost of a much more arbitrary choice than to declare that functions' inputs have types. We make a firm rule about where types come from which is the following: a type is defined as having constructors, which are function-like objects which take an object of the correct type and return as their output a new object that does not exist except as the output of this constructor when given that input. This means, among other things, that an object has only one type. We now present the typical example.
We define the natural numbers, denoted $\mathbb{N}$, to be a type with the following constructors:$$\begin{gather*} \mathbf{Z}\colon \mathbb{N}, \\[0em] \mathbf{S}\colon \mathbb{N} \to \mathbb{N} \end{gather*}$$ which we call zero and the successor respectively. That is, $\mathbf{Z}$ can be thought of as a function that takes nothing and always produces the same result as a constructor, thus defining one single element of the type; $\mathbf{S}$ can also be thought of a function that takes any other member $n \colon \mathbb{N}$ and uniquely defines a new member $\mathbf{S}(n) \colon \mathbb{N}$. By unique, we mean that if $\mathbf{S}(a) = \mathbf{S}(b)$ then $a = b$ and vice versa.
Outside of this section, we consider the natural numbers to start from one not zero.
It is reasonable to see this and wonder how this definition of the natural numbers, defined ostensibly from only two rules, populates the entire set of counting numbers we know. The simple answer to this is that you would read $\mathbf{S}$ as a function $n \mapsto n + 1$, however instead of being a function, it is in this instance the definition of what it means to count forward one number. What we have done, really, is to define the counting numbers distinctly as 'the first number' and 'the number we count after that'. What that means is that a number like $3$ would be written in its full form as $\mathbf{S}(\mathbf{S}(\mathbf{S}(\mathbf{Z})))$, and we would 'read' the number based on counting the successor functions, i.e. counting how many times we counted up one.
If you're familiar with rust then you may have seen something like the above before. In fact the corresponding code to recreate this in rust would look something like this:
enum NaturalNumber {
Z,
S(Box<NaturalNumber>)
}
Where we have 'boxed' the natural number purely because the rust compiler does not allow enums constructors to contain instances of themselves since this can lead to not knowing the size of the constructor; by holding a box, we know that the size of $\mathbb{S}$ is equal to or slightly greater than a pointer to our boxed natural number within. Such a constraint does not exist in languages such as Haskell; see below the equivalent construction.
data NatNum = Z | S NatNum
On its own a construction like this is not very useful in rust or haskell; in fact languages that literally use the construction above for the sake of proofs (lean, agda, coq, generally proof languages) will usually have some behind the scenes tool to substitute anything that looks roughly like a natural number with an unsigned integer. However if you say that the 'number' can hold one other thing, then it starts to look more like a linked list.
enum SimpleLinkedList<T> {
Z,
S(Box<T>, Box<SimpleLinkedList>)
}
More importantly to our purposes however, it is then the case that any object in code that looks roughly like a linked list in this sort of way also represents natural numbers, i.e. $\mathtt{NaturalNumbers}$ is virtually identical to $\mathtt{SimpleLinkedList<()>}$ where we have used the unit type in our generic type.
Accordingly, we would then define equivalence of two natural numbers in the following way:
impl PartialEq for NaturalNumber {
fn eq(&self, other: &Self) -> bool {
match (self, other) {
(Z,Z) => true,
(S(m), S(n)) => (m == n)
_ => false
}
}
}
impl Eq for NaturalNumber {}
where we have literally encoded the statement that $\mathbf{Z} = \mathbf{Z}$ and that "if $\mathbf{S}(a) = \mathbf{S}(b)$ then $a = b$" as we had said above. This pattern of defining behaviours or properties of natural numbers recursively will show up a lot as we continue through this section, and is in a sense what we mean when we say 'inductive' natural numbers (think 'proof by induction').
You could reasonably ask now 'but what is $\mathbf{Z}$?' or 'but what does $\mathbf{S}$ do?' in a deeper sense of what they are rather than merely what we intend these symbols to mean. The answer to that, as in a similar fashion we will keep encountering, is whatever works. You could define a notion of natural numbers such as these in python using the notion of a list, or an array in other languages, defining $\mathbf{Z} = \mathtt{[]}$ the empty list, and $\mathbf{S}(n) = \mathtt{[n]}$ as the numeral contained in a list, such that $3 = \mathtt{[[[[]]]]}$, an empty list contained in a list, repeated three times. You could do a similar thing with nested pairs so that $2 = \mathtt{([],([],[]))}$. The instantiation of these constructors does not matter, and they can correspond 'really' (insofar as one rationalizes that there is something real about the constructs of programming languages or stories about what a CPU does with those constructs) to anything you like as long as you are able and willing to read that function as symbolizing what we intend it to mean.
In fact, this is almost literally the idea of Alonzo Church, the famous Church Numerals as they have come to be known. In his construction, constructors were literally functions, but merely functions which we commit to never defining a way to compute. In such a way, you would read ${f \circ f \circ f = x \mapsto f(f(f(x)))}$ (we read $\circ$ as 'of' or 'compose' meaning to take the output of the second function as the input of the first, $f$ of $f$) as merely $f \circ f \circ f$ and no more, since there is nothing to be done with the expression, and thus choose to read $f \circ f \circ f$ as $3$. However the idea to define natural numbers as a starting number followed by an operation to count forward is attributed to Giuseppe Peano, and typically considered a presentation of the Peano Axioms, although in his day the language of type theory was in its cradle with Bertrand Russel, the commonly referred to originator of type theory of Russel's Paradox fame. In truth, its origins stretch further back than even Russel's predecessor on the topic, Gottlob Frege, and has been a gradual progression in the development of logic. The school of thought we draw on in this section is downstream of that of the Intuitionists, in particular Martin-Löf and his various colleagues and successors.
This framework has certain powers which we will appreciate soon. One often calls the kind of natural numbers we have defined an inductive type, and it enables a suite of logical tools for which the famous 'proof by induction' is only one. However there is a general mental rule here that sums up its powers up here quite well: once a type has been defined as nothing but a collection of constructors, anything we want to do with that type need only be defined in terms of those few constructors. This introduces powers such as proof by induction, and indeed a more general form of it which apply to all other inductive types, as well as some unfamiliar responsibilities.
That is, okay, we have our natural numbers and we have confined ourselves to a relatively rigid system, that of types, to do so. Can we do anything with these natural numbers, or do we have to drop this rather laborious system just after we have defined it and return to intuitive, poorly defined notions of arithmetic? The answer is of course we can do things with these natural numbers, however to do so, almost every operation we do will appear as a recursive function. Let me give you an example.
We define the function $\mathtt{add}\colon \mathbb{N} \to \mathbb{N} \to \mathbb{N}$, which we also write as $m + n$ for two natural numbers $m, n \colon \mathbb{N}$, by the following procedure: if $n = \mathbf{Z}$ then output $m + n = m$ immediately, but if $n = \mathbf{S}(p)$, where $p \colon \mathbb{N}$ is some other number, then we output the result of $\mathbf{S}(m) + p$. We write these rules by pattern matching on the different possible forms of $n$ which we are checking for, and so the same definition is written $$\begin{gather*} \mathtt{add}(m,\mathbf{Z}) = m, \\[0em] \mathtt{add}(m, \mathbf{S}(p)) = \mathtt{add}(\mathbf{S}(m),p). \end{gather*}$$
If you are so inclined, the above definition can be thought of as saying the following:
fn add(m: NaturalNumber, n: NaturalNumber) {
match n {
Z => m,
S(p) => add(S(m),p)
}
}
It will be valuable as we continue to remember that most of what we do can be likened directly to code that you may be familiar with.
What we have done then is to define addition in terms of the notion of counting upwards, just as we defined natural numbers in terms of counting upwards. Where a natural number can be thought of a tally of each successor function applied to a zero, we define addition to move each of these tally marks from the right hand side to the left hand side, until we are left with zero on the right and the sum on the left. Only then does our recursion terminate, with all 'plus one's moved to one number. Cumbersome as it may seem to define addition in such a way, the upshot is that now everything we could hope to do with a counting number is now almost exactly that hard, including some much more complicated operations.
I have slipped something by you however: we have not yet discussed formally what it means for a function to take more than one input, and instead I have merely presented one to you with the nebulous function type $\mathbb{N} \to \mathbb{N} \to \mathbb{N}$ with no explanation. This is not a notation that we will see in the rest of mathematics (in fact, in spite of all the sense it could make, mathematicians generally invent new notations just to not need to use this one), and is usually reserved for the purer functional programming languages such as Haskell, however it exposes certain questions and concepts that are valuable to us here. In a programming language, it is common to write $f(x,y,z)$ to use a function, and in many languages (and indeed to us now) it is reasonable to speak of $f$ as a function alone, but then what is $(x,y,z)$ exactly?
Most programming languages have a notion of a tuple related to this, a generalization of an ordered pair to as many terms as necessary, allowing one to combine objects into one without forcing them to interact or relate to one another; $x$ and $y$ could be different types entirely and this is fine. We also have a variety of similar notions in mathematics, chiefly the Cartesian Product which we invoke when we say $(2,3)$ is a coordinate on a 2D plane combined of two mere numbers. It will also be necessary for us to define the notion of a Product type, the type theory equivalent to the tuple which solves this problem by defining functions such as $\mathtt{add}$ as of type $(\mathbb{N},\mathbb{N}) \to \mathbb{N}$, however I pose to you that this is not, at this stage, necessary.
But how then will we pass a second input into a function, if not to force it to ride in a pair with the first input? The answer lies in how one should read $\mathbb{N} \to \mathbb{N} \to \mathbb{N}$. That is, what does the $\to$ operation do exactly? Let us for a moment compare it to our addition function, which also is written in the form $m + n$; then one might say that $\to$ is itself a function with two inputs, except that these inputs are not elements of a type, they are types, and it produces a function type. But if the function type it produces is to be thought of as new, then by the rules we have established, $\to$ is not merely a function but a constructor!
We define the symbol $\to$ to be a right-associative constructor of form $\mathtt{Type} \to \mathtt{Type} \to \mathtt{Type}$. It takes two types, $A, B \colon \mathtt{Type}$ and constructs the type of functions between them $A \to B$.
This does not resolve our problem of how to read types of form $A \to B \to C$ however; instead that is done by our insistence that the operation is defined right-associative. Consider, if $\to$ takes only two types and produces another type, then absent all other reasoning, you could immediately place brackets to attempt to make this form sensible, considering either $(A \to B) \to C$ or $A \to (B \to C)$. However, consider the form of the former: we know that $A \to B$ must be a function type, taking inputs of type $A$ and producing outputs of type $B$, which means a type $(A \to B) \to C$ takes functions of type $A \to B$ and somehow turns these functions into elements of $C$. A function is however, itself, a single value, and so we have not constructed a function of multiple inputs as desired. The latter, $A \to (B \to C)$ considers a function that takes an element of type $A$ and outputs a function $B \to C$; consider, this output which is a function could now have an input passed to it! This is the meaning of a right-associative operation, one where there are implicit brackets making the right-most operation be performed first. $$\begin{gather*} A \to B \to C \to D \to E \\[0em] = \\[0em] A \to (B \to (C \to (D \to E))) \end{gather*}$$
Right-associativity is perhaps poorly named, as it is distinctly non-associative. Usually in mathematics we only care about whether an operation is equivalent no matter where you put the brackets (associative, or equivalently right-associative and left-associative) or not (non-associative), and certainly in common mathematics almost every binary operation you have encountered is associative (addition, multiplication) at its core (since subtraction and division are inversions of addition and multiplication, they are fundamentally associative, but as written they are not, hence the common memes about disagreeing calcuators) with exception to the cross product. Here however, right-associativity of function types enables us to do something special.
It is of course possible merely to replace functions of form $A \to B \to C$ with those of form $(A,B) \to C$, but consider the latter, it requires that the inputs come as a pair prepared and packaged in one single input. Do inputs need to come prepared as a collection? If we write $5+$, is this object meaningless, or have we invented a function? The right-associative notation $\to$ would imply the latter, since $\mathbb{N} \to (\mathbb{N} \to \mathbb{N})$ is still a function with one input and one output, and with one input (five), that output will be of type $\mathbb{N} \to \mathbb{N}$, an operation that adds $5$ to its input. That is, a side effect of this notation is that it allows us to postpone this process of providing inputs; in the same way that a function exists without any inputs, and exists with all of its inputs as its final output, now it exists with only some of its inputs. This is called Currying for Haskell B. Curry, and is common in languages such as Haskell (sharing a namesake) although it existed in this form as early as Church's lambda calculus. Such a concept is exceedingly common in algebra under a different presentation (and different historical lineage), although mathematicians have relatively clumsy notation to do this at best and use it only very conservatively, an error on both counts in my opinion.
Since currying is so natural to the functional mode of operation, they dispense with the $f(x,y,z)$ notation of function application entirely and often write $f \hspace{1pt} x \hspace{1pt} y \hspace{1pt} z$, since then it is equally natural to write $f \hspace{1pt} x \hspace{1pt} y$ meaning the curried form defined $z \mapsto f \hspace{1pt} x \hspace{1pt} y \hspace{1pt} z$. In this frame, one provides an input $x$ to $f$, at which point we say that '$x$ has become an argument of $f$', and then when we later provide $y$, we have $(f \hspace{1pt} x) \colon y \mapsto (f \hspace{1pt} x \hspace{1pt} y)$ where $x$ has become a parameter. The distinction between these terms is thus: an argument is what is provided in the moment of a function's application, and a parameter is some value that exists fixed in an expression that defines a function's output. In our example, $x$ appears as a parameter at the time that $y$ is an argument, due to our currying; if instead we had $(x,y,z) \mapsto f(x,y,z)$ then we might call each of $x,y,z$ arguments at the same time. Equally, you would say that a function $x \mapsto x + 1$ has as a parameter $1$, since one could imagine this is also the function $(+1)$ curried. This distinction might seem moot now, but later we will see that while the notion of an argument is a formalism, the notion of a parameter is deeply tied up with scientific notions of independent and dependent variables, chiefly as those held fixed in an experiment.
We are now ready to use type theory for its principle purpose in this section, bringing home multiple threads we have left open in the previous subsection. First, let us discuss basic logic so we can in some sense define our conceptual targets, what our theory of types must model in order to describe logic.
We have notions of true and false, which we denote $\top$ and $\bot$ respectively, and we have, famously three atomic logical operations we can do with them: $\mathtt{AND}$, $\mathtt{OR}$, and $\mathtt{NOT}$. In many programming languages, these are written $\mathtt{\&\&}$, $\mathtt{||}$ and $\mathtt{!}$, however in formal logic they are written $\wedge$, $\vee$ and $\neg$. Then if we have two propositions $P,Q$ (you can think of these as representing something like "the ground is wet" or "I played basketball") then famously, their rules are:
$P \wedge Q = \top$ only if both $P= \top$ and $Q = \top$, otherwise $P \wedge Q = \bot$.
$P \vee Q = \top$ if either $P = \top$ or $Q = \top$ otherwise $P \vee Q = \bot$ (or equivalently $P \vee Q = \bot$ if $P = \bot$ and $Q = \bot$ otherwise $P \vee Q = \top$)
$\neg P = \bot$ if $P = \top $ and $\neg P = \top$ if $P = \bot$
In order to define propositions as types, we will need to consider, just like the counting numbers type $\mathbb{N}$ is populated with the counting numbers themselves, what is it that populates a proposition? We call these elements of the proposition type 'witnesses' or 'evidence'; you might think of the members of the proposition type "is the ground wet" as testimony that "I went outside and touched the grass and it made my hand wet", however a proposition's witness is never considered in doubt unlike real witnesses.
If we think of propositions as types which are populated (or not populated if they are false and thus have no witnesses) then we know some things about what our logical operations have to be. $\mathtt{AND}$ must create a new type which has witnesses only when both of its argument types have witnesses, $\mathtt{OR}$ must have the witnesses of both types so that it is true if either is true, and $\mathtt{NOT}$ must be a type that somehow has witnesses when its argument type has no witnesses. In fact, we have actually already discussed one of these!
For two types $A \colon \mathtt{Type}$ and $B \colon \mathtt{Type}$, define their product type denoted $A \times B$, to be a new type with the right-associative constructor $\mathtt{,} \hspace{1pt}\colon A \to B \to A \times B$, the comma. In this way, elements $a \colon A$ and $b \colon B$ form an element $(a,b) \colon A \times B$. We also define the projection operators $\mathtt{pr}_1 \colon A \times B \to A$ and $\mathtt{pr}_2 \colon A \times B \to B$ which are $\mathtt{pr}_1 (a,b) = a$ and $\mathtt{pr}_2 (a,b) = b$.
This notation introduces an ambiguity, since we do not literally mean every comma we use in a formal context to be an element of the product type, but it has a value. That is, as a right-associative constructor, we have not only defined pair types but also triples, etc. since any $(a,b,c,d)$ is to be read as nested pairs $(a,(b,(c,d)))$. To avoid confusion however, we use brackets where possible.
The product type also allows us to now formalize what we mean when we say curried and uncurried functions are equivalent. That is, we can always turn a function $f \colon A \times B \to C$ into a function $g\colon A \to B \to C$ and vice versa, because we can set $f \hspace{0.2em} (a,b) = g \hspace{0.2em} a \hspace{0.2em} b$ or $g \hspace{0.2em} a \hspace{0.2em} b = f \hspace{0.2em} (x,y)$ respectively.
Although it may not be obvious, the product type is exactly our $\mathtt{AND}$ operation, since to form an element of $P \times Q$ we require some $p \colon P$ and $q \colon Q$ to form $(p,q) \colon P \times Q$. For example, let $P$ be the proposition that the ground is wet with witnesses which are days that the ground was wet, and $Q$ be "I played basketball" with witnesses which are days I played basketball. Then $P \times Q$ is the type of pairs of a day the ground was wet together with a day I played basketball. If there was a day the ground was wet but not a day where I played basketball, such a pair cannot be formed. This is what gives it its logical 'and' structure; if either fails to have a witness, we will be left stuck in producing such a pair. We do not have an element of the product type $A \times B$ with merely $(a,)$, even now that we can curry. Although, knowing what we do about currying, it will also sometimes be appropriate to read a function $P \to Q \to R$ as not just 'if $P$ then if $Q$ then $R$' but 'if $P$ and $Q$ then $R$'.
Using this as inspiration, we can immediately figure out that the analogous $\mathtt{OR}$ operation will also have a more general use, that is, injecting elements of two types into the one.
For two types $A, B \colon \mathtt{Type}$, define their sum type denoted $A + B$, to be the new type with constructors $l\colon A \to A + B$ and $r \colon B \to A + B$ (often called 'in left' and 'in right').
These constructors, like all constructors, have no additional structure other than that stated. $l$ will take any $a \colon A$ and merely dump it in $A + B$, just as $r$ will, just as $\mathbf{S}$ would take any natural number and count it up. This is the general case of the example we discussed earlier in which a function has an input type defined of 'either numbers or letters'. In this way, if we have even one of $p \colon P$ or $q \colon Q$, we will populate $P + Q$; we could just as well have $Q$ be a type that is empty of witnesses and we will still have $l \hspace{1pt} p \colon P + Q$ as witness. One could imagine this type as a hat; we would write down the days the ground was wet on pieces of paper, likewise with the days we played basketball, and throw them into the hat, forgetting which was which. Only on noticing that the hat is indeed not empty (i.e. it was wet or we played basketball on at least one day) that we may pull out the piece of paper and distinguish which type it originated from (via. was it $l$ or $r$?). This is our logical $\mathtt{OR}$.
Before we proceed to $\mathtt{NOT}$, we must notice two things. First, that there is another kind of logical operation, one that does not appear in programming in quite so obvious a fashion, that is of great relevance to us. That is, we have been ignoring logical implication the entire time. This is perhaps because it has been obvious to us the whole time as well. Our logical implication operator, 'if $P$ then $Q$', is the function type $\to$. When we say $P \to Q$ we mean to say that witness of one is witness of the other. For instance, if we use our ongoing example about wet ground and playing basketball, we could say that a witness $f \colon P \to Q$ is a witness that says "we play basketball whenever the ground is wet". Thus, on providing a day that the ground was wet, we could clearly say "ah, I remember now, we played basketball on that day when it was wet".
Second, we must think more about what it means for a proposition to be false. The primary thing we are trying to manouver around, given that a proposition type's members are witnesses, is the case where a type has no witnesses and is thus false. This prompts us to codify something we had not said earlier, which is the following:
Define the type $\top$ to have the constructor $* \colon \top$ which takes no arguments, and the type $\bot$ to be the type with no constructors. We call these the unit type and the empty type. We may also at times call these the true type and false type.
Our archetypal false type is explicitly the type that is empty, identified with all other types when they are empty, and thus we can say things like $P \times Q = \bot$ when $Q = \bot$. It is not necessarily so simple when a proposition is true however, since there are often multiple witnesses to a true proposition, and the unit proposition only has one constructor. Regardless, with this codified, we may now ask a question: what does $\to$ do to an empty type, i.e. what does it mean to imply false or derive from it? Let us examine this as removed from analogies for a moment.
Consider our responsibilities when defining a function: a function is a map which assigns to every valid input a valid output. So for a type $A$, the function type $\bot \to A$ is populated with what exactly? There are no valid inputs, and so we say that there exists immediately a trivial function $f \colon \bot \to A$. It is common to be confused when seeing this, but consider that from what we know about functions, we have no grounds with which to argue $f$ is not a function; it does its job as a function, since in this case it has no job to do. There simply are no inputs to which we must find outputs. This has another consequence, which is that for all types $P$, the trivial function $\bot \to P$ will exist even if $P$ is a false proposition with no witnesses; again, the space of outputs does not matter since there are no inputs. In fact, what we have described in a trivial function is the type theory version of the 'principle of explosion', often stated as 'if false then anything', with the trivial function as its witness.
The reverse is equally interesting: for a type $A$, just as $\bot \to A$ is always populated with the trivial function, $A \to \bot$ is never populated so long as $A$ is populated. In this case, we have inputs for which we must find valid outputs, but no such outputs exist. The only way we can short circuit this is if indeed we are discussing $\bot \to \bot$, in which case there is still the trivial function. In fact, this function-type is our $\mathtt{NOT}$ operator.
For a type $A\colon \mathtt{Type}$, define the function $\neg \colon \mathtt{Type} \to \mathtt{Type}$ by $$\begin{gather*} \neg A := (A \to \bot). \end{gather*}$$
Unlike our other logical operations, strictly speaking we have not defined a new type here; that was done back when we formally discussed $\to$. This is merely a function on types, in fact it is a shorthand for the curried $(\to \bot)$, rather than a constructor. It works as logical negation, as discussed earlier, since $\neg P$ is populated with the trivial function only when $P = \bot$ since if we have a witness $p \colon P$, we will have no outputs to assign for a function $f \colon P \to \bot$, thus falsifiying the function and leaving $P \to \bot$ empty. Returning to our example about days the ground was wet and days we played basketball, let $Q$ be days we played basketball, but say that the proposition type is empty, i.e. we never played basketball, and thus $Q = \bot$. We can still construct $f \colon P \to Q$, or $f \colon P \to \bot$ rather, still saying "we played basketball on every day the ground was wet" just so long as there were no days the ground was wet.
This now makes it possible to discuss certain basic deductions of logic, the most basic theorems, and see that these will take the form of functions. It is my intention that above all else, what you take away from this section is the ability to read a mathematical theorem as a kind of function, and so we have a preliminary example.
Let $P,Q \colon \mathtt{Type}$ be proposition types. Then "if $P$ or $Q$, then if not $P$, then $Q$", or $$\begin{gather*} P + Q \to \neg P \to Q \end{gather*}$$ i.e. "if the ground was wet or we played basketball, then if the ground was not wet, then we played basketball".
This will be our first ever proof and so I would like to note and emphasize some things. First, for the love of god, please do not let the proofs frighten you. In a proof, we merely step through a series of reasoned logical statements to deduce a conclusion. If you are following with the background of a programmer, then you should think of all proofs in this section as programs; indeed all of them can be written verbatim as code in at least some language, and many of them can be written in rust.
The following proof is such a case, and so if matching rust code will make this make sense to you, please see the code below the proof. In either case, this proof most of all is to be regarded as merely a program which we take cares to explain the writing of.
Let $f$ denote the witness of type $P + Q \to \neg P \to Q$ that we construct. It has as its input a member of $P + Q$ and outputs a function of type $\neg P \to Q$, or $(P \to \bot) \to Q$. Since $P + Q$ has two constructors, let us define $f$ for both: if the constructor passed to $f$ is $r \colon Q \to P + Q$ then it must have been passed $q \colon Q$ as $r \hspace{2pt} q$. In that case, we have our witness of $Q$ and may pass it directly, defining $$\begin{gather*} f \hspace{2pt} (r \hspace{2pt} q) = (g \mapsto q) \end{gather*}$$ where $g\colon \neg P$ is the evidence that $P = \bot$.
Perhaps counter-intuitively, we must now consider the case where $f$ is given as its input $l\hspace{2pt} p$ for some $p \colon P$ and then some $g \colon P \to \bot$ as well, the self contradicting case where both $P$ and $\neg P$ are true, which we call the counterfactual (meaning roughly 'the case that was not true'). That is, we are above all defining a function, and so no matter our intuitions about logic, we must put them aside and say what the function does. This is not a problem for us however.
The $\bot$ type is empty, however since we are in a case that should never happen, we now have evidence of false and can prove anything we like, including our proof goal. We write this as $$\begin{gather*} f \hspace{2pt} (l \hspace{2pt} p) \hspace{2pt} g = h \hspace{2pt} (g \hspace{2pt} p) \end{gather*}$$ where $h \colon \bot \to Q$ is the trivial function. This can be interpretted as "If $P$ or $Q$ via $P$, and $P$ implies false, then since we know $P$, imply false. Since false implies anything, imply $Q$". In the process of this we have momentarily allowed there to be a fictitious element of the empty type, as is natural in the counterfactual case.
Then for any $P + Q$ we have constructed $f \colon P + Q \to \neg P \to Q$.
\qed
To write this code in rust, we'll need a way to substantiate our notions of a sum type and a negation type. The sum type will be the easy one here.
enum SumType<S,T> {
inl(S),
inr(T)
}
with $\mathtt{inl}$ and $\mathtt{inr}$ corresponding to our $l$ and $r$ constructors. The negation type is a little trickier but we can write something rusty enough to follow, although don't expect it to compile. In rust, we have a type called the never type, represented with an exclamation mark $\mathtt{!}$. This type is actually useful for exactly the same thing we use it for here, that is, it is used when we want to guarentee that its type is never the output. If, for instance, we need to pass a Result<Q,P> then setting P = ! means that our result type will never be an $\mathtt{Err}$ constructor.
For our purposes, we'll imagine that our fantasy land rust compiler allows functions fn(p: P) -> ! to exist when P == !, as evidence that it is true. Similarly, we'll imagine that we can write functions from the empty type without specifying how they work since they are unreachable code anyway.
Then our proof is merely the following:
fn trivial_function(x: !) -> Q { match x {} }
fn theorem<P,Q>(
s: SumType<P,Q>,
g: impl Fn(p: P) -> !
) -> Q
{
match s {
inl(p) => trivial_function(g(p)),
inr(q) => q
}
}
Notice here that the unreachable code in trivial_function is never executed despite being used in the function. This is because the function g is evidence that P is empty, and thus we can be certain that no inl(p) is ever given; the entire branch is unreachable code.
In fact this is exactly what we have written above. Of course this program, while helpful for writing and giving structure to the proof, does not give us a corresponding notion of why the proof is true. This is precisely why the proof is not written in two lines, although of course we see it could be in theory.
Part of the power of using type theory is that, as discussed before but more obvious now, a function (and thus a theorem) is defined so long as one can account only for every constructor the function may receive. This case is simple, asking what constructors $P + Q$ could have and then taking any member of $\neg P$ absent a question about which constructor it is exactly, however as we will see going forward, theorems of arbitrary complexity will yield to this kind of case analysis since we are merely fulfilling our obligation to provide an output for every input. Thus the question 'which witnesses did we receive?' is merely to divide our sets of inputs/witnesses/circumstances into classes which may individually have easier proofs.
Before moving on, note that we have introduced a split in the notation of constructive mathematics and classical mathematics: in type theory, propositions and their operations exist at the same level as objects, and use the same tools. As our intention here is to proceed, after this section, to classical mathematics, it would be better practice to give our operations appropriate aliases so that we are not confused in future.
When they act on propositions, define the following symbols as aliases for their non-propositional counterparts:
$\wedge := \times \hspace{0.5em}$ read as 'and',
$\vee := + \hspace{0.5em}$ read as 'or',
$\Rightarrow \hspace{4pt} := \hspace{4pt} \to \hspace{0.5em}$ read as 'implies', e.g. $P \Rightarrow Q$ is the type of functions taking witnesses $p \colon P$ and producing some witness $q \colon Q$ , along with $(P \Leftarrow Q) = (Q \rightarrow P)$,
$(P \Leftrightarrow Q) := (P \to Q) \times (Q \to P)$, that is, we say $P$ 'if and only if' $Q$ when we can find as witness a function $P \to Q$ as well as a function $Q \to P$ so that both propositions imply one another.
We have of course not yet given an example of a real proposition type with its own constructors yet, and the reason for that is this: while we have the tools to discuss a proposition type now, we don't have the tools to discuss whole classes of propositions, and thus to deduce anything that could actually have generality or be useful. The unit type and empty type are technically examples of propositions, and yet they do not say anything. What we need, more than the ability to discuss witnesses or implications about the ground being wet or having played basketball, is a reliable way to discuss witnesses which specify which day the ground was wet or which day we played basketball. It is not on its own useful to deduce that that because the ground was once wet ever that we did play basketball, we need tools that allow us to discuss propositions like "we play basketball the day after it rains".
Since we are preparing to speak about mathematics, most often of all, the things we will have to say with a proposition is to speak of equality, this equals that. And yet equality is not useful if we must manually define a new type for every pair of items we want to claim are equal. We can of course make functions between types, as seen by $\to$, but, consider for equality of natural numbers, we need something of the form $\mathbb{N} \to \mathbb{N} \to \mathtt{Type}$. And yet even in that case, we will certainly need constructors that are somehow dependent on the arguments, lest we populate a proposition type that says any random pair of numbers is equal.
Let us think then about what properties equality has that could found its type. In mathematics, we say that a 'relation' in general describes an equivalence when it has three properties: every thing is equal to itself (reflexivity), an equality is the same backwards as it is forwards (symmetry), and equalities can be chained together to make new equalities (transitivity). A broader discussion of equivalence relations will appear in the following section, however for now, one of these does indeed look like a constructor.
If $A \colon \mathtt{Type}$ and $a,b \colon A$, then define by $a \equiv b$, read as '$a$ equals $b$', a new type with constructor $$\begin{gather*} \mathtt{refl} \colon (a \colon A) \to (a \equiv a). \end{gather*}$$
There are a number of peculiar things about this type, least of all how to interpret what it means. First, to avoid confusion, we have used the symbol $\equiv$ rather than $=$ in order to separate the rules of $\equiv$ from out existing notions of equivalence; instead we will deduce them. Second, its constructor doesn't look like a function in the way we are familiar, since now the type of output depends on the individual input rather than merely its type, i.e. $(\mathtt{refl} \hspace{3pt} a) \colon (a \equiv a)$ and $(\mathtt{refl} \hspace{3pt} b) \colon (b \equiv b)$ are members of separate, mutually incomparable types. The constructor we've given this represents the expectation that equality relations are reflexive, so what we've really said is 'in the case where $b$ is both equal to $a$ and literally the symbol $a$, $a \equiv b$ has constructor $\mathtt{refl} \hspace{2pt} a$'.
But we cannot use this yet; the same issue that constructors for this type have, a bizarre dependency of their output type on their input element, is also necessary for any theorem that speaks about equality. Otherwise, more than just having to make each equality type one at a time, we'd have to define a new version of each theorem for each equality type used. We must generalize this concept of dependent function types.
Let $A \colon \mathtt{Type}$ and let $B$ be some family of types, a collection of types each with roughly the same constructors which each use some $a \colon A$ as a parameter. In that case, we write $B \colon A \to \mathtt{Type}$ and define the $\Pi$-type as the dependent function type, written $$\begin{gather*} \prod_{x : A} B(x) := (x \colon A) \to B(x) \end{gather*}$$ where $\Pi$ is the capital greek letter 'pi'. In practice, let us also write what is on the right hand side of this definition to denote the dependent function type.
This construction allows us to do something we really have not been able to before. Before, for instance, we could speak of a function which takes letters of the alphabet to words that start with that letter, like $a \mapsto \text{Apple}$ or $b \mapsto \text{Banana}$, and in that way, functions have always depended on their inputs. But now, we can speak of a function that maps $a$ to a particular apple in the type $\mathtt{Apple}$ or $b$ to a particular banana in the type $\mathtt{Banana}$. It is called a dependent function precisely because the type of the output now changes depending on the particular input given. To map that onto the formalization above, we might say that $A$ is the type of alphabetic letters, and $B$ is a family of types of foods which each start with a different letter of the alphabet.
This may seem abstract, but in fact the reason we are discussing it in the first place is because it concretizes a much more natural concept; when $P \colon \mathbb{N} \to \mathtt{Type}$ is a family of propositions depending on a natural number, we can speak of $n \colon \mathbb{N}$ where $P(n)$ is true; this is the type theoretic form of a condition on $n$. This allows us to do two things very interesting things with respect to functions. We can speak of a function of type $(n \colon \mathbb{N}) \to P(n) \to Q(n)$ which, since it takes a witness of $P(n)$ as an argument, is a statement of the form 'when $P$ is true for $n$ then $Q$ is true for $n$'. If instead we have a function of form $(n \colon \mathbb{N}) \to P(n)$ then we have promised that we can reliably produce a witness that condition $P$ is true for any $n$ we give; we have said 'for all $n \colon \mathbb{N}$, $P(n)$ is true'.
In effect, we have gained the ability to speak about propositions in generalities. To say that 'in an entire class of circumstances, it remains true that...'. This concept is so ubiquitous in mathematics that there will be very few sections of this text which do not refer to it literally or implicitly.
When $A\colon \mathtt{Type}$ and $P\colon A \to \mathtt{Type}$ is a family of proposition types, define the notation $$\begin{gather*} \forall x : A, P(x) := \prod_{x : A} P(x) \end{gather*}$$ as the propositional logic version of the $\Pi$-type, reading the symbol $\forall$ as "for all". It is also called in some settings the "universal quantifier".
Here, the deeper value of the type theory metaphor for propositional logic begins to show itself. In mathematics, we tend to use $\Pi$ in a similar way as above to denote large repetative multiplications, which we sometimes call 'product notation'. This is in fact entirely apt as a choice of notation against the backdrop of our other names. Consider what it means to have a function with a dependent output type: we require of a function that for every input there is an output, and so, what it means to have found a witness for a dependent function proposition type is that we have a member of every type in the type family $P\colon A \to \mathtt{Type}$.
Said another way, imagine a type family dependent on the unit type, $Q \colon \top \to \mathtt{Type}$. Since $\top$ only has one element, the $*$ element, there is only one type $Q(*)$ in the type family, and so $\prod_{* : \top} \hspace{0.05em} Q(*)$ is still a type of functions, but with no real dependency left. Nonetheless, an element $f \colon (* : \top) \to Q(*)$ still selects an element of $Q(*)$ to be its output, $q = f(*)$. Then the data, the information that could be said to be contained in $f$, the choices required to define it, lie only in the choice of $f(*) \colon Q(*)$. What we have done in finding a member $f \colon \prod_{* : \top} \hspace{0.05em} Q(*)$ is merely to find a member $q \colon Q(*)$. Now consider, for a type, let's call it $\mathtt{Bool}$ with only two members $\mathtt{on}\colon \mathtt{Bool}$ and $\mathtt{off} \colon \mathtt{Bool}$; then the dependent type $\prod_{b : \mathtt{Bool}} P(b)$ contains functions that still assign an output to every input, and so it finds a witness for both $P(\mathtt{on})$ and $P(\mathtt{off})$. The data required to define a function $f \colon \prod_{b : \mathtt{Bool}} P(b)$ is then contained in each element of $P(\mathtt{on}) \times P(\mathtt{off})$, since each $(p_{\mathtt{on}},p_{\mathtt{off}}) \colon P(\mathtt{on}) \times P(\mathtt{off})$ corresponds to a possible $f$ which is defined $f(\mathtt{on}) = p_{\mathtt{on}}$ and $f(\mathtt{off}) = p_{\mathtt{off}}$. If extended, the conclusion one reaches is that $\forall (n \colon \mathbb{N}), P(n)$ is to be read as $P(0) \times P(1) \times P(2) \times ...$, i.e. "for all $n$, we have $P(n)$" is an identical statement to "$P(0)$ and $P(1)$ and $P(2)$ and ...". In fact, this interpretation is ostensibly correct, and the intuitions garnered from it generalize.
Holding that thought for a moment, we must first complete our suite of tools. If we are to think of the $\Pi$-type as generalizing the notion of a product type to entire families of types, then can we generalize the sum type to an entire family of types? Let us think for a moment: if a dependent function generalizes the product type since there must exist an output in each type of the type family to correspond to an input, then to generalize the sum type means to say there exists at least one type with a witness amongst an entire family of types. The tool we are looking for is a dependent pair type.
Let $A \colon \mathtt{Type}$ and $B \colon A \to \mathtt{Type}$ be a family of types, each which take a member $a \colon A$ as a parameter. We define the $\Sigma$-type as the dependent pair type, written $$\begin{gather*} \sum_{x : A} B(x), \end{gather*}$$ where $\Sigma$ is the capital greek 'sigma', to be the type of all pairs $(a,b)$ where $a\colon A$ and $b \colon B(a)$, with the type of the second term depending on the individual choice of element of the first in $A$. Analogously to the product type, we have the comma constructor $$\begin{gather*} , \colon (a \colon A) \to B(a) \to \sum_{x : A} B(x). \end{gather*}$$
It is of course possible to set $B \colon A \to \mathtt{Type}$ to be some constant family, say, a type $C \colon \mathtt{Type}$, in which case the $\Sigma$-type trivializes back down to the standard pair, $A \times C$. This is important however; as we discussed earlier about currying, a pair argument $(a,b)$ for a function $A \times B \to C$ is the same as a curried function $A \to B \to C$. But then we defined dependent functions that could be $(a \colon A) \to P(a) \to Q(a)$; can we uncurry those? With the dependent pair, we can try writing $\sum_{a : A} P(a) \to Q(a)$ but this would be very bad practice. The natural currying of dependent functions allows us to speak of $(a \colon A) \to P(a) \to Q(a)$ as a dependent function where indeed the family of types $A \to \mathtt{Type}$ is $P(a) \to Q(a)$, or $\prod_{a : A} \big(P(a) \to Q(a)\big)$ with the entire function type as dependent on $a$; if we hide this $a$ in a dependent pair, it is not reasonable to say that it suddenly starts behaving as a dependent function argument later.
The principle is however, sound in a sense. It is reasonable to desire that in the way $P \to Q \to R$ can be read as 'if $P$ and $Q$ then $R$' via currying, that we might use dependent pairs to speak of $\sum_{n : \mathbb{N}} P(n)$ as 'we have a number $n$ and we know that $P(n)$'. In fact, if we have $(a \colon A) \to P(a) \to Q$ where $Q$ is merely a proposition rather than a family of propositions, we can in fact uncurry this to 'if we have a number $n$ and $P(n)$ is true then $Q$'. This should be peculiar in a sense; if $Q$ does not depend on a choice of $n$, then we are saying that $Q$ is implied true merely because there exists a $n \colon \mathbb{N}$ which satisfies the condition $P$, without care of which number $n$ is. This an extremely useful concept.
In propositional logic, the $\Sigma$-type $\sum_{x : A} B(x)$ describes the quantifier 'exists', the same one as in statements like "for all $n$, there exists a prime number $p$ which is greater than $n$". In fact, pending a definition of $>$ (greater than) and letting $P(n)$ be the proposition that $n$ is prime, we could now describe that proposition literally by the type $\prod_{n : \mathbb{N}} \sum_{p : \mathbb{N}} P(p) \wedge (p > n)$. Such a pattern of pairing 'for all' with 'there exists' is a common one, however, it is usually written with the following notation.
When $A\colon \mathtt{Type}$ and $P\colon A \to \mathtt{Type}$ is a family of proposition types, define the notation $$\begin{gather*} \exists x : A, P(x) := \sum_{x : A} P(x) \end{gather*}$$ as the propositional logic version of the $\Pi$-type, reading the symbol $\exists$ as "there exists" and the comma as "such that". When there exists exactly one $a \colon A$ with a populated $P(a)$ type, we say that $\exists ! x \colon A, P(x)$, reading $\exists !$ as "there exists uniquely". $\exists$ is also called in some settings the "existential quantifier".
It should also be mentioned at least briefly that although in type theory, all objects must belong only to one type, dependent pairs give us the closest thing we have to a subtype. Consider for example, the type $$\begin{gather*} \sum_{n : \mathbb{N}} \exists (m \colon \mathbb{N}) \hspace{0.2em} (n \equiv m + m) \end{gather*}$$ The use of $\Sigma$ here together with $\exists$ is a purely notational choice (indeed, the symbols are intended to be equivalent as just defined) but we write it this way because we are in fact interested in the particular $n \colon \mathbb{N}$. Then in fact all $n \colon \mathbb{N}$ for which there exists a pair of type $\exists (m \colon \mathbb{N}) (n \equiv m + m)$ must be the even numbers, since we will be unable to form such pairs if $n$ is odd, and thus no $m$ with $2m = n$ exists. That is, our closest notion of a subtype of all $a \colon A$ that satisfy a proposition family $P\colon A \to \mathtt{Type}$ is exactly the dependent pairs of $a \colon A$ together with the evidence that they satisfy $P$.
Together, the operations we have described above show how this dependent type theory, one that exists in computer programming languages with operations and objects $\to, \times, +, \top, \bot, \Pi, \Sigma$, has a one-to-one mapping with the logical operations used in mathematics, $\Rightarrow, \wedge, \vee$, true, false, $\forall, \exists$. This is a consequence of the Curry-Howard correspondance, often stated simply as "proofs are programs".
The whole reason we are discussing type theory is to recontextualise what it means to prove a theorem; what we do in pure mathematics often has less in common with the deductions of a detective eliminating possible stories and more in common with a programming problem. That is, although we are certainly making deductions, the manner in which one does this in every day life can sometimes lead one to think logical statements can be weakened or strengthened with synonyms or by persuading oneself with some argumentative sleight of hand. Although for much of this text we will write proofs in the form of worded deductions, those deductions are written each as the application of some function that transforms a logical context.
To that end, it helps to see propositions as not merely statements to be read as true or false, and theorems as statements which are conditionally true, but that propositions have witnesses, and those theorems act on those witnesses and are incomplete if every single hypothesis is not proven. Moreover, we do not accept a statement merely because it seems like it should be true, but because we have a transformation of existing witnesses to propositions that produces a new witness that the statement is true. With addition to some core axioms, we hold ourselves to this high a standard in mathematics.
This also means that everything you think you know to be true about the typical mathematical objects, rules like $a + (b + c) = (a + b) + c$ are, if not assumed, propositions that have explicit witnesses. It is no longer enough to expect that things should be true, or ought to be true of mathematical objects or objects which we intend to describe things in the real world. If they are true, then they have witnesses. This will expose certain things about how the mathematical mode of thought differs from the logical one, or rather, that many things one believes to be true about mathematics are not necessarily true but are in fact distinct choices we make about the system of logic we work in.
Let us begin with some proofs of familiar concepts.
Let $A \colon \mathtt{Type}$ and $a, b ,c \colon A$. If $a \equiv b$ and $b \equiv c$ then $a \equiv c$, or $$\begin{gather*} \forall (a,b,c) \colon A \times A \times A, \hspace{0.4em} (a \equiv b) \Rightarrow (b \equiv c) \Rightarrow (a \equiv c) \end{gather*}$$ has a witness.
Let us digress briefly to discuss how the statements of theorems correspond to their propositional counter parts. In mathematics we will often see statements of this form, in particular 'Let $a,b,c$ be of type $A$. If $a \equiv b$ and $b \equiv c$ then $a \equiv c$'. Obviously the proposition 'if $a \equiv b$ and $b \equiv c$ then $a \equiv c$' doesn't make a lot of sense on its own since we must specify what $a,b,c$ we are talking about, and this is done by 'let $a,b,c$' and interpretted as $\forall (a,b,c) \colon A \times A \times A$.
But since most theorems do not have their propositional forms written symbolically, we must be able to read which parts of the theorems are contextualised and which are not. For instance, when we say $a,b,c \colon A$, we do this only after we have said 'let $A \colon \mathtt{Type}$'. Perhaps then it would be more appropriate to first write $\forall (A \colon \mathtt{Type})$ but this would be superfluous since we can't even state the theorem without a choice of type in the first place. Similarly, we may then only write the theorem as $(a \equiv b) \Rightarrow (b \equiv c) \Rightarrow (a \equiv c)$ since we have already contextualised that $a,b,c$ must be of the same type.
It is common in mathematics to drop conditions from statements of the form 'if ... then ...' and instead propose them first as statements 'let ...'; since our theorem can only be applied in a relevant context anyway, our let statements turn arguments into contextual requirements for purely notational clarity. While there exists a broader study of logical context, its rules and manipulations (and if you are interested in this, look into the turnstile), this study exists to split more hairs than even dependent type theory. Contextual manipulations are, as far as we are concerned, equivalent to additional 'for all' arguments.
We are to define a dependent function $f$ which takes a triple $(a,b,c) \colon A \times A \times A$ and then a witness of $a \equiv b$ and of $b \equiv c$. Presuming we are supplied an appropriate triple, we then need to define $$\begin{gather*} f(a,b,c) \colon (a \equiv b) \to (b \equiv c) \to (a \equiv c). \end{gather*}$$ Presume also then that we are given a witness $p \colon (a \equiv b)$. Now we will pattern match on the remaining argument, that is, since there is only one constructor for the equality type, we know that an argument of type $b \equiv c$ will be supplied as the witness $\mathtt{refl} \hspace{0.2em} b \colon (b \equiv b)$; this is only as appropriate, since equality types only have constructors, and thus are populated, when the left hand side of the equality is equal to the right side. And if it were not so, then no witness would be supplied at all (this is indeed a power of the inductive type). In this way, by merely asking what constructor of $b \equiv c$ would be supplied, we have immediately shown that what we are proving could only be the following$$\begin{gather*} f(a,b,b) \colon (a \equiv b) \to (b \equiv b) \to (a \equiv b). \end{gather*}$$ This is substantially easier job, since we may define $f$ as $$\begin{gather*} f\hspace{0.2em} (a,b,c) \hspace{0.3em} p \hspace{0.3em} (\mathtt{refl} \hspace{0.2em} b) = p \end{gather*}$$ since, by requiring a witness of $b \equiv c$ and thus ensuring $c$ can only literally be $b$, our final output must be a witness $a \equiv b$ just as our first witness $p \colon (a \equiv b)$.
\qed
The mode of reasoning in this proof also applies immediately to certain things about numbers. For instance, the idea that $m \equiv n$ implies $m + 1 \equiv n + 1$ is immediate in this manner: a function $$\begin{gather*} \forall (m, n) \colon \mathbb{N} \times \mathbb{N}, (m \equiv n) \Rightarrow (\mathbf{S}(m) \equiv \mathbf{S}(n)) \end{gather*}$$ has its witness easily since in the moment that we pattern match on $m \equiv n$, we see that the only constructor could have been $\mathtt{refl} \hspace{0.3em} n$, and thus the case we are proving is exactly $$\begin{gather*} (n \equiv n) \Rightarrow \big(\mathbf{S}(n) \equiv \mathbf{S}(n) \big) \end{gather*}$$ with witness given trivially by $\mathtt{refl} \hspace{0.3em} (\mathbf{S}(n))$. This is the extended utility of the inductive type. The mere fact of saying that a function is defined if we know what output to assign to every case of constructor inputs is equivalent also to eliminating the cases where our prerequisite propositions were untrue. In this instance, the only way we could say the relatively trivial solution $\mathtt{refl} \hspace{0.3em} (\mathbf{S}(n))$ fails to be a solution is if our final type were not $\mathbf{S}(m) \equiv \mathbf{S}(n)$. And yet it must be that type since evidence of $n \equiv m$ was supplied, and the only form that evidence could have been was $\mathtt{refl} \hspace{0.2em} n$, meaning $n \equiv m$ must have been $n \equiv n$.
But our system can do much more than just make things we knew intuitively to be true much more burdensome to prove, it can make things which are often difficult for students to comprehend become equally as concrete as those we knew to be obvious already. The famous 'proof by induction' is brought into crystal clarity by our technique, and in fact deduced.
Let $P\colon \mathbb{N} \to \mathtt{Type}$ be a type family. If $P(\mathbf{Z})$ has witness and there is a function $\forall (n : \mathbb{N}), P(n) \to P(\mathbf{S}(n))$, then we have proven $\forall (n : \mathbb{N}), P(n)$. That is, there is a witness for $$\begin{gather*} \forall (P\colon \mathbb{N} \to \mathtt{Type}), P(\mathbf{Z}) \Rightarrow \big(\forall (n : \mathbb{N}), P(n) \Rightarrow P(\mathbf{S}(n)) \big) \Rightarrow \forall (n \colon \mathbb{N}), P(n). \end{gather*}$$
To complete this proof, we will need to speak of two functions: $f$, the witness of the theorem (type omitted for length) and $g\colon \forall (n : \mathbb{N}), P(n) \Rightarrow P(\mathbf{S}(n))$, the function we require that can propogate a proof of $P(n)$ to its successor $P(\mathbf{S}(n))$. Although it may not be obvious by the statement of the theorem, by the right-associativity of function types (yes, even dependent ones), we will also be able to provide $n \colon \mathbb{N}$, defining our function $f$ by the final output of type $P(n)$ it produces.
Let us pattern match on $n$: it must either be $\mathbf{Z}$ or $\mathbf{S}(m)$ for some $m \colon \mathbb{N}$. In the first case, we are trying to produce a member of $P(n) = P(\mathbf{Z})$, for which an evidence is already provided as prerequisite. So we may define $$\begin{gather*} f \hspace{0.2em} P \hspace{0.2em} p \hspace{0.2em} g \hspace{0.3em} \mathbf{Z} = p \end{gather*}$$ where $p \colon P(\mathbf{Z})$ and the other arguments are defined above. In the case of $\mathbf{S}(m)$, we write $$\begin{gather*} f \hspace{0.2em} P \hspace{0.2em} p \hspace{0.2em} g \hspace{0.3em} \mathbf{S}(m) = g \hspace{0.2em} m \hspace{0.3em} (f \hspace{0.2em} P \hspace{0.2em} p \hspace{0.2em} g \hspace{0.2em} m) \end{gather*}$$ defining a recursive function just as we had done for addition. What we have done on the right hand side is this: since $g\colon \forall (n : \mathbb{N}), P(n) \Rightarrow P(\mathbf{S}(n))$, we provide it with $m \colon \mathbb{N}$ so that we are discussing $g(m) \colon P(m) \Rightarrow P(\mathbf{S}(m))$. To complete the proof, we need only hand $g(m)$ some proof of $P(m)$, which we hand off recursively to $f$ itself in its specialized form, $f \hspace{0.2em} P \hspace{0.2em} p \hspace{0.2em} g \hspace{0.2em} m \colon P(m)$ as we described above. In this way, we hand the obligation of actually proving something off through recursion until finally the tally marks are taken off and we are in fact proving $f \hspace{0.2em} P \hspace{0.2em} p \hspace{0.2em} g \hspace{0.3em} \mathbf{Z} \colon P(\mathbf{Z})$, which has witness $p$ as was provided.
\qed
We can use proof by induction to extend our earlier observation that $a \equiv b$ implies $\mathbf{S}(a) \equiv \mathbf{S}(b)$ to a general form that $b \equiv c$ implies $b + a \equiv c + a$ for all $a \colon \mathbb{N}$. That is, we find a witness of $$\begin{gather*} \forall a, \forall b, \forall c, (b \equiv c) \Rightarrow (b + a \equiv c + a) \end{gather*}$$ by setting it as our $P\colon \mathbb{N} \to \mathtt{Type}$ and performing induction on it. The case $a = \mathbf{Z}$ is covered immediately, recalling the inductive definition $\mathtt{add}(n, \mathbf{Z}) = n$, i.e. $P(\mathbf{Z})$ is in fact the type $\forall b, \forall c, (b \equiv c) \to (b \equiv c)$, proven trivially by passing the witness along. Next we need $\forall (n : \mathbb{N}), P(n) \to P(\mathbf{S}(n))$, specialized to our case as $$\begin{align*} \forall (n \colon \mathbb{N}), \hspace{1em} & \forall b, \forall c, (b \equiv c) \Rightarrow (b + n \equiv c + n) \\[0em] \Rightarrow \hspace{1em} & \forall b, \forall c, (b \equiv c) \Rightarrow \big(b + \mathbf{S}(n) \equiv c + \mathbf{S}(n) \big). \end{align*}$$ But again we may apply our inductive definition of addition $\mathtt{add}(m, \mathbf{S}(n)) = \mathtt{add}(\mathbf{S}(m),n)$, and so the output proposition for this is in fact $$\begin{gather*} \forall b, \forall c, (b \equiv c) \Rightarrow (\mathbf{S}(b) + n \equiv \mathbf{S}(c) + n). \end{gather*}$$ Now recall our observation that $(m \equiv n) \Rightarrow (\mathbf{S}(m) \equiv \mathbf{S}(n))$, which we write in our case with witness function $g\colon \forall b, \forall c, (b \equiv c) \Rightarrow (\mathbf{S}(b) \equiv \mathbf{S}(c))$. Since our inductive function gives us as hypothesis, the witness $h \colon \forall b, \forall c, (b \equiv c) \Rightarrow (b + n \equiv c + n) $ (i.e. this is the $P(n)$ from which we must show $P(\mathbf{S}(n))$, we can now specialize that as $\big(h \hspace{0.2em} (\mathbf{S}(b)) \hspace{0.2em} (\mathbf{S}(c)) \big) \colon \big(\mathbf{S}(b) \equiv \mathbf{S}(c) \big) \rightarrow \big(\mathbf{S}(b) + n \equiv \mathbf{S}(c) + n \big)$. We can now chain these together $$\begin{gather*} \Big( h \hspace{0.2em} (\mathbf{S}(b)) \hspace{0.2em} (\mathbf{S}(c)) \hspace{0.2em} (g \hspace{0.2em} b \hspace{0.2em} c) \Big) \colon (b \equiv c) \Rightarrow \big(\mathbf{S}(b) + n \equiv \mathbf{S}(c) + n \big) \end{gather*}$$ forming our inductive step. Then (and you should check this by looking at the proof by induction's form) we have satisfied the requirements for applying the theorem form of proof by induction, and we deduce $\forall a, \forall b, \forall c, (b \equiv c) \Rightarrow (b + a \equiv c + a)$.
If we use $f$ to denote the inductive rule then the full proof can be read monolithically as $$\begin{align*} f \hspace{0.2em} & \big( \forall a, \forall b, \forall c, (b \equiv c) \Rightarrow (b + a \equiv c + a) \big) \\[0em] & (b \mapsto c \mapsto p \mapsto p) \\[0em] & \big(b \mapsto c \mapsto h \hspace{0.2em} (\mathbf{S}(b)) \hspace{0.2em} (\mathbf{S}(c)) \hspace{0.2em} (g \hspace{0.2em} b \hspace{0.2em} c)\big). \end{align*}$$
In practice, we will rarely reason about mathematical objects themselves at the type constructor level. While it is a useful tool for being absolutely certain no logical error has been made, we typically try to adjust our intuitions so that we can reason such steps automatically. That is not to say, in this moment at least, that the rigidity we expect of pure mathematics is any weaker, but that statements such as $a = b \Rightarrow a + c = b + c$ is one we train our intuitions on so that we know automatically how to make algebraic deductions.
When we lack such intuitions, or when we know our intuitions are at risk of failing us, this system or modes of reasoning like it are there to catch us. This is often the case when we speak about the various refactorings we can do to statements to make them easier to prove related to negation. It is a common mistake for students to set out to prove by contradiction, only to structure the negated form of the proposition incorrectly and make the proof trivial but wrong. With the benefit of this system, we will not be making such mistakes.
Let $P,Q \colon \mathtt{Type}$ be propositions, and let $A \colon \mathtt{Type}$ with $R\colon A \to \mathtt{Type}$ a type family. Then we have the following:
$\neg P \vee \neg Q \Rightarrow \neg (P \wedge Q)$ i.e. 'if $P$ or $Q$ is false then they are not both true'
$\neg P \wedge \neg Q \Rightarrow \neg (P \vee Q)$ i.e. 'if both are false then not one of them are true'
$\neg (P \vee Q) \Rightarrow \neg P \wedge \neg Q$ i.e. 'if not either then both false'
$\neg (\exists (a \colon A), R(a)) \Rightarrow \forall (a \colon A), \neg R(a)$ i.e. 'if not one example, then for all false'
$\forall (a \colon A), \neg R(a) \Rightarrow \neg (\exists (a \colon A), R(a))$ i.e. 'if for all false, then there does not exist an example'
$ \exists (a : A), \neg R(a) \Rightarrow \neg (\forall (a : A) \to R(a))$ i.e. 'if counterexample then not for all'
Many of the proofs for this theorem will depend on applying right-associativity, since if the final type we want to show is of form $\to \neg P$, then this is the same as $\to P \to \bot$. Just like in our previous proofs involving negation, we can indulge a counterfactual case, however proofs ending in negation are special in a certain respect. That is, when we engage the counterfactual, we do not need to apply the principle of explosion (the trivial function) in order to obtain any proposition we want, because our final proposition type is $\bot$. Said another way, we can reinterpret $\to \neg P$ as $\to P \to \bot$ and supply a witness $p \colon P$, and this is equivalent to saying "$P$ is false and we will show it's false by showing absurdity is derived from it being true". This is a very specific form of proof by contradiction.
This proof will be the longest we've ever seen before, a record to be almost immediately broken again. For that reason, it is on the website displayed as a minimized box by default. I do recommend looking over this proof, so please click it.
Let us proceed as if $P,Q$ were normal types. Then we read $\neg P \vee \neg Q = (P \to \bot) + (Q \to \bot)$ and $\neg (P \wedge Q) = P \times Q \to \bot$. Together then, we must find a witness function of the type $$\begin{align*} (P \to \bot) + (Q \to \bot) \to (P \times Q \to \bot). \end{align*}$$ Let us call this witness $f$. Then to pattern match $f$ for its argument in $(P \to \bot) + (Q \to \bot)$, we must define its procedure in when it receives either $l \hspace{0.2em} g$ where $g \colon P \to \bot$ or $r \hspace{0.2em} h$ where $h \colon Q \to \bot$. In either case, recalling the right-associativity of $\to$, we may then define its behaviour as a function with output in $\bot$ if it receives an input $(p,q) \colon P \times Q$ thus solving in the counterfactual. Then, for example, we may extract the desired value $p$ and apply it to our function $g \colon P \to \bot$. Thus $f$ is defined $$\begin{gather*} f \hspace{0.2em} (l \hspace{0.2em} g) \hspace{0.2em} (p,q) = g \hspace{0.2em} p \\[0em] f \hspace{0.2em} (r \hspace{0.2em} h) \hspace{0.2em} (p,q) = h \hspace{0.2em} q \end{gather*}$$
Once again, note that the desired statement is written fully as $$\begin{gather*} (P \to \bot) \times (Q \to \bot) \to (P + Q \to \bot). \end{gather*}$$ with the benefit of familiarity with the previous proof, we must define the behaviour assuming we are given an argument $(g,h) \colon (P \to \bot) \times (Q \to \bot)$ and, using right-associativity, let us say we are also given an argument $P + Q$ which must be pattern matched as either $l \hspace{0.2em} p$ or $r \hspace{0.2em} q$ for $p \colon P$ or $q \colon Q$ respectively. Then we can define $f$ as $$\begin{gather*} f \hspace{0.2em} (g,h) \hspace{0.2em} (l \hspace{0.2em} p) = g \hspace{0.2em} p \\[0em] f \hspace{0.2em} (g,h) \hspace{0.2em} (r \hspace{0.2em} h) = h \hspace{0.2em} q \end{gather*}$$ a near re-ordering of the previous proof.
This statement is written as $$\begin{gather*} ((P + Q) \to \bot) \to (P \to \bot) \times (Q \to \bot). \end{gather*}$$ Its witness, which we once again call $f$, will take as an argument $g \colon P + Q \to \bot$. Our output must be a pair of functions that take some $p\colon P$ to $\bot$ and some $q \colon Q$ to $\bot$. Since we can inject either into $P+Q$ with $l$ or $r$, this is simple. $$\begin{gather*} f \hspace{0.2em} g = \Big( \big( p \mapsto g \hspace{0.2em} (l \hspace{0.2em} p) \big), \big(q \mapsto g \hspace{0.2em} (r \hspace{0.2em} q) \big)\Big) \end{gather*}$$
Despite the different symbols, this is remarkably similar to the previous proofs. First, observe what we want to prove is really $$\begin{gather*} \left( \sum_{a : A} R(a) \to \bot \right) \to (a : A) \to (R(a) \to \bot) \end{gather*}$$ with $\sum_{a : A} R(a)$ the type of dependent pairs $(a,r)$ with $r \colon R(a)$. Then, once again providing our witness $f$ with arguments $g \colon \sum_{a : A} R(a) \to \bot$, $a \colon A$ and $r \colon R(a)$ just as in previous proofs, we define $f$ as $$\begin{gather*} f \hspace{0.2em} g \hspace{0.2em} a \hspace{0.2em} r = g \hspace{0.2em} (a,r) \end{gather*}$$ since the pair $(a,r)$ is exactly the argument $g$ needs to produce an element of $\bot$, completing the counterfactual.
Repeating our procedure, what we aim to prove is really $$\begin{gather*} \big( (a \colon A) \to ( R(a) \to \bot) \big) \to \left( \sum_{a : A} R(a) \to \bot \right). \end{gather*}$$ So we will provide our proof $f$ with $g \colon (a \colon A) \to R(a) \to \bot$ and a pair $(a,r) \colon \sum_{a : A} R(a)$. We then have immediately $$\begin{gather*} f \hspace{0.2em} g \hspace{0.2em} (a,r) = g \hspace{0.2em} a \hspace{0.2em} r \end{gather*}$$ thus providing our fictitious element of $\bot$ and completing the theorem.
The non-propositional form is $$\begin{gather*} f\colon \left( \sum_{a : A} R(a) \to \bot \right) \to \big((a : A) \to R(a) \big) \to \bot \end{gather*}$$ so presume the arguments are supplied $(a , g)$ with $a \colon A$ and $g \colon R(a) \to \bot$ and $h \colon (a : A) \to R(a)$. We need simply specialise $h$ on $a$ to obtain an element of $R(a)$, which we can then apply to $g$ to obtain a member of $\bot$. $$\begin{gather*} f \hspace{0.2em} (a,g) \hspace{0.2em} h = g \hspace{0.2em} (h \hspace{0.2em} a). \end{gather*}$$
\qed
Notice however that these four statements we prove are clearly not all the logical statements we tend to consider about negation. For instance, we cannot prove 'if not both then one is false' ($\neg (P \wedge Q) \Rightarrow \neg P \vee \neg Q$) because our system has hamstrung us. Or more interestingly, it has in fact revealed to us that there are things we believe to be true about logic that are not true a priori.
Consider for example, the concept of proof by contradiction. Since $\neg \neg P$ is read as 'not not P' you would be forgiven for thinking that it is true only when $P$ is true, but that is not the case. In fact, to believe $\neg \neg P \Rightarrow P$ is a strong statement, implying in particular that you believe all propositions are either true or false ($\forall (P \colon \mathtt{Type}), P \vee \neg P$), a property called 'being decidable'. In our every day life, this seems to reflect our experiences, and yet there are rare cases when a question must be undecidable, as is famously the case in Alan Turing's solution to the Halting Problem.
In fact, to consider dependent type theory as the basis of logic would force one to accept a bizarre truth: that there is more than just true or false, but a secret third thing, the not false. We can prove (and this is a fun exercise you are now well equipped to do) that $\neg \neg \neg P \to \neg P$, that is, 'not not not $P$ implies not $P$', but we cannot do anything about $\neg \neg P$. So if we (circularly) think of $\bot = \neg \top$, then we must accept that there is in fact a third outcome, that of $\neg \bot = \neg \neg \top$ in a distinct and meaningful way where there is no fourth (since $\neg \neg \bot = \neg \neg \neg \top$ would be $\neg \top$ as above). We must ask ourselves, is this the system of logic we want to work within? Does this reflect our goals in mathematics?
To believe that all propositions are either true or false is in fact a distinct axiom, called the Law of Excluded Middle (LEM) in this case, a statement which we consider to be true and do not question precisely because it is part of the foundation of our system of reasoning. To claim an axiom in one's reasoning is a profoundly dangerous step that must be exercised with caution; presume the wrong thing, and you will end up in a counterfactual, but axiomitise the wrong thing, and your entire logical system is vulnerable to the principle of explosion.
And yet if it is dangerous to believe this axiom since it has clear contradictions, why is it that proof by contradiction is commonly accepted in mathematics?
Here we meet the difference between mathematics and logic; there is a sense in which mathematicians are more interested in the metaphysical behaviour of the objects they study than the logical soundness of them. This is not to say that mathematicians are willing to accept contradictions (in fact they are repulsed by them!) but it is just not their primary concern. To a mathematician, the fact that almost every object they encounter is decidable is enough to consider these rare cases the exception to the rule, i.e. statements are decidable until proven or cautioned otherwise.
Moreover, this more metaphysical interest means that there are circumstances when even mathematicians, looking at a statement that agrees with their metaphysical understanding of an object, will be tempted to say 'this seems true enough'. Such is famously the case with the axiom of choice.
This axiom is famously difficult to comprehend, and many on encountering it are unclear on why exactly it is an axiom which must be asserted rather than a straight forward logical operation. To us, with the benefit of type theory, we see the axiom of choice as a clear assertion of type $$\begin{gather*} \Big(\forall (a : A), \exists (b : B), P(a,b) \Big) \hspace{0.5em} \to \hspace{0.5em} \exists (f : A \to B), \forall (a : A), P(a,f(a)) \end{gather*}$$ given a family of types $P \colon A \to B \to \mathtt{Type}$. We read this statement as "if for all $a$ there exists a $b$ so that $P(a,b)$ is true, then there exists a choice function $f$ which for all $a$ will choose the $b = f(a)$ satisfying $P(a,b)$". Yet even here you may begin to see why such a statement makes one question why it is an axiom: if we know the $b$ exists, then surely it is right there, and we can just choose it, right? But in fact, it can be shown that the axiom of choice implies the law of excluded middle. It is the nature of the mathematician (more particularly the naive one) to look at such a statement and be unconcerned with it; their attention is instead focused on the nature of the type or set that they are choosing from. One might remark even in the other direction, that the great success of mathematics is not in spite of a blindness to logic but because of their singleminded metaphysical curiosity. For much of the remainder of this text, we will embrace that curiosity, and the mindset of the mathematician.
We may even point out a distinct failure of the system we have been working within. That is, while we can prove quite sophisticated mathematical theorems in dependent type theory, we do so with a stipulation that is not the case in mathematics. A witness to the type $\exists (a : A), P(a)$ is not just evidence that it is true that there exists $a$ satisfying $P(a)$, it is an example of it being true $(a,p)$. The functional nature of our deductions means that we cannot merely prove an example exists, we must construct it, hence the name constructive mathematics.
This burden of proof is distinctly higher than necessary. There are valuable things we can say when we are able to prove examples or counter-examples exist where the example itself does not particularly matter. We can even reverse the accusatory finger; since constructive proofs are computational (indeed there are many languages that will run them), to expect all of mathematics to be fully constructive corresponds in some sense to an axiom itself, that the world we live in is a computable one. Does every question that has an answer have a procedure to find that answer? This is an assumption mathematicians do not make.
As I said many times before, the rules of dependent type theory remain valuable to us despite our disinterest in its strict adherence. Instead, we know now what it means exactly to presume something is true: treating our theorems as types and our proofs as functions, we can use our intuition to guide our steps and know what type-form a presumption might have, noting immediately if it is reasonable or in fact a much more dangerous statement.
Accordingly, we can now give a more full set of rules for how to manipulate negations, paying off one of the uses of type theory that we will continue to return to throughout this text.
Let $P,Q \colon \mathtt{Type}$ be decidable types, that is, we have witnesses of $P \vee \neg P$ and $Q \vee \neg Q$. Then we have the following:
Proof by Contradiction or Reductio ad absurdum, i.e. 'if $P$ is true or false, then if we can show that $P$ being false is logically inconsistent, then $P$ must be true since it is either true or false and can not be false' $$\begin{gather*} \mathtt{RAA}\colon \forall (P : \mathtt{Type} ), P \vee \neg P \Rightarrow ( \neg \neg P \Rightarrow P) \end{gather*}$$
$\neg(P \wedge Q) \Leftrightarrow \neg P \vee \neg Q$, i.e. 'not both, if and only if either is false'
$\neg (P \vee Q) \Leftrightarrow \neg P \wedge \neg Q$ i.e. 'neither, if and only if both are false'
Proof by Contrapositive i.e. 'the rain must wet the ground if and only if knowing the ground is not wet tells us that it did not rain' $$\begin{gather*} \mathtt{PBC} \colon (P \Rightarrow Q) \Leftrightarrow (\neg Q \Rightarrow \neg P) \end{gather*}$$
Negation of implication i.e. '$P$ does not cause $Q$ if and only if there are examples of $P$ happening but not $Q$' $$\begin{gather*} \neg (P \Rightarrow Q) \Leftrightarrow P \wedge \neg Q \end{gather*}$$
Additionally, let $R \colon A \to \mathtt{Type}$ be a family of types with $A \colon \mathtt{Type}$ a non-empty type. Moreover, allow that $R(a)$ and its derivative types are decidable as necessary, such as $\forall (a:A), R(a)$, or $\exists (a : A), \neg R(a)$, etc. Then we also have:
$\neg \big(\forall (a : A), R(a) \big) \Leftrightarrow \exists (a : A), \neg R(a)$ i.e. 'not for all if and only if there exists a counter example '
$\neg \big(\exists (a : A), R(a)\big) \Leftrightarrow \forall (a : A), \neg R(a)$ i.e. 'there does not exist an example if and only if for all it is false'
This is the promised even-longer proof, so we have minimized it on the web. I do recommend taking a look however, particularly in the last thing we prove.
We have already shown some parts of these in theorem proptypes.18, so we only need to show the remaining parts.
We have already assumed by hypothesis $P \colon \mathtt{Type}$ and some witness $P \vee \neg P$, so let us pattern match on this witness. It is either $l \hspace{0.2em} p$ with $p \colon P$ or $r \hspace{0.2em} f$ with $f \colon P \to \bot$. In the former case, by right-associativity we are done since we may assign the function $\neg \neg P \Rightarrow P$ to be merely the constant function $p$. In the latter case, we must also be given an element of $\neg \neg P$, that is, some $g\colon (P \to \bot) \to \bot$. In this case, $g \hspace{0.2em} f \colon \bot$ produces the fictitious element of the empty type, allowing us to use the principle of explosion with trivial function $h \colon \bot \to P$. $$\begin{gather*} \mathtt{RAA} \hspace{0.2em} (l \hspace{0.2em} p) \hspace{0.2em} g = p \\[0em] \mathtt{RAA} \hspace{0.2em} (r \hspace{0.2em} f) \hspace{0.2em} g = h \hspace{0.2em} (g \hspace{0.2em} f) \end{gather*}$$
Recall that $\Leftrightarrow$ merely means a pair of implications going in either direction. We have already shown $\neg P \vee \neg Q \Rightarrow \neg (P \wedge Q)$ so we need now to show in our context that $\neg (P \wedge Q) \Rightarrow \neg P \vee \neg Q$, or rather $$\begin{gather*} \big((P \times Q) \to \bot \big) \to \big( (P \to \bot) + (Q \to \bot) \big). \end{gather*}$$ To show this, we will need to actively use our information that $P$ and $Q$ are decidable, so in fact, the above type as written may not have witnesses, nor are we trying to prove it in a general sense. Our full theorem, accounting for the desired context, is $$\begin{align*} f \colon P \vee \neg P \to & Q \vee \neg Q \to \big((P \times Q) \to \bot \big) \\[0em] & \to \big( (P \to \bot) + (Q \to \bot) \big) \end{align*}$$ We must now pattern match on $P \vee \neg P$ and $Q \vee \neg Q$ with $p \colon P$ or $g_p \colon P \to \bot$ and $q \colon Q$ or $g_q \colon Q \to \bot$ respectively. Then let $h \colon (P \times Q) \to \bot$ be the next argument and $k \colon \bot \to \neg P \vee \neg Q$ be a trivial function. Our four cases become $$\begin{gather*} f \hspace{0.2em} (l \hspace{0.2em} p) \hspace{0.2em} (l \hspace{0.2em} q) \hspace{0.2em} h = k \hspace{0.2em} \big(h \hspace{0.2em} (p,q) \big) \\[0em] f \hspace{0.2em} (l \hspace{0.2em} p) \hspace{0.2em} (r \hspace{0.2em} g_q) \hspace{0.2em} h = r \hspace{0.2em} g_q \\[0em] f \hspace{0.2em} (r \hspace{0.2em} g_p) \hspace{0.2em} (l \hspace{0.2em} q) \hspace{0.2em} h = l \hspace{0.2em} g_p \\[0em] f \hspace{0.2em} (r \hspace{0.2em} g_p) \hspace{0.2em} (r \hspace{0.2em} g_q) \hspace{0.2em} h = l \hspace{0.2em} g_p \end{gather*}$$ Notice that whenever either decidable proposition is decided false, we simply use that to form our witness of "not $P$ or not $Q$".
Both $\neg P \vee \neg Q \Rightarrow \neg (P \wedge Q)$ and $\neg (P \wedge Q) \Rightarrow \neg P \vee \neg Q$ have been proven in theorem proptypes.18, so their pair forms $\neg (P \vee Q) \Leftrightarrow \neg P \wedge \neg Q$.
The implication of the contrapositive form of a proposition $(P \Rightarrow Q) \Rightarrow (\neg Q \Rightarrow \neg P)$ can be proven without assuming decidability. In fact, reading the type as $$\begin{gather*} (P \to Q) \to (Q \to \bot) \to P \to \bot \end{gather*}$$ we can supply our proof $f_1$ with three arguments $g \colon P \to Q$, $h \colon Q \to \bot$ and $p \colon P$, immediately solving this form as $$\begin{gather*} f \hspace{0.2em} g \hspace{0.2em} h \hspace{0.2em} p = h \hspace{0.2em} (g \hspace{0.2em} p) \end{gather*}$$ by using $g$ to turn $p$ into a member of $Q$ and then using $h$ to turn that member of $Q$ into a member of $\bot$.
In the other direction, we require decidability of $Q$, so let us extend the type as we did in the previous proof which we used decidability in. $$\begin{align*} f_2 \colon Q \vee \neg Q \to (\neg Q \to \neg P ) \to (P \to Q) \end{align*}$$ Similarly to the last proof which used decidability, our proof in fact looks the same so long as $Q$ is true, since by right-associativity, we may simply ignore the rest of the arguments and say that $P \Rightarrow Q$ since no matter $P$, we have $q \colon Q$ as a constant function. This means, for other cases, we assume $Q$ to be false. In that case, with argument $h \colon \neg Q \to \neg P$, we may read $\neg P = P \to \bot$ and supply the final $p \colon P$ argument from the end of the type $P \to Q$ (i.e. we are applying right-associativity) to obtain an element of $\bot$, then apply the trivial function $k \colon \bot \to Q$. $$\begin{gather*} f_2 \hspace{0.2em} (l \hspace{0.2em} q) \hspace{0.2em} h \hspace{0.2em} p = q \\[0em] f_2 \hspace{0.2em} (r \hspace{0.2em} g) \hspace{0.2em} h \hspace{0.2em} p = k \hspace{0.2em} (h \hspace{0.2em} g \hspace{0.2em} p) \end{gather*}$$ Our complete proof is then the pair $\mathtt{PBC} = (f_1,f_2)$.
To show the form of negation of implications, once again we prove two forms $$\begin{gather*} f_1 \colon P \vee \neg P \to Q \vee \neg Q \to \neg (P \to Q) \Rightarrow P \wedge \neg Q \\[0em] f_2 \colon P \vee \neg P \to Q \vee \neg Q \to (P \wedge \neg Q) \Rightarrow \neg (P \to Q) \end{gather*}$$ where $f_2$ turns out to be substantially easier to prove, and in fact can be done without decidability. For that reason, let us provide it the dummy arguments $x \colon P \vee \neg P$ and $y \colon Q \vee \neg Q$ and the pair $(p, g) \colon P \times \neg Q$ where $p \colon P$ and $g \colon Q \to \bot$. Finally, we once again apply right-associativity to provide the argument $h \colon P \to Q$ changing our goal type to $\bot$. In this case our proof is merely $$\begin{gather*} f_2 \hspace{0.2em} x \hspace{0.2em} y \hspace{0.2em} (p , g) \hspace{0.2em} h = g \hspace{0.2em} (h \hspace{0.2em} p). \end{gather*}$$ The reverse will prove to be the most complicated non-dependent proof we've done so far, and will involve reusing our proof of contrapositive which we named so we could use it now, $\mathtt{PBC}$. In particular, we will need to extract its second form $$\begin{gather*} (\mathtt{pr}_2 \hspace{0.2em} \mathtt{PBC}) \colon Q \vee \neg Q \to (\neg Q \to \neg P ) \to (P \to Q) \end{gather*}$$ with the projection operator $\mathtt{pr}_2$ which we use for extracting the second item in a pair. We will also need two different trivial functions $k_1,k_2$, and so we will name their types shortly; the more important thing to recognise is that whenever we see a trivial function used, it is because we are in a false case.
Now consider, since $P$ and $Q$ are decidable by hypothesis, they are either true or false. First let us pattern match on the decidability of $Q$, and indeed the truth of $P$ will not matter if $Q$ is true, so we'll need a dummy variable symbol $x$, and a few others $y,z$. Recalling the form we want to show $$\begin{gather*} f_1 \colon P \vee \neg P \to Q \vee \neg Q \to \neg (P \to Q) \Rightarrow P \wedge \neg Q, \end{gather*}$$ we also take as an argument $g \colon \neg (P \to Q)$. In the case where $Q$ is true, our evidence $q \colon Q$ allows us to define a constant function $(y \mapsto q) \colon P \to Q$, and this can be fed to $g$ to produce $\bot$. If $P$ is true and $Q$ is false, we are in exactly the situation desired and may provide $p \colon P$ and $h \colon \neg Q$ as output $(p,h) \colon P \wedge \neg Q$.
If however both $P$ and $Q$ are false, let us say that we have $h_p \colon P \to \bot$ and $h_q \colon Q \to \bot$. Since $h_q \colon \neg Q$, we can pass it to $(\mathtt{pr}_2 \hspace{0.2em} \mathtt{PBC})$ as our proof that $Q$ is decidable and thus witness that contrapositive applies. This gets us $$\begin{gather*} (\mathtt{pr}_2 \hspace{0.2em} \mathtt{PBC}) \hspace{0.2em} h_q \colon (\neg Q \to \neg P ) \to (P \to Q). \end{gather*}$$ To use this, we'll need a function of type $\neg Q \to \neg P$, however we have $h_p \colon \neg P$, which is enough to define a constant function $(z \mapsto h_p) \colon \neg Q \to \neg P$. Once fed to $(\mathtt{pr}_2 \hspace{0.2em} \mathtt{PBC}) \hspace{0.2em} h_q$, we obtain now a function of type $P \to Q$. Recall now that we have the argument $g \colon (P \to Q) \to \bot$, which can eat $(\mathtt{pr}_2 \hspace{0.2em} \mathtt{PBC}) \hspace{0.2em} h_q \hspace{0.2em} (z \mapsto h_p)$ to give us our element of $\bot$. Finally we can apply the trivial function $k_2 \colon \bot \to P \wedge \neg Q$.
All together, these three cases are written $$\begin{gather*} f_1 \hspace{0.2em} x \hspace{0.2em} (l \hspace{0.2em} q) \hspace{0.2em} g \hspace{0.2em} = k_1 \big( g \hspace{0.2em} (y \mapsto q) \big) \\[0em] f_1 \hspace{0.2em} (l \hspace{0.2em} p) \hspace{0.2em} (r \hspace{0.2em} h) \hspace{0.2em} g = (p, h) \\[0em] f_1 \hspace{0.2em} (r \hspace{0.2em} h_p) \hspace{0.2em} (r \hspace{0.2em} h_q) \hspace{0.2em} g = k_2 \Big( g \hspace{0.2em} \big((\mathtt{pr}_2 \hspace{0.2em} \mathtt{PBC}) \hspace{0.2em} h_q \hspace{0.2em} (z \mapsto h_p) \big) \Big) \end{gather*}$$ with $k_1 \colon \bot \to P \wedge \neg Q$.
We proceed now to the dependent statements. In theorem proptypes.18 we showed both directions of $\neg \big(\exists (a : A), R(a)\big) \Leftrightarrow \forall (a : A), \neg R(a)$. So we set our attention on $\neg \big(\forall (a : A), R(a) \big) \Leftrightarrow \exists (a : A), \neg R(a)$, for which we have also proven the direction $\exists (a : A), \neg R(a) \Rightarrow \neg (\forall (a : A) \to R(a))$ without decidability.
The other direction will prove substantially more difficult, requiring helper functions as well as decidability of all $R(a)$ and of the dependent pair type $\sum_{a : A} \neg R (a)$. We also need an example of $a$ so we are sure the family $R$ is non-empty. Then we will need dependent trivial functions $\phi_a \colon \bot \to R(a)$ (spoken as 'phi', consider this if you like to be $(a : A) \to \bot \to R(a)$ with the argument $a$ given in the subscript) and $\psi_a \colon \bot \to \neg R(a)$.
Our goal is then $$\begin{align*} f \colon A \to & \big((a : A) \to R(a) \vee \neg R(a) \big) \to \\[0em] & \left( \sum_{a:A} \neg R(a) \right) \vee \neg \left( \sum_{a:A} \neg R(a) \right) \to \\[0em] & \Big( \big((a : A) \to R(a) \big) \to \bot \Big) \to \sum_{a : A} \neg R(a) \end{align*}$$ consisting of four arguments, $$\begin{align*} & b \colon A, \\[0em] & g \colon (a : A) \to R(a) \vee \neg R(a), \\[0em] \text{either}\hspace{0.5em} & (l \hspace{0.2em} e_1) \hspace{0.5em} \text{or} \hspace{0.5em} (r \hspace{0.2em} e_2) \hspace{0.5em} \text{pattern matched}\\[0em] \text{with}\hspace{0.5em} & e_1 \colon \exists (a : A), \neg R(a) \hspace{0.5em} \text{or} \hspace{0.5em} e_2 \colon \big( \exists (a : A), \neg R(a) \big) \to \bot \\[0em] \text{and finally} \hspace{0.5em}& h \colon \big((a : A) \to R(a) \big) \to \bot \end{align*}$$ We will also need helper functions alpha and beta $$\begin{gather*} \alpha \colon (a : A) \to R(a) \vee \neg R(a) \to R(a) \\[0em] \beta \colon (a : A) \to R(a) \end{gather*}$$ which will exist only in this context in the case with $e_2$ and as a consequence of it. You may think of them as effectively a notational sugar-coat for readability.
Proceed in the case with $e_1$. This is the case where we imagine that decidability of $\exists (a : A), \neg R(a)$ is true; in this case our proof is exactly this example. $$\begin{gather*} f \hspace{0.2em} b \hspace{0.2em} g \hspace{0.2em} (l \hspace{0.2em} e_1) \hspace{0.2em} h = e_1 \end{gather*}$$ In the other case, we can and must first define $\alpha$ and $\beta$. The statement of $\alpha$ is effectively 'for all $a \colon A$ with $R(a)$ true or false, we have $R(a)$ true'. This obviously does not make sense in general, but in this specific case we have $e_2$ which one can consider to be the statement 'there are no counter-examples to $\forall (a : A), R(a)$' (read the type of $e_2$ carefully to get this). The other reason we define $\alpha$ is so that we can pattern-match on $(g \hspace{0.2em} b) \colon R(b) \vee \neg R(b)$.
Thus our $\alpha$ takes two arguments with two cases, one where it takes $c\colon A$ and $(l \hspace{0.2em} p)$ with $p \colon R(c)$ and one where it takes instead $(r \hspace{0.2em} q)$ with $q \colon R(c) \to \bot$. Recalling we defined the trivial function $\phi_a \colon \bot \to R(a)$, we can now write $$\begin{gather*} \alpha \hspace{0.2em} c \hspace{0.2em} (l \hspace{0.2em} p) = p \\[0em] \alpha \hspace{0.2em} c \hspace{0.2em} (r \hspace{0.2em} q) = \phi_c \hspace{0.2em} \big( e_2 \hspace{0.2em} (c , q)\big) \end{gather*}$$ i.e. evidence of decidability of $R(c)$ must always give us evidence that it is true since we have $e_2$ to tell us there exists no counter examples. If we try to say there exists $q \colon R(c) \to \bot$ then we get a contradiction.
This is nearly the form we need, that of $\forall (a : A), R(a)$, which will allow us to apply $h \colon \big((a : A) \to R(a) \big) \to \bot$ to say that $e_2$ itself causes a contradiction. To do that though, we need to find some way to get rid of the dependence on proof of decidability as an argument. Since, in our context, we already have $g$ which takes some $a : A$ and tells us if its $R(a)$ is decidable, we don't need to take that as an argument; in this context it can be taken as a constant. Thus we define $$\begin{gather*} \beta \hspace{0.2em} c \hspace{0.2em} = \alpha \hspace{0.2em} c \hspace{0.2em} (g \hspace{0.2em} c) \end{gather*}$$ which is merely $\alpha$ with decidability of $R(c)$ provided externally by $(g \hspace{0.2em} c)$.
Finally we provide $\beta$ to $h$, obtaining a contradiction, and then applying our trivial function $\psi_a \colon \bot \to R(a)$ to eliminate the counterfactual. $$\begin{gather*} f \hspace{0.2em} b \hspace{0.2em} g \hspace{0.2em} (l \hspace{0.2em} e_1) \hspace{0.2em} h = \big(b, \psi_b \hspace{0.2em} (h \hspace{0.2em} \beta) \big) \end{gather*}$$
We are finally done.
\qedIt is worth noting that this also pays off our earlier mention of universal quantifiers generalizing and statements and existential quantifiers generalizing or statements. The generalized De Morgan's laws show that the way a for all statement negates with respect to a there exists statement is indeed the same relationship we see between and and or. For instance, we have both 'not both, if and only if either is false' and 'not for all if and only if there exists a counter example'.
This system of deductions will remain close with us. It is not strictly that the system of dependent type theory is a bad one for logic, but merely that it does not automatically provide a great model for our interests in mathematics. This is largely solved by augmenting our system with the axioms we insist to be true.
In the context of dependent type theory, an axiom can be thought of as insisting a witness of some proposition or theorem exists with no way to define it or find it. Once we add even one axiom, we are in the non-constructive regime, since if theorems are functions, our axioms are functions with no procedure to execute them. Computers can still check such theorems however, and they do this at the level that a computer program merely checks to see that there are no type errors; the lack of a procedure does not change the fact that if such a procedure existed, the theorem would then run without error.
But as we move forward, we will be doing more than merely making our theorems non-constructive with our axioms. We, as mathematicians, are not logicians. There will be times when we strike off one of our axioms and insist that it does not apply in the case we are in. We will even do this without strict rules, but by reasoning about our system, or simply by a known counter-example. That is, as mathematicians, we can know an inconsistency exists somewhere in our system, and simply choose to avoid it as a false branch of conclusions. We do not, for instance, neurotically emphasize solutions to Russel's paradox to avoid even the slightest risk of paradox, we simply avoid the mechanism that causes Russel's paradox to occur as implicitly solving the paradox.
That is neither to say that we merely allow the risk of paradoxes due to our axioms. Due to over a century of knit-picking out of concern for this exact problems, we know what our axioms are in mathematics, and just as in our proof of De Morgan's laws we wrote axioms not as functions that were merely available in context, but as specific conditions, we can simply fail to provide those conditions in the situations where they are not true.
Dependent type theory, with the appropriate asterisks, will form a scaffold by which we continue to reason about logic, and by which we intuit what behaviours we might desire from other structures. For instance, moving forward, we largely put the idea of constructive numbers behind us, but we know that we can construct inductive natural numbers, and thus we can do proof by induction on them. Moreover, seemingly qualitative statements can now be made quantitative by interpretting our conditions for when they are true as constructors, by our so called universal and existential quantifiers.