Despite the problems with constructive mathematics that we discussed previously, I think it is necessary to first see mathematics as a much more rigid system. The two topics of this section, set reasoning and relational reasoning, while both formal studies, are at their best when one simply develops an intuition for them. Going from no mathematical training to having these intuitions is perhaps the hardest part, the barrier that makes people think that math is hard, and it is made significantly worse by the lack of structure at the bottom of the reasoning.
That is, set theory is not a constructive theory in the slightest. It cannot exist in a vacuum and provides a very poor pedagogical lens since it was invented for existing mathematicians to better understand their craft. It does not say anything about where an object comes from or what it fundamentally is, but rather assumes an object could be anything and then whittles that anything down to something which can be reasoned about. It appears as a top down theory rather than a bottom up theory, like type theory is.
And while (pure) mathematics has evolved into an especially symbolic field, it did not begin that way; the symbols are merely a way to help us formalize our thoughts but the thoughts themselves have historical roots in logical statements, reasoned about just as one reasons about in philosophy. Mathematics has moved less from those roots than people realize, thus the great friction between computer readable mathematics and human readable mathematics; human readable mathematics involves context and it skips steps where they are obvious. Nonetheless, we have (quite nervously) formalized into rules whatever we can to ensure that our reasoning is not merely elaborate sophistry, and to move the foundation of our thought further down.
It is because mathematics is merely a formalized discipline of human reasoning that it is significantly broader than any one system we have invented within it. This is what it means when we say something like that type theory is a system of reasoning built up within set theory (or rather that it can be). Not because sets are themselves the perfect abstraction, but because as an abstraction, they were made to model the broadest reasoning that consistently made sense to us rather than narrowing reasoning to something a computer could follow.
So in this section we aim to explore some of the cognitive primitives of mathematics, namely our relationship to relations themselves, the mathematician's notion of a set, and what it means for an object or theory to be founded on axioms. To do this, we will touch on the notion of a rewriting system and the Zermelo-Fraenkel-Choice axioms, discussing how an axiomatic system works. We will see in this section that, although in broader mathematics we can often say very little about what a thing is (in the same way that we have constructors in type theory), we trade this knowledge for an increased focus on what a thing does.
Consider what is meant when we write the abstract symbol $x$. If you were comfortable with the previous section then you'd expect it could be an element of a type, or even breaking convention, perhaps a type itself. As we enter into non-constructive mathematics, the rule will be that a symbol is essentially any possible thing until otherwise constrained (perhaps by context) to some set of rules. As we will discuss more when we get into set theory, classical mathematics does not have types or constructors, it has only rules, and sets of things that obey those rules. We have the set of natural numbers, the set of integers, etc. Not because those sets are themselves types, or because the members they contain are necessarily fundamentally different from one another, but merely because it is a collection of abstract things which happen to obey conditions.
This is a clean break from our notion of types; just as we had constructed a type of natural numbers, we could also construct a type of integers (counting numbers with negatives) but this type would have its own constructors. It would be reasonable to create a function that sends a natural number to its corresponding integer in the integer type, but this is not the same as saying that 'all natural numbers are also integers'. In our type theory, a natural number can become an integer but in classical mathematics one may say that members of what should be a type, are also members of another type.
This creates a problem in some modes of thinking; if we can say that some integers are natural numbers (the positive ones), and some real numbers are integers (the whole numbers), etc. etc., can we say that there is some true thing that each of these systems of numbers are? Not only is the answer no, the question would probably illicit some laughter from a mathematician. The lack of knowing what a number is is in fact a feature which allows our tools to be general. When we say that 'integers can be added, subtracted, multiplied, and in some cases divided', we do not mean that we are referring to some fundamental object that exists. We mean that we are referring to a collection of objects, the extent of which we do not know and do not care of, which we can describe in general. We are studying the nature of anything which is counted whole with positives and negatives when we study integers. The apples in bags, the candies divided amongst children, anything that we think of as whole.
This creates a different problem too, for if we are discussing all things that our rules describe, then how are we to intuit what we should do with a problem? One might imagine this like an actor being told to display a certain emotion in front of a green screen, only to watch the movie themselves and discover that an entirely different performance would be more appropriate in that scene. Or like being handed a simple puzzle interface and being told to solve puzzles, only to discover that you were in fact designing the electrical grid for a nation. Surely it is appropriate to know the material nature of a problem so that you can better approach it, right? To a (pure) mathematician, the answer is no. In fact, it is of great value to the pure mathematician that someone else can make formal the rules of the game we are playing so that we may merely play a game.
And this is entirely appropriate for the mathematician. While it is certainly true that knowing more about a material setting allows us extra information to deduce more things, there is a value in reasoning out everything there is to be known about the general case. Moreover, the way that we investigate this general case is by rejecting all other information about the system other than the symbolic rules we have been given and merely monkeying around with them.
An example is in order, if for no other reason than to explain what it means to reject information and settle for deduction by rules. Consider the equation $x^2 - 4 = 0$, although do not consider it so much as to be afraid; we will step through this example very slowly. To a mathematician, a problem such as this is examined as an algebraic one, and is thus attacked with algebraic tools. If we want to find the real number values of $x$ that solve the equation, we consider some things we know about algebraic numbers. We know about addition and multiplication. We know that for every addition there exists a number which, when added, reverses the addition (the negative). We know that for every multiplication (except zero) there exists a number which reverses the multiplication (the inverse). We would write these rules as follows: $$\begin{gather*} \forall x \in \reals, \exists y \in \reals, (x + y = 0) \\[0em] \forall x \in \reals, \forall y \in \reals, (x + y = 0) \Leftrightarrow (x = -y) \\[0em] \forall x \in \reals \setminus \{0\}, \exists y \in \reals, (x y = 1) \\[0em] \forall x \in \reals \setminus \{0\}, \forall y \in \reals \setminus \{0\}, \end{gather*}$$ Now, it is common when reasoning about equalities to think of alterations as "surely this implies that" or "this is equal to that", "if we add $2$ to both sides, then the sides remain equal". While this mode of reasoning is helpful, on this occasion we reject it entirely.
Instead, we will think in terms of rewrites. Which is to say, when we see terms of a certain form, we do not need to ask what they are other than the relevant rules, and we replace symbols. In this case, we see the second rule we wrote above, that $x + y = 0$ implies $x = -y$ and vice versa for all $x,y$, and we perform a rewrite. In this case, the only symbol in the expression which is fixed is zero, and the equality itself. So can we see a zero and an equality in $x^2 - 4 = 0$? Yes we can, and identify $x$ in our general expression as $x^2$ in our specific expression, likewise with $y$ and $-4$, and we rewrite $x^2 - 4 = 0$ to $x^2 = 4$. We do this without any particular reasoning about this equals that, but rather on the basis of a rule that is known about real number algebra. It is only at the point that we are stuck (i.e. no rewrites apply) that we try to reason, not about the equation itself exactly, but about what other rules may apply. Then perhaps we would reason about when a square root is appropriate.
This mindset radically simplifies the mental overhead of doing mathematics. We can of course, when necessary, make more reasoned deductions about why a thing is true, but if we did this all the time then it would be exhausting as well as slow. Mathematicians do not do this, their theorems (insofar as their theorems describe relations such as equals or less than/greater than) provide for them rewrite rules which they remember and apply as their fingers reaching into the world of symbols.
But perhaps that example was too unfamiliar. We have in fact already studied at least one rewrite, which was the addition operation for inductive natural numbers. First, consider this in abstract: if we say we have two numbers $a$ and $b$ whose sum is $c$, let us write $d$ to mean the sum $a+b$ not as their sum quantity but as a representative of the symbols $a + b$. Then what is the difference between $c$ and $d$? The former is the result of the calculation whereas $d$ is the calculation itself, in a state before any rewrites had been applied that might produce its result. We can think of $c$ and $d$ as 'equal' in the sense of the numbers we might discuss, but still representing different objects. $d$, representing $a + b$ must be rewritten into $c$ through whatever rules we have to perform this addition. Our notion of equality is then a choice of granularity: do we speak of symbolic equality, i.e. equality before reductions, or equality after reductions? What is it exactly that we want equality to mean?
Recall that the function we defined for addition in the previous section was based on two cases, that of the inductive zero and the inductive succession operator, $\mathbf{Z}\colon \mathbb{N}$ and $\mathbf{S}\colon \mathbb{N} \to \mathbb{N}$. The rules were $$\begin{gather*} \mathtt{add}(m, \mathbf{Z}) = m \\[0em] \mathtt{add}(m, \mathbf{S}(p)) = \mathtt{add}(\mathbf{S}(m),p) \end{gather*}$$ in which we either shifted a 'tally mark' from the second argument to the first and then repeated, or we were in the case where the right hand side had no tally marks and we simply returned the left number. This was in fact the first proper rewrite rule we discussed, in which every instance of the left hand sides of these equalities, the $\mathtt{add}(m, \mathbf{Z})$ and the $\mathtt{add}(m, \mathbf{S}(p))$ are automatically replaced with the right hand sides, $m$ and $\mathtt{add}(\mathbf{S}(m),p)$. In the second case, the case of the succession operation $\mathbf{S}$, we would then perform this rewrite as many times as we need to until we found ourself in the first case. But consider that knowing that these rewrite rules are equal, we could have also rewritten them backwards. In fact it is often common to name such a theorem $\mathtt{add}(m, \mathbf{Z}) \equiv m$ a theorem of right-identity.
Even in the type theory case however, one must noticing something. In the previous section we showed transitivity of our $\equiv$ relation, and so if we had $m \equiv n$ we could show $\mathtt{add}(m, \mathbf{Z}) \equiv n$, however we could not show that $m + a \equiv n + a$ implied $\mathtt{add}(m, \mathbf{Z}) + a\equiv n + a$ directly. While we had other rules to do that, the idea that we could insert $\mathtt{add}(m, \mathbf{Z}) \equiv m$ into the existing expression $m + a$ simply does not exist a priori. What we are reaching for there is a rewrite rule, the idea that equivalence implies that a symbol can be replaced with what it is equal to. But the only rewrite rules in the type theory we discussed, without significant extra effort or axioms, is that of function definitions.
The version of type theory we layed out in the previous section is one that can be computer interpretted, like a programming language. In the proper study of rewriting systems, we require that programming languages are normalizable, meaning that either there is a proper order of rewrites, or no matter which rewrites one does, there is a promised final state that all valid rewrites will arrive at. A correct normal form which we arrive at as the simplification of whatever we had written. Our addition operator is an example of this, as we have a rewrite that can almost always be applied $\mathtt{add}(m,\mathbf{S}(p))$ to $\mathtt{add}(\mathbf{S}(m),p)$ and a rewrite that could be applied in the one case where that rewrite did not, the termination case arriving finally to normal form which is the sum.
Mathematics is not like this. In mathematics it is expected that we have a rewrite rule for every equality we discover. Whenever it is that we write $A = B$, then we would swap $A$ for $B$ no matter how complex the expression we would find either symbol in. Moreover, rules like addition, subtraction, any operation, are constantly reversable even in contexts where they did not originally occur. We write $a + b = c$ and then we write $c = c + 0 = c + d - d$ and thus $a + b = c + d - d$ as necessary, inventing new terms if it serves our deductive goals. Sometimes I have tried to explain to programmers that mathematics is like programming but with a time machine; if one thinks of the reductions of a program in the process of being executed in one direction towards its final simplified form as the flow of time, mathematical deductions reverse this process freely, swap timelines, etc. Our goal is not a final form but some observation about the operation we are studying, and to do this we must actively reject a final form.
In fact, one might say that mathematics as a practice (a procedural skill distinct from its discipline or its theory) revolves almost entirely around rewrite rules. We are able to study concepts far more abstract than just numbers precisely because we put what a thing is to the side and focus only on how its symbols behave. We can then define objects which have entirely different symbolic behaviours from numbers and try to understand what they must represent in our world to behave in such a way. The beginning of the following chapter will be almost entirely devoted to elaborating how our notions about numbers are deduced from rewrites, however our first true example of this will be the set.
Set theory is famously described as being developed in a period of panic about the foundations of mathematics. We did not have then, as we do now, a notion of how one builds up mathematics from a small and strictly limited set of rules so that we are certain that what we are doing in mathematics makes sense. One could at the time (very anxiously) worry that perhaps all of the mathematics we had developed was somehow fraudulant. This is important context because it tells us about what set theory was meant to do and what it was not meant to do. It was developed for existing mathematicians so that they could understand what it was that they were doing. This means that it does not mention the wider expectations about what one would be doing with its rules. Consequently, set theory is a woefully incomplete manual for 'how to do mathematics'; it does not even try to be that. Set theory exists within what we have been calling the set theoretic lens, and the set theoretic lens is the lens that existed for all mathematics before we had invented notions of starting from scratch in a constructive setting.
So let us first discuss some mental primitives to keep in mind. A set is supposed to be a collection of objects without ordering or multiplicity (i.e. a set either contains an item or it does not, with no notion of before or after and never containing it twice). A set containing three objects $a,b,c$ can be written $\{a,b,c\}$. In fact, with certain asterisks which we discuss shortly, one can think of a set also as a proposition, much as in the way that we discussed propositions in the previous section. That is, one writes $\left\{x \mid P(x) \right\}$ to denote the set of all objects which we might label $x$, for whom the condition $P(x)$ is true. This notation is called a set comprehension. One may naively narrate sets as things like 'the set of all people' for which you can select a random member and get a person, or restrict this to the set of people you know as a subset, etc. As discussed earlier, we have shed type theory now, so a symbol means only the rules that apply to it; a set $\{x \mid P(x) \}$ that we draw members $x$ from could now be anything, and all we know about $x$ is that it obeys whatever kind of condition $P$ is. In this practice of mathematics, we reject a fixation on what a symbol is or means in favor of what it does, as $P$ will encode information necessary to make rewrites valid. If we set $P$ to be a condition that "$x$ is either one or some counting number with one added to it" then we have defined the set of natural numbers, not by construction but by restriction from the set of all things, and this enables rewrites involving sums for example. Thus forth, these kinds of restrictions, restrictions that add rules to our objects of study will be what we now consider construction.
This more naive description of what a set is is unreasonable however, as famously shown by Bertrand Russel. Russel's paradox showed that we cannot allow sets to naively contain 'anything that satisfies some condition', as this would allow sets to contain themselves. Moreover, one can start asking questions such as, "does the set of all sets which do not contain themselves contain itself?" and as Russel showed, deduce that it both contains itself and does not contain itself.
This and a plurality of other matters that needed clearing up or formalization culminated in what is now called the Zermelo-Fraenkel set axioms, usually supplemented with the axiom of choice to be called ZFC.
As this will be our first exploration of a mathematical concept from its axioms, we should discuss a bit about how this will work. Our notion of propositional logic will carry over quite strongly from the previous section, and our axioms will define a collection of propositions asserted true about sets. These will be the requirements, the rules, that restrict down the could-be-anythings of unspecified symbols down to a collection of certainly-are-somethings from which we can make deductions. Importantly, a set of axioms is often far from a literate series of statements about what a thing is or how it should be interpretted; that information is certainly encoded, but it is decoded as a deduction from what is said or not said in the axiomatic statement. (It is often the case that axioms will try to be relatively minimal, making as weak a statement as possible so that properties that are desireable can be encountered or deduced rather than enforced).
As such, our reading of the Zermelo-Fraenkel axioms will start from the axioms themselves, how one can read or interpret each axiom, and some basic consequences, in that order. I must emphasize that you should acknowledge this order and not take any part of it to do the job of a part that follows; the list of axioms themselves are, once again, intended to make strict statements that may not be immediately fully understood. One should not try to draw some philosophical notion of what axioms describe immediately from their list, or else you are likely to halt unnecessarily when the answer, the clearly stated description, lies just a paragraph or two below as it does in this case. Instead, axioms are to be read with a note taken of anything that does not immediately make sense to you (until such a time that you are an experienced mathematician) so that you may await the text to answer all such questions you may have. This is the process we must eventually go through habitualy when encountering a new set of axioms. As we progress, we will develop an intuition for the kinds of concepts mathematical descriptions map on to, and our corresponding intuition prescriptions will get necessarily shorter.
Inherit the notion of a proposition or a condition (i.e. loosely in the sense of a family of types) from the previous section along with logical operations. Consider a set to be a collection of objects with a notion of equivalence between those objects and a propositional relation written $\in$ which is true when an object $x$ is in a set $A$ (and thus read $\in$ as 'in' or 'is an element of').
Then sets are defined to obey the following.
(Axiom of Extensionality) Two sets $A$ and $B$ are equal if they have the same elements. $$\begin{gather*} \big(\forall x, (x \in A) \Leftrightarrow (x \in B) \big) \Rightarrow A = B \end{gather*}$$
(Axiom of Pairing) For all objects $a$ and $b$ there exists a set ${a,b}$ that contains exactly and only $a$ and $b$. $$\begin{gather*} \forall a, \forall b, \exists A, \big( \forall x, (x \in A) \Leftrightarrow (x = a \vee x = b) \big) \end{gather*}$$
(Axiom of Comprehension) Let $P$ be a condition and $A$ be a set. There exists a set $B$ such that $x \in B$ if and only if $x \in A$ and $P(x)$. $$\begin{gather*} \forall P,\forall A, \exists B, \big(\forall x, (x \in B) \Leftrightarrow (x \in A) \wedge P(x) \big) \end{gather*}$$ Then we write $B$ using the set comprehension notation $$\begin{gather*} B = \{x \in A | P(x)\}. \end{gather*}$$
(Axiom of Power Set) For every set $A$ there exists a set $\mathcal{P}(A)$ of subsets of $A$. $$\begin{gather*} \forall A, \exists B, \forall C, \big( (\forall x, x\in C \Rightarrow x \in A) \Leftrightarrow C \in B \big) \end{gather*}$$ When $B \in \mathcal{P}(A)$ then we write $B \subseteq A$.
(Axiom of Union) Let $\mathcal{A}$ be a set which contains sets. There exists $A$ such that for all sets $X$ and all objects $x$, if $x \in X$ and $X \in \mathcal{A}$ then $x \in A$. $$\begin{gather*} \forall \mathcal{A}, \exists A, \forall X, \forall x, \big( (x \in X) \wedge (X \in \mathcal{A}) \big) \Leftrightarrow (x \in A) \end{gather*}$$ Then we write $A$ using the union notation, which is either $A = B \cup C$ when $\mathcal{A} = \{B, C\}$ or more generally $$\begin{gather*} A = \bigcup_{X \in \mathcal{A}} X. \end{gather*}$$
(Axiom of Replacement) Let $A$ be a set and $f$ be a function for which $f(x)$ is meaningful when $x \in A$. Then there exists a set $B = f(A) := \{f(x) | x \in A\}$, extending the set comprehension notation. To state this formally, let $P$ be the relation such that $P(x,y)$ is true when we say $y = f(x)$, and in this sense let $P$ define $f$. $$\begin{align*} \forall P, & \big( \forall x \forall y \forall z ( P(x,y) \wedge P(x,z) \Rightarrow y = z) \\[0em] & \Rightarrow \exists B (\forall y, y \in B \Leftrightarrow \exists x, P(x,y)\big) \end{align*}$$ We then call $B$ the image of $f$ or its range.
(Axiom of Foundation) Every set $A$ which is non-empty contains an element $B$ with which it is disjoint. $$\begin{gather*} \forall A, \big( (\exists a, a \in A) \Rightarrow \exists B, (B \in A) \wedge \neg \exists b, (b \in B \wedge b \in A) \big) \end{gather*}$$
(Axiom of Infinity) There exists a set containing infinitely many objects $$\begin{gather*} \forall A \exists S, \big( \{ x \in A| \bot\} \in S \wedge (\forall x, x \in S \Rightarrow x \cup \{x\} \in S)\big). \end{gather*}$$
Once again, we must emphasize a change in the way that we write propositions. That is, none of our objects (and thus none of our conditions or families of propositions) are typed anymore. Set theory does not make claim to 'where objects come from'. It instead assumes mostly through other mechanisms or as prescribed later by the mathematician, that there are objects. One might even say that set theory specifically exists in a perspective where all objects that could exist do exist already, which we then draw on. It is then our responsibility to pick and choose from these objects by appropriately setting conditions, propositions, etc. This, on its own, obviously creates problems such as the one we mentioned earlier, of Russel's paradox, but these axioms carefully skirt around those.
Our notions of propositions as families of types, involving a function from a type to the type of types, must then be modified. Our notion of a function in general must also be modified, as we will see when we discuss the axiom of replacement. It will still mostly make sense to think of our propositions as types though, and in particular the way we think about $\wedge,\vee,\neg, \forall, \exists$ as being $\times,+,(\to \bot), \Pi, \Sigma$, but our notions of the relationship to a type must change. For instance, we say $x \in A$ is a proposition. This means that we can also speak of the proposition $\neg (x \in A)$, which we write $x \notin A$. The analogous statement does not make sense in the context of types; if $a \colon A$, it is of type A, and it doesn't make sense to say otherwise because $a$ is constructed in accordance of the rules of the type $A$. Objects have no such loyalty in set theory.
Let us now take these axioms as they are and pay attention to what they do say rather than what they don't. While the axioms we have above are intended to describe roughly the intuition we mentioned about what a set is, we cannot take any of that intuition for granted; the whole point of axioms is to give a rigid foundation for those intuitions, smoothing them where we may intuit contradictory ideas. This also means that we must be on the lookout for which axioms (or deductions from them) tell us the basic things we expect to be true about sets.
First, the axiom of extensionality. Since the place where objects come from is not particularly discussed, it is unclear how or when we might speak about those objects being equal to one another, or when any notion of mathematical structure might arise. We will discuss that problem in general later, but this axiom solves that problem in the specific case of sets by saying that two sets are equal when they share members. Specifically, the reading of the formal form is 'if for all objects $x$, to say $x$ is a member of $A$ is the same as saying it is a member of $B$, then we are speaking of the same set'. We are to take from this that a set contains no information other than what objects it contains, disregarding notions of an order or a multiplicity (i.e. sets are not lists and they never contain objects twice) since they are irrelevant to equivalence/identity.
The axiom of pairing, in some sense, states that it is possible to form small sets of specifically chosen objects regardless of the nature of those objects. The reading of its formal form is 'for all objects $a$ and all objects $b$, there exists a set $A$ for which, for any object $x$ that we find to be an element of $A$, it must either be $x = a$ or $x = b$', thus implying the set contains only $a$ and $b$ alone. This also means we can, in one set, put a number and a letter, or a set and a set containing sets, etc. A set is not itself an object with type, and it does not discriminate on any basis whatsoever when accepting members.
The axiom of comprehension, in one framing, can be stated as 'sets are propositions' as mentioned earlier. That is, a set can be defined specifically to be containing the objects that satisfy a proposition or condition. In this way, the notation $\{x \in A \mid P(x) \}$ is to be read "the set of all elements $x$ in $A$ given that $x$ satisfies $P$" (and indeed this bar in the middle that we read 'given that' will appear in many other contexts). This comes with a specific asterisk however, relating to Russel's paradox. For instance, we cannot speak of a set containing all sets, or more specifically a set containing all sets which do not contain themself, lest we ask whether this set contains itself. Such a concept is in contradiction with the axiom of foundation anyways. This problem is solved by defining a set comprehension as requiring a base set from which to draw its elements. We then state the axiom as 'for all conditions $P$ and all sets $A$, there exists a set $B$ which contains the elements of $A$ which satisfy $P$'. That set $B$ must then be a subset of $A$ in accordance with the axiom of comprehension, a set that is a restriction on a larger set.
The axiom of power set extends this notion. We say next that it is not merely that for each set there exists sets containing fewer members where each satisfies a proposition, but in fact that every subset that could exist does exist, and populates a new set which we call the power set (denoted as a function $A \mapsto \mathcal{P}(A)$ where $\mathcal{P}(A)$ denotes the set of all subsets of $A$). A more literal example is given if we define the set $F = \{1,2,3,4\}$, where we can see its powerset is $$\begin{gather*}\mathcal{P}(F) = \left\{ \begin{matrix} \{1\},\{2\},\{3\},\{4\}, \\[0em] \{1,2\},\{2,3\},\{3,4\}, \\[0em] \{1,3\},\{2,4\},\{1,4\}, \\[0em] \{1,2,3\},\{2,3,4\}, \\[0em] \{1,3,4\},\{1,2,4\}, \\[0em] \{\},\{1,2,3,4\} \end{matrix} \right\} \end{gather*}$$ including both $F$ itself and every set that is $F$ without some of its elements. Consequently, every set comprehension, since it is a subset by condition of some $P$ (the members of the set that satisfy $P$), is also a member of the power set. This axiom is profoundly important for moving from the countable regime to the uncountable regime since it says that one can pack every subset into a single set. Although it is a bit of a stretch, one could in a sense blame this axiom, together with the axiom of infinity, for the continuity of real numbers. Going forward, we use $B \subseteq A$ to denote that $B$ is either a subset or equal to the set $A$. This is expressed formally within the statement of the powerset, in particular $\forall C$ (that is, for any set we propose as a subset) $\forall x$ we have that $x \in C$ implies $x \in A$, so all elements of $C$ are elements of $A$. We also use the symbol $\subset$ to speak of a proper subset where there exists some $x$ for which $x \in A$ but $x \notin B$, i.e. $B \subset A$ if $B \subseteq A$ and $A \neq B$.
The axiom of union can be thought of as a flattening in the way it is seen above. This, as with the previous axiom, the axiom of power set, are the first times we have used a caligraphic script to denote something higher. That is, in type theory it was common to write $\mathbb{N}$ to denote the set of natural numbers, looking like a capital $N$, and a lower case $n \colon \mathbb{N}$ to denote a single number in the natural numbers. We have continued to use capital letters to denote sets and lower case letters to denote the things inside the sets, as this connotation (much in the same way as in natural language) allows us to reduce certain confusions. But if we think of capital letters as containing things, then how should we denote the containers of containers? This is one use for calligraphic capital letters, as we discuss more closer to the end of the section. In this instance, we describe a set of sets $\mathcal{A}$ and how it can be flattened into just a set $A$, where our choice of letters of the alphabet connotes that they are similar objects. The more literal reading of the axiom is that for all sets $\mathcal{A}$ there exists a set $A$ with the property that for all objects $X$ and all objects $x$, you can say that $x\in X$ (implying that $X$ is a set) and that $X \in \mathcal{A}$ if and only if $x$ is a member of the set $A$. In effect, there exists a set which is $\mathcal{A}$ but flattened down a level of 'container'. We establish a notation for this (elaborated later in the chapter), the union notation as a binary operation $B \cup C$ and a big-operator notation $\bigcup_{X \in \mathcal{A}} X$. This union is also the same one that one thinks of in a Venn diagram as including the contents of both circles.
The axiom of replacement forces us to redefine what it is we meant by a function. In the previous section we defined functions by their relationship to programming and types, where they take an input and produce an output. This intuition holds, but since we do not have types, the strict manner by which we define a function is different. Now, a function is a condition $P$ on two objects $x,y$, which is true if $y$ is the output corresponding to the input $x$. As we see in the formal definition of the axiom of replacement, we also require of this condition that the property of being an output is unique. In particular, if $P(x,y)$ and $P(x,z)$ then we expect that $y=z$, which we expect since for each input, a function should only have one output; we stated this previously when we said that functions are defined as a concept almost exclusively by their consistency. This is now brought to primacy; since the concept of types are gone for the moment, it seems as though functions need not even assign an output to every input, since the inputs we consider are now unindexed and undescribable (in fact the set $A$ we mention is nowhere to be found in the formal statement!); all that is left to say about a function is that it is consistent. The axiom of replacement then says, fundamentally, that if we can define a function $f$, then we have a set of all outputs which do exist. This makes it meaningful to speak of transformations of sets and allows us to extend set comprehension notation so long as we can define a function and input set.
And still it will be valuable to give some structure to functions. In fact, the structure that is appropriate for functions in mathematics general does not necessarily line up with what formal set theory prepares for us here. So we will have to overrule these axioms in some sense, and define functions in the following way.
A function $f$ amongst sets is a condition $P$ on two objects together with two sets $\text{Dom} \hspace{0.2em} f$ and $\text{Cod} \hspace{0.2em} f$ called the domain and codomain. These must satisfy the following: $$\begin{gather*} \forall x\forall y \forall z, P(x,y) \wedge P(x,z) \Rightarrow y = z \\[0em] \forall x, x \in \text{Dom} \hspace{0.2em} f \Leftrightarrow \exists y, P(x,y)\\[0em] f\big(\text{Dom} \hspace{0.2em} f\big) \subseteq \text{Cod} \hspace{0.2em} f \end{gather*}$$ read as "functions are consistent", "$f$ acts on $x$ if and only if it is in the domain" and "the image of a function is a subset of its codomain". When all these are true, we write $f(x) = y$ when $P(x,y)$ and $$\begin{gather*} f \colon \text{Dom} \hspace{0.2em} f \to \text{Cod} \hspace{0.2em} f. \end{gather*}$$
Importantly a codomain is very strictly not the whole image of the function (i.e. there may exist items $y$ in the codomain for which no input $x$ can produce $f(x)=y$), codomains are the same as if not larger than the image set. The ideas of domains and codomains allow us to, in many contexts, recover a semblance of our notion of typed inputs and outputs; this is because the sets we speak of will usually be identified not by items but by what rules their elements obey as discussed earlier. We will return to this thread later.
The axiom of foundation seems largely exists to iron out potential problems. It implies that sets do not contain themselves, that there is no infinite tower of sets containing sets, and that sets cannot mutually contain each other. The prerequisite $\exists a, a \in A$ is merely to say that $A$ is a non-empty set (i.e. there exists an object contained in $A$); we can also say this by defining the notation $\emptyset = \{x | \bot\}$ which represents a set with no members and saying $A \neq \emptyset$. Once established, we can also reformulate other parts of the statement; using a set comprehension, the statement $b \in B \wedge b \in A$ discusses the members shared by both sets, and this is called the intersection and denoted $A\cap B = \{x \in A | x \in B\}$. The statement then becomes the following $$\begin{gather*} \forall A, A \neq \emptyset \Rightarrow \exists B, B \in A \wedge A \cap B = \emptyset \end{gather*}$$ or "every set which is not the empty set contains an object it is disjoint with", disjointness referring to sharing no members. Personally this is not an axiom I make the slightest attempt to remember.
Finally, the axiom of infinity can be read as an embedding of the Peano axioms (the natural numbers we spoke of in the previous chapter) into the context of sets. That is, it defines a zero $\mathbf{Z} = \emptyset$ and a succession $\mathbf{S}(n) = n \cup \{n\}$ so that numbers become $2 = \{\{\},\{\{\}\}\}$ and consequently a set of all natural numbers made in this way. I personally find this to be rather contrived (indeed the nesting of empty sets is simply to give each one some uniqueness since they are no longer equal by the axiom of extensionality). But this axiom is valuable nontheless, since by the axiom of replacement, we can construct functions from this set of natural numbers, thus establishing that even infinite sets of objects may exist in a single set. As you may be much more familiar with the consequences of this axiom than the axiom itself, this is one you can forget as we move forward.
Together, these axioms, the Zermelo-Fraenkel axioms, establish formally what it is that we expect sets to do and what we do not expect them to do. They can be infinite, they can be empty, they can be finite, they can be flattened or combined, they can be propositions (in a rough sense as by a subset), they can be the outputs of functions, they contain no data other than their memberships (such as ordering or type), and they can not contain themselves.
The fact that they contain no data other than their memberships and are at times representative of propositions however is somewhat telling to me personally however; the combination of set comprehension notation and the notation used (commonly) in the axiom of replacement, appearing in the format of an unrestricted set comprehension indicates why this might have been tempting to mathematicians. And containing no other data than membership is the other side of this coin; if a set is a condition, then discussing $A = \{x \mid P\}$ is to translate $P(x)$ is true into $x \in A$ is true. One may ask then, why involve sets at all? Especially when restrictions must be placed on them to avoid contradictions, when we could simply speak in terms of propositions? I would posit, very much as an opinion which I hope will give your thoughts structure on the matter, that a set is a device for simultaneously for mentally disregarding all $x$ which are not $x \in A$ and for indexing propositions. That is, to say $P(x)$ is true is still to consider the counterfactual that it might be false. But when $P$ is a strong enough set of conditions providing rewrite rules such as those describing the real numbers in the following chapter, our mental focus is shifted from the possibility of $P(x)$ being false to treating $A$ as an almost type-like object, with its elements obeying rewrite rules which place it in an appropriate mental category. Moreover, it can be intimidating and frustrating to think about 'all possible conditional statements on a set of objects' and yet in the framing of sets, we have the much more intuitive 'set of all subsets' notion in the power set, an object which we be much more important to us later.
However using sets as the basis of much of mathematical reasoning imparts on us to job of reconstructing certain familiar notions within its rules. We can of course still do this, as seen in the example of the analogy to the product type, the cartesian product set.
Let an ordered pair $(a,b)$ be a set $\{\{a\},\{a,b\}\}$ in order to identify the first item as the one that applies in both member sets and the second item as the one that applies only once. Then there exists an operation $\times$ called the Cartesian product that takes two sets $A,B$ and uniquely defines the set of their pairs $$\begin{gather*} A \times B = \left\{ (a,b) \mid a \in A \wedge b \in B \right\}. \end{gather*}$$
Note first that the set comprehension notation we use in the theorem statement is not a true set comprehension, and can not be since it does not define a set which it draws elements from. The goal of this proof will be to show that there is such a set to draw elements from that makes a Cartesian product valid under the Zermelo-Fraenkel axioms.
Without loss of generality, proceed under the assumption that $A$ and $B$ are non-empty; if they are, this discussion is simply moot in such a case since the cartesian product is the empty set. We are ensured that an ordered pair $(a,b)$ with $a \in A$ and $b \in B$ in the way we have defined it exists, since we may construct $\{a,b\}$ by the axiom of pairing and then restrict it down to $\{a\}$ via axiom of comprehension $\left\{ x \in \{a,b\}\mid x = a\right\}$. From there, we may apply the axiom of pairing again to obtain the ordered pair $(a,b) = \{\{a\},\{a,b\}\}$.
Now consider the following. The set $\{a,b\} \subseteq A \cup B$ where $A \cup B$ exists by the axiom of union (implicit invocation of axiom of pairing by setting $\mathcal{A}=\{A,B\}$ before the axiom of union $\cup \mathcal{A}$), and by the axiom of powerset, we have $\{a,b\} \in \mathcal{P}(A \cup B)$ and thus also $\{a\} \in \mathcal{P}(A \cup B)$. The ordered pair $\{\{a\},\{a,b\}\}$ produced by the axiom of pairing containing two such sets in $\mathcal{P}(A\cup B)$ is then $$\begin{gather*} \{\{a\},\{a,b\}\} \subseteq \mathcal{P}(A \cup B) \\[0em] \text{and thus} \\[0em] \{\{a\},\{a,b\}\} \in \mathcal{P}\big(\mathcal{P}(A \cup B) \big) \end{gather*}$$ Then this is our set which we draw from for our comprehension. Define the Cartesian product to be the set comprehension $$\begin{gather*} A \times B = \left\{ x \in \mathcal{P}\big( \mathcal{P}( A\cup B )\big)\mid \exists a\exists b, (a \in A) \wedge (b \in B) \wedge (x = (a,b))\right\} \end{gather*}$$ Since it contains all such pairs by construction, it is unique since any other set containing all such pairs will, by the axiom of extensionality, be the same set.
This example is far from archetypal with respect to rewrites (see the beginning of the following chapter where we discuss real numbers for a much more obvious one) but it is indeed an example. Our rewrites here are existential, since our axioms exist mostly to tell us what kinds of sets exist; the procedure of this proof is thus. First to case-split on $A$ and $B$ being empty sets or not (in a similar way that we pattern matched in the previous section), and if they are then we are done immediately. With the case of empty sets handled, we are ensured that $A$ and $B$ are not empty sets and thus there exists some $a$ and some $b$ such that $a \in A$ and $b \in B$; the rest of the proof serves to put these $a$ and $b$ into a pair by set comprehension. We have a context in which there is some $a \in A$ and $b \in B$, so we rewrite by axiom to existence of $\{a,b\}$, and we have a different axiom that rewrites to existence of $\{a\}$. Another axiom uses existence of these to rewrite to the existence of $\{\{a\}, \{a,b\}\}$, etc. etc. More important than the particular lack of symbolic clarity of how a rewrite is occurring here, is that during the proof we do not think too hard about whether or not a step of the proof aligns with our intuitions or why it might be true. In fact, we do not even use our intuitions when they tell us something is simply true, we rewrite using the axioms provided exclusively.
It will of course often be the case that we have developed intuitions for some field and may make some statements more informally, but this is only with the confidence that we have mastered the use of axioms behind those intuitions. When this occurs in a textbook or in lecture notes, it is often jokingly referred to as "proof by intimidation", in the sense that the lecturer/author is insisting 'I/we know this to be true or obvious, and if it is not obvious to you then go verify this fact as you see fit'. This will eventually become impossible to avoid, as our tower of abstractions will grow to a point where complete statement of why every single reasoning step (and indeed its sub-steps) would blow proofs up to enormous size, making their reasoning much harder to follow. This is one case where the mathematician's metaphysical fascination is so valuable; we are not logicians, and while logic is of near supreme value to us, understanding the nature of an object (through its defining axioms) will prove a much greater substitute for totalizing reasoning.
Returning to our foundational discussion, it is typical to append to the Zermelo-Fraenkel axioms the axiom of choice, forming the famous 'ZFC' foundations of mathematics. This is in some sense quite a meaningful alteration to what it is that a set does; in a sense, it may be stated as 'existence within a set is the same as accessability'. The axiom of comprehension for instance allows us to restrict a set by a condition, and from there we might even show that the set is non-empty, i.e. that something satisfies that condition. It is the axiom of choice however that says that when we can show that something satisfies that condition, we may take it and use it without any knowledge of the set's structure. We do not need to define a choice function, we will simply always have one.
For every family of non-empty sets there exists a choice function (in the sense that we defined a function in maththink.2) that selects an element of each set. $$\begin{align*} \forall \mathcal{A}, \big(\forall & X, X \in A \Rightarrow X \neq \emptyset \big) \Rightarrow \\[0em] &\exists (f \colon \mathcal{A} \to \cup \mathcal{A}), \forall X, X \in \mathcal{A} \Rightarrow f(X) \in X \end{align*}$$ or 'for all sets $\mathcal{A}$ which has all members which are not the empty-set, then there must exist a function from $\mathcal{A}$ to its union which outputs a choice of element from each set $X$'.
This axiom of choice is formulated differently from the one we saw for type theory due to the disagreements between the two theories of how exactly functions work, and using sets instead of types to contain members. Nonetheless, it performs a similar role to the law of excluded middle in the previous chapter. In constructive mathematics we had the problem that existential statements, rendered to us as dependent pairs, could only say that a thing existed by giving an example of it; consequently, we had difficulty deducing that a thing exists from surrounding information that it must surely exist (e.g. 'if not for all then a counter example exists' from De Morgan's laws). We have a similar problem here, that one may say $A \neq \emptyset$ to say that the set is not empty, and yet, knowing that it is not empty does not really tell you how you grab an element out of it. The axiom of choice says that there must be a function which does just that.
We are now in a position to extend our discussion on symbolic reasoning. That is, earlier, we discussed the notion of equality as related to which rewrites are valid, and considered the corseness of equality as a decision about whether an operation is truly equal to its result. In our description of the ZFC axioms, we also mentioned equality a number of times, but never specified what that equality meant exactly except when it meant equality of sets. This is complicated more, as mentioned earlier, by set theory being a theory built on top of a mathematician's intuitions; a theory where sets are able to contain any thing that a mathematician might want to talk about, without specifying where that thing came from, what it is, or how its equality works.
Consider the following example. If we interpret the construction we see in the axiom of infinity to be the natural numbers, then we can apply the axiom of extensionality and say that our zero has $\mathbf{Z} = \mathbf{Z}$ since $\mathbf{Z} = \emptyset$ and the axiom of extensionality says $\emptyset = \emptyset$ since they share all members (none). We can subsequently do this for every nested-set integer implied by the axiom of infinity since, while we may not have a notion of equality from elsewhere, we do have one for sets and objects which are merely sets in disguise as in the set-numbers implied by the axiom of infinity. But the moment we declare that the natural numbers we use are some other object which is not itself a set but are contained in a set via the axiom of replacement (e.g. say we define a function which takes axiom-of-infinity set-numbers to inductive natural numbers from type theory and use the axiom of replacement to define this as a 'set of natural numbers') we have to import our own notion of equality.
This is actually common. It is natural for definitions of many constructions to simply declare that a set of objects with the desired rewrite rules/equivalences exist and thus bring with it the necessary conditions for equality. Even in the previous section, this was why it was important for us to declare that inductive natural numbers are equal only when they are either both $\mathbf{Z}$ or both $\mathbf{S}$ acting on the same $n$; irrespective of our later equivalence defined by $\equiv$, we specified that objects such as constructors have such a quality as equivalence under same constructor and same arguments.
There will be times when our constructions provide unsatisfactory notions of equivalence however, where the notion of equivalence does not draw directly to any nice intuition of what an object is. In fact, it will often be the case that we actively abstain from referencing a broader system of constructions in order to make a theory more general, or a construction that itself defines its equivalencce rule based on another one which is not specified. This is the case in our set extensionality example; the moment our sets contain something other than sets themselves, we say two sets are equal when they have equal elements, but we cannot compare those elements without them having their own equality. This places us in something of a pickle which will require us to study equality more closely.
Let us study an example from type theory to ground ourselves. In most systems of constructive mathematics it is considered necessary to either derive from axioms or suppose explicitly what is called function extensionality. This axiom or rule says effectively that two functions are equal if and only if they both assign the same outputs to the same inputs. $$\begin{gather*} \forall (f,g \colon A \to B), f = g \Leftrightarrow \big( \forall (a \colon A), f(a) = g(a) \big) \end{gather*}$$ This is similar to our set theory extensionality, that said that sets also hold no data other than their membership. Consider though, what it would mean if we did not suppose function extensionality. Then it would be true that functions contain some data in addition to their map between inputs and outputs. Since functions are defined with no additional information in type theory, the only thing left to say that they are different when they share inputs and outputs is their symbol. To say that $f \neq g$ specifically because $f$ is written $f$ and not $g$.
This is a valuable proposition: since we explicitly claim no knowledge unless stated otherwise about members of sets, in effect, they have untold amounts of identifying data and in effect are defined exclusively by their symbols. This is precisely what we meant at the beginning of the section when we said that an abstract symbol $x$ could be anything and everything until restricted down. When we say 'let $A$ be a set and $a,b \in A$', there is potentially infinite information held behind the symbols $a$ and $b$; that potential for infinite information in fact does not go away when we say something like 'let $\mathbb{N}$ be a set containing a number $1$ and an operation $\mathbf{S}\colon n \mapsto n+1$'. It only goes away at the point that we add a natural number extensionality and say '$1_a = 1_b$ for every $1_x \in \mathbb{N}$ (i.e. there is only one one) and $\mathbf{S}(n) = \mathbf{S}(n)$ for every $n \in \mathbb{N}$ (i.e. there is only one succession)' that we have successfully papered over that additional information, rendering it irrelevant.
Rather than a fault, this is a power of the set theoretic lens which otherwise might seem needlessly ontologically cautious. In set theory we have the power to define or construct a set and give it a rule for what is equal or what is not equal a priori, a power that does not exist quite so simply in type theory. In fact it can be valuable at times to informally think of a set as coming with its own equivalence rule. We can, just as we papered over what could have been untold infinite information held in the objects that we write as $1$ or $2$, paper over information that we did know about. This gives us a brand new way to construct things.
Define the set $\mathbb{Z}$ to be the set $\mathbb{N} \times \mathbb{N}$ with the following rule: when a two pairs $(a,b),(x,y) \in \mathbb{N} \times \mathbb{N}$ satisfy $$\begin{gather*} a + y = x + b \end{gather*}$$ we say that they are equal. Then we call $\mathbb{Z}$ the set of integers. A natural number can become an integer via the inclusion $n \mapsto (n+1,1)$.
This is the standard construction of integers in mathematics, the various conversations we will shorly have about formally stating the equivalence rule put aside momentarily. If by counting numbers we mean numbers that are strictly greater than or equal to one (this is the standard in mathematics as well, not starting from zero) then we invent negative numbers in this way. From the perspective of already having negative numbers, one easily manipulates this equation to see that it is read $a - b = x - y$, but this is invalid on natural numbers since there is no result for $x-y$ when $y > x$. But if we speak only of positive numbers, then we avoid these problems and instead speak of larger numbers rather than smaller ones, moving $y$ and $b$ to their opposite sides.
The technique we have utilized to do this does not exist without significant effort in type theory; over there, constructors with some set of parameters are unique, and one cannot weaken this. So what they do is to define integers similarly to $\mathbb{N} + \mathbb{N}$ with a $l$ in-left constructor for negatives, a $r$ in-right constructor for positives, and a constructor $\mathbf{Z}$ for zero. To construct rational numbers which are much more obviously pairs, fractions such as $1/1$ and $2/2$ are equal, and so they must put in substantial work to reduce create a sigma type which is a pair of $\mathbb{Z} \times \mathbb{Z}$ along with a witness that it is in simplified form.
It may be valid to protest however, just as type theoretic constructions are too restrictive about equality, the above notion of equality is perhaps dangerously forgiving. Surely we are not saying that all pairs of natural numbers $(m,n)$ are literally always integers, right? Integers are supposed to be at the same standing as natural numbers, or real numbers. Enabling different operations of course, but still distinct kinds of objects themselves.
This is in part our expectations about types creeping in, but these are not exactly bad expectations to have. The set theorists, prior to a similar popularity of type theory as we enjoy today, saw fit to try to formalize what we have done above. Thus we have the formal notion of a partition, equivalence relation, and equivalence class. In such a way, we can say that what we did above was not true equality (and thus not overwriting the properties of pairs nor natural numbers) but a different notion of equivalence, thus indicating the existence of a different set of which objects obey that equivalence natively.
To explore this, we will need to formally investigate some terms that were mentioned briefly in the previous section when we spoke about equivalence.
An equivalence relation can be considered to be a set of pairs $A$. When a pair $(a,b) \in A$ then we write $a \sim b$ and either say that $a$ is similar to $b$ or that $a$ is equivalent to $b$ under $\sim$. Let $B$ be the set of all items that $\sim$ relates to one another (we might write $B = \cup\cup A$ if using the notion of pairs defined in the cartesian product definition). Then in order to be an equivalence relation, $\sim$ must satisfy the following:
(Reflexivity) For all $a \in B$, we have $a \sim a$ since a thing must be equal to itself
(Symmetry) For all $a,b \in B$, if we have $a \sim b$ then we have $b \sim a$ since equality (or similarity) is not an ordered property; i.e. $(a,b) \in A \iff (b,a) \in A$
(Transitivity) For all $a,b,c \in B$, if we have $a \sim b$ and we have $b \sim c$ then we have $a \sim c$, since if $b$ is the same as both $a$ and $c$ then they must be the same as one another as well.
These three properties, capturing what we expect of the notion of 'sameness', formalize our notion of what it means for something to describe a kind of equivalence.
It is also possible to define relations in which one of these assumptions fail, the most important example being ordering. That is, an ordering relation $>$ is not symmetric, since $b > a$ does not imply $a > b$ and quite frankly it is important that it does not for it to be an ordering. However it does obey transitivity, since $c > b$ and $b > a$ imply $c > a$. There are some cases even where it is desireable for reflexivity to fail, such as in computer's floating point numbers: when you divide by zero in floating point numbers you get NaN, and it is not necessarily meaningful to say that two 'not-a-number's are equal to each other just because they are both not numbers, however the rest of the time reflexivity may work perfectly fine. If one speaks of the set of options one may play in rock-paper-scissors then there is a relation of 'who wins' which is not reflexive, symmetric or even transitive, since rock beats scissors and scissors beats paper but rock loses to paper, so it fails to appear like an ordering because the ordering forms a cycle.
This notion of similarity translates the condition of set membership into a different condition; we can now describe integers by speaking of a set $$\begin{gather*} [-3] = \{(a,b) \in \mathbb{N} \times \mathbb{N} \mid (a,b) \sim (1,4)\} \end{gather*}$$ which is an equivalence class of pairs which would be treated as equal to $-3$ under our desired equivalence relation. Instead of resorting to simply saying that pairs are equal in order to form the integers, we have instead done this: we have defined a similarity relation, $(a,b) \sim (x,y)$ if $a + y = x + b$ in the sense of natural numbers, and then created a set of all pairs in $\mathbb{N} \times \mathbb{N}$ which also obey the condition that they are similar to $(1,4)$, one of many representatives of $-3$, via set comprehension. Since we restrict on the set of all pairs in $\mathbb{N} \times \mathbb{N}$, we have then captured all pairs which our similarity relation says represent $-3$, and thus we have the set of all $-3$ representatives. What we may choose to do, mostly as an exercise in semantics and making ourselves comfortable with our own tools, is to say that this set is the integer $-3$.
If we say this, then any other object that wants to be $-3$ will also need to have all representatives of $-3$ under the relation $\sim$ as members, and then will be equal via set extensionality. In such a way, we can also create a set for each other class which is unique under $\sim$, the $-4$, the $+5$, etc. each with their own set $[-4] = \{(a,b) \in \mathbb{N} \times \mathbb{N} \mid (a,b) \sim (1,5)\}$ or $[+5]=\{(a,b) \in \mathbb{N} \times \mathbb{N} \mid (a,b) \sim (6,1)\}$ and populate an entire set of integers in such a way. $$\begin{gather*} \mathbb{Z} = \Big\{ \big\{ (a,b) \in \mathbb{N} \times \mathbb{N} \hspace{0.2em} \big| \hspace{0.2em} (a,b) \sim (c,d) \big\} \in \mathbb{N} \times \mathbb{N} \hspace{0.2em} \Big| \hspace{0.2em} (c,d) \in \mathbb{N} \times \mathbb{N} \Big\} \end{gather*}$$ This structure is called a partition on $\mathbb{N} \times \mathbb{N}$, and is defined by the property that it is a set of sets where every member of $\mathbb{N} \times\mathbb{N}$ goes into precisely one set; so $(1,1)$ goes into the set we designate $[0]$, which contains $(1,1)$ but also $(5,5)$ and $(23,23)$, and yet no other $[n]$ will contain $(1,1)$.
This makes precise our notion of papering over additional information, and is generally referred to as a set quotient. That is, in the definition we have given above, we would write $\mathbb{Z} = (\mathbb{N} \times \mathbb{N})/\sim$ as though we had divided by our similarity relation.
The reason I have chosen not to describe these partitions and equivalence classes in a more precise manner however (breaking out the green boxes as it were) is because, at least for the objects we have described so far, how much of these constructions matter when discussing the integers is a matter of perspective. For instance, we have convinced ourselves with the above constructions that it is certainly valid to speak of integers, since we may construct a set with their desired properties and transplant any desired computations we want integers to do as required (e.g. $[-3] - [+2]$ meaning some $(a,a+3)$ and $(b+2,b)$ and we take the result to be the equivalence class containing $(a+b,a+b + 5)$). And yet, it is usually the case that we simply take the integers to exist, having been generally convinced that they can exist, in which case $-3$ and $+2$ are themselves distinct objects with no internal structure, least of all sets of pairs.
These concepts, of similarity relations, partitions, equivalence classes, find their formal use not in constructing objects we are already familiar with but in constructing new objects for which we do not have existing intuitions. Indeed, we will next see them when we discuss point-set topology; when applied to patches of the real number line, these quotients can take any notions of closeness defined for some space and say that 'we say that these two points are similar' thus creating a sort of portal or gluing the two points together in space, creating loops.
By now, we have discussed most of the concepts which mathematical reasoning implicitly expects awareness of but does not discuss formally. More examples of how axioms construct concepts will readily become apparent in the following chapter which will provide ample examples. There are however a few loose ends to cover.
First, we must return to our notion of functions, or set morphisms as we described them. Amongst all the discussion of how sets replace our notion of types, and how our objects are now made via restriction rather than construction, it is perhaps difficult to imagine that in many circumstances sets will not be so bad a replacement for types. The notion of a domain and a codomain for functions, a set from which they draw their inputs and a set from which they assign outputs respectively, will usually and desirably be sets defined by structure that makes them nearly tantamount to types themselves. From there, the following concepts will become relevant to us.
Let $A$, $B$ and $C$ be sets, and $f\colon A \to B$ be a function $f$ with domain $A$ and codomain $B$, likewise $g\colon B \to C$. Then:
we call $f$ injective or one-to-one if for all $x_1,x_2 \in A$ we have $f(x_1) = f(x_2)$ if and only if $x_1 = x_2$. That is, $f$ is injective if every unique input has a unique output. An injective function is invertible but only on the subset of its codomain which is its image.
we call $f$ surjective or onto if for all $y \in B$ there exists some $x \in A$ such that $f(x) = y$. That is, $f$ is surjective if we can find an input for every possible output, or that the set of $f$'s outputs fills the codomain. A surjective function can find an inverse for every output but there may be more than one inverse, in which case we describe the function as many-to-one.
as mentioned in the previous section, the $\circ$ operation takes two functions and composes them so that the output of one is the input of the other. When two functions $f$ and $g$ have the property that the domain of one is the codomain of the other ($B$ in this case), then we can compose them and write $g \circ f \colon A \to C$. For any $x \in A$ it is defined $g \circ f(x) = g(f(x))$. We can also do this when the codomain of $f$ is a subset of the domain of $g$.
we call $f$ isomorphic or bijective if it is both injective and surjective. In this case, since we can find an input for every output, and that input is unique, we can establish a mapping from the output to the input, an inverse map which we call $f^{-1} \colon B \to A$ with domain and codomain reversed. It has the properties that $f \circ f^{-1} \colon B \to B$ and $f^{-1} \circ f \colon A \to A$ are identity functions, meaning they send every input to itself and change nothing. When an isomorphism exists between two sets $A$ and $B$ then we may write $A \cong B$ to mean that they are isomorphic.
Additionally, we write $f(A)$ to mean the image of a set, the set of its outputs; if $f$ is surjective then $f(A) = B$. Even when $f$ is not isomorphic and thus not invertible, we may at times speak of $f^{-1} (D)$ if $D$ is a subset of the image of $f$, to mean $$\begin{gather*} f^{-1}(D) = \left\{ x \in A \mid f(x) \in D \right\} \end{gather*}$$ which we call the preimage of $D$.
Sometimes when there exists an isomorphism between two sets, we may say that the sets themselves are isomorphic, and sometimes we may even then say that they represent the same underlying set and treat them as equal. This will be something to discuss later but treating sets as equal when they are isomorphic is a strong claim that will at times be appropriate and most of the time not be (I expect a full discussion of this to appear in chapter 3).
Now that we have formally discussed relation properties, we should also collect into one place the various rules that a binary operation can have, such as $\circ$, $+$ or $\div$.
Let $+ \colon A \times A \to A$ represent a binary operation, taking two members of $A$ to another member of $A$. Let $a,b,c \in A$. Then there are three rules that it may or may not have that are important to remember.
(Associativity) the operation is associative if $(a + b) + c = a + (b + c)$, so the order in which we calculate individual pairs does not matter
(Commutativity) the operation is commutative if $a + b = b + a$ so that individual pairs may be flipped without changing their value
(Identity) the operation has an identity if there is some $e \in A$ for which $e + a = a$ and $a + e = a$, that is, an object which does nothing to the other object when the operation acts. If it satisfies only $e + a = a$ then we call it a left-identity and if it satisfies only $a + e = a$ then we call it a right-identity.
Many binary operations we have seen so far obey these, but notably operations such as subtraction or division which are simply operations composed with an inverse are not. Here some are listed:
Addition $(+)$ is associative and commutative with identity zero.
Multiplication ($\times$ or $\cdot$) is associative and commutative with identity one.
Logical-And ($\wedge$) is associative and commutative with identity $\top$ the true proposition.
Logical-Or ($\vee$) is associative and commutative with identity $\bot$ the false proposition.
Set Union ($\cup$) is associative and commutative with identity $\emptyset$ the empty set.
Set Intersection ($\cap$) is associative and commutative, and when it exists in the context of some total-set $X$ (as is often the case when discussing subsets of a space) then it has identity $X$.
Function composition ($\circ$) is associative but not commutative, but has identity which is the identity function, the function that takes an input and returns it as output unchanged.
Cartesian Product ($\times$) is in practice associative since we are generally unconcerned with finer structure beyond the creation of tuples, so $(A \times B) \times C$ is treated as equal to $A \times (B \times C)$ since we only care that their members are triples $(a,b,c)$ with $a \in A$, $b\in B$ and $c \in C$. If we enforce that cartesian products only make pairs, i.e. $A \times (B\times C)$ has elements $(a , (b,c))$ then it is not associative. It is also not commutative since $A \times B$ has members $(a,b)$ but not members $(b,a)$. It does in a sense have an identity however, since any set $\{z\}$ with only one element will produce a set of pairs $(a,z) \in A \times \{z\}$ which is isomorphic to $A$ via $(a,z) \mapsto a$ and $a \mapsto (a,z)$, however this is the sort of thing that we care about contextually.
There also exists a notion of subtraction on sets, which we call set minus and denote by the backwards slash $\setminus$. The set $A \setminus B$ is then the set containing all elements which are in $A$ but not in $B$. Consequently you might write $$\begin{gather*} A \setminus B = \big\{ a \in A \hspace{0.2em}\big| \hspace{0.2em} a \notin B \big\}. \end{gather*}$$
Before moving on from this chapter, we should also emphasize some notations for writing functions which were mentioned multiple times but should be clarified together here in case details were missed. That is, in general mathematics, we write $f\colon A \to B$ to say that $f$ is a function with domain $A$ and codomain $B$, and we write it in this way specifically with the arrow $\to$. This arrow is distinct from the arrow starting with a bar $\mapsto$, which describes not the set of inputs and the set of outputs $A$ and $B$, but instead a particular pattern for how to evaluate a function. For instance, we might write $``a" \mapsto 1$ to describe that a function on letters specifically maps the letter $``a"$ to the number $1$, but we might write $x \mapsto x + 1$ to describe a function that maps any number $x$ to the number $x + 1$, and this is primarily contextual.
It is very important that you do not confuse these arrows $\to$ and $\mapsto$. We may at times write that instead of $f\colon A \to B$, it is $f\colon a \mapsto b$ where $b$ is some expression involving what to do with $a$; in this context, we are basically writing that $f$ is defined to be the function that does $f(a) = b$ without explicitly setting the domain and codomain, but we can write $f\colon a \mapsto b$ and $f\colon A \to B$ together as well. A function which does different things in different cases may be written with piecewise notation, $$\begin{gather*} f(x) = \left\{ \begin{matrix} x + 1, & x < 0\\[0em] x + 2, & x > 0\\[0em] 0 \end{matrix} \right. \end{gather*}$$ where in general we have a list of different things $f$ could do to $x$ and conditions beside them that say when to do these things. Sometimes the propositions will not describe every case of $x$, in which case an option should be provided at the bottom without a proposition, which is the fallback case that the function takes when no other case applies.
We must also discuss notations relating to sequential patterns, starting with big operator notation. Once in a while this notation makes rounds on the internet as 'just being a for loop' and half that time that description is quite apt. When we have a binary operator such as $+$ or $\times$ which is associative and commutative, it is meaningful to simply take a list of values and say 'simply apply the binary operator to all of them, in any order' since, as commutativity and associativity describe, ordering really does not matter. In this case, we may write $$\begin{gather*} \sum_{i = 1}^n f(i) = f(1) + f(2) + f(3) + ... + f(n) \end{gather*}$$ which we call summation notation. The rule with this pattern is that we have an expression on the right hand side of the capital greek sigma $\Sigma$ involving a variable $i$ which we declare below it (this can be some other variable such as $n$ or $k$ or $j$) and we add the expression repeatedly, going through each counting number starting with the number we set $i$ equal to, 1 in this case, upwards until we reach $n$. We can also as necessary start with a negative number, in which case it is understood that we are counting upwards in the integers. A matching notation exists for multiplication, called the product notation $$\begin{gather*} \prod_{i=1}^n f(i) = f(1) \times f(2) \times f(3) \times ... \times f(n) \end{gather*}$$ in which we use capital Pi $\Pi$ with the same system. In situations we discuss later, it may even become appropriate to write infinity in the top instead of $n$ to mean summing or multiplying the expression over all counting numbers.
This notation is a special case of a more general notation in which we do an operation over an entire set. That is, when we write $i=1$ on the bottom and $n$ on the top, the more general notation would be to write $A = \{1,2,3,...,n\}$ and then $$\begin{gather*} \sum_{i \in A} f(i) = f(1) + f(2) + f(3) + ... + f(n) \end{gather*}$$ meaning that we perform the operation over all members of the set $A$. At times also we can, when it is well enough communicated, write a short proposition on the bottom of a big-operator notation to describe the operation applying over values which are true. In such a fashion, we can rewrite the above as $$\begin{gather*} \sum_{1\le i \le n} f(i) = f(1) + f(2) + f(3) + ... + f(n) \end{gather*}$$ as long as we have some reasonable assumption that it is implied $i \in \mathbb{N}$. We can also stack these big operators together to do many more sums, for instance $$\begin{gather*} \sum_{i = 1}^n \sum_{j=1}^n i j = 1 \times 1 + 1 \times 2 + ... 1 \times n + 2 \times 1 + 2 \times 2 + ... + n \times n \end{gather*}$$ or it may instead be appropriate to put these values together in order to add an extra condition, such as in $$\begin{gather*} \sum_{1 \le i < j \le n} f(i,j) = f(1,2) + f(1,3) + ... + f(1,n) + f(2,3) + ... + f(n-1,n) \end{gather*}$$ in which the structure of the proposition only adds terms corresponding to pairs $(i,j)$ when $i$ is less than $j$.
We may circumstancially apply these big operators when the operation they describe is not associative or commutative, so long as we define exactly what we mean by this. For instance, while it will rub many mathematicians the wrong way, for a collection of indexed functions $f_1,f_2,..., f_n$ we can compose them all if as long as we say first that we mean composition to act in order from left to right or right to left. For instance, we would then write $$\begin{gather*} \bigcirc_{i=1}^n f_i = f_1 \circ f_2 \circ f_3 \circ ... \circ f_n \end{gather*}$$ if we are specifying that it starts leftmost with the first choice of $i$ and counts upwards to the right.
We must also mention the symbol $...$ itself. Obviously this is meant to describe the full extension of a pattern, but it will sometimes be used so frequently that it almost constitutes a symbol itself. As such I should specify the general practice with $...$, the ellipsis, that we generally begin the sequence with enough items to establish a pattern and then end it with a final term if indeed there is a final term. See above that in each time we have used it we have used two or three examples of the pattern before the ellipsis and then completed its final element. For illustrative purposes, it may at times be appropriate to write two or three examples at the end of the pattern, especially when we want to do something with one of those examples for the purposes of a mathematical manipulation. When we are pressed for space and the pattern has already been well established, only then will we write $f(1) + ... + f(n)$, always buffered on each side by the appropriate operation.
In a case where for some reason we are adding to infinity (a concept which we will need to strictly define later) then it may become appropriate to not write a final term $$\begin{gather*} \sum_{n = 1}^\infty f(n) = f(1) + f(2) + f(3) + ... \end{gather*}$$
These notations also extend to union and intersection, so a sequence of sets may have $$\begin{gather*} \bigcup_{i=1}^n A_i = A_1 \cup A_2 \cup A_3 \cup ... \cup A_n \\[0em] \bigcap_{1\le i < j \le n} A_{i,j} = A_{1,2} \cap A_{1,3} \cap ... \cap A_{1,n} \cap A_{2,3} \cap ... \cap A_{n-1,n} \end{gather*}$$
We will often reuse symbols such as $+$ or $\times$ or indeed our big operators $\Sigma$ or $\Pi$ to mean different things when we have defined other concepts which we consider analogous to addition or multiplication. For instance, while we have $\int$ to represent integrals in calculus, the same way integration can be thought of as a sum over an area, there is an analogous notion of a product over an area which is generally denoted with product notation for lack of a better symbol (when it is not denoted $\exp(\int \log f(x) dx)$ instead or something of the sort), or a large number of multiplications in the sense of group theory may also use big $\Pi$.
Finally, we have discussed context in the labelling of variables before. Mathematicians (especially in particular fields) tend to have specific symbols which they use to denote specific things as part of their language with one another. Accordingly, one tends to know that $f$ is a function or $x$ is a real number without any instruction. We list some of these here for your reference, and so you are not left out of these conventions, but in doing so we mention some things you may encounter elsewhere or later, so be sure to ignore anything you do not understand here until later.
$x$ can be used to represent a point in a space or an input to a function. Accordingly, we sometimes write a point in space as $\mathbf{x}$ or $\vec{x}$, etc. in which case it may have coordinates $x_1,x_2,x_3$, but in other circumstances (especially in two or three dimensions) these may be labeled $x,y,z$.
$y$ or $z$ are often used as an output for a function or as a spatial coordinate. It may also be used when $x$ is taken as a sort of second-preference $x$.
When $x,y,z$ are in use, our next set of preference letters are generally $u,v,w$. These may also be used to describe coordinates in some transformed space where $x,y,z$ are the inputs or base space, as is the case in UV-maps in 3d model skins; it is precisely because a 3d model's skin is a bizarre map from 2d to 3d that we consider the 3d space version of the skin to be $x,y,z$ and the virtual space in which the skin were flat as the $u,v$ coordinates.
$n$ is typically used to mean a counting number in $\mathbb{N}$ with $m$ used as a second preference for this
$i$, with $j$ and $k$ as second and third preferences are often used to index some indexed object. That is to say, if we have three things $x_1,x_2,x_3$, we often write $x_i$ or $x_j$ to mean 'one of our labels'.
$r$ is often used to describe a radius, but this may also mean that it describes a radius vector, and thus a position in space relative to a origin.
$f$, second preference $g$ and third preference $h$ are preferred letters to refer to functions
Capital letters may be used to describe things which we do not think of as numbers but as containing numbers or number-adjacent things. Accordingly, sets are generally written with capital letters such as $A,B,S,X$, and we often use the lower case version of the letter to denote a member of that set, so $A \times B$ will often be written with elements $(a,b)$, implying $a\in A$ and $b\in B$.
As we move on, we may encounter collections of objects which are collections of collections, and our first way of saying that something does not merely contain numbers or number-like things but contains things which themselves contain those, is to write it with a special script. This will often be a caligraphic script, and so we might write $\mathcal{A}$ to mean a set of sets for example.
$q$ may be found used in place of $x$ in many texts related to physics to describe position, however physics has many many more conventions and expectations about which letters mean what, such as $p$ for momentum, $V$ for kinetic energy, etc. and frankly too many to list here.
In general, it is often the preference when speaking about some collection of objects $G$ to call some member in it its lowercase letter, so we might say $g \in G$. When this letter is taken, or indeed we wish to speak about more than variable, the usual practice is to go up in the alphabet, often trying to exclude letters which are known to have strong connotations as a first preference to mean something else. so we would say $g,h,k \in G$, with $g$ and $h$ included since $g$ is the lowercase $G$ and $h$ comes after $k$, but since $i$ and $j$ are first and second preference letters for indicating index, it is convention to skip them and go to $k$.
We usually skip the letter $o$ to be used as a variable name since it looks too much like zero.
Often when there is not a good choice of letter in latin script, we use greek letters. Many greek letters have strong associations in mathematics for what they mean, for instance, the greek iota $\iota$ tends to mean some kind of inclusion, a way of sending a value to basically itself but as contained in a larger set. Our first choices for greek letters are generally $\psi,\phi$ called psi and phi, however it is often preference to use a variant writing of phi written $\varphi$ (I am a fan of this one). Mathematicians tend to have conventions for handwriting these greek letters especially when it comes to those which look a lot like latin characters (i.e. $\kappa$ and k) but you are better off asking one and receiving a jubilant ideosyncratic rant. Some mathematicians, when there is no good greek character left in a context, will resort to using cyrillic letters such as Zhe although my editor will crash if I try to include that here.
We double up the use of letters sometimes by marking them in special ways. Some common ones are $\bar{x}$ called 'x bar', $\hat{x}$ called 'x hat', $x^*$ or $x^\star$ which are generally referred to as 'x star', and $\tilde{x}$ called 'x tilde' or 'x squiggle'. Some other markings have special meanings; for instance $y'$ is often called 'y prime' and $\dot{y}$ is called 'y dot', however these markings both often refer to derivatives of $y$. Each of these markings when used on a letter that already means something in a context often means that the value assigned to that marked letter is strongly tied or related to the main letter but different somehow.
We use a special bold face script to denote certain canonical sets of numbers or number-like things which are important or well studied. For instance we have $\mathbb{N}$ for the natural numbers, $\mathbb{Z}$ for the integers, $\mathbb{Q}$ for the rationals and $\reals$ for the real numbers. There are also other sets which may be stranger such as $\mathbb{C}$ the complex numbers, $\mathbb{H}$ the quaternions, or $\mathbb{PR}^n$ the real projective space in dimension $n$. Some authors will generalize their theorems and say something like "for any field $\mathbb{F}$ ..." to mean that their theorems work on both $\reals$ and $\mathbb{C}$ or any other algebraic field satisfying the properties to make it behave like a similar number system; in that case $\mathbb{F}$ is but a variable for some set.
Again, many terms here are mentioned in advance of the time they show up so that this list may also act as something of a reference. In future, our preferences for what symbols we want to mean what will generally become apparent simply because we will use them a lot, so be sure to pay some attention to which letters we preferentially use as we develop our mathematical language.
If you are still reading, congratulations and thank you. You have made it through the hurdle of learning the rules of mathematical writing and most of the unwritten rules of mathematical thought.
In the following chapter, we will start doing some real math in all its symbolic glory, now that we have seen, hopefully, that symbolic manipulation is much less scary than it looks, and moreover mainly a proxy for intuitive although strict reasoning. We will develop that reasoning over the next chapter as we study the properties of real numbers and space as thought of as real numbers.
Thank you again for following along, and I hope this text proves useful and insightful to you.