Before we discuss mathematics properly, and since I want this text to be approachable to people with only cursory mathematical experience, I think we should first specify what doing mathematics actually is. There are a few necessary parts of this: why mathematics is possible from its foundations; the structure of mathematics, i.e. what a mathematical discovery would look like if we found one; and finally the language of mathematics, i.e. the way mathematicians articulate our discoveries in order to make them useful. This chapter will be my attempt to discuss these philosophical foundations and motivations, which are usually and unfortunately left implicit by mathematicians; it is generally expected that being able to see these foundations is merely a prerequisite for a practical interest in mathematics. Instead of following this pattern, I will venture to make these prerequisites and expectations explicit, at the risk of putting on the clown makeup for a single chapter.
If you consider yourself sufficiently familiar with the mathematical format, you may wish to skip this chapter; however, I think it may hold some value in its articulation of the mathematical pursuit regardless. If you are unfamiliar with the mathematical format, then you should know now that i.e. means "id est", Latin for "that is" and generally used to mean "this can also be stated as", and we will be using it a lot.
First we must discuss the question "what is mathematics at its core?" This is a fairly open question, but I think there are two parts of an answer that stand out as insightful: one about the nature of the logic that mathematics is built on top of, and one about the mathematical tradition itself. As this project progresses, I hope you will come to see the pattern I will describe, which is that logic, almost alone, has the peculiar power to describe and characterize concepts on an intuitive level, through the network of their consequences and incompatibilities.
The classic example of this is of rain making the ground wet. If it has rained then you know that the ground is wet. If the ground is wet, that does not necessarily imply that it has rained. Instead, there could have been some intense humidity, or someone washed their car, or someone simply covered the ground with as much water as they could. If the ground has been wet for a while, then one might deduce that it could not be true that it is a hot dry day, or else the ground would have quickly dried. These statements are incompatible. If I had not told you that the property we were discussing was that "the ground is wet" then from the various things I told you could not be true, or things that would imply this unnamed property, you may even deduce without instruction that we are indeed discussing "the ground is wet". In the course of 'doing' mathematics, one of the things we do is invent concepts that are unnamed at first, and by their implications, discover what they truly mean or might describe.
One might then say that the realm of the mathematician begins in the full scope of cognitive tools we employ to characterize these propositions. Consider: if we study a two dimensional plane where points have an $x$ and $y$ coordinate so that $(2,3)$ describes a point, I could pose the proposition to you that $y = 5 x + 2$, read as "the $y$ coordinate is equal to five times the $x$ coordinate plus two". Of course this proposition would be wrong in the vast majority of cases, such as $(2,3)$ (where setting $x=2$ gives $5 x + 2 = 12$, contradicting $y=3$), and neither does it have a single 'solution', a point we can single out and speak of alone where it is true. However just as we spoke of a space of points in a two dimensional plane, we may perhaps speak of a sub-space where the proposition is true, and restrict our attention to this space. If we are to do that, we do not need to limit ourselves to discussing this with a mere proposition $y = 5 x + 2$. We can define a set of points where the proposition is true, which we would write as $\{(x,y) \in \reals^2 \hspace{1mm} | \hspace{1mm} y = 5 x + 2\}$, and then we could use the language of set theory to discuss which other sets it intersects. We could also draw these points onto a two dimensional plane, obtaining a straight line that meets the $x$-axis at $(-0.4,0)$ and crosses it at an angle of about 78.69 degrees. On the level of sets, when we show that two such sets corresponding to propositions have an empty intersection, we can say that the propositions are mutually exclusive, i.e. if one then not the other. When we draw these graphs, we can check this merely by looking to see if the lines touch; this is a substantially easier way to make the deduction, and it is a different class of reasoning, with graphs implying statements about set theory or equational propositions.
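The three lenses above (proposition, set, graph) can be tried out directly. What follows is only a minimal sketch in Python; the names `on_line` and `intersection` are invented here for illustration and lines are represented by a hypothetical (slope, intercept) pair.

```python
import math

def on_line(point, slope=5.0, intercept=2.0):
    """Return True when the proposition y = slope*x + intercept holds at the point."""
    x, y = point
    return math.isclose(y, slope * x + intercept)

# The proposition fails at (2, 3), since 5*2 + 2 = 12, but holds at (2, 12).
print(on_line((2, 3)))    # False
print(on_line((2, 12)))   # True

# Where the line meets the x-axis (y = 0), and the angle it makes with that axis.
x_intercept = -2.0 / 5.0
angle_degrees = math.degrees(math.atan(5.0))

def intersection(line_a, line_b):
    """Intersect two lines given as (slope, intercept) pairs.

    Returns the single shared point, or None when the lines are parallel
    and distinct, i.e. when the two propositions are mutually exclusive.
    """
    (m1, c1), (m2, c2) = line_a, line_b
    if m1 == m2:
        if c1 == c2:
            raise ValueError("the two propositions describe the same line")
        return None
    x = (c2 - c1) / (m1 - m2)
    return (x, m1 * x + c1)

# y = 5x + 2 and y = 5x - 1 never touch: the set of shared points is empty.
print(intersection((5, 2), (5, -1)))   # None
```

Returning `None` for parallel lines is the code's way of saying "empty intersection"; the same fact is visible at a glance on a drawing of the two lines.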
Mathematics is very much the study of logical abstractions. We study them by restricting them to special cases, seeing which other statements they cannot exist with or must exist with, just as in our example of the ground being wet. However we break from the discipline of logic, vitally, by actively concerning ourselves with the relations between these different settings of reasoning so that we can make deductions using multiple forms of analysis. For example, an equation is a proposition ("this is equal to that"), but as we saw above it is also a set and thus vulnerable to set theoretic attack, and also a graph which we may inspect visually. Just as we have these three lenses from which to study a logical statement, as mathematicians we are often concerned with inventing some fourth intuition which is as different in its reasoning from those three as they are from one another. We ask, 'what new things could we learn if we were able to invent a new perspective, or think about this problem in an entirely new way?' and then we discover and formalize the rules of this new way of thinking and apply it.
One might blame this for the sense that mathematicians quickly devolve into speaking about alien concepts, but this is merely the other side of that coin: we are willing to adapt our thinking so drastically to find new ways of understanding. In this way and many others, mathematicians sit necessarily between physicists and logicians on the axis of 'rigor'. We are distinct from the logicians as we (usually imperceptibly) weaken this rigor to understand things better, and distinct from the physicists as we crave to resolve contradictions in our descriptions that we might finally 'know what we are talking about'.
To describe what one hopes to achieve from mathematics, it's perhaps easier to first describe what is achieved from a theory in abstract. When one selects an object of study, say for example, mammals, one may first put significant effort into defining the object, and then giving structure to subcategories in order to discover meaning from those substructures.
I am no biologist but naively and for the sake of argument, one may say that it is clear a mouse is a mammal, and clear that a lizard is not. One may then decide that a bird is not a mammal, and thus that mammals give live birth and do not have beaks. However echidnas and platypuses have beaks, and both lay eggs. Despite these exceptions, they both still have only one jaw bone, do not have feathers, and lactate to feed their newborns, so we may broaden or adjust our conditions slightly and still call them mammals. With the boundaries around our study fixed, we may then notice that bears and dogs share a resemblance, and that dogs and cats share a similar body plan, constructing the order Carnivora and the two suborders Feliformia and Caniformia, containing cats, and dogs together with bears, respectively. Such a structure is necessary for one to argue for speciation: just as we have artificially bred kinds of dogs, perhaps nature has over a long time bred some four legged animal into what became a proto-dog and proto-cat, and the former was later separated into dogs and bears. Now if you were responsible for medically treating one of these animals, you would have the information that the biology of a dog is more similar to that of a bear than to that of a cat. And you can conceive of a broader family-tree of life for which you may extend this utility or draw on further examples to better inform it.
This mode of thinking is actually not out of place to describe something such as Group Theory.
For the layman with great interest in mathematics, a naive attempt to learn about what group theory is may yield what a group is axiomatically. That is, a formal definition will tell you (if for no other reason than to illustrate how dense and unapproachable such a description is) that a group is a set $G$ closed under a multiplication rule such that for all $g,h \in G$, $(gh) \in G$, associative such that $(gh)k = g(hk)$ and with identity and inverses so there is some $e \in G$ with $e g = g$ doing nothing (i.e. like multiplying by one) for all $g \in G$ and an inverse $g^{-1} \in G$ for each $g \in G$ such that $g g^{-1} = e$. Or instead you might encounter an explanation grounded in intuition such as the ways in which a shape can be rotated or moved so that it looks the same, or algebraic examples such as the set of matrices with non-zero determinants. To use our previous example, this would be like noticing the existence of the category of mammals and stopping there. It is not clear what deeper understanding we have achieved by constructing a mere name, title, or category, and we have not discovered any particular similarity between groups nor do we have a notion of a family tree with which to discover other valuable patterns.
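For a finite set, the axioms just listed can be checked by brute force, which may make them feel less dense. This is a rough sketch only; `is_group` and its arguments are names made up for this illustration, and the two examples (numbers under addition and multiplication mod 4) are not taken from the text above.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of the group axioms on a finite set.

    `elements` is a collection of hashable values and `op(g, h)` is the
    multiplication rule; both names are illustrative, not standard.
    """
    G = set(elements)
    # Closure: gh must land back in G.
    if any(op(g, h) not in G for g, h in product(G, repeat=2)):
        return False
    # Associativity: (gh)k == g(hk).
    if any(op(op(g, h), k) != op(g, op(h, k)) for g, h, k in product(G, repeat=3)):
        return False
    # Identity: some e with eg == g == ge for every g.
    identities = [e for e in G if all(op(e, g) == g == op(g, e) for g in G)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every g has some h with gh == e == hg.
    return all(any(op(g, h) == e == op(h, g) for h in G) for g in G)

# Quarter turns of a square, written as 0..3 and composed by adding mod 4.
print(is_group(range(4), lambda g, h: (g + h) % 4))   # True
# The same numbers under multiplication mod 4 fail: 0 has no inverse.
print(is_group(range(4), lambda g, h: (g * h) % 4))   # False
```

Passing such a check tells us only that something deserves the name 'group'; it does not yet supply the family tree that makes the name worth having.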
So let me describe how that works.
Once we have two groups, $G$ and $H$, we may construct new groups, such as the product group $G \times H$, whose members are pairs $(g,h)$ for each $g$ in $G$ and $h$ in $H$. If we have a multiplication operation between groups, can we reverse this? Is there a notion of group division? In fact there is. I must emphasize that you should not worry if you do not fully understand this example, as it will be elaborated properly later:
First we find a subgroup of $G$ to divide by, that is, a subset of $G$ that inherits the multiplication operator, that cannot be escaped by multiplying its elements together (i.e. the subgroup is closed), and that also contains the identity and all of its own inverses. A group is not always commutative, that is, we cannot say for certain that two elements $g$ and $h$ in $G$ satisfy $g h = h g$, i.e. multiplication cannot be reordered arbitrarily. However there may be subgroups whose elements do reorder with the rest of the group, i.e. a subgroup $Z$ in $G$ for which all elements $z$ satisfy $g z = z g$ for every $g$ in $G$, even if this is not satisfied by other elements $h$ in $G$ (the subgroup of all such elements is called the 'center' of $G$). More generally, a subgroup $N$ may not have elements that each reorder with each element of $G$, but may reorder as a whole subgroup: for each $g$ in $G$ and each $h$ in $N$, we have $g h = k g$ for some $k$ in $N$, where $h \neq k$ most of the time. So long as this is true, we write $g N = N g$, where $N$ stands for 'some element of $N$' in place of the element itself, and we call $N$ a 'normal' subgroup. If we have $g N = N g$, then elements of $N$ may be in a sense reordered with the rest of the elements in $G$, and so we may define a new group $G/N$ of elements $g N$ for each $g$ in $G$. This 'quotient group' factors out the dynamics of the normal subgroup, since we have $(g N) (h N) = g (N h) N = g (h N) N = (g h) N$ for elements $g$ and $h$ in $G$, and if $h$ should also be a member of the subgroup $N$, then we have $g (h N) = g N$, meaning that any elements of $N$ are factored out.
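For readers who find code easier to follow than algebra, the construction can be checked by hand on a small case. The sketch below assumes a particular example that is not in the text: $G$ is taken to be the six rearrangements of three objects, stored as Python tuples, and all function names are invented for illustration.

```python
from itertools import permutations

def compose(p, q):
    """Product pq of two permutations stored as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

G = set(permutations(range(3)))            # all 6 rearrangements of three objects
N = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}      # the three cyclic rotations
H = {(0, 1, 2), (1, 0, 2)}                 # the identity and a single swap

def left_coset(g, S):
    """The set gS of all products gs with s in S."""
    return frozenset(compose(g, s) for s in S)

def right_coset(S, g):
    """The set Sg of all products sg with s in S."""
    return frozenset(compose(s, g) for s in S)

def is_normal(S, G):
    """True when gS == Sg as sets for every g in G."""
    return all(left_coset(g, S) == right_coset(S, g) for g in G)

# The center: elements that reorder with every element individually.
center = {z for z in G if all(compose(g, z) == compose(z, g) for g in G)}

print(is_normal(N, G))   # True
print(is_normal(H, G))   # False
print(center)            # {(0, 1, 2)}, i.e. only the identity

# The elements of G/N are the distinct cosets gN.
quotient = {left_coset(g, N) for g in G}
print(len(quotient))     # 2

def coset_product(A, B):
    """Multiply two cosets elementwise."""
    return frozenset(compose(a, b) for a in A for b in B)

# (gN)(hN) == (gh)N for every g and h in G.
products_agree = all(
    coset_product(left_coset(g, N), left_coset(h, N)) == left_coset(compose(g, h), N)
    for g in G for h in G
)
print(products_agree)    # True
```

Six elements divided by a normal subgroup of three leaves a quotient of two, much as $6 / 3 = 2$; the subgroup containing a single swap is not normal, so no such division by it is available.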
Again, it is not important to understand this example in detail, and we will discuss it properly later; however, the example does illustrate the point. We take a definition of a group and construct a group product, and a subgroup, then the conditions necessary for a normal subgroup and thus a quotient group, using our intuitions about multiplication and division of numbers as a guide. If we restrict our attention to the finite groups, then we may ask whether or not this notion of divisibility admits a version of 'prime numbers' for groups, and indeed it does: these are the 'simple' groups, which have no normal subgroups to divide by other than the trivial ones (the identity alone and the whole group). Mathematicians have completed a de facto periodic table of them, the great accomplishment known as the classification of finite simple groups. It has been proved from the underlying axioms that all possible finite simple groups are accounted for, and that it is impossible for a finite simple group to exist which does not fit into our periodic table.
From here, the intuition that groups describe "symmetries" of objects now has strong rules: the symmetries of objects decompose into now well known groups and we may say with certainty what the atomic components of these symmetries are as well as studying those atoms. Our theory has yielded an unfalsifiable taxonomy and practical implications for all systems with discrete symmetries.
Many fields in mathematics follow a similar pattern as laid out in the example about mammals which we then applied to group theory. The protocol, if you can even call it that, is thus: strictly define a core object, preferably abstractly so that later on you can apply the structure to anything that fits the bill (e.g. symmetries described by group theory); create sub-labels (e.g. subgroups, some subgroups are 'normal' or 'central') and characterize their properties to build an understanding of what these labels are or what they mean, possibly relating the structure to something more familiar (e.g. product groups and quotient groups as analogous to multiplication and division of numbers); and finally, substantiate the utility of this theory if you have not already, which we did by saying that 'groups describe symmetries', which is formalized in the study of 'group actions'.
With this pattern in mind, we are able to collect 'theories' or academic sub-disciplines centered around a school of thought, each with a certain descriptive capability. With a large collection of theories each studying some construction that 'appears everywhere', we assemble an increasingly comprehensive framework that is able to provide already well studied models for new phenomena that we discover. For instance, once the differential equation is well studied, the experimental evidence that not a magnetic field but a change in magnetic fields induces a current, allows the articulation of Faraday's law. Or we could apply our models retroactively to draw new conclusions: man has always known that fluid flows and perhaps at times even understood many things about its flow, but only with an understanding of vector calculus and continuum mechanics did it become reasonable to describe Navier-Stokes equations, and only much later did we discover a framework by which to use this model computationally (e.g. finite volume methods) when we could not solve the equations analytically. So to widen this arsenal of theories is to widen our foreknowledge; the physicist need not try every experiment he can imagine, only a choice few and the rest can be tested at the blackboard since the proper mathematical framework is already explored.
With this in mind, I must add an essential disclaimer that will also characterize our motivations and the rules we follow in line with those motivations. We must consider the axis of mathematical 'purity'.
In fact we do call it Pure Mathematics to be concerned primarily with abstractions rather than their consequences, and Applied Mathematics when we try to take our grand collection of theories and see if they make valuable predictions about, or tools for, manipulating the world. I would posit that this is a microcosm of the axis I spoke of earlier, placing mathematicians between logicians and physicists, with the pure mathematicians closer to the logicians and the applied mathematicians closer to the physicists, but with a particular asterisk. One must not think that the applied mathematicians, or the physicists for that matter, are being in some way illogical. Rather, the task of translating a setting we are concerned with studying into a mathematical framework requires reasoning outside of the pure mathematical constructions, using an understanding of the setting itself.
My favorite example is this: say we are studying the way that thermal energy disperses within a solid. That is, we have for example a long metal cylinder lying on a heating pad that keeps the bottom of the cylinder very hot while the top is cooler. This can be studied using the heat equation, which when solved, will provide each point in a coordinate system, let us say $(x,y,z)$, with a temperature at a given time $t$. However, what happens if we lift up the cylinder and bend it? Suddenly we have a problem in our mode of analysis: is our coordinate system fixed in space, or fixed relative to the material? If it is fixed in space, our description will surely become inaccurate, because by bending one end of the cylinder down, we have moved material that was further from the heating pad toward the bottom, where hotter material would be if the cylinder had not been bent. If we remain in fixed spatial coordinates, we would go to that location and instead find material colder than it should be.
Although it seems obvious to us, as creatures that live in this world and have become familiar with its rules, this constitutes a very serious alteration to our mathematical model. We know that temperature is not a property of space, it is a property of material, and a hot piece of material retains its heat when moved. Our mathematical model does not know this unless it is told, and to do that, we must construct an alternative coordinate system that can track all pieces of material based on their locations at some fixed time 'before' they were moved (in fact this is called Lagrangian coordinates). Then the $(x,y,z)$ we speak of would not refer to a position at a given time, but only to the position that a piece of material has at time zero, prior to any deformations. It must be emphasized that we decide to use a different coordinate system in this way on the basis of fundamentally qualitative reasoning; there is no formal deduction we can make that tells us 'if material property then material coordinates' other than the knowledge we have that this is the mathematical description of such a system. When we do this, we are necessarily engaging in Natural Philosophy, and this is not something we can escape as formal reasoning about the world must be founded upon knowledge of the world.
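The difference between the two coordinate systems can be seen in a toy calculation. This sketch makes heavy simplifying assumptions that are not in the example above: the solid is a one dimensional bar that is slid sideways rather than bent, heat conduction is switched off so each piece of material keeps its temperature, and the temperature profile and all function names are invented.

```python
def initial_temperature(X):
    """Temperature of the piece of material that sat at position X at time zero.

    Hypothetical profile: 100 degrees at X = 0, falling linearly to 40 at X = 1.
    """
    return 100.0 - 60.0 * X

def motion(X, shift):
    """Current position of the material point labelled X after sliding the bar."""
    return X + shift

def inverse_motion(x, shift):
    """Label X of the material point currently found at spatial position x."""
    return x - shift

def temperature_material(x, shift):
    """Material view: find which piece of material is at x, then read its temperature."""
    return initial_temperature(inverse_motion(x, shift))

def temperature_spatial_naive(x, shift):
    """Mistaken view: treat temperature as attached to the location and ignore the motion."""
    return initial_temperature(x)

shift = -0.25   # slide the bar a quarter of its length toward the hot end
x = 0.5         # a fixed location in space

print(motion(0.75, shift))                   # 0.5: the material labelled 0.75 now sits at x
print(temperature_material(x, shift))        # 55.0
print(temperature_spatial_naive(x, shift))   # 70.0
```

The location-based model expects 70 degrees at that spot, but the material actually found there is the cooler piece that started further from the hot end. Nothing in the arithmetic tells us which function to use; that choice comes from knowing that temperature travels with the material.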
One may argue then that pure mathematicians in fact have it easier than applied mathematicians: everything relevant to a pure mathematician's problem is written down for him clearly. He needs only be a puzzle solver. By contrast, a good applied mathematician must be a Natural Philosopher of his object of study and make qualitative judgements. And this makes him vulnerable, in a way that the pure mathematician is not, to having reasoned about his study incorrectly or having formalized a statement that is in fact not true. (The pure mathematician, when told the model he is studying does not reflect reality, will laugh and say something to the effect of 'I'm sure it models something else then' or 'I'm still curious about this system' and resume his study regardless.)
It is under this mode of operation that we say that every mathematical theory of a phenomenon is a formalization of a philosophy of that phenomenon. This distinction is an important one, as it is clear that not every philosophy of phenomena is 'true', but it may well be convincing in a way where it is at least logically self consistent. This creates a problem that mathematicians themselves prefer not to be responsible for, and perhaps rightly so, which is that it is entirely possible to create a self consistent mathematical description of a phenomenon which has little relation to the actual behaviour of the real phenomenon. This is what we call 'a bad model'.
This is the double edged sword of the 'large collection of theories each studying a construction which appears everywhere' that pure mathematicians produce for us. Once the space of implications of different sets of presuppositions has been explored, we obtain a relationship between presuppositions and implications. This means that to select the things we suppose are true about a phenomenon is also to select the implications; we are no longer choosing merely the 'starting axioms' but in fact discounting entire branches of thought since we know which axioms lead to those branches. We see problems such as this in cosmology and theoretical physics: it is entirely possible to restrict our attention to models of reality that aren't clearly wrong without selecting a model that is particularly right. If you have a little bit of mathematical training, you may have even seen this yourself when people propose linear models and draw overly simple conclusions.
A different version of this accusation could also be leveled at category theory (of which I have reasons to be fond) which finds ways to express things we knew in other theories all within one theory, and then is sometimes granted some unifying ownership over all other theories when all it did was provide a language with no implications of its own. Such a pattern may seem impressive if you are unfamiliar with the subfields such a theory claims to unify, but with awareness it becomes clear that the 'unifying theory' predicts nothing (although I should emphasize category theory is much less guilty of this than other theories which I will not name here).
The broader argument I am making is also true in reverse. The failure of a school of thought to make their descriptions quantitative or algebraic is not a signal that their study cannot be understood or is not sensible, just in the same way that the mathematical self consistency of a model does not make it true. In fact it is a failure mode we are well aware of that we might discover we are "using the wrong model" when a new phenomenon contradicting our math arises. The mathematical lens of analysis is one that requires unfalsifiable rules as a prerequisite for any meaningful deductive implication. Consequently, as the complexity of a system increases, it becomes increasingly inappropriate to apply such a lens; the tools of probability and statistics can be found fit to analyze such systems regardless, however these tools carry their own intense and deep epistemological complexities that are rarely understood without particular mathematical expertise, and are famously used to mislead.
What this amounts to is that one must know there are two distinct jobs of mathematicians each with separate responsibilities and different failure modes: the puzzle solvers who do mathematics and the natural philosophers who use mathematics. For the former, we have a game that, once understood, offers a rich and fascinating landscape of closed-ended problems to solve. It is an unfortunate but necessary (and in fact extremely gratifying) responsibility of the latter, the applied mathematicians, to know the object of their study deeply, to have the knowledge-that-is-power over nature first before expecting the symbols of math to yield anything of value.
Two significant things must be said about the language of mathematics. First and foremost is to notice that it is a language. Some of this is due purely to the things that we learn in mathematics about how familiar concepts are broken down, or descriptions of them are seen in many forms. The most famous example of this is of mathematicians learning to see $x^2 + y^2$ or anything of the sort as implying some kind of circle or distance, and the mere mention of $\pi$ as implying there is a circle or sphere hiding somewhere. Of course, the plurality of such abstractions that mathematicians must learn is much greater and much more abstract than this.
But to notice that mathematics is a language brings a set of expectations about exactly how it must be learned. People often speak of mathematics as being for 'people smart enough to follow deep abstractions', but this is in my opinion a misattribution. When learning a language, it is infeasible to treat the language as merely a set of translations that must be done on each individual word; French is not merely English with all of the words spelled differently, it has a different grammar, and there are things that are easier to say in it than in English. Moreover, and more importantly, while it may often be possible between languages to find or assert literal translations between words, this will surely fail you at the point that you attempt to string these translated words into a sentence and derive meaning from that sentence.
Mathematics is the same. In order to understand mathematics, you must go further than merely taking new terms as 'meaning a thing' and translating them as necessary, and allow these new terms to become first class in your mind, along with all the connotations and deeper meanings implicit in each term. That is to say, you must at some point train yourself to stop reading sentences by translating them into English and merely allow yourself to understand them as they are spoken; in particular, the emphasis is that you must train yourself. This is a process of cognitive adaptation that occurs naturally with some effort over an extended period of time, and one's ability to achieve this adaptation says little about some notion of abstract intelligence. You are almost never 'too dumb for math', but one does not learn a language in a day.
Secondly, we must speak about the grammar of mathematics, and the writing forms appropriate to it. Mathematics remains remarkably successful at cultivating an image of mechanical logic, regardless of the disclaimers I make in the previous subsection, and this is not without reason. There is a strong argument that mathematics is, before everything else, a literary tradition (coinciding with it as a language), and it has the cultural and linguistic features of this literary tradition to thank for its achievements. Indeed, in the next section, we will discuss how the language of mathematics mostly maps on to a structure robust enough that it can be understood and checked by a computer; that the structure of this tradition lends itself to such a standard, despite the appropriate asterisks, is no small feat.
In order to participate in this literary tradition, we must spend some time (at the risk of becoming a bit drier) discussing the literary structure and features of a mathematical text.
In mathematical literature, there exist much more formal rules for certain features of writing structure. That is, where an English teacher might tell you an essay comes in discrete units of introductory and distinct argumentative paragraphs, mathematics comes in units of text that are far more distinct than in other forms of writing, even to the detriment of typical notions of readability. The reason for this is straightforward once you are familiar with the true goals of mathematical literature: you might say that mathematical writing is in fact intended almost as computer code (although I would argue the genealogy is reversed given the comparative ages of the two disciplines). When, as both a programmer and one with mathematical literacy, one inspects a terse math book, one immediately sees the telltale pattern of blocks of code, functions, class or typeclass instantiations, each interspersed with comments and documentation. This relationship is made somewhat obvious in the school of literate programming.
This is the error in trying to 'read' a math textbook in the way that one reads any other piece of literature. In fact, many math books are published with the explicit intention of being formal references, much in the same way that one writes a package providing high level tools for other developers; e.g. one does not rewrite the Windows API each time one writes an application in Windows that opens a page on the screen, because there are sophisticated tools written to do this already that are reused and maintained separately. Similarly one does not always strictly define every term and theorem before publishing a math paper; there is a corpus of written constructions that one can refer to even by name as necessary. The flip side of this is that one does not naively read a math book unless it is explicitly written to be forgiving. Some authors move away from this tradition in certain books which are intended explicitly to introduce a subject; however, the consequence of this is that the book then makes a terrible reference after having educated the reader, a problem made worse when you consider that the book may have slight differences in implementation of ideas, and so is not in perfect agreement with other books which are intended as a reference.
My intention with this text is to take the best of both literary styles: one that is intensely descriptive and one that sets out strictly named statements, placed side by side.
Some of the description of mathematical literature must be postponed until our discussion of propositional logic, as mathematics inherits a lot from logic. More still must be kept in a separate glossary of mathematical grammar patterns, since there are more than can fit nicely in a narrativized exposition.
There are three core types of formal blocks in mathematical writing, and a few other less formal blocks which are nonetheless made distinct. It is common to either name or number these blocks since they must also be referred to formally.
Definitions: These define new mathematical constructions, such as labels for functions, or conditions under which we classify certain constructions. These often come in the form of an object or collection of objects, often along with a proposition about that object, and act as a rule for constructing an object in a formal sense which will be discussed more in its own chapter. We may also include in definitions the rules for how we write these constructions and what notation exists around them here or in other literature you may find on them. To set up a definition, it is common for us to set up a circumstance in which the definition may apply, saying "Let $f\colon A \to B$ be an $X$ and ... then we call $f$ and $Y$ together a $Z$ if ..." where $Z$ is the new label we are creating, which has no strict meaning other than what we give it here. The use of 'let' should usually be taken to mean something like "should we find ourselves in a circumstance where..." for the purposes of definitions. Definitions (and theorems for that matter) often employ a grammatical rule in which we may use parentheses following certain terms to duplicate a statement into another statement that says something near identical, ideally with the word 'respectively' included or abbreviated to indicate what we are doing, but this is not always the case. An example of this would be "when a number satisfies $x > 0$ ($x < 0$) we say that it is positive (negative)".
It is informally important to give defined concepts good names, so that people may develop appropriate associations and intuitions; we may encounter names that seem unintuitive but in these cases it is important to respect and understand historical associations built up by the literary tradition.
Theorems: These define rules under which one proposition might imply another, or imply each other thereby becoming equivalent statements. They are, in a sense, the functions of mathematical propositions, and they also adopt the language of setting up circumstances and conditions for their application as in definitions. That is, while the exemplary theorem may be written "if $X$ then $Y$", in practice they are often written "Let $X$ be true. Then $Y$ is true".
There are a few logical operations or quantifiers, such as "for all" and "there exists", which can be referred to in several ways that may at first seem ambiguous. We will discuss their formal meanings in our chapter on propositional logic, but it is important to know that the formal meanings referred to there are equally referred to by "for any" or "there is a". "For all $X$, we satisfy $Y$" can also be written in shorthand as $\forall X, Y$ or phrased as "choose some $X$. Then $Y$", implying that we could have chosen any other. "There exists $X$ such that $Y$" is written in shorthand as $\exists X, Y$, or as $\exists ! X, Y$ when we mean "there exists uniquely", i.e. such an $X$ exists, and if $Y$ holds for both $X$ and $Z$ then $X = Z$. Logical implication is often written with a double-lined arrow $\Rightarrow$, and when this implication goes both ways, both $X \Rightarrow Y$ and $Y \Rightarrow X$, we say "$X$ if and only if $Y$" or write $X \Leftrightarrow Y$.
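To make this shorthand concrete, the following display spells out a few statements in symbols. The predicate $P$, the statement about real numbers, and the symbol $\wedge$ for "and" are illustrative assumptions rather than notation fixed by this text, and the snippet assumes the amssymb package.

```latex
% "For all x there exists y such that x + y = 0."
\[ \forall x \in \mathbb{R},\ \exists y \in \mathbb{R},\ x + y = 0 \]

% Unique existence unpacked: some x satisfies P, and anything
% else satisfying P must be equal to it.
\[ \exists ! x,\ P(x)
   \;\Leftrightarrow\;
   \exists x,\ \bigl( P(x) \wedge \forall z,\ ( P(z) \Rightarrow z = x ) \bigr) \]

% "If and only if" as a pair of implications.
\[ (X \Leftrightarrow Y)
   \;\Leftrightarrow\;
   \bigl( (X \Rightarrow Y) \wedge (Y \Rightarrow X) \bigr) \]
```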
Since theorems may also show that two definitions are equivalent, they sometimes serve as the follow-through of a set of definitions, carrying the notation we will use when writing about the described concepts. Lastly, theorems are referred to by many names, each with connotations about the theorem being described. These are:
Theorem: a particularly important discovery either for its utility to us in the text we find it, or historically.
Lemma: a less significant insight which will be very useful for proving other theorems.
Proposition: a theorem of moderate importance, between that of a lemma and a theorem. Although "proposition" should refer to a statement that could be true in principle, the name is very often used simply to mean theorem when it punctuates a relevant insight. On rare occasions, one may write a proposition formally as a conjecture, provided without proof, and then disprove it later to make a point; however, this practice is not common.
Corollary: a theorem that is obvious or especially easy to prove as a consequence of a theorem or proposition. The statement of a corollary is exemplified in the format "it follows simply from $X$ that $Y$".
Proof: These usually follow directly after theorems and show the process of deducing the conclusion of a theorem from its premises. It is common to describe these premises as the 'hypothesis' rather than the theorem itself, and so a proof will usually start with setting up the context under which the theorem would apply, e.g. "assume/presume by hypothesis that ...". Depending on the style of proof, we may first manipulate the proposition we want to prove into something that is equivalent on purely logical grounds, such as identifying multiple goals for proof (in the case of "if and only if"), producing a contrapositive statement, or setting up a proof by contradiction. In the case of the latter, a proof might start with "Assume for the sake of contradiction that..." and then later conclude with "...violating our hypothesis". When the statement we are proving includes "there exists $X$ such that $Y$" we must provide an $X$ that satisfies $Y$, and in doing so we may say "Propose $X = $..." and then go on to prove that it satisfies $Y$; another name for such an $X$ that we propose, or more generally for a form $X$ could take once some details are resolved, is an ansatz.
Moreover, in the process of a proof we will often have to apply prior theorems, e.g. "by theorem 4.13, we have...", or inspect the meaning of our hypotheses and expose the inner statements of their propositional form, thereby exposing variables. This can be messy, since we may have a theorem, for example, that is "for all $\epsilon > 0$, there exists $\delta > 0$ such that ..." and then need to apply this twice with different $\epsilon$. In such cases, we try either to resolve whatever we were using the theorem for, so that it is clear its inner variables have no bearing on what comes after, or to label our variables with subscripts when we invoke them, e.g. "from our hypothesis, we have that for all $\epsilon_1 > 0$ ...", so that our next invocation can then be labeled $\epsilon_2$. The naming and renaming of such variables, which are usually hidden away in the statement of a theorem, is highly contextual. This contextual awareness is also often expected for certain 'obvious' deductions: when we say "applying theorem 4.13..." it is sometimes assumed that we have used this theorem recently and often enough, or defined it in such a way, that it is obvious how to satisfy its hypotheses in order to apply it. Additionally, when the proof we are doing has multiple goals but these goals are similar enough, we may only prove one goal and use the expression "without loss of generality". This means that the structure of the proof we have done for the case we cover is near identical to the proof for the case we have not covered, barring some extremely minor alterations.
Finally, it is common to end the proof with some signifier. This is usually a square on the right hand side, indicating the proof is over, but it is also common to write Q.E.D., referring to the Latin "quod erat demonstrandum", literally meaning "what was to be shown", i.e. we say this after we have deduced the thing we wanted to show. A professor of mine once joked that it is perfectly valid to simply draw a picture of a cat at the end of your own proofs, as long as you are relatively consistent about it.
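To see several of these conventions at work together, here is a sketch of a short theorem and proof. It is only an illustration: it assumes the standard $\epsilon$-$\delta$ definition of continuity at a point, which this text has not yet introduced, and the amssymb package for $\mathbb{R}$ and the closing square. Note how the two hypotheses are invoked with subscripted variables, and how $\delta$ is proposed before being checked.

```latex
\textbf{Theorem.} Let $f, g\colon \mathbb{R} \to \mathbb{R}$ be continuous
at a point $a$. Then $f + g$ is continuous at $a$.

\textbf{Proof.} Let $\epsilon > 0$. By hypothesis on $f$, for all
$\epsilon_1 > 0$ there exists $\delta_1 > 0$ such that
$|x - a| < \delta_1$ implies $|f(x) - f(a)| < \epsilon_1$.
By hypothesis on $g$, for all $\epsilon_2 > 0$ there exists
$\delta_2 > 0$ such that $|x - a| < \delta_2$ implies
$|g(x) - g(a)| < \epsilon_2$.
% Both hypotheses are invoked with the same target, epsilon / 2.
Take $\epsilon_1 = \epsilon_2 = \epsilon / 2$ and propose
$\delta = \min(\delta_1, \delta_2)$. Then $|x - a| < \delta$ implies
\[
  |(f + g)(x) - (f + g)(a)|
  \le |f(x) - f(a)| + |g(x) - g(a)|
  < \frac{\epsilon}{2} + \frac{\epsilon}{2}
  = \epsilon .
\]
\hfill $\square$
```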
These blocks are particularly valuable when using a math text as a reference, since it is clear that anything not highlighted in such a block is merely of pedagogical value and can be ignored when the content is no longer unfamiliar. There are a few other named blocks of information we should list for completeness, as we will also use them.
Example: an example intended to make the practical usage/instantiation of a theorem or definition concrete, often with an instance you might already be familiar with. Sometimes examples will come in groups, starting with relatively trivial instantiations that don't show much of the consequences of the thing being demonstrated, then gradually moving towards more sophisticated examples. This is a good opportunity to talk about what is meant by 'trivially' in mathematics: you could, for example, construct a system of numbers that has a multiplication rule trivially by saying the system of numbers is just the number $1$ and the only multiplication you can do is $1 \times 1 = 1$. There is also the notion of a 'pathological case' or an example that is pathological: oftentimes when we build up mathematical theories, we intend for them to describe some intuition formally, and it is sometimes possible within the framework of such a theory to construct an example which violates all of these intuitions but also isn't particularly useful for describing anything. These cases are useful for understanding what structure is not implied by the rules we have set, and when we need to add additional rules to avoid such cases. There is a pattern I find quite irritating in mathematics texts of using 'Examples' not only to give intuition but also to indicate examples which are vitally important for later discussion. It is my opinion that one should be able to skim or skip past examples on repeated readings, and as such, they should be there only to serve the first-time reader when they feel uncertain about a concept. For that reason, many of the 'examples' of this text will in fact be written in definitions or theorems of their own, signified as part of the continuous exposition in a way that the boxed 'Examples' will not.
Remark: As an extension of the blocked-information format, these can sometimes be useful as an extended footnote about something we have recently mentioned, either to give intuition or speak on the limitations of a tool or concept.
Notation: These will be useful occasionally to define new shorthands for things that we have already defined, or ways of writing things which do not fit neatly under the umbrella of a construction. When defining a symbol in a purely notational manner, it is common to use $:=$ to mean 'defined as equal to'. We may also sometimes speak of 'abuse of notation', in which we apply the notation of one concept to a different concept when we wish to think of them similarly, even if they are quite different; in practice, however, what counts as 'abuse' and what is considered a broadening of what the notation should be used for is a matter of opinion.
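As a small sketch of a purely notational definition, the following introduces a shorthand with $:=$. The symbol $\mathbb{R}_{>0}$ is an illustrative choice of mine rather than notation this text relies on, and the snippet assumes the amssymb package.

```latex
% The symbol on the left is new; the right-hand side uses only
% things we already understand.
\textbf{Notation.} We write
\[
  \mathbb{R}_{>0} := \{\, x \in \mathbb{R} : x > 0 \,\}
\]
for the set of positive real numbers.
```

A common instance of abuse of notation in this spirit is speaking of "the function $x^2$" when we strictly mean the function that sends each $x$ to $x^2$.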
You may have also noticed that we primarily use first person plural pronouns; this is almost universal in mathematical writing. I have heard conflicting stories about this, but I am told this is called the Royal We or Author's We. Certainly, in the way I use it and have come to understand it, it has two interchangeable modes: 'we' may mean I, the author, together with you, the reader, as though we step through mathematical procedures together, as if by guidance; and on occasion 'we' means mathematicians as a class, or a subclass of mathematicians who have developed some tradition, e.g. "when X we write Y as Z". With this in mind, I make a certain distinction about when to use 'we' and when to use 'I' based on whether I am speaking about something of my own personal understanding.
The above, and indeed many of the instructions I have tried to preempt, may seem abstract for now. The following two chapters should serve to specify the firm symbolic meanings described above and contextualise the softer intuitions I have mentioned. Proceeding to them, you may want to keep this section handy as a reference for the time being.