ClocksSugars' Blog

My Blog and the home of Application Unification

Matrix Groups as Discussed on 03/13/26

The following is the article version of a talk given on 03/13/26. The intended audience of the talk is a split demographic of programmers/machine learning engineers and various levels of academicians/hardware engineers. Consequently the first part of this article will be focused on a highly pedagogical description of basic group theory and Lie groups absent much discussion of differential geometry. This article is not intended to give a one-to-one correspondence with the content of the talk, but rather to formally detail the scaffold of the talk. Accordingly, it is written to supplement the talk and to be able to stand on its own as a resource, but not as a transcript or an intrinsically motivated text.

To the engineer with a mild interest in mathematics (who has done well to avoid any serious algebraic courses), one's primary exposure to anything group-like is matrices. This is not exactly an accident on the part of the mathematicians who have come up with these formalisms; in fact the entire field of representation theory studies which groups can have their structure encoded in matrices (or rather linear maps) and what one can glean when this is possible. Additionally, some of the language of linear algebra is now standardized in such a way to match with the theory of groups; terms like the 'kernel' of a matrix show up meaning roughly the same thing in group theory.

This creates a problem for any pedagogical efforts around group theory aimed at a 'mature audience': a significant amount of theory must be developed before it becomes concrete to the engineer what exactly was gained from the theory of groups. After all, if you understand matrices, and matrices describe groups, surely you understand groups. In truth, there is simply much, much more to be known about matrices, never mind the depths to be known of groups.

1. What is a group?

We will focus here on an extrinsic discussion of groups. That is, for better or for worse, most of this article will concede any argument one might want to have about representation or parametrization of groups and simply consider them to be sets of matrices. In this case, the necessary definition is as follows.

Definition 0326-matrixgroups.1: Matrix Group

Let $G \subset \reals^{m \times n}$ for some $m, n \in \mathbb{N}$. We say that $G$ is a multiplicative group if $m=n$ and the following conditions are satisfied.

  1. (Closure) For any pair $A,B \in G$, their product $AB$ is also a member of $G$, so the group cannot be escaped with multiplication from within the group.

  2. (Identity) The identity matrix $I \in \reals^{n \times n}$ is present in the group, i.e. $I \in G$.

  3. (Inverses) All group members have their inverse elements present, i.e. for all $A \in G$, we have $A^{-1} \in G$ such that $A A^{-1} = A^{-1} A = I$.

Otherwise, we may also define an additive group if the following are satisfied.

  1. (Closure) For any pair $A,B \in G$, their sum $A + B$ is also a member of the group $G$.

  2. (Identity) The additive identity, the zero matrix $0 \in \reals^{m \times n}$, is present in the group, i.e. $0 \in G$.

  3. (Inverses) All group members have their negative elements present, i.e. for all $A\in G$, we have $-A \in G$ such that $A + (-A) = (-A) + A = 0$.

We try to keep our notation agnostic to additive and multiplicative groups, describing either as group multiplication and denoting it $gh$ for any $g,h \in G$.
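The three multiplicative conditions can be checked mechanically on a small finite example. The following is a minimal sketch, assuming matrices are stored as tuples of tuples with integer entries; the helper names `matmul` and `is_multiplicative_group` are illustrative and not part of the talk.

```python
from itertools import product

def matmul(a, b):
    """Multiply two square matrices stored as tuples of tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

def is_multiplicative_group(elements):
    """Check closure, identity, and inverses for a finite set of square matrices."""
    elements = set(elements)
    n = len(next(iter(elements)))
    identity = tuple(tuple(int(i == j) for j in range(n)) for i in range(n))
    closed = all(matmul(a, b) in elements for a, b in product(elements, repeat=2))
    has_identity = identity in elements
    has_inverses = all(
        any(matmul(a, b) == identity and matmul(b, a) == identity for b in elements)
        for a in elements
    )
    return closed and has_identity and has_inverses

# Quarter-turn rotations of the plane: a four-element matrix group.
I2 = ((1, 0), (0, 1))
R = ((0, -1), (1, 0))
quarter_turns = {I2, R, matmul(R, R), matmul(R, matmul(R, R))}

print(is_multiplicative_group(quarter_turns))  # True
print(is_multiplicative_group({I2, R}))        # False: the product R*R escapes the set
```

The second call fails on closure alone, which is the sense in which a group "cannot be escaped with multiplication from within".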

When defining a group in a more purely algebraic context, it is common to replace the closure requirement with a group multiplication operation $\square \cdot \square: G \times G \to G$ (implied to only be able to produce elements of the set $G$ by construction) with the property of associativity, i.e. $(gh)k = ghk = g(hk)$ with the order of bracketing not mattering. It is also sometimes appropriate to emphasize the lack of commutativity, i.e. $gh \neq hg$ in general. However, these two properties are in some sense assumed if the reader knows anything about matrices. Matrix multiplication (and of course addition) is already associative (and generally known to not be commutative), but in such a context, it becomes important to emphasize that we are only interested in sets of matrices which multiply into themselves, i.e. are closed. This emphasis is doubly important once subgroups are introduced, but we'll hold that thought for now.

Any discussion of groups is incomplete to the point of bordering on dishonesty if the group homomorphism is not discussed. We owe to the homomorphic property most of the results of group theory.

Definition 0326-matrixgroups.2: Group Homomorphism

Let $G$ and $H$ be groups. We say that a map $f\colon G \to H$ is a group homomorphism if for any $g,h \in G$ and the corresponding $f(g),f(h) \in H$, they satisfy $$\begin{gather*} f(g\cdot h) = f(g) \cdot f(h) \end{gather*}$$ that is, sending the multiplication as defined for $G$ between $g$ and $h$ to a multiplication as defined for $H$ between $f(g)$ and $f(h)$. Homomorphisms are so called because they are structure preserving maps, in particular preserving the group structure.

It is here that we must point out that much of linear algebra can be read as a very special case of group theory, focused on the additive group of column matrices $\reals^{n \times 1}$. First, notice that the column matrices do in fact form a group under addition: we may freely add them to obtain another column matrix (satisfying closure), we have the zero vector (satisfying identity), and we have negative vectors (satisfying inverses). When the homomorphic property is expressed for the group of column matrices, we see that it is $$\begin{gather*} f(u + v) = f(u) + f(v) \end{gather*}$$ for any $u,v \in \reals^{n\times 1}$. In other words, every linear map on vectors is a homomorphism of this additive group (linearity additionally asks that scalar multiples are respected, which any continuous additive map does automatically). Such an example is useful in pointing out that aspects of group theory are far more pervasive than one might realise, but the simplicity of the example does us a disservice. Additive groups, because they are commutative, have little structure more than what you might think of as their 'dimension' (as well as how cyclic they are, but we'll get to that shortly). However this does mean we immediately have at least some intuition for subgroups. When we take a lower dimensional subspace in $\reals^n$, we know that addition within this subspace is closed, so we cannot escape it from within, and that orthogonal subspaces should not interfere. Something very close to this will be the case for subgroups.

This analogy also paints a good picture for how we should expect homomorphisms to help us in identifying subgroups. If we are to focus on vectors, not merely as a space of positions but as a space of actions, principally concerned not with the end point of their arrow (and thus neither concerned with the starting point of their arrow), then it should be obvious to us in our column matrix description that the directions of these arrows are innately separable. But absent that way of writing them, it is also obvious that the following linear operations actively separate dimensions from one another.

$$\begin{gather*} \begin{bmatrix} 1 & 0 & 0 \\[0em] 0 & 1 & 0 \\[0em] 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\[0em] y \\[0em] z \end{bmatrix} = \begin{bmatrix} x \\[0em] y \\[0em] 0 \end{bmatrix} \\[0.7em] \begin{bmatrix} 1 & 0 & 0 \\[0em] 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\[0em] y \\[0em] z \end{bmatrix} = \begin{bmatrix} x \\[0em] y \end{bmatrix} \end{gather*}$$

In fact, homomorphisms, when they are not isomorphisms (i.e. bijective or invertible homomorphisms), act precisely in this way to flatten out or project down something approximating a dimension of a group. These two instances are listed separately because they also make distinct something that will become important to us. In the first equation, our linear map $A$ is $A\colon \reals^3 \to \reals^3$ but can only produce outputs within a two dimensional subspace of $\reals^3$. In the latter, we eliminate any pretense of being in $\reals^3$ and speak of a linear map $B$ as $B\colon \reals^3 \to \reals^2$. Since we can just as easily insert elements of $\reals^2$ into $\reals^3$, one might think of the inclusion map $\iota \colon (x,y) \mapsto (x,y,0)$ in composition with our linear maps to define $A = \iota \circ B$. This is fine, useful even, but very importantly distinct as we will also shortly see.

A final remark must be made before we describe subgroups and their kinds in detail. I would argue that the description of a group as a set of actions is the proper one (as motivated by the theory of group actions), and this intuition will serve as a basis in this article. That is to say, when we think of a group, we should think that the group has some underlying thing it acts on; in the case of linear maps, this is a vector space which it stretches, skews, and rotates. In this frame, the property that a group must have an identity and inverses becomes contextualized; it must of course be able to do nothing to the underlying object (hence we need an identity action), and in particular we are only interested in things we can do to this object which are reversible, lest we find our exploration of properties leading us to a stuck-state. But this also contextualizes our repeated interest in separable structure, things approximating dimensions, subgroups, etcetera. The theory of groups is built with the capability that it tells us which actions on an underlying object will not interfere with one another, or in what ways they will interfere.

2. Subgroups

Above we spoke of additive matrix groups, namely vector spaces, but our principal interest going forward will be in square matrix groups under multiplication. So in order to speak of subgroups, we will need to define the maximal group from which those matrix groups come, and this is called the general linear group, denoted $\mathrm{GL}_n(\reals)$ for the group of $n\times n$ matrices with real entries. The sole requirement for a matrix to be a member of this group will be simply that the matrix has an inverse. This does more than just satisfy the inverse requirement of groups though, as we see from studying our first square matrix homomorphism. With any luck, you are already familiar with the matrix determinant, which satisfies $\det(AB) = \det(A) \det(B)$, an obvious multiplicative homomorphism.

It must be observed that this homomorphism, as a homomorphism, is mapping into a group, yet we think of the determinant as producing a mere number. The trick here is that numbers also form a group under multiplication, since we have an identity, the number one, and inverses for all $x \in \reals$, the number $1/x$, with the sole exception of zero. Accordingly, the multiplicative group of real numbers $\reals_\times$ must be built on the set $\reals \setminus\{0\}$, i.e. $\reals$ without zero, in order to satisfy the group axioms. Thus it becomes clear why non-invertible matrices must be excluded from a group; non-invertible matrices are 'zero-like', and so permitting them for multiplication would get us 'stuck' in the set of matrices with determinant zero.

In noticing such a structure in the output of the determinant, we are able to pull our notions of zero-ness and many other intuitions about real numbers back to the matrices. This pattern is an invaluable application of homomorphisms; the determinant map is able to ignore everything that is not 'number-like' about a matrix, turning matrix multiplication into number multiplication, a simpler group with which intuitions become much more obvious, and remain applicable on the original group. But in a very real sense, this $\reals_\times$ was hiding in the general linear group all along. For instance we could simply speak of matrices $$\begin{gather*} A_\lambda = \begin{bmatrix} \lambda & \vec{0}^T \\[0em] \vec{0} & I_{n-1} \end{bmatrix} \in \mathrm{GL}_n(\reals) \end{gather*}$$ which have some number $\lambda \in \reals_\times$ in the top left followed by an identity matrix of size $(n-1) \times (n-1)$, and zeroes $\vec{0} \in \reals^{(n-1) \times 1}$ in the remaining spots. Such a matrix would satisfy $\det(A_\lambda) = \lambda$, and for a similar matrix with $\mu \in \reals_\times$ instead of $\lambda$, we would have $A_\lambda A_\mu = A_{\lambda \mu}$ and $\det(A_{\lambda \mu}) = \lambda \mu$. In this way, we can embed $\reals_\times$ in $\mathrm{GL}_n(\reals)$ and see that it forms a self contained subgroup with all the same dynamics of $\reals_\times$.
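Both claims in this passage, that the determinant turns matrix products into number products and that the matrices $A_\lambda$ multiply like the numbers they carry, can be checked with exact arithmetic. This is a minimal sketch assuming nested lists of `Fraction` entries; `det`, `matmul`, and `embed` are illustrative helper names, and the cofactor expansion is chosen for brevity rather than efficiency.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def embed(lam, n):
    """The matrix A_lambda: lam in the top-left corner, identity elsewhere."""
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    m[0][0] = Fraction(lam)
    return m

A = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]
B = [[1, 2, 0], [0, 1, 0], [4, 0, 1]]

# The determinant is a homomorphism: det(AB) = det(A) det(B).
det_is_multiplicative = det(matmul(A, B)) == det(A) * det(B)

# The embedded copy of the nonzero reals multiplies like the reals do.
lam, mu = Fraction(3, 2), Fraction(-4)
embedding_is_multiplicative = matmul(embed(lam, 3), embed(mu, 3)) == embed(lam * mu, 3)
det_recovers_lambda = det(embed(lam, 3)) == lam

print(det_is_multiplicative, embedding_is_multiplicative, det_recovers_lambda)
```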

Definition 0326-matrixgroups.3: Subgroup

Let $G$ be a group and let $H$ be a subset of $G$. We say that $H$ is a subgroup of $G$ if it satisfies the group properties (i.e. closure, identity, inverses) under the same group multiplication operation. The closure property is of particular importance here since one must not be able to escape the subgroup from within it.

It is not always the case that the existence of a homomorphism implies that the output of the map is a subgroup within the input group. This is seen most obviously if one studies the following homomorphism: $$\begin{align*} f\colon & \reals_+ \to \mathrm{GL}_2(\reals) \\[0em] f\colon & x \mapsto \begin{bmatrix} \cos(2\pi x) & \sin(2\pi x) \\[0em] -\sin(2\pi x) & \cos(2\pi x) \end{bmatrix} \end{align*}$$ that is, the map that sends a vector space of dimension 1, the additive reals group, to $2\times 2$ rotation matrices. In brief, we prove the homomorphic property using the angle addition formulas $$\begin{gather*} \cos(a + b) = \cos(a) \cos(b) - \sin(a) \sin(b) \\[0em] \sin(a + b) = \sin(a) \cos(b) + \cos(a) \sin(b) \end{gather*}$$ to show $$\begin{align*} &f(x)f(y) \\[0em] =&\begin{bmatrix} \cos(2\pi x) & \sin(2\pi x) \\[0em] -\sin(2\pi x) & \cos(2\pi x) \end{bmatrix} \begin{bmatrix} \cos(2\pi y) & \sin(2\pi y) \\[0em] -\sin(2\pi y) & \cos(2\pi y) \end{bmatrix} \\[0em] =& \begin{bmatrix} \cos(2\pi x) \cos(2\pi y) - \sin(2 \pi x) \sin(2\pi y) & \sin(2\pi y) \cos(2\pi x) + \cos(2 \pi y) \sin(2\pi x) \\[0em] - \sin(2\pi x) \cos(2\pi y) - \cos(2 \pi x) \sin(2\pi y) & \cos(2\pi x) \cos(2\pi y) - \sin(2 \pi x) \sin(2\pi y) \end{bmatrix} \\[0em] =& \begin{bmatrix} \cos(2\pi (x + y)) & \sin(2\pi (x + y)) \\[0em] -\sin(2\pi (x + y)) & \cos(2\pi (x + y)) \end{bmatrix} \\[0em] =& f(x+y). \end{align*}$$ In effect, we have turned our group of adding-numbers into rotations, with a full rotation equivalent to no rotation at all. The trouble here is that there is no such subgroup in $\reals_+$ which has a similar looping property. No matter how clever you get, if I ask you to find a non-zero number which I can add to itself repeatedly to get zero, you can't do that in $\reals$. But the homomorphism has still told us about a subgroup: if not one described directly in the output, then the one we factored out to get such an output. In this case, we have factored out the additive integer group $\mathbb{Z}_+$, the whole numbers (positive, negative, and zero) under addition. Although we cannot do scalar multiplication on integers in the same way we might with a one-dimensional vector space, it still satisfies the requirements for a group, and we can indeed imagine a 'vector of integers', certainly if nothing else as a subset of vectors.
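The rotation map can be spot-checked numerically. This is a sketch under the assumption that floating-point comparison within a small tolerance is acceptable; `close` and the sample values `x`, `y` are illustrative choices, and `f` follows the sign convention of the formula above.

```python
import math

def f(x):
    """The map from the additive reals to 2x2 rotation matrices used above."""
    c, s = math.cos(2 * math.pi * x), math.sin(2 * math.pi * x)
    return [[c, s], [-s, c]]

def matmul(a, b):
    """Multiply matrices stored as nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def close(a, b, tol=1e-9):
    """Entrywise comparison up to a floating-point tolerance."""
    return all(abs(p - q) <= tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))

identity = [[1.0, 0.0], [0.0, 1.0]]
x, y = 0.3, 0.45

# Homomorphic property: adding inputs corresponds to multiplying outputs.
homomorphic = close(matmul(f(x), f(y)), f(x + y))

# Every integer is sent to the identity: this is the copy of the integers factored out.
integers_vanish = all(close(f(k), identity) for k in range(-3, 4))

# A half turn is not the identity, so non-integers are not all collapsed.
half_turn_is_identity = close(f(0.5), identity)

print(homomorphic, integers_vanish, half_turn_is_identity)  # True True False
```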

The only guarantee we have with a subgroup is that whatever property is held by the subgroup elements is preserved within it, but this isn't necessarily a guarantee that the dynamics of the group associated with this property can be factored out. The property we have when a subgroup can be factored out, in the way that a well chosen homomorphism might do, is called being normal.

Definition 0326-matrixgroups.4: Normal Subgroup

Let $G$ be a group and $N$ a subgroup. We say that $N$ is a normal subgroup if for all $g \in G$ and $h \in N$, there exists a $k \in N$ such that $$\begin{gather*} gh = kg. \end{gather*}$$ This is often written $gN = Ng$, to imply that while $g$ does not necessarily commute (swap ordering) with elements of $N$, a commutation can be forced by replacing the member of $N$ with some other member of $N$, thus keeping within the dynamics of the group described by $N$.

The set written $gN$, where $N$ stands in for every possible element of the subgroup $N$, is called a coset. Cosets are of interest in factoring out group dynamics since if $h\in N$ then $ghN = gN$; any possible element $k\in N$ also has $hk \in N$ by $N$'s closure. Additionally, if $g,h \in G$, then $(gN)(hN) = g(Nh)N$, which by the property $hN=Nh$ is $g(hN)N = (gh)NN = (gh)N$. That is to say, these cosets can behave like a group on their own, and the group defined by factoring out $N$'s dynamics is readily defined merely by the group structure of this group of cosets. We call this the quotient group.
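The condition $gN = Ng$ is easy to test exhaustively on a small group. The sketch below is an illustration not taken from the talk: it uses the six permutations of three items, stored as tuples of images, and compares a subgroup that is normal (the three cyclic shifts) with one that is not (the identity and a single swap).

```python
from itertools import permutations

def compose(p, q):
    """Apply q first, then p (permutations stored as tuples of images)."""
    return tuple(p[q[i]] for i in range(len(q)))

def left_coset(g, subgroup):
    return frozenset(compose(g, h) for h in subgroup)

def right_coset(subgroup, g):
    return frozenset(compose(h, g) for h in subgroup)

def is_normal(subgroup, group):
    """A subgroup is normal when every left coset equals the matching right coset."""
    return all(left_coset(g, subgroup) == right_coset(subgroup, g) for g in group)

S3 = list(permutations(range(3)))
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # the three cyclic shifts
H = [(0, 1, 2), (1, 0, 2)]               # identity and one swap

print(is_normal(A3, S3))  # True
print(is_normal(H, S3))   # False

# The distinct cosets of the normal subgroup are the elements of the quotient group.
cosets = {left_coset(g, A3) for g in S3}
print(len(cosets))        # 2
```

With only two cosets, the quotient here behaves like the two-element group: "shift" times "shift" is a shift, "swap-like" times "swap-like" is a shift, and so on.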

In the case of the determinant, the output group $\reals_\times$ implies the existence of a normal subgroup which we had factored out. Since it was factored out, we expect whatever structure this subgroup described to map to the identity in the output, i.e. we are interested in what matrices have determinant equal to one. This turns out to be a sufficient condition to characterize our normal subgroup, which we call the special linear group: $$\begin{gather*} \mathrm{SL}_n(\reals) = \left\{ A \in \reals^{n \times n} \mid \det(A) = 1 \right\}. \end{gather*}$$

If you are familiar with the property of determinants to give the volume of an $n$-dimensional parallelepiped with edges defined by the columns of a matrix, this property means that a matrix in $\mathrm{SL}_n(\reals)$ may rotate, skew, and squeeze a vector space (and flip it only in orientation-preserving combinations, such as two axes at once), but never change its volume.

3. Matrix Groups of Interest

This article will primarily focus on Lie groups, i.e. continuous groups, but before we get there we should describe some other groups of interest. Chief among the discrete groups are the groups of permutations, called the symmetric groups, $\mathrm{S}_n$. In the description given earlier of groups existing to act on things, the permutations of the symmetric group act to turn one ordering of a set into another ordering. For instance, the set of numbers $\{1,2,3,4,5\}$ may be reordered by a permutation matrix in the following manner. $$\begin{gather*} \begin{bmatrix} 0&1&0&0&0 \\[0em] 1&0&0&0&0 \\[0em] 0&0&0&0&1 \\[0em] 0&0&0&1&0 \\[0em] 0&0&1&0&0 \end{bmatrix} \begin{bmatrix} 1\\[0em]2\\[0em]3\\[0em]4\\[0em]5 \end{bmatrix} = \begin{bmatrix} 2\\[0em]1\\[0em]5\\[0em]4\\[0em]3 \end{bmatrix} \end{gather*}$$

The symmetric group is of particular interest because we know that every group with a finite number of members may be written as a subgroup of a permutation group, via Cayley's theorem. The reason for this powerful fact is deceptively simple: on a finite group, we can think of the group as merely a set in some order, and to each element of the set we associate a reordering for the rest of the set. In this way, one reads a group product $gh$ as $g(h)$ with $g$ a permuting function which takes $h$ to whatever we have decided we want $gh$ to be equal in the set.
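The construction behind Cayley's theorem can be sketched directly: list the group elements in some order, and let each element $g$ reorder that list by left multiplication. The example below assumes a deliberately simple group, the integers $\{0,1,2,3\}$ under addition modulo 4, and the helper names are illustrative.

```python
def cayley_permutation(g, elements, op):
    """Return the reordering of `elements` induced by left-multiplying by g."""
    index = {h: i for i, h in enumerate(elements)}
    return tuple(index[op(g, h)] for h in elements)

def permutation_matrix(perm):
    """Matrix P with P[perm[j]][j] = 1, so P sends basis vector j to basis vector perm[j]."""
    n = len(perm)
    return [[int(perm[j] == i) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

elements = [0, 1, 2, 3]

def op(a, b):
    """Group operation of the example: addition modulo 4."""
    return (a + b) % 4

mats = {g: permutation_matrix(cayley_permutation(g, elements, op)) for g in elements}

# The assignment g -> permutation matrix respects the group operation.
homomorphic = all(
    matmul(mats[g], mats[h]) == mats[op(g, h)] for g in elements for h in elements
)
# Distinct elements give distinct matrices, so the group sits inside S_4 faithfully.
faithful = len({tuple(map(tuple, m)) for m in mats.values()}) == len(elements)

print(homomorphic, faithful)  # True True
```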

Things can get a bit more dicey if we want to describe infinite discrete groups, but doing such a thing may well leave the realm of groups which can be expressed as matrices entirely, so we'll avoid it here. Instead, we will begin our focus on infinite continuous groups, which we retain for the rest of the article.

In particular, we will be primarily interested in continuous multiplicative groups since additive matrix groups generally have trivial structure; no matter what shape a matrix has in an additive matrix group, the group may as well be described by a vector. Nonetheless, we should at least mention a few such as the group of symmetric matrices, the matrices $A = A^T$ which remain the same when transposed or flipped across their diagonal. Since two symmetric matrices $A,B$ only have $AB$ symmetric when $AB=BA$, we cannot form a multiplicative group of symmetric matrices, but since $A$ and $B$ are described by entries $A_{ij}$ and $B_{ij}$ for their $i$'th row and $j$'th column with $A_{ij}=A_{ji}$ and $B_{ij}=B_{ji}$ (i.e. the $A=A^T$ property), we also have $$\begin{align*} (A+B)_{ij} =& A_{ij} + B_{ij} \\[0em] =& A_{ji} + B_{ji} \\[0em] =& (A+B)_{ji} \end{align*}$$ so sums preserve symmetry. This is also true of antisymmetric matrices, that is, matrices with the property $A = -A^T$, which become negative when flipped. For the group of antisymmetric $4\times 4$ matrices, we have matrices of the following form: $$\begin{gather*} \begin{bmatrix} 0 & a & b & c \\[0em] -a & 0 & d & e \\[0em] -b & -d & 0 & f \\[0em] -c & -e & -f & 0 \end{bmatrix}, \hspace{2em} a,b,c,d,e,f \in \reals. \end{gather*}$$ Unlike the symmetric matrices, the antisymmetric matrices must be zero on the diagonal since the diagonal stays the same when flipped; since zero is the only number which is its own negative and the entire matrix must be negative when flipped, the part of the matrix that stays in the same place when flipped must be zero.
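A small numerical check makes the contrast concrete: sums of symmetric (or antisymmetric) matrices keep the property, while a product of two symmetric matrices that do not commute loses it. The matrices below are arbitrary illustrative choices.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matmul(a, b):
    """Multiply matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_symmetric(m):
    return m == transpose(m)

def is_antisymmetric(m):
    return m == [[-x for x in row] for row in transpose(m)]

A = [[1, 2], [2, 3]]    # symmetric
B = [[0, 1], [1, 5]]    # symmetric
C = [[0, 4], [-4, 0]]   # antisymmetric
D = [[0, -1], [1, 0]]   # antisymmetric

print(is_symmetric(add(A, B)))                # True: sums stay symmetric
print(is_symmetric(matmul(A, B)))             # False: this pair does not commute
print(matmul(A, B) == matmul(B, A))           # False
print(is_antisymmetric(add(C, D)))            # True: sums stay antisymmetric
```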

We can also extend this to complex matrices, in particular the hermitian and anti-hermitian matrices. That is, we say that a matrix is hermitian when it satisfies not $A = A^T$ but $A = A^\dagger$, where $A^\dagger$ denotes the conjugate transpose, i.e. we flip the matrix and we flip the sign on its imaginary component. Accordingly, if we say that $A = B + iC$ where $B$ and $C$ are both matrices with real number entries, then $A^\dagger = B^T - iC^T$. Moreover, this also tells us that when $A$ is hermitian, $B$ must be symmetric and $C$ must be antisymmetric, since requiring $A=A^\dagger$ is the same as requiring $A^\dagger = B^T - iC^T = B + iC = A$. Due to this structure, we can flip these by simply multiplying by $i$: if $A$ is hermitian, then $iA = iB + i^2C = -C + iB$. With $C$ antisymmetric and $B$ symmetric, we can show by a chain of equalities that $iA$ is anti-hermitian, i.e. $(iA)^\dagger = -iA$. $$\begin{align*} (iA)^\dagger =& \big(i(B + iC)\big)^\dagger\\[0em] =& (iB - C)^\dagger \\[0em] =& (-iB - C)^T \\[0em] =& -iB^T - C^T \\[0em] =& -iB + C \\[0em] =& i( -B - iC) \\[0em] =& -i (B + iC) \\[0em] =& -iA \end{align*}$$ In this way we know that hermitian and anti-hermitian matrices, although they are the complex analogs of symmetric and antisymmetric matrices, are only a factor of $i$ away from one another.

It is now time, finally, to get to a matrix group of serious and persisting interest to us. Consider the transpose operation $A \mapsto A^T$. We know this is not a homomorphism, by the property that $(AB)^T = B^T A^T$, which appears almost like the homomorphic property but with the order reversed. Interestingly, this reversal is shared by the operation of taking a matrix's inverse, i.e. $(AB)^{-1} = B^{-1} A^{-1}$. So here's a thought: if we chain these two operations together, each of which looks like a homomorphism but with a reversing property, can their reversing properties cancel out?
The answer is yes, however the resulting map is an isomorphism on the general linear group, so we don't get a normal subgroup out of it. $$\begin{gather*} (AB)^{-T} = (B^{-1} A^{-1})^T = A^{-T} B^{-T} \end{gather*}$$ We may still be interested in structure of the group which remains untouched by this isomorphism, which otherwise serves to rearrange the group within itself. Applying the determinant shows us that this homomorphism inverts the determinant, i.e. $\det(A^{-T}) = (\det(A))^{-1}$, which means in some sense that the isomorphism is swapping matrices with determinant $\lambda$ for those with determinant $1/\lambda$. So surely, if a matrix has determinant one, it may be left alone by the homomorphism, but this is on its own not true. If we merely take a matrix in the special linear group $\mathrm{SL}_n(\reals)$, you will recall such a matrix may rotate, flip, and skew a vector space; the step of taking an inverse will skew in the opposite way, but the step of taking a transpose will skew in an entirely different way. One way to see this is to write out the matrix product between a matrix $A$ and a column vector $v$. Denoting $A_{ij}$ as the entry in the $i$'th row and $j$'th column of $A$ and $v_j$ as the entry in the $j$'th row of $v$, we define an output vector $u$ which has entries $u_i$ in its $i$'th row, writing the product as $$\begin{gather*} \sum_{j} A_{ij} v_j = u_i. \end{gather*}$$ This is the matrix multiplication you are hopefully familiar with in index notation; but notice in particular that while we sum across $j$, the way we track index $i$ remains intact across the equality. In fact, one can read this formula as saying that the output vector $u$ is merely a linear combination of the columns of $A$ with weights $v_j$. If $A \in \mathrm{SL}_n(\reals)$ then this linear combination can be thought of as sending $v$ to a different basis where its axes are squeezed, shrunk, stretched, etc., but we can guarantee any axis which is stretched will be accounted for in another which is shrunk so as to keep a constant determinant. But when we take a transpose and calculate $A^T v$, the columns are simply completely different, and as such the matrix product is a linear combination of completely different vectors.
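The behaviour of the inverse-transpose map can be checked exactly on $2\times 2$ matrices. This is a sketch assuming invertible inputs and the closed-form $2\times 2$ inverse; the function name `inverse_transpose` and the sample matrices are illustrative. It confirms the order-preserving property, then shows a determinant-one shear that is moved by the map and a quarter-turn rotation that is left alone.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_transpose(m):
    """Inverse transpose of a 2x2 matrix with nonzero determinant (exact arithmetic)."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return [[inv[0][0], inv[1][0]], [inv[0][1], inv[1][1]]]

A = [[2, 1], [1, 1]]
B = [[1, 3], [0, 2]]

# The two order reversals cancel: (AB)^{-T} = A^{-T} B^{-T}.
order_preserved = inverse_transpose(matmul(A, B)) == matmul(
    inverse_transpose(A), inverse_transpose(B)
)

shear = [[1, 1], [0, 1]]          # determinant one, but skews the plane
quarter_turn = [[0, -1], [1, 0]]  # determinant one, pure rotation

shear_is_fixed = inverse_transpose(shear) == shear
rotation_is_fixed = inverse_transpose(quarter_turn) == quarter_turn

print(order_preserved, shear_is_fixed, rotation_is_fixed)  # True False True
```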

So clearly not all determinant-one matrices are of interest with respect to this inverse-transpose homomorphism. Instead, consider we had found a set of matrices $\Omega$ such that all $A \in \Omega$ satisfy $A^{-T} = A$. Obviously this property is true of the identity matrix, since $I^T = I$ and $I^{-1} = I$, so demanding such a thing does not lead to an empty set. What would be very interesting to us is if we can show that any two matrices having this property have a product matrix also with this property, and that other non-identity matrices exist with this property. The first is simple, i.e. asserting that two matrices $A,B \in \Omega$ have the property $A^{-T} = A$ and $B^{-T} = B$, we see $(AB)^{-T} = A^{-T} B^{-T} = AB$, by the homomorphic property.

Next, let's study this property. To say $A^{-T}=A$, by the involutive properties $A^{TT} = A$ and $(A^{-1})^{-1}=A$, is the same as saying $A^{-1} = A^T$. Using the property of an inverse matrix that $AA^{-1} = I$, which in our case is $AA^T=I$, we may write the product as $$\begin{gather*} \sum_{k} A_{ik} (A^T)_{kj} = \sum_{k} A_{ik} A_{jk} = I_{ij}. \end{gather*}$$ If we follow the same pattern we did earlier, of noticing that $i$ and $j$ are preserved across the equality, then this equation reads like a dot product between row $i$ and row $j$ of $A$. In particular, with $I_{ij}$ on the other side, we are saying that these dot products must be one when $i=j$ and zero otherwise; we have demanded that the rows of the matrix are orthogonal to one another, and normalized so that their dot products with themselves are exactly one. Starting instead from $A^{-1}A = A^TA = I$ gives the same statement for the columns. Of course, this is a property we can very easily satisfy, populating a group of matrices called the orthogonal group, written $\mathrm{O}(n)$ to represent the matrices satisfying $A^T=A^{-1}$.
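The dot-product reading of $A^T = A^{-1}$ can be verified exactly on a matrix with rational entries. The matrix below, built from the 3-4-5 triangle, is an illustrative choice; `gram` is a hypothetical helper that tabulates all pairwise dot products of a list of vectors.

```python
from fractions import Fraction

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram(vectors):
    """Table of all pairwise dot products of the given vectors."""
    return [[dot(u, v) for v in vectors] for u in vectors]

identity = [[1, 0], [0, 1]]

A = [[Fraction(3, 5), Fraction(4, 5)],
     [Fraction(-4, 5), Fraction(3, 5)]]
rows = A
cols = [list(c) for c in zip(*A)]

# For an orthogonal matrix both tables are the identity: unit length, mutually orthogonal.
rows_orthonormal = gram(rows) == identity
cols_orthonormal = gram(cols) == identity

# A shear has determinant one but fails the test.
shear = [[1, 1], [0, 1]]
shear_rows_orthonormal = gram(shear) == identity

print(rows_orthonormal, cols_orthonormal, shear_rows_orthonormal)  # True True False
```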

The orthogonal group refers to matrices which rotate and flip a vector space, but do not skew, stretch, or scale it. Orthogonal matrices have determinants of either one or negative one, on account of these flips; the reason for this is most obvious in $\reals^2$. Famously, one way to flip the sign on a determinant is to swap two rows or columns; swapping rows is the same as swapping the $x$ and $y$ components of vectors in the output. If we think of the unit $x$ and $y$ vectors, in $\reals^2$ we think of the $x$ vector as being $90^\circ$ clockwise of the $y$ vector, and any rotation of the space will preserve this; swapping the places of $x$ and $y$ in a column vector however means that the $x$ vector is $90^\circ$ counter-clockwise of the $y$ vector. To turn such a flip into a rotation, the components need to be swapped and one of them multiplied by $-1$; drawing out $(x,y) \mapsto (y,-x)$ describes exactly a rotation, and so any flipped space is always one well placed negative sign away from its original orientation. Just as the determinant sends $\mathrm{GL}_n(\reals)$ onto $\reals_\times$ with $\mathrm{SL}_n(\reals)$ as the normal subgroup factored out, the image of $\mathrm{O}(n)$ under the determinant homomorphism is the finite multiplicative group $\{1,-1\}$, and the normal subgroup factored out here is $\mathrm{SO}(n)$, the special orthogonal group, the matrices which only describe rotations in $n$ dimensions.
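The restriction of the determinant to $\pm 1$ can also be seen algebraically. As a short worked step, assuming only the homomorphic property of the determinant and the standard fact that $\det(A^T) = \det(A)$, any $A$ with $AA^T = I$ satisfies $$\begin{gather*} 1 = \det(I) = \det(AA^T) = \det(A)\det(A^T) = \det(A)^2, \end{gather*}$$ so $\det(A)$ is a real number squaring to one, which leaves only $1$ and $-1$.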

Finally, it will serve us to speak about the complex analogues of these groups, the unitary and special unitary groups $\mathrm{U}(n)$ and $\mathrm{SU}(n)$ respectively. Justifying their rules would require a departure into complex numbers that's just a little too long for this section, but the basic pattern remains the same. The unitary matrices are those $A \in \mathrm{U}(n)$ which satisfy $A^\dagger = A^{-1}$. The analogous notion of flips here gets a bit stranger, since a unitary matrix is not confined to only determinants $1$ and $-1$ but may have any complex number of magnitude one as its determinant, so $i$, $-i$, $\sqrt{1/2} + i\sqrt{1/2}$, anything of the form $\cos(\theta) + i\sin(\theta)$. We may nonetheless want to speak of those matrices satisfying $A^\dagger = A^{-1}$ with determinant exactly the positive real number one, and for that we have the matrix group $\mathrm{SU}(n)$.
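A numerical illustration of the difference between $\mathrm{U}(2)$ and $\mathrm{SU}(2)$, using Python's built-in complex numbers. The two matrices `U` and `V` are illustrative choices, the angle `0.7` is arbitrary, and comparisons are made within a floating-point tolerance.

```python
import cmath
import math

def dagger(m):
    """Conjugate transpose of a square complex matrix."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_unitary(m, tol=1e-12):
    """Check that the conjugate transpose times m is the identity, up to tol."""
    n = len(m)
    prod = matmul(dagger(m), m)
    return all(
        abs(prod[i][j] - (1 if i == j else 0)) <= tol
        for i in range(n) for j in range(n)
    )

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

s = 1 / math.sqrt(2)
U = [[s + 0j, s * 1j], [s * 1j, s + 0j]]          # unitary with determinant 1
V = [[cmath.exp(1j * 0.7), 0j], [0j, 1 + 0j]]     # unitary with determinant e^{0.7i}

u_in_su2 = is_unitary(U) and abs(det2(U) - 1) < 1e-12
v_is_unitary = is_unitary(V)
v_det_on_unit_circle = abs(abs(det2(V)) - 1) < 1e-12
v_in_su2 = v_is_unitary and abs(det2(V) - 1) < 1e-12

print(u_in_su2, v_is_unitary, v_det_on_unit_circle, v_in_su2)  # True True True False
```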

4. Lie Groups

A clear distinction must be drawn between discrete groups and continuous groups. That is, it is not too hard to enumerate the permutations in a symmetric group: one need only consider ways you could swap elements in a list. However if someone were to ask you to list all of the vectors in a vector space, you would rightly point out this is a request as absurd as counting out all the numbers between zero and one. One can reason out that any permutation one may desire can be constructed by some sequence of the following pair of permutations: $$\begin{gather*} (1,2,3,4,5) \mapsto (2,3,4,5,1), \\[0em] (1,2,3,4,5) \mapsto (2,1,3,4,5), \end{gather*}$$ but it is not so clear that a direct analogy for these exists for a vector space. Even in a one dimensional space, if I pick a single vector, thought of as an action that moves me some distance along a line, and apply it and its inverse repeatedly, what I generate is not a vector space but something that looks more like the additive integers. And yet we have a clear notion of directions in a vector space which make perfect sense if such a single vector, as above, is allowed to be applied fractionally, a la scalar multiplied, in order to construct the rest of the space.
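The claim about the two generating permutations can be confirmed by brute force. The sketch below stores a permutation of five items as a tuple of zero-based images and repeatedly multiplies by the two generators until nothing new appears; the function name `generate` is illustrative.

```python
def compose(p, q):
    """Apply q first, then p (permutations stored as tuples of images)."""
    return tuple(p[i] for i in q)

def generate(generators):
    """All permutations reachable as products of the given generators."""
    identity = tuple(range(len(generators[0])))
    seen = {identity}
    frontier = [identity]
    while frontier:
        next_frontier = []
        for p in frontier:
            for g in generators:
                q = compose(g, p)
                if q not in seen:
                    seen.add(q)
                    next_frontier.append(q)
        frontier = next_frontier
    return seen

cycle = (1, 2, 3, 4, 0)   # shift every item along by one, wrapping around
swap = (1, 0, 2, 3, 4)    # exchange the first two items

print(len(generate([cycle, swap])))  # 120, i.e. all 5! permutations
print(len(generate([cycle])))        # 5: the shift alone only cycles
```

Because the group is finite, inverses never need to be supplied separately: each generator's inverse is one of its own powers.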

It is clear to us that the general linear group, special linear group, special and unspecial orthogonal and unitary groups, are all continuous groups, if for no other reason than what they describe. The set of invertible transformations may simply stretch an axis continuously, the set of volume preserving transformations may deform volumes continuously, and rotations, while bounded by their own cyclic nature, are still continuous within that cycle. In each of these cases, any effect we are picturing may be pictured happening ever so slightly more or ever so slightly less, and that 'slightly'-ness may be shrunk even further as desired.

What this tells us is that in the same way that vector spaces are spaces, in particular modelling arrows within euclidean space as familiar to us, we must also think of these matrix groups as spaces. In fact, in a more formal setting we would call these spaces manifolds, placed on equal footing when discussing what the world looks like on the surface of a sphere, the surface of a torus, or existing in a curved space-time. But while manifolds are a far more general structure, describing all sorts of spaces which may lack uniformity, our continuous matrix groups are also groups, or rather actions that could be done on some other object invertibly. This imparts a few special properties, such as homogeneity of the space, in the same way that a perfect sphere looks basically the same no matter where you are on it: no matter where you are in a group you may still multiply by any other element in the group and move around in the same way you would at the identity.

Definition 0326-matrixgroups.5 Lie Group

A group is said to be a Lie group if it is continuous and its group multiplication operation is differentiable in both its left and right arguments.

This construction introduces the principal trouble of any discussion of differential geometry, which is that the notion of differentiability doesn't hold in quite the same way in a space which is not euclidean. This is most obviously imagined on the surface of a sphere: when one thinks about the directions you can move in from some point and the speeds you could travel at in these motions, what you get is not a sphere but a tangent plane. All this is to say, taking the derivative of a function at a point, insofar as that derivative describes the rate of change of that function, must give you back some object which is not part of the original space you started with. Moreover, these tangent planes, these spaces of directions, are incomparable with one another: a person standing on the equator of a sphere in space may say "I will go north", but the direction they are calling north is in fact up, and when translated to the north pole, this direction points directly out of the sphere, an invalid direction.

Fortunately, two properties of our object of study mitigate the unfamiliarity of this to us. First, we are not working with abstract intrinsic manifolds, we are working with matrix groups, which in our present description are subsets of $\reals^{n\times n}$, a euclidean $n^2$-dimensional space. The tangent planes we speak of are accordingly not just abstract vector spaces associated to a point, i.e. a space of directions, but also a tangible plane we can place against the surface of our manifold. Second, the group structure on the manifold means that all of our tangent spaces can be discussed similarly. That is, if we first specify a point on the Lie group, and this could be the identity $e \in G$ or some other element $g\in G$, any other element $h\in G$ specifies a way you could move around the space without caring whether you start at $e$ or $g$; you simply end up at either $h$ or $hg$ instead. For this reason, we can limit our discussion of these tangent planes to exclusively the plane that is tangent to the identity, and this forms a vector space of directions which we call the Lie algebra of the Lie group. Moreover, this vector space will tell us about the dimensions of our Lie group as well as how to move through it without accidentally leaving it.

Unfortunately the Lie algebra has a few other special properties so we'll need to wait to define it formally, but for now we can definitely calculate what one looks like.

My favorite Lie group to calculate the Lie algebra for is the orthogonal group, since it demonstrates this method very clearly. First, recall that orthogonal matrices $A \in \mathrm{O}(n)$ satisfy $A^T = A^{-1}$. If we multiply both sides of this by our $A$, we obtain $AA^T = I$, which is very valuable since the derivative of a constant such as $I$ will be zero. So if we consider $A$ not just as a member of the orthogonal group but as a matrix in the vector space $\reals^{n \times n}$, we can speak of a tiny perturbation and ask which perturbations will continue to satisfy this $AA^T = I$ property. So for some tiny perturbation $\varepsilon B \in \reals^{n \times n}$ and a function $\varphi\colon \reals^{n\times n} \to \reals^{n\times n}$ defined $A \mapsto AA^T$ (which by the way is not a homomorphism), we calculate the derivative of $\varphi$ in the normal way: $$\begin{align*} \lim_{\varepsilon \to 0} \frac{\varphi(A + \varepsilon B) - \varphi(A)}{\varepsilon} &= \lim_{\varepsilon \to 0} \frac{(A+\varepsilon B)(A^T+\varepsilon B^T) - AA^T}{\varepsilon} \\[0em] &= \lim_{\varepsilon \to 0} \frac{AA^T + \varepsilon AB^T + \varepsilon BA^T + \varepsilon^2 BB^T - AA^T}{\varepsilon} \\[0em] &= \lim_{\varepsilon \to 0} \Big(AB^T + BA^T + \varepsilon BB^T \Big) \\[0em] &= AB^T + BA^T \end{align*}$$

Now, remember we said that a matrix is only orthogonal if $AA^T = I$, so differentiating both sides of this equality by $A$, we have on the left hand side $AB^T + BA^T$ where $B$ was the direction of our tiny perturbation $\varepsilon B$, and on the other side we have $\frac{d}{dA} I$ which is simply the zero matrix. So we write $$\begin{gather*} AB^T + BA^T = 0 \\[0em] AB^T = - BA^T \end{gather*}$$

And finally, remembering what we said earlier, since the Lie group looks the same everywhere, we focus our efforts to the tangent space set at the identity of the Lie group; this means we set $A = I$ now that we have this relation, and obtain $$\begin{gather*} B^T = -B. \end{gather*}$$

This tells us something remarkable: the directions that orthogonal matrices can move in are exactly those matrices which become their own negatives when transposed, i.e. the space of directions in the orthogonal group is exactly the anti-symmetric matrices! We usually write the Lie algebra as a fraktur font version of the original Lie group name, so $\mathrm{O}(n)$ has Lie algebra $\mathfrak{o}(n)$.
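For readers who prefer to see this numerically, here is a small NumPy sketch of the same calculation. It is only an illustration under stated assumptions: the names `phi` and `directional_derivative` are hypothetical, and the limit is replaced by a finite step `eps`. It perturbs the identity in an anti-symmetric direction and in a symmetric one, and measures how much $AA^T$ changes to first order.

```python
import numpy as np

def phi(A):
    """The map A -> A A^T whose value is the identity on orthogonal matrices."""
    return A @ A.T

def directional_derivative(A, B, eps=1e-6):
    """Finite-step stand-in for the limit defining the derivative of phi along B."""
    return (phi(A + eps * B) - phi(A)) / eps

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
B_anti = M - M.T   # satisfies B^T = -B
B_sym = M + M.T    # satisfies B^T = +B

identity = np.eye(3)
d_anti = directional_derivative(identity, B_anti)
d_sym = directional_derivative(identity, B_sym)

# Only the leftover eps * B B^T term survives for the anti-symmetric direction.
print(np.abs(d_anti).max(), np.abs(d_sym).max())
```

The anti-symmetric direction changes $AA^T$ only at second order, while the symmetric direction changes it at first order, so only the former is a direction along the group.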

We can repeat this calculation for a few other groups. The unitary matrices follow roughly the same calculation, concluding $B^\dagger = -B$, and thus that their Lie algebra is the anti-hermitian matrices. We can think of additive groups as vector spaces such as $\reals^{n \times 1}$, and for these the Lie algebra is the vector space itself, in the same way we think of standard calculus as not having any geometric complications. For instance, the anti-symmetric matrices under addition in dimension 4 only have six parameters (the upper triangle) and so its Lie algebra is $\reals^6$. The general linear group is a little stranger since it is not a restriction to a subspace of the set of matrices but rather just an exclusion of those with determinant zero; noticing that fact alone (together with some awareness of topology, i.e. that the set of invertible matrices is an open set in $\reals^{n \times n}$) is enough to tell us that the directions one can move in $\mathrm{GL}_n(\reals)$ are in fact all of them; the Lie algebra of $\mathrm{GL}_n(\reals)$, which we call $\mathfrak{gl}_n(\reals)$, is simply $\reals^{n \times n}$ itself.

Having a Lie algebra be a completely unrestricted set of matrices might be confusing; you might wonder how we are supposed to stay within the group if that is the case. And the answer to that is on its face that the Lie algebra alone does not stay inside the group for the same reason that tangent planes at a point do not stay within a surface. We need an additional tool that can take a direction and tell us what it means to trace along that direction while staying within the manifold. That tool will be the exponential map.

Before we continue, we should note that there are many other Lie groups which may be of interest in various circumstances. For instance, we can take the special orthogonal group in dimension two to be the group of rotations on a circle, itself a one dimensional space, and thus construct a torus by $\mathrm{SO}(2) \times \mathrm{SO}(2)$. The matrices of this group would look like so: $$\begin{gather*} \begin{bmatrix} \cos(\theta) & \sin(\theta) & 0 & 0 \\[0em] -\sin(\theta) & \cos(\theta) & 0 & 0 \\[0em] 0 & 0 & \cos(\phi) & \sin(\phi) \\[0em] 0 & 0 & -\sin(\phi) & \cos(\phi) \end{bmatrix} \hspace{3em} \theta,\phi \in [0,2\pi) \end{gather*}$$
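As a quick sanity check of the torus construction, the following NumPy sketch builds these block-diagonal matrices and tests a few group properties. The helper names `rotation` and `torus_element` are hypothetical, and the specific angles are arbitrary choices for illustration.

```python
import numpy as np

def rotation(angle):
    """2x2 rotation block, using the same sign convention as the matrix above."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, s], [-s, c]])

def torus_element(theta, phi):
    """Block-diagonal 4x4 matrix representing the pair (theta, phi)."""
    T = np.zeros((4, 4))
    T[:2, :2] = rotation(theta)
    T[2:, 2:] = rotation(phi)
    return T

a = torus_element(0.3, 1.1)
b = torus_element(2.0, -0.4)

is_orthogonal = np.allclose(a @ a.T, np.eye(4))
# Multiplying two elements adds the two angles independently.
angles_add = np.allclose(a @ b, torus_element(0.3 + 2.0, 1.1 - 0.4))
commutes = np.allclose(a @ b, b @ a)
print(is_orthogonal, angles_add, commutes)
```

Each angle wraps around independently after $2\pi$, which is the sense in which the two circles together trace out a torus.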

We can also take what we've described above and use it to figure out when a space can't be a Lie group. For instance, $\mathrm{SO}(3)$ describes the rotations you can do on a sphere, but we can reason out that the sphere itself can't be a Lie group. This is because at the equator, the directions one might think about moving in are north, south, east, and west, yet if one tries to map which direction east might be at every point of a globe, one finds that 'going east' means staying still if you are at the north pole. In a group setting, this would be like saying you found two matrices $A,B \in \mathrm{GL}_n(\reals)$ with the property that $AB = B$ where $A$ is distinctly not the identity, and this simply does not occur without $A = I$. That $\mathrm{SO}(3)$ describes the rotations of a sphere does not mean that the space $\mathrm{SO}(3)$ is itself a sphere: even though the rotations of a sphere include rotations that take you from any one point to any other point (thus mapping the sphere), they act on the whole sphere, and thus a point may be kept still (the poles) while the rest rotates.

Finally, we should note that strictly speaking, many of the groups we described fail to have a perfectly differentiable group multiplication operation. This too is immediately apparent using the intuitions we have built up: consider the Lie group $\reals_\times$. If we think of it as a space that is a line, then it is clear that the line has a cut in the middle, since we cannot include zero in the Lie group. Since the determinant is a differentiable group homomorphism, it is similarly apparent that the orthogonal group must be made of two disconnected components, since an orthogonal matrix has determinant of either exactly one or exactly negative one. The flips we mentioned earlier simply cannot be multiplied differentiably, except in the case of complex matrix groups as we described in the previous subsection. In $\mathrm{GL}_n(\mathbb{C})$, there is a hole in the middle, but the space is connected. Similarly, taking the determinant of unitary matrices shows that the space clearly has a hole in the middle but it is still connected. Regardless, all of these matrix groups are generally considered Lie groups. The caveat made is that for each group over $\reals$ with this cut, we define positive and negative components, for instance $\mathrm{O}^+(n) = \mathrm{SO}(n)$ and $\mathrm{O}^-(n)$, which as a space is a copy of $\mathrm{O}^+(n)$. We then think of $\mathrm{O}(n)$, as a space, as $\mathrm{O}^+(n) \times \{1,-1\}$.

5. The Exponential Map

In regular calculus, it is well known that the derivative of an exponential $e^x$ is itself. Moreover, the differential equation $$\begin{gather*} \frac{df}{dx} = a f(x) \end{gather*}$$ for some constant $a$ is generally considered trivially simple to solve, since it is known that $e^{ax}$ has derivative $a e^{ax}$. The reason for this is owed to its Taylor expansion.

When we perturb the input of a function by some small amount, it should make sense that the way the function changes is related to its derivatives, since those are supposed to describe rates of change. This is the justification for numerical methods such as forward Euler, which estimate $f(x+h) \approx f(x) + h\frac{df}{dx}(x)$. But these methods are not perfect, and the reason for this is that the derivative also has its own rate of change, and the derivative of that also has its rate of change, and so on and so forth. The full expression for this generates an infinite series: $$\begin{gather*} f(x+h) = f(x) + h\frac{df}{dx}(x) + \frac{h^2}{2!} \frac{d^2 f}{dx^2}(x) + \frac{h^3}{3!}\frac{d^3f}{dx^3}(x) + ... \end{gather*}$$

Now, the exponential is special since when we set $f(x) = e^x$ and take the derivative in our perturbation $h$ (since that is how we're changing $x$), we get back what we started with. $$\begin{align*} \frac{d}{dh} f(x + h) &= \frac{d}{dh}e^x + \frac{d}{dh} \left( h \frac{de^x}{dx}\right) + \frac{d}{dh} \left( \frac{h^2}{1\cdot 2} \frac{d^2e^x}{dx^2}\right) + \frac{d}{dh}\left(\frac{h^3}{1\cdot 2\cdot 3} \frac{d^3 e^x}{dx^3}\right) + ... \\[0em] &= \frac{d}{dh}e^x + \frac{d}{dh} \left( h e^x\right) + \frac{d}{dh} \left( \frac{h^2}{1\cdot 2} e^x\right) + \frac{d}{dh}\left(\frac{h^3}{1\cdot 2\cdot 3} e^x \right) + ... \\[0em] &= 0 + e^x + \frac{2h}{1\cdot 2} e^x + \frac{3h^2}{1\cdot 2\cdot 3} e^x + \frac{4h^3}{4 \cdot 3!} e^x + ... \\[0em] &= e^x + \frac{h}{1} e^x + \frac{h^2}{1\cdot 2} e^x + \frac{h^3}{3!} e^x + ... \\[0em] &= e^x + h \frac{de^x}{dx} + \frac{h^2}{1\cdot 2} \frac{d^2e^x}{dx^2} + \frac{h^3}{1\cdot 2\cdot 3} \frac{d^3 e^x}{dx^3} + ... \end{align*}$$
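To make the role of the higher-order terms concrete, here is a small pure-Python sketch. The function name `taylor_exp_step` and the values of `x` and `h` are illustrative assumptions. It estimates $e^{x+h}$ from $e^x$ using more and more terms of the series, where keeping only the first-order term is the forward Euler step.

```python
import math

def taylor_exp_step(x, h, order):
    """Estimate exp(x + h) from exp(x) using Taylor terms up to h**order."""
    fx = math.exp(x)  # every derivative of exp at x equals exp(x)
    return sum(fx * h**k / math.factorial(k) for k in range(order + 1))

x, h = 0.0, 0.5
exact = math.exp(x + h)

# order = 1 is the forward Euler step; each extra term shrinks the error.
errors = [abs(taylor_exp_step(x, h, order) - exact) for order in range(1, 6)]
for order, err in zip(range(1, 6), errors):
    print(order, err)
```

The step size here is deliberately large so that the improvement from each additional term is visible.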

In a very real sense, this series is the definition of the exponential. For this reason, when we have a differential equation on vectors that appears as $$\begin{gather*} \frac{d\vec{x}}{dt} = A \vec{x}, \end{gather*}$$ the solution is written $e^{At}\cdot \vec{x}(0)$ in reference to the above series. That is, such differential equations are solved by $$\begin{gather*} e^{At} \cdot \vec{x}(0) = \vec{x}(0) + \frac{tA\vec{x}(0)}{1!} + \frac{t^2 AA \vec{x}(0)}{2!} + \frac{t^3 AAA \vec{x}(0)}{3!} + ... \end{gather*}$$ with the same property as above that taking a derivative in $t$ causes the series to remain mostly the same but with an extra $A$ in front.
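The series can be summed directly on a computer. Below is a minimal NumPy sketch, assuming a plain truncated series is accurate enough for small matrices (it is not how production libraries compute matrix exponentials). The helper names `expm_series` and `solution`, the matrix `A`, and the initial condition are all illustrative choices. It checks the solution against the known answer for a rotation generator and confirms that differentiating in $t$ brings down a factor of $A$.

```python
import numpy as np

def expm_series(M, terms=30):
    """Truncated series I + M + M^2/2! + ... (a sketch, fine for small norms)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k      # builds M^k / k! from the previous term
        result = result + term
    return result

A = np.array([[0.0, -1.0], [1.0, 0.0]])
x0 = np.array([1.0, 0.0])

def solution(t):
    """x(t) = e^{At} x(0)."""
    return expm_series(t * A) @ x0

t, h = 1.2, 1e-5
x_t = solution(t)
# A central difference in t should reproduce A x(t).
dx_dt = (solution(t + h) - solution(t - h)) / (2 * h)
print(x_t, A @ x_t, dx_dt)
```

For this particular $A$ the solution traces out the unit circle, $\vec{x}(t) = (\cos t, \sin t)$.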

The way that we use our Lie algebras to define a path through the Lie group is precisely by thinking of motion across the Lie group as solving a differential equation. Our differential equation looks like $$\begin{gather*} \frac{dA}{dt} = B\cdot A(t) \end{gather*}$$ for $A(t) \in \mathrm{O}(n)$ and $B \in \mathfrak{o}(n)$ using the orthogonal group as an example. We think of this as saying that since movement across a multiplicative group is defined by multiplication, we want to multiply continuously, and this might be thought of as some matrix $C\in \mathrm{O}(n)$ to the power of $t$, $C^t \cdot A(0)$ with $t$ any real number. Of course, it is not obvious how to compute $C^t$, but if we imagine $C$ to have a matrix logarithm where $\log(C) = B \in \mathfrak{o}(n)$ then our solution is $e^{tB}\cdot A(0)$. And $e^{tB}$ can be calculated, usually using any kind of diagonalization $B = P D P^{-1}$ with $D$ a diagonal matrix, such as by eigenvalue decomposition. We would then compute $$\begin{gather*} \exp(t P D P^{-1}) = P \begin{bmatrix} \exp(\lambda_1 t) & 0 & ... \\[0em] 0 & \exp(\lambda_2 t) & ... \\[0em] \vdots & \vdots & \ddots \end{bmatrix} P^{-1}, \end{gather*}$$ exponentiating along the diagonal. We can do this since the matrix-polynomial called for in the exponential has factors such as $BBB$ expressed as $(P D P^{-1})(P D P^{-1})(P D P^{-1})$ which we can rewrite as $P DDD P^{-1}$ by cancelling. In effect, the diagonalizing matrix $P$ (which may, for instance, be the eigenvector matrix) will cancel inside such $B^n$ terms, leaving $D^n$ where $P$ and $P^{-1}$ can be extracted on the left and right, leaving $P e^{Dt} P^{-1}$.
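The diagonalization recipe can be tried out directly. The sketch below is an illustration under assumptions rather than a recommended implementation: `expm_eig` and `expm_series` are hypothetical helper names, the antisymmetric matrix `B` is an arbitrary example, and the code relies on `B` being diagonalizable over the complex numbers (which holds for antisymmetric matrices). It uses `numpy.linalg.eig`, whose eigenvector matrix `P` satisfies $B = P D P^{-1}$, and compares the result with a truncated series.

```python
import numpy as np

def expm_series(M, terms=40):
    """Truncated series I + M + M^2/2! + ..., used here only as a reference."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

def expm_eig(B, t):
    """exp(tB) = P exp(tD) P^{-1}, exponentiating only along the diagonal."""
    eigvals, P = np.linalg.eig(B)          # complex eigenvalues for real B are fine
    exp_D = np.diag(np.exp(t * eigvals))
    # The imaginary parts cancel up to rounding, so keep the real part.
    return (P @ exp_D @ np.linalg.inv(P)).real

B = np.array([[0.0, -1.0, -2.0],
              [1.0,  0.0, -3.0],
              [2.0,  3.0,  0.0]])  # antisymmetric, so a direction in the orthogonal group
t = 0.7

R_eig = expm_eig(B, t)
R_series = expm_series(t * B)
print(np.allclose(R_eig, R_series), np.allclose(R_eig @ R_eig.T, np.eye(3)))
```

Both routes agree, and the result satisfies $AA^T = I$: following an antisymmetric direction with the exponential keeps us inside the orthogonal group.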

In fact the matrix logarithm does exist, however for the same reason that you can't calculate the logarithm of a negative number, strict definitions for how to calculate the matrix logarithm comparable to the formula for the exponential as above only work for a fixed region rather than giving a general definition like we have for the exponential. Outside of this region, each formula for the matrix logarithm will begin to disobey $\exp(\log(C))=C$. Moreover, depending on the Lie group, it may simply become impossible to extend this region, as we will shortly discuss, since the existence of a perfect matrix logarithm would imply that the exponential map is an isomorphism, which it may not be.

The note we made at the end of the previous section now has very clear implications. Since groups such as $\mathrm{O}(n)$ are disconnected, the exponential map cannot take us from the identity to every other part of the group. For instance, there is no Lie algebra element $B$ for which a sufficiently large $t$ will cause $e^{tB}$ to yield a matrix with a negative determinant. For the orthogonal matrices, this means that the Lie algebra of the orthogonal group and the Lie algebra of the special orthogonal group are literally the same. However, two Lie groups can also share a Lie algebra for other reasons.

One example of this is the circle described by the rotations of $\mathrm{SO}(2)$ sharing a Lie algebra with $\reals^1$, since the tangent of a circle is a line, but the tangent of a line is also a line. This can be seen as the set of antisymmetric matrices in $\reals^{2\times 2}$ only has one free parameter on the off-diagonal (since the diagonal is zero), just as $\reals^1$ only has one free parameter; nonetheless, exponentiating the antisymmetric matrix will produce a member of $\mathrm{SO}(2)$, so we are not to think that two Lie groups sharing a Lie algebra means that the exponential map will suddenly fail.

No, the exponential map can fail for other reasons. That is to say, all the exponential map can do is tell you how to walk away from a point in a direction, but there are places in a Lie group that you may be unable to reach if all you do is move in one direction the whole time. Take for instance the special linear group. Recall we defined the special linear group as the group of matrices with determinant equal to one; this also means that it does not suffer from any of the issues of disconnected components such as the orthogonal group or general linear group, so we can't blame that for the exponential map failing to take us somewhere. Since we require of $A \in \mathrm{SL}_n(\reals)$ that $\det(A) = 1$, we can differentiate in $A$ much in the same way we did earlier for the orthogonal group, however in this instance I will simply assert that the determinant's derivative at the identity is the trace operator, i.e. the sum of the values on the diagonal. So the derivative of the requirement $\det(A) = 1$ yields that the Lie algebra $\mathfrak{sl}_n(\reals)$ is the matrices $B$ satisfying $\mathrm{tr}(B) = 0$. Now, let's select the following matrix: $$\begin{gather*} A = \begin{bmatrix} -2 & 0 \\[0em] 0 & -1/2 \end{bmatrix} \in \mathrm{SL}_2(\reals). \end{gather*}$$

There is no $B \in \mathfrak{sl}_2(\reals)$ with $e^B = A$. The proof follows for those who want to see it.

Proof.

For the sake of contradiction, let's assume there does exist a matrix $B \in \mathfrak{sl}_2(\reals)$, i.e. $\mathrm{tr}(B) = 0$, which has $e^B = A$. By the method we described above, of taking an eigenvalue decomposition and applying an exponential on the diagonal, we'll presume $B$ has some $B = P D P^{-1}$ decomposition and say $e^B = P e^D P^{-1}$. Note also that the formula for the trace $\sum_i B_{ii}$, in the case of a matrix multiplication, looks a lot like a loop $\sum_i B_{ii} = \sum_{i,j,k} P_{ij} D_{jk} P^{-1}_{ki}$ which means writing $\mathrm{tr}(B) = 0$ is the same as writing $\mathrm{tr}(PDP^{-1}) = \mathrm{tr}(P^{-1} P D) = \mathrm{tr}(D)$, so we know the eigenvalues of $B$ sum to zero. Since we're discussing $2\times 2$ matrices, there are only two eigenvalues with $\lambda_1 + \lambda_2 = 0$, so we'll just say $D$ has $\lambda$ and $-\lambda$ on the diagonal. Now, sometimes matrices with real entries have complex eigenvalues, but that isn't a trouble for us; it just means that $\lambda$ is some complex number $a + ib$, and with $e^B = P e^D P^{-1}$, our $e^B$ must look like $$\begin{gather*} P\begin{bmatrix} e^a \big(\cos(b) + i \sin(b) \big) & 0 \\[0em] 0 & e^{-a} \big(\cos(b) - i \sin(b) \big) \end{bmatrix} P^{-1} \end{gather*}$$ For this to be a special linear matrix as we had wanted, we need it to have determinant one, but the only factor that affects this is $D$ since the determinants of $P$ and $P^{-1}$ will cancel. However, the same property we noticed, that the trace of a matrix is the sum of its eigenvalues, can now be used on $e^B$ and $A$. We see that the trace of $A$ is $-5/2$, so we should hope that if $e^B$ is truly $A$, there is a way to choose $a,b \in \reals$ such that the diagonal above sums to $-5/2$. 
We'll set $c = e^a$ and try that: $$\begin{align*} \frac{-5}{2} &= e^a \big(\cos(b) + i \sin(b) \big) + e^{-a} \big(\cos(b) - i \sin(b) \big) \\[0em] &= c \big(\cos(b) + i \sin(b) \big) + c^{-1} \big(\cos(b) - i \sin(b) \big) \\[0em] 0 &= c^2 \cos(b) + i c^2 \sin(b) + \frac{5}{2}c + \cos(b) - i\sin(b) \end{align*}$$ The complex numbers in this expression don't stop us from using the quadratic formula to calculate $c$ in terms of $b$ by treating the above expression as a quadratic equation in $c$, so let's do that too. $$\begin{gather*} c = \frac{-(5/2) \pm \sqrt{25/4 - 4(\cos(b) + i\sin(b))(\cos(b) - i\sin(b))}}{2(\cos(b) + i\sin(b))} \\[0em] = \frac{-(5/2) \pm 3/2 }{2(\cos(b) + i\sin(b))} \end{gather*}$$ We've used a few tricks here to simplify this, such as Pythagoras' identity, $e^{ix} = \cos(x) + i\sin(x)$, and $\sin(-x) = -\sin(x)$, and our conclusion shows that the assumption $c = e^a$ with $a \in \reals$ is immediately violated unless $e^{ib} = -1$, i.e. $b = \pi$ up to multiples of $2\pi$: $c$ is a real number only when $e^{ib} = \pm 1$, and $b=0$ would imply that $c= e^a$ is negative, which is impossible for a real exponential. With $e^{ib} = -1$, we obtain the solutions $c = 1/2$ and $c = 2$, so $a = \pm \ln 2$. This is almost what we wanted, except that $B$ has real entries, so its eigenvalues $\lambda$ and $-\lambda$ are either both real, which requires $b = 0$, or complex conjugates of one another, which requires $a = 0$. Neither is compatible with $a = \pm \ln 2$ and $b = \pi$, so no such $B$ exists.

A more general proof can be given without even picking an $A$ matrix, in which one shows that every $B \in \mathfrak{sl}_2(\reals)$ has $\mathrm{tr}(e^B) \ge -2$, and so our choice of a matrix $A \in \mathrm{SL}_2(\reals)$ with trace $-5/2$ was the key trick.
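The trace bound can be probed empirically, which of course proves nothing but is reassuring. The following NumPy sketch makes several assumptions: random sampling with entries in $[-3, 3]$, a truncated series `expm_series` standing in for the exponential, and a sample count chosen arbitrarily. It draws traceless real $2 \times 2$ matrices, exponentiates them, and records the trace and determinant of each result.

```python
import numpy as np

def expm_series(M, terms=60):
    """Truncated series I + M + M^2/2! + ... (adequate for the small norms used here)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

rng = np.random.default_rng(0)
A = np.array([[-2.0, 0.0], [0.0, -0.5]])  # the target matrix from the text

traces, dets = [], []
for _ in range(1000):
    B = rng.uniform(-3.0, 3.0, size=(2, 2))
    B[1, 1] = -B[0, 0]                    # force tr(B) = 0
    E = expm_series(B)
    traces.append(np.trace(E))
    dets.append(np.linalg.det(E))

print(min(traces), np.trace(A))
print(np.allclose(dets, 1.0))
```

Every sample lands in $\mathrm{SL}_2(\reals)$, as the trace condition promises, yet none has trace below $-2$, while $A$ has trace $-5/2$.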

Our example of the circle and the line is not to say that the Lie algebra is defined merely by its shape, although this is the case in one dimension. In dimensions higher than one, Lie algebras carry an extra piece of information that can distinguish them on the basis of what you might think of as curvature, but not on the basis of topology. Details of how topology comes into play will have to be delayed until part 2, however this forms the fundamental limitation of how much of the Lie group may be recovered from just the Lie algebra with the exponential map.

6. Lie Algebras

It is important to remember throughout all of this that groups are best thought of as defining a set of invertible actions on some object. In fact one can even think of the group multiplication operator, in the sense that we had described it being differentiable on a Lie group, as itself being an action. One of the first group actions described in the theory of Lie groups is the 'left translation action', the map $L_g \colon G \to G$ which for some $g \in G$ sends each $h\in G$ to $gh$, or the respective 'right translation action' $R_g$ which sends $h$ to $hg$. When we speak of the group multiplication as being differentiable, it is precisely these maps $L_g$ and $R_g$ we are requiring to be differentiable, and upon integrating these derivatives back into curves, we obtain the previously discussed exponential map. Moreover, these actions allow us to take a Lie algebra in the sense that it is a direction space, and apply that set of directions to every point in the space, defining the left invariant vector fields and right invariant vector fields.

But these left and right translation actions are set-actions; they treat $G$ like any other manifold and simply move points around. They are not homomorphisms. When a group acts on another group (including itself), if it does so homomorphically, then it must do so isomorphically, since anything less than an isomorphism would mean an action is not reversible. In fact, the term for this, since they are isomorphisms of a group to itself, is automorphically, which is important since unlike isomorphisms which may map different groups to one another, automorphisms of a group can apply repeatedly and chain together as many times as desired. In fact, the automorphisms of a group tend to define a group itself, and thus defining a group action on a group ends up amounting to a homomorphism between the group and the group of automorphisms of the target. This is to say, if $G$ acts on $H$, then there is a homomorphism $\Phi \colon G \to \mathrm{Aut}(H)$, where for some $g \in G$, $\big(\Phi(g)\big) \colon H \to H$ defines some isomorphism; if we wrote $f_1 = \Phi(g_1)$ and $f_2 = \Phi(g_2)$, we would expect both $f_1$ and $f_2$ to have the homomorphic property, and for some $h\in H$, it would make sense to speak about $f_1(f_1(f_2(h)))$.

The reason we are discussing this is that multiplicative Lie groups are non-euclidean and vector spaces are euclidean. Even for a discrete group, we can speak of any group having a commutator subgroup generated by all elements $[g,h] := ghg^{-1} h^{-1}$ (for any pair $g,h \in G$), which will capture all the ways in which the group is not commutative (i.e. violates $gh = hg$). This subgroup is actually normal, so we can factor it out to define a commutative quotient group written $G/[G,G]$, a process which we call abelianization. But factoring this subgroup out simply means we cease studying the group itself; we are often interested in precisely the ways in which the group fails to be commutative, and currently we lack the tools to do this in any way which takes advantage of the Lie group structure. What we find will be the missing piece that makes a Lie algebra more than just a vector space, and something that actually cares which Lie group it was derived from in the ways described above.

The trick for us will be to define an action called the conjugation action which will simultaneously capture similar properties to the commutator subgroup while itself being a differentiable map. This action also exists on discrete groups, where it is a mainstay of many instructive group theory proofs due to its homomorphic property. We define it $\kappa_g\colon G \to G$ as $\kappa_g(h) = ghg^{-1}$, and see that for $a,b,c \in G$, we have $$\begin{align*} \kappa_{ab}(c) &= abc(ab)^{-1}\\[0em] &= abcb^{-1} a^{-1} \\[0em] &= a(\kappa_b (c)) a^{-1} \\[0em] &= \kappa_a (\kappa_b (c)) \end{align*}$$ Even better, we can define $\kappa_g$ as the combination of $L_g$ and $R_{g^{-1}}$. Our task now will be to find a way to differentiate this action so that it works for Lie algebras. The first step on that journey is the capital-A Adjoint map.
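The algebra above is easy to confirm on concrete matrices. Here is a short NumPy sketch, with a hypothetical helper `kappa` and randomly chosen matrices near the identity (an arbitrary choice that keeps them comfortably invertible). It checks that conjugation composes as shown, that it respects products, and that left translation by contrast does not.

```python
import numpy as np

def kappa(g, h):
    """Conjugation action: kappa_g(h) = g h g^{-1}."""
    return g @ h @ np.linalg.inv(g)

rng = np.random.default_rng(1)
# Small perturbations of the identity, chosen so that inverses are well behaved.
a, b, c, d = (np.eye(3) + 0.1 * rng.standard_normal((3, 3)) for _ in range(4))

# kappa_{ab} = kappa_a after kappa_b, as in the derivation above.
action_composes = np.allclose(kappa(a @ b, c), kappa(a, kappa(b, c)))
# kappa_a respects products: kappa_a(cd) = kappa_a(c) kappa_a(d).
is_homomorphism = np.allclose(kappa(a, c @ d), kappa(a, c) @ kappa(a, d))
# Left translation does not: a(cd) differs from (ac)(ad).
translation_is_homomorphism = np.allclose(a @ (c @ d), (a @ c) @ (a @ d))
print(action_composes, is_homomorphism, translation_is_homomorphism)
```

The inner $g^{-1} g$ pairs cancel in the conjugation case, which is exactly what left translation alone is missing.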

Definition 0326-matrixgroups.6 Adjoint Map

Let $G$ be a Lie group, $\mathfrak{g}$ its Lie algebra, and for any $g\in G$, $\kappa_g\colon G \to G$ its conjugation action. We call its derivative the Adjoint map; for a matrix $g \in G$ and a matrix $\xi \in \mathfrak{g}$ in its Lie algebra, we define $\mathrm{Ad}_g \colon \mathfrak{g} \to \mathfrak{g}$ as a linear map on the vector space formed by the Lie algebra, $$\begin{gather*} \mathrm{Ad}_g \xi = \frac{d}{dt} \Big( g \cdot e^{t\xi } \cdot g^{-1} \Big)_{t=0} \end{gather*}$$ defined as the change in $g \cdot e^{t \xi} \cdot g^{-1}$ for a tiny change in $t$ away from $t=0$. On matrices, this is simply $g\xi g^{-1}$.

The Adjoint map has many uses to many people other than us; we are not done until even that little $g$ on the bottom of $\mathrm{Ad}_g$ is replaced with a Lie algebra element. We should at least think about what this map is doing, and thus what it means: we have replaced the $h$ in our previous $\kappa_g(h)$ with the exponential of a Lie algebra element $\xi \in \mathfrak{g}$. Since $\exp \colon \mathfrak{g} \to G$, this makes perfect sense, but the scalar $t$ in there means we can differentiate the quantity in a single variable. If our intuitions about exponentials from normal calculus are any indication, $\frac{d}{dt} e^{t\xi}$ at $t=0$ should just be $\xi$. In fact that is true even in the Lie group context, but with the $g$ and $g^{-1}$ on either side, we have an important modification to this. What we have said is, before considering a direction we could go in, do not consider this direction from the identity as we might normally with the Lie algebra. Instead, start at $g^{-1}$, take a tiny step in the direction $\xi$ should be in, and then use $g$ to take $\xi$ at $g^{-1}$ back to the identity.

The final result is the answer to the question: how did our initial direction change when we considered it somewhere else and then dragged it back? Consider the following example using the rotations of $\mathrm{SO}(3)$ on the Earth; Ecuador is roughly on the opposite side of the world from Indonesia, roughly on the equator, and is at a fairly close longitude to New York. If you were in New York, travelled to Utah along a line of latitude (perhaps by somehow being left behind as the Earth spun around its north-south-pole axis), took a tiny step in the direction you would go if the Earth underneath you were to spin on its Ecuador-Indonesia-equatorial axis (this might be towards San Diego), and then went back to New York, how would the direction of that step have changed? This is now a thing we can calculate, although the result is relatively disappointing when the three dimensions of $\mathrm{SO}(3)$ are flattened to the two dimensional surface of a sphere. The more interesting result will come when our trip from New York to Utah is no longer a journey that would take a plane but merely a single step.

Example 0326-matrixgroups.7

The calculation above can be done by hand if we remember what $\mathrm{SO}(2)$ looks like. That is, the antisymmetric matrices of $\mathfrak{so}(3)$ are of the form $$\begin{gather*} \begin{bmatrix} 0 & -x & -y \\[0em] x & 0 & -z \\[0em] y & z & 0 \end{bmatrix}, \hspace{2em} x,y,z\in \reals \end{gather*}$$ which gives us the following basis vectors for our Lie algebra. $$\begin{gather*} \begin{bmatrix} 0 & -1 & 0 \\[0em] 1 & 0 & 0 \\[0em] 0 & 0 & 0 \end{bmatrix}, \hspace{1em} \begin{bmatrix} 0 & 0 & -1 \\[0em] 0 & 0 & 0 \\[0em] 1 & 0 & 0 \end{bmatrix}, \hspace{1em} \begin{bmatrix} 0 & 0 & 0 \\[0em] 0 & 0 & -1 \\[0em] 0 & 1 & 0 \end{bmatrix} \end{gather*}$$ The first and third matrices here look a lot like the matrices in $\mathfrak{so}(2)$, and so it turns out their exponentials look a lot like the rotations of $\mathrm{SO}(2)$. In fact the exponentials $\exp(tB)$ for $B$ equal to each of those are $$\begin{gather*} \begin{bmatrix} \cos(t) & -\sin(t) & 0 \\[0em] \sin(t) & \cos(t) & 0 \\[0em] 0 & 0 & 1 \end{bmatrix}, \hspace{0.5em} \begin{bmatrix} \cos(t) & 0 & -\sin(t) \\[0em] 0 & 1 & 0 \\[0em] \sin(t) & 0 & \cos(t) \end{bmatrix}, \hspace{0.5em} \begin{bmatrix} 1 & 0 & 0 \\[0em] 0 & \cos(t) & -\sin(t) \\[0em] 0 & \sin(t) & \cos(t) \end{bmatrix} \end{gather*}$$ We can set $g$ equal to one of these and $e^{t\xi}$ equal to another, and our calculation will become something like $$\begin{gather*} \frac{d}{d\phi}\left( \begin{bmatrix} \cos(\theta) & -\sin(\theta) & 0 \\[0em] \sin(\theta) & \cos(\theta) & 0 \\[0em] 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\[0em] 0 & \cos(\phi) & -\sin(\phi) \\[0em] 0 & \sin(\phi) & \cos(\phi) \end{bmatrix} \begin{bmatrix} \cos(\theta) & \sin(\theta) & 0 \\[0em] -\sin(\theta) & \cos(\theta) & 0 \\[0em] 0 & 0 & 1 \end{bmatrix} \right) \end{gather*}$$ with $\phi = 0$ at the end. Completing this calculation yields the following Lie algebra element. 
$$\begin{gather*} \begin{bmatrix} 0 & 0 & \sin(\theta) \\[0em] 0 & 0 & -\cos(\theta) \\[0em] -\sin(\theta) & \cos(\theta) & 0 \end{bmatrix} \end{gather*}$$ Placing this result in the context of our above description on the globe, this says that the direction you point in changes in the direction matching the spin of the Earth across its Algeria-Pacific-Ocean axis (the axis $90^\circ$ away on the equator from the Ecuador-Indonesia axis). When we describe this on the sphere instead of as a rotation of the space of rotations, these rotations look quite similar, making the example a bit less illustrative.
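The hand calculation in this example can be double checked numerically. The NumPy sketch below is an illustration only: `rot_z` and `rot_x` are hypothetical helper names for the first and third exponentials listed above, and the derivative in $\phi$ is approximated by a central difference with a small step `h`. It compares the finite-difference result with $g \xi g^{-1}$ and with the matrix stated in the example, at $\theta = 2\pi/9$.

```python
import numpy as np

def rot_z(theta):
    """Exponential of the first basis matrix: rotation in the first two coordinates."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(phi):
    """Exponential of the third basis matrix: rotation in the last two coordinates."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

theta = 2 * np.pi / 9
g = rot_z(theta)
g_inv = g.T  # rotations are orthogonal, so the inverse is the transpose
xi = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])

h = 1e-6
# Central difference in phi of g * rot_x(phi) * g^{-1}, evaluated at phi = 0.
finite_difference = (g @ rot_x(h) @ g_inv - g @ rot_x(-h) @ g_inv) / (2 * h)
closed_form = g @ xi @ g_inv

s, c = np.sin(theta), np.cos(theta)
stated_result = np.array([[0.0, 0.0, s], [0.0, 0.0, -c], [-s, c, 0.0]])
print(np.allclose(finite_difference, stated_result, atol=1e-6),
      np.allclose(closed_form, stated_result))
```

The result is again antisymmetric, as it must be for an element of $\mathfrak{so}(3)$.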

In the specific case of $\theta = 2\pi/9$, or $40^{\circ}$ of longitude as is the case of New York to Utah, we could think of this change as going from pointing south-south-west to south-west-west as rotated by a different axis $40^\circ$ west of Ecuador. This may seem confusing until you remember that lines of latitude are not straight lines on the earth, and to travel one (except on the equator) requires airplanes to turn constantly.

Much to the confusion of everyone involved, the derivative of the Adjoint map is often also called the adjoint map, distinguished by the capitalization of the word adjoint, but also written using the commutator bracket notation from before. The big Adjoint map can be thought of as $\mathrm{Ad}\colon G \times \mathfrak{g} \to \mathfrak{g}$ and the little adjoint map as $\mathrm{ad} \colon \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$. This is the map we want to describe the failure of the group to be commutative.

Definition 0326-matrixgroups.8 Lie Bracket/adjoint map

Let $G$ be a Lie group and $\mathfrak{g}$ its Lie algebra. In the same way we did before, for two Lie algebra elements $\xi,\nu \in \mathfrak{g}$, we define $\mathrm{ad}_{\xi} \nu$, also written $[\xi,\nu]$, to be $$\begin{align*} \mathrm{ad}_\xi \nu := [\xi,\nu] := \frac{d}{dt} \big(\mathrm{Ad}_{\exp(t\xi)} \nu \big)_{t=0} \end{align*}$$ We then call the notation $[\xi,\nu]$ the Lie bracket.

Immediately one may realize we need a better way to calculate this Lie bracket without derivatives if we are to have any hope of using it. In this case, for $\xi,\nu \in \mathfrak{g}$ as matrices, our calculation can simplify using $\mathrm{Ad}_g \xi = g \xi g^{-1}$ and the standard calculus product rule as $$\begin{align*} & \frac{d}{dt} \left(\frac{d}{d\tau} \left(e^{t\xi} e^{\tau \nu} e^{-t \xi} \right)_{\tau = 0}\right)_{t=0} \\[0em] &= \frac{d}{dt} \left( e^{t\xi} \nu e^{-t \xi} \right)_{t=0} \\[0em] &= \frac{d}{dt} \left( e^{t\xi} \right)_{t=0} \cdot \nu e^{-0 \cdot \xi} + e^{0 \cdot \xi} \nu \cdot \frac{d}{dt} \left( e^{-t \xi} \right)_{t=0} \\[0em] &= \xi \nu - \nu \xi \end{align*}$$
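As a sanity check on the derivation above, one can compare a numerical derivative of $e^{t\xi} \nu e^{-t\xi}$ at $t = 0$ against $\xi\nu - \nu\xi$. The sketch below is plain Python; `expm` is a hand-rolled truncated Taylor series (an assumption made to avoid dependencies, adequate only for small matrices with modest entries), and $\xi$, $\nu$ are taken to be the first two basis matrices of $\mathfrak{so}(3)$ from earlier.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, a):
    """Multiply every entry of the matrix a by the scalar c."""
    return [[c * x for x in row] for row in a]

def add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def expm(a, terms=20):
    """Truncated Taylor series for exp(a); a sketch, not a robust implementation."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, a))  # term is now a^k / k!
        result = add(result, term)
    return result

xi = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
nu = [[0.0, 0.0, -1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

def ad_path(t):
    """Ad_{exp(t xi)} nu = exp(t xi) nu exp(-t xi)."""
    return matmul(matmul(expm(scale(t, xi)), nu), expm(scale(-t, xi)))

h = 1e-5  # finite-difference step (an arbitrary small choice)
plus, minus = ad_path(h), ad_path(-h)
derivative = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(3)]
              for i in range(3)]

commutator = add(matmul(xi, nu), scale(-1.0, matmul(nu, xi)))

max_error = max(abs(derivative[i][j] - commutator[i][j])
                for i in range(3) for j in range(3))
print(max_error)  # should be tiny, limited by the finite-difference step
```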

And this is the matrix commutator. It should make perfect sense for what we want it to do, measuring the failure of $gh = hg$ to hold, since this expression literally says "take two directions and take the difference of going in those directions in different orders". With this in hand, we know the ways that different Lie algebras are distinct beyond merely their dimension; the Lie algebra carries with it a Lie bracket which tells us how the group it came from fails to commute.
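To see "going in two directions in different orders" concretely, take $g = e^{t\xi}$ and $h = e^{t\nu}$ for a small $t$. Expanding both exponentials to second order gives $gh - hg = t^2(\xi\nu - \nu\xi)$ plus terms of order $t^3$. The sketch below, in plain Python with illustrative helper names, uses the closed-form rotations from earlier for the first two basis matrices of $\mathfrak{so}(3)$ and compares $(gh - hg)/t^2$ with the commutator.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_first(t):
    """exp(t * xi) for the first basis matrix of so(3)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def exp_second(t):
    """exp(t * nu) for the second basis matrix of so(3)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]

xi = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]
nu = [[0, 0, -1], [0, 0, 0], [1, 0, 0]]

t = 1e-3  # a small step; the approximation error shrinks roughly like t
g, h = exp_first(t), exp_second(t)
gh, hg = matmul(g, h), matmul(h, g)

# The gap between the two orders of multiplication, rescaled by t^2.
scaled_gap = [[(gh[i][j] - hg[i][j]) / t**2 for j in range(3)]
              for i in range(3)]

xn, nx = matmul(xi, nu), matmul(nu, xi)
bracket = [[xn[i][j] - nx[i][j] for j in range(3)] for i in range(3)]

max_error = max(abs(scaled_gap[i][j] - bracket[i][j])
                for i in range(3) for j in range(3))
print(max_error)  # small, and smaller still for smaller t
```

The division by $t^2$ is the rescaling that turns a vanishingly small group-level discrepancy into a finite Lie algebra element.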

What is to stop us now from simply defining a Lie bracket, and insisting on some space of directions just to see what space it generates? Almost nothing, but we need to learn the rules that Lie brackets follow. Much is obvious from the formula $[\xi,\nu] = \xi\nu - \nu\xi$, but we need to derive the Jacobi identity. We can derive it by first noticing that $\mathrm{Ad}_g [\xi,\nu] = [\mathrm{Ad}_g\xi, \mathrm{Ad}_g\nu]$. Observe: $$\begin{align*} \mathrm{Ad}_g[\xi,\nu] &= \mathrm{Ad}_g (\xi\nu - \nu \xi) \\[0em] &= g\xi\nu g^{-1} - g \nu \xi g^{-1} \\[0em] &= (g \xi g^{-1}) (g \nu g^{-1}) - (g \nu g^{-1}) (g \xi g^{-1}) \\[0em] &= (\mathrm{Ad}_g \xi) (\mathrm{Ad}_g \nu) - (\mathrm{Ad}_g \nu) (\mathrm{Ad}_g \xi) \\[0em] &= [ \mathrm{Ad}_g \xi , \mathrm{Ad}_g \nu] \end{align*}$$
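The identity $\mathrm{Ad}_g [\xi,\nu] = [\mathrm{Ad}_g\xi, \mathrm{Ad}_g\nu]$ can be spot-checked exactly with integer arithmetic. In this sketch, $g$ is assumed to be the quarter-turn obtained from the first rotation family at $t = \pi/2$ (so its entries are integers and its inverse is its transpose), and $\xi$, $\nu$ are the second and third basis matrices of $\mathfrak{so}(3)$. The function names are illustrative.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

def adjoint_action(g, g_inv, x):
    """Ad_g x = g x g^{-1}; the inverse is passed in explicitly."""
    return matmul(matmul(g, x), g_inv)

# Quarter-turn in SO(3) and its inverse (the transpose).
g = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
g_inv = [[0, 1, 0], [-1, 0, 0], [0, 0, 1]]

xi = [[0, 0, -1], [0, 0, 0], [1, 0, 0]]
nu = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]

lhs = adjoint_action(g, g_inv, bracket(xi, nu))
rhs = bracket(adjoint_action(g, g_inv, xi), adjoint_action(g, g_inv, nu))

print(lhs == rhs)  # True
```

A single example proves nothing on its own, of course; the algebra above is what guarantees the result for every invertible $g$.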

Now consider, what if we take the Lie bracket of a Lie bracket? What is $[\xi,[\nu,\mu]]$? We can find this by once again referring to the definition of the Lie bracket as a derivative of the Adjoint map: $$\begin{align*} [\xi,[\nu, \mu]] &= \frac{d}{dt} \left( \mathrm{Ad}_{\exp(t\xi)}[\nu,\mu]\right)_{t=0} \\[0em] &= \frac{d}{dt} \left( [\mathrm{Ad}_{\exp(t\xi)}\nu, \mathrm{Ad}_{\exp(t\xi)}\mu]\right)_{t=0} \\[0em] &= \frac{d}{dt} \left( (\mathrm{Ad}_{\exp(t\xi)}\nu) (\mathrm{Ad}_{\exp(t\xi)}\mu) - (\mathrm{Ad}_{\exp(t\xi)}\mu)(\mathrm{Ad}_{\exp(t\xi)}\nu) \right)_{t=0} \end{align*}$$ Applying the product rule (and then packaging the difference back up into Lie brackets, since the line would otherwise be too long), we proceed with $$\begin{align*} &= \frac{d}{dt} \left( [\mathrm{Ad}_{\exp(t\xi)}\nu, \mu] \right)_{t=0} + \frac{d}{dt} \left( [\nu, \mathrm{Ad}_{\exp(t\xi)}\mu] \right)_{t=0} \\[0em] &= [[\xi,\nu],\mu] + [\nu,[\xi,\mu]] \end{align*}$$

Using $[\xi,\nu] = \xi\nu - \nu \xi$ to deduce that the Lie bracket is anti-commutative, i.e. $[\xi,\nu]= -[\nu,\xi]$, we can swap these around a bit and derive the Jacobi identity: $$\begin{gather*} [\xi,[\nu,\mu]] = - [\mu,[\xi,\nu]] - [\nu,[\mu,\xi]] \\[0em] \text{or} \hspace{2em} [\xi,[\nu,\mu]] + [\mu,[\xi,\nu]] + [\nu,[\mu,\xi]]= 0 \end{gather*}$$
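Both the product-rule form $[\xi,[\nu,\mu]] = [[\xi,\nu],\mu] + [\nu,[\xi,\mu]]$ and the Jacobi identity hold for the commutator of any square matrices, so they can be checked exactly on integer matrices. The three matrices below are arbitrary illustrative choices, not taken from the talk.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

# Arbitrary integer matrices, so every comparison below is exact.
xi = [[1, 2, 0], [0, -1, 3], [4, 0, 2]]
nu = [[0, 1, -2], [5, 0, 1], [1, 1, 0]]
mu = [[2, 0, 1], [-3, 1, 0], [0, 4, -1]]

# Product-rule form: [xi, [nu, mu]] = [[xi, nu], mu] + [nu, [xi, mu]]
derivation_lhs = bracket(xi, bracket(nu, mu))
derivation_rhs = add(bracket(bracket(xi, nu), mu),
                     bracket(nu, bracket(xi, mu)))

# Jacobi identity: the cyclic sum vanishes.
jacobi_sum = add(add(bracket(xi, bracket(nu, mu)),
                     bracket(mu, bracket(xi, nu))),
                 bracket(nu, bracket(mu, xi)))
zero = [[0] * 3 for _ in range(3)]

print(derivation_lhs == derivation_rhs, jacobi_sum == zero)  # True True
```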

It is worth noting that this identity, commonly touted as a defining property of the Lie bracket, is, as we derived it, merely an expression of the product rule. Put another way, bracketing with a fixed element, $\mathrm{ad}_\xi = [\xi,\square]$, acts as a derivation with respect to the Lie bracket itself: $[\xi,[\nu,\mu]] = [[\xi,\nu],\mu] + [\nu,[\xi,\mu]]$.

We are now, finally, ready to give the formal definition of a Lie algebra.

Definition 0326-matrixgroups.9

Let $\mathfrak{g}$ be a vector space together with a bilinear binary operation $[\square,\square]\colon \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$. We say that this vector space and the binary operation together form a Lie algebra if, for all $\xi,\nu,\mu \in \mathfrak{g}$, the binary operation satisfies $[\xi,\nu] = -[\nu,\xi]$ and the Jacobi identity: $$\begin{gather*} [\xi,[\nu,\mu]] + [\mu,[\xi,\nu]] + [\nu,[\mu,\xi]]= 0. \end{gather*}$$
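Nothing in this definition mentions matrices. As one hedged illustration, ordinary three-dimensional vectors with the cross product as the bracket satisfy both conditions. The sketch below checks antisymmetry and the Jacobi identity exactly on three arbitrarily chosen integer vectors; it is a spot check rather than a proof, and the helper names are illustrative.

```python
def cross(a, b):
    """Cross product of two 3-vectors, playing the role of the bracket."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def vec_add(a, b):
    """Entrywise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]

# Arbitrary integer vectors, so the comparisons below are exact.
xi = [1, 2, 3]
nu = [-4, 0, 5]
mu = [2, -1, 7]

# Antisymmetry: [xi, nu] = -[nu, xi]
antisymmetric = cross(xi, nu) == [-x for x in cross(nu, xi)]

# Jacobi identity, in the same cyclic form as the definition.
jacobi_sum = vec_add(vec_add(cross(xi, cross(nu, mu)),
                             cross(mu, cross(xi, nu))),
                     cross(nu, cross(mu, xi)))

print(antisymmetric, jacobi_sum == [0, 0, 0])  # True True
```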

We should build some intuitions here. The Lie bracket tells us how a tiny step in a direction changes if before we take it, we take a step in some other direction, then afterwards we take the reverse step in that other direction. Obviously such a change would be minuscule, but the tools of calculus can renormalize this tiny change to something large enough to speak about. The first thing we should expect is that if we apply a Lie bracket to the same element twice, this is the same as asking how a step changes if we take two steps in the same direction and one step backwards; the step in the middle does not change at all, and so we have $[\xi,\xi] = \xi\xi - \xi \xi = 0$ as we should expect.

Returning to our example from earlier, we can see by a very simple calculation that the Lie bracket of any two distinct basis vectors of $\mathfrak{so}(3)$ is, up to sign, simply the third. We can do this calculation explicitly.

Example 0326-matrixgroups.10

We'll recall and now name our Lie algebra matrices for $\mathfrak{so}(3)$: $$\begin{gather*} L_x =\begin{bmatrix} 0 & -1 & 0 \\[0em] 1 & 0 & 0 \\[0em] 0 & 0 & 0 \end{bmatrix}, \hspace{1em} L_y = \begin{bmatrix} 0 & 0 & -1 \\[0em] 0 & 0 & 0 \\[0em] 1 & 0 & 0 \end{bmatrix}, \hspace{1em} L_z = \begin{bmatrix} 0 & 0 & 0 \\[0em] 0 & 0 & -1 \\[0em] 0 & 1 & 0 \end{bmatrix} \end{gather*}$$ Knowing now that $[\xi,\nu] = \xi\nu - \nu\xi$, we can calculate a pair explicitly and we see $$\begin{align*} &\begin{bmatrix} 0 & -1 & 0 \\[0em] 1 & 0 & 0 \\[0em] 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 & -1 \\[0em] 0 & 0 & 0 \\[0em] 1 & 0 & 0 \end{bmatrix} - \begin{bmatrix} 0 & 0 & -1 \\[0em] 0 & 0 & 0 \\[0em] 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & -1 & 0 \\[0em] 1 & 0 & 0 \\[0em] 0 & 0 & 0 \end{bmatrix} \\[0em] &= \begin{bmatrix} 0 & 0 & 0 \\[0em] 0 & 0 & -1 \\[0em] 0 & 0 & 0 \end{bmatrix} - \begin{bmatrix} 0 & 0 & 0 \\[0em] 0 & 0 & 0 \\[0em] 0 & -1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\[0em] 0 & 0 & -1 \\[0em] 0 & 1 & 0 \end{bmatrix} = L_z. \end{align*}$$ The other calculations proceed similarly.
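The remaining brackets can be checked the same way. Here is a short sketch in plain Python using the matrices $L_x$, $L_y$, $L_z$ exactly as named above; the helper names are illustrative. Since the entries are integers, the comparisons are exact.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

def negate(a):
    """Entrywise negation of a matrix."""
    return [[-x for x in row] for row in a]

L_x = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]
L_y = [[0, 0, -1], [0, 0, 0], [1, 0, 0]]
L_z = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]

checks = {
    "[L_x, L_y] == L_z": bracket(L_x, L_y) == L_z,
    "[L_y, L_z] == L_x": bracket(L_y, L_z) == L_x,
    "[L_z, L_x] == L_y": bracket(L_z, L_x) == L_y,
    # Swapping the arguments flips the sign, by antisymmetry.
    "[L_y, L_x] == -L_z": bracket(L_y, L_x) == negate(L_z),
}

for label, ok in checks.items():
    print(label, ok)  # each line ends in True
```

The brackets cycle through the three names in order, and reversing the order of the arguments introduces a minus sign.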