The Four Fundamental Subspaces
This is the first post in the series “Fundamental Theorem of Linear Algebra”, where we work through Gilbert Strang’s paper “The Fundamental Theorem of Linear Algebra”, published in the American Mathematical Monthly in 1993.
In this post, we will go through the first two parts of the Fundamental Theorem: the dimensionality and the orthogonality of the Fundamental Subspaces.
But first, let’s define the notation we will use. Greek lowercase letters $\alpha, \beta, \dots$ are used for scalars; bold letters $\mathbf{b}, \mathbf{x}, \dots$ for vectors; lowercase indexed letters $x_1, x_2, \dots$ for components of vectors; capital letters $A, \dots$ for matrices; and indexed bold letters $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{r}_1, \mathbf{r}_2, \dots$ for columns or rows of a matrix. $\mathbf{0}$ is a vector of appropriate dimensionality with 0 in each component.
Fundamental subspaces
A vector space is a set equipped with addition and scalar multiplication operations (refer to Wikipedia for more details). A subspace is a subset of a vector space that is closed under addition and scalar multiplication, i.e. given a subspace $S$, if $\mathbf{x}, \mathbf{y} \in S$, then for any $\alpha, \beta \in \mathbb{R}$ we have $\alpha \mathbf{x} + \beta \mathbf{y} \in S$.
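As a toy illustration (a pure-Python sketch; the subspace and the vectors are chosen arbitrarily for this example), take the line in $\mathbb{R}^3$ spanned by $(1, 2, 3)$, which is a subspace, and check that a combination $\alpha\mathbf{x} + \beta\mathbf{y}$ of two of its elements lands back on the line:

```python
# Closure check for a simple subspace of R^3:
# S = all scalar multiples of v = (1, 2, 3), a line through the origin.
v = (1, 2, 3)

def scale(c, u):
    """Scalar multiplication c * u."""
    return tuple(c * ui for ui in u)

def add(u, w):
    """Vector addition u + w."""
    return tuple(ui + wi for ui, wi in zip(u, w))

def in_S(u):
    """u lies in S iff u = c * v for some scalar c (uses v[0] != 0)."""
    c = u[0] / v[0]
    return u == scale(c, v)

x, y = scale(2, v), scale(-5, v)        # two elements of S
combo = add(scale(3, x), scale(7, y))   # alpha = 3, beta = 7
print(in_S(combo))                      # True: the combination stays in S
```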
Let $A$ be a matrix with $m$ rows and $n$ columns. Then there are four fundamental subspaces associated with $A$:
- $C(A)$: the column space of $A$; it contains all linear combinations of the columns of $A$
- $C(A^T)$: the row space of $A$; it contains all linear combinations of the rows of $A$ (or, equivalently, of the columns of $A^T$)
- $N(A)$: the nullspace of $A$; it contains all solutions to the system $A\mathbf{x} = \mathbf{0}$
- $N(A^T)$: the left nullspace of $A$; it contains all solutions to the system $A^T\mathbf{y} = \mathbf{0}$.
Example. Consider the matrix
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \end{bmatrix}$$
The column space $C(A)$ is formed by all linear combinations of the columns of $A$, i.e. it is
$$\alpha_1 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} + \alpha_2 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \alpha_3 \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}$$
with all possible choices of $(\alpha_1, \alpha_2, \alpha_3)$.
The row space $C(A^T)$ is formed by all linear combinations of the rows of $A$:
$$\beta_1 \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} + \beta_2 \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} + \beta_3 \begin{bmatrix} 2 & 3 & 4 \end{bmatrix}$$
for all possible choices of $(\beta_1, \beta_2, \beta_3)$.
We will see examples for $N(A)$ and $N(A^T)$ a bit later.
Dimensionality
The first part of the theorem talks about the dimensionality of these subspaces. It states the following:
- $\dim C(A) = \dim C(A^T) = r$, where $r$ is the rank of $A$,
- $\dim C(A^T) + \dim N(A) = n$, and
- $\dim C(A) + \dim N(A^T) = m$.
It means that
- $\dim C(A) = \dim C(A^T) = r$,
- $\dim N(A) = n - r$, and
- $\dim N(A^T) = m - r$.
How can we see that?
Let’s continue with the example and find the nullspace of
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \end{bmatrix}$$
First, we notice that the third row of $A$ is the sum of the first two. Thus, the dimensionality of the row space $C(A^T)$ is 2, which means that the dimensionality of the nullspace $N(A)$ must be $n - r = 3 - 2 = 1$.
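This observation about the rows is easy to verify directly (a minimal pure-Python check on the example matrix):

```python
# The third row of A equals the sum of the first two rows,
# so the rows of A span only a 2-dimensional space: rank(A) = 2.
row1, row2, row3 = [1, 1, 1], [1, 2, 3], [2, 3, 4]
print(row3 == [a + b for a, b in zip(row1, row2)])   # True
```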
Let’s check it: apply Gaussian elimination to transform $A$ into the Row-Reduced Echelon Form (“RREF” for short).
A matrix is in echelon form when the leading zero elements of its rows form a “staircase” pattern.
The first non-zero element of each row is called a pivot, and the corresponding columns are called pivot columns. In RREF, each pivot must be equal to 1 and all other elements of a pivot column must be equal to 0. The remaining columns are called free columns.
RREF is typically obtained via Gaussian elimination: you may subtract a multiple of one row from another row and multiply any row by a non-zero scalar, repeating until the matrix is in RREF.
If you have Matlab, you can calculate RREF as follows:
R = rref(A)
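If Matlab is not at hand, the same computation can be sketched in plain Python (a minimal RREF implementation written for this post, not a library routine; `Fraction` keeps the arithmetic exact):

```python
from fractions import Fraction

def rref(M):
    """Reduce M (a list of rows) to Row-Reduced Echelon Form
    by Gaussian elimination with exact rational arithmetic."""
    R = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(R), len(R[0])
    pivot_row = 0
    for col in range(n_cols):
        # find a row at or below pivot_row with a nonzero entry in this column
        pr = next((r for r in range(pivot_row, n_rows) if R[r][col] != 0), None)
        if pr is None:
            continue                     # free column: no pivot here
        R[pivot_row], R[pr] = R[pr], R[pivot_row]
        pivot = R[pivot_row][col]
        R[pivot_row] = [x / pivot for x in R[pivot_row]]   # make the pivot 1
        for r in range(n_rows):          # zero out the rest of the column
            if r != pivot_row and R[r][col] != 0:
                f = R[r][col]
                R[r] = [x - f * y for x, y in zip(R[r], R[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return R

A = [[1, 1, 1], [1, 2, 3], [2, 3, 4]]
print([[int(x) for x in row] for row in rref(A)])
# [[1, 0, -1], [0, 1, 2], [0, 0, 0]]
```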
Let’s get back to our matrix
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \end{bmatrix}$$
Its RREF is
$$R = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix}$$
There are two pivot variables; let’s call them $x_1$ and $x_2$. There is also one free variable, $x_3$, whose column contains no pivot.
The free variable can take any value; for example, we can assign $x_3 = 1$, so a solution to $A\mathbf{x} = \mathbf{0}$ has the form $\mathbf{x}_n = \begin{bmatrix} x_1 \\ x_2 \\ 1 \end{bmatrix}$. Now we solve the system: from $R$ we read off $x_1 - x_3 = 0$ and $x_2 + 2x_3 = 0$, so $x_1 = 1$, $x_2 = -2$, and the solution is $\mathbf{x}_n = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$.
There is nothing special about the choice $x_3 = 1$; instead we can choose $x_3 = \alpha$ and obtain $\mathbf{x}_n = \begin{bmatrix} \alpha \\ -2\alpha \\ \alpha \end{bmatrix} = \alpha \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$. All possible choices of $\alpha$ form the nullspace $N(A)$.
So the nullspace of A is a line.
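We can confirm this with a quick pure-Python check: $A$ sends every multiple of $(1, -2, 1)$ to the zero vector.

```python
# Check that A maps every multiple of (1, -2, 1) to the zero vector.
A = [[1, 1, 1], [1, 2, 3], [2, 3, 4]]

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

for alpha in (-3, 0, 1, 5):
    xn = [alpha * c for c in (1, -2, 1)]
    print(matvec(A, xn))   # [0, 0, 0] for every alpha
```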
Orthogonality
The second part of the theorem says that the row space $C(A^T)$ and the nullspace $N(A)$ of $A$ are orthogonal, and that the column space $C(A)$ and the left nullspace $N(A^T)$ are also orthogonal.
Two vectors are orthogonal if their dot product is 0. If all vectors of one subspace are orthogonal to all vectors of another subspace, these subspaces are called orthogonal.
It is easy to see:
- Consider the system $A\mathbf{x} = \mathbf{0}$.
- Let’s write $A$ in terms of its rows: $$\begin{bmatrix} \text{---} \; (\text{row } 1) \; \text{---} \\ \text{---} \; (\text{row } 2) \; \text{---} \\ \vdots \\ \text{---} \; (\text{row } m) \; \text{---} \end{bmatrix} \cdot \mathbf{x} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
- You can convince yourself that this is the same as writing $(\text{row } 1)^T \mathbf{x} = 0$, $(\text{row } 2)^T \mathbf{x} = 0$, $\dots$, $(\text{row } m)^T \mathbf{x} = 0$.
- Thus every row of $A$ is orthogonal to this $\mathbf{x}$.
- Since $A\mathbf{x} = \mathbf{0}$, this $\mathbf{x}$ belongs to the nullspace $N(A)$, which means that every $\mathbf{x} \in N(A)$ is orthogonal to every row of $A$.
- Now let’s consider a linear combination of rows, e.g. $(\alpha (\text{row } 1) + \beta (\text{row } 2))^T \mathbf{x} = \alpha (\text{row } 1)^T \mathbf{x} + \beta (\text{row } 2)^T \mathbf{x} = \alpha \cdot 0 + \beta \cdot 0 = 0$.
- So any linear combination of the rows of $A$ (i.e. any vector from the row space $C(A^T)$) is orthogonal to any vector from the nullspace $N(A)$.
- It means that $C(A^T)$ and $N(A)$ are orthogonal.
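The argument above can be checked numerically on the running example (a pure-Python sketch; the combination coefficients are arbitrary):

```python
# Orthogonality of the row space C(A^T) and the nullspace N(A):
# every row of A, and any combination of rows, is orthogonal to x_n.
A = [[1, 1, 1], [1, 2, 3], [2, 3, 4]]
xn = [1, -2, 1]                      # nullspace basis vector found earlier

def dot(u, v):
    """Dot product of u and v."""
    return sum(ui * vi for ui, vi in zip(u, v))

print([dot(row, xn) for row in A])   # [0, 0, 0]: each row is orthogonal to xn

# an arbitrary linear combination of the rows (a row-space vector)
a, b, c = 2, -7, 3
combo = [a * r1 + b * r2 + c * r3 for r1, r2, r3 in zip(*A)]
print(dot(combo, xn))                # 0: the combination is orthogonal too
```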
The same is true for $C(A)$ and $N(A^T)$: all we have to do is transpose the matrix $A$ and repeat the argument.
If two subspaces are orthogonal and together they span the entire vector space, they are called orthogonal complements. $C(A^T)$ and $N(A)$ are orthogonal complements, and the same is true for $C(A)$ and $N(A^T)$. We can illustrate this with a picture:
On the left part of the diagram we have the row space $C(A^T)$ and the nullspace $N(A)$. They are orthogonal, meet only at the origin, and together span the space $\mathbb{R}^n$. On the right we have $C(A)$ and $N(A^T)$: they are also orthogonal and together span $\mathbb{R}^m$.
Solution to $A\mathbf{x} = \mathbf{b}$
Now we can use the first two parts of the theorem to develop some intuition about the system $A\mathbf{x} = \mathbf{b}$ and the crucial role the subspaces play in it.
Row Space Solution
The general solution to the system is $\mathbf{x} = \mathbf{x}_p + \mathbf{x}_n$, where $\mathbf{x}_p$ is some particular solution to the system and $\mathbf{x}_n$ is a homogeneous (nullspace) solution, i.e. a solution to $A\mathbf{x} = \mathbf{0}$. This works because $A\mathbf{x} = A(\mathbf{x}_p + \mathbf{x}_n) = A\mathbf{x}_p + A\mathbf{x}_n = \mathbf{b} + \mathbf{0} = \mathbf{b}$.
Since $C(A^T)$ and $N(A)$ are orthogonal complements, they span the entire space $\mathbb{R}^n$, and every vector $\mathbf{x}$ from $\mathbb{R}^n$ can be expressed as $\mathbf{x} = \mathbf{x}_r + \mathbf{x}_n$ such that $\mathbf{x}_r$ is from the row space $C(A^T)$ and $\mathbf{x}_n$ is from the nullspace $N(A)$.
There are many possible choices of $\mathbf{x}_p$, but only one of them belongs to the row space. We call it the row space solution $\mathbf{x}_r$ of the system.
But sometimes there is no solution. A solution exists only if the right-hand side $\mathbf{b}$ belongs to the column space $C(A)$, i.e. when $\mathbf{b}$ is a linear combination of the columns of $A$.
Let’s put it more formally: if $\mathbf{a}_i$ are the columns of $A$, i.e. $$A = \begin{bmatrix} | & | & & | \\ \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \\ | & | & & | \end{bmatrix},$$ then a solution exists if we can find coefficients $(x_1, \dots, x_n)$ such that $x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \dots + x_n \mathbf{a}_n = \mathbf{b}$. If we can, then they form a solution $\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$.
You can convince yourself that writing $x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \dots + x_n \mathbf{a}_n = \mathbf{b}$ is the same as writing $A\mathbf{x} = \mathbf{b}$.
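Here is a small pure-Python check of that equivalence on the example matrix (the coefficient vector is arbitrary):

```python
# x1*a1 + x2*a2 + x3*a3 computed column by column
# equals A x computed row by row.
A = [[1, 1, 1], [1, 2, 3], [2, 3, 4]]
x = [2, 0, -1]                  # arbitrary coefficients

cols = list(zip(*A))            # columns a1, a2, a3 of A

# the linear combination of columns, component by component
combo = [sum(xj * col[i] for xj, col in zip(x, cols)) for i in range(len(A))]

# the matrix-vector product A x, row by row
Ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

print(combo == Ax)   # True: the two ways of writing the product agree
```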
We can illustrate this with a diagram:
In this diagram $\mathbf{b}$ is in the column space $C(A)$, so there is a solution $\mathbf{x}$ to the system $A\mathbf{x} = \mathbf{b}$. This solution can be expressed as $\mathbf{x}_r + \mathbf{x}_n$, where $\mathbf{x}_r$ belongs to the row space $C(A^T)$ and $\mathbf{x}_n$ to the nullspace $N(A)$. We know that $A\mathbf{x}_r = \mathbf{b}$ and $A\mathbf{x} = \mathbf{b}$, and we show this with arrows from $\mathbf{x}_r$ and $\mathbf{x}$ to $\mathbf{b}$.
Let’s get back to our example: consider the system $A\mathbf{x} = \mathbf{b}$ with $$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}.$$
We already know that the column space is $\alpha_1 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} + \alpha_2 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \alpha_3 \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}$. To check whether the system has a solution, we need to show that $\mathbf{b} \in C(A)$. In other words, we need to find $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ such that $x_1 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} + x_2 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + x_3 \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$.
In this example it is possible: $x_1 = -1$, $x_2 = 1$, $x_3 = 0$, so we have $-1 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} + 1 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + 0 \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$.
Not only have we established that $\mathbf{b} \in C(A)$, we have also found a solution $\mathbf{x}_p = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$. This solution is not necessarily the row space solution, i.e. it may not belong to the row space $C(A^T)$.
The nullspace $N(A)$ contains all solutions to $A\mathbf{x} = \mathbf{0}$. We already know how to find the nullspace solution, and for this example it is $\mathbf{x}_n = \begin{bmatrix} \alpha \\ -2\alpha \\ \alpha \end{bmatrix} = \alpha \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$.
The complete solution $\mathbf{x}$ is the sum of a particular solution $\mathbf{x}_p$ and the homogeneous solutions $\mathbf{x}_n$:
$$\mathbf{x} = \mathbf{x}_p + \mathbf{x}_n = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + \alpha \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}.$$ Note that the set of all solutions $\mathbf{x}$ is just a shifted nullspace $N(A)$.
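A quick pure-Python check that every vector of this form really solves the system:

```python
# Verify that x = x_p + alpha * x_n solves A x = b for any alpha.
A = [[1, 1, 1], [1, 2, 3], [2, 3, 4]]
b = [0, 1, 1]
xp = [-1, 1, 0]                 # the particular solution found above
xn = [1, -2, 1]                 # nullspace basis vector

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

for alpha in (-2, 0, 1, 10):
    x = [p + alpha * n for p, n in zip(xp, xn)]
    print(matvec(A, x) == b)    # True for every alpha
```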
We can illustrate it with the following picture:
Because the rank of $A$ is two, the row space $C(A^T)$ contains only two linearly independent vectors, so it is a plane in $\mathbb{R}^3$ spanned by two rows $\mathbf{r}_1$ and $\mathbf{r}_2$. The row space solution $\mathbf{x}_r$ also belongs to $C(A^T)$, but the solution $\mathbf{x}_p = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$ does not.
Also, maybe this animation will be helpful:
Here we draw the row space, the nullspace, and the solution set, and rotate the picture to see what’s going on (otherwise it’s hard to understand a 3D picture in 2D). The plane is the row space, the black dashed line is the nullspace, and the red dashed line is the solution set. We see that the solution set and the row space intersect at one special point, and this point is the row space solution $\mathbf{x}_r$.
Let’s freeze it and look at one particular angle:
In this 2D projection the row space is seen edge-on, so it looks like its two basis vectors lie on a line. Then we have the nullspace, which is orthogonal to the row space, with one basis vector. Finally, we have the solution set (the dotted red line) and one particular solution (the red arrow). The row space solution $\mathbf{x}_r$ lies at the intersection of the solution set and the row space (the black arrow).
You are probably wondering how to obtain the row space solution $\mathbf{x}_r$. To do this, we first need to recognize that it is the projection of $\mathbf{x}_p$ onto the row space $C(A^T)$. We will see how to find this projection in the next post of this series: “The Least Squares: The Projection onto Subspaces”.