Linear Transformations

  • A linear transformation (or linear map) is a function that takes a vector and produces another vector, while preserving addition and scaling. If $T$ is linear, then:

    • $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$
    • $T(c\mathbf{u}) = cT(\mathbf{u})$
  • Every linear transformation can be represented as multiplication by a matrix. The matrix is the transformation. When you multiply a vector by a matrix, you are applying a linear transformation to it.

  • Think of a $2 \times 2$ matrix as a machine that takes in 2D vectors and outputs new 2D vectors. The columns of the matrix tell you where the standard basis vectors $\hat{\mathbf{i}}$ and $\hat{\mathbf{j}}$ end up after the transformation. Everything else follows from linearity.

The columns of a matrix show where the basis vectors land

  • For example, if
$$ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} $$

then $\hat{\mathbf{i}} = [1, 0]^T$ lands at $[2, 1]^T$ (column 1) and $\hat{\mathbf{j}} = [0, 1]^T$ lands at $[1, 2]^T$ (column 2). Every other vector is a combination of these two, so its output follows automatically.
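A quick numerical check of both facts, using the matrix $A$ above (a minimal sketch in JAX, matching the library used in the coding tasks below):

import jax.numpy as jnp

A = jnp.array([[2.0, 1.0],
               [1.0, 2.0]])

# The columns of A are exactly where the basis vectors land.
print(A @ jnp.array([1.0, 0.0]))  # [2. 1.]  (column 1)
print(A @ jnp.array([0.0, 1.0]))  # [1. 2.]  (column 2)

# Linearity: T(u + v) = T(u) + T(v) and T(cu) = cT(u).
u, v, c = jnp.array([3.0, -1.0]), jnp.array([0.5, 2.0]), 4.0
print(jnp.allclose(A @ (u + v), A @ u + A @ v))  # True
print(jnp.allclose(A @ (c * u), c * (A @ u)))    # True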

  • Multiplying two matrices can be thought of as applying one transformation after another. If $B$ transforms vectors from one space and $A$ transforms the result, then $AB$ does both in sequence. In a game engine, rotating a character and then moving them forward is a different result from moving them first and then rotating, which is why matrix multiplication is not commutative.
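A minimal sketch of this order dependence, here using a 90° rotation and a non-uniform scaling (chosen purely for illustration):

import jax.numpy as jnp

R = jnp.array([[0.0, -1.0],   # rotate 90 degrees counter-clockwise
               [1.0,  0.0]])
S = jnp.array([[2.0, 0.0],    # stretch x by a factor of 2
               [0.0, 1.0]])

v = jnp.array([1.0, 0.0])
print(R @ (S @ v))  # [0. 2.]  scale first, then rotate
print(S @ (R @ v))  # [0. 1.]  rotate first, then scale
print(jnp.allclose(R @ S, S @ R))  # False: the order matters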

  • Rotation turns vectors by an angle $\theta$ without changing their length. The vector stays the same size, it just points in a new direction.

Rotation preserves length but changes direction

  • In 2D, the rotation matrix is:
$$ R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} $$
  • For $\theta = 90°$:
$$ R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} $$

so $[1, 0]^T$ becomes $[0, 1]^T$. The vector pointing right now points up. Rotation matrices are orthogonal and always have determinant 1. When you rotate a photo on your phone, essentially this matrix is being applied to every pixel coordinate.
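These properties are easy to verify numerically (a minimal sketch):

import jax.numpy as jnp

R = jnp.array([[0.0, -1.0],
               [1.0,  0.0]])   # R(90 degrees)

print(R @ jnp.array([1.0, 0.0]))          # [0. 1.]: right now points up
print(jnp.linalg.det(R))                  # 1.0: orientation is preserved
print(jnp.allclose(R.T @ R, jnp.eye(2)))  # True: R is orthogonal

v = jnp.array([3.0, 4.0])                 # length 5
print(jnp.linalg.norm(R @ v))             # 5.0: length is unchanged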

  • In 3D, there are separate rotation matrices for each axis. A robotic arm rotates each joint around a specific axis, and each joint is one rotation matrix. Rotation around the z-axis looks like the 2D case embedded in 3D:
$$ R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} $$
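A sketch of $R_z$ acting on a 3D vector; the z-component passes through untouched:

import jax.numpy as jnp

def rot_z(theta):
    c, s = jnp.cos(theta), jnp.sin(theta)
    return jnp.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])

v = jnp.array([1.0, 0.0, 5.0])
print(rot_z(jnp.pi / 2) @ v)  # [0. 1. 5.]: x and y rotate, z stays fixed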
  • Scaling stretches or shrinks vectors along each axis independently:
$$ S(s_x, s_y) = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} $$

Scaling stretches each axis by a different factor

  • $S(2, 1.5)$ doubles the x-component and multiplies the y-component by 1.5. Scaling by $-1$ along an axis flips that component. A diagonal matrix always represents a scaling along the coordinate axes. When you resize an image to 50%, you are applying $S(0.5, 0.5)$ to every pixel coordinate.
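A sketch of $S(2, 1.5)$ in action; jnp.diag is a convenient way to build such matrices:

import jax.numpy as jnp

S = jnp.diag(jnp.array([2.0, 1.5]))
v = jnp.array([1.0, 2.0])
print(S @ v)  # [2. 3.]: x doubled, y multiplied by 1.5

# Resizing an image to 50% scales every pixel coordinate:
print(jnp.diag(jnp.array([0.5, 0.5])) @ v)  # [0.5 1. ]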

  • Reflection flips vectors across an axis or line, like a mirror. Reflecting across the x-axis keeps the x-component and negates the y-component:

$$ \text{Ref}_x = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} $$

Reflection across the x-axis flips the y-component

  • For example, $[3, 2]^T$ becomes $[3, -2]^T$. When your phone flips a selfie horizontally so text reads correctly, it is applying a reflection matrix. Reflecting across the line $y = x$ swaps the two components:
$$ \text{Ref}_{y=x} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} $$
  • Reflection matrices have determinant $-1$, confirming they flip orientation.
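Both reflections, checked numerically along with their determinants (a minimal sketch):

import jax.numpy as jnp

ref_x = jnp.array([[1.0,  0.0],
                   [0.0, -1.0]])   # reflect across the x-axis
ref_yx = jnp.array([[0.0, 1.0],
                    [1.0, 0.0]])   # reflect across the line y = x

v = jnp.array([3.0, 2.0])
print(ref_x @ v)   # [ 3. -2.]: y-component flipped
print(ref_yx @ v)  # [2. 3.]: components swapped
print(jnp.linalg.det(ref_x), jnp.linalg.det(ref_yx))  # -1.0 -1.0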

  • Rotations and reflections are both rigid transformations: they preserve distances and angles. The matrices that represent them are orthogonal matrices, and every orthogonal matrix has determinant $+1$ (a rotation) or $-1$ (a reflection).

  • Shearing skews vectors along one axis proportionally to the other. A horizontal shear by factor $k$:

$$ \text{Sh}_x(k) = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} $$

Shearing slides the top sideways while the bottom stays fixed

  • Each point slides horizontally by $k$ times its height. With $k = 0.5$, a point at height 2 shifts right by 1. The bottom row stays put, the top row slides. This is how slanted (oblique) text works: upright letters are sheared so they lean to the right.
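The worked example above, as a two-line check:

import jax.numpy as jnp

shear = jnp.array([[1.0, 0.5],
                   [0.0, 1.0]])
print(shear @ jnp.array([0.0, 2.0]))  # [1. 2.]: height 2, shifted right by 1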

  • All of the above (rotation, scaling, reflection, shearing) are linear transformations. They keep the origin fixed and preserve straight lines. But what about translation (shifting everything by a fixed amount)?

  • Translation is not a linear transformation because it moves the origin. If you shift every point right by 3, the zero vector moves to $[3, 0]^T$, breaking linearity. To handle it, we use an affine transformation, which combines a linear transformation with a translation:

$$\mathbf{y} = A\mathbf{x} + \mathbf{t}$$

  • To represent this as a single matrix multiplication, we use homogeneous coordinates: add an extra 1 to every vector and use an $(n+1) \times (n+1)$ matrix:
$$ \begin{bmatrix} A & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} = \begin{bmatrix} A\mathbf{x} + \mathbf{t} \\ 1 \end{bmatrix} $$
  • Affine transformations preserve straight lines and parallelism, but not necessarily angles or lengths. Every object in a video game is positioned using affine transformations: rotate it, scale it, then place it at the right location, all encoded in a single matrix.
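A minimal sketch of the homogeneous trick, combining a 90° rotation with a translation by $\mathbf{t} = [3, 1]^T$ (values chosen arbitrarily):

import jax.numpy as jnp

A = jnp.array([[0.0, -1.0],
               [1.0,  0.0]])       # linear part: rotate 90 degrees
t = jnp.array([3.0, 1.0])          # translation part

# Build the (n+1) x (n+1) homogeneous matrix [[A, t], [0, 1]].
M = jnp.block([[A, t[:, None]],
               [jnp.zeros((1, 2)), jnp.ones((1, 1))]])

x = jnp.array([1.0, 0.0])
x_h = jnp.append(x, 1.0)           # append the extra 1
print(M @ x_h)                     # [3. 2. 1.]
print(A @ x + t)                   # [3. 2.]: same result, one matrix multiply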

  • A degenerate transformation (singular matrix) collapses space into a lower dimension.

  • For example, the matrix

$$ \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} $$

maps every 2D vector onto a single line, because both columns point in the same direction. The determinant is zero, information is lost, and the transformation cannot be undone.

  • Converting a colour image (3 values per pixel: red, green, blue) to grayscale (1 value per pixel) is a degenerate transformation: the colour information is permanently gone.
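Checking the collapse numerically (a minimal sketch; every output lands on the line $y = 2x$):

import jax.numpy as jnp

B = jnp.array([[1.0, 2.0],
               [2.0, 4.0]])
print(jnp.linalg.det(B))  # 0.0: the matrix is singular

# Three different inputs, all mapped onto multiples of [1, 2]:
for v in (jnp.array([1.0, 0.0]), jnp.array([0.0, 1.0]), jnp.array([3.0, -1.0])):
    print(B @ v)  # [1. 2.], [2. 4.], [1. 2.]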

  • In machine learning, linear transformations are the core of neural networks. Data is represented as a matrix: a stack of vectors, each encoding the features of some object (people, planes, text, images... anything!).

  • Each layer applies a matrix multiplication (a linear transformation). Later chapters explain how to structure this data and motivate neural networks properly.

  • The most widely used architecture today passes data almost exclusively through a sequence of linear transformations. We call these models Transformers.

  • Gemini, ChatGPT, Claude, Qwen, DeepSeek: the best-performing AI systems in the world today are all Transformers!

Coding Tasks (use Colab or a notebook)

  1. Apply a rotation matrix to a vector and plot both the original and rotated vector. Try different angles.
import jax.numpy as jnp
import matplotlib.pyplot as plt

theta = jnp.pi / 3                 # rotation angle: 60 degrees
R = jnp.array([[jnp.cos(theta), -jnp.sin(theta)],
               [jnp.sin(theta),  jnp.cos(theta)]])  # 2D rotation matrix R(theta)

v = jnp.array([1.0, 0.0])          # unit vector pointing right
v_rot = R @ v                      # apply the rotation

plt.figure(figsize=(5, 5))
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='red', label='original')
plt.quiver(0, 0, v_rot[0], v_rot[1], angles='xy', scale_units='xy', scale=1, color='blue', label='rotated')
plt.xlim(-1.5, 1.5); plt.ylim(-1.5, 1.5)
plt.grid(True); plt.legend(); plt.gca().set_aspect('equal')
plt.show()
  2. Apply a shearing transformation to a set of points forming a square and visualise the deformed shape.
import jax.numpy as jnp
import matplotlib.pyplot as plt

square = jnp.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]).T  # unit square as a closed loop

k = 0.5                            # shear factor
shear = jnp.array([[1, k],
                   [0, 1]])        # horizontal shear matrix
sheared = shear @ square           # shear all corners at once

plt.figure(figsize=(6, 4))
plt.plot(square[0], square[1], 'r-o', label='original')
plt.plot(sheared[0], sheared[1], 'b-o', label='sheared')
plt.grid(True); plt.legend(); plt.gca().set_aspect('equal')
plt.show()