Introduction to Special Relativity
Special relativity is a theory of spacetime that was developed by Albert Einstein in 1905. It is based on two postulates:
- The laws of physics are the same in all inertial frames of reference.
- The speed of light in a vacuum \(c\) is the same for all observers, regardless of the motion of the light source or the observer.
These postulates lead to a number of counterintuitive consequences, such as time dilation, length contraction, and the relativity of simultaneity. On this page, we introduce the basic concepts of special relativity and explore some of its key results.
Implications of Special Relativity
Time Dilation
One of the most famous results of special relativity is time dilation: a moving clock runs slower than a stationary one.
To see that this is a consequence of the postulates, we can consider a light clock, which consists of two mirrors facing each other with a light pulse bouncing between them. One 'tick' of the clock is the time it takes for the light pulse to travel from one mirror to the other.
Let’s put the light clock in a train moving at a constant velocity \(v\) relative to the station. In the frame of the train, the mirrors are stationary, and so the light pulse travels vertically between the mirrors. The time it takes for the light pulse to travel from one mirror to the other is given by
\[\Delta t_{\text{train}} = \frac{L}{c} \, ,\]where \(L\) is the distance between the mirrors. In the frame of the station, however, the light pulse travels along a diagonal path, since the mirrors move horizontally with the train. This path is the hypotenuse of a right triangle with legs \(L\) (vertical) and \(v \Delta t_{\text{station}}\) (horizontal), so its length is \(\sqrt{L^2 + (v \Delta t_{\text{station}})^2}\), where \(\Delta t_{\text{station}}\) is the time it takes for the light pulse to travel from one mirror to the other in the station frame.
How do we know that the distance between the mirrors \(L\) in the direction perpendicular to the motion is the same in both frames? If we imagine people in both the train and station frames holding rulers along the direction perpendicular to their relative motion, then as they pass, the two rulers must be the same length. If one were shorter than the other (something both observers would agree on - imagine that the rulers had paintbrushes taped on both ends!), there would be a preferred frame, violating the first postulate.
For this reason, lengths perpendicular to the motion are the same in both frames. We will see in the next section that lengths parallel to the motion do change.
Light clock in the train frame (left) and station frame (right).
From the postulate that the speed of light is frame-independent, we then have
\[\Delta t_{\text{station}} = \frac{\sqrt{L^2 + (v \Delta t_{\text{station}})^2}}{c} \, .\]If we plug in \(L = c \Delta t_{\text{train}}\), we can rearrange to get
\[\Delta t_{\text{station}} = \frac{\Delta t_{\text{train}}}{\sqrt{1 - {v^2}/{c^2}}} = \gamma \Delta t_{\text{train}} \, ,\]where \(\gamma = 1/\sqrt{1 - v^2/c^2}\) is the Lorentz factor. When \(v = 0\), \(\gamma = 1\), and as \(v \rightarrow c\), \(\gamma \rightarrow \infty\). Thus we see that the time interval between ticks of the light clock is longer when the clock is moving (station frame) than when it is stationary (train frame).
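As a sanity check on the light-clock algebra, the short Python sketch below solves the station-frame condition \(c \Delta t_{\text{station}} = \sqrt{L^2 + (v \Delta t_{\text{station}})^2}\) directly and compares the ratio \(\Delta t_{\text{station}}/\Delta t_{\text{train}}\) with \(\gamma\). The mirror separation and train speed are illustrative values, and the function names are our own.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_factor(v: float) -> float:
    """Lorentz factor gamma = 1/sqrt(1 - v^2/c^2) for a speed v in m/s."""
    if abs(v) >= C:
        raise ValueError("speed must satisfy |v| < c")
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def station_tick(L: float, v: float) -> float:
    """Tick length seen from the station, from c*dt = sqrt(L^2 + (v*dt)^2).

    Squaring and rearranging gives (c^2 - v^2) dt^2 = L^2.
    """
    return L / math.sqrt(C**2 - v**2)


L = 1.0        # mirror separation in metres (illustrative)
v = 0.6 * C    # train speed (illustrative)

dt_train = L / C                 # tick in the train frame
dt_station = station_tick(L, v)  # tick in the station frame

print(f"gamma               = {lorentz_factor(v):.6f}")
print(f"dt_station/dt_train = {dt_station / dt_train:.6f}")
```

For \(v = 0.6c\) both lines print 1.250000. The geometric answer and the Lorentz factor agree, as the derivation says they should.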
This theoretical light clock can be synchronised with any other clock carried on the train, and the two must stay in agreement. Otherwise, comparing them would reveal whether you were moving, which violates the first postulate. Therefore, this result applies to all moving clocks, including the human body, which is a biological clock! We can say that time itself runs slower for an object in motion, as measured by an observer relative to whom it moves. This is the phenomenon of time dilation.
Muons are subatomic particles that are produced in the upper atmosphere by cosmic rays. They have a mean lifetime of $\tau_0 = 2.2 \, \mu$s when at rest. Estimate the $\gamma$ factor for cosmic muons that reach the Earth's surface (travelling a distance of $L = 15$ km) before decaying.
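Here is a rough numerical version of this estimate. It assumes that each muon lives exactly the mean lifetime \(\tau_0\) (a simplification, since decay times are random) and travels the full 15 km in a straight line. In the Earth frame the muon covers \(L = v \gamma \tau_0\), so \(\gamma\beta = L/(c\tau_0)\) with \(\beta = v/c\). Then \(\gamma = \sqrt{1 + (\gamma\beta)^2}\).

```python
import math

C = 299_792_458.0    # speed of light, m/s
TAU0 = 2.2e-6        # muon rest lifetime, s (from the exercise)
L = 15e3             # distance to the ground, m (from the exercise)

# In the Earth frame the muon lives gamma*tau0 and must cover L at speed v = beta*c:
#   beta * c * gamma * tau0 = L   =>   gamma*beta = L / (c * tau0)
gamma_beta = L / (C * TAU0)

# Use gamma^2 = 1 + (gamma*beta)^2 to recover gamma without assuming v = c
gamma = math.sqrt(1.0 + gamma_beta**2)
beta = gamma_beta / gamma

print(f"gamma ~ {gamma:.1f}, v ~ {beta:.4f} c")
```

This gives \(\gamma\) of roughly 23, so the muons must travel within about 0.1% of the speed of light. Because \(v \approx c\), the result is close to the quick estimate \(\gamma \approx L/(c\tau_0)\).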
Length Contraction
A natural consequence of time dilation is length contraction: moving objects are measured to be shorter along their direction of motion.
Let us consider muons that are produced in the upper atmosphere by cosmic rays, as in the example above. As viewed from the Earth’s surface, the muons have a lifetime longer than their rest lifetime by a factor of \(\gamma\) due to time dilation. This enables them to reach the Earth’s surface before decaying. But in the muon’s own frame the muon is at rest, so it lives only for its rest lifetime. That is far too short for the full \(15\) km of atmosphere to pass by, even at nearly the speed of light.
This is only possible if the atmosphere’s thickness is contracted in the muon’s frame. For both frames to agree that the muon reaches the Earth’s surface before decaying, the contraction factor must be exactly \(\gamma\).
To do this a bit more formally, we introduce the notion of proper time and proper length, which are the time and length measured in the frame in which the object in question is at rest. We let \(\tau_0\) be the proper lifetime of the muon and \(t\) the lifetime in the Earth’s frame. Similarly, we let \(L_0\) be the proper length of the atmosphere and \(L\) the length in the muon’s frame.
The speed of the muon in the Earth’s frame is \(v = L_0/t\), and the speed of the Earth in the muon’s frame is \(v = L/\tau_0\). These speeds must be equal by the principle of relativity (there is no preferred frame), so we have
\[v = \frac{L_0}{t} = \frac{L}{\tau_0} \quad \implies \quad \frac{L_0}{L} = \frac{t}{\tau_0} = \gamma \, ,\]where we have used the time dilation formula \(t = \gamma \tau_0\) from the section above.
Therefore, the length of the atmosphere in the muon’s frame is contracted (\(L = L_0/\gamma\)). As before, this result applies to all moving objects, which are measured to be contracted in the direction of motion.
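The sketch below checks that the two descriptions agree numerically. As in the muon example, it assumes the muon lives exactly \(\tau_0\) in its own frame and just reaches the ground. In the Earth frame the muon covers \(L_0\) during its dilated lifetime \(\gamma\tau_0\). In the muon frame, the contracted atmosphere \(L_0/\gamma\) passes by in exactly \(\tau_0\).

```python
import math

C = 299_792_458.0   # speed of light, m/s
TAU0 = 2.2e-6       # proper (rest) lifetime of the muon, s
L0 = 15e3           # proper thickness of the atmosphere (Earth frame), m

# Speed at which a muon living exactly tau0 in its own frame just reaches the ground
gamma_beta = L0 / (C * TAU0)
gamma = math.sqrt(1.0 + gamma_beta**2)
v = C * gamma_beta / gamma

# Earth frame: dilated lifetime t = gamma * tau0, during which the muon covers v * t
t_earth = gamma * TAU0
# Muon frame: atmosphere contracted to L = L0 / gamma, which passes by in time L / v
L_muon = L0 / gamma
t_muon = L_muon / v

print(f"Earth frame: muon covers {v * t_earth:.0f} m in {t_earth * 1e6:.1f} us")
print(f"Muon frame:  atmosphere is {L_muon:.0f} m thick, passes in {t_muon * 1e6:.2f} us")
```

Both frames agree that the muon just reaches the ground. In the muon frame the atmosphere is only about 660 m thick.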
Relativity of Simultaneity
Another consequence of special relativity is the relativity of simultaneity: events that are simultaneous in one frame are, in general, not simultaneous in another frame.
We can see this with our previous setup of a train moving at a constant velocity \(v\) relative to the station. We put two light detectors at the front and back of the train, and a light source at the midpoint. When the light source is turned on, the light pulse travels to the front and back detectors.
In the frame of the train, the light pulse travels the same distance to each detector, and so the detectors record the light pulse at the same time.
In the frame of the station, however, the front detector is moving away from the point where the pulse was emitted, and the back detector is moving towards it. Since the pulse travels at \(c\) in both directions, it has to travel a longer distance to reach the front detector than the back detector. The back detector therefore records the pulse first. This is shown in the diagram below.
Order of events in station frame when light pulse is emitted from midpoint of train (not to scale).
This result is a consequence of the fact that the speed of light is the same for all observers, regardless of the motion of the light source or the observer. Simultaneity is therefore relative: two events that are simultaneous in one frame need not be simultaneous in another.
Two events at the front and back of a train occur simultaneously in the train frame. The train is moving at a speed $v = 0.6c$ relative to the station and has a proper length of $L_0 = 100$ m. Calculate the time difference between the events in the station frame.
Lorentz Transformation
We now look to quantify the effects of special relativity using the Lorentz transformation. This is a set of equations that relate the coordinates of an event in one frame to the coordinates of the same event in another frame moving at a velocity \(v\) relative to the first.
An event in spacetime is described by four coordinates: three spatial coordinates \(x, y, z\) and one time coordinate \(t\). We can combine these coordinates into a four-vector \(x^\mu = (ct, x, y, z)\), where \(c\) is the speed of light.
We will normally consider a lab frame \(S\) and a frame \(S^\prime\) moving at a velocity \(v\) in the \(x\) direction relative to \(S\). At time \(t = t^\prime = 0\), the origins of the two frames coincide. The coordinates of an event in \(S^\prime\) are given by \(x^{\prime \mu} = (ct^\prime, x^\prime, y^\prime, z^\prime)\). We want to find a mathematical description of the map \(x^\mu \rightarrow x^{\prime \mu}\).
The first thing to note is that the transformation must be linear. Because the origins of the frames coincide, \(x^\mu = (0, 0, 0, 0) \to x^{\prime \mu} = (0, 0, 0, 0)\), so there is no constant offset. Because spacetime is homogeneous (no point or instant is special), the way the separation between two events transforms cannot depend on where or when the events occur. This rules out nonlinear terms. Similarly, it shouldn’t matter if we choose to measure lengths in half-metres or double-metres, or time in half-seconds or double-seconds. Otherwise, we would have to introduce some arbitrary length or time scale into our equations.
This means that the transformation can be written as a matrix equation
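\[\begin{pmatrix} ct^\prime \\ x^\prime \\ y^\prime \\ z^\prime \end{pmatrix} = \begin{pmatrix} \Lambda^{0}_{\phantom{0}0} & \Lambda^{0}_{\phantom{0}1} & \Lambda^{0}_{\phantom{0}2} & \Lambda^{0}_{\phantom{0}3} \\ \Lambda^{1}_{\phantom{1}0} & \Lambda^{1}_{\phantom{1}1} & \Lambda^{1}_{\phantom{1}2} & \Lambda^{1}_{\phantom{1}3} \\ \Lambda^{2}_{\phantom{2}0} & \Lambda^{2}_{\phantom{2}1} & \Lambda^{2}_{\phantom{2}2} & \Lambda^{2}_{\phantom{2}3} \\ \Lambda^{3}_{\phantom{3}0} & \Lambda^{3}_{\phantom{3}1} & \Lambda^{3}_{\phantom{3}2} & \Lambda^{3}_{\phantom{3}3} \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} \, ,\]where the sixteen entries are constants to be determined. Or more succinctly in index notation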
\[x^{\prime \mu} = \Lambda^{\mu}_{\phantom{\mu} \nu} x^\nu \, ,\]where \(\Lambda^{\mu}_{\phantom{\mu} \nu}\) is the Lorentz transformation matrix. The task now is to find the components of this matrix.
We can argue that \(y = y^\prime\) and \(z = z^\prime\). Imagine observers in both frames holding metre sticks in the \(y\) or \(z\) direction: as they pass each other, the metre sticks must be the same length. If one were shorter than the other, there would be a preferred frame, which violates the first postulate.
We can also argue that, since the motion is only in the \(x\) direction, only the \(x\) and \(t\) coordinates mix with each other. This means that the matrix must be of the form
\[\Lambda^{\mu}_{\phantom{\mu} \nu} = \begin{pmatrix} A & B & 0 & 0 \\ C & D & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \, .\]The calculation of the components of this 2x2 submatrix is given as an exercise below.
Calculate the components of the Lorentz transformation of the $x$ and $t$ coordinates $$ \begin{pmatrix} ct^\prime \\ x^\prime \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} ct \\ x \end{pmatrix} \, . $$ Use the following pieces of information: $$ \begin{align*} x = ct & \implies x^\prime = ct^\prime \, , \\ x^\prime = 0 & \implies x = vt \, , \\ x = 0 & \implies x^\prime = -vt^\prime \, , \\ x = 0 & \implies t^\prime = \gamma t \, , \end{align*} $$ with $\gamma = 1/\sqrt{1 - v^2/c^2}$ the Lorentz factor. The first equation is the requirement that the speed of light is the same in all frames, the second and third equations are that the origins of the frames have relative speed $v$, and the last equation is the time dilation formula.
Thus, the Lorentz transformation matrix is
\[\Lambda^{\mu}_{\phantom{\mu} \nu} = \begin{pmatrix} \gamma & -\gamma v/c & 0 & 0 \\ -\gamma v/c & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \, .\]This can be rewritten in terms of the rapidity \(\eta\), where \(v = c \tanh \eta\), as
\[\Lambda^{\mu}_{\phantom{\mu} \nu} (\eta) = \begin{pmatrix} \cosh \eta & -\sinh \eta & 0 & 0 \\ -\sinh \eta & \cosh \eta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \, .\]Successive boosts along the same axis can be combined by multiplying the corresponding matrices. Using the hyperbolic addition formulas, one finds that rapidities add, just as angles add for successive rotations about a fixed axis. That is, \(\Lambda^{\mu}_{\phantom{\mu} \nu} (\eta_1) \Lambda^{\nu}_{\phantom{\nu} \rho} (\eta_2) = \Lambda^{\mu}_{\phantom{\mu} \rho} (\eta_1 + \eta_2)\). For the inverse transformation, we can simply change \(v \to -v\) (which gives \(\eta \to -\eta\)). This works because if \(S^\prime\) is moving at a velocity \(v \mathbf{\hat{x}}\) relative to \(S\), then \(S\) is moving at a velocity \(-v \mathbf{\hat{x}}\) relative to \(S^\prime\).
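Rapidity addition is easy to spot-check numerically. The Python sketch below builds the \((ct, x)\) block of \(\Lambda(\eta)\), multiplies two boosts, and compares the product with a single boost of rapidity \(\eta_1 + \eta_2\). It also checks that \(\eta \to -\eta\) gives the inverse. The helper names and the example speeds \(0.6c\) and \(0.8c\) are our own choices.

```python
import math


def boost(eta: float) -> list[list[float]]:
    """2x2 (ct, x) block of a Lorentz boost with rapidity eta."""
    ch, sh = math.cosh(eta), math.sinh(eta)
    return [[ch, -sh], [-sh, ch]]


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def close(a: list[list[float]], b: list[list[float]], tol: float = 1e-12) -> bool:
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


eta1 = math.atanh(0.6)  # rapidity for v1 = 0.6c, since v = c tanh(eta)
eta2 = math.atanh(0.8)  # rapidity for v2 = 0.8c

# Successive boosts along x: rapidities add
assert close(matmul(boost(eta1), boost(eta2)), boost(eta1 + eta2))
# eta -> -eta (i.e. v -> -v) undoes the boost
assert close(matmul(boost(eta1), boost(-eta1)), [[1.0, 0.0], [0.0, 1.0]])

# Speed of the combined boost, in units of c
print(math.tanh(eta1 + eta2))
```

Because \(v = c \tanh \eta\), the combined speed here is \(c \tanh(\eta_1 + \eta_2) \approx 0.946c\). It stays below \(c\) even though \(0.6c + 0.8c > c\).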
This particular transformation is a Lorentz boost in the \(x\) direction. This is the only form we need to consider, since we can always rotate the axes so that the relative motion is along the \(x\) direction. A general Lorentz transformation (excluding reflections and time reversal) is a combination of a boost and a spatial rotation.
Revisiting Time Dilation, Length Contraction, and Relativity of Simultaneity
To make our lives easier, from now on we use units where \(c = 1\), e.g., distances in light-seconds and times in seconds. The algebra is much simpler, and we can always restore the factors of \(c\) at the end by dimensional analysis. In these units, the Lorentz transformation takes the form
\[\begin{align*} \Delta t^\prime &= \gamma (\Delta t - v \Delta x) \, , \\ \Delta x^\prime &= \gamma (\Delta x - v \Delta t) \, , \\ \Delta y^\prime &= \Delta y \, , \\ \Delta z^\prime &= \Delta z \, . \end{align*}\]The Lorentz factor is $\gamma = 1/\sqrt{1 - v^2}$. We use \(\Delta t\) and \(\Delta x\) to denote the differences in time and space coordinates between two events in the lab frame \(S\), and \(\Delta t^\prime\) and \(\Delta x^\prime\) to denote the differences in time and space coordinates between the same two events in the moving frame \(S^\prime\). This form is generally more useful for calculations.
Note that the inverse transformation is given by \(v \to -v\), i.e.,
\[\begin{align*} \Delta t &= \gamma (\Delta t^\prime + v \Delta x^\prime) \, , \\ \Delta x &= \gamma (\Delta x^\prime + v \Delta t^\prime) \, . \end{align*}\]For time dilation, the object is at rest in the \(S^\prime\) frame, so \(\Delta x^\prime = 0\), and we have \(\Delta t = \gamma \Delta t^\prime\).
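These interval formulas translate directly into code. The sketch below works in units with \(c = 1\), and the function names are ours. It applies the transformation and its inverse, confirms that a round trip returns the original separations, and reproduces \(\Delta t = \gamma \Delta t^\prime\) for a clock at rest in \(S^\prime\).

```python
import math


def lorentz_factor(v: float) -> float:
    """gamma = 1/sqrt(1 - v^2) in units where c = 1."""
    if abs(v) >= 1.0:
        raise ValueError("need |v| < 1 in units of c")
    return 1.0 / math.sqrt(1.0 - v * v)


def boost_x(dt: float, dx: float, v: float) -> tuple[float, float]:
    """Map separations (dt, dx) in S to (dt', dx') in S', with S' moving at +v along x."""
    g = lorentz_factor(v)
    return g * (dt - v * dx), g * (dx - v * dt)


def unboost_x(dt_p: float, dx_p: float, v: float) -> tuple[float, float]:
    """Inverse map S' -> S: the same formula with v -> -v."""
    return boost_x(dt_p, dx_p, -v)


v = 0.6

# Round trip S -> S' -> S recovers the original separations
dt, dx = 3.0, 1.0
back = unboost_x(*boost_x(dt, dx, v), v)
assert all(math.isclose(a, b) for a, b in zip(back, (dt, dx)))

# Time dilation: one tick of a clock at rest in S' (dx' = 0, dt' = 1)
dt_S, dx_S = unboost_x(1.0, 0.0, v)
print(dt_S, dx_S)  # gamma = 1.25 and v * dt = 0.75, up to floating-point rounding
```
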
For length contraction, the two events are at the front and back of the object, with \(\Delta t = 0\) (i.e., simultaneous in \(S\)). Even though the events are not simultaneous in \(S^\prime\), the object is at rest in this frame, so \(\Delta x^\prime = l_0\), where \(l_0\) is the proper length of the object, regardless of what \(\Delta t^\prime\) is. We have \(\Delta x^\prime = \gamma \Delta x\), and so the measured length in \(S\) is \(\Delta x = l_0/\gamma\).
For the relativity of simultaneity, we can consider two events that are simultaneous in \(S^\prime\), so \(\Delta t^\prime = 0\). We have \(\Delta t = \gamma v \Delta x^\prime\), so the events are not simultaneous in \(S\).
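We can apply this to the train exercise above (\(v = 0.6c\), \(L_0 = 100\) m). Restoring the factor of \(c\) gives \(\Delta t = \gamma v \Delta x^\prime / c^2\), which comes to about 250 ns. A minimal Python sketch follows; the function name is our own.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def station_time_difference(proper_length: float, beta: float) -> float:
    """Station-frame time gap (s) between two events simultaneous in the train frame.

    proper_length: separation dx' of the events in the train frame (m), back to front.
    beta: train speed v/c.
    Uses dt = gamma * v * dx' (c = 1), i.e. dt = gamma * beta * dx' / c with c restored.
    """
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * beta * proper_length / C


dt = station_time_difference(100.0, 0.6)
print(f"time difference in the station frame: {dt * 1e9:.0f} ns")
```

With \(\Delta x^\prime\) measured from the back of the train to the front, \(\Delta t > 0\). So in the station frame, the event at the back happens first. This is consistent with the detector picture earlier, where the back detector fires first.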
All of these results are consistent with what we found earlier (check for yourself!), but the Lorentz transformation gives us a more general way to calculate these effects.
Spacetime Diagrams
We can represent the Lorentz transformation graphically using spacetime diagrams. These are diagrams in which the time coordinate is plotted on the vertical axis and the space coordinate is plotted on the horizontal axis. The worldline of an object is a curve in spacetime that represents the object’s motion through space and time.
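The Lorentz transformation can be visualised as a change of axes. The \(ct^\prime\) axis (the worldline \(x = vt\) of the origin of \(S^\prime\)) and the \(x^\prime\) axis (the set of events with \(t^\prime = 0\), i.e., \(ct = vx/c\)) are both tilted towards the light line \(x = ct\), as shown below.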
Velocity Addition
4-Vectors
We will now introduce the concept of 4-vectors, which are objects that transform via the Lorentz transformation. That is, if we have a 4-vector \(A^\mu = (A^0, \mathbf{A})\) in one frame, then in another frame moving at a velocity \(\mathbf{v} = v \mathbf{\hat{x}}\) relative to the first, the components of the 4-vector are given by (for the remainder of this page we use units where \(c = 1\))
\[\begin{align*} A^{\prime 0} &= \gamma (A^0 - v A^1) \, , \\ A^{\prime 1} &= \gamma (A^1 - v A^0) \, , \\ A^{\prime 2} &= A^2 \, , \\ A^{\prime 3} &= A^3 \, , \end{align*}\]where \(\gamma = 1/\sqrt{1 - v^2}\) is the Lorentz factor. This can be written in matrix form as
\[\begin{pmatrix} A^{\prime 0} \\ A^{\prime 1} \\ A^{\prime 2} \\ A^{\prime 3} \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} A^0 \\ A^1 \\ A^2 \\ A^3 \end{pmatrix} \, ,\]or in index notation as
\[A^{\prime \mu} = \Lambda^{\mu}_{\phantom{\mu} \nu} A^\nu \, ,\]where \(\Lambda^{\mu}_{\phantom{\mu} \nu}\) is the Lorentz transformation matrix.
We can define the inner product of two 4-vectors as
\[A^\mu B_\mu = A \cdot B = - A^0 B^0 + A^1 B^1 + A^2 B^2 + A^3 B^3 = - A^0 B^0 + \mathbf{A} \cdot \mathbf{B} \, ,\]where we use the notation \(A \equiv A^\mu\) and \(B \equiv B^\mu\), which can save some writing, especially for collision questions.
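Below is a minimal Python sketch of this inner product (signature \((-,+,+,+)\), \(c = 1\)), together with a numerical spot-check that an \(x\)-boost leaves it unchanged. The component values are arbitrary. A numerical check like this illustrates the invariance asked about in the exercise below, but it is not a proof.

```python
import math


def minkowski_dot(a: tuple, b: tuple) -> float:
    """Inner product with signature (-, +, +, +): -a0*b0 + a1*b1 + a2*b2 + a3*b3."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def boost_x(a: tuple, v: float) -> tuple:
    """Components of the 4-vector a in a frame moving at speed v along x (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return (g * (a[0] - v * a[1]), g * (a[1] - v * a[0]), a[2], a[3])


A = (2.0, 0.5, -1.0, 3.0)   # arbitrary example components
B = (1.0, 0.2, 0.4, -0.7)

for v in (0.1, 0.5, 0.9, -0.99):
    assert math.isclose(minkowski_dot(A, B), minkowski_dot(boost_x(A, v), boost_x(B, v)))

print(minkowski_dot(A, B))  # -4.4 up to floating-point rounding
```
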
Show that the inner product of two 4-vectors is invariant under Lorentz transformations, i.e., that $$ A \cdot B = A^\prime \cdot B^\prime \, . $$
The invariance of the inner product of two 4-vectors is extremely useful, since it means we can always choose the frame in which the calculation is easiest. This is analogous to the invariance of the dot product of two 3-vectors under rotations.
As a side note, when we deal with collisions (e.g., in particle physics), we often use the opposite sign convention, that \(A \cdot B = A^0 B^0 - \mathbf{A} \cdot \mathbf{B}\). The reason for this is to make the invariant \(P \cdot P = m^2\) (rather than \(-m^2\)), where \(P \equiv p^\mu\) is the 4-momentum that we will meet shortly.
Position 4-Vector
We have already introduced the position 4-vector \(x^\mu = (t, x, y, z)\). This is in a sense the most basic 4-vector, since we used it to derive the Lorentz transformation. For the position 4-vector between two events, we use \(\Delta x^\mu = (\Delta t, \Delta x, \Delta y, \Delta z)\). We can then form the invariant quantity
\[\Delta x^\mu \Delta x_\mu = - \Delta t^2 + \Delta x^2 + \Delta y^2 + \Delta z^2 \, .\]This is sometimes called the spacetime interval \(s^2\) (with a minus sign depending on convention), and is the same in all inertial frames. We can classify intervals between events as follows:
- \(\Delta t^2 > \|\Delta \mathbf{x}\|^2\): timelike interval, where the events can be causally connected.
- \(\Delta t^2 = \|\Delta \mathbf{x}\|^2\): lightlike interval, where the events can be connected by a light signal.
- \(\Delta t^2 < \|\Delta \mathbf{x}\|^2\): spacelike interval, where the events cannot be causally connected.
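The classification depends only on the sign of the invariant \(\Delta x^\mu \Delta x_\mu\), so it can be coded in a few lines. This sketch uses \(c = 1\). The small tolerance used to decide "lightlike" is our own choice, included to absorb floating-point rounding.

```python
def interval_squared(dt: float, dx: float, dy: float = 0.0, dz: float = 0.0) -> float:
    """Invariant -dt^2 + dx^2 + dy^2 + dz^2 (c = 1, signature -,+,+,+)."""
    return -dt**2 + dx**2 + dy**2 + dz**2


def classify(dt: float, dx: float, dy: float = 0.0, dz: float = 0.0, tol: float = 1e-12) -> str:
    """Label the separation between two events by the sign of the interval."""
    s2 = interval_squared(dt, dx, dy, dz)
    if abs(s2) <= tol:          # tolerance absorbs floating-point rounding
        return "lightlike"
    return "timelike" if s2 < 0 else "spacelike"


print(classify(2.0, 1.0))   # timelike: |dx| < |dt|
print(classify(1.0, 1.0))   # lightlike: one light-second in one second
print(classify(1.0, 3.0))   # spacelike: too far apart for any signal
```
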
The invariant interval between two events on an object’s worldline can be evaluated in its rest frame, where $\Delta x^\mu = (\Delta \tau, \mathbf{0})$ (with \(\tau\) the proper time), giving
\[\Delta x^\mu \Delta x_\mu = - \Delta \tau^2 \, .\]Therefore the proper time interval between two events is frame-independent.
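As a numerical illustration, take a clock moving at \(v = 0.8\) for a lab-frame time \(\Delta t = 10\). We use units with \(c = 1\), and the values are chosen for illustration. The interval gives \(\Delta\tau = \sqrt{\Delta t^2 - \Delta x^2} = 6\). Boosting the same two events into the clock's rest frame gives \(\Delta t^\prime = 6\) and \(\Delta x^\prime = 0\):

```python
import math


def proper_time(dt: float, dx: float, dy: float = 0.0, dz: float = 0.0) -> float:
    """Proper time between two timelike-separated events: sqrt(dt^2 - |dx|^2), c = 1."""
    s2 = dt**2 - (dx**2 + dy**2 + dz**2)
    if s2 < 0:
        raise ValueError("events are spacelike separated; no proper time")
    return math.sqrt(s2)


def boost_x(dt: float, dx: float, v: float) -> tuple[float, float]:
    """Separations seen from a frame moving at speed v along x (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * (dt - v * dx), g * (dx - v * dt)


v, dt = 0.8, 10.0
dx = v * dt                      # the clock moves 8 light-units in the lab frame
tau_lab = proper_time(dt, dx)    # evaluated with lab-frame separations

# The same two events in the clock's rest frame: dx' = 0 and dt' is the proper time
dt_rest, dx_rest = boost_x(dt, dx, v)
print(tau_lab, dt_rest, dx_rest)  # 6, 6 and 0, up to floating-point rounding
```
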
Velocity and Momentum 4-Vectors
The 4-velocity is defined as
\[u^\mu = \frac{dx^\mu}{d\tau} \, .\]Since \(\text{d} x^\mu\) is a 4-vector and the proper time interval \(d\tau\) is frame-independent, \(u^\mu\) transforms in the same way as \(x^\mu\), and is therefore a 4-vector. Since \(\text{d} t = \gamma \text{d} \tau\), we have
\[u^\mu = \gamma \frac{d}{dt} (t, \mathbf{x}) = \gamma (1, \mathbf{v}) \, ,\]where \(\mathbf{v} = d\mathbf{x}/dt\) is the 3-velocity. The inner product of the 4-velocity with itself is easiest to evaluate in the rest frame, where \(u^\mu = (1, \mathbf{0})\), giving
\[u^\mu u_\mu = -1 \, .\]We could have also evaluated this in the lab frame, but we know that the inner product of two 4-vectors is invariant under Lorentz transformations so we would have obtained the same result (check this if you’re not convinced!).
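The lab-frame check suggested above is quick to do numerically. For any 3-velocity with \(|\mathbf{v}| < 1\), \(u^\mu = \gamma(1, \mathbf{v})\) gives \(u^\mu u_\mu = \gamma^2(-1 + v^2) = -1\). The sample velocities below are arbitrary, and the helper names are ours.

```python
import math


def four_velocity(vx: float, vy: float = 0.0, vz: float = 0.0) -> tuple:
    """u^mu = gamma * (1, vx, vy, vz) for a 3-velocity with |v| < 1 (c = 1)."""
    v2 = vx * vx + vy * vy + vz * vz
    if v2 >= 1.0:
        raise ValueError("need |v| < 1 in units of c")
    g = 1.0 / math.sqrt(1.0 - v2)
    return (g, g * vx, g * vy, g * vz)


def minkowski_dot(a: tuple, b: tuple) -> float:
    """Inner product with signature (-, +, +, +)."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


samples = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.3, -0.4, 0.6), (0.0, 0.0, 0.999)]
for vel in samples:
    u = four_velocity(*vel)
    print(vel, minkowski_dot(u, u))   # -1 up to floating-point rounding
```
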
Show that the Lorentz factor of the relative velocity $\mathbf{w}$ of two objects with velocities $\mathbf{u}$ and $\mathbf{v}$ is given by $$ \gamma_{w} = \gamma_{u} \gamma_{v} (1 - \mathbf{u} \cdot \mathbf{v}) \, , $$ where $\gamma_{u} = 1/\sqrt{1 - u^2}$ and $\gamma_{v} = 1/\sqrt{1 - v^2}$ are the Lorentz factors of the two objects. As a hint, evaluate the inner product of the velocity 4-vectors $u^\mu$ and $v^\mu$ in two different frames, and use the fact that the inner product of two 4-vectors is invariant under Lorentz transformations.
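Following the hint, the sketch below boosts object 1's 4-velocity into the rest frame of object 2 and reads off \(\gamma_w\) from the resulting relative velocity. It then compares this with \(\gamma_u \gamma_v (1 - \mathbf{u}\cdot\mathbf{v})\). To keep the boost simple, we assume object 2 moves along \(x\), which we can always arrange by rotating the axes. The velocities are illustrative. This is a numerical check, not the requested derivation.

```python
import math


def lorentz_factor(vx: float, vy: float = 0.0, vz: float = 0.0) -> float:
    """gamma = 1/sqrt(1 - |v|^2) in units where c = 1."""
    return 1.0 / math.sqrt(1.0 - (vx * vx + vy * vy + vz * vz))


def four_velocity(vx: float, vy: float = 0.0, vz: float = 0.0) -> tuple:
    """u^mu = gamma * (1, vx, vy, vz)."""
    g = lorentz_factor(vx, vy, vz)
    return (g, g * vx, g * vy, g * vz)


def boost_x(a: tuple, v: float) -> tuple:
    """Components of the 4-vector a in a frame moving at speed v along x (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return (g * (a[0] - v * a[1]), g * (a[1] - v * a[0]), a[2], a[3])


# Illustrative velocities: object 2 moves along x, object 1 in a general direction
v = (0.7, 0.0, 0.0)
u = (0.2, 0.5, -0.3)

# In the rest frame of object 2, object 1 has 4-velocity gamma_w * (1, w)
u_prime = boost_x(four_velocity(*u), v[0])
w = tuple(comp / u_prime[0] for comp in u_prime[1:])   # relative 3-velocity

gamma_w_direct = lorentz_factor(*w)
u_dot_v = sum(ui * vi for ui, vi in zip(u, v))
gamma_w_formula = lorentz_factor(*u) * lorentz_factor(*v) * (1.0 - u_dot_v)

print(gamma_w_direct, gamma_w_formula)
```
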
The 4-momentum is defined as
\[p^\mu = m u^\mu = \gamma m (1, \mathbf{v}) \, ,\]where \(m\) is the rest mass of the object. We can define energy as $E = \gamma m$ and momentum as $\mathbf{p} = \gamma m \mathbf{v}$ (which we won’t attempt to justify here), so that the momentum 4-vector can be written as
\[p^\mu = (E, \mathbf{p}) \, .\]In the rest frame of the object, the momentum 4-vector is $(m, \mathbf{0})$, and in the lab frame, it is $(E, \mathbf{p})$. The inner product of the momentum 4-vector with itself is invariant under Lorentz transformations, and so equating the two gives
\[m^2 = E^2 - p^2 \, .\]Note that for a massless particle, such as a photon, we have $E = p$.
You can hopefully see how much writing is saved by using natural units, where $c = 1$. To put the factors of $c$ back in, remember that $E/c$, $\mathbf{p}$, and $mc$ all have units of momentum.
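To illustrate restoring \(c\): in SI units, \(E = \gamma m c^2\) and \(\mathbf{p} = \gamma m \mathbf{v}\), so \(m^2 = E^2 - p^2\) becomes \((mc^2)^2 = E^2 - (pc)^2\). The sketch below checks this for a particle of roughly the electron's mass at \(v = 0.8c\). The mass value is approximate and only for illustration.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def energy_momentum(m: float, v: float) -> tuple[float, float]:
    """Energy E = gamma m c^2 (J) and momentum p = gamma m v (kg m/s) for speed v (m/s)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * m * C**2, gamma * m * v


m = 9.109e-31   # approximately the electron mass, kg (illustrative)
v = 0.8 * C
E, p = energy_momentum(m, v)

# m^2 = E^2 - p^2 with factors of c restored: (m c^2)^2 = E^2 - (p c)^2
lhs = (m * C**2) ** 2
rhs = E**2 - (p * C) ** 2
print(f"E = {E:.4e} J, pc = {p * C:.4e} J, mc^2 = {m * C**2:.4e} J")
print(f"relative mismatch: {abs(lhs - rhs) / lhs:.1e}")
```

Multiplying \(E/c\), \(\mathbf{p}\) and \(mc\) by \(c\) puts all three in energy units, which is why the check compares \(E\), \(pc\) and \(mc^2\).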
Acceleration and Force 4-Vectors
The acceleration 4-vector is defined as
\[a^\mu = \frac{du^\mu}{d\tau} = \gamma (\dot{\gamma}, \dot{\gamma} \mathbf{v} + \gamma \mathbf{a}) \, ,\]where $\mathbf{a} = \dot{\mathbf{v}}$ is the 3-acceleration. Note that if we evaluate the derivative \(\dot{\gamma}\), we find that
\[\dot{\gamma} = \frac{\text{d}}{\text{d} t} \frac{1}{\sqrt{1 - v^2}} = \frac{1}{(1 - v^2)^{3/2}} v \frac{\text{d} v}{\text{d} t} = \gamma^3 \mathbf{v} \cdot \mathbf{a} \, ,\]where we have used
\[v \frac{\text{d} v}{\text{d} t} = \frac{1}{2} \frac{\text{d}}{\text{d} t} \mathbf{v} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{a} \, .\]This can be understood simply as the fact that only the component of the acceleration parallel to the velocity changes the speed (the perpendicular component changes the direction of the velocity).
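The result \(\dot{\gamma} = \gamma^3 \mathbf{v}\cdot\mathbf{a}\) can be checked with a finite difference. The sketch below uses a made-up trajectory in units with \(c = 1\). The velocity direction rotates at a constant rate while the speed changes. The perpendicular (turning) part of \(\mathbf{a}\) drops out of \(\mathbf{v}\cdot\mathbf{a}\), as the argument above says it should.

```python
import math


def velocity(t: float) -> tuple:
    """Made-up 3-velocity (c = 1): direction rotates at 1 rad per unit time, speed varies."""
    speed = 0.5 + 0.3 * math.tanh(t)          # always between 0.2 and 0.8
    return (speed * math.cos(t), speed * math.sin(t), 0.0)


def acceleration(t: float) -> tuple:
    """Exact time derivative of velocity(t)."""
    speed = 0.5 + 0.3 * math.tanh(t)
    dspeed = 0.3 / math.cosh(t) ** 2          # d/dt tanh(t) = 1/cosh(t)^2
    return (dspeed * math.cos(t) - speed * math.sin(t),
            dspeed * math.sin(t) + speed * math.cos(t),
            0.0)


def gamma(t: float) -> float:
    """Lorentz factor along the trajectory."""
    return 1.0 / math.sqrt(1.0 - sum(comp * comp for comp in velocity(t)))


t, h = 0.4, 1e-5
gamma_dot_numeric = (gamma(t + h) - gamma(t - h)) / (2.0 * h)   # central difference
v, a = velocity(t), acceleration(t)
gamma_dot_formula = gamma(t) ** 3 * sum(vi * ai for vi, ai in zip(v, a))

print(gamma_dot_numeric, gamma_dot_formula)
```
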
The force 4-vector is defined as
\[f^\mu = m a^\mu = \gamma m (\dot{\gamma}, \dot{\gamma} \mathbf{v} + \gamma \mathbf{a}) \, .\]
Other Resources
For this topic, I highly recommend reading the chapters on special relativity in The Feynman Lectures on Physics (volume 1, chapters 15-17). They provide a slightly different perspective on the subject, and are a great read in general.