Why One Broken Escalator Can Paralyse a Mall
Queueing theory explains why a single stalled escalator at peak time causes gridlock.
Introduction
Picture a busy Saturday at Westfield. Shoppers flow smoothly up two parallel escalators, until one suddenly lurches to a halt.
Within minutes a knot of people forms, blocking the landing, spilling back onto shop floors and even jamming the still‑working escalator.
Why does one failure bring the whole system to its knees?
The answer sits at the heart of queueing theory.
Escalators as Servers
Treat each escalator as an independent server with:
- Arrival rate \(\lambda\) (people per second)
- Service rate \(\mu\) (people per second each escalator can transport)
With two functioning escalators we have an M/M/2 system (Poisson arrivals, exponential service, two servers).
When one breaks, it instantly degrades to M/M/1, doubling utilisation and destroying spare capacity.
Utilisation is \(\rho = \lambda / (c\mu)\), so dropping from \(c = 2\) servers to \(c = 1\) doubles \(\rho\).
Even if the mall was operating at a comfortable \(\rho = 0.45\), the failure pushes it to \(\rho = 0.9\): dangerously near saturation.
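A few lines of Python make the jump concrete. The arrival and service rates here are hypothetical peak-hour figures chosen for illustration, not measurements:

```python
# Utilisation of an M/M/c system: rho = lam / (c * mu).
def utilisation(lam, mu, c):
    """Fraction of total service capacity in use."""
    return lam / (c * mu)

lam, mu = 0.9, 1.0   # hypothetical: 0.9 arrivals/s, 1 person/s per escalator

rho_both = utilisation(lam, mu, c=2)   # both escalators running
rho_one = utilisation(lam, mu, c=1)    # one broken

print(rho_both, rho_one)  # 0.45 0.9
```

Halving the server count at a fixed arrival rate exactly doubles utilisation, which is why the system lands so close to saturation.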
Little’s Law: How Many People Are Stuck?
Little’s Law links the average number in the system \(L\), the arrival rate \(\lambda\), and the average waiting time \(W\):

\(L = \lambda W\)

As \(\rho \to 1\), waiting time \(W\) (and thus queue length \(L\)) explodes non‑linearly.
For an M/M/1 queue

\(L = \rho / (1 - \rho)\)

At \(\rho = 0.9\) we expect nine people in the system on average, ignoring the growing tail.
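The M/M/1 formula and Little’s Law combine into a two-line calculation. The rates below are hypothetical illustrative values:

```python
# M/M/1 mean number in system, then Little's Law for the mean wait.
lam, mu = 0.9, 1.0            # hypothetical arrival and service rates
rho = lam / mu                # utilisation of the single escalator

L = rho / (1 - rho)           # mean number in system, approx 9
W = L / lam                   # Little's Law: W = L / lam, approx 10 s

print(round(L, 1), round(W, 1))  # 9.0 10.0
```

Nudge `lam` to 0.99 and \(L\) jumps to 99: the non-linear blow-up near saturation in one line.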
Birth–Death Chains & Steady State
The queue length over time follows a birth–death Markov chain where:
- Births occur at rate \(\lambda\) (arrivals)
- Deaths occur at rate \(\mu\) per busy escalator (services)
Steady‑state probabilities for the single‑escalator (M/M/1) case are \(\pi_n = (1 - \rho)\rho^n\).
The missing escalator removes a ‘death’ process, instantly shifting the distribution toward heavier tails: more probability mass sits in long queues.
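The heavier tail can be quantified. Summing the geometric distribution \(\pi_n = (1-\rho)\rho^n\) over \(n > k\) gives \(P(N > k) = \rho^{k+1}\); the sketch below compares hypothetical utilisations of 0.45 and 0.9:

```python
# Tail probability for M/M/1: P(N > k) = rho**(k+1),
# obtained by summing pi_n = (1 - rho) * rho**n over n > k.
def tail_prob(rho, k):
    """Probability of finding more than k people in the system."""
    return rho ** (k + 1)

# Hypothetical utilisations: two escalators running vs one.
print(tail_prob(0.45, 10))   # about 1.5e-4: long queues are rare
print(tail_prob(0.9, 10))    # about 0.31: long queues nearly a third of the time
```

The same queue-length threshold goes from a freak event to an everyday occurrence when one server disappears.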
A Quick Simulation
Below is a discrete‑event simulation of Saturday traffic.
Watch how queue length skyrockets the moment one escalator stops:

```python
# Simplified sim (Poisson arrivals, exponential service, n servers)
import heapq
import random

def mmn_sim(lam, mu, n, t_max):
    t, q, busy = 0.0, 0, 0                       # clock, people in system, busy servers
    events = [(random.expovariate(lam), 'arr')]  # (time, type) event heap
    timeline = []
    while t < t_max:
        t, typ = heapq.heappop(events)
        if typ == 'arr':
            q += 1
            if busy < n:                         # a free server: service starts now
                busy += 1
                heapq.heappush(events, (t + random.expovariate(mu), 'dep'))
            heapq.heappush(events, (t + random.expovariate(lam), 'arr'))
        else:                                    # departure
            q -= 1
            busy -= 1
            if q > busy:                         # someone waiting: next service begins
                busy += 1
                heapq.heappush(events, (t + random.expovariate(mu), 'dep'))
        timeline.append((t, q))
    return timeline
```
Run two minutes of simulated traffic with \(n = 2\), then switch to \(n = 1\) at the same arrival rate: queue length leaps from single digits to hundreds.
Takeaways
- Redundancy hides fragility. Parallel servers mask high utilisation until one fails.
- Utilisation, not capacity, drives delay. Crossing \(\rho = 1\) is catastrophic.
- Design for failure. Add buffer space and overflow paths or implement real‑time redirection (e.g. open staircases).
Next time you’re trapped on a broken escalator landing, remember: it’s just a birth–death chain gone rogue.