panning and zooming around a point explained visually

My friend Zak completed his Ph.D. in computer vision at Harvard in 2012, and for his dissertation defense, we wired up a system that would take a picture of the audience, perform facial recognition on the crowd, and then pan and zoom around each face while displaying who Zak's algorithms thought was sitting in each seat.

The recognition aspect was cool, but here I want to explain how panning and zooming work conceptually, since they're useful in many projects in photography, mapping, visualization, and beyond.


Panning is relatively straightforward. When a touch or drag begins, record where the pointer is and where the content is. Then, as the pointer moves, set the content's translation offset to that starting offset plus the pointer's total displacement since the drag began.
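
A minimal sketch of that bookkeeping follows. It is written in Python for readability rather than as browser code. The class and method names (`PanController`, `begin`, `move`, `end`) are hypothetical. In a web page you would call them from your pointer or touch event handlers, passing coordinates measured from the container's top-left corner.

```python
from typing import Optional, Tuple


class PanController:
    """Hypothetical helper that turns a drag into a content offset.

    Offsets and pointer positions are in screen pixels, measured from the
    container's top-left corner.
    """

    def __init__(self, offset_x: float = 0.0, offset_y: float = 0.0) -> None:
        self.offset_x = offset_x
        self.offset_y = offset_y
        # (pointer_x, pointer_y, offset_x, offset_y) captured when the drag began.
        self._start: Optional[Tuple[float, float, float, float]] = None

    def begin(self, pointer_x: float, pointer_y: float) -> None:
        # Remember where both the pointer and the content were at drag start.
        self._start = (pointer_x, pointer_y, self.offset_x, self.offset_y)

    def move(self, pointer_x: float, pointer_y: float) -> None:
        if self._start is None:
            return  # Not dragging: ignore stray move events.
        start_px, start_py, start_ox, start_oy = self._start
        # Content offset = starting offset + total pointer displacement.
        self.offset_x = start_ox + (pointer_x - start_px)
        self.offset_y = start_oy + (pointer_y - start_py)

    def end(self) -> None:
        self._start = None


pan = PanController(10, 20)
pan.begin(100, 100)
pan.move(130, 90)
pan.move(150, 80)
pan.end()
print(pan.offset_x, pan.offset_y)  # → 60 0
```

The sketch computes the offset from the total displacement since `begin` rather than summing per-event deltas. That matches the description above and avoids accumulating floating-point error from many small increments.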

Just as in physics, it's important to keep frames of reference consistent. In the panning described above, all translations are calculated relative to the top-left corner of the container. That choice is somewhat arbitrary, but whatever frame you pick, stay consistent within it; otherwise you'll go insane converting between coordinate spaces.
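
Once zooming enters the picture, a top-left frame reduces the view state to an offset and a scale. The sketch below shows the two conversions you end up needing, assuming uniform scaling. The function names are hypothetical. Keeping both conversions in one place is one way to stay consistent within the chosen frame.

```python
def content_to_screen(px, py, offset_x, offset_y, scale):
    """Where content point (px, py) is drawn in the container.

    The content's top-left corner sits at (offset_x, offset_y) in the container,
    and the content is scaled uniformly by `scale` about that corner.
    """
    return offset_x + scale * px, offset_y + scale * py


def screen_to_content(sx, sy, offset_x, offset_y, scale):
    """Inverse of content_to_screen: which content point is under (sx, sy)?"""
    return (sx - offset_x) / scale, (sy - offset_y) / scale


screen = content_to_screen(3, 4, 10, 20, 2)
print(screen)                                 # → (16, 28)
print(screen_to_content(*screen, 10, 20, 2))  # → (3.0, 4.0)
```

With `scale = 1` this reduces to the panning case, where the offset alone positions the content.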


Keeping the top-left frame of reference, we can easily add zooming in response to the mouse wheel or pinch gestures. There's a problem, however. The content scales about its top-left corner, so a content point p, drawn at offset + scale·p, moves whenever the scale changes; only the corner itself (p = 0) stays put. As we zoom, the content slides out from under the pointer, making it difficult to zoom in on the detail we actually want. On a map, for example, zooming in on Boston or the islands means alternating between zooming and panning to get closer.
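
The sketch below demonstrates that drift in the top-left frame. It zooms 2× about the top-left corner and asks which content point sits under a fixed pointer before and after. The function names are hypothetical.

```python
def screen_to_content(sx, sy, offset_x, offset_y, scale):
    # Which content point is currently drawn at screen point (sx, sy)?
    return (sx - offset_x) / scale, (sy - offset_y) / scale


def naive_zoom(offset_x, offset_y, scale, factor):
    # Scale about the content's top-left corner; the offset is left unchanged.
    return offset_x, offset_y, scale * factor


pointer = (200.0, 100.0)
offset_x, offset_y, scale = 0.0, 0.0, 1.0
before = screen_to_content(*pointer, offset_x, offset_y, scale)
offset_x, offset_y, scale = naive_zoom(offset_x, offset_y, scale, 2.0)
after = screen_to_content(*pointer, offset_x, offset_y, scale)
print(before, after)  # → (200.0, 100.0) (100.0, 50.0)
```

Before the zoom, content point (200, 100) was under the pointer. After the zoom it is drawn at (400, 200), so the detail has slid away from the cursor.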

Zooming around a point

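To avoid this, compensate for the shift introduced by the zoom by translating the content in the opposite direction by the same amount. Zooming by a factor f about the top-left moves the point under the pointer by (f − 1)·(pointer − offset). Subtracting that amount from the offset keeps the point under the pointer fixed while everything else scales around it.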
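
A sketch of that compensation in the same top-left frame follows. It assumes a multiplicative zoom `factor`, for example derived from wheel or pinch input; that mapping is left out, and the function name is hypothetical.

```python
def zoom_around_point(offset_x, offset_y, scale, factor, pointer_x, pointer_y):
    """Zoom by `factor` while keeping the content under the pointer fixed.

    (offset_x, offset_y) is the content's top-left corner in container
    coordinates; the pointer is given in the same coordinates.
    """
    # Scaling about the top-left would push the point under the pointer by
    # (factor - 1) * (pointer - offset); translate back by the same amount.
    new_offset_x = offset_x - (factor - 1) * (pointer_x - offset_x)
    new_offset_y = offset_y - (factor - 1) * (pointer_y - offset_y)
    return new_offset_x, new_offset_y, scale * factor


state = zoom_around_point(0.0, 0.0, 1.0, 2.0, 200.0, 100.0)
print(state)  # → (-200.0, -100.0, 2.0)
```

Unlike the naive zoom, content point (200, 100) stays under the pointer at (200, 100): (200 − (−200)) / 2 = 200. Because the factor is multiplicative, zooming in by 2 and then by 0.5 at the same pointer returns to the starting state, up to floating-point rounding.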

Of course, other frames of reference might make your particular application simpler; for example, you could calculate everything relative to the centers of the container and the content.
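
For comparison, the sketch below implements the same pointer-anchored zoom with the state stored relative to the centers of the container and content. It works one axis at a time, since with uniform scaling the horizontal and vertical calculations are independent. The function names and the 800 px / 1000 px sizes are illustrative assumptions. The compensation keeps the same form; only the pointer is first re-expressed relative to the container's center.

```python
def zoom_axis_centered(t, scale, factor, pointer, container_size):
    """Zoom one axis around `pointer`, with state stored relative to centers.

    t: offset of the content's center from the container's center (screen px).
    pointer: pointer position measured from the container's top-left (screen px).
    Returns the new (t, scale).
    """
    rel = pointer - container_size / 2  # pointer relative to the container's center
    # Same compensation as in the top-left frame, with positions re-expressed
    # relative to the container's center.
    return t - (factor - 1) * (rel - t), scale * factor


def centered_axis_to_screen(p, t, scale, container_size, content_size):
    """Screen coordinate (from the container's top-left) of content coordinate p."""
    return container_size / 2 + t + scale * (p - content_size / 2)


# An 800 px container showing 1000 px of content, pointer at x = 600.
t, scale = zoom_axis_centered(0.0, 1.0, 2.0, 600.0, 800.0)
print(t, scale)                                                  # → -200.0 2.0
print(centered_axis_to_screen(700.0, t, scale, 800.0, 1000.0))  # → 600.0
```

Content coordinate 700 was under the pointer at x = 600 before the zoom (400 + 0 + 1 × 200), and it is still there afterwards.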