Feed: Shape Operator


How to lie with units

Published: Sat, 15 Feb 2020 00:00:00 -0800
Updated: Sat, 15 Feb 2020 00:00:00 -0800
UTC: 2020-02-15 08:00:00+00:00
URL: http://www.shapeoperator.com/2020/02/15/how-to-lie-with-units/

For some reason, a six year old study claiming that dogs align with Earth’s magnetic field when they poop in “calm magnetic field conditions” is making the rounds online again. After this article was originally published in December 2013, it was reported uncritically by PBS, NPR, National Geographic, Vice, and many others.

The study’s conclusions hinge on a surprising distinction: dogs don’t always tend to align with Earth’s magnetic field when they poop; they only tend to do it during times when the field’s direction is especially steady. (These steady conditions occurred about one fifth of the time in the study; see table 8.)

If you’re familiar with the idea of p-hacking or data dredging, this kind of binning is probably enough to make you anxious (see this xkcd cartoon for a fun take on the general concept, and this post for a criticism of this particular study along these lines), but I don’t want to focus on statistics today.

Instead, I want to highlight exactly how small these variations in the Earth’s magnetic field direction actually are, because I think the study’s authors took several steps to obscure this point.

They measure variability of the field in % declination, and find that dogs’ alignment with the field while pooping is only significant when variability of the field is less than 0.1%. Okay, but what exactly is this a percentage of? Declination essentially just means “direction of the field”—technically it is the angle between the direction of the local magnetic field and the direction to the North Pole, i.e. the angle between magnetic north and geographic north.

What could be meant by a percentage of a direction? My first guess was that maybe % declination meant “percentage of a full turn around a compass,” so that 1% is equivalent to 3.6 degrees. But the authors report observing variations of up to 8%, and I know from experience that a compass doesn’t just sit there swinging around by tens of degrees, at least not unless there is some exciting electrical equipment nearby.

No, the caption of figure 4 makes it clear that what is actually meant by % declination is arcminutes of change in declination per minute of time. An arcminute is one sixtieth of one degree. Calling this ratio a percentage is an odd pun on two different meanings of the word “minute”: minutes of time and arcminutes of direction. (The authors make it easy to miss exactly how small these variations are by using unfamiliar units, and again by using drawings of compasses with a highly exaggerated scale of rotation in figure 4.)

The authors claim that the alignment effect is only significant when the field variation is less than 0.1% declination. One way to rephrase this is that if, in the perhaps one minute it takes a dog to decide which direction to face while pooping, the Earth’s magnetic field direction changes by 0.002 degrees, that will have a measurable effect on the behavior of the dog.

To put this in even more familiar terms, imagine the hour hand on a clock. It moves 360 degrees in 12 hours, or 0.5 degrees per minute. The authors are claiming that if the local direction of the Earth’s magnetic field is rotating at a rate that is more than 100 times slower than the hour hand on a clock, this will have a measurable effect on the behavior of dogs.
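
If you want to check this arithmetic yourself, here is a quick sketch in Python (the 0.1 %-declination threshold and the hour-hand comparison come from the surrounding text; everything else is ordinary unit conversion):

```python
# Convert the study's "0.1% declination" threshold (0.1 arcminute of
# declination change per minute of time) into degrees per minute, and
# compare with the rotation rate of a clock's hour hand.

ARCMIN_PER_DEGREE = 60.0

field_rate = 0.1 / ARCMIN_PER_DEGREE    # degrees per minute, ~0.00167
hour_hand_rate = 360.0 / (12 * 60)      # degrees per minute, 0.5

print(f"field drift: {field_rate:.5f} deg/min")
print(f"hour hand:   {hour_hand_rate:.2f} deg/min")
print(f"ratio:       {hour_hand_rate / field_rate:.0f}x")   # ~300
```

The ratio works out to roughly 300, consistent with “more than 100 times slower” above.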

Perhaps you’re willing to believe that dogs are sensitive to magnetic fields. Nature is full of surprises, and there’s good evidence for magnetoreception in several other species. But are you also willing to believe that dogs are sensitive to such tiny variations in magnetic fields? Much more sensitive than a handheld magnetic compass?

Proving theorems about angles without angles

Published: Sun, 10 Feb 2019 00:00:00 -0800
Updated: Sun, 10 Feb 2019 00:00:00 -0800
UTC: 2019-02-10 08:00:00+00:00
URL: http://www.shapeoperator.com/2019/02/10/proving-theorems-about-angles-without-angles/

In several recent posts, I have been exploring a way of doing trigonometry using vectors and their various products while de-emphasizing angle measures and trigonometric functions.

In this system, triangles are represented as sets of three vectors that add to 0

Triangle a b c
a + b + c = 0

The traditional law of cosines can be replaced with a vector equation that uses the dot product

c^2 = a^2 + b^2 + 2 a\cdot b

the law of sines can be replaced with a vector equation that uses the wedge product

a \wedge b = b \wedge c = c \wedge a

and rotations and reflections can be represented using geometric products of vectors. (Reminder on notation: in these posts, lower case latin letters like a and b represent vectors, greek letters like \theta and \phi represent real numbers such as lengths or angles, and \hat{a} represents a unit vector directed along a, so that \hat{a}^2=1 and a = |a|\hat{a}. Juxtaposition of vectors represents their geometric product, so that ab is the geometric product between vectors a and b; the geometric product is non-commutative, so the order of terms is important.) For vectors in the plane, the rotation of a vector v through the angle between vectors a and b can be represented by right multiplying by the product \hat{a}\hat{b}:

v_\mathrm{rot.} = v \hat{a}\hat{b}

and the reflection of v in any vector c can be represented as the “sandwich product”

v_\mathrm{refl.} = c v c^{-1} = \hat{c} v \hat{c}

Notice that none of these formulae make direct reference to any angle measures.

But without angle measures, won’t it be hard to state and prove theorems that are explicitly about angles?

Not really. Relationships between directions that can be represented by addition and subtraction of angle measures can be represented just as well using products and ratios of vectors with the geometric product. And the geometric product is better at representing reflections, which can sometimes provide fresh insights into familiar topics.

We’ll take as our example the inscribed angle theorem, because it is one of the simplest theorems about angles that doesn’t seem intuitively obvious (at least, it doesn’t seem obvious to me…).

The inscribed angle theorem says that every angle inscribed on a circle that subtends the same arc has the same angle measure (terminology: an inscribed angle of a circle is an interior angle of a triangle with 3 vertices lying on the circle’s circumference; roughly speaking, an angle at a point subtends the things that you could see from the point if your field of view were limited to the given angle; in the figure, the blue inscribed angles all subtend the same purple arc)

A set of equal angles inscribed in a circle that subtend the same arc

and that a central angle that subtends the same arc as an inscribed angle has twice the angle measure as the inscribed angle:

The central angle that subtends the same arc of a circle as an inscribed angle has twice the angle measure

Let’s first go over a traditional proof of the inscribed angle theorem to gain some familiarity. The key is to draw in one more radius of the circle to form a pair of isosceles triangles that share a leg (terminology: an isosceles triangle is a triangle with two sides of equal length; the two equal-length sides are called legs and the third side is called the base; the legs meet at the vertex, and the interior angle at the vertex is the vertex angle; the interior angles formed by the base and each leg are the base angles):

Drawing a third radius vector helps prove the inscribed angle theorem

The two base angles of an isosceles triangle are equal, so we can label angles on the figure as follows:

Labeled angles for proving the inscribed angle theorem

where \phi_1 and \phi_2 are base angles of two isosceles triangles, their sum, \phi_1 + \phi_2 , is an inscribed angle on the circle, \theta_1 and \theta_2 are vertex angles of the isosceles triangles and also central angles of the circle, and \theta_3 is the central angle that subtends the same arc as the inscribed angle \phi_1 + \phi_2.

The three central angles add up to a full turn

\theta_1 + \theta_2 + \theta_3 = 2 \pi

and the interior angles of the triangles each add up to a half turn (because interior angles of a triangle always add up to a half turn, or 180 degrees)

\begin{aligned} \theta_1 + 2\phi_1 &= \pi \\ \theta_2 + 2\phi_2 &= \pi \end{aligned}

Adding the previous two expressions and re-arranging gives

2(\phi_1 + \phi_2) = 2 \pi - (\theta_1 + \theta_2)

and recognizing that the right hand side is equal to \theta_3 gives

2(\phi_1 + \phi_2) = \theta_3

This proves the theorem, because the left hand side is twice the inscribed angle and the right hand side is the corresponding central angle. (Technically, this only proves the second part of the theorem; see Appendix A.)

This proof depended on the theorem that the base angles of an isosceles triangle are equal. Do you remember how to prove this?

Here’s one way: drop a median from the vertex to the midpoint of the base:

A median from the vertex of an isosceles triangle to the midpoint of the base

This produces two triangles that are congruent because they have three equal sides, and the base angles are corresponding angles in these congruent triangles, so they are equal.

Do you remember why triangles with equal sets of side lengths are congruent? I find that it’s pretty easy to memorize facts like this but forget exactly why they must be true. (If you want to remember how Euclid did all these things, Nicholas Rougeux has published a gorgeous new online edition of Byrne’s Euclid. The distinctive feature of Byrne’s Euclid, originally published in 1847, is that it uses small color-coded pictograms throughout the text to reference diagrams instead of letters, which gives it an appealing concreteness. Euclid actually proves that the base angles of an isosceles triangle are equal (proposition V) a different way, without referencing the theorem that triangles with equal sides are congruent, and then later uses this theorem as part of the proof that triangles with equal sides are congruent (proposition VII and proposition VIII).)

Geometric algebra provides an interesting algebraic way to prove that the base angles of an isosceles triangle are equal, embedded as a special case of an equation that is true for all triangles. As usual, we begin with the condition that three vectors form a triangle

Triangle a b c
a + b + c = 0

Left multiplying the triangle equation by a gives

a^2 + ab + ac = 0

and alternatively, right multiplying by b gives

ab + b^2 + cb = 0

Subtracting these two equations gives

\left(a^2 - b^2\right) + (ac - cb) = 0

and in the special case that the lengths of a and b are equal so that the triangle is isosceles, the first term vanishes leaving

ac = cb

Dividing by |a||c| = |b||c| gives

\frac{a}{|a|}\frac{c}{|c|} = \frac{c}{|c|}\frac{b}{|b|}

or

\hat{a}\hat{c} = \hat{c}\hat{b}

which is an equation for the equal rotations through the equal base angles, represented as products of unit vectors. (\hat{a}\hat{c} and \hat{c}\hat{b} represent plane rotations through exterior angles; the corresponding interior angle rotations are -\hat{a}\hat{c} and -\hat{c}\hat{b}.)

The equation

ac = cb

also allows solving for a in terms of b and c by multiplying on the right by c^{-1} (the ability to solve equations of products of vectors like this is one of the special advantages of geometric algebra),

a = cbc^{-1}

The right hand side is the sandwich product representation of the reflection of b in c, so in words, this equation says that reflecting one of the leg vectors of an isosceles triangle across the base vector gives the remaining leg vector. (Recall that equality of vectors implies they have the same length and direction, but implies nothing about location; vectors are not considered to have a location, and may be freely translated without change.) This is a fact about isosceles triangles that I had not considered until doing this manipulation.

The reflection of a leg vector of an isosceles triangle across the base is equal to the other leg vector
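
Both identities are easy to spot-check numerically. Here is a sketch in Python, assuming the standard encoding of plane vectors as complex numbers (so the vector-vector geometric product uv becomes conj(u)*v, and the unit bivector becomes the imaginary unit); this encoding is an assumption of the sketch, not part of the post:

```python
# Sketch: verify ac = cb and a = c b c^{-1} for an isosceles triangle,
# with plane vectors encoded as complex numbers (an assumption of this
# sketch). Vector-vector geometric product u v -> conj(u)*v; the
# reflection c b c^{-1} works out to chat**2 * conj(b).
a = complex(1, 2)
b = complex(-1, 2)            # same length as a: the two legs
c = -(a + b)                  # the base; closes the triangle

ac = a.conjugate() * c
cb = c.conjugate() * b
assert abs(ac - cb) < 1e-12   # ac = cb (equal base angle rotations)

chat = c / abs(c)
assert abs(chat**2 * b.conjugate() - a) < 1e-12   # a = c b c^{-1}
```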

With this preparation, we are ready to prove the inscribed angle theorem using geometric algebra.

Labeled vectors representing chords and radii of a circle

Call chord vectors through a point on the circumference of a circle c_1 and c_2 , radius vectors subtending the corresponding central angle r_1 and r_2 , and the radius vector from the circle’s center to the shared point of the two chords r_3 .

Then the plane rotation through the central angle is

\hat{r}_1 \hat{r}_2

and we’re seeking a relationship between this rotation and the plane rotation through the inscribed angle,

\hat{c}_1 \hat{c}_2
Labeled unit vectors along chords and radii of a circle

We can summarize the geometrical content of these diagrams algebraically with two triangle equations and an equation expressing the equal lengths of the radius vectors (there’s actually one other important piece of geometric information that isn’t spelled out in the algebra here; I’ll come back to this below):

\begin{aligned} c_1 - r_1 + r_3 &= 0 \\ c_2 - r_2 + r_3 &= 0 \\ r_1^2 = r_2^2 &= r_3^2 \end{aligned}

We can take the first triangle equation and perform a similar manipulation as we did above: left multiply by -r_1 , separately right multiply by r_3 , and subtract the two results to give

\left(r_1^2 - r_3^2\right) - (r_1 c_1 + c_1 r_3) = 0

The first term is zero because the lengths of r_1 and r_3 are equal, so this becomes

r_1 c_1 = - c_1 r_3

or

r_1 = - c_1 r_3 c_1^{-1}

and in terms of unit vectors this is

\hat{r}_1 = - \hat{c}_1 \hat{r}_3 \hat{c}_1

Similar reasoning for the second triangle gives

\hat{r}_2 = - \hat{c}_2 \hat{r}_3 \hat{c}_2

Making these substitutions in the central angle product \hat{r}_1\hat{r}_2 gives

\begin{aligned} \hat{r}_1 \hat{r}_2 &= (- \hat{c}_1 \hat{r}_3 \hat{c}_1) (- \hat{c}_2 \hat{r}_3 \hat{c}_2) \\ &= \hat{c}_1 (\hat{r}_3 \hat{c}_1 \hat{c}_2) \hat{r}_3 \hat{c}_2 \\ &= \hat{c}_1 (\hat{c}_2 \hat{c}_1 \hat{r}_3) \hat{r}_3 \hat{c}_2 \\ &= \hat{c}_1 \hat{c}_2 \hat{c}_1 \hat{c}_2 \\ &= (\hat{c}_1 \hat{c}_2)^2 \end{aligned}

and the final line represents applying the rotation through the inscribed angle twice, which proves the inscribed angle theorem. (Again, technically, this only proves the second part of the theorem; see Appendix A.)

Let’s go through this manipulation line by line:

  1. Substitute for \hat{r}_1 and \hat{r}_2 based on the fact that they are part of isosceles triangles that share a leg. The two negative signs will cancel one another.
  2. Re-associate the geometric product. The geometric product is associative, so the parentheses here serve only as a guide to the eye.
  3. Use the planarity condition, which says that the geometric product of any three vectors in the same plane equals its reverse, to reverse the parenthesized product of three vectors. (This planarity condition is the “missing algebraic information” that I referred to above. On my first pass through this problem, it took me a while to connect this need to re-order three vectors to the fact that they lie in a single plane. In the plane, there is only one point that is equidistant from three given points, but in more dimensions there are more points that satisfy this condition. Consider: what does this set of points look like in 3D?)
  4. Re-associate and use the fact that \hat{r}_3 is a unit vector, so \hat{r}_3^2 = 1.
  5. Collect identical products as a square.

The geometric product allowed us to exploit relationships between reflections and rotations that I typically wouldn’t think to see directly on a diagram, allowing the proof to be particularly concise.
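
For readers who like to verify such results numerically, here is a small sketch in Python, again assuming the complex-number encoding of plane vectors (vectors become complex numbers and the vector-vector geometric product uv becomes conj(u)*v); the specific angles are arbitrary:

```python
import cmath

# Sketch: numeric check of r1^ r2^ = (c1^ c2^)^2 on the unit circle
# centered at the origin (complex-number encoding of plane vectors is
# an assumption of this sketch). t1, t2, t3 are arbitrary.
t1, t2, t3 = 0.4, 2.3, 4.0
r1, r2, r3 = (cmath.exp(1j * t) for t in (t1, t2, t3))

c1 = r1 - r3                  # chord from the shared point to point 1
c2 = r2 - r3                  # chord from the shared point to point 2

def rotor(u, v):              # u^ v^  ->  conj(u)*v / (|u||v|)
    return u.conjugate() * v / (abs(u) * abs(v))

central = rotor(r1, r2)       # rotation through the central angle
inscribed = rotor(c1, c2)     # rotation through the inscribed angle
assert abs(central - inscribed**2) < 1e-12
```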

Alternatively, it’s possible to mimic the original angle calculation a little more directly. Where we previously added internal angles of the triangles, we can instead multiply rotations to compose them:

Labeled unit vectors along chords and radii of a circle
\begin{aligned} (\hat{r}_1 \hat{c}_1)(\hat{c}_1 \hat{r}_3)(\hat{r}_3 \hat{r}_1) &= 1 \\ (\hat{r}_2 \hat{r}_3)(\hat{r}_3 \hat{c}_2)(\hat{c}_2 \hat{r}_2) &= 1 \end{aligned}

Interpreting these rotations as internal or external angles requires a little bit of care about signs, but algebraically, the equations are true almost trivially by re-associating and canceling squares of unit vectors.

Next, we can apply equality of base angles by replacing \hat{r}_1\hat{c}_1 with -\hat{c}_1\hat{r}_3 and \hat{c}_2\hat{r}_2 with -\hat{r}_3\hat{c}_2 :

\begin{aligned} -(\hat{c}_1 \hat{r}_3)^2(\hat{r}_3 \hat{r}_1) &= 1 \\ -(\hat{r}_2 \hat{r}_3)(\hat{r}_3 \hat{c}_2)^2 &= 1 \end{aligned}

In place of summing the equations for interior angles of two triangles, we can multiply these equations for compositions of rotations:

(\hat{c}_1 \hat{r}_3)^2(\hat{r}_3 \hat{r}_1)(\hat{r}_2 \hat{r}_3)(\hat{r}_3 \hat{c}_2)^2 = 1

Products of pairs of vectors in the plane commute (in other words, rotations in the plane commute), so we can rearrange this in order to pair and cancel factors of \hat{r}_3

\begin{aligned} 1 &= (\hat{r}_2 \hat{r}_3)(\hat{r}_3 \hat{r}_1)(\hat{c}_1 \hat{r}_3)(\hat{r}_3 \hat{c}_2)(\hat{c}_1 \hat{r}_3)(\hat{r}_3 \hat{c}_2) \\ &= \hat{r}_2 \hat{r}_1 \hat{c}_1 \hat{c}_2 \hat{c}_1 \hat{c}_2 \end{aligned}

Now left multiplying by \hat{r}_1\hat{r}_2 gives the desired result:

\hat{r}_1 \hat{r}_2 = (\hat{c}_1 \hat{c}_2)^2

Conclusion

It has become customary in elementary geometry to identify relationships between directions with relationships between angles, and to identify angles with numerical angle measures. But this is not strictly necessary (in classical Greek geometry, lengths and angles actually weren’t identified with numbers, likely at least in part because lengths are often irrational and angles are often transcendental; instead, length, angle, and number were treated as separate concepts with some similar relationships), and in this post, I have shown an example of how the same relationships between directions can instead be modeled by products and ratios of vectors without direct reference to numerical angle measures.

The vector approach has the advantage that the condition that three vectors form a triangle is very simple to write down,

a + b + c = 0

whereas the condition that three lengths and three angles describe a triangle is somewhat more complicated, and is not typically written down in full.

This linear vector equation for a triangle makes it straightforward to derive classical relationships between lengths and angles in a triangle, which typically appear as quadratic equations in the vectors. These manipulations can usually be described in a phrase or two:

  • Law of cosines: solve for c, square both sides
  • Law of sines: form the wedge product of the triangle equation with one of the vectors, solve for one of the two non-zero terms
  • Equal isosceles base angles (shown in this post): multiply on the left by one of the vectors and alternatively on the right by a different vector and subtract the two results; then two equal lengths imply two equal angles

An advantage of the vector model is that the relevant equations are algebraic in the vectors and their products, whereas they are transcendental in angle measures. So for example, the law of cosines in terms of angles,

c^2 = a^2 + b^2 - 2|a||b|\cos(\theta)

is transcendental in the angle \theta , whereas the vector equation

c^2 = a^2 + b^2 + ab + ba

is algebraic in the corresponding vector product ab .

The catch, of course, is that these quadratic equations involve non-commutative products, and there is a definite learning curve to becoming comfortable with this. But in my experience, the effort is repaid along the way through the insight you gain from viewing familiar things from an unfamiliar perspective.

Thanks as usual to Jaime George for editing this post.

Appendix A

Being careful about what we have proved

Technically, we have only proved the second part of the inscribed angle theorem: a central angle that subtends the same arc as an inscribed angle is equal to the inscribed angle composed with itself:

The central angle that subtends the same arc of a circle as an inscribed angle has twice the angle measure

which seems like it would imply the first part of the inscribed angle theorem, that all inscribed angles that subtend the same arc are equal,

Inscribed angles on a circle that subtend the same arc have the same angle measure

but actually doesn’t. The technicality is that we have proved that the squares of the rotations through these inscribed angles are equal, but this leaves an ambiguity about the sign (equating doubled angle measures has exactly the same ambiguity as equating the squares of rotations, though the ambiguity is perhaps easier to overlook at first glance):

\begin{aligned} \left(\hat{c}_3\hat{c}_4\right)^2 &= \left(\hat{c}_1\hat{c}_2\right)^2 \\ \hat{c}_3\hat{c}_4 &= \pm \hat{c}_1\hat{c}_2 \end{aligned}

And in fact there are different inscribed angles that subtend different (but closely related) arcs that satisfy all the algebraic rules we wrote down (including planarity), but that do not describe equal rotations:

An inscribed angle on a circle that subtends the same chord but not the same arc as several other inscribed angles does not have the same angle measure

The unequal inscribed angle here subtends an equal chord but opposite arc as the other equal inscribed angles. The relationship between these angles is exactly the same as the relationship between interior and exterior angles, and equating squares of rotations (or doubled angle measures) leaves this distinction ambiguous.

I will sketch a way to distinguish these cases and prove the full theorem, without going into full detail.

If c_1 and c_2 are vectors from two points on a line to a given point not on the line, then the sign of the unit bivector

\frac{c_1 \wedge c_2}{|c_1 \wedge c_2|}

determines which side of the line the given point is on. Given equal squares of products of vectors,

(c_1 c_2)^2 = (c_3 c_4)^2

then the products are equal if the corresponding unit bivectors are equal

\frac{c_1 \wedge c_2}{|c_1 \wedge c_2|} = \frac{c_3 \wedge c_4}{|c_3 \wedge c_4|}

and otherwise the products differ by a sign.

Then inscribed angles that subtend equal chords of a circle are equal when they lie on the same side of the subtended chord, or equivalently, when they also subtend the same arc.

Appendix B

A path not taken

I have to admit that my first attempt to analyze the inscribed angle theorem with vectors went down an unproductive path. I knew I wanted a relationship between a product of radius vectors r_1 r_2 and chord vectors c_1 c_2 , and I had the triangle equations

\begin{aligned} c_1 - r_1 + r_3 &= 0 \\ c_2 - r_2 + r_3 &= 0 \end{aligned}

so my first instinct was to solve the triangle equations for r_1 and r_2 , substitute into the product r_1 r_2 , and expand:

\begin{aligned} r_1 r_2 &= (c_1 + r_3)(c_2 + r_3) \\ &= c_1 c_2 + r_3 c_2 + c_1 r_3 + r_3^2 \end{aligned}

There’s nothing wrong with the algebra here, but it just isn’t very productive. We now have a sum of 4 terms that is hard to interpret geometrically. Since the right hand side that we’re seeking, (c_1 c_2)^2 , is a monomial, we would have to eventually factor the sum on the right hand side, and factoring is difficult—especially when dealing with multiple non-commuting variables as we are here.

This mis-step is worth discussing because it is an example of a pattern that turns out to be pretty common when using geometric algebra to analyze geometry problems. When presented with a product of sums, it is a natural instinct to try expanding into a sum of products. This is sometimes productive, but it introduces a few issues:

  • It is often harder to interpret a sum of many different products geometrically than it is to interpret a single product
  • Expanding products can rapidly (exponentially) increase the total number of terms
  • Simplifying sums of products often requires factoring, and factoring is hard

For this reason, a useful heuristic is to prefer using commutation relationships (e.g. between parallel or perpendicular vectors, or between vectors that all lie in a single plane) to re-arrange a geometric product. When it works, this is better than expanding products, hoping that some terms will cancel, and then attempting to factor the result. When it is necessary to expand a product, it can be useful to expand as few terms as possible and then try to factor again before further expansions, in preference to expanding the entire product at once. (Hongbo Li elaborates this idea extensively in his book. For a briefer introduction, see this paper or these tutorial slides.)

Visualizing geometric product relationships in plane geometry

Published: Mon, 07 Jan 2019 00:00:00 -0800
Updated: Mon, 07 Jan 2019 00:00:00 -0800
UTC: 2019-01-07 08:00:00+00:00
URL: http://www.shapeoperator.com/2019/01/07/relating-dot-wedge/

In previous posts, I have shown how to visualize both the dot product and the wedge product of two vectors as parallelogram areas. In this post, I will show how the dot product and the wedge product are related through a third algebraic product: the geometric product. Along the way, we will see that the geometric product provides a simple way to algebraically model all of the major geometric relationships between vectors: rotations, reflections, and projections.

Before introducing the geometric product, let’s review the wedge and dot products and their interpretation in terms of parallelogram areas.

Given two vectors, a and b , their wedge product, a \wedge b , is straightforwardly visualized as the area of the parallelogram spanned by these vectors:

Wedge product of vectors a and b

Recall that algebraically, the wedge product a \wedge b produces an object called a bivector that represents the size and direction (but not the shape or location) of a plane segment in a similar way that a vector represents the size and direction (but not the location) of a line segment.

The dot product of the same two vectors, a \cdot b , can be visualized as a parallelogram formed by one of the vectors and a copy of the other that has been rotated by 90 degrees:

Dot product of vectors a and b

Well, almost. When I originally wrote about this area interpretation of the dot product, I didn’t want to get into a discussion of bivectors, but once you have the concept of bivector as directed plane segment, it’s best to say that what this parallelogram depicts is not quite the dot product, a \cdot b , which is a scalar (real number), but rather the bivector (a \cdot b) I where I is a unit bivector.

Dot product of vectors a and b as a parallelogram should include a factor of I

The scalar a \cdot b scales the unit bivector I to produce a bivector with magnitude/area a \cdot b . It’s hard to draw a scalar on a piece of paper without some version of this trick. Once you’re looking for it, you’ll see that graphical depictions of real numbers/scalars almost always show how they scale some reference object. It could be a unit segment of an axis or a scale bar; here it is instead a unit area I .

Examining the way that the dot product and the wedge product can be represented by parallelograms suggests an algebraic relationship between them:

(a \cdot b) I = b \wedge a_\perp

where a_\perp represents the result of rotating a by 90 degrees. Since the dot product is symmetric, we also have

(a \cdot b) I = a \wedge b_\perp
Relationship between dot and wedge product

To really understand this relationship, we’ll need an algebraic way to represent how a_\perp is related to a ; in other words, we’ll need to figure out how to represent rotations algebraically.

To work towards an algebraic representation of rotations, recall how the dot product and wedge product can be expressed in terms of lengths and angles:

\begin{aligned} a \cdot b &= |a| |b| \cos(\theta_{ab}) \\ a \wedge b &= |a| |b| \sin(\theta_{ab}) I \end{aligned}

If you’re familiar with complex numbers, then adding these two products together produces a very suggestive result:

\begin{aligned} a \cdot b + a \wedge b &= |a| |b| (\cos(\theta_{ab}) + \sin(\theta_{ab}) I) \\ &= |a| |b| \exp(\theta_{ab} I) \end{aligned}

where the second line is an expression of Euler’s Formula. (At this point, it probably isn’t so clear what it could mean to exponentiate a bivector. Suspend disbelief; I’ll come back to this.)

In words, the sum of the dot product and the wedge product of two vectors is the product of their lengths, |a| and |b| , and a factor representing the rotation between their directions, \exp(\theta_{ab} I) . This sum is so geometrically meaningful that Geometric Algebra gives it its own name: the geometric product, and its own notation, simple juxtaposition of vectors:

a b = a \cdot b + a \wedge b = |a||b| \exp(\theta_{ab} I)

Since the dot product is symmetric, but the wedge product is anti-symmetric, the order of the factors in a geometric product matters. The reverse of this product is

\begin{aligned} b a &= b \cdot a + b \wedge a \\ &= a \cdot b - a \wedge b = |a||b| \exp(-\theta_{ab} I) \end{aligned}

These product formulas can be solved in order to represent the dot product and wedge product in terms of the geometric product (as an alternative, it is possible to define the geometric product as a linear, associative product between vectors such that the square of a vector is a scalar, and then define the dot product and wedge product as the symmetric and anti-symmetric parts of the geometric product),

\begin{aligned} a \cdot b &= \frac12(ab + ba) \\ a \wedge b &= \frac12(ab - ba) \end{aligned}

In other words, the dot and wedge products are half the symmetric and anti-symmetric sums of the geometric product and its reverse.
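
Here is a minimal numeric check of this decomposition, as a sketch assuming the complex-number encoding of plane vectors (an assumption of the sketch, not from the post): the vector (x, y) becomes x + iy, and the geometric product ab becomes conj(a)*b, whose real part is the dot product and whose imaginary part is the coefficient of I in the wedge product.

```python
import math

# Sketch: dot and wedge products as the symmetric and anti-symmetric
# parts of the geometric product, with plane vectors encoded as
# complex numbers (an assumption of this sketch).
a = complex(3, 1)
b = complex(1, 2)

ab = a.conjugate() * b           # geometric product ab
ba = b.conjugate() * a           # reversed product ba

sym = 0.5 * (ab + ba)            # symmetric part -> a . b (a scalar)
antisym = 0.5 * (ab - ba)        # anti-symmetric part -> (a ^ b), a bivector

print(sym.real)                  # dot product: 3*1 + 1*2 = 5
print(antisym.imag)              # signed parallelogram area: 3*2 - 1*1 = 5
assert math.isclose(abs(ab), abs(a) * abs(b))   # |ab| = |a||b|
```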

Using these relations gives a very interesting way to characterize parallel and perpendicular vectors: vectors are parallel when their wedge product is zero, so that the geometric product commutes:

a \parallel b \Leftrightarrow a \wedge b = 0 \Leftrightarrow a b = b a

and they are perpendicular when their dot product is zero, so that the geometric product anti-commutes:

a \perp b \Leftrightarrow a \cdot b = 0 \Leftrightarrow a b = - b a

In general, for vectors that are neither perpendicular nor parallel, the geometric product is neither commutative nor anti-commutative. For this reason, it’s important to keep track of the order of terms in a multiplication when using the geometric product. We’ll see that with the geometric product, you can get an awful lot of geometry done by thinking about when the order of various symbols can be swapped.

Properties of the geometric product

Vectors square to scalars

Since the wedge product of a vector with itself is always 0 , the geometric product of a vector with itself is equal to the dot product of that vector with itself:

a a = a^2 = a \cdot a = |a|^2

Associative

The geometric product has another property that is less obvious from its decomposition as the sum of the dot and wedge products: it is associative,

(a b) c = a (b c) = a b c

Associativity is extremely useful algebraically. Combined with the fact that vectors square to real numbers (scalars), associativity means that equations involving products of vectors can be solved. For example, given

a b = c d

if we know b , we can solve for a by first multiplying on the right by b

\begin{aligned} a b b &= c d b \\ a b^2 &= c d b \\ a |b|^2 &= c d b \end{aligned}

and then dividing through by the scalar |b|^2

a = c d \frac{b}{|b|^2}
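
A numeric check of this solution, as a sketch under the same complex-number encoding assumption used in the other sketches (vector-vector product uv -> conj(u)*v; an even element z multiplying a vector v from the left -> conj(z)*v):

```python
import math

# Sketch: solve a b = c d for a, with plane vectors encoded as
# complex numbers (an assumption of this sketch).
b = complex(2, 1)
c = complex(1, 3)
d = complex(-2, 1)

cd = c.conjugate() * d                   # the even element c d
a = cd.conjugate() * b / abs(b)**2       # a = c d b / |b|^2

ab = a.conjugate() * b
assert math.isclose(abs(ab - cd), 0.0, abs_tol=1e-12)   # a b == c d
```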

Invertible

Another way to say that we can solve equations involving geometric products is to say that under the geometric product, vectors have unique inverses. This is a consequence of associativity and the fact that vectors square to scalars.

For an inverse of vector b , we require

b b^{-1} = 1

Multiplying both sides of this equation by b gives

b^2 b^{-1} = b

and dividing by the scalar b^2 = |b|^2 gives a formula for the inverse of a vector:

b^{-1} = \frac{b}{b^2} = \frac{b}{|b|^2}

In other words, to find the inverse of a vector, divide the vector by the square of its length.

Since the order of geometric multiplications matters in general, we should check that b^{-1} b also equals 1 :

b^{-1} b = \frac{b}{b^2} b = \frac{b^2}{b^2} = 1

Neither the wedge product nor the dot product alone admit unique inverses, but their sum, the geometric product, contains just the right information to admit a unique inverse.

Interpreting the geometric product geometrically

Algebraically, the geometric product has some very useful properties, but how can we interpret it geometrically?

To simplify, let’s first consider the geometric product of two unit vectors. I’ll use the notation \hat{a} to represent the unit vector in the same direction as a :

\hat{a} = \frac{a}{|a|}

so that

|\hat{a}| = \frac{|a|}{|a|} = 1

and

\hat{a}^2 = |\hat{a}|^2 = 1

We will use the fact that the square of a unit vector is 1 in the process of simplifying several expressions below.

Recalling the lengths-and-angles formula for the geometric product,

a b = |a| |b| (\cos(\theta_{ab}) + \sin(\theta_{ab}) I) = |a| |b| \exp(\theta_{ab} I)

we can see that the geometric product of two unit vectors effectively represents the rotation between them:

\hat{a} \hat{b} = \cos(\theta_{ab}) + \sin(\theta_{ab}) I = \exp(\theta_{ab} I)

The rotation represented by \hat{a}\hat{b} can be applied to another vector in the plane, v , by multiplying on the right (remember, order matters) to form v\hat{a}\hat{b} (some care is required for rotations in more than two dimensions: when v is not in the same plane as a and b, then v\hat{a}\hat{b} no longer represents a rotation of v; a closely related formula does generalize, namely \hat{b}\hat{a}v\hat{a}\hat{b} is a rotation of v by twice the angle between a and b in any number of dimensions, even when v, a, and b are not all in the same plane):

Applying a rotation to a vector

As a check,

\hat{a} (\hat{a} \hat{b}) = (\hat{a} \hat{a}) \hat{b} = \hat{b}

so right multiplication by \hat{a}\hat{b} rotates \hat{a} to \hat{b}.

Reversing the order of factors in a geometric product of unit vectors reverses the sense of rotation, so that \hat{b}\hat{a} is a rotation in the opposite direction as \hat{a}\hat{b}. As a check,

\hat{b} (\hat{b} \hat{a}) = (\hat{b}\hat{b}) \hat{a} = \hat{a}

To compose two rotations represented as geometric products, \hat{a} \hat{b} and \hat{b} \hat{c} , we simply multiply them:

Composition of two rotations
\begin{aligned} (\hat{a} \hat{b}) (\hat{b} \hat{c}) &= \hat{a} \hat{b} \hat{b} \hat{c} \\ &= \hat{a} (\hat{b} \hat {b}) \hat{c} \\ &= \hat{a} \hat{c} \end{aligned}

In terms of angles, this represents the same calculation as

\begin{aligned} \exp(\theta_{ab} I) \exp(\theta_{bc} I) &= \exp((\theta_{ab}+\theta_{bc}) I) \\ &= \exp(\theta_{ac} I) \end{aligned}
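
These rotor manipulations are easy to spot-check; here is a sketch, assuming the complex-number encoding of plane vectors (an assumption of the sketch, not from the post):

```python
import cmath

# Sketch: compose rotations with rotor products. A unit vector at
# angle t is exp(1j*t); the rotor formed from unit vectors u and v is
# conj(u)*v; applying a rotor to a vector is right multiplication,
# which in this encoding is ordinary complex multiplication.
ta, tb, tc = 0.3, 1.1, 2.0
ahat, bhat, chat = (cmath.exp(1j * t) for t in (ta, tb, tc))

r_ab = ahat.conjugate() * bhat      # rotation taking a to b
r_bc = bhat.conjugate() * chat      # rotation taking b to c
r_ac = ahat.conjugate() * chat      # rotation taking a to c

assert abs(r_ab * r_bc - r_ac) < 1e-12   # composition: (ab)(bc) = ac
assert abs(ahat * r_ab - bhat) < 1e-12   # applying: a (ab) = b
```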

Triangle angle sum laws

This is enough to start to give geometric product interpretations to some familiar facts about triangles. Given a triangle with edge vectors a , b , and c ,

Triangle a b c

the rotations through the exterior angles are represented by \hat{a}\hat{b} , \hat{b}\hat{c} , and \hat{c}\hat{a} .

Composition of exterior angles is an identity angle

The composition of exterior angles is a full rotation, which is simply an identity operation for vectors.

To show this algebraically, we simply multiply the geometric products representing each rotation to compose the rotations, and then re-associate:

\begin{aligned} &(\hat{a} \hat{b})(\hat{b} \hat{c})(\hat{c} \hat{a}) \\ &= \hat{a} (\hat{b} \hat{b}) (\hat{c} \hat{c}) \hat{a} \\ &= \hat{a} \hat{a} \\ &= 1 \end{aligned}

A “straight angle”, i.e. a rotation by 180 degrees, is the rotation between \hat{a} and -\hat{a} , which is simply

\hat{a} (- \hat{a}) = -\hat{a}\hat{a} = -1

If the exterior angle rotations of a triangle are \hat{a}\hat{b}, \hat{b}\hat{c}, and \hat{c}\hat{a}, then the interior angle rotations are simply the negatives of each of these, -\hat{a}\hat{b}, -\hat{b}\hat{c}, and -\hat{c}\hat{a}, and so the composition of the interior angles is (one way to see visually that these angles add up to a half turn is to notice that every sector of the circle has an equal missing sector on the opposite side):

Composition of interior angles is a straight angle
\begin{aligned} &(-\hat{a} \hat{b})(-\hat{b} \hat{c})(-\hat{c} \hat{a}) \\ &= - \hat{a} \hat{b} \hat{b} \hat{c} \hat{c} \hat{a} \\ &= -1 \end{aligned}

which is a straight angle. In other words, the composition of the interior angles of a triangle is a straight angle, i.e. a 180 degree angle.
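
Here is a numeric spot-check of both angle-sum facts, as a sketch under the same complex-number encoding assumption (the particular triangle is arbitrary):

```python
# Sketch: exterior rotors of a triangle compose to 1, interior rotors
# to -1, with plane vectors encoded as complex numbers (an assumption
# of this sketch; rotor of unit vectors u, v -> conj(u)*v).
a = complex(3, 0)
b = complex(-1, 2)
c = -(a + b)                     # closes the triangle: a + b + c = 0

def rotor(u, v):
    return (u / abs(u)).conjugate() * (v / abs(v))

exterior = rotor(a, b) * rotor(b, c) * rotor(c, a)
interior = -rotor(a, b) * -rotor(b, c) * -rotor(c, a)

print(exterior)   # ~ (1+0j): a full turn is the identity
print(interior)   # ~ (-1+0j): a straight angle
```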

Right angles

If two unit vectors have zero dot product, so that they anti-commute under the geometric product,

\begin{aligned} \hat{a} \cdot \hat{b} &= 0 \\ \frac12 (\hat{a} \hat{b} + \hat{b} \hat{a}) &= 0 \\ \hat{a} \hat{b} &= -\hat{b}\hat{a} \end{aligned}
Right angle

then multiplying both sides on the right by \hat{a}\hat{b} gives

(\hat{a} \hat{b})^2 = -\hat{b}\hat{a} \hat{a} \hat{b} = -1

which we can interpret geometrically as saying that the composition of a right angle with itself is a straight angle. The above manipulation is an algebraic way of representing the fact that the following statements are all equivalent:

  • two vectors are perpendicular
  • two vectors have zero dot product
  • two vectors anti-commute under the geometric product
  • the angle between two vectors bisects a straight angle

Dilations

For vectors that are not unit vectors, it’s a little easier to supply a geometric interpretation for the geometric ratio, a^{-1}b, than for the geometric product ab. (Note that for unit vectors, there is no distinction between ratios and products, because a unit vector is its own inverse: \hat{a}^{-1}=\hat{a}.) In terms of lengths and angles, the geometric ratio is

a^{-1} b = \frac{|b|}{|a|} \left(\cos(\theta_{ab}) + \sin(\theta_{ab}) I\right) = \frac{|b|}{|a|} \exp(\theta_{ab} I)

This is a composition of the rotation that takes the direction of a to the direction of b with the dilation that takes the length of a to the length of b. In the Argand diagram picture of complex arithmetic, multiplying by a complex number has exactly this same effect of rotation and dilation. In Visual Complex Analysis, Needham calls this an “amplitwist”, for “amplification” and “twist”.

Ratios of vectors in the plane behave in exactly the same way as complex numbers; the Argand diagram effectively models vectors in the plane as complex numbers by forming ratios of every vector with a constant unit vector pointing along the “real axis”.

Geometric ratio of a and b

As a check,

a (a^{-1} b) = (a a^{-1}) b = b

Applying the geometric ratio repeatedly produces a sequence of vectors lying on a logarithmic spiral, with adjacent pairs of vectors forming the legs of a sequence of similar triangles.

Geometric ratio spiral

Rotation, Reflection, Projection, Rejection

Reversing the order of the terms in the geometric ratio from a^{-1}b to ba^{-1} reverses the sense of rotation, but leaves the dilation unchanged

b a^{-1} = \frac{|b|}{|a|} \left(\cos(\theta_{ab}) - \sin(\theta_{ab}) I\right) = \frac{|b|}{|a|} \exp(-\theta_{ab} I)
Reflection of b over a

Examining this picture suggests that aba^{-1} is the reflection of b across a. This representation of a reflection is sometimes referred to as a “sandwich product” because b is sandwiched between a and its inverse. It can equivalently be written as \hat{a}b\hat{a} because

\begin{aligned} aba^{-1} &= a b \left(\frac{a}{|a|^2}\right) \\ &= \left(\frac{a}{|a|}\right) b \left(\frac{a}{|a|}\right) \\ &= \hat{a} b \hat{a} \end{aligned}

Notice that aba^{-1} is linear in b, but independent of the scale of a, as expected for a reflection of b across a.

This sandwich product expression for reflection makes it particularly evident that a vector and its reflection have the same length. If

b_\mathrm{refl} = a b a^{-1}

then

\begin{aligned} b_\mathrm{refl}^2 &= (a b a^{-1}) (a b a^{-1}) \\ &= a b (a^{-1} a) b a^{-1} \\ &= a b^2 a^{-1} \\ &= b^2 a a^{-1} \\ &= b^2 \end{aligned}

where we have made use of the fact that b^2 is a scalar and so it can be moved freely within a geometric product.

Left multiplying aba^{-1} by the unit factor bb^{-1} gives another representation for the reflection of b across a :

\begin{aligned} aba^{-1} &= (bb^{-1}) (a b a^{-1}) \\ &= b \left(\frac{b}{|b|^2}\right) a b \left(\frac{a}{|a|^2}\right) \\ &= b \frac{b}{|b|}\frac{a}{|a|}\frac{b}{|b|}\frac{a}{|a|} \\ &= b (\hat{b}\hat{a})(\hat{b}\hat{a}) \\ &= b (\hat{b}\hat{a})^2 \end{aligned}

The last line represents applying the rotation between b and a to the vector b twice. In other words, to reflect b across a , rotate b through twice the angle between b and a .
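
As a numeric sketch of these two representations of reflection (again assuming the complex-number encoding of plane vectors, under which the reflection aba^{-1} works out to \hat{a}^2 times the complex conjugate of b; this encoding is an assumption of the sketch):

```python
# Sketch: reflect b across a two ways, with plane vectors as complex
# numbers (an assumption of this sketch): a b a^{-1} works out to
# ahat**2 * conj(b), and b (bhat ahat)^2 is b times the squared rotor
# taking b to a.
a = complex(2, 1)
b = complex(1, 3)

ahat, bhat = a / abs(a), b / abs(b)
refl = ahat**2 * b.conjugate()           # a b a^{-1}

rot = bhat.conjugate() * ahat            # rotor from b to a
assert abs(b * rot**2 - refl) < 1e-12    # rotate b by twice the angle
assert abs(abs(refl) - abs(b)) < 1e-12   # reflection preserves length
```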

There is a simple relationship between the reflection of b across a and the parallel and perpendicular components (i.e. the projection and “rejection”) of b relative to a: the projection and rejection are half the symmetric and anti-symmetric sums of b and its reflection in a.

Relationship of reflection, projection, and rejection

In other words, the sum of b and its reflection across a is twice the projection of b onto a

Parallel part as reflection sum
b_{\parallel a} = \frac12 \left(b + aba^{-1}\right)

and the difference of b and its reflection across a is twice the rejection of b from a

Perpendicular part as reflection difference
b_{\perp a} = \frac12 \left(b - aba^{-1}\right)

Using the freedom to insert the unit factor aa^{-1} into the equation for b_\parallel and re-associating gives

\begin{aligned} b_{\parallel a} &= \frac12 \left(b\left(aa^{-1}\right) + aba^{-1}\right) \\ &= \frac12 \left(ba + ab\right)a^{-1} \\ &= (b \cdot a) a^{-1} \\ &= (b \cdot \hat{a}) \hat{a} \end{aligned}

which is a familiar way to interpret the dot product in terms of a projection, and similarly,

\begin{aligned} b_{\perp a} &= \frac12 \left(b\left(aa^{-1}\right) - aba^{-1}\right) \\ &= \frac12 \left(ba - ab\right)a^{-1} \\ &= (b \wedge a) a^{-1} \\ &= (b \wedge \hat{a} ) \hat{a} \end{aligned}

which is a probably less familiar way to interpret the wedge product in terms of rejection.

We can check that combining these components gives back the whole vector b :

\begin{aligned} b &= b(aa^{-1}) \\ &= (ba)a^{-1} \\ &= (b \cdot a + b \wedge a) a^{-1} \\ &= (b \cdot a) a^{-1} + (b \wedge a) a^{-1} \\ &= b_{\parallel a} + b_{\perp a} \end{aligned}
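
A numeric sketch of this decomposition, under the same complex-number encoding assumption as the earlier sketches:

```python
# Sketch: projection and rejection of b relative to a via the
# reflection a b a^{-1}, with plane vectors as complex numbers (an
# assumption of this sketch; reflection -> ahat**2 * conj(b), and
# the dot product a . b -> real part of conj(a)*b).
a = complex(3, 1)
b = complex(1, 2)

ahat = a / abs(a)
refl = ahat**2 * b.conjugate()           # reflection of b across a

b_par = 0.5 * (b + refl)                 # projection of b onto a
b_perp = 0.5 * (b - refl)                # rejection of b from a

dot = (a.conjugate() * b).real           # a . b
assert abs(b_par - dot * a / abs(a)**2) < 1e-12   # (b . a^) a^
assert abs(b_par + b_perp - b) < 1e-12            # components sum to b
```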

Planarity

We have seen that the condition that three vectors form a triangle is

a + b + c = 0

A similar but weaker condition on three vectors is that they are in the same plane, and the traditional way to state this condition algebraically is in terms of linear dependence:

\alpha a + \beta b + \gamma c = 0

for some set of scalars \alpha , \beta , and \gamma that are not all 0 . In words:

  • A weighted sum of the vectors is 0
  • It is possible to scale the vectors so that they form a triangle
  • At least one of the vectors can be written as a weighted sum of the other two

Geometric Algebra provides two ways to restate the planarity condition without the need for introducing extra scalar parameters. Suppose (without loss of generality) that \alpha is non-zero, and wedge on the right by b \wedge c :

\begin{aligned} (\alpha a + \beta b + \gamma c) \wedge (b \wedge c) &= 0 \\ \alpha (a \wedge b \wedge c) + \beta (b \wedge b \wedge c) + \gamma (c \wedge b \wedge c) &= 0 \end{aligned}

The last two terms are 0 by anti-symmetry of the wedge product, and so, after dividing through by the non-zero scalar \alpha , we have

a \wedge b \wedge c = 0

as a restatement of the condition that a, b, and c are in the same plane. A geometrical interpretation of this formula is that the vectors span a parallelepiped with no volume. (In general, the wedge product of n vectors is 0 if and only if the vectors all lie in an n-1 dimensional linear subspace: the wedge product of two vectors is 0 when they are directed along the same line, the wedge product of three vectors is 0 when they are in the same plane, and so on.)

There is a further alternative statement of the condition of planarity of three vectors using the geometric product that is very useful for proofs in plane geometry. Again starting from

\alpha a + \beta b + \gamma c = 0

and assuming that \alpha is non-zero, multiplying on the right by bc gives

\alpha abc + \beta b^2c + \gamma cbc = 0

and alternatively, multiplying on the left by cb gives

\alpha cba + \beta cb^2 + \gamma cbc = 0

Subtracting this equation from the previous one and noting that b^2c = c b^2 because b^2 is a scalar gives

\alpha (abc - cba) = 0

and dividing by the non-zero scalar \alpha and re-arranging gives

abc = cba

as a third way of stating the condition that three vectors are in the same plane. This condition is harder to extend to other dimensions, but it’s very useful in computations because it means that we’re always free to reverse the geometric product of any three vectors that are in the same plane.

We’ve seen that in order to rotate a vector c by the angle between two other vectors, a and b , we can form

c_\mathrm{rot} = c\hat{a}\hat{b}

and using the planarity condition above to reverse the three vectors shows that multiplying on the right by \hat{a}\hat{b} is equivalent to multiplying on the left by \hat{b}\hat{a}

c_\mathrm{rot} = \hat{b}\hat{a}c

This freedom to reverse triples of vectors in the plane makes it easy to check that rotation preserves length:

\begin{aligned} c_\mathrm{rot}^2 &= (c\hat{a}\hat{b})(c\hat{a}\hat{b}) \\ &= (c\hat{a}\hat{b})(\hat{b}\hat{a}c) \\ &= c\hat{a}\hat{b}^2\hat{a}c \\ &= c\hat{a}^2c \\ &= c^2 \end{aligned}

Multiplying on both the left by \hat{b}\hat{a} and on the right by \hat{a}\hat{b} applies the rotation twice, and re-associating shows that this is equivalent to reflecting first in a and then in b (this double-reflection formula for rotation is the form that works in any dimension, i.e. it applies to vectors that may not lie completely in the plane of rotation):

c_{\mathrm{rot} \times 2} = (\hat{b}\hat{a})c(\hat{a}\hat{b}) = \hat{b}(\hat{a}c\hat{a})\hat{b}

Applying the planarity condition twice to a product of four vectors shows that products of pairs of vectors in the same plane commute with one another:

\begin{aligned} (ab)(cd) &= (abc)d \\ &=(cba)d \\ &= c(bad) \\ &= c(dab) \\ &= (cd)(ab) \end{aligned}

For exactly the same reason, ratios of pairs of vectors in the plane also commute:

ab^{-1}cd^{-1} = cd^{-1}ab^{-1}

We have seen above that ratios of vectors in the plane behave similarly to complex numbers, so it’s comforting to see that they are commutative.

Coordinates

To better understand how the directed unit plane segment (unit bivector), I , behaves under the geometric product, it is useful to introduce a pair of orthogonal (perpendicular) unit vectors, e_1 and e_2 , which can serve as coordinate basis vectors. Then I = e_1 e_2 = e_1 \wedge e_2 .

I is spanned by unit vectors

The condition that these are unit vectors is that they square to 1 :

e_1^2 = e_2^2 = 1

and the condition that they are orthogonal is that they anti-commute, so that their geometric product is equal to their wedge product:

e_1 e_2 = - e_2 e_1 = e_1 \wedge e_2 = I

Since I can be written as a product of unit vectors, we expect that multiplying vectors in the plane by I will rotate them. Let’s consider what happens when we multiply the coordinate vectors on the right by I:

\begin{aligned} e_1 I &= e_1 e_1 e_2 = e_2 \\ e_2 I &= e_2 e_1 e_2 = - e_1 e_2 e_2 = - e_1 \end{aligned}

so right multiplication by I rotates both unit vectors counter-clockwise by 90 degrees. Any other vector in the plane can be written as a linear combination of these unit vectors, so right multiplication by I rotates any vector in the plane by 90 degrees counter-clockwise.

As expected for a right-angle rotation, I^2 = -1 :

\begin{aligned} I^2 &= e_1 e_2 e_1 e_2 \\ &= e_1 (-e_1 e_2) e_2 \\ &= -e_1^2 e_2^2 \\ &= -1 \end{aligned}

which suggests why I behaved so similarly to the imaginary unit in some earlier formulas. (The fact that I squares to negative one lets us make sense of expressions like \exp(I \theta) = \cos(\theta) + I\sin(\theta). Just as for complex numbers, the strategy is to expand the exponential as a power series, reduce all higher powers of I using I^2 = -1, and then recognize the series for \sin and \cos.)

Furthermore, we can show that I anti-commutes with any vector in the plane using the planarity condition to reverse products of three vectors in the plane, and anti-commutativity of orthogonal vectors to reverse e_1 and e_2

\begin{aligned} a I &= a e_1 e_2 \\ &= e_2 e_1 a \\ &= - e_1 e_2 a \\ &= - I a \end{aligned}

Since I is a product of a pair of vectors in the plane, it commutes with other products of pairs of vectors in the plane.

Duality

With this, we finally have enough preparation to understand the relationship between the parallelogram representations of the dot product and wedge product.

We set out to understand why

Duality of dot product and wedge product
(a \cdot b) I = a \wedge b_\perp

and we can now prove this result algebraically:

\begin{aligned} (a \cdot b) I &= \frac12 (ab + ba) I \\ &= \frac12 (abI + ba I) \\ &= \frac12 (abI - bIa) \\ &= \frac12 (a (bI) - (bI) a) \\ &= a \wedge (bI) \\ &= a \wedge b_\perp \end{aligned}

Let’s go through this calculation line by line:

  1. Expand the dot product as the symmetric part of the geometric product
  2. Distributivity of the geometric product
  3. I anti-commutes with vectors in the plane, so it anti-commutes with a
  4. The geometric product is associative
  5. Recognize the anti-symmetric part of the geometric product as the wedge product
  6. Right-multiplication of b by I rotates b counter-clockwise by 90 degrees, so we can identify bI with b_\perp

In the fifth step,

(a \cdot b) I = a \wedge (bI)

the unit bivector I acts in two interestingly different ways: on the left hand side, it is scaled by a \cdot b to make a bivector with the right magnitude; on the right hand side, it rotates b counter-clockwise by 90 degrees to produce a parallelogram spanned by vectors with the right direction relationship.

This relationship between the dot product and the wedge product is called “duality” because it gives a way of exchanging one kind of product for the other.
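
The duality relation is also easy to check numerically; here is a sketch under the complex-number encoding assumption (vectors as complex numbers, I as the imaginary unit, right multiplication by I as multiplication by 1j):

```python
# Sketch: check (a . b) I = a ^ (b I), with plane vectors as complex
# numbers (an assumption of this sketch). Dot product -> real part of
# conj(a)*b; wedge coefficient (the multiple of I) -> imaginary part.
a = complex(3, 1)
b = complex(1, 2)

dot = (a.conjugate() * b).real           # a . b, a scalar
b_perp = 1j * b                          # b I: b rotated 90 deg ccw

wedge_coeff = (a.conjugate() * b_perp).imag   # coefficient of I in a ^ (bI)
assert abs(wedge_coeff - dot) < 1e-12
```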

Scaling constrains what is easily visualized

The reason that this duality relationship is useful for visualizing the dot product is that the dot product depends linearly on two vectors, and if we draw the vectors as arrows on the page, then their dot product naturally has units of area on the page. Duality gives us a natural way to convert the scalar result of the dot product to a bivector, so that the grade matches the units.

This turns out to be a generally useful constraint: if you want to visualize some object built out of vectors on the page, its grade should match its units. (Most introductory-level discussions of geometry don’t really have the language to discuss grade, and frequently choose not to emphasize units either, so there is a fair amount of confusion about these points. In physics, it’s basically impossible to draw a “to scale” representation of the cross product of two vectors represented as arrows, because the cross product produces a vector (grade 1) with units of area.) This is why the dot product a \cdot b is very often visualized through one of the related projections (a \cdot \hat{b}) \hat{b} or (b \cdot \hat{a}) \hat{a}. The dot product alone is not easily visualizable because it is a scalar (grade 0), but it has units of area. The projections are visualizable because they are vectors (grade 1) with units of length.

The geometric product of two vectors contains a scalar (grade 0 ) and a bivector (grade 2 ), so if we want to visualize it, we need it to be unitless. This is why it’s easier to visualize the geometric ratio of two vectors (which is unitless) than the geometric product of two vectors (which has units of area).

Conclusion

The geometric product is an important unifying concept for representing Euclidean geometry algebraically. It combines the more familiar dot and wedge products into a single associative and invertible product, and clarifies their relationship. The geometric product represents rotations, reflections, and dilations through simple products or ratios of vectors, allows composing these operations with multiplication, and often allows simplifying these compositions by re-associating within the products. In this post, we were able to use this same trick over and over again to show that

  • The exterior angle rotations of a triangle compose to a full turn
  • The interior angles of a triangle compose to a half turn
  • Rotations and reflections preserve the length of vectors
  • The composition of two reflections is a rotation

and the geometric product makes it possible to reduce many more geometric proofs to algebra without ever needing to introduce coordinates or parameterizations of angles and trigonometric functions.

Thanks as usual to Jaime George for editing this post.

An algebraic approach to the law of sines

Published: Mon, 19 Nov 2018 00:00:00 -0800
Updated: Mon, 19 Nov 2018 00:00:00 -0800
UTC: 2018-11-19 08:00:00+00:00
URL: http://www.shapeoperator.com/2018/11/19/algebraic-law-of-sines/

A visual way of expressing that three vectors, a, b, and c, form a triangle is

A triangle with sides a, b, and c.

and an algebraic way is

a + b + c = 0

In a previous post, I showed how to generate the law of cosines from this vector equation—solve for c and square both sides—and that this simplifies to the Pythagorean theorem when two of the vectors are perpendicular.
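
For reference, that manipulation is short: solving a + b + c = 0 for c gives c = -(a + b), and squaring (dotting each side with itself) gives

c \cdot c = (a + b) \cdot (a + b) = a \cdot a + b \cdot b + 2 a \cdot b

which is the law of cosines once the dot product is expanded in terms of lengths and the angle.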

In this post, I’ll show a similarly simple algebraic route to the law of sines.

In understanding the law of cosines, the dot product of two vectors, a \cdot b , played an important role. In understanding the law of sines, the wedge product of two vectors, a \wedge b , will play a similarly important role.

Properties of the wedge product

Let’s review what the wedge product is, since it’s probably less familiar than the dot product. Geometrically, the wedge product of two vectors represents the area of the parallelogram spanned by the two vectors (there is a similar interpretation of the dot product as the area of a parallelogram, discussed in Geometry, Algebra, and Intuition: instead of forming a parallelogram from the two vectors directly, the parallelogram is formed from one of the vectors and a copy of the other rotated 90 degrees; I’ll say more about the connection between these products another time):

Wedge product of a and b represented as a parallelogram

Mathematically, the wedge product is a function that takes two vectors and produces a new kind of object called a bivector. Similarly to how a vector represents a magnitude and a direction, a bivector represents the size of an area and its direction. (In one dimension, there are only two directions that a vector can have: positive or negative. In more dimensions, there are more possible directions. Similarly, in two dimensions, there are only two directions a bivector can have: positive or negative. In more dimensions, there are again more possible directions.) The wedge product is defined (more-or-less uniquely) as a product between vectors that is anti-commutative, linear, and associative. Let’s go over these properties one by one.

Anti-commutative:

Algebraically, anti-commutativity means

a \wedge b = - b \wedge a

and in a picture, it is

Parallelograms representing a wedge b and b wedge a have opposite area.

A plane region traversed clockwise is considered to have opposite directed area as the same region traversed counter-clockwise. The concept of negative area is useful for similar reasons that negative numbers are useful: the difference of two areas can continue to be represented geometrically even if the second area is bigger than the first In my opinion, the missing concept of sign/orientation for edges, areas, and angles is one of the biggest deficiencies of classical Greek Euclidean geometry. It leads to more special cases in theorems, like having to consider acute and obtuse angles separately in the inscribed angle theorem . .

Consider the incoming and outgoing edges at each vertex in the diagram above, starting from the bottom right vertex of the a \wedge b parallelogram. If the wedge product at each vertex is to be consistent, we must have

a \wedge b = b \wedge (-a) = (-a) \wedge (-b) = (-b) \wedge a

If we’re allowed to pull the minus signs out in front of these products (and the next property says that we are), these equalities imply anti-commutativity.

Anti-commutativity also implies that any vector wedged with itself is 0 , and the parallelogram area interpretation supports this:

a \wedge a = - a \wedge a = 0

Linear and distributive

Vectors can be added together to make new vectors, and the area of parallelograms spanned by vectors adds consistently with vector addition Arrangements of parallelograms like this one often look like they’re depicting something in 3D, but all of the diagrams in this post are 2D diagrams. This diagram does also happen to work in 3D as long as you use the right interpretation of what it means to add areas in different planes (i.e. as long as you use directed areas represented as bivectors). .

(u + v ) \wedge w = u \wedge w + v \wedge w
Distributivity of the wedge product

Geometrically, this can be understood by dissection: cut a triangle from one edge of a parallelogram and glue it to the opposite edge. The resulting figure is the union of two parallelograms with the same total area.

Vectors can also be scaled, and the area of parallelograms spanned by vectors scales consistently Here and everywhere in the post I’m using the convention that greek letters like \alpha and \beta represent scalars (real numbers), and lower case roman letters like a , b , c , u , v and w represent vectors. .

(\alpha u) \wedge (\beta v) = \alpha \beta (u \wedge v)

Associative

Wedging three vectors together represents the (directed) volume of the parallelepiped that they span, and associativity means that the order that we combine the vectors doesn’t matter:

(u \wedge v) \wedge w = u \wedge (v \wedge w)

When you wedge k different vectors together, the result is an object called a k-vector that represents a k-dimensional volume. Just as vectors represent the size and direction of a linear region, and bivectors represent the size and direction of a plane region, k-vectors represent the size and direction of a k-dimensional region.

In the remainder of this post, we’ll only consider vectors in the plane. Three vectors in the same plane can’t span any volume, and so their wedge product must be 0 . This means we won’t make any use of associativity here. But it’s nice to know that the wedge product works consistently in any number of dimensions.

Relationship to lengths and angles

If you know the lengths, |a| and |b| , of two vectors a and b and the angle between them, \theta_{ab} , you can compute the wedge product of the vectors:

a \wedge b = |a| |b| \sin(\theta_{ab}) I

where I represents a unit plane segment. You can think of I as a square spanned by two perpendicular unit vectors, e_1 and e_2 , in the same plane as a and b : I=e_1\wedge e_2 .

Orthogonal unit vectors span a unit area.

In terms of triangles, the angle between two vectors, \theta_{ab} , is an exterior angle. In classical trigonometry, it’s more common to consider interior angles. But the \sin of an exterior angle and the \sin of the corresponding interior angle are equal, so for the wedge product, the distinction isn’t so important. Here’s the relationship.

Interior and exterior angles of a triangle

Since \sin \theta_{ab} = \sin C

\begin{aligned} a \wedge b &= |a| |b| \sin(\theta_{ab}) I \\ &= |a| |b| \sin(C) I \end{aligned}

We can see why this formula works by considering the projection of b onto a line perpendicular to a . Call this projected vector h . The parallelogram areas a \wedge b and a \wedge h are equal by a simple dissection argument:

Parallelogram a wedge b and rectangle a wedge h have equal area

The rectangle on the right side of this diagram can be constructed by cutting a triangle from one edge of the parallelogram on the left side of the diagram and pasting it to the opposite edge, so the area of the two figures is equal.

Since h and b are a leg and the hypotenuse of a right triangle respectively, their lengths are related by

|h| = |b| \sin(C)

and, because a and h are perpendicular and so span a rectangle,

a \wedge h = |a| |h| I

so

a \wedge b = a \wedge h = |a| |b| \sin(C) I
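Here's a quick numeric check of this relation, as a sketch of my own rather than anything from the original post. The wedge() helper uses the coordinate expression derived in the next section, and angle_between() is a hypothetical helper for recovering \theta_{ab} :

from math import atan2, hypot, sin

def wedge(a, b):
    # Coefficient of I in a ^ b, for 2D vectors given as (x, y) pairs
    return a[0] * b[1] - a[1] * b[0]

def angle_between(a, b):
    # Signed angle from a to b, recovered from the dot and wedge products
    return atan2(wedge(a, b), a[0] * b[0] + a[1] * b[1])

a, b = (3.0, 1.0), (1.0, 2.0)
print(wedge(a, b))                                       # 5.0
print(hypot(*a) * hypot(*b) * sin(angle_between(a, b)))  # 5.0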

Coordinates

If you know two vectors in terms of coordinates, you can compute their wedge product directly from the coordinates without going through lengths and angles. There’s a formula for this, but there’s no need to memorize it because it’s almost as simple to compute directly using properties like anti-commutativity and linearity. Let’s work out a concrete example A lot of literature about this kind of algebra is fairly abstract, and when I started reading about Geometric Algebra, it seemed useful for theory, but I wasn’t really sure how to calculate anything. Seeing a few calculations in terms of concrete coordinates gives you a reassuring feeling that the whole system might be grounded in reality after all.

At the other end of the spectrum, many introductory treatments of vectors choose to define products (like the dot product, cross product, or wedge product) in terms of either lengths and angles or coordinates. I can see the pedagogical value of this at a certain point, but eventually, I think it’s very useful to realize that all of these things can be (should be) defined in terms of algebraic properties like linearity, associativity, (anti-)commutativity, etc. Then you can prove the formulas for lengths and angles or coordinates, and rather than seeming like collections of symbols pulled out of a hat, they suddenly seem inevitable.
:

Wedge product in coordinates

Compute:

B = (3e_1 + e_2) \wedge (2e_1+4e_2)

First, use the distributive property to expand the product of sums into a sum of products, and linearity to pull the scalars to the front of each product:

\begin{aligned} B &= 3e_1 \wedge 2e_1 + 3e_1 \wedge 4e_2 + e_2 \wedge 2e_1 + e_2 \wedge 4e_2 \\ &= 6e_1 \wedge e_1 + 12e_1 \wedge e_2 + 2e_2 \wedge e_1 + 4e_2 \wedge e_2 \end{aligned}

From anti-commutativity, we know that e_1 \wedge e_1 = e_2 \wedge e_2 = 0 so that the first and last terms vanish, and e_2 \wedge e_1 = -e_1 \wedge e_2 . Using these gives

\begin{aligned} B &= 12e_1 \wedge e_2 - 2e_1 \wedge e_2 \end{aligned}

And finally, collecting terms gives

\begin{aligned} B &= (12 - 2)e_1 \wedge e_2 \\ &= 10e_1 \wedge e_2 \end{aligned}

The wedge product of any two vectors representable as weighted sums of e_1 and e_2 will always be proportional to e_1 \wedge e_2 like this. Using exactly the same steps, we can work out the general coordinate formula.

Given that e_1 and e_2 are a basis , and in this basis, \alpha_1 and \alpha_2 are the coordinates of a , and \beta_1 and \beta_2 are the coordinates of b :

\begin{aligned} a &= \alpha_1 e_1 + \alpha_2 e_2 \\ b &= \beta_1 e_1 + \beta_2 e_2 \end{aligned}

then

a \wedge b = \left(\alpha_1 \beta_2 - \alpha_2 \beta_1\right) e_1 \wedge e_2

and if e_1 and e_2 are perpendicular unit vectors so that e_1 \wedge e_2 = I , then this is In more dimensions, to calculate the wedge product of several vectors in terms of coordinates, you can arrange the coordinates into a matrix and take its determinant. Perhaps you have learned about the connection of the determinant to (any-dimensional) volumes; the wedge product of several vectors has that same connection.

a \wedge b = \left(\alpha_1 \beta_2 - \alpha_2 \beta_1\right) I
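Reusing the wedge() helper from the earlier sketch, a one-line check (mine, not the post's) confirms the worked example above:

print(wedge((3, 1), (2, 4)))  # 10, the coefficient of e_1 ^ e_2 found above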

Deriving the law of sines

With the wedge product at our disposal, we can now derive the law of sines with some simple algebra.

Given

a + b + c = 0

wedging both sides with a gives

a \wedge (a + b + c) = a \wedge a + a \wedge b + a \wedge c = 0

Using anti-commutativity

a \wedge a = 0,\quad a \wedge c = -c \wedge a

this equation reduces to

a \wedge b = c \wedge a

A similar simplification of

b \wedge (a + b + c) = 0

gives

a \wedge b = b \wedge c

and so we have the 3-way equality

a \wedge b = b \wedge c = c \wedge a

and we will see below that this is essentially equivalent to the law of sines.

In pictures of parallelograms, this is Here’s a Desmos Geometry construction that combines this figure and a couple others.

Equal-area parallelograms representing a wedge b, b wedge c, and c wedge a.

Geometrically, the areas of these parallelograms are equal because each of them is made out of two copies of the same triangle. This also implies that the area of the parallelograms is twice the area of the triangle. Notice that the parallelograms aren’t congruent, though, because the triangles are joined along different edges in each case.
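Here's a numeric check of the three-way equality, again reusing the wedge() helper from the sketch above; the particular triangle is my own example:

a = (3.0, 1.0)
b = (-1.0, 2.0)
c = (-a[0] - b[0], -a[1] - b[1])  # close the triangle: a + b + c = 0
print(wedge(a, b), wedge(b, c), wedge(c, a))  # 7.0 7.0 7.0, twice the triangle's area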

It’s a short distance from here to the traditional law of sines. Just substitute in the “lengths and angles” expression for each wedge product:

\begin{aligned} a \wedge b &= |a| |b| \sin(C) I \\ b \wedge c &= |b| |c| \sin(A) I \\ c \wedge a &= |c| |a| \sin(B) I \\ \\ |a| |b| \sin(C) I =&\ |b| |c| \sin(A) I = |c| |a| \sin(B) I \end{aligned}

and then divide each term by |a| |b| |c| I :

\frac{\sin C}{|c|} = \frac{\sin A}{|a|} = \frac{\sin B}{|b|}

It’s hard to interpret this traditional form of the law of sines in terms of a figure because each of these terms has units of 1/length, and it’s not so clear how to draw anything with those units.

Taking the reciprocal of each term fixes the units. Then dividing through by 2 produces another equivalent version of the law of sines that does have an interesting geometric interpretation:

\frac{|c|}{2\sin C} = \frac{|a|}{2 \sin A} = \frac{|b|}{2 \sin B} = \rho

where \rho is equal to the radius of the circle that circumscribes the triangle (that is, that passes through each of its vertices) I have tried and not yet succeeded at coming up with any geometric intuition about why a factor of |a| |b| |c| connects the area of a triangle with the radius of its circumcircle. I’d be grateful for any leads. :

Radius of the triangles circumcircle

I won’t show that this is true, but if you want to try it, it’s a nice exercise in classical trigonometry. In any case, I think it’s clear that the geometric interpretation of the area form of the law of sines is simpler: three ways of calculating the area of the same triangle are equivalent (no circles needed).
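If you'd rather check numerically than prove it, here is a sketch continuing the example above, comparing |c| / (2 sin C) against the standard circumradius formula R = |a||b||c| / (4 area):

from math import hypot

la, lb, lc = hypot(*a), hypot(*b), hypot(*c)
area = abs(wedge(a, b)) / 2           # the triangle is half the parallelogram
sin_C = abs(wedge(a, b)) / (la * lb)
print(lc / (2 * sin_C))               # ~1.8211
print(la * lb * lc / (4 * area))      # ~1.8211, the circumradius rho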

Another nice thing about the area form of the law of sines is that it handles degeneracies, where one of the sides has length 0 or where the sides are parallel and so the triangle sits on a single line, without having to worry about dividing by 0 .

For example, the area form of the law of sines says that if two sides of a triangle are parallel, so that their wedge product is 0 , then the third side must also be parallel to these two.

\begin{aligned} a \wedge b &= b \wedge c \\ a \wedge b = 0 &\Leftrightarrow b \wedge c = 0 \\ \end{aligned}

If the wedge product of two vectors is zero, a \wedge b = 0 , then the vectors are proportional to one another, a \propto b , and the lines that the vectors span are parallel, a \parallel b . So here are 3 ways of saying the same thing in different languages:

\begin{aligned} a \wedge b = 0 &\Leftrightarrow b \wedge c = 0 \\ a \propto b &\Leftrightarrow b \propto c \\ a \parallel b &\Leftrightarrow b \parallel c \\ \end{aligned}

Conclusion

My goal in this series of posts is to elaborate the idea that the simple vector equation

a + b + c = 0

contains all the facts of classical trigonometry, if you have a rich enough vector algebra to extract these facts.

So far, we’ve seen how to get the law of cosines using the dot product (solve for c , square both sides), and how to get the law of sines using the wedge product (wedge both sides with a , equate the remaining two terms).

Some of what remains to be said will require the geometric product, which unites the dot product and wedge product together.

Thank you to Jaime George for editing this post, and in particular, for helping me re-think the section on coordinates.

Lawnmower Puzzle Solution

Published: Sun, 23 Apr 2017 00:00:00 -0700
Updated: Sun, 23 Apr 2017 00:00:00 -0700
UTC: 2017-04-23 07:00:00+00:00
URL: http://www.shapeoperator.com/2017/04/23/lawnmower-puzzle-solution/

I recently posted a geometry puzzle about an autonomous lawn mower steered by a rope and a peg. How much rope remains unspooled from the peg when the mower collides with it? If you haven’t seen the puzzle yet, go check out last week’s post and give it a try.
Content Preview

I recently posted a geometry puzzle about an autonomous lawn mower steered by a rope and a peg. How much rope remains unspooled from the peg when the mower collides with it? If you haven’t seen the puzzle yet, go check out last week’s post and give it a try.

Lawnmower puzzle

I received two very nice (and satisfyingly different) solutions from Suzanne von Oy and Patrick Honner Suzanne von Oy ( blog , twitter ) and Patrick Honner ( blog , twitter ) are both inspiring educators, and I’m honored that they took the time to submit solutions. You should check out their writing. :

I’ll begin my solution by stating some assumptions and naming some quantities.

  • The mower is a square with side length 2m
  • The peg is a circle of radius r
  • The length of unspooled rope is l
  • The rope is attached to the mower at its center
Lawnmower variable names

In the figure, I’ve also drawn in two right angles, and seeing these is a lot of the work of solving the problem.

  • The rope is tangent to the peg at the point that they lose contact, and is therefore always perpendicular to the radius connecting the center of the peg to this point.
  • The side of the mower is always perpendicular to the rope (because the wheels on the mower are perpendicular to the rope, and the sides are parallel to the wheels).

Together, these assumptions mean that the side of the mower (and also its midline) is parallel to the radius of the peg that passes through the loss-of-contact point, and from this observation, we’ll just need to draw a couple of rectangles and write down the solution, more or less.

Once you’ve done enough problems like this, writing down these kinds of right angles starts to become habitual, and you don’t really need to carefully justify to yourself why you’re doing it every single time. But it’s worth remembering that these right angle conditions are not trivial, and there are plenty of similar looking problems where they wouldn’t be right!

Both of these right angles depend crucially on the fact that the rope is under tension .

The rope certainly can’t penetrate the peg, so that eliminates half the directions it could conceivably take on at the loss-of-contact point, but a slack rope could leave at any angle that doesn’t cause it to penetrate the peg. Under tension, however, the rope will spool or unspool itself until the loss of contact occurs at a point where the tangent direction to the peg is parallel to the tension. Because the peg is a circle (and only for this reason!), tangent directions to its perimeter are always perpendicular to radii through its center.

Similarly, if the rope weren’t under tension, it could make any angle with the side of the mower. Just imagine pointing the mower directly toward the peg. It would happily head directly there, maybe running over and chewing up the slack rope on its way. So we have to start the mower off pointed in a direction that produces tension, and then the tension will steer the mower in a way that maintains the tension.

I can go on. We’re effectively assuming the mower wheels roll without slipping. If you put the mower on very slick ice and gave it a push, it could wrap itself around the peg with its center following exactly the same trajectory but without ever changing the direction it’s pointing (its attitude). And we’re assuming that the rope has no thickness, and no stiffness (i.e. provides no resistance to a bending force).

Of course it isn’t news that puzzles are about idealized models (so are scientific theories, by the way), and I’m belaboring the point just to say that if these right angles don’t seem obvious to you, that’s ok! They really do require some assumptions and justification, and the first time you wrestle with these arguments, you’re in for a good long hard think.

These two right angle conditions hold over the mower’s entire path. The additional condition of contact is that a point on the perimeter of the mower is coincident with a point on the perimeter of the peg, and the mower and peg don’t penetrate one another.

It turns out that there are two cases that need to be treated separately, and that’s a lot of what makes this problem tricky. Either the mower contacts the peg somewhere along the side of the mower, or it contacts the peg at its corner. The first case is simpler to analyze.

Side contact

Lawnmower side contact

When the mower contacts the peg along its side, a square of side length r is formed by four important points: the contact point of the mower with the peg, the center of the peg, the loss-of-contact point of the rope with the peg, and the point where the rope crosses the side of the mower. In this case, the length of unspooled rope at contact is then

l = m + r

Corner contact

Lawnmower corner contact

When the mower contacts the peg at its corner, these same four points form a quadrilateral with 3 known sides, and two right angles.

In general, I might start trying to solve this quadrilateral by dissecting it across opposite vertices to make two triangles and then solving the triangles, but the adjacent right angles make this problem simpler than the general case: we can instead divide the quadrilateral into a rectangle and a right triangle with two known sides, and then solve the right triangle using the Pythagorean theorem Patrick Honner took a different approach in his solution : he did divide the quadrilateral along opposite vertices, and then noticed a pair of similar isosceles triangles that also lead to a nice, simple way of solving the problem. .

Lawnmower quadrilateral solution

The only unknown length here, which I’ll call d , is the length of unspooled rope between the loss-of-contact point and the point where the rope crosses the side of the mower. Applying the Pythagorean theorem gives

d = \sqrt{r^2 - \left(r-m\right)^2}

and so the total length of the rope is

l = d + m = \sqrt{r^2 - \left(r-m\right)^2} + m

The two solutions are equivalent when the diameter of the peg is equal to the width of the mower. Under this condition, the quadrilateral of the corner contact case becomes the square of the side contact case. When the peg is smaller than this, contact will occur along a side, and when the peg is larger than this, contact will occur at a corner.

So the full solution is then

l = \begin{cases} r + m & r \leq m \\ \sqrt{r^2 - \left(r-m\right)^2} + m & r \geq m \end{cases}
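As a small sketch, the full solution translates directly into a function. The names l, r, and m follow the definitions above; unspooled_rope_at_contact is my own naming:

from math import sqrt

def unspooled_rope_at_contact(r, m):
    # Length of rope still unspooled when the mower collides with the peg
    if r <= m:                           # side contact
        return r + m
    return sqrt(r**2 - (r - m)**2) + m   # corner contact

print(unspooled_rope_at_contact(0.5, 1.0))  # side contact: 1.5
print(unspooled_rope_at_contact(2.0, 1.0))  # corner contact: ~2.732

At r = m the two branches agree (both give 2m), matching the observation above that the quadrilateral degenerates into the square.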

The corner contact case is really a little silly in this scenario. Your eyeball and intuition will probably tell you that the peg should be smaller than the mower to have a chance of allowing successive paths to be properly adjacent. And to be honest, when we first designed these animations, we missed the corner contact case completely.

We could have just set a lower limit on the peg size and been done with it (the thought crossed our mind), but in these activities, we generally prefer to show you the implications of going a little off the rails rather than steering you hard back onto them.

Anyway, if you took a crack at this problem, I hope you enjoyed thinking about it. I know that I did.

Bonus idea: what if the mower and the peg lived on the surface of a sphere instead of on the plane?

Lawnmower Puzzle

Published: Sat, 08 Apr 2017 00:00:00 -0700
Updated: Sat, 08 Apr 2017 00:00:00 -0700
UTC: 2017-04-08 07:00:00+00:00
URL: http://www.shapeoperator.com/2017/04/08/lawnmower-puzzle/

One of the joys of being an engineer at Desmos is that my collaborators occasionally bring me math problems that they need to solve to make something that they’re building work right. I love tricky little geometry problems, and I’d solve them as a hobby if they weren’t part of my job. When it helps our team get on with the work, so much the better.
Content Preview

One of the joys of being an engineer at Desmos is that my collaborators occasionally bring me math problems that they need to solve to make something that they’re building work right. I love tricky little geometry problems, and I’d solve them as a hobby if they weren’t part of my job. When it helps our team get on with the work, so much the better.

In today’s post, I’d like to share one of these puzzles that came up while building Lawnmower Math , and invite you to solve it yourself.

The activity is centered around a lazy lawnmowing scheme: tie the mower to a peg in the center of the lawn with a length of rope, and then sit back and sip your lemonade as the mower cuts an autonomous spiral in toward the peg (see the video evidence on screen 2 of the activity ). But the peg needs to be just the right size to mow the lawn efficiently with no gaps and not too much overlap between passes. In the activity, students interact with animations to build intuition before developing a mathematical model of how large the peg should be.

Building the animations for this activity brought up a different but related mathematical challenge. You want the mower to stop moving at the moment it collides with the peg. Collision detection can get pretty tricky, and that’s the puzzle I’m posing to you: how much rope remains unspooled from the peg when the mower collides with it?

Lawn mower puzzle

At this point, I’m tempted to state the question formally with named variables, labeled angles and such. But instead, I’ll follow Dan’s lead and leave the work and the joy of formulating the problem to you. In other words, I’ll be less helpful.

Feel free to e-mail solutions to jason@shapeoperator.com. There won’t be any prizes, but I’ll give a shout out to anyone whose solution teaches me something, and I’ll post my own solution some time next week.

Update: my solution is now available.

Geometry, Algebra, and Intuition

Published: Tue, 28 Feb 2017 00:00:00 -0800
Updated: Tue, 28 Feb 2017 00:00:00 -0800
UTC: 2017-02-28 08:00:00+00:00
URL: http://www.shapeoperator.com/2017/02/28/geometry-algebra-intuition/

I have a confession to make: I have always found symbolic algebra more intuitive than geometric pictures. I think you’re supposed to feel the opposite way, and I greatly admire people who think and communicate in pictures, but for me, it’s usually a struggle.
Content Preview

I have a confession to make: I have always found symbolic algebra more intuitive than geometric pictures. I think you’re supposed to feel the opposite way, and I greatly admire people who think and communicate in pictures, but for me, it’s usually a struggle.

For example, I have seen many pictorial “proofs without words” of the Pythagorean Theorem. I find some of them to be quite beautiful, but I also often find them difficult to unpack, and I never really think “oh, I could have come up with that myself.”

Here’s Pythagoras’ own proof Image by William B. Faulk , lifted from the Pythagorean Theorem Wikipedia Page . It’s worth looking at some of the many other visual proofs given there. :

Pythagoras' visual proof of the Pythagorean theorem.

I like this proof a lot. It’s fairly simple to interpret (more so than some of the other examples in the genre), and quite convincing. We have

c^2 = a^2 + b^2

because, along with the same four copies of a triangle, both sides of this equation fill up an identical area.

Even so, it’s odd to me that this diagram involves four copies of the triangle. This is one of those “I couldn’t have come up with this myself” stumbling blocks.

For comparison, I’ll give an algebraic proof Here and throughout I am using the word “proof” quite loosely. Forgive me, I am a physicist, not a mathematician. of the Pythagorean theorem using vectors. The condition that three vectors a , b , and c traverse a triangle is that their sum is zero:

a + b + c = 0

Solving for c gives

c = -(a+b)

and then dotting each side with itself and distributing gives

\begin{aligned} c \cdot c &= \left(a+b\right) \cdot \left(a + b\right) \\ &= a \cdot a + a \cdot b + b \cdot a + b \cdot b \\ &= a^2 + b^2 + 2 a \cdot b \end{aligned}

The condition that vectors a and b form a right angle is just that a \cdot b = 0 , and in that special case, we have the Pythagorean theorem:

c^2 = a^2 + b^2
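The whole derivation is easy to check numerically. Here is a minimal sketch, assuming 2D vectors as (x, y) tuples; dot() is my own helper, not notation from the post:

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

a, b = (3.0, 0.0), (0.0, 4.0)     # perpendicular legs, so a . b = 0
c = (-a[0] - b[0], -a[1] - b[1])  # close the triangle: a + b + c = 0
print(dot(c, c))                  # 25.0
print(dot(a, a) + dot(b, b))      # 25.0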

The thing I like about this algebraic manipulation is that it is a straightforward application of simple rules in sequence. There are dozens of ways to arrange 4 congruent triangles on a page (probably much more than dozens, really), but the algebra feels almost inevitable It does take practice to get a feel for which rules to apply to achieve a given goal, but there are really only a few rules to try: distributivity, commutativity, associativity, linearity over scalar multiplication, and that’s about it. .

  1. Write down the algebraic condition that vectors forming the sides of any triangle must satisfy.
  2. We’re interested in a function of one of the side vectors, c^2 , so we solve for c and apply the function to both sides.
  3. We transform the right hand side by applying distributivity of the dot product across addition, and commutativity of the dot product, i.e. a \cdot b = b \cdot a .
  4. Right triangles in particular are a simplifying special case where one term drops out.

I also think it’s important that the algebraic manipulation embeds the Pythagorean theorem as a special case of a relationship that holds for all triangles: the Law of Cosines The following diagram shows the relationship between the vector form of the law of cosines, c^2 = a^2 + b^2 + 2 a \cdot b , and the angle form of the law of cosines, c^2 = a^2 + b^2 - 2 |a||b|\cos C In the angle form, C is an interior angle, but in the vector form, if a \cdot b = |a||b|\cos(\theta_{ab}) , then \theta_{ab} is an exterior angle. This is the origin of the difference in sign of the final term between the two forms. . If you have a theorem about right triangles, then you’d really like to know whether it’s ever true for non-right triangles, and how exactly it breaks down in cases where it isn’t true. Perhaps there’s a good way to deform Pythagoras’ picture to illustrate the law of cosines, but I don’t believe I’ve seen it.

For these reasons, I’ve generally been satisfied with the algebraic way of thinking about the Pythagorean theorem. So satisfied, I recently realized, that I’ve barely even tried to think about what pictures would “naturally” illustrate the algebraic manipulation.

In the remainder of this post, I plan to remedy this complacency.

The first illustration challenge is that it’s actually kind of subtle to draw a picture that represents the magnitude of the dot product between two different vectors. E.g. given a picture showing two different vectors a and b , how exactly do you draw a picture of their dot product, a \cdot b ? The dot product is related to the orthogonal projection of one vector onto the other, but it is not equal to that projection.

Orthogonal projections:

Projection

The issue is that the dot product is linear in the magnitude of both a and b , but each orthogonal projection (there are two of them) is linear in the magnitude of only one of the two vectors, and is independent of the other.

The dot product is also related to the cosine of the angle between the vectors, but again, it is not equal to either this angle or its cosine.

To make progress here, it’s useful to step back and think about exactly what pictorial proofs of the Pythagorean theorem represent.

Squares of triangle legs

These pictures have a common feature: they represent a^2 , b^2 , and c^2 , which we might otherwise think of as plain old numbers on the number line, by areas. And not just any areas, but squares: areas spanned by one vector, and another orthogonal vector with the same length.

Taking a leap of faith, we can try to extend this prescription for pictorially representing the square of a vector (the dot product of a vector with itself) to the dot product of a vector with a different vector. To pictorially represent the dot product, a \cdot b , form the parallelogram spanned by one of the vectors and a copy of the other vector rotated by 90 degrees.

Parallelogram dot product

This turns out to work, and I’m planning to give a few different reasons why in a future post. For now, you might like to try to come up with your own proof, or you might like to just take my word for it.

Applying this insight to the law of cosines gives the following pictorial formula:

Given

a + b + c = 0
Triangle only

Then

c^2 = a^2 + b^2 + 2 a \cdot b
Law of Cosines as separate pieces

This is fine, but the difficulty is that there are again many, many possible ways to arrange this combination of three different squares and two copies of the same parallelogram generated by the legs of our triangle, and most of them don’t make the truth of the formula visually apparent.

We can improve the situation by taking inspiration from the algebra to “derive” a sensible diagram through a series of steps.

Step 1

Since c = -(a + b) , dotting on both sides by c gives c \cdot c = -(a + b) \cdot c and distributing gives c \cdot c = - a \cdot c - b \cdot c

First use of distributive property

You can see that this is true pictorially by dissection: if you cut a copy of the triangle out of the top of the square and paste it onto the bottom, you get a figure that’s the union of two parallelograms which must therefore have the same total area as the square; but it is also an expression of the fundamental algebraic property of distributivity Seeing this geometric property as algebraic distributivity makes it much easier to extend to cases involving products where each term might contain the sum of two or more vectors. As the situation becomes more complicated, reasoning by pictorial dissection gets out of hand faster than expanding algebraic products. .

Step 2

Now use exactly the same trick to expand the c_{\perp} legs of each parallelogram into -a_{\perp} - b_{\perp} and distribute.

In algebra:

\begin{aligned} c \cdot c &= -a \cdot c -b \cdot c \\ &= a \cdot \left(a + b\right) + b \cdot \left(a + b\right) \\ &= a \cdot a + a \cdot b + b \cdot a + b \cdot b \end{aligned}

In a picture:

Full Construction

So here we finally do have a diagram that makes the law of cosines visually apparent.

The gray square is equal to the sum of the red and blue squares and parallelograms because we can form one from the other by dissection. Cut out a triangle from the top of the gray square, and place it on the bottom. Cut out another copy of this triangle (rotated by 90 degrees) from the right side of the gray square, and place it on the left side. This procedure produces the area filled by the red and blue squares and parallelograms.

There are some nice special cases of this picture.

When a and b form a right angle, the parallelograms disappear, and we have a pictorial demonstration of the Pythagorean Theorem.

Right Angle Case

I find this one a little trickier to understand than Pythagoras’ picture in the introduction, but it works according to the same dissection argument that makes the general case work, and it has the advantage of being embedded as a special case of a diagram that depicts the law of cosines.

When a , b and c become proportional to one another (i.e. when they are all parallel or anti-parallel), we have a pictorial demonstration of the scalar identity

(|a| + |b|)^2 = |a|^2 + |b|^2 + 2|a||b|
Scalar Case

So far, all the triangles I have shown have either been obtuse (they have one interior angle larger than 90 degrees), or right (they have one angle equal to 90 degrees). In the case of an acute triangle (all interior angles less than 90 degrees), the diagram becomes more complicated:

Acute Case

In the acute case, the dot product a \cdot b becomes negative, and so the parallelograms must be interpreted as having negative area.

This case shows the importance of taking orientation into account. If you take care to orient the sides and faces consistently, the parallelogram faces will be oriented clockwise when the square faces are oriented counterclockwise, and so their areas will have opposite signs. Failing to recognize orientation in geometry has similar consequences to failing to recognize negative numbers in arithmetic: an unnecessary proliferation of special cases in algebra and theorems.

Conclusion

I’m very happy to have discovered a picture that nicely illustrates how the Pythagorean theorem is embedded into the law of cosines, and that corresponds so closely to the algebraic way of thinking. I’d like to explore this process of “deriving pictures” further and see what I can find. I’d also like to tell you more—much, much more—about the Geometric Algebra behind the correspondence between dot products and areas of associated parallelograms, but I think I will save that for another day.

Though I still feel more comfortable “doing algebra” than “doing pictures”—I doubt this will ever really change—struggling to put the two in close correspondence has strengthened my understanding of both.

Algebra is a powerful tool for manipulating ideas, but pictures often seem to be the more powerful tool for communicating ideas. And there’s no better way to improve your own understanding of something than to attempt to communicate it.

Or as Hestenes put it, “Geometry without algebra is dumb! Algebra without geometry is blind!”

Thanks to Jacob Rus for an interesting discussion that led me to write these ideas down carefully, Bret Victor for inspiring me to work harder at making diagrams, and especially to Jaime George, who has edited this post and also all of my other posts, and who I should have thanked sooner for doing this!

Updates

Nick Jackiw points to some prior art:

Your dissection with the vanishing parallelograms is very close to Erwin Dintzl (1939) . The late 19th and early 20th century, basically up to the time Hilbert’s ideas spread into curricula of the German gymnasiums, is an incredibly fertile time for geometric construction/interpretation/dissection of more modern results.


Daniel Scher writes on Twitter :

Dan Bennett has a nice activity in Pythagoras Plugged In that focuses on this visualization of the Law of Cosines.

(See the Tweet for a photograph of the relevant book page.)


Several folks contributed interactive versions of the Law of Cosines construction.

Sunset Geometry

Published: Mon, 12 Dec 2016 00:00:00 -0800
Updated: Mon, 12 Dec 2016 00:00:00 -0800
UTC: 2016-12-12 08:00:00+00:00
URL: http://www.shapeoperator.com/2016/12/12/sunset-geometry/

Robert Vanderbei has written a beautiful series of articles and talks about a method for finding the radius of the earth based on a single photograph of a sunset over a large, calm lake.
Content Preview

Robert Vanderbei has written a beautiful series of articles and talks about a method for finding the radius of the earth based on a single photograph of a sunset over a large, calm lake.

Vanderbei’s analysis is an elegant and subtle exercise in classical trigonometry. In this post, I would like to present an alternative analysis in a different language: Geometric Algebra. I believe that geometric algebra is a more powerful system for formulating and solving trigonometry problems than the classical “lengths and angles” approach, and it deserves to be better known. Vanderbei’s sunset problem is simple to understand and challenging to solve, so it makes a nice benchmark.

Here’s Vanderbei’s sunset problem. If the earth was flat, photographs of the sun setting over water would look like this:

Flat earth sunset diagram

Notice that the reflection dips just as far below the horizon as the sun peeks above it.

Actual photographs of the sun setting over calm water ( like Vanderbei’s Update: I should have been more careful to note that most photographs of sunsets over water actually don’t look like Vanderbei’s photograph (or my diagram) because of waves and atmospheric effects, and that sensor saturation artifacts make it hard to interpret images like this. Reproducing Vanderbei’s image may be somewhere between hard and impossible. More Below . ) look more like this:

Round earth sunset diagram

Notice the shortened reflection. This happens because of the curvature of the earth, and by measuring the size of this effect, it is possible to infer the ratio of the earth’s radius to the camera’s height above the water The main virtue of Vanderbei’s method is that the evidence is so directly visual (and that you can collect it with a smart phone on vacation). If you want to make a simpler and better measurement with a similar flavor, climb a mountain and use an astrolabe ; the math is simpler and the measurement will be more accurate. .

Vanderbei calls the angle of the sun above the horizon \alpha , and the angle of the sun’s reflection below the horizon \beta . With geometric algebra at our disposal, it’s often algebraically simpler to work with unit directions than angles, so I will also label unit directions from the camera to the top of the sun, s , the horizon, t , and the bottom of the sun’s reflection from the water, w .

Labeled round earth sunset diagram

To analyze this problem, it’s helpful to consider a side view:

Side view sunset diagram

There are two important triangles in this diagram: the triangle formed by the center of the earth, the camera, and the horizon (drawn in red), and the triangle formed by the center of the earth, the camera, and the reflection point where the top of the sun reflects off the water (drawn in green).

The triangle equations

Triangles have a very simple algebraic representation in terms of vectors In this post, I am following the common geometric algebra convention of writing vectors as plain, lower-case letters, and using Greek letters for scalars. This takes a little getting used to if you are accustomed to bold face or over-arrows for vectors, but skipping all the decorations makes it simpler to work with lots of vectors. :

\begin{aligned} r_1 - d_1 = p \\ r_2 - d_2 = p \end{aligned}

These simple sums of vectors subsume all the information about the relationships of lengths and angles that is expressed in classical trigonometry through “soh-cah-toa”, the triangle postulate (sum of interior angles is 180 degrees), the Pythagorean theorem, and the laws of cosines and sines. Quite an improvement.

It will be useful to re-express d_1 and d_2 in terms of the unit directions defined previously in order to relate other vectors to known directions:

(1a)

r_1 - |d_1|t = p

(1b)

r_2 - |d_2|w = p

In other words, d_1 is directed toward the horizon, and d_2 is directed toward the bottom of the reflection from the water.

Besides these triangles, there are a few salient geometric facts:

The horizon condition

The line of sight to the horizon is tangent to the earth at the horizon, and is therefore perpendicular to the radius of the earth through the horizon.

(2)

r_1 \cdot d_1 = r_1 \cdot t = 0

The reflection condition

In terms of angles, this is expressed as “angle of incidence equals angle of reflection”. In terms of vectors, it can be restated as

r_2 \cdot s = - r_2 \cdot w

or

(3)

r_2 \cdot (s + w) = 0

Known lengths

We know the lengths of some of these vectors in terms of the earth’s radius, \rho , and the height of the camera above the shoreline, \delta ,

(4a)

r_1^2 = r_2^2 = \rho^2

(4b)

p^2 = (\rho + \delta)^2

(4c)

s^2 = w^2 = t^2 = 1

Squaring both sides of the first triangle equation (1a), and using the horizon condition (2) (or equivalently, using the Pythagorean theorem) also allows finding the length of d_1 :

(4d)

d_1^2 = p^2 - r_1^2 = (\rho + \delta)^2 - \rho^2

Equations (1-4) contain all of the geometrical information I assumed one other important piece of geometrical information by writing “s” in two places on the side-view diagram. This corresponds to the (excellent) approximation that the sun is very far away compared to other lengths. needed to solve algebraically for the Earth’s radius, \rho , in terms of the given angles/directions ( \alpha and \beta , or s , w , and t ) and the height of the camera above the shoreline, \delta .

Introducing Geometric Algebra

So far, I have formulated everything in terms of vector algebra that should look familiar to students of physics or engineering (Gibbs vector algebra). To actually solve the equations, I will use a few additional notions from geometric algebra.

Geometric algebra is the answer to the question “what if I could multiply and divide by vectors?” It introduces a new associative (but non-commutative) invertible product between vectors: the geometric product. The geometric product between vectors a and b is simply written ab . The geometric product of a vector with itself equals a scalar (the square of the length of the vector),

aa = a^2 = |a|^2

This fact, combined with associativity and the other familiar rules for multiplication, is enough to define the geometric product uniquely.

The symmetric and anti-symmetric parts of the geometric product have important geometric meaning, and are traditionally given their own special symbols Physicists may be mystified to realize that, based on this definition of the geometric product, parallel vectors commute, and perpendicular vectors anti -commute. What else does that remind you of? :

\begin{aligned} (ab + ba)/2 & = a \cdot b = b \cdot a \\ (ab - ba)/2 & = a \wedge b = - b \wedge a \end{aligned}

I will assume that the dot product, a \cdot b , is familiar: it is related to the projection of one vector onto another, and proportional to the cosine of the angle between them.

The wedge product, a \wedge b , is probably only familiar if you have studied differential forms (or geometric algebra, of course), but it is very similar to the more familiar cross product, a \times b . It represents the directed area of the parallelogram spanned by two vectors, and is proportional to the sine of the angle between them Anti-symmetry and bi-linearity are exactly what is needed to represent area: a vector spans no area with itself (anti-symmetry), and the area of a parallelogram scales linearly with the lengths of each of its legs (bi-linearity).

The wedge product is extremely useful in linear algebra because it represents linear subspaces spanned by any number of vectors in a way that can be manipulated algebraically.
. Whereas the cross product represents directed area by a vector orthogonal to the area (a trick that works only in 3 dimensions), the wedge product represents a directed area by a different kind of object called a "bivector." The wedge product is associative (like the geometric product, but unlike the cross or dot products), and the wedge product of more than two vectors builds objects of higher "grades." The wedge product between 3 vectors is a trivector representing a directed volume (of the parallelepiped spanned by them), and the wedge product between k different vectors is a k-vector representing a directed k-dimensional volume (which is always zero in spaces of dimension less than k).

We can turn these definitions around to write the geometric product in terms of the dot and wedge products,

ab = a \cdot b + a \wedge b = \left\langle a b \right\rangle_0 + \left\langle a b \right\rangle_2

where \left\langle a b \right\rangle_0 and \left\langle a b \right\rangle_2 are notations for “the scalar part” and “the bivector part”.

There is a strange thing about this object: it represents the sum of two different “kinds of things,” a scalar and a bivector. But this should be no more troubling than the fact that a complex number represents the sum of a “real number” and an “imaginary number” (in fact, there is an extremely close relationship between complex numbers and the geometric product of two vectors). With experience, it becomes clear that a sum of a scalar and a bivector is exactly what is needed to represent the product of two vectors in an associative, invertible way.

The geometric product gives us two new super powers when working with vector equations:

Solving equations involving products of vectors.

Given an equation for two different products of vectors

ab = cd

if b is known, we can solve for a by right-multiplying by b^{-1} (i.e. dividing by b ).

a = cdb^{-1}

b^{-1} is well-defined by demanding

bb^{-1} = 1

left-multiplying by b

b^2 b^{-1} = b

and dividing through by the scalar b^2

b^{-1} = \frac{b}{b^2}
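In the plane, this inverse is simple enough to spell out in coordinates. Here is a minimal sketch, assuming 2D vectors as (x, y) pairs; geo() and inv() are my own hypothetical helpers:

def geo(a, b):
    # Geometric product of two 2D vectors as (scalar part, coefficient of I)
    return (a[0] * b[0] + a[1] * b[1],   # a . b
            a[0] * b[1] - a[1] * b[0])   # a ^ b

def inv(b):
    # b^{-1} = b / b^2 for a nonzero vector b
    b2 = b[0]**2 + b[1]**2
    return (b[0] / b2, b[1] / b2)

b = (3.0, 4.0)
print(geo(b, inv(b)))  # (1.0, 0.0): the scalar 1, as b b^{-1} = 1 requires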

Contrast this to the dot product and the cross/wedge product. In general, even when b is known, it is not possible to uniquely solve any one of the following equations for a .

\begin{aligned} a \cdot b & = c \cdot d \\ a \wedge b & = c \wedge d \\ a \times b & = c \times d \end{aligned}

The first equation only determines the part of a that is parallel to b , and the second two equations only determine the part of a that is perpendicular to b . You need both of these to solve for all of a , and that’s what the single geometric product gives you.

Transitive relationships between vectors

It frequently occurs that we know the relationships of two vectors a and b to a third vector c , and we would like to use this information to determine the relationship between a and b . Algebraically, we can take the unknown product

ab

and insert the identity

cc^{-1} = \frac{cc}{|c|^2} = 1

between the factors and re-associate

ab = a \left(cc^{-1}\right) b = \left(a c\right) \left(c^{-1} b\right) = \frac{1}{|c|^2}(ac)(cb)

thus re-expressing the unknown product ab in terms of the known products ac and cb .

Because cc^{-1} = 1 , we can insert it anywhere that it’s convenient in any product of vectors. This has the same practical effect as resolving vectors into parts parallel and perpendicular to c. This is an example of a very general technique that shows up in many forms throughout mathematics: inserting an identity to resolve a product into simpler pieces.

We will use this trick twice below at a critical moment in solving the sunset problem.
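Here's a numeric check of the insertion trick, reusing the geo() helper from the sketch above; mul() is another hypothetical helper that multiplies two scalar-plus-bivector elements, using the fact that I^2 = -1 in the plane:

def mul(p, q):
    # Product of (scalar, bivector) pairs; works like complex multiplication
    return (p[0] * q[0] - p[1] * q[1], p[0] * q[1] + p[1] * q[0])

a, b, c = (1.0, 2.0), (3.0, -1.0), (2.0, 1.0)
c2 = c[0]**2 + c[1]**2
ac_cb = mul(geo(a, c), geo(c, b))
print(geo(a, b))                       # (1.0, -7.0)
print((ac_cb[0] / c2, ac_cb[1] / c2))  # (1.0, -7.0): same product, re-associated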

Reformulating the horizon and reflection conditions

In order to make efficient use of geometric algebra’s tools, it is useful to reformulate equations (2) and (3) (the horizon and reflection conditions) in terms of the geometric product instead of the dot product.

Horizon condition

Consider the geometric product

r_1 t = r_1 \cdot t + r_1 \wedge t = r_1 \wedge t

where the first equality is an expression of the general vector identity ab = a \cdot b + a \wedge b , and the second equality follows from r_1 \cdot t = 0 , our previous form of the horizon condition (2).

In two dimensions, there is only one unit bivector, called I , spanned by any two orthogonal unit vectors e_1 and e_2 :

e_1 \wedge e_2 = e_1 e_2 = I

Therefore r_1 \wedge t is proportional to I , and since r_1 and t are orthogonal, we can write

(2’)

r_1 t = r_1 \wedge t = |r_1||t|I = |r_1|I = \rho I

Reflection Condition

Our previous version of the reflection condition (3) also has the form of an orthogonality condition:

r_2 \cdot (s + w) = 0

so, similarly to the way we rewrote the horizon condition, we can rewrite this in terms of the geometric product as There’s another way to write reflections in geometric algebra that shows up more commonly: r_2 s = - w r_2 or s = - r_2^{-1} w r_2 = - r_2 w r_2^{-1} .

This other form is very useful for composing reflections into rotations, or factoring rotations into reflections, but the form we use here involving forming an orthogonal vector will be more convenient when it comes time to solve for r_2 .

r_2 (s + w) = |r_2| |s + w| I

It will simplify later algebra to define a new unit vector based on this equation:

g \equiv \frac{ s + w }{|s + w|}
g^2 = 1

so that the reflection condition becomes

(3’)

r_2 g = |r_2| |g| I = |r_2| I = \rho I

Solving for the Earth’s radius

Now that we have rewritten our main geometric conditions in terms of the geometric product instead of the dot product, we are ready to solve the triangle equations.

First, eliminate p and set the left hand sides of (1a) and (1b) equal to one another This equation involving a sum of four vectors is a “quadrilateral equation” in exactly the same sense as our earlier triangle equations: it expresses the fact that the red vectors and green vectors in our diagram form a quadrilateral. :

r_1 - |d_1| t = r_2 - |d_2| w

The magnitude |d_2| is unknown, so we could proceed by solving for it, but it is more efficient to simply eliminate it in the following way. First, multiply both sides of the equation on the right by w :

r_1 w - |d_1| t w = r_2 w - |d_2| w^2

Now we can use “grade separation” to separately consider the scalar and bivector parts of this equation. Since w^2 is a scalar, the |d_2| dependence drops out of the bivector part:

\left\langle r_1 w - |d_1| t w \right\rangle_2 = \left\langle r_2 w \right\rangle_2

Rearranging to isolate the |d_1| term gives

|d_1| \left\langle t w \right\rangle_2 = \left\langle r_1 w - r_2 w \right\rangle_2

We can now take advantage of the horizon and reflection conditions to rewrite the unknown products r_1 w and r_2 w in terms of known products by inserting factors of tt = 1 and gg = 1 and re-associating (this is the second “super power” from the introduction above):

|d_1|\left\langle t w \right\rangle_2 = \left\langle r_1 t t w - r_2 g g w \right\rangle_2

We can simplify both r_1 t and r_2 g to \rho I using the horizon (2’) and reflection (3’) conditions:

|d_1| \left\langle t w \right\rangle_2 = \rho \left\langle I t w - I g w \right\rangle_2

Now expanding the geometric products of vectors into dot and wedge products gives

|d_1| t \wedge w = \rho I (t \cdot w - g \cdot w)

where I have dropped terms like \left\langle t \cdot w\right\rangle_2 = 0 and \left\langle I (t \wedge w - g \wedge w)\right\rangle_2 = 0 because they contain no part with grade 2.

We can take advantage of the known length |d_1|^2 derived as (4d) by taking the magnitude squared of both sides:

\begin{aligned} |d_1|^2 |t \wedge w|^2 & = \rho^2 \left(g \cdot w - t \cdot w\right)^2 \\ |d_1|^2 & = \rho^2 \left[ \frac{\left(g \cdot w - t \cdot w\right)^2}{|t \wedge w|^2}\right] \end{aligned}

To simplify further algebra, for the term in brackets, introduce

\epsilon^2 \equiv \frac{(g \cdot w - t \cdot w)^2}{|t \wedge w|^2}

which is written entirely in terms of known products of directions. This gives

|d_1|^2 = \rho^2 \epsilon^2

Now substituting from (4d) for |d_1|^2 gives

(\rho + \delta)^2 - \rho^2 = \rho^2 \epsilon^2

and dividing through by \rho^2 gives

\left(1 + \frac{\delta}{\rho}\right)^2 - 1 = \epsilon^2

and finally, we are able to solve for \rho , the radius of the earth I (and Vanderbei) have chosen a positive square root here. What, if anything, does the negative square root represent?

(5)

\rho = \frac{\delta}{\sqrt{1 + \epsilon^2} - 1}

We can recover Vanderbei’s final answer by rewriting \epsilon^2 in terms of angles using the following relationships:

\begin{aligned} t \cdot w & = \cos(\beta) \\ t \wedge w & = \sin(\beta) I \\ g \cdot w & = \cos(\gamma) = \cos\left(\frac{\alpha + \beta}{2}\right) \end{aligned}

so

\epsilon^2 = \frac{\left(\cos(\gamma) - \cos(\beta)\right)^2}{\sin(\beta)^2}

From this and (5), it is a simple exercise in trigonometric identities to recover the form given on slide 28 in Vanderbei’s talk , but there are two better forms that I will show instead.

First, a small angle form.

Using the general approximations

\begin{aligned} \sqrt{1+x^2} &\approx 1 + x^2/2 &\mathrm{for}\quad x \ll 1 \\ \sin{\theta} &\approx \theta &\mathrm{for}\quad \theta \ll 1 \\ \cos{\theta} &\approx 1 - \theta^2/2 &\mathrm{for}\quad \theta \ll 1 \\ \end{aligned}

we can simplify (5) to

(5’)

\rho \approx \frac{2}{\epsilon^2}\delta \approx \frac{8\beta^2}{\left(\beta^2 - \gamma^2\right)^2} \delta

The angles in Vanderbei’s photo are quite small, so this approximation is accurate to better than one part in a million It’s also simpler than other small angle approximations previously given by Vanderbei. . Since the uncertainty in the angles and camera height are about 10% each, this small angle approximation is certainly sufficient.

In fact, when calculating with rounded (floating point) arithmetic, the small angle form (5’) is actually more accurate than the exact form given by Vanderbei. This counterintuitive fact occurs because the exact form suffers from “catastrophic cancellation,” the result of subtracting approximately equal values that have been computed with rounding error.

We can get rid of one place where such cancellation occurs by multiplying the numerator and denominator of our exact expression (5) by 1 + \sqrt{1 + \epsilon^2} to get

(5’’)

\rho = \frac{ \sqrt{ 1 + \epsilon^2 } + 1 }{\epsilon^2}\delta

and we can get rid of another source of cancellation by replacing the difference of cosines in our expression for \epsilon^2 with a difference of sines using the general trigonometric identity This identity is related to the classical Haversine Formula from spherical trigonometry. Evelyn Lamb has written a wonderful blog post on this and other creatures in the zoo of forgotten trigonometry functions .

\cos(\theta) = 1-2\sin ^2\left(\frac{\theta}{2}\right)

so that:

\epsilon^2 = 4 \frac{\left(\sin^2\left(\frac{\gamma}{2}\right) - \sin^2\left(\frac{\beta}{2}\right)\right)^2}{\sin(\beta)^2}

This is helpful because we’re now subtracting two numbers that are very close to 0 instead of very close to 1, and intermediate rounding to a fixed number of digits throws away less information for numbers that are close to 0 than for numbers that are close to 1.
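Here's a small sketch of the effect, borrowing the angle values tabulated below (the variable names are mine). In double precision, the two forms agree only to roughly ten digits, because cos(\gamma) and cos(\beta) agree to about one part in a million before they are subtracted:

from math import cos, sin, radians

beta  = radians(29 * 0.5 / 317)
gamma = radians((69 + 29) * 0.25 / 317)

eps2_cos = (cos(gamma) - cos(beta))**2 / sin(beta)**2                # cancelling form
eps2_sin = 4 * (sin(gamma/2)**2 - sin(beta/2)**2)**2 / sin(beta)**2  # difference of sines
print(eps2_cos, eps2_sin)  # same leading digits; the second is the accurate one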

Plugging this expression for \epsilon^2 into the non-cancelling form for \rho (5’’) now allows computing \rho without undue rounding issues.

To evaluate \rho in terms of measured parameters, insert the following values into either (5’) or (5’’)

\begin{aligned} \alpha &= 69\ \mathrm{px} \cdot 0.5^{\circ}/317\ \mathrm{px} &=& 0.001899\ \mathrm{rad}\\ \beta &= 29\ \mathrm{px} \cdot 0.5^{\circ}/317\ \mathrm{px} &=& 0.0007983\ \mathrm{rad}\\ \gamma &= \frac{\alpha + \beta}{2} &=& 0.001349\ \mathrm{rad}\\ \epsilon^2 &= 5.482 \times 10^{-7} \\ \delta &= 7\ \mathrm{ft} \\ \rho &= 2.55 \times 10^{7}\ \mathrm{ft} &=& 4836\ \mathrm{mi} \end{aligned}

This is 20% larger than the true value, 3960 mi It’s also different in the second digit from the answer given by Vanderbei (slide 28). I think this is attributable to the “catastrophic cancellation” discussed above, combined with low precision calculation. . Not too bad.
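For reference, here is a sketch reproducing this evaluation end-to-end with the non-cancelling form (5’’), assuming the pixel measurements quoted above:

from math import sin, sqrt, radians

alpha = radians(69 * 0.5 / 317)  # sun's top above the horizon
beta  = radians(29 * 0.5 / 317)  # bottom of the reflection below the horizon
gamma = (alpha + beta) / 2
delta = 7.0                      # camera height in feet

eps2 = 4 * (sin(gamma/2)**2 - sin(beta/2)**2)**2 / sin(beta)**2
rho = (sqrt(1 + eps2) + 1) / eps2 * delta
print(eps2)        # ~5.48e-07
print(rho)         # ~2.55e+07 ft
print(rho / 5280)  # ~4.8e+03 mi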

Comparison to other systems

Classical trigonometry

At this point, perhaps you are thinking that I’m crazy to believe that the analysis above is simpler than doing classical trigonometry. We learned trigonometry in high school, and it wasn’t all that hard, right? And the geometric algebra analysis involves a bunch of unfamiliar notations, keeping track of non-commuting multiplications, and the new geometric notion of “bivector On the other hand, it does give you algebraic super powers. .”

But the classical trigonometry analysis of this problem is hard . Harder than the trigonometry problems that you solved in high school. If you don’t believe me, take a crack at solving it without referring to Vanderbei’s analysis. Or even just follow along with the talk and fill in all the algebra.

The subtlest part of Vanderbei’s formulation of the problem involves noticing a non-trivial relationship between 4 angles:

\phi + \beta = \theta + \gamma

and the subtlest part of solving the problem involves solving the trigonometric equation

\cos(\phi + \beta) = \cos(\phi)\cos(\gamma)

for \cos(\phi) . (This equation is transcendental in the angles, but turns out to be algebraic, in fact quadratic, in \cos(\phi) .)
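For the curious, here is one way to see that quadratic structure (a sketch of the algebra, not a reproduction of Vanderbei’s own steps). Expanding the left hand side and collecting terms gives

\sin(\phi)\sin(\beta) = \cos(\phi)\left(\cos(\beta) - \cos(\gamma)\right)

so that

\tan(\phi) = \frac{\cos(\beta) - \cos(\gamma)}{\sin(\beta)}

which is exactly the \epsilon whose square appeared in the expression for \epsilon^2 above. Squaring and using 1 + \tan^2(\phi) = 1/\cos^2(\phi) turns this into the quadratic \left(1 + \epsilon^2\right)\cos^2(\phi) = 1 , so that \cos(\phi) = 1/\sqrt{1 + \epsilon^2} .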

Solving difficult trigonometry problems in the classical language tends to involve constantly moving back and forth between algebraic expressions and the diagrams. This is because the full relationships between lengths and angles as separate entities are much more complicated than the relationships between vectors, which keep information about magnitude and direction conveniently integrated together. For this reason, in classical analyses, much of the information about lengths and angles is typically left implicit in diagrams, rather than being written down in an explicitly algebraic form.

In contrast, in the geometric algebra formulation, once the (fairly simple) equations (1-4) were written down, the rest of the solution was entirely algebraic. It also did not involve invoking any laws for relationships between transcendental functions ( \sin and \cos ).

Besides classical trigonometry, there are a few other competing (and partially overlapping) algebraic systems for solving geometrical problems, and all of them are capable of solving this problem.

Gibbs Vector Algebra

You could stick to Gibbs’ vector analysis (dot products and cross products), and use a cross product with w to annihilate |d_2| in much the same way that we used “grade separation”. There is no explicitly algebraic analogue to our trick of inserting factors of gg = 1 and tt = 1 to relate unknown products of two vectors to known products with a third. Even so, you could split r_1 and r_2 into parts parallel and perpendicular to g and t and achieve essentially similar results, with a little bit more of the reasoning left in prose instead of algebra. The major deficiency of Gibbs’ vector analysis is that the cross product is funny in 2D (because it returns an object that lives outside the plane), forces you to think about things like “pseudo-vectors” (pseudo-vectors are vectors that want to be bivectors) in 3D if you consider the behavior of the cross product under transformations, and doesn’t work at all in more than 3 dimensions. But none of those problems is fatal here, and Gibbs’ vector algebra is a good and efficient way to solve plane trigonometry problems. If you know it, use it.

Complex Numbers

Alternatively, you could use complex numbers, and this works very well for problems in the plane. Like geometric algebra, complex numbers provide an associative and invertible product between directed magnitudes in the plane, and there are analogues to all the algebraic tricks we used here. The following mapping is useful for understanding the relationship between systems:

\begin{aligned} a \cdot b &\longleftrightarrow \mathrm{Re}\left(a b^{\dagger}\right) \\ a \wedge b &\longleftrightarrow \mathrm{Im}\left(a b^{\dagger}\right) \\ a b &\longleftrightarrow a b^{\dagger} \end{aligned}

The main deficiencies of complex numbers are that they don’t extend well to three or more dimensions, and that they single out the real axis as a special direction in the plane in a way that isn’t appropriate to problems with rotational symmetry. I also think there isn’t as much of a culture of viewing and understanding complex numbers geometrically as there is for Gibbs vector analysis or geometric algebra. For example, did you know that if two complex numbers are oriented along orthogonal directions, then

\mathrm{Re}\left(a b^{\dagger}\right) = 0

This is an important geometric idea, but I only know it because it falls out of the mapping to geometric algebra.
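Here is a quick numerical check of that statement (a throwaway sketch using Julia’s built in complex numbers, where conj plays the role of the dagger in the mapping above):

a = 2.0 + 1.0im      # a directed magnitude in the plane
b = 3.0im * a        # orthogonal to a: rotate by 90° and rescale
real(a * conj(b))    # the analogue of a · b; exactly 0.0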

Rational Trigonometry

NJ Wildberger has recently advocated a system of doing trigonometry called Rational Trigonometry that avoids all use of transcendental functions and many uses of the square root function. It’s a pretty system with some definite merits, and I’d be interested to see someone analyze this problem with it.

Nevertheless, with geometric algebra, we were able to avoid all the same transcendental functions and square roots that Wildberger’s system avoids, and geometric algebra extends more easily to more than two dimensions and is more thoroughly coordinate free. Rational trigonometry also has the same problem as classical trigonometry in that directions and magnitudes are represented and manipulated separately instead of integrated together as vectors.

I wonder what a Rational Trigonometer would do at the step where we use grade separation to annihilate |d_2| , or at the steps where we insert factors of gg = 1 and tt = 1 to decompose unknown vectors against known vectors.

Pauli Matrices

Physicists use Pauli matrices to model 3D geometry in quantum mechanics problems. That system is completely isomorphic to the geometric algebra of 3D vectors, and can (and should) be used completely algebraically without ever introducing explicit matrix representations. But physicists rarely contemplate using Pauli matrices to solve non-quantum mechanical geometry problems because they believe that Pauli matrices are fundamentally quantum objects, somehow related to spin-1/2 particles like the electron. I have never seen someone try to use Pauli matrices to solve a trigonometry problem, but it can certainly be done.

Conclusion

Through years of experience solving physics and computer graphics problems, I have slowly learned to be skeptical of angles. In many problems where your input data is given in terms of coordinates or lengths, it is possible to solve the problem completely without ever introducing angles, and in these cases, introducing angles adds algebraic complications and computational inefficiencies. In 3D, introducing angles also invites gimbal lock .

This is not to say that angles are never useful; quite the contrary. (Lest I give the wrong impression: geometric algebra is perfectly capable of handling angles, and in fact smoothly extends the way that complex numbers handle angles to more dimensions. It’s just that it also allows you to avoid angles where they aren’t fundamental. For example, geometric algebra’s version of the Euler formula is

\exp(I\theta) = \cos(\theta) + \sin(\theta)I

where I is any unit bivector, and in more than two dimensions, this formula is valid for each separate plane.)

Angles are exactly what you need for problems explicitly involving arc lengths on a circle (so especially problems involving rolling discs, cylinders, or spheres), or rotation at a uniform rate, or for interpolating continuously between known discrete rotations. They’re also handy for making small angle approximations. However, for most problems involving areas, projections, reflections, and other simple relationships between vectors (in other words, most problems of trigonometry ), angles are a distraction from more direct solutions.

To say it another way, Wildberger (author of Rational Trigonometry) emphasizes that you don’t need to think about arc lengths on a circle to understand triangles, and on this point I agree with him completely. Of course, you do need to think about arc lengths on a circle to understand problems involving… arc lengths on a circle. For these problems, we should of course know and use angles.

Given this point of view, when I came across Vanderbei’s sunset problem, I thought “there must be a simpler way with fewer angles.” So this is my best attempt, using the most efficient tool I know.

If you didn’t know any geometric algebra before, and you have made it here, thank you and congratulations. I hope I have at least made you more intrigued about its possibilities. It is impossible to teach someone geometric algebra in one blog post (just as it is impossible to teach someone classical trigonometry, complex numbers, or vector analysis in one blog post). The best places I know to learn more are:

  • Hestenes’ Oersted Medal lecture (PDF). This is a compact introduction by the founder of modern Geometric Algebra.
  • Dorst, Fontijne, and Mann, Geometric Algebra for Computer Science . This book is by far the best pedagogical introduction I have seen if you want to actually learn how to calculate things. Unlike most other books on the subject, you don’t need to know any physics (or even care about physics) to appreciate it.

Additionally, Hestenes’ website has many wonderful papers, and I also especially recommend two of his other books for more advanced readers:

  • Space-Time Algebra . Hestenes’ original book on the subject, a very compact, energetic, and wide-ranging presentation with some deep physical applications. Recently re-issued by Birkhäuser.
  • Clifford Algebra to Geometric Calculus . This is a challenging, advanced, and sometimes frustrating reference book, but it presents the subject in far more depth than it has been presented anywhere else. It used to be hard to find, and I suspect that few people have truly read it carefully. But it contains results that will probably continue to be rediscovered for decades. Chapter 7 on directed integration theory is especially notable: it contains extensions of most of the magical integral formulas of complex analysis (like Cauchy’s integral formula) to any number of dimensions (and even to curved manifolds!).

Finally, thank you to Steven Strogatz for first pointing me to this problem (and a related, fiendishly difficult pure trigonometric functions problem ). And of course, thank you to Robert Vanderbei for dreaming up this wonderful problem, and presenting it so beautifully.

Updates

Bret Victor notes on Twitter that many more of the equations in this post could be accompanied by diagrams. Interactive diagrams. His example:

Side view sunset diagram annotated with bivectors.

I’ll list a few relevant resources along these lines:

  • Daniel Fontijne wrote a program called GAViewer to accompany Geometric Algebra for Computer Science (recommended above) that allows visualizing and visually manipulating GA expressions. I wish you could embed it on the web.
  • Apparatus is a very cool environment for drawing interactive diagrams, and I hear you can embed apparatus creations into other web pages now. Maybe someone can teach it how to speak GA!
  • weshoke ported the Versor GA library to Javascript ( Versor.js ) and there are stubs of an expression parser and a canvas renderer written by a few other contributors (including me). Maybe you could be the one to flesh this out!
  • Finally, I’ll mention that Desmos (my employer) is a great way to interact with equations, functions, and graphs (but GA isn’t on our current roadmap).

Jacob Rus writes to suggest a few more resources for learning about GA:

I was a bit surprised you didn’t mention Hestenes’s NFCM book, as that seems closest to the type of problem solving you were doing:

David Hestenes, New Foundations for Classical Mechanics

A couple other sources you might mention:

At a closer to high school geometry level, perhaps also:

New Foundations for Classical Mechanics is the first book I read about GA. It has tons of great diagrams, and gives some interesting new perspectives on well known mechanics problems. This book got me excited about GA, but I personally found Geometric Algebra for Computer Science (recommended above) did a better job teaching me how to calculate things and solve problems on my own.


A few readers have written in with skepticism about whether Vanderbei’s photograph shows what he (and I) said it shows. A quick search on Google Images shows that ripples and waves in water generally play a dominant role in the appearance of the sun’s reflection. Vanderbei addresses waves, atmospheric effects, and CCD saturation in his article and concludes that the reflection in his image does display the geometric effect under discussion, but that he was lucky to obtain such an image.

Randall Munroe writes:

There are some cool and somewhat similar tricks for estimating the size of the Earth by carefully observing the sunset. My favorite is to go to a beach at sunrise and stand atop some steps or a lifeguard stand. When the sun first appears, start a stopwatch. Then run/drop quickly to the ground, and time how long it takes the sun to appear a second time. I think you can use that to work out the Earth’s radius.

Here’s a page spelling out the relevant calculations . If you live on the West Coast, like me, you don’t even have to wake up early to try this.

Optimizing (part of) a Kakuro puzzle solver in Julia

Published: Mon, 02 Nov 2015 00:00:00 -0800
Updated: Mon, 02 Nov 2015 00:00:00 -0800
UTC: 2015-11-02 08:00:00+00:00
URL: http://www.shapeoperator.com/2015/11/02/optimizing-kakuro-in-julia/

Kakuro is a number puzzle that is a bit like a combination between Sudoku and a crossword puzzle. Imagine a crossword puzzle where, instead of words, blocks of boxes are filled with combinations of digits between 1 and 9, and instead of clues about words, you are given sums that a block of digits must add up to.
Content Preview
Solved Kakuro puzzle
A solved Kakuro puzzle.


When you’re solving a Kakuro puzzle, it’s helpful to be able to generate all the combinations of m different digits that add up to a given sum. A recent thread on the julia-users mailing list considered how to implement this task efficiently on a computer.

In this post, I’d like to show a progression of a few different implementations of a solution to this same problem. I think the progression shows off one of Julia’s core strengths: in a single language, you are free to think in either a high level way that is close to your problem domain and easy to prototype, or a low level way that pays more attention to the details of efficient machine execution. I don’t know any other system that even comes close to making it as easy to switch back and forth between these modes as Julia does.

Attention Conservation Notice : If you’re looking for information on how to solve Kakuro with a computer, you should probably look elsewhere. This post is a deep dive into a tiny, tiny subproblem. On the other hand, I’ll show how to speed up the solution of this tiny, tiny subproblem by a factor of either ten thousand or a million, depending how you count, so if that sounds fun you’re in the right place.

Step 1: Recursive enumeration

The original solution presented by Patrick Useldinger is based on the following decomposition of the problem:

Suppose, for example, that we want to find a set of 4 different digits between 1 and 9 that sum to 12. Each answer will either include 1 along with 3 other digits between 2 and 9 that sum to 11, or else it won’t include 1, but will instead include 4 digits between 2 and 9 that sum to 12.

In general, a set of m different digits between p and 9 that sum to s either contains p along with a subset of m - 1 digits between p + 1 and 9 that sum to s - p , or it doesn’t contain p and instead contains m digits between p + 1 and 9 that sum to s . In either case, we’ve turned one problem into two new problems with digits drawn from smaller ranges. Like any good recursive algorithm, we’re expressing the solution to the original problem as a simple combination of the solutions to successively simpler problems.

To turn this insight into code, we need to make a few choices:

  1. How will we represent and combine the solutions to subproblems?
  2. In what order will we solve the subproblems, and how will we keep track of where we are in this order?

Early discussion focused on point (1), and the relative merits of storing the solution digit sets in a Set , a Vector , or a linked list.

Here’s an example that stores digit sets as a vector of integers, and organizes the process of enumerating solutions as a recursive depth-first search.

function decompose_rec(sum, m)
  # Container to hold all solutions
  solutions = Vector{Int}[]
  # Container to hold digits in a single solution
  partial_solution = Int[]
  lower = 1
  upper = 9
  decompose_rec!(
    solutions, partial_solution,
    sum, m, lower, upper
  )
  solutions
end

# Julia has a convention that functions that mutate
# one of their arguments should end with a "!". The
# argument that will be mutated is usually placed
# first.
function decompose_rec!(
  solutions, partial_solution,
  sum, m, lower, upper
)
  if sum == 0 && m == 0
    # In this case, partial_solution is in fact a
    # full solution. Push it onto solutions.
    push!(solutions, partial_solution)
  elseif sum > 0 && m > 0 && upper >= lower
    # Make an extended partial solution that
    # includes lower. The copy operation is
    # important because we don't want to mutate
    # existing partial solutions.
    extended = copy(partial_solution)
    push!(extended, lower)
    # Find solutions that include lower
    decompose_rec!(
      solutions, extended,
      sum - lower, m - 1, lower + 1, upper
    )
    # Find solutions that don't include lower
    decompose_rec!(
      solutions, partial_solution,
      sum, m, lower + 1, upper
    )
  else
    # stop searching
  end
end

This is a reasonably idiomatic Julia solution. It spells out the types of the solutions and partial_solution containers when they are created, but otherwise doesn’t get overly bossy about annotations. I’m not really using any special features of Julia here, and you could write basically this same code in practically any language.

Let’s see how the performance is looking so far. We’ll benchmark finding the number that can be written in the most different ways as a sum of m different digits:

function benchmark_rec()
  nmax = 0
  solnmax = Vector{Int}[]
  for sum in 1:45, m in 1:9
    soln = decompose_rec(sum, m)
    if length(soln) > nmax
      nmax = length(soln)
      solnmax = soln
    end
  end
  solnmax
end

and the results:

julia> @time benchmark_rec(); @time benchmark_rec()
0.038101 seconds (194.83 k allocations: 13.916 MB, 26.53% gc time)
0.009737 seconds (182.40 k allocations: 13.293 MB, 26.66% gc time)
12-element Array{Array{Int,1},1}:
 [1,2,8,9]
 [1,3,7,9]
 [1,4,6,9]
 [1,4,7,8]
 [1,5,6,8]
 [2,3,6,9]
 [2,3,7,8]
 [2,4,5,9]
 [2,4,6,8]
 [2,5,6,7]
 [3,4,5,8]
 [3,4,6,7]

So there are 12 different subsets of 4 digits between 1 and 9 that sum to 20.

Note a few good benchmarking practices here:

  1. The benchmark is wrapped in a function. Julia currently does a poor job optimizing code that uses non-constant global variables. For this reason, benchmarks should be wrapped in functions so that they only use local variables.
  2. The benchmark uses each result that is produced in the inner loop, and returns a result. This prevents the optimizer from just skipping the loop.
  3. We run the benchmark multiple times. The first run includes time to compile all the code, which you typically want to ignore.

Other early solutions from the mailing list in Python, lisp, and Julia had runtimes of ~500ms to ~10s. At ~10ms, the solution above already compares pretty favorably. A couple reasons that the early Julia solutions were slower are that

  1. Using a Set or using union on a Vector is slower than using push! on a Vector , because Julia needs to check whether the new element is already a member of the set before adding it. Our enumeration is constructed to make this check unnecessary.
  2. Inner functions, anonymous functions, and closures are currently relatively slow in Julia (anonymous functions will probably be fast some day). If you’re going for performance, just pass all the context you need using function arguments. You can make this more tidy by defining a type to hold the context that’s relevant to a given problem. This is generally preferable to passing around half a dozen positional arguments as I’ve done above. As an added bonus, once you have a type, you will often see useful operations that you can define on it to further simplify and structure your code.

There is probably more room for improvement here. Using a linked list instead of a Vector to represent solutions would eliminate the need to copy partial solutions, because linked lists can share tails with each other. Alternatively, memory usage could be reduced by storing digits in a Vector{Int8} instead of a Vector{Int} , since we’re only storing digits between 1 and 9.

Rather than spend more time experimenting with different containers, I’d like to take a step back and consider some other solution strategies.

Step 2: Filtering Combinations

Julia ships with built in routines for iterating through combinations of the elements of a collection (if you’re curious, you can see how the combinations iterator is implemented in Julia’s source code). Instead of recursively building combinations with the correct sum, we can just look at all combinations and keep only those that have the correct sum. The number of combinations of m digits between 1 and 9 is binomial(9, m) ,

julia> [(m, binomial(9, m)) for m in 0:9]
10-element Array{Tuple{Int64,Int64},1}:
 (0,1)
 (1,9)
 (2,36)
 (3,84)
 (4,126)
 (5,126)
 (6,84)
 (7,36)
 (8,9)
 (9,1)

so we’ll never need to look at more than 126 combinations for a given (sum, m) pair.

Here’s a solution that uses this strategy:

function decompose_com(desired_sum, m)
  solutions = Vector{Int}[]
  for c in combinations(1:9, m)
    if sum(c) == desired_sum
      push!(solutions, c)
    end
  end
  solutions
end

Short and simple. It could be a one-liner if we used filter (as sketched below), but higher order functions aren’t as fast as they could be in Julia (yet).
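For reference, here is roughly what that one-liner might look like (collecting the combinations into an array first, since filter works on arrays):

decompose_filter(desired_sum, m) =
  filter(c -> sum(c) == desired_sum, collect(combinations(1:9, m)))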

Let’s benchmark this version:

function benchmark_com()
  nmax = 0
  solnmax = Vector{Int}[]
  for sum in 1:45, m in 1:9
    soln = decompose_com(sum, m)
    if length(soln) > nmax
      nmax = length(soln)
      solnmax = soln
    end
  end
  solnmax
end

I’m running the benchmark repeatedly below because some runs invoke the garbage collector, and other runs don’t.

julia> @time benchmark_com();
  0.042942 seconds (102.48 k allocations: 6.814 MB)
julia> @time benchmark_com();
  0.006712 seconds (70.39 k allocations: 5.357 MB, 45.27% gc time)
julia> @time benchmark_com();
  0.004499 seconds (70.39 k allocations: 5.357 MB)
julia> @time benchmark_com();
  0.005283 seconds (70.39 k allocations: 5.357 MB)
julia> @time benchmark_com();
  0.005495 seconds (70.39 k allocations: 5.357 MB)
julia> @time benchmark_com();
  0.005883 seconds (70.39 k allocations: 5.357 MB, 36.67% gc time)
julia> @time benchmark_com()
  0.002831 seconds (70.39 k allocations: 5.357 MB)
12-element Array{Array{Int,1},1}:
 [1,2,8,9]
 [1,3,7,9]
 [1,4,6,9]
 [1,4,7,8]
 [1,5,6,8]
 [2,3,6,9]
 [2,3,7,8]
 [2,4,5,9]
 [2,4,6,8]
 [2,5,6,7]
 [3,4,5,8]
 [3,4,6,7]

After the first couple runs, timing settles down to ~3 ms when the gc doesn’t run, and ~6 ms when it does. That’s a decent improvement over the recursive code, and as a bonus, it’s much simpler.

Even so, we’re doing a fair amount of repeated work in the benchmark since we form the same combinations of m digits over and over again for different target sums. Since there are so few total subsets of the digits between 1 and 9 (only 2^9 = 512 of them), we could just precompute the sums of all of them, and store the relevant combinations in a look up table.

Step 3: A lookup table

My first thought for the lookup table was to use a Dict that maps (sum, m) pairs to combinations stored as a vector of digits. But the possible values of sum are the integers between 1 and 45, and the possible values of m are the integers between 1 and 9, so we might as well just use a 2D array instead of a Dict . This saves a bit of time spent hashing pairs of integers.

function buildlut()
  # Preallocate empty containers for each
  # (sum, m) pair.
  lut = [Vector{Int}[] for i in 1:45, j in 1:9]

  for m in 1:9, c in combinations(1:9, m)
    push!(lut[sum(c), m], c)
  end

  lut
end

# Now decompose is just a table lookup
decompose_lut(lut, i, j) = lut[i, j]

Here’s the corresponding benchmark:

function benchmark_lut()
  lut = buildlut()
  nmax = 0
  solnmax = Vector{Int}[]
  for sum in 1:45, m in 1:9
    soln = decompose_lut(lut, sum, m)
    if length(soln) > nmax
      nmax = length(soln)
      solnmax = soln
    end
  end
  solnmax
end
julia> @time benchmark_lut(); @time benchmark_lut()
  0.048013 seconds (49.26 k allocations: 2.317 MB)
  0.000158 seconds (2.15 k allocations: 161.313 KB)
12-element Array{Array{Int,1},1}:
 [1,2,8,9]
 [1,3,7,9]
 [1,4,6,9]
 [1,4,7,8]
 [1,5,6,8]
 [2,3,6,9]
 [2,3,7,8]
 [2,4,5,9]
 [2,4,6,8]
 [2,5,6,7]
 [3,4,5,8]
 [3,4,6,7]

So with the look up table strategy, we’re down to ~150 microseconds.

Note that this benchmark includes the time it takes to build the look up table. A real Kakuro solving program could just build the table at program startup time, and then it wouldn’t be included in the time that it took to solve any particular puzzle.

With that in mind, let’s build the table ahead of time instead of as part of the benchmark.

const lut = buildlut()

function benchmark_lut_precomputed()
  nmax = 0
  solnmax = Vector{Int}[]
  for sum in 1:45, m in 1:9
    soln = decompose_lut(lut, sum, m)
    if length(soln) > nmax
      nmax = length(soln)
      solnmax = soln
    end
  end
  solnmax
end

Here are the results:

julia> @time benchmark_lut_precomputed();
  0.006212 seconds (4.04 k allocations: 197.841 KB)
julia> @time benchmark_lut_precomputed()
  0.000007 seconds (5 allocations: 224 bytes)
12-element Array{Array{Int,1},1}:
 [1,2,8,9]
 [1,3,7,9]
 [1,4,6,9]
 [1,4,7,8]
 [1,5,6,8]
 [2,3,6,9]
 [2,3,7,8]
 [2,4,5,9]
 [2,4,6,8]
 [2,5,6,7]
 [3,4,5,8]
 [3,4,6,7]

So we’re now down to ~7μs. That’s probably close to the resolution of the timer used by @time , so you can’t really trust times that small. Let’s add an outer loop to the benchmark to bring the time back up to measurable levels.

function benchmark_lut_precomputed_x_1e6()
  nmax = 0
  solnmax = Vector{Int}[]
  for i in 1:1_000_000, sum in 1:45, m in 1:9
    soln = decompose_lut(lut, sum, m)
    if length(soln) > nmax
      nmax = length(soln)
      solnmax = soln
    end
  end
  solnmax
end

The results:

julia> @time benchmark_lut_precomputed_x_1e6();
  0.562733 seconds (6.12 k allocations: 292.998 KB)
julia> @time benchmark_lut_precomputed_x_1e6()
  0.546012 seconds (5 allocations: 224 bytes)
12-element Array{Array{Int,1},1}:
 [1,2,8,9]
 [1,3,7,9]
 [1,4,6,9]
 [1,4,7,8]
 [1,5,6,8]
 [2,3,6,9]
 [2,3,7,8]
 [2,4,5,9]
 [2,4,6,8]
 [2,5,6,7]
 [3,4,5,8]
 [3,4,6,7]

Dividing by our factor of a million, this shows that the benchmark runtime has been reduced to ~550ns. The original Python and lisp programs from the mailing list ran in ~1s.

I can’t think of very many other times that I have been able to optimize a program by a factor of a million.

Depending on your sense of fairness, you might not like that I’ve lifted the computation of the table out of the benchmark (even though you can solve any number of puzzles with the same table). But remember, even computing the table took less than 200μs.

Even so, our representation of solution sets is not as efficient as it could be. As I mentioned before, we could represent the digits with an Int8 instead of an Int . But we can actually do much better than that.

Step 4: Bitmasks

We know that there are only 512 subsets of digits that we want to represent, and our lookup table actually contains each of these subsets. Instead of using a vector of digits, we could use a simple binary number as a bitmask, so that a 1 represents the presence of a certain digit in a subset, and a 0 represents the absence of that digit. (Julia also has a built in BitArray type, so we could use a 9-element BitArray to store our bitmask even more compactly, but there are advantages to staying aligned to bytes, so I’ll proceed with an Int16 mask.) An Int16 has 16 binary digits, and we only need 9, so we can pack a bitmask representing any set of digits between 1 and 9 into an Int16 with room to spare.

As an example, the number 23 is written as 10111 in binary, so you can also think of it as representing the set of digits {1,2,3,5}.

julia> bits(Int16(23))
"0000000000010111"

All we really need to switch our table over to this new representation is a way to convert an Int16 bitmask into a set of digits. As an optimization, I’ll also implement routines that count the digits and sum them without producing an intermediate array.

function bitmaskdigits(x::Integer)
  digits = Int[]
  n = 0
  while x > 0
    shift = 1 + trailing_zeros(x)
    n += shift
    x >>= shift
    push!(digits, n)
  end
  digits
end

function bitmasksum(x::Integer)
  sum = 0
  n = 0
  while x > 0
    shift = 1 + trailing_zeros(x)
    n += shift
    x >>= shift
    sum += n
  end
  sum
end

bitmaskcount(x::Integer) = count_ones(x)
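A quick check of these helpers on the example mask from above (23, whose binary digits encode the set {1,2,3,5}):

julia> bitmaskdigits(23)
4-element Array{Int,1}:
 1
 2
 3
 5
julia> bitmasksum(23)
11
julia> bitmaskcount(23)
4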

With these operations defined, we can implement a very similar version of table-based decompose as before:

function buildlut_int()
  lut = [Int16[] for i in 1:45, j in 1:9]

  # Only 511 bitmasks because we're skipping the
  # empty set
  for i in 1:511
    push!(lut[bitmasksum(i), bitmaskcount(i)], i)
  end

  lut
end

Let’s see how long it takes to build the look up table now:

julia> @time buildlut_int(); @time buildlut_int();
  0.022075 seconds (20.20 k allocations: 971.572 KB)
  0.000076 seconds (597 allocations: 32.766 KB)

With this new representation, we’re down to only 80μs to build the look up table (twice as fast as before), and it also takes up less memory. We can also expect set operations like intersection and union to be faster on these bitmasks than they would be on a Vector{Int} .

We should probably expect the benchmark of finding the largest number of solutions to be the same as the previous look up table solution, since we’ve optimized it down to in-order array access in both cases, but let’s check anyway:

decompose_lut_int(lut, i, j) = lut[i, j]

const lut_int = buildlut_int()

function benchmark_lut_int_precomputed_x_1e6()
  nmax = 0
  solnmax = Vector{Int}[]
  for i in 1:1_000_000, sum in 1:45, m in 1:9
    soln = decompose_lut_int(lut_int, sum, m)
    if length(soln) > nmax
      nmax = length(soln)
      solnmax = soln
    end
  end
  solnmax
end
julia> @time benchmark_lut_int_precomputed_x_1e6();
  0.561131 seconds (8.69 k allocations: 428.280 KB)
julia> @time benchmark_lut_int_precomputed_x_1e6()
  0.567186 seconds (5 allocations: 224 bytes)
12-element Array{Int16,1}:
 108
 114
 156
 170
 177
 198
 201
 282
 294
 297
 325
 387

As expected, the time to complete the benchmark is again ~550ns.

If we want to see a more friendly representation of the solutions, we can map bitmaskdigits over them:

julia> map(
    bitmaskdigits,
    benchmark_lut_int_precomputed_x_1e6()
  )
12-element Array{Array{Int,1},1}:
 [3,4,6,7]
 [2,5,6,7]
 [3,4,5,8]
 [2,4,6,8]
 [1,5,6,8]
 [2,3,7,8]
 [1,4,7,8]
 [2,4,5,9]
 [2,3,6,9]
 [1,4,6,9]
 [1,3,7,9]
 [1,2,8,9]

Unfortunately, this new representation has come at a cost in clarity and safety. It’s annoying to have to call bitmaskdigits to see a human-friendly display, and changing the default display of Int16 is out of the question. It would also be easy to perform inappropriate operations on these masks, like addition, negation, or multiplication, since all those operations are of course defined on integers.

Julia is a strongly typed language, but by mapping our data onto integers, we have been using it in a weakly typed way. We can make better use of Julia’s type system to win back (and actually improve) clarity and safety with no loss of performance.

Step 5: DigitSets

If we wrap our bitmask digit sets in an immutable type , we can then restrict which operations can be performed on them, and even better, we can overload existing operations (e.g. sum , length , union , etc.) to perform differently on them than they do on integers.

immutable DigitSet
  d::Int16
end

With this definition, you can construct a DigitSet to wrap an Int16 like this:

julia> DigitSet(1)
DigitSet(1)

The first operation we’ll define is a way to get the individual digits out of a DigitSet . The logic we need is already contained in the bitmaskdigits function from before. However, rather than just returning a vector, we can do something more flexible by hooking into Julia’s iteration protocol . To do that, we just need to define start , next , and done on DigitSet . We’ll also override length , in , and isempty , although these are not strictly necessary for the iteration protocol.

# Allow iterating over the members of a digit set
Base.start(ds::DigitSet) = (ds.d, 0)
function Base.next(ds::DigitSet, state)
  (d, n) = state
  shift = 1 + trailing_zeros(d)
  n += shift
  d >>= shift
  return (n, (d, n))
end
Base.done(ds::DigitSet, state) = state[1] <= 0
Base.length(ds::DigitSet) = count_ones(ds.d)
Base.in(n, ds::DigitSet) = (ds.d & (1 << (n - 1))) != 0
Base.isempty(ds::DigitSet) = ds.d == 0

Then we can collect digits into whatever kind of container we want, or iterate over them in a streaming fashion. For example, here we collect a DigitSet into a Vector{Int8} :

julia> collect(Int8, DigitSet(100))
3-element Array{Int8,1}:
 3
 6
 7

Now let’s define a way to construct digit sets from an array of digits, and a nicer way to display them:

function DigitSet(a::AbstractArray)
  d = Int16(0)
  # For each digit in a, set the corresponding
  # bit in d to 1.
  for n in a
    d |= 1 << (n - 1)
  end
  DigitSet(d)
end

function Base.show(io::IO, ds::DigitSet)
  print(io, "DigitSet")
  print(io, "([")
  print_joined(io, ds, ",")
  print(io, "])")
end

It’s nice (but not strictly required) to define this constructor and this way of displaying a DigitSet together, so that the result that is displayed can also be parsed back in as a DigitSet .

As an aside, I wish I could define this constructor to work for any kind of iterable object instead of just for arrays, but there are two reasons that this is not possible:

  1. Julia doesn’t (yet) have a way of dispatching on interfaces like “iterable”; instead, you can only dispatch on type inheritance relationships. There is no Iterable supertype that all iterable types inherit from, and there probably shouldn’t be.
  2. Integers are iterable in Julia, so this would conflict with the raw DigitSet(d::Int16) constructor.

It sounds like both of these issues will probably be addressed at some point, but interface dispatch is still at the design stage (see SimpleTraits for a promising start, though).

Anyway, let’s implement a few set operations on DigitSet :

# Set operations
Base.union(a::DigitSet, b::DigitSet) =
  DigitSet(a.d | b.d)
Base.intersect(a::DigitSet, b::DigitSet) =
  DigitSet(a.d & b.d)
Base.setdiff(a::DigitSet, b::DigitSet) =
  DigitSet(a.d & (~b.d))
Base.symdiff(a::DigitSet, b::DigitSet) =
  DigitSet(a.d $ b.d)

Couldn’t be easier!

Let’s see where we’ve gotten ourselves:

julia> a = DigitSet([1,2,7])
DigitSet([1,2,7])
julia> b = DigitSet([2, 5])
DigitSet([2,5])
julia> union(a, b)
DigitSet([1,2,5,7])
julia> intersect(a, b)
DigitSet([2])
julia> symdiff(a, b)
DigitSet([1,5,7])
julia> 7 in a
true
julia> 7 in b
false

Because we hooked into the iteration protocol, we also get implementations of sum , minimum , maximum , etc. for free:

julia> sum(a)
10
julia> minimum(a)
1
julia> maximum(a)
7

And for those of us who have worked with other people’s code (or our own code, 3 months later) often enough to appreciate having boundaries:

julia> a+b
ERROR: MethodError: `+` has no method matching +(::DigitSet, ::DigitSet)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...)
julia> -a
ERROR: MethodError: `-` has no method matching -(::DigitSet)

This is all looking pretty convenient, but at what cost? At no cost, of course!

function buildlut_ds()
  lut = [DigitSet[] for i in 1:45, j in 1:9]

  for i in 1:511
    ds = DigitSet(i)
    push!(lut[sum(ds), length(ds)], ds)
  end

  lut
end

The time to build the look up table stays about the same at ~80μs:

julia> @time buildlut_ds(); @time buildlut_ds();
  0.015037 seconds (15.36 k allocations: 620.064 KB)
  0.000077 seconds (1.11 k allocations: 40.750 KB)

The time to run the benchmark once the lookup table is built is also again ~550ns.

decompose_lut_ds(lut, i, j) = lut[i, j]

const lut_ds = buildlut_ds()

function benchmark_lut_ds_precomputed_x_1e6()
  nmax = 0
  solnmax = Vector{DigitSet}[]
  for i in 1:1_000_000, sum in 1:45, m in 1:9
    soln = decompose_lut_ds(lut_ds, sum, m)
    if length(soln) > nmax
      nmax = length(soln)
      solnmax = soln
    end
  end
  solnmax
end
julia> @time benchmark_lut_ds_precomputed_x_1e6();
  0.585304 seconds (9.46 k allocations: 465.916 KB)
julia> @time benchmark_lut_ds_precomputed_x_1e6()
  0.558804 seconds (5 allocations: 224 bytes)
12-element Array{DigitSet,1}:
 DigitSet([3,4,6,7])
 DigitSet([2,5,6,7])
 DigitSet([3,4,5,8])
 DigitSet([2,4,6,8])
 DigitSet([1,5,6,8])
 DigitSet([2,3,7,8])
 DigitSet([1,4,7,8])
 DigitSet([2,4,5,9])
 DigitSet([2,3,6,9])
 DigitSet([1,4,6,9])
 DigitSet([1,3,7,9])
 DigitSet([1,2,8,9])

In conclusion

I think this progression of implementations is a nice illustration of some of Julia’s strengths.

When you’re prototyping solutions to a problem, you can write high level code without thinking much about types or how things are laid out in memory. Julia code written in this way feels similar to Python. But once you have a correct solution, you sometimes want to make it fast.

In other high level languages that are pleasant for prototyping, the general strategy for making important parts of your programs fast is to rewrite them in a lower level language like C. This strategy works for some people, but it involves a very sharp change in the slope of the learning curve. Many smart programmers, and certainly most scientists, never make it to the “rewrite it in C” part of the process, although they do benefit from using libraries written by those sage few who do.

Julia lets you transition more gradually from high level thinking about your problem to low level thinking about how computers work. And critically, you don’t have to learn a new syntax, new build systems, or arcane language bindings for communicating between the high level and low level parts of your code.

Then, once you have a fast, low level implementation, Julia’s type system lets you structure and clarify it by assigning additional meaning to collections of bits, without losing performance. Even though a DigitSet “is” just an Int16 (and performs like one), we can make it print however we want and restrict or rename operations on it however we want. This process is sometimes called “zero cost abstraction.” It’s highly valued in languages like C++ that are used to implement games and other performance intensive, polished, professional software. But C++ is a very complex language that is infamously difficult to learn, and it isn’t pleasant for prototyping.

One thing that Julia generally won’t do is save you completely from having to know some details about how computers represent data. At least not if you want to write code that performs up to your hardware’s potential. Newcomers are occasionally disappointed that direct translations of their code from Matlab or Python into Julia don’t always run much faster than they did in the original languages (they may even run slower, though usually not by a lot). Julia isn’t the mythical “sufficiently smart compiler” that turns a high level specification of any problem into an efficiently implemented solution. Instead, it’s a single language that lets you decide whether you want to think at a high level or a low level, and it gives you a smooth path between these two.

Thanks to Patrick Useldinger for presenting this problem and to the other contributors on julia-users for an interesting conversation. It was fun for me to think about.

Appendix: Version Info

julia> versioninfo()
Julia Version 0.4.0
Commit 0ff703b* (2015-10-08 06:20 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Bisecting Floating Point Numbers

Published: Sat, 22 Feb 2014 00:00:00 -0800
Updated: Sat, 22 Feb 2014 00:00:00 -0800
UTC: 2014-02-22 08:00:00+00:00
URL: http://www.shapeoperator.com/2014/02/22/bisecting-floats/

Bisection is about the simplest algorithm there is for isolating a root of a continuous function:
Content Preview


  1. Start with an interval such that the function takes on oppositely signed values on the endpoints.
  2. Split the interval at its midpoint.
  3. Recurse into whichever half has endpoints on which the function takes on oppositely signed values.

After each step, the new interval is half as large as the previous interval and still contains at least one zero (by the Intermediate Value Theorem ).

I want to highlight a couple of interesting issues that arise when implementing bisection in floating point arithmetic that you might miss if you just looked at the definition of the algorithm.

In Julia code, a single step of bisection looks like this (Julia treats floating point arithmetic the same way all modern programming environments do: according to the IEEE 754 standard; the examples here are in Julia because I plan to talk more about the language in the future, but everything in this post could as easily be written in any other language):

function bisect_step(fn, x1, x2)
  xm = (x1 + x2)/2

  # Return the sub-interval with
  # oppositely-signed endpoints
  if sign(fn(x1)) != sign(fn(xm))
    return x1, xm
  else
    return xm, x2
  end
end

For example,

julia> x1, x2 = bisect_step(sin, 2.0, 4.0)
(3.0,4.0)
julia> x1, x2 = bisect_step(sin, x1, x2)
(3.0,3.5)
julia> x1, x2 = bisect_step(sin, x1, x2)
(3.0,3.25)
julia> x1, x2 = bisect_step(sin, x1, x2)
(3.125,3.25)
julia> x1, x2 = bisect_step(sin, x1, x2)
(3.125,3.1875)

The first interesting question when implementing bisection is when should I stop bisecting? In pure mathematics, you can think of carrying the process on indefinitely, but a computer program should halt.

Here’s a little puzzle. I claim that one of the following functions always halts, and the other can loop forever. The functions differ only in their stopping criteria. Which one is which?

function bisect1(fn, x1, x2)
  @assert sign(fn(x1)) != sign(fn(x2))
  tol = 1e-13
  # Stop when function values are below
  # a set tolerance
  while abs(fn(x1)) > tol || abs(fn(x2)) > tol
    x1, x2 = bisect_step(fn, x1, x2)
  end
  return x1, x2
end

function bisect2(fn, x1, x2)
  @assert sign(fn(x1)) != sign(fn(x2))
  # Stop when the mean of the endpoints
  # is equal to one of the endpoints
  while x1 < (x1 + x2)/2 < x2
    x1, x2 = bisect_step(fn, x1, x2)
  end
  return x1, x2
end

Let’s try them out:

julia> bisect1(x -> 1000*sin(x), 2.0, 4.0)
# loops forever...
julia> bisect2(x -> 1000*sin(x), 2.0, 4.0)
(3.141592653589793,3.1415926535897936)

This is the opposite of what would have happened if we ran these algorithms using mathematical real numbers instead of computer floating point numbers.

Over the reals, the first algorithm terminates for continuous functions by the definition of continuity , and the second algorithm doesn’t terminate because for any two non-equal real numbers x_1 < x_2 it’s always true that x_1 < (x_1 + x_2)/2 < x_2 .

Over floating point doubles, the first algorithm doesn’t terminate because there is no floating point number 2.0 < x < 4.0 such that |1000\sin(x)| < 10^{-13} , and the second algorithm does terminate because for any finite floating point number x_1 (except the largest finite float), there exists a strictly larger floating point number x_2 such that (x_1 + x_2)/2 = x_1 or (x_1 + x_2)/2 = x_2 .

Both of these results might be surprising if you aren’t familiar with the details of floating point numbers. They arise due to the granularity of floats. There is a finite gap between any float and the next largest float. The size of the gap depends (proportionally) on the size of the number. In Julia, you can find the size of the gap using eps :

julia> eps(1.0)
2.220446049250313e-16

If x_1 is a float and x_2 is the next largest float, it is always true that their mean is either x_1 or x_2 , because there are no other values between them.

For example,

julia> let x=3.0, y=x+eps(x)
         (x + y)/2
       end
3.0

The floating point representation of \pi is

julia> fpi = convert(Float64, pi)
3.141592653589793

Its sine is positive:

julia> sin(fpi)
1.2246467991473532e-16

The next largest floating point number is

julia> fpi_plus = fpi + eps(fpi)
3.1415926535897936

and its sine is negative:

julia> sin(fpi_plus)
-3.216245299353273e-16

Neither of these outputs is within 10^{-16} of 0.0, which is why bisect1 fails to terminate. On the other hand, bisect2 managed to find exactly these inputs as lower and upper bounds of a root of sin . It didn’t need any explicit tolerances at all.

In this sense, bisect2 is an exemplary floating point algorithm. The answer to our present question, when should I stop bisecting?, is when there are no more floating point numbers between your lower and upper bound, whenever this is practical. Checking whether the mean of the endpoints is equal to one of the endpoints is a convenient way to check this condition.

Choosing a different arbitrary tolerance in a general purpose floating point algorithm is impolite. Absolute tolerances like the 1e-13 in bisect1 are inappropriate in general purpose algorithms because floating point numbers don’t come with units attached, so an algorithm with an absolute tolerance will behave differently depending on whether your user measures, e.g. , lengths in millimeters or meters. Relative tolerances are better but fail when the answer is supposed to be 0.0. The spacing between floating point numbers cleanly elides these two limits, being relative for finite numbers, and finite for 0.0.

If you write your algorithms to compute to full precision, you save your users from having to think about tolerance conventions specific to your library. It can be tempting to think of floating point numbers as broken real numbers, but it’s much more productive to think of floating point numbers as a carefully thought out set of conventions for rounding the output of one algorithm to be appropriate as an input to another algorithm. Floating point numbers help with the hard work of making our programs composable .

Now that we know when to stop bisecting, the next interesting question is how many iterations will it take to bisect a root to full precision? As we’ve just discussed, floating point numbers are a finite set. There are 2^{64} floating point doubles (actually somewhat less because a whole group of them are NaN ). Each step of bisection should halve the number of floats contained in the interval. This means it should always take less than 64 steps to reach full precision.

Let’s see some examples:

function count_bisect2_steps(fn, x1, x2)
  i=0
  while x1 < (x1 + x2)/2 < x2
    x1, x2 = bisect_step(fn, x1, x2)
    i += 1
  end
  return i
end
# Isolate root at pi
julia> count_bisect2_steps(sin, 3.0, 4.0)
51 # good
# Isolate root at 0.0
julia> count_bisect2_steps(sin, -1.0, 1.0)
1075 # bad

What happened?

Earlier, I said “each step of bisection should halve the number of floats contained in the interval,” but as written, bisect_step doesn’t actually do this. The problem is that floats aren’t evenly distributed. They are much denser near 0.0 than far from it. This means that bisecting toward a root at 0.0 using the arithmetic mean eliminates fewer than half of the floats in the interval at each step.

Instead of bisecting the values of floating point numbers, what we really want to do is bisect a function that counts them. That would make it easy to eliminate exactly half of them at each step. Conveniently, the underlying binary representation of floating point numbers is exactly a function that counts them. If we reinterpret the binary representation of a float as an integer, we can find the mean of the two integers that represent the endpoints, instead of the mean of the values of the two endpoints. Here’s a function that does just that:

function _middle(x1::Float64, x2::Float64)
  # Use the usual float rules for combining
  # non-finite numbers
  if !isfinite(x1) || !isfinite(x2)
    return x1 + x2
  end

  # Always return 0.0 when inputs have opposite sign
  if sign(x1) != sign(x2) && x1 != 0.0 && x2 != 0.0
    return 0.0
  end

  negate = x1 < 0.0 || x2 < 0.0

  x1_int = reinterpret(UInt64, abs(x1))
  x2_int = reinterpret(UInt64, abs(x2))
  unsigned = reinterpret(Float64, (x1_int + x2_int) >> 1)

  return negate ? -unsigned : unsigned
end

There are some minor complications for negative numbers. The Wikipedia page on the floating point standard has more information on exactly how the binary encoding of floats works. I also recommend “What Every Computer Scientist Should Know About Floating-Point Arithmetic”. In _middle , I avoid these complications by returning 0.0 for intervals with one negative endpoint and one positive endpoint, and by casting negative intervals to positive intervals and back. The >> 1 is a right shift, a fast way to divide integers by 2.
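A few spot checks of _middle (outputs verified by hand against the bit patterns):

julia> _middle(1.0, 2.0)
1.5
julia> _middle(-1.0, 2.0)
0.0
julia> _middle(3.0, 3.0 + eps(3.0))
3.0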

Using this function, we can define our final version of bisect_root :

function bisect_root(fn, x1, x2)
  xm = _middle(x1, x2)
  s1 = sign(fn(x1))
  s2 = sign(fn(x2))

  while x1 < xm < x2
    sm = sign(fn(xm))

    if s1 != sm
      x2 = xm
      s2 = sm
    else
      x1 = xm
      s1 = sm
    end

    xm = _middle(x1, x2)
  end

  return x1, x2
end

This version also inlines bisect_step in a way that avoids unnecessary reevaluations of the function. It has the nice property that it will always converge to full precision in 65 function evaluations or fewer (two initial evaluations, and one more per step for up to 63 additional steps).
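As a sanity check (a sketch of mine, counting iterations the same way count_bisect2_steps did above), the root at 0.0 that previously took 1075 steps is now isolated well within the 64-step bound:

function count_bisect_root_steps(fn, x1, x2)
  i = 0
  xm = _middle(x1, x2)
  s1 = sign(fn(x1))
  while x1 < xm < x2
    sm = sign(fn(xm))
    if s1 != sm
      x2 = xm
    else
      x1 = xm
      s1 = sm
    end
    xm = _middle(x1, x2)
    i += 1
  end
  return i
end

# count_bisect_root_steps(sin, -1.0, 1.0) now stays below 64,
# because the integer gap between the endpoints halves at each step.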

There are root bracketing algorithms that converge to full precision in fewer function evaluations for smooth functions (e.g. Brent’s method ), but well-implemented bisection has the advantages of simplicity (fewer places for bugs to hide), and very low overhead, because there is so little arithmetic aside from the objective function evaluations.

I recently worked with John Verzani to get code very much like this final version of bisect_root into the Roots.jl package. As of this writing, the pull request is currently awaiting review.