So now everybody knows why I was posting more and more rarely. I recently posted a proof of the ternary Goldbach conjecture: it was split into two articles – a new one on major arcs, and a new and improved version of the article on minor arcs I wrote last year.
Of course, I’m expected to write a summary of the main ideas here – not just emphasizing the new points, as I do when giving a talk to an audience of specialists, but giving an overall picture of the proof and all its parts, old and new. Let me do that.
I know that the audience here is very mixed – if you don’t have the background to follow a paragraph, just keep on reading.
History
Leonhard Euler – one of the greatest mathematicians of the eighteenth century, and indeed of all time – and his close friend, the amateur and polymath Christian Goldbach, kept a regular and copious correspondence. Goldbach made a guess about prime numbers, and Euler quickly reduced it to the following conjecture, which, he said, Goldbach had already stated to him: every positive integer can be written as the sum of at most three prime numbers.
We would now say “every positive integer greater than 5”, since we no longer think of 1 as a prime number. Moreover, the conjecture is nowadays split into two: the weak, or ternary, Goldbach conjecture states that every odd integer greater than 5 can be written as the sum of three primes; the strong, or binary, Goldbach conjecture states that every even integer greater than 2 can be written as the sum of two primes. As their names indicate, the strong conjecture implies the weak one (easily: subtract 3 from your odd number $n$, then express $n-3$ as the sum of two primes).
See Dickson, History of the theory of numbers, Vol I., Ch. XVIII, for the early history of the subject. In summary — Waring seems to have come up with the weak conjecture again in the late eighteenth century; the nineteenth century saw some computational work (checking the conjecture for small integers – by hand!) but little real progress.
The strong conjecture remains out of reach. A few weeks ago — my preprint appeared on May 13, 2013 — I proved the weak Goldbach conjecture.
The proof builds on the progress made in the early 20th century by Hardy, Littlewood and Vinogradov. Vinogradov proved (1937) that the conjecture is true for all odd numbers larger than some constant $C$. (Hardy and Littlewood had shown the same under the assumption of the Generalized Riemann Hypothesis; more on that later.) Since Vinogradov’s day, the constant $C$ has been specified and gradually improved, but the best (i.e., smallest) available value for $C$ was $C = e^{3100} \approx 2\cdot 10^{1346}$ (Liu–Wang), which was much too large. Even $C = 10^{100}$ would be too large: since $10^{100}$ is larger than the estimated number of subatomic particles in the universe times the number of seconds since the Big Bang, there wouldn’t be any hope of checking every case of $n < C$ by computer (even if the computer were really highly parallel, and you had really high cosmic priority)!
I brought $C$ down to $10^{30}$ (and could bring it farther down if needed). D. Platt and I had checked the conjecture for all odd numbers up to $8.875\cdot 10^{30}$ by computer (and could have gone farther), so that was the end of the story.
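(If you want to get a feel for what a verification of this kind means, here is a toy version in Python, assuming sympy for the primality routines. It is nothing like the real computation, which is described towards the end of this post and which gets to $8.875\cdot 10^{30}$ only by piggybacking on the verified binary conjecture.)

```python
# Toy check of the ternary Goldbach conjecture for small odd numbers.
# (Illustrative only: the actual verification works very differently,
# via a "prime ladder" -- see the final section of the post.)
from sympy import isprime, primerange

def is_sum_of_three_primes(n, primes):
    """True if the odd number n can be written as a sum of three primes."""
    for p in primes:
        if p >= n:
            break
        for q in primes:
            if p + q >= n:
                break
            if isprime(n - p - q):
                return True
    return False

LIMIT = 10**4
primes = list(primerange(2, LIMIT))
for n in range(7, LIMIT, 2):          # odd numbers from 7 up
    assert is_sum_of_three_primes(n, primes), n
print("checked all odd n in [7, %d)" % LIMIT)
```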
What goes into the proof? Let us first step back and look at the general framework of the circle method, introduced by Hardy and Littlewood.
The circle method: Fourier analysis on the integers
Fourier analysis is something we do every time we tune to a radio station: there is a signal, and we decompose it into the contributions from different frequencies. In mathematical terms – we are given a function $f:\mathbb{R}\to\mathbb{C}$ (i.e., a function on a single real variable; in the case of a radio, the variable is time) and we define the Fourier transform $\hat{f}$ by $\hat{f}(r) = \int_{-\infty}^{\infty} f(x)\, e(-xr)\, dx$, where we write $e(t)$ for $e^{2\pi i t}$. Then, as we learn in any Fourier analysis course, $f(x) = \int_{-\infty}^{\infty} \hat{f}(r)\, e(xr)\, dr$, provided that $f$ decays rapidly enough and is otherwise well-behaved. (This is the “Fourier inversion formula”.)
In other words, $f$ has been decomposed as a sum of (complex) exponential functions, with the (complex) exponential function $x\mapsto e(xr)$ present with “strength” $\hat{f}(r)$. (This is equivalent to a decomposition into “sine waves” $\sin(2\pi r x)$ and $\cos(2\pi r x)$, since $e^{iz} = \cos z + i \sin z$.) To go back to the example of a radio: $\hat{f}(r)$ is large when $r$ is close to the frequency of some radio station, and small otherwise. (What your radio receives is a superposition $f$ of what all stations transmit; your radio receiver’s business is precisely to figure out the contribution of frequencies $r$ around a given frequency $r_0$.)
We can do the same if $f$ is a function from the integers $\mathbb{Z}$ to the complex numbers $\mathbb{C}$. In fact, things are now simpler – we get to define $\hat{f}$ by a sum rather than an integral: $\hat{f}(\alpha) = \sum_n f(n)\, e(\alpha n)$. A funny thing here is that $\hat{f}(\alpha)$ doesn’t change at all if we add $1$, or any other integer $m$, to $\alpha$. This is so because, for $n$ an integer,
$$e((\alpha + m) n) = e(\alpha n)\, e(m n) = e(\alpha n), \qquad\text{since } e(mn) = e^{2\pi i m n} = 1.$$
(Thanks again, Euler.) Thus, we may restrict $\alpha$ to the interval $[0,1]$ – or, more abstractly, we can think of $\alpha$ as living in the quotient $\mathbb{R}/\mathbb{Z}$. Topologically, $\mathbb{R}/\mathbb{Z}$ is a circle; this is just the same as saying that, since it doesn’t matter whether we add or subtract $1$ to our frequency, we might as well have the little frequency marker in our radio go around a circle marked with numbers from $0$ up to $1$, rather than have it slide back and forth (a segment of) the real line (as in the actual radio on my table). This is where the phrase circle method comes from.
The decomposition of $f$ now looks as follows:
$$f(n) = \int_0^1 \hat{f}(\alpha)\, e(-\alpha n)\, d\alpha,$$
provided that $f$ decays rapidly enough.
Why do we care? The Fourier transform is immediately useful if we are dealing with additive problems, such as the Goldbach conjectures. The reason behind this is that the transform of a convolution equals a product of transforms: $\widehat{f*g} = \hat{f}\cdot \hat{g}$. Recall that the (additive) convolution of $f$ and $g$ is defined by $(f*g)(n) = \sum_m f(m)\, g(n-m)$.
We can see right away from this that $(f*g)(n)$ can be non-zero only if $n$ can be written as $n = m_1 + m_2$ for some $m_1$, $m_2$ such that $f(m_1)$ and $g(m_2)$ are non-zero. Similarly, $(f*g*h)(n)$ can be non-zero only if $n$ can be written as $n = m_1 + m_2 + m_3$ for some $m_1$, $m_2$, $m_3$ such that $f(m_1)$, $g(m_2)$ and $h(m_3)$ are all non-zero. This suggests that, to study the ternary Goldbach problem, we define $f$, $g$, $h$ so that they take non-zero values only at the primes.
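(A quick numerical sanity check of the identity “transform of a convolution = product of transforms”, at toy scale in Python; here $f$ is just the characteristic function of the primes below $200$ – the weighted version comes next.)

```python
# Check, on a finite range, that the transform of an additive convolution
# is the product of transforms.  Here f = characteristic function of the
# primes below N; sympy is assumed for primality testing.
import cmath
from sympy import isprime

N = 200
f = [1.0 if isprime(n) else 0.0 for n in range(N)]

def transform(seq, alpha):
    """hat(f)(alpha) = sum_n f(n) e(alpha n), where e(t) = exp(2 pi i t)."""
    return sum(c * cmath.exp(2j * cmath.pi * alpha * n) for n, c in enumerate(seq))

def convolve(a, b):
    """(a*b)(n) = sum_m a(m) b(n - m)."""
    out = [0.0] * (len(a) + len(b))
    for m, am in enumerate(a):
        for k, bk in enumerate(b):
            out[m + k] += am * bk
    return out

alpha = 0.123
lhs = transform(convolve(f, f), alpha)
rhs = transform(f, alpha) ** 2
print(abs(lhs - rhs))   # ~0, up to rounding error
```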
Hardy and Littlewood defined $f(n) = g(n) = h(n) = 0$ for $n$ composite (or zero or negative), and $f(n) = g(n) = h(n) = (\log n)\, e^{-n/x}$ for $n$ prime (where $x$ is a parameter to be fixed later). Here the factor $e^{-n/x}$ is there to provide “fast decay”, so that everything converges; as we will see later, Hardy and Littlewood’s choice of $e^{-n/x}$ (rather than some other function of fast decay) is actually very clever, though not quite the best possible. The term $\log n$ is there for technical reasons (basically, it turns out that it makes sense to weigh a prime $p$ by $\log p$ because roughly one out of every $\log p$ integers of size about $p$ is a prime).
We see that $(f*g*h)(n) \ne 0$ if and only if $n$ can be written as the sum of three primes. Our task is then to show that $(f*g*h)(n)$ (i.e., $\sum_{m_1+m_2+m_3 = n} f(m_1)\, g(m_2)\, h(m_3)$) is non-zero for every odd $n$ larger than a constant. Since the transform of a convolution equals a product of transforms,
$$(f*g*h)(n) = \int_0^1 \hat{f}(\alpha)\, \hat{g}(\alpha)\, \hat{h}(\alpha)\, e(-\alpha n)\, d\alpha.$$
Our task is thus to show that the integral $\int_0^1 \hat{f}(\alpha)\, \hat{g}(\alpha)\, \hat{h}(\alpha)\, e(-\alpha n)\, d\alpha$ is non-zero.
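(Again at toy scale, one can watch the integral do its job: with $f$ supported on the primes below $100$ and with unit weights – a simplification of the weights above – the integral over the circle literally counts the ordered representations of $n$ as a sum of three primes. A Python sketch, with sympy assumed:)

```python
# The integral over the circle counts representations: with f supported on
# the primes, int_0^1 fhat(a)^3 e(-na) da equals the number of ordered ways
# of writing n as a sum of three primes.  A Riemann sum with more than 3N
# points is exact for this trigonometric polynomial.
import cmath
from sympy import isprime

N = 100
f = {p: 1.0 for p in range(2, N) if isprime(p)}     # unit weights at the primes

def fhat(alpha):
    return sum(w * cmath.exp(2j * cmath.pi * alpha * p) for p, w in f.items())

def reps(n):
    """Direct count of ordered triples of primes summing to n."""
    return sum(1 for p in f for q in f if (n - p - q) in f)

n, M = 51, 4 * N
integral = sum(fhat(k / M) ** 3 * cmath.exp(-2j * cmath.pi * k * n / M)
               for k in range(M)) / M
print(round(integral.real), reps(n))    # the two counts agree
```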
As it happens, $\hat{f}(\alpha)$ is particularly large when $\alpha$ is close to a rational with small denominator; it is as if there really were radio stations transmitting at the (small-denominator) rational frequencies on the dial – when the dial is close to one of them, there is a loud, clear signal, and when we are away from all of them, we can hear only a low hum. This suggests the following strategy: estimate $\hat{f}(\alpha)$, $\hat{g}(\alpha)$, $\hat{h}(\alpha)$ for all $\alpha$ within small arcs around the rationals with small denominators (the major arcs – so called because they make a major contribution, in spite of being small); bound $|\hat{f}(\alpha)\hat{g}(\alpha)\hat{h}(\alpha)|$ for $\alpha$ outside the major arcs (everything outside the major arcs is called minor arcs); then show that the contribution of the minor arcs to the integral is smaller in absolute value than the contribution of the major arcs, thereby forcing the integral $\int_0^1 \hat{f}(\alpha)\hat{g}(\alpha)\hat{h}(\alpha)\, e(-\alpha n)\, d\alpha$ to be non-zero.
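(Here is the “loud, clear signal” in a toy computation: the absolute value of the exponential sum over primes, evaluated at a few small-denominator rationals and at a random-looking point. Python with sympy is assumed; there is no smoothing here, so the range is kept modest.)

```python
# |fhat(alpha)| is much larger at rationals with small denominator than at a
# "typical" alpha.  Toy illustration: f supported on the primes below N,
# weighted by log p.
import cmath, math
from sympy import primerange

N = 10**5
primes = list(primerange(2, N))

def fhat(alpha):
    return sum(math.log(p) * cmath.exp(2j * cmath.pi * alpha * p) for p in primes)

for alpha in (0.0, 1/2, 1/3, 2/5, 0.2357111317):   # the last one is "generic"
    print(alpha, abs(fhat(alpha)))
# Expect values of size comparable to N at 0, 1/2, 1/3 and 2/5 (give or take
# a factor depending on the denominator), and something much smaller at the
# random-looking point.
```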
It is this general strategy that gets called the circle method. Hardy and Littlewood introduced it to deal with a wide variety of additive problems; for example, this was also part of their approach to Waring’s problem (yes, same Waring), on integers that are sums of $k$th powers of integers. This was taken over by Vinogradov, who was the first to give good, unconditional bounds on $\hat{f}(\alpha)$ for $\alpha$ in the minor arcs (something considered very remarkable at the time). The circle method is also my general strategy: what I have done is to give much better estimates on the major and minor arcs than were previously available, for $f$, $g$ and $h$ chosen with great care.
(Incidentally: while we can start to treat the binary, or strong, Goldbach conjecture with the circle method, we soon hit a wall: the “noise” from the minor arcs overwhelms the contribution from the major arcs. This is well explained in this post by Tao.)
Dirichlet L-functions and their zeros
Before we can start working on the major arcs, we need to discuss L-functions. First, there is the zeta function $\zeta(s)$, first studied for $s$ complex by Riemann, after whom it is now named. This is given by $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$ when the real part $\Re(s)$ of $s$ is greater than $1$. For $\Re(s) \le 1$, the series diverges, but the function can be defined (uniquely) by analytic continuation (and this can be done explicitly by, e.g., Euler–Maclaurin, as in Davenport, Multiplicative Number Theory, 2nd ed., p. 32), with a pole at $s = 1$.
Analogously, there are Dirichlet L-functions, defined by $L(s,\chi) = \sum_{n=1}^{\infty} \chi(n)\, n^{-s}$ for $\Re(s) > 1$, and by analytic continuation for $\Re(s) \le 1$. Here $\chi$ is any Dirichlet character; for every given $\chi$, $L(s,\chi)$ is a function of $s$. A Dirichlet character $\chi$ (of modulus $q$) is just a function $\chi:\mathbb{Z}\to\mathbb{C}$ of period $q$ (i.e. $\chi(n+q) = \chi(n)$ for all $n$), with the additional properties that it be multiplicative ($\chi(mn) = \chi(m)\chi(n)$ for all $m$, $n$) and that $\chi(n) = 0$ whenever $n$ and $q$ are not coprime. (The sophisticated way to put it is that it is a character of $(\mathbb{Z}/q\mathbb{Z})^*$ lifted to $\mathbb{Z}$.) Dirichlet characters and Dirichlet L-functions were introduced by, um, Dirichlet, in order to study primes in arithmetic progressions.
A zero of a function $f$ is just an $s$ such that $f(s) = 0$. A non-trivial zero of $\zeta(s)$, or of $L(s,\chi)$, is a zero of $\zeta(s)$, or of $L(s,\chi)$, such that $0 < \Re(s) < 1$. (The other zeros are called trivial because it is easy to tell where they are (namely, at certain negative integers and sometimes at $s = 0$).) The Riemann hypothesis states that all non-trivial zeros of the Riemann zeta function “lie on the critical line”, meaning that they satisfy $\Re(s) = 1/2$. The Generalized Riemann Hypothesis for Dirichlet L-functions states that, for every Dirichlet character $\chi$, every non-trivial zero of $L(s,\chi)$ satisfies $\Re(s) = 1/2$.
Since both the Riemann Hypothesis (RH) and the Generalized Riemann Hypothesis (GRH) remain unproven, any result proven using them will be conditional; we want to prove unconditional results. What can indeed be proven, and used, are partial results in the direction of GRH. Such results are of two kinds:
– Zero-free regions. Ever since the late nineteenth century (de la Vallée-Poussin) we have known that there are hour-glass shaped regions (more precisely, of the shape $\frac{c}{\log(q(|t|+2))} \le \sigma \le 1 - \frac{c}{\log(q(|t|+2))}$, where $c$ is a constant and where we write $s = \sigma + it$) outside which non-trivial zeroes cannot lie;
– Finite verifications of GRH. It is possible to (ask the computer to) prove small, finite chunks of GRH, in the sense of verifying that all non-trivial zeros of a given $L$-function with imaginary part of absolute value at most some constant $H$ lie on the critical line $\Re(s) = 1/2$. (A toy version of such a verification, for the Riemann zeta function itself, is sketched right below.)
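(The real verifications – Platt’s, in this case – work with Dirichlet $L$-functions, rigorous error bounds, interval arithmetic and Turing’s method; none of that is attempted here. Still, the basic idea can be shown in a few lines of Python with mpmath, by locating zeros of $\zeta$ as sign changes of the Riemann–Siegel function $Z(t)$.)

```python
# Toy "verification of RH up to height T": find zeros of zeta on the critical
# line as sign changes of the Riemann-Siegel Z function.  (Not rigorous: no
# interval arithmetic, and no proof that no zeros were missed.)
import mpmath

T, step = 50, 0.05
zeros = []
t_prev, z_prev = 0.1, mpmath.siegelz(0.1)
t = t_prev + step
while t <= T:
    z = mpmath.siegelz(t)
    if z_prev * z < 0:      # sign change: a zero of zeta with Re(s) = 1/2
        zeros.append(mpmath.findroot(mpmath.siegelz, (t_prev + t) / 2))
    t_prev, z_prev = t, z
    t += step

print(len(zeros))                          # 10 zeros up to height 50
print(zeros[0], mpmath.zetazero(1).imag)   # both are 14.1347251417...
```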
Most work to date follows the first alternative. I chose the latter, and this had consequences for the precise way in which I defined the major and minor arcs: I got very precise results on the major arcs, but I had to define them to be few and very narrow, or else the method wouldn’t work. This meant that the minor arc methods had to be particularly potent, since a particularly large part of the circle was left for them to deal with.
Let us look more closely at how one can deal with major arcs using partial results towards GRH, and, in particular, finite verifications of GRH.
Major-arc estimates
Recall that we want to estimate sums of the type $\sum_n f(n)\, e(\alpha n)$, where $f(n)$ is something like $(\log n)\, e^{-n/x}$ (say) for $n$ equal to a prime, and $0$ otherwise. Let us modify this just a little – we will actually estimate
$$S_\eta(\alpha, x) = \sum_n \Lambda(n)\, e(\alpha n)\, \eta(n/x),$$
where $\Lambda$ is the von Mangoldt function: $\Lambda(n) = \log p$ if $n$ is a prime power $p^k$, $k \ge 1$, and $\Lambda(n) = 0$ otherwise. (Part of this is just a bow to tradition, as is the use of the letter $S$ (for “sum”); however, the use of $\Lambda(n)$ rather than just plain $\log p$ at the primes does actually simplify matters when you deal with so-called explicit formulas, which we will see in a minute.) Here $\eta$ is some function of fast decay; it can be $\eta(t) = e^{-t}$, as in Hardy and Littlewood’s work, or (as in my work) something else. (It could even be just the “brutal truncation” $1_{[0,1]}$, defined to be $1$ when $t \in [0,1]$ and $0$ otherwise; that would be fine for the minor arcs, but, as we will see, it is a bad idea as far as the major arcs are concerned.)
Assume $\alpha$ is on a major arc, meaning that we can write $\alpha = a/q + \delta/x$ for some rational $a/q$ ($q$ small) and some $\delta$ (with $|\delta|$ small). We can express $S_\eta(\alpha, x)$ as a linear combination (that is, a sum of multiples) of terms of the form $S_{\eta,\chi}(\delta/x, x)$, where
$$S_{\eta,\chi}(\delta/x, x) = \sum_n \Lambda(n)\, \chi(n)\, e(\delta n / x)\, \eta(n/x)$$
and $\chi$ runs over Dirichlet characters of modulus $q$.
Why are these sums nicer than the plainer sums $S_\eta(\alpha, x)$? The argument has become $\delta n / x$, whereas before it was $\alpha n$. Here $\delta$ is small – smaller than a constant, in our setup. In other words, $e(\delta n / x)$ will go around the circle a bounded number of times as $n$ goes from $1$ up to a constant times $x$ (by which time $\eta(n/x)$ has become small). This makes the sum much easier to estimate.
It is a standard fact that we can express $S_{\eta,\chi}(\delta/x, x)$ by an explicit formula (yes, the phrase has a technical meaning, just like Jugendtraum):
$$S_{\eta,\chi}(\delta/x, x) = \Big[\, x \int_0^\infty e(\delta t)\, \eta(t)\, dt \,\Big] - \sum_\rho F_\delta(\rho)\, x^\rho + \text{small error}.$$
Here the term between brackets appears only for $\chi$ trivial. In the sum, $\rho$ goes over all non-trivial zeros of $L(s,\chi)$, and $F_\delta$ is the Mellin transform of $e(\delta t)\, \eta(t)$. We win if we manage to show that the sum over $\rho$ is small.
The point is this – if we check GRH for $L(s,\chi)$ up to imaginary part $H$, then we know that all $\rho$ with $|\Im(\rho)| \le H$ satisfy $\Re(\rho) = 1/2$, and thus $|x^\rho| = \sqrt{x}$. In other words, $|x^\rho|$ is then very small (compared to $x$). However, for any $\rho$ whose imaginary part has absolute value greater than $H$, we know nothing about its real part, other than $0 < \Re(\rho) < 1$. (All right, we could use a zero-free region, but known zero-free regions are notoriously weak for $\Im(\rho)$ large – meaning they tell us little in practice.) Hence, our only chance is to make sure that $F_\delta(\rho)$ is small when $|\Im(\rho)| > H$.
This, mind you, has to be true both for $\delta$ tiny (including $\delta = 0$) and for $\delta$ not so tiny ($|\delta|$ between $1$ and a constant). If you toy around with the method of stationary phase, you get that $F_\delta(s)$ behaves like $M\eta(s)$ for $\delta$ tiny (here $M\eta$ is the Mellin transform of $\eta$) and like $\eta$ evaluated at about $|\tau|/(2\pi|\delta|)$, up to relatively tame factors, for $\delta$ not so tiny (where we write $s = \sigma + i\tau$). Thus, we are in a classical dilemma, often called the uncertainty principle because it is the mathematical fact underlying the physical principle of the same name: you cannot have a function $\eta$ that decreases extremely rapidly and whose Fourier transform (or, in this case, its Mellin transform) also decays extremely rapidly.
What does “extremely rapidly” mean? It means “faster than any exponential $e^{-Ct}$”. Thus, Hardy and Littlewood’s choice $\eta(t) = e^{-t}$ seems essentially optimal.
Not so fast! What we can do is choose $\eta$ so that its Mellin transform $M\eta$ decreases exponentially (with a constant that is a bit worse than before), but $\eta$ itself decreases faster than exponentially. This is a rather good idea, as it is the argument $|\tau|/(2\pi|\delta|)$ at which $\eta$ gets evaluated (and not so much $\tau$ itself) that risks being fairly small.
One choice obeying this description is the Gaussian $\eta(t) = e^{-t^2/2}$. The Mellin transform of $e(\delta t)\, e^{-t^2/2}$ turns out to be a parabolic cylinder function, with imaginary values for one of the parameters. Parabolic cylinder functions seem to be much-loved and much-studied by applied people – but mostly for real values of the said parameter. There are some asymptotic expansions of parabolic cylinder functions in the literature for general parameters (notably by F. W. J. Olver), but none that were explicit enough for my purposes. Thus, I had to go and provide fully explicit estimates myself, using the saddle-point method. This took me a while, but the results should be of general applicability – hi there, engineers – and the Gaussian smoothing will hopefully become a little more popular in explicit work in number theory.
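(For the curious: the object in question is easy to play with numerically. The Mellin transform $F_\delta(s) = \int_0^\infty e(\delta t)\, e^{-t^2/2}\, t^{s-1}\, dt$ can be computed with mpmath and checked, at $\delta = 0$, against the closed form $2^{s/2-1}\Gamma(s/2)$. This is only a numerical sketch, of course – the point of the paper is to prove bounds, not to compute values.)

```python
# F_delta(s) = Mellin transform of e(delta t) exp(-t^2/2), computed numerically.
# At delta = 0 it reduces to 2^{s/2-1} Gamma(s/2); for delta != 0 it is, up to
# normalization, a parabolic cylinder function with complex parameter.
import mpmath

def F(delta, s):
    f = lambda t: mpmath.exp(2j * mpmath.pi * delta * t - t**2 / 2) * t**(s - 1)
    return mpmath.quad(f, [0, mpmath.inf])

s = mpmath.mpc(2, 3)                       # a sample point with Re(s) > 0
print(F(0, s))                             # numerical value at delta = 0 ...
print(2**(s/2 - 1) * mpmath.gamma(s/2))    # ... agrees with the closed form
print(F(1.5, s))                           # a value with delta != 0
```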
Ah, by the way, these estimates on parabolic cylinder functions allow us to take not just $\eta(t) = e^{-t^2/2}$, but also, more generally, $\eta(t) = g(t)\, e^{-t^2/2}$, where $g$ is any band-limited function, meaning, in this context, any function whose Mellin transform, restricted to the imaginary axis, has compact support. We will want to optimize the choice of $\eta$ – more on that later.
The minor arcs
How do you bound $|S_\eta(\alpha, x)|$ when $\alpha$ is not close to any rational $a/q$ of small denominator $q$? That this is at all possible was Vinogradov’s great achievement. Progress since then has been gradual. My own take on things is the subject of my minor arcs paper. Let me just discuss a few of the ideas behind my improvements.
Vinogradov’s proof was greatly simplified in the 70s by Vaughan, who introduced the identity that now bears his name. Basically, Vaughan’s identity is a gambit: it gives you a great deal of flexibility, but at a cost – here, a cost of two logs, rather than, say, two pawns. The problem is that, if we are to reach our goal, we cannot afford to waste logs. The only way is to recover these logs, finding cancellation in the different sums that arise from Vaughan’s identity. This I had to do, mind you, without using $L$-functions, since I could no longer assume that $q$ was small.
Here is another theme of this part of the proof. Every $\alpha$ has an approximation $a/q$ with $q \le Q$ and $|\alpha - a/q| \le 1/(qQ)$; the fact that $\alpha$ is in the minor arcs just tells us that $q$ is not small. In fact, we are looking for bounds that decrease with $q$; the bound I obtain is, roughly speaking, proportional to $x/\sqrt{\phi(q)}$ (times factors of logarithmic size). What is the effect of $\delta$, where, as before, $\alpha = a/q + \delta/x$?
Something I realized early on was that, if $\delta$ is not tiny, it can actually be used to our advantage. One reason is that there are terms involving the Fourier transform $\widehat{\eta}(\delta)$ of the smoothing, and the Fourier transforms of smooth functions decay as the argument grows. There are other issues, however. Something we can use is the following: by basic results on Diophantine approximation, every $\alpha$ has very good approximations by rationals of non-huge denominator. If $\delta$ is not tiny, then the approximation $a/q$ is good, but not very good; hence, there must be another, better approximation $a'/q'$ with $q'$ non-huge (meaning: considerably smaller than $x$). We can go back and forth between the approximations $a/q$ and $a'/q'$, depending on which one is more useful in context. This turns out to be better than using a single approximation $a/q$, however good it may be.
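(The Diophantine input here is completely classical: the continued-fraction convergents of $\alpha$ supply a whole ladder of approximations, from crude ones with small denominators to excellent ones with huge denominators. A short Python sketch, with an arbitrary $\alpha$ chosen just for illustration:)

```python
# Continued-fraction convergents p/q of alpha: each satisfies |alpha - p/q| < 1/q^2,
# and together they range from small-denominator to huge-denominator approximations.
from fractions import Fraction

def convergents(alpha, count):
    """The first `count` continued-fraction convergents of alpha."""
    a = Fraction(alpha)
    p0, p1, q0, q1 = 1, int(a), 0, 1           # p_{-1}/q_{-1} and p_0/q_0
    out = [Fraction(p1, q1)]
    for _ in range(count - 1):
        frac = a - int(a)
        if frac == 0:                           # alpha exactly rational: stop
            break
        a = 1 / frac
        d = int(a)                              # next partial quotient
        p0, p1 = p1, d * p1 + p0
        q0, q1 = q1, d * q1 + q0
        out.append(Fraction(p1, q1))
    return out

alpha = Fraction(3141592653589793, 10**16)      # a stand-in for a "generic" alpha
for c in convergents(alpha, 8):
    print(c, float(abs(alpha - c)), float(Fraction(1, c.denominator**2)))
```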
Another way in which large $\delta$ gets used is to scatter the inputs to a large sieve. The large sieve can be seen as an approximate form of the Plancherel identity, recast as an inequality: whereas the Plancherel identity tells us that the $L_2$-norm $\big(\int_0^1 |\hat{f}(\alpha)|^2\, d\alpha\big)^{1/2}$ of the Fourier transform $\hat{f}$ of a function $f$ defined on the integers (for instance; groups other than the integers are also valid) equals the $\ell_2$-norm $\big(\sum_n |f(n)|^2\big)^{1/2}$ of the function $f$ itself, the large sieve tells us that the total of $|\hat{f}(\alpha_r)|^2$ over a well-spaced sample of points $\alpha_r$ is bounded by (a multiple of) $\sum_n |f(n)|^2$. Now, in our case, the points $\alpha_r$ are multiples of our angle $\alpha$. If $\alpha$ were exactly $a/q$, the spacing of the points $\alpha, 2\alpha, \dotsc, q\alpha$ would be $1/q$, which is nice – but we may have to apply the large sieve many times, since we have to apply it afresh for each chunk of $q$ points. However, if $\alpha = a/q + \delta/x$ and $\delta$ is not tiny, we can go around the circle many times, and rely on $\delta/x$ rather than on $1/q$ to give us the spacing. Yes, the spacing may be smaller, but the effect of this is more than compensated by the fact that we have to invoke the large sieve far fewer times (perhaps only once). What is more, this scattering can be combined with a more traditional kind of scattering (Montgomery’s lemma; see Montgomery’s “Topics in multiplicative number theory”, or else the exposition in Iwaniec–Kowalski, section 7.4) so as to take advantage of the fact that we are dealing with sums on the primes.
Putting it all together
I’ve been telling you what goes into bounding $|S_\eta(\alpha, x)|$ for $\alpha$ within the minor arcs $\mathfrak{m}$, but what we really want to do is bound the integral $\int_{\mathfrak{m}} |S_\eta(\alpha, x)|^3\, d\alpha$. One easy – and traditional – way to do this is to just use the trivial inequality
$$\int_{\mathfrak{m}} |S_\eta(\alpha, x)|^3\, d\alpha \le \Big(\max_{\alpha \in \mathfrak{m}} |S_\eta(\alpha, x)|\Big) \int_0^1 |S_\eta(\alpha, x)|^2\, d\alpha.$$
Unfortunately, this wastes a factor of $\log$.
Since our bounds for $|S_\eta(\alpha, x)|$, $\alpha \in \mathfrak{m}$, are given in terms of the denominator $q$, it makes sense to combine them with estimates for integrals of the type $\int_{\mathfrak{m}_r} |S_\eta(\alpha, x)|^2\, d\alpha$, where $\mathfrak{m}_r$ is a union of arcs around the rationals $a/q$ with denominator $q$ greater than a constant but no greater than $r$. How do you estimate such integrals? It turns out that this is very closely related to a question on the large sieve: what bounds can you get for samples taken at the points $a/q$, $q \le r$, where $r$ is of moderate size?
There was an answer in the literature (based on Montgomery’s lemma; the link with the circle method was already noticed by Heath-Brown), but it was sub-optimal by at least a constant factor (or in fact more). There was a newer estimate for the large sieve due to Ramaré, but it had not been made fully explicit. I had to work that out, and then adapted the new large-sieve result to estimate the integral over $\mathfrak{m}_r$ above. As was expected, the spurious factor (or really a bit more) disappeared.
It remains to look at the main term vs. the error term. It turns out that we have some room to choose what the main term will be, since it depends on the smoothings we choose. The main term is proportional to
$$\int_0^\infty \int_0^\infty \eta_+(t_1)\, \eta_+(t_2)\, \eta_*\!\Big(\frac{n}{x} - t_1 - t_2\Big)\, dt_1\, dt_2,$$
where $\eta_+$ and $\eta_*$ are the two smoothings we choose to work with, $n$ is the odd number we want to express as the sum of three primes, and $x$ is again a parameter of our choosing. For comparison, the error term is proportional to $|\eta_+|_2^2\, |\eta_*|_1$. Thus, we have an optimization problem (“maximize the size of the double integral divided by $|\eta_+|_2^2 |\eta_*|_1$”). It is best to choose $\eta_+$ symmetric or close to symmetric ($\eta_+(t)$ close to $\eta_+(2 - t)$, with $x$ chosen so that $n/x$ is close to $2$), making sure, moreover, that $\eta_+(t)$ is small for $t$ outside $[0, 2]$. This is not too hard to achieve while keeping $\eta_+$ of the form $g(t)\, e^{-t^2/2}$, where $g$ is band-limited.
What about $\eta_*$? The solution to the optimization problem tells us that it should be of small support, or at least concentrated near the origin. Other than that – there is, so to speak, a political problem: $\eta_*$, unlike $\eta_+$, gets used both in the major and the minor arcs; the major arcs really want it to be of the form $e^{-t^2/2}$ or $g(t)\, e^{-t^2/2}$, whereas the minor arcs would prefer it to be something simple, like the brutal truncation $1_{[0,1]}$ or like $\eta_2 = \eta_1 *_M \eta_1$, where $\eta_1 = 2\cdot 1_{[1/2,1]}$ and $*_M$ denotes multiplicative convolution (as in Tao’s five-prime paper or my own minor-arcs paper).
The solution is simple: define $\eta_*(t) = (\eta_2 *_M \varphi)(\kappa t)$, where $\kappa$ is a large constant, $\eta_2$ is as above and $\varphi(t) = t^2 e^{-t^2/2}$. For $\eta_2$ and $\varphi$ pretty much arbitrary, if you know how to compute (or estimate) $S_\varphi(\alpha, x)$ for some $\alpha$ (say, the $\alpha$ in the major arcs), and you also know how to estimate $S_{\eta_2}(\alpha, x)$ for the other $\alpha$ (the minor arcs), then you know how to estimate $S_{\eta_2 *_M \varphi}(\alpha, x)$ for all $\alpha$: just write it out and you will see!
(Perhaps you will also see that it helps if $\eta_2$ and $\varphi$ are very small near $0$.)
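(Here is the “write it out” at toy scale, with hypothetical choices much simpler than the ones in the paper: take $\eta_2 = 1_{(1/2,1]}$ and $\varphi(t) = t^2 e^{-t^2/2}$, so that $(\eta_2 *_M \varphi)(t) = e^{-t^2/2} - e^{-2t^2}$ in closed form, and check numerically that $S_{\eta_2 *_M \varphi}(\alpha, x) = \int_0^\infty S_{\eta_2}(\alpha, wx)\, \varphi(w)\, dw/w$. Python, with sympy assumed for the von Mangoldt function.)

```python
# Check that S_{eta2 *_M phi}(alpha, x) = int_0^oo S_{eta2}(alpha, w x) phi(w) dw/w
# for eta2 = 1_{(1/2,1]} and phi(w) = w^2 exp(-w^2/2), whose multiplicative
# convolution is exp(-t^2/2) - exp(-2 t^2) in closed form.
import cmath, math
from sympy import factorint

def vonmangoldt(n):
    f = factorint(n)
    return math.log(next(iter(f))) if len(f) == 1 else 0.0

alpha, x, W = 0.3183, 30.0, 8.0
N = int(W * x)                         # terms with n/x > W are negligible here
term = [0j] + [vonmangoldt(n) * cmath.exp(2j * cmath.pi * alpha * n)
               for n in range(1, N + 1)]
prefix = [0j]
for t in term[1:]:
    prefix.append(prefix[-1] + t)

def S_eta2(y):
    """S_{eta2}(alpha, y): the sum of Lambda(n) e(alpha n) over y/2 < n <= y."""
    return prefix[min(int(y), N)] - prefix[min(int(y / 2), N)]

eta = lambda t: math.exp(-t * t / 2) - math.exp(-2 * t * t)   # = (eta2 *_M phi)(t)
lhs = sum(term[n] * eta(n / x) for n in range(1, N + 1))

h = 1e-4                               # midpoint rule for the integral over w
rhs, w = 0j, h / 2
while w < W:
    rhs += S_eta2(w * x) * (w * math.exp(-w * w / 2)) * h     # phi(w)/w = w e^{-w^2/2}
    w += h
print(abs(lhs), abs(rhs), abs(lhs - rhs))   # first two agree, up to quadrature error
```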
The moral is that different problems, and different parts of the same problem, require different smoothings. At least in the context of exponential sums, there turns out to be a simple trick for combining them, as we have just seen.
Some final remarks on computing
An analytic proof usually gives a result valid for all $n$ larger than a constant $C$. The reason is simple: say that we want to show that a quantity is positive. Typically, at the end of a great deal of analytic work, you will have proven that the quantity is of the form (main term) + (error term), where the absolute value of the error term is at most some explicit bound (this is obviously simplified). This certainly shows that the quantity is positive – provided that the main term beats that bound, and that typically happens only once $n \ge C$ for some constant $C$ you can read off from the estimates. The task, then, is to sharpen the proof to such an extent that the constant $C$ becomes small enough that all cases $n < C$ can be checked by hand (meaning either literally your hand or a computer). This is what my work was largely about; checking the conjecture up to $10^{30}$ (and in fact up to $8.875\cdot 10^{30}$) was a bit of a side task – as we are about to see, it wasn’t even the main computational effort involved.
First, let me say a few more words about analytic results. There are results of the type “the statement is true for all $n$ larger than a constant $C$, but this proof can tell you nothing about $C$, other than that it exists”. This is called an ineffective estimate; many proofs of Vinogradov’s result in textbooks are of this kind. (The reason behind this is the possibility of so-called Siegel zeros.) A result can also say “the statement is true for all $n \ge C$, and you should in principle be able to determine some value of $C$ by using ideas from the proof, but the author would much rather go drink coffee”. This is an effective, non-explicit statement; Vinogradov’s definitive version of his own proof was of this sort (as are many other results in mathematics, including some of my own past results). If you do give an explicit value of $C$, then the result is called, well, explicit. Then comes the fourth stage: making $C$ reasonable, i.e., low enough that the case $n < C$ can be checked by hand. It was clear from the beginning that, in the case of the ternary Goldbach conjecture, “reasonable” meant roughly $10^{30}$, even though nobody had actually checked the conjecture that far.
I said before that D. Platt and I had checked the conjecture for all odd numbers up to $8.875\cdot 10^{30}$. Here is how we proceeded. It was already known (thanks to a major computational effort by Oliveira e Silva, Herzog and Pardi) that the binary Goldbach conjecture is true up to $4\cdot 10^{18}$ – that is, every even number between $4$ and $4\cdot 10^{18}$ is the sum of two primes. Given that, all we had to do was to construct a “prime ladder”, that is, a list of primes from $3$ up to $8.875\cdot 10^{30}$ such that the difference between any two consecutive primes in the list is at most $4\cdot 10^{18}$. Thus, if anybody gives you an odd integer $n$ up to $8.875\cdot 10^{30}$, you know that there is a prime $p$ in the list such that $n - p$ is positive and at most $4\cdot 10^{18}$ (and even, since $n$ is odd and $p$ is odd). By assumption, we can write $n - p = p_1 + p_2$ for some primes $p_1$, $p_2$, and so $n = p + p_1 + p_2$.
Constructing such a ladder does not take that much time. (In fact, getting a ladder up to $8.875\cdot 10^{30}$ is probably something you can do on your old personal computer in the basement, over a few weeks – though storing it is another matter.) It’s all integer arithmetic, and we use deterministic primality checking (which is fast for primes of special form), so there is really no room for concern.
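(Here is what a prime ladder looks like at toy scale – bound $10^7$ and maximal gap $10^4$ instead of $8.875\cdot 10^{30}$ and $4\cdot 10^{18}$, and sympy’s prevprime instead of deterministic primality proofs for primes of special form.)

```python
# A toy prime ladder: primes 3 = p_1 < p_2 < ... past `limit`, consecutive gaps <= `gap`.
import bisect
from sympy import prevprime

def prime_ladder(limit, gap):
    ladder = [3]
    while ladder[-1] < limit:
        # largest prime in (p, p + gap]; such a prime exists at these sizes
        ladder.append(prevprime(ladder[-1] + gap + 1))
    return ladder

ladder = prime_ladder(10**7, 10**4)
assert all(0 < q - p <= 10**4 for p, q in zip(ladder, ladder[1:]))
print(len(ladder), ladder[:4], ladder[-1])

# Reduction to the binary conjecture: for an odd n below the bound, take the
# largest ladder prime p below n; then n - p is even, positive and <= gap.
n = 9_999_999
p = ladder[bisect.bisect_left(ladder, n) - 1]
print(p, n - p)
```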
The major computation consists of verifying that, for every $L$-function of conductor $q$ up to about $150000$ (or twice that for $q$ even), all zeroes of the $L$-function with imaginary part bounded by $10^8/q$ lie on the critical line. This was entirely Platt’s work; my sole contribution was to run around asking for computer time at different places (see the acknowledgements section of the major arcs paper). In fact, he went up to conductor $200000$ (or twice that for $q$ even); he had already gone up to conductor $100000$ in his thesis. The verification took, in total, several hundred thousand core-hours (i.e., the total number of processor cores used times the number of hours they ran; nowadays, a top-of-the-line processor – such as the ones in the MesoPSL machine – typically has eight cores). In the end, as I said, I used only conductors up to $150000$ (or twice that for $q$ even), so the number of core-hours actually needed was quite a bit smaller; since I could have made do with even less, you could say that, in retrospect, only a fraction of that was needed. The computers and I were digging from opposite ends of the mountain, and we met in the middle. The fact that the computers ran for longer than needed is hardly something to be regretted: the computation is of general use, and so it’s so much the better that it not be too tightly fitted to my needs; moreover, with proofs of this length, you want to “build like a Roman”, i.e., overcalculate in case you (not the computer!) have made a small mistake somewhere. (Why did you think those walls were so thick?)
Checking zeros of $L$-functions computationally is something as old as Riemann (who did it by hand); it is also one of the things that were tried on electronic computers already in their early days (Turing had a paper on that). One of the main issues to be careful about arises whenever one manipulates real numbers: honestly speaking, a computer cannot store $\pi$; moreover, while a computer can handle rationals, it is really most comfortable handling just those rationals whose denominators are powers of two. Thus, you cannot really say: “computer, give me the sine of that number” and expect a precise result. What you should do, if you really want to prove something (as is the case here!), is to say: “computer, I am giving you an interval $I = [a, b]$; give me an interval $I' = [c, d]$, preferably very short, such that $\sin(I) \subseteq I'$”. This is called interval arithmetic; it is really the easiest way to do floating-point computations rigorously.
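(In code, a request of that kind – “here is an interval; give me back an interval guaranteed to contain the image” – looks as follows. A sketch using mpmath’s interval type; Platt’s code uses his own, faster library.)

```python
# Interval arithmetic with mpmath's iv context: intervals in, guaranteed
# enclosures out (possibly a bit wider than the exact image).
from mpmath import iv

iv.prec = 53                   # double-precision endpoints
x = iv.mpf([1, 1.001])         # the input interval I = [1, 1.001]
y = x * iv.exp(-x**2 / 2)      # encloses {t * exp(-t^2/2) : t in I}
print(y)
print(y.a, y.b)                # its outward-rounded endpoints
```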
Now, processors don’t do this natively, and if you do this purely in software, you can slow things down by a large factor. Fortunately, there are ways of doing this partly in hardware, partly in software. Platt has his own library, but there are others online (e.g. PROFIL/BIAS).
(Oh, by the way, don’t use the built-in sine function on an Intel processor if you want the result to be correct up to the last bit. Just what were they thinking? Use the crlibm library instead.)
Lastly, there were several rather minor computations that I did myself; you’ll find them mentioned in my papers. A typical computation was a rigorous version of a “proof by graph” (the maximum of a function on a given range is clearly less than such-and-such a constant, because gnuplot told me so). You’ll find algorithms for this in any textbook on “validated computing” – basically, it’s enough to combine the bisection method with interval arithmetic.
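(A sketch of exactly that – bisection plus interval arithmetic – proving a toy “graph-reading” statement, namely that $t\, e^{-t^2/2} < 0.61$ on $[0, 3]$. mpmath’s interval type is assumed; the function and the bound are, of course, just illustrative.)

```python
# Rigorous "proof by graph" via bisection + interval arithmetic.
from mpmath import iv

def f_upper(a, b):
    """A rigorous upper bound for f(t) = t * exp(-t^2/2) on [a, b]."""
    x = iv.mpf([a, b])
    return (x * iv.exp(-x**2 / 2)).b

def verify_below(a, b, bound, depth=0):
    """Prove that f < bound on [a, b], or give up (False means "not proven")."""
    if f_upper(a, b) < bound:          # the whole subinterval is certified
        return True
    if depth > 13:
        return False
    m = (a + b) / 2
    return (verify_below(a, m, bound, depth + 1) and
            verify_below(m, b, bound, depth + 1))

print(verify_below(0, 3, 0.61))   # True: a proof, not a plot
print(verify_below(0, 3, 0.60))   # False, rightly: the true maximum is e^{-1/2} = 0.6065...
```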
Finally, let me point out that there is an elementary inequality in the minor arcs paper (namely, (4.24), in the proof of Lemma 4.2) that got proven in part by a human (me) and in part by a quantifier-elimination program. In other words, there are now computer programs out there (in this case, QEPCAD) that can actually prove useful things! Now, I have no doubt that the same inequality can be proven purely by the use of human beings, but it is nice to know that our computer friends can (pretend to) do something other than munch on numbers…
Very nice article (at least the part I’ve gotten through).
e^z = cos z + i sin z looks like a typo. e^{iz}, right?
Yes, exactly. Thanks!
Dear Harald,
I know this sort of elementary expository writing is hard work, and perhaps not very rewarding, but there’s at least one person out on the internet who appreciates it.
A small correction: there is a typo in the displayed formula in the first line of the section on major-arc estimates.
Small typo: the roles of $\alpha$ and $n$ should be reversed in the LHS of the equation displayed just before “(Thanks again, Euler.)”
Dear Mr. Helfgott,
My warmest congratulations.
Yours very truly.
Thank you for this beautifully clear explanation, but I have a question. Why does the use of the Von Mangoldt function not undermine the argument? It seems to me that this proves only that an odd number must be the sum of 3 prime powers. Please tell me what I am missing. Thanks, Richard Webb
Ah! Well, the proof actually gives a lower bound for the number of ways of expressing an odd integer $n$ as a sum of three prime powers. Now, the number of ways of expressing $n$ as a sum of three prime powers at least one of which is not a prime (or, for that matter, as a sum $n = n_1 + n_2 + n_3$ where at least one of the $n_i$ is a square, a cube, or a higher power) is pitifully small – much smaller than the lower bound I just mentioned – simply because squares (and cubes, and fourth powers, etc.) are so rare: at most $\sqrt{N}$ numbers up to $N$ are squares! So, you see, you can subtract that pitifully small number from the lower bound and still get something positive. This is, in fact, the last step of the proof.
Thank you so much for taking the time to get back to me, and congratulations on your achievement.
Dr. Helfgott
Nice explanation, but in the first paragraph of “The circle method” you need to correct the inversion formula: it is dr, not dx.
Best Regards
You are right – thanks!
If I understand your exposition, one implication is that every odd number less than 8.8e30 is the sum of three primes one of which is at most 2e18. Can the bound 8.8e30 be removed? – in other words, can the ternary ex-conjecture be strengthened by demonstrating an upper bound on the smallest (or even on the smallest two) of the three primes?
thank you !