Real numbers are uncountable

One of the slickest proofs of all time is using Georg Cantor‘s 1891 diagonal argument in proving that the real numbers \mathbb{R}  constitute an uncountable set, that is, they cannot be put in a bijective correspondence with \mathbb{N}. Strictly speaking, Cantor original paper only proved that the set of all 0,1-sequences is uncountable. However since it is easy to argue that the set of all such sequences and \mathbb{R} have the same cardinality, indirectly we get a proof for the uncountable nature of the reals. Moreover, the argument given by Cantor in his proof can be easily modified to give a direct proof for the uncountability of the reals. This is what we do presently.

Theorem: \mathbb{R} is uncountable.

Proof: We will accept that every real number x has a decimal expansion, x=N.x_1x_2\cdots and it is uniquely determined by x if one agrees never to terminate the expansion with an infinite string of 9‘s. The proof is based on contradiction, and so we suppose that \mathbb{R} is not uncountable. Then it must be that \mathbb{R} is countable and we can make a list of all the elements of \mathbb{R} along with their decimal expansion as follows:


We now consider the real number 0.y_1y_2y_3\cdots which is defined by y_i=1 if x_{ii}\ne 1 and y_i=2 otherwise. It is clear that this number differs from every x_i in the list in at least the i‘th position. Hence this list is incomplete, a contradiction.\Box

Leave a comment

Filed under Real Analysis, Set theory

Niven’s proof for the irrationality of π

In this post I will present a beautiful proof owing to Ivan Niven on the irrationality of \pi. An analytical definition of \pi is that \pi is twice the smallest positive x for which \cos x equals 0. (This definition is better then the usual circumference/diameter  one since it is not dependent on Euclidean axioms).

Theorem (Niven): \pi is irrational.

Proof: Let, by way of contradiction \pi=a/b with a,b\in\mathbb{N}. Now let 

f(x)=\frac{x^n(a-bx)^n}{n!} and 


where we will specify n later. Now n!f(x)=a_0x^n+a_1x^{n+1}+\cdots a_nx^{2n} with a_i\in\mathbb{Z} and so f(0)=0. Not only this, f^{(j)}(0)=0 for 1\le j\le n-1. For n\le j\le 2n we have n!f^{(j)}(x)=n!b_0+b_1x+\cdots b_nx^n with b_i\in\mathbb{Z} and so f^{(j)}(0) is integral in this case as well. Also note that f(x)=f(\pi-x) and so for any j\ge 0,j\in \mathbb{Z} we have f^{(j)}(x)=(-1)^jf^{(j)}(\pi-x) following which f^{(j)}(\pi)=(-1)^jf^{(j)}(0) i.e. f^{(j)}(\pi) is also integral.

Now it follows by elementary calculus that \frac{d}{dx}(F'(x)\sin x-F(x)\cos x)=(F''(x)+F(x))\sin x=f(x)\sin x and so \int_0^\pi f(x)\sin x dx=(F'(x)\sin x-F(x)\cos x)_0^\pi=F(\pi)+F(0). Now F(\pi)+F(0) is integral because f^{(j)}(\pi) and f^{(j)}(0) are both integers for any j. It is also easy to see that for 0<x<\pi we have 0<f(x)<\frac{a^n\pi^n}{n!} and hence 0<f(x)\sin x<\frac{a^n\pi^n}{n!}. But now  0<\int_0^\pi f(x)\sin x dx\le\frac{\pi(a\pi)^n}{n!} and as n! grows faster then any exponential so this upper bound for the integral can be made arbitrarily small. Hence the value of the integral lies between 0 and 1. This is the contradiction which proves the result.\Box

One final word. Even though the proof is credited to Ivan Niven, it is very much possible that the idea was given by Hermite much earlier and all did Niven was to tweak Hermite’s idea to give this proof.

Leave a comment

Filed under Real Analysis

A geometric proof for the irrationality of √2

Tom Apostol in 2000 gave a geometric proof of the irrationality of 2, which for a bright student can even be considered as a “proof without words”:


(The figure has been taken from wikipedia)

The explanation is as follows: Suppose \sqrt{2} is rational and equals m/n where m,n\in\mathbb{N} with (m,n)=1. Consider an isoceles right \vartriangle ABC with legs of length n and hypotenuse m. Draw the blue arcs with center A. Observe that \vartriangle ABC is congruent to \vartriangle ADE by SAS, and furthermore note the various lengths given in the figure follow from this fact. Hence we have an even smaller right isosceles triangle \vartriangle CDF, with hypotenuse length 2n-m and legs m-n. The fact these values are integers even smaller than m and n and in the same ratio is clear from the figure and thus we contradict the hypothesis that (m,n)=1.

The above proof may be aptly summarized in one line:  If \sqrt{2}=m/n is in lowest terms then \sqrt{2}=(2n-m)/(m-n) is in lower terms. In this case however the definition of the square root is implicitly used to show that 2n-m<m and m-n<n.

Leave a comment

Filed under Miscellaneous

The Gauss triangle trick

This is a part of a series of proofs I plan to write here which seem essential for any student of mathematics. This series is inspired by the stackexchange post here, although I am not sure yet as to how many proofs from the list I will pick up.

Theorem: \sum_{k=1}^nk=\frac{n(n+1)}{2}.

Proof: Let S=1+2+\cdots n and write 2S as follows:

1+2+\cdots n +\\n+(n-1)+\cdots 1.

Now addition is associative, so adding each number in the top line with the number directly below it, we get 2S= (1+n) + (2+(n-1))+\cdots (n+1). Clearly there are n parenthesis with each summing to n+1 and so 2S=n(n+1). The  result follows.\Box

So why is this called the Gauss triangle trick. Well, the story goes that when Gauss was in preparatory school, Gauss’s teacher told the kids to add up all the numbers from 1 to 100. He probably thought that this will give him some time to rest and was surprised when in a few minutes Gauss came up with answer using this trick.

The numbers obtained as the various sums, eg 1,1+2=3,1+2+3=6 etc are called triangle numbers. A triangle number is so called because it is a count of the number of balls that can form a equilateral triangle.

Leave a comment

Filed under Miscellaneous

Bipartite graphs

The topic of this post is bipartite graphs. These are graphs whose vertex set can be partitioned into two parts, called partite sets, such that all edges xy of the graph are such that x and y are in different partite sets. The most common examples of bipartite graphs are the trees and even cycles. Any union of bipartite graphs obviously yields another bipartite graph. The complete bipartite graph K_{m,n} consists of two partite sets X and Y containing m and n elements respectively with all possible edges between X and Y filled out.

To conclude here is a list of characterizations.

Suppose G is a graph with at least two vertices. The following are equivalent:

1. G is bipartite.

2. Every cycle of G is even.

3. \chi(G)=2.

1 \Leftrightarrow 2: Suppose G is bipartite and there is a cycle v_1v_2\cdots v_kv_1 which has an odd number of edges. Now v_i for odd i is in the same partite set as v_1. But since the cycle is odd v_1 and v_k are in the same partite set which is a contradiction.

Conversely suppose G has no odd cycles and x\in V(G). Let X=\{y\in V(G):d(x,y) is even \} and Y=\{y\in V(G):d(x,y) is odd \}. Then X and Y partition V(G). Suppose pq\in E(G) and p,q\in X. Neither p nor q is x for otherwise d(x,q) (resp d(x,p)) is odd. So it makes sense to talk of a path from x to p (resp q). Let x=v_0v_1\cdots v_{2k-1}v_{2k}=p and x=w_0w_1\cdots w_{2m-1}w_{2m}=q be the shortest such paths. (Note that they are necessarily even). Let the last common vertex in the two paths be v_i=w_i. Now the cycle v_iv_{i+1}\cdots v_{2k}w_{2m}w_{2m-1}\cdots w_{i+1}w_i=v_i is an odd cycle. This contradiction proves there are no edges within vertices of X. Similarly there are no edges within vertices of Y. The result follows.

1 \Leftrightarrow 3: This is more of a restatement of the definition of a bipartite graph then a characterization, since the two partite sets may be considered as differently colored subsets of V(G).\Box

In addition to the above general characterizations the following are also true:

a) G is bipartite iff the spectrum of G is symmetric with respect to the origin.

b) Suppose G is connected and \lambda is its maximum eigenvalue. Then G is bipartite iff -\lambda is an eigenvalue of G.

c) Suppose G is planar. Then G is bipartite iff b(R), the bound degree of a region R, is even for every region R.

I will defer a proof of these statements.

Leave a comment

Filed under Combinatorics, Graph theory

The Cantor Set

This is a post regarding the basic properties of the Cantor set.

We start with the definition: The Cantor set is obtained by first constructing a sequence (C_n) of closed sets and then taking the intersection of the sets in this sequence. The sequence is constructed as follows:

1. Start with [0,1] and remove the middle third, i.e. the interval (\frac{1}{3},\frac{2}{3}) to obtain the set C_1=[0,\frac{1}{3}]\cup[\frac{2}{3},1]. So C_1 contains 2 disjoint closed intervals each of length \frac{1}{3}.

2. Next remove the middle third of each of these two intervals leaving C_2=[0,\frac{1}{9}]\cup[\frac{2}{9},\frac{1}{3}]\cup[\frac{2}{3},\frac{7}{9}]\cup[\frac{8}{9},1] consisting of 2^2 disjoint closed intervals each of length \frac{1}{3^2}.

3. Assuming C_n has been constructed and consists of 2^n disjoint closed intervals each of length \frac{1}{3^n}, remove the middle thirds of all these intervals to obtain C_{n+1} consisting of 2^{n+1} disjoint closed intervals each of length \frac{1}{3^{n+1}}.

By induction we have our sequence (C_n).

Now, we define the Cantor set C as C=\cap_{n=1}^\infty C_n.

We now describe the properties of the Cantor set:

1. In the usual topology on \mathbb{R} the Cantor set is closed (being an intersection of closed sets). Moreover since it is bounded so by the Heine Borel theorem it is compact. What’s extraordinary is that every compact metric space is a continuous image of the Cantor set! The proof may be found here.

2. The Cantor set is uncountable. We prove this by considering the ternary expansion of all numbers in [0,1]. All such numbers may have the form 0.a_1a_2\cdots with a_i\in\{0,1,2\} (including 1=0.22\cdots). We claim that x\in C iff x has a ternary expansion of the form 0.a_1a_2\cdots with a_i\in\{0,2\}.

To prove this consider the construction of C through the ternary lens: In ternary the construction of C_1 involves removing all numbers in (0.011\cdots,0.2) from [0.00\cdots,0.22\cdots]. This shows that in C_1 every number has a ternary expansion of the form 0.a_1a_2\cdots with a_1\in\{0,2\} and a_i\in\{0,1,2\}\forall i>1. Conversely every number of this form is in C_1. Likewise the construction of C_2 involves removing all numbers in (0.0011\cdots,0.02) and in (0.2011\cdots,0.22) from C_1. This shows that in C_2 every number has a ternary expansion of the form 0.a_1a_2\cdots with a_1,a_2\in\{0,2\} and a_i\in\{0,1,2\}\forall i>2. Again conversely every number of this form is in C_2. Continuing in this way we can conclude that x\in C_n iff x=0.a_1\cdots a_na_{n+1}a_{n+2}\cdots with a_1\cdots a_n\in\{0,2\} and a_i\in\{0,1,2\}\forall i>n. Hence by definition of C our claim is established.

Now assume that C is countable and has been enumerated in the list described below:


Here each d_i^{(j)}\in \{0,2\}.

Now let x=0.a_1a_2a_3\cdots where a_i=0 if d_i^{(i)}=2 and a_i=2 otherwise. Then x\notin C despite being of the requisite ternary form. Hence C is uncountable.

3. The Cantor set has Lebesgue measure zero. We define a set A to be of Lebesgue measure zero if \forall \epsilon>0\exists a sequence of intervals (I_n) such that A\subset\cup_{n=1}^\infty I_n and \sum_{n=1}^\infty l(I_n)<\epsilon. Here if the end points of I_n are a_n and b_n (with a_n\le b_n) then l(I_n)=b_n-a_n. (The proof that this definition is consistent with the Lebesgue measure may be found in this book.) Now given \epsilon>0 choose n so that (\frac{2}{3})^n<\epsilon. Since C\subset C_n and the total length of C_n is (\frac{2}{3})^n, so it is clear that C is a set of Lebesgue measure zero. Together with point 2, we see that this establishes the existence of uncountable sets of zero measure!

Leave a comment

Filed under Measure Theory, Real Analysis, Set theory

The axiom of choice

This post is about the axiom of choice, arguably the most famous and the most controversial of the axioms underpinning the set theoretic foundations of mathematics today. The axiom of choice is about making choices: choosing an element each from an arbitrary collection of nonempty sets. The axiom basically says that this can always be done, although it provides no recipe for doing so.

Before formally presenting the axiom we state some definitions. The treatment is essentially taken from here.

Definition: A function I from a set \Lambda onto a set X is said to index the set X by \Lambda. The set \Lambda is called the index and X is the indexed set. If I(\lambda)=x, we write x_\lambda for I(\lambda).

Definition: Let X be a nonempty set indexed by a set \Lambda. The cartesian product of X is defined to be the set \Pi _{\lambda\in\Lambda}x_\lambda of all functions f with domain \Lambda and codomain \cup X=\cup_{x\in X}x, satisfying the condition f(\lambda)\in x_\lambda.

Let us consider an example before we go further. We let X=\{\{a,b\},\{c,d\}\} be indexed by \Lambda=\{1,2\} where 1\to \{a,b\}, 2\to \{c,d\}. Now there are four functions possible from \Lambda to \{a,b,c,d\} which satisfy the definition:

The function f_1 for which f(1)=a, f(2)=c.
The function f_2 for which f(1)=b, f(2)=c.
The function f_3 for which f(1)=a, f(2)=d.
The function f_4 for which f(1)=b, f(2)=d.

The collection \{f_1,f_2,f_3,f_4\} is the cartesian product. Now intuitively the behavior of the function f_1 can be captured by simply writing (a,c) since the fact that the first coordinate corresponds to a and the second to c indicates the function 1\to a,2\to c. Similarly the behavior of the function f_2,f_3,f_4 are also captured by (b,c),(a,d),(b,d) respectively. Note that a particular ordered pair always corresponds to a unique function f_i. So the cartesian product may be simply represented as \{(a,c),(b,c),(a,d),(b,d)\}.

In fact if \Lambda has at most countable number of elements within it, we can always consider it as an subset of the form \{1,2,3\cdots\} (which may or may not terminate), and index appropriately. Then the cartesian product obtained can be equivalently thought of as a collection of n-tuples or infinite sequences where the ith coordinate corresponds to the image of i in the indexing. For example if X=\{\{0,1\}\} and \Lambda=\mathbb{N} then all zero-one sequences correspond to the requisite functions and their collection forms the cartesian product.

Definition: Any function f which satisfies the conditions required to make it a member of a cartesian product is called a choice function.

We now give three formulations of the axiom of choice:

Axiom of choice:

1. For every nonempty set whose elements are nonempty sets and are indexed by a set I there exists a choice function.

2. If \{a_i\} is a family of nonempty sets, indexed by a nonempty set I, then there exists a family \{x_i\} with i\in I such that x_i\in a_i for each i\in I. (This corresponds roughly to our informal comment about choices in the first paragraph.)

3. The cartesian product of a nonempty collection of nonempty sets is nonempty.

Theorem: The three formulations are equivalent.

Proof: 1\Rightarrow 2: Let \{a_i\}_{i\in I} be a family of sets, indexed by the nonempty set I. Let S=\{a_i\mid i\in I\}. (Note that the elements of S are themselves sets.) We index S by itself, and consider the cartesian product of S. By the assumption and by I\ne \emptyset, the set S is a nonempty set of nonempty sets. Hence owing to (1) there exists a choice function, i.e. a function f\colon S\to \bigcup S such that f(x)\in x for all x \in S. For i\in I let x_i=f(a_i). Then \{x_i\}_{i\in I} is a family with the required properties.

2\Rightarrow 1: Let S=\{a_i:i\in I\} be a nonempty set whose elements are themselves nonempty where S is indexed by I. By (2) we have a family x_i with x_i\in a_i. We define our choice function to be f:I\to\cup S where f(i)=x_i.

1\Leftrightarrow 3: This is obvious by the definitions.\Box

Due to this result any of these three statements may be referred to as the axiom of choice. In fact, there are many other equivalent statements. The term ZFC is usually used to denote the Zermelo Fraenkel axioms together with the axiom of choice, and these axioms form a basis of most of the mathematics done today.

Why is this axiom so relevant? Firstly if one refuses to accept this axiom, one loses a lot of interesting results. Secondly, many times assuming the axiom of choice is so much easier. Many results such as the Cantor-Berstein-Schroeder theorem were proved originally using the axiom of choice, although later a “choice-free” proof was discovered. Similarly the axiom of choice makes undergraduate analysis easier by enabling one to say that f(x) is continuous at x=a if f(a_n) tends to f(a) for each sequence (a_n) that tends to a.

Are there any negative issues with the axiom of choice? Well, firstly it is not constructive and doesn’t actually tell us the choices to make. It just guarantees that some choice exists. In one formulation it tells us that the real numbers can be well ordered but it gives us no recipe of doing so. In other words, it doesn’t actually construct a well order of the reals for us. Secondly, some counter intuitive results such as the Banach Tarsky paradox are true in ZFC and many mathematicians find this troubling and “fishy”.

Thirdly, and probably most importantly, there is history. After it was introduced (in another form) in 1904 by Zermelo, many mathematicians wondered whether or not the axiom was consistent with ZF. In other words people asked whether accepting it would lead to contradictions. Many decades passed before in 1938 Godel showed that ZFC is indeed consistent. Another doubt was whether the axiom of choice was independent of ZF. It was only by 1963 that Cohen proved that the axiom of choice cannot derived from ZF and so together with Godel’s result this implied that the axiom of choice is independent of ZF. During this long period from 1904 to 1963, many bizzare results such as the Banach Tarsky paradox, existence of non-Lebesgue measurable sets etc were found. All this sowed a suspicion in people’s minds as to whether this axiom was “false” at some fundamental level. As a result mathematicians began to pay close attention to results in which the axiom was used, and to try and device proofs which avoided the axiom as much as possible. This habit somewhat lessened after the independence results, but it has not been dropped by everyone even today.

The fact of the matter is that while rejecting the axiom of choice leads to many weird results, so does accepting it. See the chapters entitled “Disasters without Choice” and “Disasters with Choice” in this book for details. However in the long run, the advantages of accepting the axiom outweigh the advantages of rejecting it primarily because the sheer quantity of beautiful and aesthetic results flowing from the axiom of choice provide substance to mathematics as a whole.


Filed under Miscellaneous, Set theory