## Real numbers are uncountable

One of the slickest proofs of all time is using Georg Cantor‘s 1891 diagonal argument in proving that the real numbers $\mathbb{R}$ constitute an uncountable set, that is, they cannot be put in a bijective correspondence with $\mathbb{N}$. Strictly speaking, Cantor original paper only proved that the set of all $0,1$-sequences is uncountable. However since it is easy to argue that the set of all such sequences and $\mathbb{R}$ have the same cardinality, indirectly we get a proof for the uncountable nature of the reals. Moreover, the argument given by Cantor in his proof can be easily modified to give a direct proof for the uncountability of the reals. This is what we do presently.

Theorem: $\mathbb{R}$ is uncountable.

Proof: We will accept that every real number $x$ has a decimal expansion, $x=N.x_1x_2\cdots$ and it is uniquely determined by $x$ if one agrees never to terminate the expansion with an infinite string of $9$‘s. The proof is based on contradiction, and so we suppose that $\mathbb{R}$ is not uncountable. Then it must be that $\mathbb{R}$ is countable and we can make a list of all the elements of $\mathbb{R}$ along with their decimal expansion as follows:

$x_1=N_1.x_{11}x_{12}x_{13}\cdots\\x_2=N_2.x_{21}x_{22}x_{23}\cdots\\x_3=N_3.x_{31}x_{32}x_{33}\cdots\\\vdots$

We now consider the real number $0.y_1y_2y_3\cdots$ which is defined by $y_i=1$ if $x_{ii}\ne 1$ and $y_i=2$ otherwise. It is clear that this number differs from every $x_i$ in the list in at least the $i$‘th position. Hence this list is incomplete, a contradiction.$\Box$

Filed under Real Analysis, Set theory

## Niven’s proof for the irrationality of π

In this post I will present a beautiful proof owing to Ivan Niven on the irrationality of $\pi$. An analytical definition of $\pi$ is that $\pi$ is twice the smallest positive $x$ for which $\cos x$ equals $0$. (This definition is better then the usual circumference/diameter  one since it is not dependent on Euclidean axioms).

Theorem (Niven): $\pi$ is irrational.

Proof: Let, by way of contradiction $\pi=a/b$ with $a,b\in\mathbb{N}$. Now let

$f(x)=\frac{x^n(a-bx)^n}{n!}$ and

$F(x)=f(x)-f^{(2)}(x)+f^{(4)}(x)-\cdots(-1)^nf^{(2n)}(x)$

where we will specify $n$ later. Now $n!f(x)=a_0x^n+a_1x^{n+1}+\cdots a_nx^{2n}$ with $a_i\in\mathbb{Z}$ and so $f(0)=0$. Not only this, $f^{(j)}(0)=0$ for $1\le j\le n-1$. For $n\le j\le 2n$ we have $n!f^{(j)}(x)=n!b_0+b_1x+\cdots b_nx^n$ with $b_i\in\mathbb{Z}$ and so $f^{(j)}(0)$ is integral in this case as well. Also note that $f(x)=f(\pi-x)$ and so for any $j\ge 0,j\in \mathbb{Z}$ we have $f^{(j)}(x)=(-1)^jf^{(j)}(\pi-x)$ following which $f^{(j)}(\pi)=(-1)^jf^{(j)}(0)$ i.e. $f^{(j)}(\pi)$ is also integral.

Now it follows by elementary calculus that $\frac{d}{dx}(F'(x)\sin x-F(x)\cos x)=(F''(x)+F(x))\sin x=f(x)\sin x$ and so $\int_0^\pi f(x)\sin x dx=(F'(x)\sin x-F(x)\cos x)_0^\pi=F(\pi)+F(0)$. Now $F(\pi)+F(0)$ is integral because $f^{(j)}(\pi)$ and $f^{(j)}(0)$ are both integers for any $j$. It is also easy to see that for $0 we have $0 and hence $0. But now  $0<\int_0^\pi f(x)\sin x dx\le\frac{\pi(a\pi)^n}{n!}$ and as $n!$ grows faster then any exponential so this upper bound for the integral can be made arbitrarily small. Hence the value of the integral lies between $0$ and $1$. This is the contradiction which proves the result.$\Box$

One final word. Even though the proof is credited to Ivan Niven, it is very much possible that the idea was given by Hermite much earlier and all did Niven was to tweak Hermite’s idea to give this proof.

Filed under Real Analysis

## A geometric proof for the irrationality of √2

Tom Apostol in 2000 gave a geometric proof of the irrationality of 2, which for a bright student can even be considered as a “proof without words”:

(The figure has been taken from wikipedia)

The explanation is as follows: Suppose $\sqrt{2}$ is rational and equals $m/n$ where $m,n\in\mathbb{N}$ with $(m,n)=1$. Consider an isoceles right $\vartriangle ABC$ with legs of length $n$ and hypotenuse $m$. Draw the blue arcs with center $A$. Observe that $\vartriangle ABC$ is congruent to $\vartriangle ADE$ by SAS, and furthermore note the various lengths given in the figure follow from this fact. Hence we have an even smaller right isosceles triangle $\vartriangle CDF$, with hypotenuse length $2n-m$ and legs $m-n$. The fact these values are integers even smaller than $m$ and $n$ and in the same ratio is clear from the figure and thus we contradict the hypothesis that $(m,n)=1$.

The above proof may be aptly summarized in one line:  If $\sqrt{2}=m/n$ is in lowest terms then $\sqrt{2}=(2n-m)/(m-n)$ is in lower terms. In this case however the definition of the square root is implicitly used to show that $2n-m and $m-n.

Filed under Miscellaneous

## The Gauss triangle trick

This is a part of a series of proofs I plan to write here which seem essential for any student of mathematics. This series is inspired by the stackexchange post here, although I am not sure yet as to how many proofs from the list I will pick up.

Theorem: $\sum_{k=1}^nk=\frac{n(n+1)}{2}$.

Proof: Let $S=1+2+\cdots n$ and write $2S$ as follows:

$1+2+\cdots n +\\n+(n-1)+\cdots 1$.

Now addition is associative, so adding each number in the top line with the number directly below it, we get $2S= (1+n) + (2+(n-1))+\cdots (n+1)$. Clearly there are $n$ parenthesis with each summing to $n+1$ and so $2S=n(n+1)$. The  result follows.$\Box$

So why is this called the Gauss triangle trick. Well, the story goes that when Gauss was in preparatory school, Gauss’s teacher told the kids to add up all the numbers from $1$ to $100$. He probably thought that this will give him some time to rest and was surprised when in a few minutes Gauss came up with answer using this trick.

The numbers obtained as the various sums, eg $1,1+2=3,1+2+3=6$ etc are called triangle numbers. A triangle number is so called because it is a count of the number of balls that can form a equilateral triangle.

Filed under Miscellaneous

## Bipartite graphs

The topic of this post is bipartite graphs. These are graphs whose vertex set can be partitioned into two parts, called partite sets, such that all edges $xy$ of the graph are such that $x$ and $y$ are in different partite sets. The most common examples of bipartite graphs are the trees and even cycles. Any union of bipartite graphs obviously yields another bipartite graph. The complete bipartite graph $K_{m,n}$ consists of two partite sets $X$ and $Y$ containing $m$ and $n$ elements respectively with all possible edges between $X$ and $Y$ filled out.

To conclude here is a list of characterizations.

Suppose $G$ is a graph with at least two vertices. The following are equivalent:

1. $G$ is bipartite.

2. Every cycle of $G$ is even.

3. $\chi(G)=2$.

Proof:
$1 \Leftrightarrow 2$: Suppose $G$ is bipartite and there is a cycle $v_1v_2\cdots v_kv_1$ which has an odd number of edges. Now $v_i$ for odd $i$ is in the same partite set as $v_1$. But since the cycle is odd $v_1$ and $v_k$ are in the same partite set which is a contradiction.

Conversely suppose $G$ has no odd cycles and $x\in V(G)$. Let $X=\{y\in V(G):d(x,y)$ is even $\}$ and $Y=\{y\in V(G):d(x,y)$ is odd $\}$. Then $X$ and $Y$ partition $V(G)$. Suppose $pq\in E(G)$ and $p,q\in X$. Neither $p$ nor $q$ is $x$ for otherwise $d(x,q)$ (resp $d(x,p)$) is odd. So it makes sense to talk of a path from $x$ to $p$ (resp $q$). Let $x=v_0v_1\cdots v_{2k-1}v_{2k}=p$ and $x=w_0w_1\cdots w_{2m-1}w_{2m}=q$ be the shortest such paths. (Note that they are necessarily even). Let the last common vertex in the two paths be $v_i=w_i$. Now the cycle $v_iv_{i+1}\cdots v_{2k}w_{2m}w_{2m-1}\cdots w_{i+1}w_i=v_i$ is an odd cycle. This contradiction proves there are no edges within vertices of $X$. Similarly there are no edges within vertices of $Y$. The result follows.

$1 \Leftrightarrow 3$: This is more of a restatement of the definition of a bipartite graph then a characterization, since the two partite sets may be considered as differently colored subsets of $V(G)$.$\Box$

In addition to the above general characterizations the following are also true:

a) G is bipartite iff the spectrum of $G$ is symmetric with respect to the origin.

b) Suppose $G$ is connected and $\lambda$ is its maximum eigenvalue. Then $G$ is bipartite iff $-\lambda$ is an eigenvalue of $G$.

c) Suppose $G$ is planar. Then $G$ is bipartite iff $b(R)$, the bound degree of a region $R$, is even for every region $R$.

I will defer a proof of these statements.

Filed under Combinatorics, Graph theory

## The Cantor Set

This is a post regarding the basic properties of the Cantor set.

We start with the definition: The Cantor set is obtained by first constructing a sequence $(C_n)$ of closed sets and then taking the intersection of the sets in this sequence. The sequence is constructed as follows:

1. Start with $[0,1]$ and remove the middle third, i.e. the interval $(\frac{1}{3},\frac{2}{3})$ to obtain the set $C_1=[0,\frac{1}{3}]\cup[\frac{2}{3},1]$. So $C_1$ contains $2$ disjoint closed intervals each of length $\frac{1}{3}$.

2. Next remove the middle third of each of these two intervals leaving $C_2=[0,\frac{1}{9}]\cup[\frac{2}{9},\frac{1}{3}]\cup[\frac{2}{3},\frac{7}{9}]\cup[\frac{8}{9},1]$ consisting of $2^2$ disjoint closed intervals each of length $\frac{1}{3^2}$.

3. Assuming $C_n$ has been constructed and consists of $2^n$ disjoint closed intervals each of length $\frac{1}{3^n}$, remove the middle thirds of all these intervals to obtain $C_{n+1}$ consisting of $2^{n+1}$ disjoint closed intervals each of length $\frac{1}{3^{n+1}}$.

By induction we have our sequence $(C_n)$.

Now, we define the Cantor set $C$ as $C=\cap_{n=1}^\infty C_n$.

We now describe the properties of the Cantor set:

1. In the usual topology on $\mathbb{R}$ the Cantor set is closed (being an intersection of closed sets). Moreover since it is bounded so by the Heine Borel theorem it is compact. What’s extraordinary is that every compact metric space is a continuous image of the Cantor set! The proof may be found here.

2. The Cantor set is uncountable. We prove this by considering the ternary expansion of all numbers in $[0,1]$. All such numbers may have the form $0.a_1a_2\cdots$ with $a_i\in\{0,1,2\}$ (including $1=0.22\cdots$). We claim that $x\in C$ iff $x$ has a ternary expansion of the form $0.a_1a_2\cdots$ with $a_i\in\{0,2\}$.

To prove this consider the construction of $C$ through the ternary lens: In ternary the construction of $C_1$ involves removing all numbers in $(0.011\cdots,0.2)$ from $[0.00\cdots,0.22\cdots]$. This shows that in $C_1$ every number has a ternary expansion of the form $0.a_1a_2\cdots$ with $a_1\in\{0,2\}$ and $a_i\in\{0,1,2\}\forall i>1$. Conversely every number of this form is in $C_1$. Likewise the construction of $C_2$ involves removing all numbers in $(0.0011\cdots,0.02)$ and in $(0.2011\cdots,0.22)$ from $C_1$. This shows that in $C_2$ every number has a ternary expansion of the form $0.a_1a_2\cdots$ with $a_1,a_2\in\{0,2\}$ and $a_i\in\{0,1,2\}\forall i>2$. Again conversely every number of this form is in $C_2$. Continuing in this way we can conclude that $x\in C_n$ iff $x=0.a_1\cdots a_na_{n+1}a_{n+2}\cdots$ with $a_1\cdots a_n\in\{0,2\}$ and $a_i\in\{0,1,2\}\forall i>n$. Hence by definition of $C$ our claim is established.

Now assume that $C$ is countable and has been enumerated in the list described below:

$0.d_1^{(1)}d_2^{(1)}d_3^{(1)}\cdots$
$0.d_1^{(2)}d_2^{(2)}d_3^{(2)}\cdots$
$0.d_1^{(3)}d_2^{(3)}d_3^{(3)}\cdots$
$\cdots$

Here each $d_i^{(j)}\in \{0,2\}$.

Now let $x=0.a_1a_2a_3\cdots$ where $a_i=0$ if $d_i^{(i)}=2$ and $a_i=2$ otherwise. Then $x\notin C$ despite being of the requisite ternary form. Hence $C$ is uncountable.

3. The Cantor set has Lebesgue measure zero. We define a set $A$ to be of Lebesgue measure zero if $\forall \epsilon>0\exists$ a sequence of intervals $(I_n)$ such that $A\subset\cup_{n=1}^\infty I_n$ and $\sum_{n=1}^\infty l(I_n)<\epsilon$. Here if the end points of $I_n$ are $a_n$ and $b_n$ (with $a_n\le b_n)$ then $l(I_n)=b_n-a_n$. (The proof that this definition is consistent with the Lebesgue measure may be found in this book.) Now given $\epsilon>0$ choose $n$ so that $(\frac{2}{3})^n<\epsilon$. Since $C\subset C_n$ and the total length of $C_n$ is $(\frac{2}{3})^n$, so it is clear that $C$ is a set of Lebesgue measure zero. Together with point 2, we see that this establishes the existence of uncountable sets of zero measure!

Filed under Measure Theory, Real Analysis, Set theory

## The axiom of choice

This post is about the axiom of choice, arguably the most famous and the most controversial of the axioms underpinning the set theoretic foundations of mathematics today. The axiom of choice is about making choices: choosing an element each from an arbitrary collection of nonempty sets. The axiom basically says that this can always be done, although it provides no recipe for doing so.

Before formally presenting the axiom we state some definitions. The treatment is essentially taken from here.

Definition: A function $I$ from a set $\Lambda$ onto a set $X$ is said to index the set $X$ by $\Lambda$. The set $\Lambda$ is called the index and $X$ is the indexed set. If $I(\lambda)=x$, we write $x_\lambda$ for $I(\lambda)$.

Definition: Let $X$ be a nonempty set indexed by a set $\Lambda$. The cartesian product of $X$ is defined to be the set $\Pi _{\lambda\in\Lambda}x_\lambda$ of all functions $f$ with domain $\Lambda$ and codomain $\cup X=\cup_{x\in X}x$, satisfying the condition $f(\lambda)\in x_\lambda$.

Let us consider an example before we go further. We let $X=\{\{a,b\},\{c,d\}\}$ be indexed by $\Lambda=\{1,2\}$ where $1\to \{a,b\}, 2\to \{c,d\}$. Now there are four functions possible from $\Lambda$ to $\{a,b,c,d\}$ which satisfy the definition:

The function $f_1$ for which $f(1)=a, f(2)=c$.
The function $f_2$ for which $f(1)=b, f(2)=c$.
The function $f_3$ for which $f(1)=a, f(2)=d$.
The function $f_4$ for which $f(1)=b, f(2)=d$.

The collection $\{f_1,f_2,f_3,f_4\}$ is the cartesian product. Now intuitively the behavior of the function $f_1$ can be captured by simply writing $(a,c)$ since the fact that the first coordinate corresponds to $a$ and the second to $c$ indicates the function $1\to a,2\to c$. Similarly the behavior of the function $f_2,f_3,f_4$ are also captured by $(b,c),(a,d),(b,d)$ respectively. Note that a particular ordered pair always corresponds to a unique function $f_i$. So the cartesian product may be simply represented as $\{(a,c),(b,c),(a,d),(b,d)\}$.

In fact if $\Lambda$ has at most countable number of elements within it, we can always consider it as an subset of the form $\{1,2,3\cdots\}$ (which may or may not terminate), and index appropriately. Then the cartesian product obtained can be equivalently thought of as a collection of n-tuples or infinite sequences where the ith coordinate corresponds to the image of $i$ in the indexing. For example if $X=\{\{0,1\}\}$ and $\Lambda=\mathbb{N}$ then all zero-one sequences correspond to the requisite functions and their collection forms the cartesian product.

Definition: Any function $f$ which satisfies the conditions required to make it a member of a cartesian product is called a choice function.

We now give three formulations of the axiom of choice:

Axiom of choice:

1. For every nonempty set whose elements are nonempty sets and are indexed by a set $I$ there exists a choice function.

2. If $\{a_i\}$ is a family of nonempty sets, indexed by a nonempty set $I$, then there exists a family $\{x_i\}$ with $i\in I$ such that $x_i\in a_i$ for each $i\in I$. (This corresponds roughly to our informal comment about choices in the first paragraph.)

3. The cartesian product of a nonempty collection of nonempty sets is nonempty.

Theorem: The three formulations are equivalent.

Proof: $1\Rightarrow 2$: Let $\{a_i\}_{i\in I}$ be a family of sets, indexed by the nonempty set $I$. Let $S=\{a_i\mid i\in I\}$. (Note that the elements of $S$ are themselves sets.) We index $S$ by itself, and consider the cartesian product of $S$. By the assumption and by $I\ne \emptyset$, the set $S$ is a nonempty set of nonempty sets. Hence owing to (1) there exists a choice function, i.e. a function $f\colon S\to \bigcup S$ such that $f(x)\in x$ for all $x \in S$. For $i\in I$ let $x_i=f(a_i)$. Then $\{x_i\}_{i\in I}$ is a family with the required properties.

$2\Rightarrow 1$: Let $S=\{a_i:i\in I\}$ be a nonempty set whose elements are themselves nonempty where $S$ is indexed by $I$. By (2) we have a family $x_i$ with $x_i\in a_i$. We define our choice function to be $f:I\to\cup S$ where $f(i)=x_i$.

$1\Leftrightarrow 3$: This is obvious by the definitions.$\Box$

Due to this result any of these three statements may be referred to as the axiom of choice. In fact, there are many other equivalent statements. The term ZFC is usually used to denote the Zermelo Fraenkel axioms together with the axiom of choice, and these axioms form a basis of most of the mathematics done today.

Why is this axiom so relevant? Firstly if one refuses to accept this axiom, one loses a lot of interesting results. Secondly, many times assuming the axiom of choice is so much easier. Many results such as the Cantor-Berstein-Schroeder theorem were proved originally using the axiom of choice, although later a “choice-free” proof was discovered. Similarly the axiom of choice makes undergraduate analysis easier by enabling one to say that $f(x)$ is continuous at $x=a$ if $f(a_n)$ tends to $f(a)$ for each sequence $(a_n)$ that tends to $a$.

Are there any negative issues with the axiom of choice? Well, firstly it is not constructive and doesn’t actually tell us the choices to make. It just guarantees that some choice exists. In one formulation it tells us that the real numbers can be well ordered but it gives us no recipe of doing so. In other words, it doesn’t actually construct a well order of the reals for us. Secondly, some counter intuitive results such as the Banach Tarsky paradox are true in ZFC and many mathematicians find this troubling and “fishy”.

Thirdly, and probably most importantly, there is history. After it was introduced (in another form) in 1904 by Zermelo, many mathematicians wondered whether or not the axiom was consistent with ZF. In other words people asked whether accepting it would lead to contradictions. Many decades passed before in 1938 Godel showed that ZFC is indeed consistent. Another doubt was whether the axiom of choice was independent of ZF. It was only by 1963 that Cohen proved that the axiom of choice cannot derived from ZF and so together with Godel’s result this implied that the axiom of choice is independent of ZF. During this long period from 1904 to 1963, many bizzare results such as the Banach Tarsky paradox, existence of non-Lebesgue measurable sets etc were found. All this sowed a suspicion in people’s minds as to whether this axiom was “false” at some fundamental level. As a result mathematicians began to pay close attention to results in which the axiom was used, and to try and device proofs which avoided the axiom as much as possible. This habit somewhat lessened after the independence results, but it has not been dropped by everyone even today.

The fact of the matter is that while rejecting the axiom of choice leads to many weird results, so does accepting it. See the chapters entitled “Disasters without Choice” and “Disasters with Choice” in this book for details. However in the long run, the advantages of accepting the axiom outweigh the advantages of rejecting it primarily because the sheer quantity of beautiful and aesthetic results flowing from the axiom of choice provide substance to mathematics as a whole.