Category Archives: Set theory

The Inclusion Exclusion Principle

The Inclusion Exclusion Principle is one of most fundamental results in combinatorics. It deals with counting through a seive technique: i.e. we first overcount the quantity to be enumerated, then we try to balance out the overcounting by subtracting the extra. We may or may not subtract more then what is needed and so we count again the extra bits. Continuing in this way we “converge” at the correct count.

An easy example is |A\cup B\cup C|=|A|+|B|+|C|-|A\cap B|-|A\cap C|-|B\cap C|+|A\cap B\cap C| where A,B,C are three finite sets.

Here we start off by summing the number of elements in A,B,C seperately. The overcount is compensated by subtracting the number of elements in |A\cap B|,|A\cap C|, |B\cap C|. We actually compensate for a bit more then needed and so to arrive at the correct count we must add |A\cap B\cap C|.

Our goal here is to prove the inclusion-exclusion principle and then to look at a few corollaries. We first establish two lemmas.

Lemma 1: If X is any set containing n elements and k is any field, then the set of all functions f:X\to k is an n-dimensional vector space V over k with the naturally induced operations.

Proof: Let X=\{x_1,\cdots,x_n\}. It is easy to see that V together with the operations f+g, defined by (f+g)(x_i)=f(x_i)+g(x_i) and (\alpha f)(x_i)=\alpha (f(x_i)) for all x_i\in X is a vector space.

We now exhibit a basis for V consisting of n elements. For all x_i\in X let f_{x_i} be the function defined as f_{x_i}(y)=\begin{cases}         \hfill 1    \hfill & \text{ if }y=x_i \\        \hfill 0 \hfill & \text{ otherwise} \\    \end{cases}. Now, we claim that B=\{f_{x_i}:x_i\in X\} is a basis. Clearly it is spanning as for any f\in V we have f=\sum_{x_i\in X}f(x_i)f_{x_i}. It is linearly independent as \sum_{x_i\in X} \alpha_i f_{x_i}=0 means that for any j, with 1\le j\le n, we have \sum_{x_i\in X} \alpha_i f_{x_i}(x_j)=0, i.e. \alpha_j=0. This completes the proof.\Box

Lemma 2: If m\ge 0 then \sum_{i \mathop = 0}^m \left({-1}\right)^i \binom m i = \delta_{0m}.

Proof: If m>0 put x=1, y=-1 in the binomial theorem (x+y)^m=\sum_{i=0}^mx^{m-i}y^i. If m=0, then the sum on the left reduces to only one term: (-1)^0\binom 0 0. This is clearly 1.\Box

We now prove the Inclusion Exclusion principle. This theorem in its purest form, is simply a formula for an inverse of a linear operator. The theorem is as follows:

Theorem 3: Let S be a set with n elements. Let V be the 2^n dimensional vector space of all functions f:\mathcal{P}(S)\to k, where k is some field and \mathcal{P}(S) is the power set of S. Let \phi:V\to V be the linear operator defined by

\displaystyle\phi f(T)=\sum_{Y\supseteq T}f(Y)\text{ for all } T\subseteq S.

Then \phi^{-1} exists and is given by:

\displaystyle\phi^{-1} f(T)=\sum_{Y\supseteq T}(-1)^{|Y-T|}f(Y)\text{ for all } T\subseteq S.

Proof: To show \phi^{-1} as given above is indeed the inverse it suffices to show that \phi^{-1}\phi(f)=f for all f\in V.

Let f\in V. Then,

\displaystyle\phi^{-1}\phi f(T)=\sum_{Y\supseteq T}(-1)^{|Y-T|}\phi f(Y)=\sum_{Y\supseteq T}(-1)^{|Y-T|}\sum_{Z\supseteq Y}f(Z)=\sum_{Z\supseteq T}\Big(\sum_{Z\supseteq Y\supseteq T}(-1)^{|Y-T|}\Big)f(Z)

Now fix Z,T and let m=|Z-T|. Consider \displaystyle\sum_{Z\supseteq Y\supseteq T}(-1)^{|Y-T|}. Any Y is obtained by choosing some elements out of Z-T which has m elements, and taking the union of such elements with T. So for every i, with 0\le i\le m, there are exactly \binom m i ways of choosing a Y, which has i+|T| elements. Any such Y also has i elements in |Y-T|. So \displaystyle\sum_{Z\supseteq Y\supseteq T}(-1)^{|Y-T|}=\sum_{i=0}^m(-1)^i\binom m i=\delta_{0m}. This when substituted in the expression for \phi^{-1}\phi f(T) shows that \phi^{-1}\phi f(T)=f(T), which proves the theorem.\Box

We now discuss some corollaries of the Inclusion Exclusion principle. Let S be a set of properties that elements of a given set A may or may not have. For any T\subseteq S, let A'_T\subseteq A be those elements which have exactly the properties in T and no others. We define a function f_=:S\to \mathbb{R} such that f_=(T)=|A'_T|. Similarly, for any T\subseteq S, let A''_T\subseteq A be those elements which have at least the properties in T. We define a function f_\ge:S\to \mathbb{R} such that f_=(T)=|A''_T|. It is clear that \displaystyle f_\ge(T)=\sum_{Y\supseteq T}f_=(T) for any T\subseteq S, and so by the Inclusion Exclusion principle we conclude that

Corollary 4: For any T\subseteq S, we have \displaystyle f_=(T)=\sum_{Y\supseteq T}(-1)^{|Y-T|}f_\ge (Y).

In particular we have \displaystyle f_=(\emptyset)=\sum_{Y\subseteq S}(-1)^{|Y|}f_\ge (Y), which gives us a formula for the number of elements having none of the properties.\Box

In the above corollary we think of f_\ge(T) as the first approximation to f_=(T). So we “include” that much “quantity” in our initial count. From this we subtract all terms of the type f_=(Y) where Y has just one extra element then T. Thus we “exclude” that much “quantity” from our count. This gives a better approximation. Next we add all terms of the type f_=(Y) where Y has two extra elements then T, and so on. This is the reason behind the terminology inclusion-exclusion.

We now discuss another corollary. Let A be a finite set and let A_1,\cdots A_n be some of its subsets. We define a set of properties S=\{P_1,\cdots,P_n\} which elements of A may or may not enjoy as follows: For any i, with 1\le i\le n, x\in A satisfies the property P_i if and only if x\in A_i. Also for any I\subseteq [n], let \emptyset\ne A_I=\cap_{i\in I}A_i be the set of elements which have at least the properties P_i for i\in I. Define A_\emptyset to be A. By Corollary 4, \displaystyle f_=(\emptyset)=\sum_{I'\subseteq S}(-1)^{|I'|}f_\ge (I')=\sum_{I\subseteq [n]}(-1)^{|I|}|A_I| where in the second equality we correspond each subset of properties with a subset of [n]. We summarize this as

Corollary 5: Let A be a finite set and let A_1,\cdots A_n be some of its subsets. For any \emptyset\ne I\subseteq [n], let A_I=\cap_{i\in I}A_i and let A_\emptyset=A. The number of elements in A-\cup_{i=1}^n A_i is given by \displaystyle\sum_{I\subseteq [n]}(-1)^{|I|}|A_I|=|A|+\sum_{\emptyset\ne I\subseteq [n]}(-1)^{|I|}|A_I|.\Box

A further special case is obtained by considering any finite sets A_1,A_2,\cdots,A_n and letting A=\cup_{i=1}^nA_i. Then the above corollary translates to \sum_{I\subseteq [n]}(-1)^{|I|}|A_I|=0. Considering the case of I=\emptyset seperately, we see that |A|+\displaystyle\sum_{\emptyset\ne I\subseteq [n]}(-1)^{|I|}|\cap_{i\in I}A_i|=0. This easily yields the following corollary.

Corollary 6: Let A_1,A_2\cdots,A_n be any finite sets. For any \emptyset\ne I\subseteq [n], let A_I=\cap_{i\in I}A_i and let A_\emptyset=A. Then \displaystyle|A_1\cup A_2\cup\cdots\cup A_n|=\displaystyle\sum_{\emptyset\ne I\subseteq [n]}(-1)^{|I|-1}|\cap_{i\in I}A_i|.\Box

Now by grouping terms involving the same size of I we can restate both Corollary 5 and 6 as follows.

Corollary 7: Let A be a finite set and let A_1,\cdots A_n be some of its subsets. The number of elements in A-\cup_{i=1}^n A_i is given by
\displaystyle |A|+\sum_{k=1}^n(-1)^{k}\sum_{1\le i_1 < i_2 < \cdots <i_k\le n}|A_{i_1}\cap A_{i_2}\cap\cdots\cap A_{i_k}|.\Box

Corollary 8: If A_1,A_2\cdots,A_n are any finite sets then

\displaystyle|A_1\cup A_2\cup\cdots\cup A_n|=\sum_{k=1}^n(-1)^{k-1}\sum_{1\le i_1< i_2<\cdots<i_k\le n}|A_{i_1}\cap A_{i_2}\cap\cdots\cap A_{i_k}|.\Box

Corollaries 5 to 8 are often referred to as the principle of inclusion-exclusion themselves as in combinatorial settings they are the ones most often used. A further simplified version can also be derived from them when the intersection of any k distinct sets A_i always has the same cardinality N_k. In that case we only need to multiply N_k with the number of such ways to select the k sets to get the value of the inner sums in Corollaries 7 and 8.

Corollary 9: Let A be a finite set and let A_1,\cdots A_n be some of its subsets. Suppose that for any k with 1\le k\le n there exists a natural number N_k so that for any \{i_1,\cdots, i_k\}\subseteq [n] we have A_{i_1}\cap\cdots\cap A_{i_k}=N_k. Then the number of elements in A-\cup_{i=1}^n A_i is given by |A|+\sum_{k=1}^n(-1)^k\binom{n}{k}N_k and |A_1\cup A_2\cup\cdots\cup A_n|=\sum_{k=1}^n(-1)^{k-1}\binom{n}{k}N_k.\Box

Leave a comment

Filed under Combinatorics, Set theory

Real numbers are uncountable

One of the slickest proofs of all time is using Georg Cantor‘s 1891 diagonal argument in proving that the real numbers \mathbb{R}  constitute an uncountable set, that is, they cannot be put in a bijective correspondence with \mathbb{N}. Strictly speaking, Cantor original paper only proved that the set of all 0,1-sequences is uncountable. However since it is easy to argue that the set of all such sequences and \mathbb{R} have the same cardinality, indirectly we get a proof for the uncountable nature of the reals. Moreover, the argument given by Cantor in his proof can be easily modified to give a direct proof for the uncountability of the reals. This is what we do presently.

Theorem: \mathbb{R} is uncountable.

Proof: We will accept that every real number x has a decimal expansion, x=N.x_1x_2\cdots and it is uniquely determined by x if one agrees never to terminate the expansion with an infinite string of 9‘s. The proof is based on contradiction, and so we suppose that \mathbb{R} is not uncountable. Then it must be that \mathbb{R} is countable and we can make a list of all the elements of \mathbb{R} along with their decimal expansion as follows:


We now consider the real number 0.y_1y_2y_3\cdots which is defined by y_i=1 if x_{ii}\ne 1 and y_i=2 otherwise. It is clear that this number differs from every x_i in the list in at least the i‘th position. Hence this list is incomplete, a contradiction.\Box

Leave a comment

Filed under Real Analysis, Set theory

The Cantor Set

This is a post regarding the basic properties of the Cantor set.

We start with the definition: The Cantor set is obtained by first constructing a sequence (C_n) of closed sets and then taking the intersection of the sets in this sequence. The sequence is constructed as follows:

1. Start with [0,1] and remove the middle third, i.e. the interval (\frac{1}{3},\frac{2}{3}) to obtain the set C_1=[0,\frac{1}{3}]\cup[\frac{2}{3},1]. So C_1 contains 2 disjoint closed intervals each of length \frac{1}{3}.

2. Next remove the middle third of each of these two intervals leaving C_2=[0,\frac{1}{9}]\cup[\frac{2}{9},\frac{1}{3}]\cup[\frac{2}{3},\frac{7}{9}]\cup[\frac{8}{9},1] consisting of 2^2 disjoint closed intervals each of length \frac{1}{3^2}.

3. Assuming C_n has been constructed and consists of 2^n disjoint closed intervals each of length \frac{1}{3^n}, remove the middle thirds of all these intervals to obtain C_{n+1} consisting of 2^{n+1} disjoint closed intervals each of length \frac{1}{3^{n+1}}.

By induction we have our sequence (C_n).

Now, we define the Cantor set C as C=\cap_{n=1}^\infty C_n.

We now describe the properties of the Cantor set:

1. In the usual topology on \mathbb{R} the Cantor set is closed (being an intersection of closed sets). Moreover since it is bounded so by the Heine Borel theorem it is compact. What’s extraordinary is that every compact metric space is a continuous image of the Cantor set! The proof may be found here.

2. The Cantor set is uncountable. We prove this by considering the ternary expansion of all numbers in [0,1]. All such numbers may have the form 0.a_1a_2\cdots with a_i\in\{0,1,2\} (including 1=0.22\cdots). We claim that x\in C iff x has a ternary expansion of the form 0.a_1a_2\cdots with a_i\in\{0,2\}.

To prove this consider the construction of C through the ternary lens: In ternary the construction of C_1 involves removing all numbers in (0.011\cdots,0.2) from [0.00\cdots,0.22\cdots]. This shows that in C_1 every number has a ternary expansion of the form 0.a_1a_2\cdots with a_1\in\{0,2\} and a_i\in\{0,1,2\}\forall i>1. Conversely every number of this form is in C_1. Likewise the construction of C_2 involves removing all numbers in (0.0011\cdots,0.02) and in (0.2011\cdots,0.22) from C_1. This shows that in C_2 every number has a ternary expansion of the form 0.a_1a_2\cdots with a_1,a_2\in\{0,2\} and a_i\in\{0,1,2\}\forall i>2. Again conversely every number of this form is in C_2. Continuing in this way we can conclude that x\in C_n iff x=0.a_1\cdots a_na_{n+1}a_{n+2}\cdots with a_1\cdots a_n\in\{0,2\} and a_i\in\{0,1,2\}\forall i>n. Hence by definition of C our claim is established.

Now assume that C is countable and has been enumerated in the list described below:


Here each d_i^{(j)}\in \{0,2\}.

Now let x=0.a_1a_2a_3\cdots where a_i=0 if d_i^{(i)}=2 and a_i=2 otherwise. Then x\notin C despite being of the requisite ternary form. Hence C is uncountable.

3. The Cantor set has Lebesgue measure zero. We define a set A to be of Lebesgue measure zero if \forall \epsilon>0\exists a sequence of intervals (I_n) such that A\subset\cup_{n=1}^\infty I_n and \sum_{n=1}^\infty l(I_n)<\epsilon. Here if the end points of I_n are a_n and b_n (with a_n\le b_n) then l(I_n)=b_n-a_n. (The proof that this definition is consistent with the Lebesgue measure may be found in this book.) Now given \epsilon>0 choose n so that (\frac{2}{3})^n<\epsilon. Since C\subset C_n and the total length of C_n is (\frac{2}{3})^n, so it is clear that C is a set of Lebesgue measure zero. Together with point 2, we see that this establishes the existence of uncountable sets of zero measure!

Leave a comment

Filed under Measure Theory, Real Analysis, Set theory

The axiom of choice

This post is about the axiom of choice, arguably the most famous and the most controversial of the axioms underpinning the set theoretic foundations of mathematics today. The axiom of choice is about making choices: choosing an element each from an arbitrary collection of nonempty sets. The axiom basically says that this can always be done, although it provides no recipe for doing so.

Before formally presenting the axiom we state some definitions. The treatment is essentially taken from here.

Definition: A function I from a set \Lambda onto a set X is said to index the set X by \Lambda. The set \Lambda is called the index and X is the indexed set. If I(\lambda)=x, we write x_\lambda for I(\lambda).

Definition: Let X be a nonempty set indexed by a set \Lambda. The cartesian product of X is defined to be the set \Pi _{\lambda\in\Lambda}x_\lambda of all functions f with domain \Lambda and codomain \cup X=\cup_{x\in X}x, satisfying the condition f(\lambda)\in x_\lambda.

Let us consider an example before we go further. We let X=\{\{a,b\},\{c,d\}\} be indexed by \Lambda=\{1,2\} where 1\to \{a,b\}, 2\to \{c,d\}. Now there are four functions possible from \Lambda to \{a,b,c,d\} which satisfy the definition:

The function f_1 for which f(1)=a, f(2)=c.
The function f_2 for which f(1)=b, f(2)=c.
The function f_3 for which f(1)=a, f(2)=d.
The function f_4 for which f(1)=b, f(2)=d.

The collection \{f_1,f_2,f_3,f_4\} is the cartesian product. Now intuitively the behavior of the function f_1 can be captured by simply writing (a,c) since the fact that the first coordinate corresponds to a and the second to c indicates the function 1\to a,2\to c. Similarly the behavior of the function f_2,f_3,f_4 are also captured by (b,c),(a,d),(b,d) respectively. Note that a particular ordered pair always corresponds to a unique function f_i. So the cartesian product may be simply represented as \{(a,c),(b,c),(a,d),(b,d)\}.

In fact if \Lambda has at most countable number of elements within it, we can always consider it as an subset of the form \{1,2,3\cdots\} (which may or may not terminate), and index appropriately. Then the cartesian product obtained can be equivalently thought of as a collection of n-tuples or infinite sequences where the ith coordinate corresponds to the image of i in the indexing. For example if X=\{\{0,1\}\} and \Lambda=\mathbb{N} then all zero-one sequences correspond to the requisite functions and their collection forms the cartesian product.

Definition: Any function f which satisfies the conditions required to make it a member of a cartesian product is called a choice function.

We now give three formulations of the axiom of choice:

Axiom of choice:

1. For every nonempty set whose elements are nonempty sets and are indexed by a set I there exists a choice function.

2. If \{a_i\} is a family of nonempty sets, indexed by a nonempty set I, then there exists a family \{x_i\} with i\in I such that x_i\in a_i for each i\in I. (This corresponds roughly to our informal comment about choices in the first paragraph.)

3. The cartesian product of a nonempty collection of nonempty sets is nonempty.

Theorem: The three formulations are equivalent.

Proof: 1\Rightarrow 2: Let \{a_i\}_{i\in I} be a family of sets, indexed by the nonempty set I. Let S=\{a_i\mid i\in I\}. (Note that the elements of S are themselves sets.) We index S by itself, and consider the cartesian product of S. By the assumption and by I\ne \emptyset, the set S is a nonempty set of nonempty sets. Hence owing to (1) there exists a choice function, i.e. a function f\colon S\to \bigcup S such that f(x)\in x for all x \in S. For i\in I let x_i=f(a_i). Then \{x_i\}_{i\in I} is a family with the required properties.

2\Rightarrow 1: Let S=\{a_i:i\in I\} be a nonempty set whose elements are themselves nonempty where S is indexed by I. By (2) we have a family x_i with x_i\in a_i. We define our choice function to be f:I\to\cup S where f(i)=x_i.

1\Leftrightarrow 3: This is obvious by the definitions.\Box

Due to this result any of these three statements may be referred to as the axiom of choice. In fact, there are many other equivalent statements. The term ZFC is usually used to denote the Zermelo Fraenkel axioms together with the axiom of choice, and these axioms form a basis of most of the mathematics done today.

Why is this axiom so relevant? Firstly if one refuses to accept this axiom, one loses a lot of interesting results. Secondly, many times assuming the axiom of choice is so much easier. Many results such as the Cantor-Berstein-Schroeder theorem were proved originally using the axiom of choice, although later a “choice-free” proof was discovered. Similarly the axiom of choice makes undergraduate analysis easier by enabling one to say that f(x) is continuous at x=a if f(a_n) tends to f(a) for each sequence (a_n) that tends to a.

Are there any negative issues with the axiom of choice? Well, firstly it is not constructive and doesn’t actually tell us the choices to make. It just guarantees that some choice exists. In one formulation it tells us that the real numbers can be well ordered but it gives us no recipe of doing so. In other words, it doesn’t actually construct a well order of the reals for us. Secondly, some counter intuitive results such as the Banach Tarsky paradox are true in ZFC and many mathematicians find this troubling and “fishy”.

Thirdly, and probably most importantly, there is history. After it was introduced (in another form) in 1904 by Zermelo, many mathematicians wondered whether or not the axiom was consistent with ZF. In other words people asked whether accepting it would lead to contradictions. Many decades passed before in 1938 Godel showed that ZFC is indeed consistent. Another doubt was whether the axiom of choice was independent of ZF. It was only by 1963 that Cohen proved that the axiom of choice cannot derived from ZF and so together with Godel’s result this implied that the axiom of choice is independent of ZF. During this long period from 1904 to 1963, many bizzare results such as the Banach Tarsky paradox, existence of non-Lebesgue measurable sets etc were found. All this sowed a suspicion in people’s minds as to whether this axiom was “false” at some fundamental level. As a result mathematicians began to pay close attention to results in which the axiom was used, and to try and device proofs which avoided the axiom as much as possible. This habit somewhat lessened after the independence results, but it has not been dropped by everyone even today.

The fact of the matter is that while rejecting the axiom of choice leads to many weird results, so does accepting it. See the chapters entitled “Disasters without Choice” and “Disasters with Choice” in this book for details. However in the long run, the advantages of accepting the axiom outweigh the advantages of rejecting it primarily because the sheer quantity of beautiful and aesthetic results flowing from the axiom of choice provide substance to mathematics as a whole.


Filed under Miscellaneous, Set theory

The Zermelo Fraenkel axioms

In this post we plan to discuss the Zermelo Fraenkel axioms of set theory (a term which we will abbreviate as ZF). These axioms are used by most mathematicians as the pillars on which theorems and lemmas are build. Although the axioms are named after the mathematicians Zermelo and Fraenkel, contributions from Skolem too helped in evolving the consensus on what the basic axioms ought to be. A detailed history of the evolution of these axioms and of set theory in general may be found here.

We will take a naive approach towards the basic underlying principles of logic used to illustrate the axioms. Our overall goal is to give an essential idea, and not describe a rigorous formal system. Note that in ZF terms always denote sets. There is no consensus on the exact wordings and the sequence of the axioms. We are following the treatment given in this book.

1. Axiom of extensionality: If a and b have the same elements then a=b.

It is easy to prove using the underlying logic that equal sets have the same elements: Suppose x=y. If t\in x then by substitution t\in y. So all elements of x are in y. Similarly all elements of y are in x. Hence both x and y have the same elements. The above axiom is the converse. Therefore we can conclude that a=b iff a and b have the same elements. This also leads to the following theorem:

Theorem: a=b\iff (a\subset b\land b\subset a).
Proof: Suppose that a=b. Now for x\in a since it follows that x\in a (this is not a misprint) so a\subset a. By substitution we have a\subset b. Similarly we have b\subset a. Conversely if a\subset b and b\subset a then clearly \forall x, x\in a\iff x\in b and so a=b.\Box

2. Axiom of the empty set: A set \emptyset having no elements exists.

This axiom may be thought of as redundant in certain versions of ZF, where it may be deduced from other axioms. By a little piece of vacuous reasoning it follows that the empty set is a subset of every set.

3. Axiom of pairing: For every x and y, the pair set \{x,y\} exists.

We can prove the following theorem with the aid of this axiom:

Theorem: For every x, the set \{x\} exists.

Proof: Substituting x for y in the above axiom yields a set which by the axiom of extensionality equals \{x\}.\Box

It is also now clear that if y\subset\{x\} then y=\emptyset or y=\{x\}. For if y\ne \emptyset and t\in y then t\in \{x\} and as \{x\} is a singleton so t=x. But then x\in y and so \{x\}\subset y following which y=\{x\}.

4. Axiom of union: The elements of \cup X are the elements of the elements of X.

Hence the elements of \cup \{x,y\} are not x and y. Instead they are the elements of x and y. In other words \cup\{x,y\} is the familiar set x\cup y.

5. Axiom of power set: The elements of \mathcal{P}(X) are precisely the subsets of X.

We do not define other set functions, like intersection and Cartesian product since their existence follows from the next axiom.

6. Axiom of separation: If \psi(x) is a formula and X is a set then the set \{x\in X: \psi(x)\} exists.

It is now an easy matter to define intersection of the sets in X as Y=\cap X where Y=\{x\in\cup X:\forall t(t\in X\rightarrow x\in t)\}, provided X\ne \emptyset. To define the Cartesian product note that we define the ordered pair (a,b) as \{\{a\},\{a,b\}\} and then X\times Y=\{(x,y):x\in X\land y\in Y\}. We cannot however define complement because of the following theorem. It says that there is no universe (a set which has everything).

Theorem: There is no set U such that \forall x(x\in U).

Proof: Suppose that there exists such a set U. Now, by the axiom of separation we have the set X=\{z\in U:z\notin z\}. Now either we have X\in X or X\notin X. In case X\in X then by definition of X we have X\notin X. In case X\notin X then again by definition of X we have X\in X. Both cases lead to contradictions.\Box

In naive set theory where any definable collection (such as \{z:z\notin z\}) was understood as a set there is no way to clear the anomaly which follows from the above proof. One of the reasons for the abandonment of naive set theory in favor of ZF was in fact this paradox. In ZF the set \{z:z\notin z\} does not make sense as the axiom of separation specifically asks for a set X from which to pick the element z from. As we don’t have a universal set in ZF so the paradox is entirely sidestepped. The fact that there is no universal set in ZF has appeared intuitively undesirable to some, and there exist set theories which include a universal set.

It is interesting to note that although the paradox is attributed to Russell, it was Zermelo who had originally discovered it. Unfortunately he didn’t publish this idea and Russell independently discovered the paradox a year later and published it.

7. Axiom of infinity There is a set Z such that:
(i) \emptyset\in Z.
(ii) If x\in Z then \{x\}\in Z.

This axiom guarantees the existence of an infinite set. It is present to ensure that ZF is powerful enough to contain the collection of all numbers (a concept which we do not wish to make precise at this juncture). Once we have an infinite set we can also make precise the idea of a smallest infinite set as given by this axiom. This is illustrated by the following theorem:

Theorem: Let Z be the set provided by the above axiom. Let X be the set of subsets of Z that satisfy properties (i) and (ii) above. Let Z_0=\cap X. Then:
(a) X exists.
(b) Z_0 exists.
(c) If any set T satisfies properties (i) and (ii) of the axiom of infinity then Z_0\subset T.

Proof: (a) The set \{x\in \mathcal{P}(Z): x satisfies (i) and (ii) \} exists by the axiom of separation and equals X.
(b) The intersection is well defined if X\ne \emptyset which it is as Z\in X.
(c) Note that Z_0\cap T satisfies (i) and (ii) of the axiom of infinity and is in X so by definition of intersection Z_0\subset Z_0\cap T. If we suppose Z_0\nsubseteq T then \exists z_0\in Z_0 such that z_0\notin T following which z_0\notin Z_0\cap T. This contradicts Z_0\subset Z_0\cap T.\Box

We may define Z_0 as the set of whole numbers and show that the members of this set match our intuitive understanding of the members of \{0,1,2,3...\}. However such a discussion is presently out of our scope.

We have used the notion of the empty set in the statement of the axiom of infinity. The axiom of infinity can be rephrased so as not to actually assume that the empty set exists. We could have stated (i) as \forall y(\exists no x\in y\rightarrow y\in Z). Then the axiom of infinity a priori only asserts the existence of a set meeting this description, without actually asserting that it contains anything. Once we had got the set Z we could have used the axiom of separation to define a set \emptyset=\{x\in Z:x\ne x\}. Hence in this formulation the axiom of the empty set would have ceased to be an axiom and would have become a theorem of ZF. However, only a superficial difference exists in these two versions of ZF and we find no significant advantage in choosing one over the other.

8. Axiom of replacement: Informally this axiom states that ranges of functions restricted to sets exists. More formally, if \psi(x,y) is a formula such that \forall x\forall y\forall z(\psi(x,y)\land \psi(x,z)\rightarrow y=z) then for every set D there is a set R such that \forall y(y\in R\leftrightarrow \exists x(x\in D\land \psi(x,y))). The formula \psi(x,y) can now used to define the function governed by the relation f(x)=y as per the usual definition.

This axiom was not part of Zermelo’s original axioms and was added later, since it was found to aid immensely in establishing certain intuitively supported results in set theory. It is rarely used directly though, (one example is here), and within set theory its use is primarily in proving that certain large sets (such as those which are uncountable) theoretically exist in ZF.

9. Axiom of Regularity: This axiom states that every non empty set x contains an element y such that x\cap y=\emptyset.

This axiom disallows a set being a member of itself because of the following result:

Theorem: For all x, x\notin x.

Proof: By way of contradiction assume x\in x. By the theorem proved in the discussion on the axiom of pairing the set \{x\} exists. By the axiom of regularity there is an element y\in \{x\} such that y\cap\{x\}=\emptyset. Since \{x\} is a singleton so x=y and so x\cap\{x\}=\emptyset. This is a contradiction as by our hypothesis x\in x\cap\{x\}.\Box

Just like the previous axiom it is also a not obvious why this axiom must be true a priory. It is arguably the least useful ingredient of ZF, since most results in the branches of mathematics based on set theory hold even in the absence of regularity. However within mathematics we hardly ever encounter sets which contain themselves as a member, and this axiom is essential in outlawing such behavior.

That ends the list of the Zermelo Fraenkel axioms. One more axiom, probably the most famous of all, called the axiom of choice is usually added to ZF, and the resulting list of axioms is then referred to as ZFC.


Filed under Set theory