# Category Archives: Set theory

## The Inclusion Exclusion Principle

The Inclusion Exclusion Principle is one of most fundamental results in combinatorics. It deals with counting through a seive technique: i.e. we first overcount the quantity to be enumerated, then we try to balance out the overcounting by subtracting the extra. We may or may not subtract more then what is needed and so we count again the extra bits. Continuing in this way we “converge” at the correct count.

An easy example is $|A\cup B\cup C|=|A|+|B|+|C|-|A\cap B|-|A\cap C|-|B\cap C|+|A\cap B\cap C|$ where $A,B,C$ are three finite sets.

Here we start off by summing the number of elements in $A,B,C$ seperately. The overcount is compensated by subtracting the number of elements in $|A\cap B|$,$|A\cap C|$, $|B\cap C|$. We actually compensate for a bit more then needed and so to arrive at the correct count we must add $|A\cap B\cap C|$.

Our goal here is to prove the inclusion-exclusion principle and then to look at a few corollaries. We first establish two lemmas.

Lemma 1: If $X$ is any set containing $n$ elements and $k$ is any field, then the set of all functions $f:X\to k$ is an $n$-dimensional vector space $V$ over $k$ with the naturally induced operations.

Proof: Let $X=\{x_1,\cdots,x_n\}$. It is easy to see that $V$ together with the operations $f+g$, defined by $(f+g)(x_i)=f(x_i)+g(x_i)$ and $(\alpha f)(x_i)=\alpha (f(x_i))$ for all $x_i\in X$ is a vector space.

We now exhibit a basis for $V$ consisting of $n$ elements. For all $x_i\in X$ let $f_{x_i}$ be the function defined as $f_{x_i}(y)=\begin{cases} \hfill 1 \hfill & \text{ if }y=x_i \\ \hfill 0 \hfill & \text{ otherwise} \\ \end{cases}$. Now, we claim that $B=\{f_{x_i}:x_i\in X\}$ is a basis. Clearly it is spanning as for any $f\in V$ we have $f=\sum_{x_i\in X}f(x_i)f_{x_i}$. It is linearly independent as $\sum_{x_i\in X} \alpha_i f_{x_i}=0$ means that for any $j$, with $1\le j\le n$, we have $\sum_{x_i\in X} \alpha_i f_{x_i}(x_j)=0$, i.e. $\alpha_j=0$. This completes the proof.$\Box$

Lemma 2: If $m\ge 0$ then $\sum_{i \mathop = 0}^m \left({-1}\right)^i \binom m i = \delta_{0m}$.

Proof: If $m>0$ put $x=1, y=-1$ in the binomial theorem $(x+y)^m=\sum_{i=0}^mx^{m-i}y^i$. If $m=0$, then the sum on the left reduces to only one term: $(-1)^0\binom 0 0$. This is clearly $1$.$\Box$

We now prove the Inclusion Exclusion principle. This theorem in its purest form, is simply a formula for an inverse of a linear operator. The theorem is as follows:

Theorem 3: Let $S$ be a set with $n$ elements. Let $V$ be the $2^n$ dimensional vector space of all functions $f:\mathcal{P}(S)\to k$, where $k$ is some field and $\mathcal{P}(S)$ is the power set of $S$. Let $\phi:V\to V$ be the linear operator defined by

$\displaystyle\phi f(T)=\sum_{Y\supseteq T}f(Y)\text{ for all } T\subseteq S.$

Then $\phi^{-1}$ exists and is given by:

$\displaystyle\phi^{-1} f(T)=\sum_{Y\supseteq T}(-1)^{|Y-T|}f(Y)\text{ for all } T\subseteq S.$

Proof: To show $\phi^{-1}$ as given above is indeed the inverse it suffices to show that $\phi^{-1}\phi(f)=f$ for all $f\in V$.

Let $f\in V$. Then,

$\displaystyle\phi^{-1}\phi f(T)=\sum_{Y\supseteq T}(-1)^{|Y-T|}\phi f(Y)=\sum_{Y\supseteq T}(-1)^{|Y-T|}\sum_{Z\supseteq Y}f(Z)=\sum_{Z\supseteq T}\Big(\sum_{Z\supseteq Y\supseteq T}(-1)^{|Y-T|}\Big)f(Z)$

Now fix $Z,T$ and let $m=|Z-T|$. Consider $\displaystyle\sum_{Z\supseteq Y\supseteq T}(-1)^{|Y-T|}$. Any $Y$ is obtained by choosing some elements out of $Z-T$ which has $m$ elements, and taking the union of such elements with $T$. So for every $i$, with $0\le i\le m$, there are exactly $\binom m i$ ways of choosing a $Y$, which has $i+|T|$ elements. Any such $Y$ also has $i$ elements in $|Y-T|$. So $\displaystyle\sum_{Z\supseteq Y\supseteq T}(-1)^{|Y-T|}=\sum_{i=0}^m(-1)^i\binom m i=\delta_{0m}$. This when substituted in the expression for $\phi^{-1}\phi f(T)$ shows that $\phi^{-1}\phi f(T)=f(T)$, which proves the theorem.$\Box$

We now discuss some corollaries of the Inclusion Exclusion principle. Let $S$ be a set of properties that elements of a given set $A$ may or may not have. For any $T\subseteq S$, let $A'_T\subseteq A$ be those elements which have exactly the properties in $T$ and no others. We define a function $f_=:S\to \mathbb{R}$ such that $f_=(T)=|A'_T|$. Similarly, for any $T\subseteq S$, let $A''_T\subseteq A$ be those elements which have at least the properties in $T$. We define a function $f_\ge:S\to \mathbb{R}$ such that $f_=(T)=|A''_T|$. It is clear that $\displaystyle f_\ge(T)=\sum_{Y\supseteq T}f_=(T)$ for any $T\subseteq S$, and so by the Inclusion Exclusion principle we conclude that

Corollary 4: For any $T\subseteq S$, we have $\displaystyle f_=(T)=\sum_{Y\supseteq T}(-1)^{|Y-T|}f_\ge (Y)$.

In particular we have $\displaystyle f_=(\emptyset)=\sum_{Y\subseteq S}(-1)^{|Y|}f_\ge (Y)$, which gives us a formula for the number of elements having none of the properties.$\Box$

In the above corollary we think of $f_\ge(T)$ as the first approximation to $f_=(T)$. So we “include” that much “quantity” in our initial count. From this we subtract all terms of the type $f_=(Y)$ where $Y$ has just one extra element then $T$. Thus we “exclude” that much “quantity” from our count. This gives a better approximation. Next we add all terms of the type $f_=(Y)$ where $Y$ has two extra elements then $T$, and so on. This is the reason behind the terminology inclusion-exclusion.

We now discuss another corollary. Let $A$ be a finite set and let $A_1,\cdots A_n$ be some of its subsets. We define a set of properties $S=\{P_1,\cdots,P_n\}$ which elements of $A$ may or may not enjoy as follows: For any $i$, with $1\le i\le n$, $x\in A$ satisfies the property $P_i$ if and only if $x\in A_i$. Also for any $I\subseteq [n]$, let $\emptyset\ne A_I=\cap_{i\in I}A_i$ be the set of elements which have at least the properties $P_i$ for $i\in I$. Define $A_\emptyset$ to be $A$. By Corollary 4, $\displaystyle f_=(\emptyset)=\sum_{I'\subseteq S}(-1)^{|I'|}f_\ge (I')=\sum_{I\subseteq [n]}(-1)^{|I|}|A_I|$ where in the second equality we correspond each subset of properties with a subset of $[n]$. We summarize this as

Corollary 5: Let $A$ be a finite set and let $A_1,\cdots A_n$ be some of its subsets. For any $\emptyset\ne I\subseteq [n]$, let $A_I=\cap_{i\in I}A_i$ and let $A_\emptyset=A$. The number of elements in $A-\cup_{i=1}^n A_i$ is given by $\displaystyle\sum_{I\subseteq [n]}(-1)^{|I|}|A_I|=|A|+\sum_{\emptyset\ne I\subseteq [n]}(-1)^{|I|}|A_I|$.$\Box$

A further special case is obtained by considering any finite sets $A_1,A_2,\cdots,A_n$ and letting $A=\cup_{i=1}^nA_i$. Then the above corollary translates to $\sum_{I\subseteq [n]}(-1)^{|I|}|A_I|=0$. Considering the case of $I=\emptyset$ seperately, we see that $|A|+\displaystyle\sum_{\emptyset\ne I\subseteq [n]}(-1)^{|I|}|\cap_{i\in I}A_i|=0$. This easily yields the following corollary.

Corollary 6: Let $A_1,A_2\cdots,A_n$ be any finite sets. For any $\emptyset\ne I\subseteq [n]$, let $A_I=\cap_{i\in I}A_i$ and let $A_\emptyset=A$. Then $\displaystyle|A_1\cup A_2\cup\cdots\cup A_n|=\displaystyle\sum_{\emptyset\ne I\subseteq [n]}(-1)^{|I|-1}|\cap_{i\in I}A_i|$.$\Box$

Now by grouping terms involving the same size of $I$ we can restate both Corollary 5 and 6 as follows.

Corollary 7: Let $A$ be a finite set and let $A_1,\cdots A_n$ be some of its subsets. The number of elements in $A-\cup_{i=1}^n A_i$ is given by
$\displaystyle |A|+\sum_{k=1}^n(-1)^{k}\sum_{1\le i_1 < i_2 < \cdots .$\Box$

Corollary 8: If $A_1,A_2\cdots,A_n$ are any finite sets then

$\displaystyle|A_1\cup A_2\cup\cdots\cup A_n|=\sum_{k=1}^n(-1)^{k-1}\sum_{1\le i_1< i_2<\cdots.$\Box$

Corollaries 5 to 8 are often referred to as the principle of inclusion-exclusion themselves as in combinatorial settings they are the ones most often used. A further simplified version can also be derived from them when the intersection of any $k$ distinct sets $A_i$ always has the same cardinality $N_k$. In that case we only need to multiply $N_k$ with the number of such ways to select the $k$ sets to get the value of the inner sums in Corollaries 7 and 8.

Corollary 9: Let $A$ be a finite set and let $A_1,\cdots A_n$ be some of its subsets. Suppose that for any $k$ with $1\le k\le n$ there exists a natural number $N_k$ so that for any $\{i_1,\cdots, i_k\}\subseteq [n]$ we have $A_{i_1}\cap\cdots\cap A_{i_k}=N_k$. Then the number of elements in $A-\cup_{i=1}^n A_i$ is given by $|A|+\sum_{k=1}^n(-1)^k\binom{n}{k}N_k$ and $|A_1\cup A_2\cup\cdots\cup A_n|=\sum_{k=1}^n(-1)^{k-1}\binom{n}{k}N_k$.$\Box$

Filed under Combinatorics, Set theory

## Real numbers are uncountable

One of the slickest proofs of all time is using Georg Cantor‘s 1891 diagonal argument in proving that the real numbers $\mathbb{R}$ constitute an uncountable set, that is, they cannot be put in a bijective correspondence with $\mathbb{N}$. Strictly speaking, Cantor original paper only proved that the set of all $0,1$-sequences is uncountable. However since it is easy to argue that the set of all such sequences and $\mathbb{R}$ have the same cardinality, indirectly we get a proof for the uncountable nature of the reals. Moreover, the argument given by Cantor in his proof can be easily modified to give a direct proof for the uncountability of the reals. This is what we do presently.

Theorem: $\mathbb{R}$ is uncountable.

Proof: We will accept that every real number $x$ has a decimal expansion, $x=N.x_1x_2\cdots$ and it is uniquely determined by $x$ if one agrees never to terminate the expansion with an infinite string of $9$‘s. The proof is based on contradiction, and so we suppose that $\mathbb{R}$ is not uncountable. Then it must be that $\mathbb{R}$ is countable and we can make a list of all the elements of $\mathbb{R}$ along with their decimal expansion as follows:

$x_1=N_1.x_{11}x_{12}x_{13}\cdots\\x_2=N_2.x_{21}x_{22}x_{23}\cdots\\x_3=N_3.x_{31}x_{32}x_{33}\cdots\\\vdots$

We now consider the real number $0.y_1y_2y_3\cdots$ which is defined by $y_i=1$ if $x_{ii}\ne 1$ and $y_i=2$ otherwise. It is clear that this number differs from every $x_i$ in the list in at least the $i$‘th position. Hence this list is incomplete, a contradiction.$\Box$

Filed under Real Analysis, Set theory

## The Cantor Set

This is a post regarding the basic properties of the Cantor set.

We start with the definition: The Cantor set is obtained by first constructing a sequence $(C_n)$ of closed sets and then taking the intersection of the sets in this sequence. The sequence is constructed as follows:

1. Start with $[0,1]$ and remove the middle third, i.e. the interval $(\frac{1}{3},\frac{2}{3})$ to obtain the set $C_1=[0,\frac{1}{3}]\cup[\frac{2}{3},1]$. So $C_1$ contains $2$ disjoint closed intervals each of length $\frac{1}{3}$.

2. Next remove the middle third of each of these two intervals leaving $C_2=[0,\frac{1}{9}]\cup[\frac{2}{9},\frac{1}{3}]\cup[\frac{2}{3},\frac{7}{9}]\cup[\frac{8}{9},1]$ consisting of $2^2$ disjoint closed intervals each of length $\frac{1}{3^2}$.

3. Assuming $C_n$ has been constructed and consists of $2^n$ disjoint closed intervals each of length $\frac{1}{3^n}$, remove the middle thirds of all these intervals to obtain $C_{n+1}$ consisting of $2^{n+1}$ disjoint closed intervals each of length $\frac{1}{3^{n+1}}$.

By induction we have our sequence $(C_n)$.

Now, we define the Cantor set $C$ as $C=\cap_{n=1}^\infty C_n$.

We now describe the properties of the Cantor set:

1. In the usual topology on $\mathbb{R}$ the Cantor set is closed (being an intersection of closed sets). Moreover since it is bounded so by the Heine Borel theorem it is compact. What’s extraordinary is that every compact metric space is a continuous image of the Cantor set! The proof may be found here.

2. The Cantor set is uncountable. We prove this by considering the ternary expansion of all numbers in $[0,1]$. All such numbers may have the form $0.a_1a_2\cdots$ with $a_i\in\{0,1,2\}$ (including $1=0.22\cdots$). We claim that $x\in C$ iff $x$ has a ternary expansion of the form $0.a_1a_2\cdots$ with $a_i\in\{0,2\}$.

To prove this consider the construction of $C$ through the ternary lens: In ternary the construction of $C_1$ involves removing all numbers in $(0.011\cdots,0.2)$ from $[0.00\cdots,0.22\cdots]$. This shows that in $C_1$ every number has a ternary expansion of the form $0.a_1a_2\cdots$ with $a_1\in\{0,2\}$ and $a_i\in\{0,1,2\}\forall i>1$. Conversely every number of this form is in $C_1$. Likewise the construction of $C_2$ involves removing all numbers in $(0.0011\cdots,0.02)$ and in $(0.2011\cdots,0.22)$ from $C_1$. This shows that in $C_2$ every number has a ternary expansion of the form $0.a_1a_2\cdots$ with $a_1,a_2\in\{0,2\}$ and $a_i\in\{0,1,2\}\forall i>2$. Again conversely every number of this form is in $C_2$. Continuing in this way we can conclude that $x\in C_n$ iff $x=0.a_1\cdots a_na_{n+1}a_{n+2}\cdots$ with $a_1\cdots a_n\in\{0,2\}$ and $a_i\in\{0,1,2\}\forall i>n$. Hence by definition of $C$ our claim is established.

Now assume that $C$ is countable and has been enumerated in the list described below:

$0.d_1^{(1)}d_2^{(1)}d_3^{(1)}\cdots$
$0.d_1^{(2)}d_2^{(2)}d_3^{(2)}\cdots$
$0.d_1^{(3)}d_2^{(3)}d_3^{(3)}\cdots$
$\cdots$

Here each $d_i^{(j)}\in \{0,2\}$.

Now let $x=0.a_1a_2a_3\cdots$ where $a_i=0$ if $d_i^{(i)}=2$ and $a_i=2$ otherwise. Then $x\notin C$ despite being of the requisite ternary form. Hence $C$ is uncountable.

3. The Cantor set has Lebesgue measure zero. We define a set $A$ to be of Lebesgue measure zero if $\forall \epsilon>0\exists$ a sequence of intervals $(I_n)$ such that $A\subset\cup_{n=1}^\infty I_n$ and $\sum_{n=1}^\infty l(I_n)<\epsilon$. Here if the end points of $I_n$ are $a_n$ and $b_n$ (with $a_n\le b_n)$ then $l(I_n)=b_n-a_n$. (The proof that this definition is consistent with the Lebesgue measure may be found in this book.) Now given $\epsilon>0$ choose $n$ so that $(\frac{2}{3})^n<\epsilon$. Since $C\subset C_n$ and the total length of $C_n$ is $(\frac{2}{3})^n$, so it is clear that $C$ is a set of Lebesgue measure zero. Together with point 2, we see that this establishes the existence of uncountable sets of zero measure!

Filed under Measure Theory, Real Analysis, Set theory

## The axiom of choice

This post is about the axiom of choice, arguably the most famous and the most controversial of the axioms underpinning the set theoretic foundations of mathematics today. The axiom of choice is about making choices: choosing an element each from an arbitrary collection of nonempty sets. The axiom basically says that this can always be done, although it provides no recipe for doing so.

Before formally presenting the axiom we state some definitions. The treatment is essentially taken from here.

Definition: A function $I$ from a set $\Lambda$ onto a set $X$ is said to index the set $X$ by $\Lambda$. The set $\Lambda$ is called the index and $X$ is the indexed set. If $I(\lambda)=x$, we write $x_\lambda$ for $I(\lambda)$.

Definition: Let $X$ be a nonempty set indexed by a set $\Lambda$. The cartesian product of $X$ is defined to be the set $\Pi _{\lambda\in\Lambda}x_\lambda$ of all functions $f$ with domain $\Lambda$ and codomain $\cup X=\cup_{x\in X}x$, satisfying the condition $f(\lambda)\in x_\lambda$.

Let us consider an example before we go further. We let $X=\{\{a,b\},\{c,d\}\}$ be indexed by $\Lambda=\{1,2\}$ where $1\to \{a,b\}, 2\to \{c,d\}$. Now there are four functions possible from $\Lambda$ to $\{a,b,c,d\}$ which satisfy the definition:

The function $f_1$ for which $f(1)=a, f(2)=c$.
The function $f_2$ for which $f(1)=b, f(2)=c$.
The function $f_3$ for which $f(1)=a, f(2)=d$.
The function $f_4$ for which $f(1)=b, f(2)=d$.

The collection $\{f_1,f_2,f_3,f_4\}$ is the cartesian product. Now intuitively the behavior of the function $f_1$ can be captured by simply writing $(a,c)$ since the fact that the first coordinate corresponds to $a$ and the second to $c$ indicates the function $1\to a,2\to c$. Similarly the behavior of the function $f_2,f_3,f_4$ are also captured by $(b,c),(a,d),(b,d)$ respectively. Note that a particular ordered pair always corresponds to a unique function $f_i$. So the cartesian product may be simply represented as $\{(a,c),(b,c),(a,d),(b,d)\}$.

In fact if $\Lambda$ has at most countable number of elements within it, we can always consider it as an subset of the form $\{1,2,3\cdots\}$ (which may or may not terminate), and index appropriately. Then the cartesian product obtained can be equivalently thought of as a collection of n-tuples or infinite sequences where the ith coordinate corresponds to the image of $i$ in the indexing. For example if $X=\{\{0,1\}\}$ and $\Lambda=\mathbb{N}$ then all zero-one sequences correspond to the requisite functions and their collection forms the cartesian product.

Definition: Any function $f$ which satisfies the conditions required to make it a member of a cartesian product is called a choice function.

We now give three formulations of the axiom of choice:

Axiom of choice:

1. For every nonempty set whose elements are nonempty sets and are indexed by a set $I$ there exists a choice function.

2. If $\{a_i\}$ is a family of nonempty sets, indexed by a nonempty set $I$, then there exists a family $\{x_i\}$ with $i\in I$ such that $x_i\in a_i$ for each $i\in I$. (This corresponds roughly to our informal comment about choices in the first paragraph.)

3. The cartesian product of a nonempty collection of nonempty sets is nonempty.

Theorem: The three formulations are equivalent.

Proof: $1\Rightarrow 2$: Let $\{a_i\}_{i\in I}$ be a family of sets, indexed by the nonempty set $I$. Let $S=\{a_i\mid i\in I\}$. (Note that the elements of $S$ are themselves sets.) We index $S$ by itself, and consider the cartesian product of $S$. By the assumption and by $I\ne \emptyset$, the set $S$ is a nonempty set of nonempty sets. Hence owing to (1) there exists a choice function, i.e. a function $f\colon S\to \bigcup S$ such that $f(x)\in x$ for all $x \in S$. For $i\in I$ let $x_i=f(a_i)$. Then $\{x_i\}_{i\in I}$ is a family with the required properties.

$2\Rightarrow 1$: Let $S=\{a_i:i\in I\}$ be a nonempty set whose elements are themselves nonempty where $S$ is indexed by $I$. By (2) we have a family $x_i$ with $x_i\in a_i$. We define our choice function to be $f:I\to\cup S$ where $f(i)=x_i$.

$1\Leftrightarrow 3$: This is obvious by the definitions.$\Box$

Due to this result any of these three statements may be referred to as the axiom of choice. In fact, there are many other equivalent statements. The term ZFC is usually used to denote the Zermelo Fraenkel axioms together with the axiom of choice, and these axioms form a basis of most of the mathematics done today.

Why is this axiom so relevant? Firstly if one refuses to accept this axiom, one loses a lot of interesting results. Secondly, many times assuming the axiom of choice is so much easier. Many results such as the Cantor-Berstein-Schroeder theorem were proved originally using the axiom of choice, although later a “choice-free” proof was discovered. Similarly the axiom of choice makes undergraduate analysis easier by enabling one to say that $f(x)$ is continuous at $x=a$ if $f(a_n)$ tends to $f(a)$ for each sequence $(a_n)$ that tends to $a$.

Are there any negative issues with the axiom of choice? Well, firstly it is not constructive and doesn’t actually tell us the choices to make. It just guarantees that some choice exists. In one formulation it tells us that the real numbers can be well ordered but it gives us no recipe of doing so. In other words, it doesn’t actually construct a well order of the reals for us. Secondly, some counter intuitive results such as the Banach Tarsky paradox are true in ZFC and many mathematicians find this troubling and “fishy”.

Thirdly, and probably most importantly, there is history. After it was introduced (in another form) in 1904 by Zermelo, many mathematicians wondered whether or not the axiom was consistent with ZF. In other words people asked whether accepting it would lead to contradictions. Many decades passed before in 1938 Godel showed that ZFC is indeed consistent. Another doubt was whether the axiom of choice was independent of ZF. It was only by 1963 that Cohen proved that the axiom of choice cannot derived from ZF and so together with Godel’s result this implied that the axiom of choice is independent of ZF. During this long period from 1904 to 1963, many bizzare results such as the Banach Tarsky paradox, existence of non-Lebesgue measurable sets etc were found. All this sowed a suspicion in people’s minds as to whether this axiom was “false” at some fundamental level. As a result mathematicians began to pay close attention to results in which the axiom was used, and to try and device proofs which avoided the axiom as much as possible. This habit somewhat lessened after the independence results, but it has not been dropped by everyone even today.

The fact of the matter is that while rejecting the axiom of choice leads to many weird results, so does accepting it. See the chapters entitled “Disasters without Choice” and “Disasters with Choice” in this book for details. However in the long run, the advantages of accepting the axiom outweigh the advantages of rejecting it primarily because the sheer quantity of beautiful and aesthetic results flowing from the axiom of choice provide substance to mathematics as a whole.

Filed under Miscellaneous, Set theory

## The Zermelo Fraenkel axioms

In this post we plan to discuss the Zermelo Fraenkel axioms of set theory (a term which we will abbreviate as ZF). These axioms are used by most mathematicians as the pillars on which theorems and lemmas are build. Although the axioms are named after the mathematicians Zermelo and Fraenkel, contributions from Skolem too helped in evolving the consensus on what the basic axioms ought to be. A detailed history of the evolution of these axioms and of set theory in general may be found here.

We will take a naive approach towards the basic underlying principles of logic used to illustrate the axioms. Our overall goal is to give an essential idea, and not describe a rigorous formal system. Note that in ZF terms always denote sets. There is no consensus on the exact wordings and the sequence of the axioms. We are following the treatment given in this book.

1. Axiom of extensionality: If $a$ and $b$ have the same elements then $a=b$.

It is easy to prove using the underlying logic that equal sets have the same elements: Suppose $x=y$. If $t\in x$ then by substitution $t\in y$. So all elements of $x$ are in $y$. Similarly all elements of $y$ are in $x$. Hence both $x$ and $y$ have the same elements. The above axiom is the converse. Therefore we can conclude that $a=b$ iff $a$ and $b$ have the same elements. This also leads to the following theorem:

Theorem: $a=b\iff (a\subset b\land b\subset a)$.
Proof: Suppose that $a=b$. Now for $x\in a$ since it follows that $x\in a$ (this is not a misprint) so $a\subset a$. By substitution we have $a\subset b$. Similarly we have $b\subset a$. Conversely if $a\subset b$ and $b\subset a$ then clearly $\forall x, x\in a\iff x\in b$ and so $a=b$.$\Box$

2. Axiom of the empty set: A set $\emptyset$ having no elements exists.

This axiom may be thought of as redundant in certain versions of ZF, where it may be deduced from other axioms. By a little piece of vacuous reasoning it follows that the empty set is a subset of every set.

3. Axiom of pairing: For every $x$ and $y$, the pair set $\{x,y\}$ exists.

We can prove the following theorem with the aid of this axiom:

Theorem: For every $x$, the set $\{x\}$ exists.

Proof: Substituting $x$ for $y$ in the above axiom yields a set which by the axiom of extensionality equals $\{x\}$.$\Box$

It is also now clear that if $y\subset\{x\}$ then $y=\emptyset$ or $y=\{x\}$. For if $y\ne \emptyset$ and $t\in y$ then $t\in \{x\}$ and as $\{x\}$ is a singleton so $t=x$. But then $x\in y$ and so $\{x\}\subset y$ following which $y=\{x\}$.

4. Axiom of union: The elements of $\cup X$ are the elements of the elements of $X$.

Hence the elements of $\cup \{x,y\}$ are not $x$ and $y$. Instead they are the elements of $x$ and $y$. In other words $\cup\{x,y\}$ is the familiar set $x\cup y$.

5. Axiom of power set: The elements of $\mathcal{P}(X)$ are precisely the subsets of $X$.

We do not define other set functions, like intersection and Cartesian product since their existence follows from the next axiom.

6. Axiom of separation: If $\psi(x)$ is a formula and $X$ is a set then the set $\{x\in X: \psi(x)\}$ exists.

It is now an easy matter to define intersection of the sets in $X$ as $Y=\cap X$ where $Y=\{x\in\cup X:\forall t(t\in X\rightarrow x\in t)\}$, provided $X\ne \emptyset$. To define the Cartesian product note that we define the ordered pair $(a,b)$ as $\{\{a\},\{a,b\}\}$ and then $X\times Y=\{(x,y):x\in X\land y\in Y\}$. We cannot however define complement because of the following theorem. It says that there is no universe (a set which has everything).

Theorem: There is no set $U$ such that $\forall x(x\in U)$.

Proof: Suppose that there exists such a set $U$. Now, by the axiom of separation we have the set $X=\{z\in U:z\notin z\}$. Now either we have $X\in X$ or $X\notin X$. In case $X\in X$ then by definition of $X$ we have $X\notin X$. In case $X\notin X$ then again by definition of $X$ we have $X\in X$. Both cases lead to contradictions.$\Box$

In naive set theory where any definable collection (such as $\{z:z\notin z\}$) was understood as a set there is no way to clear the anomaly which follows from the above proof. One of the reasons for the abandonment of naive set theory in favor of ZF was in fact this paradox. In ZF the set $\{z:z\notin z\}$ does not make sense as the axiom of separation specifically asks for a set $X$ from which to pick the element $z$ from. As we don’t have a universal set in ZF so the paradox is entirely sidestepped. The fact that there is no universal set in ZF has appeared intuitively undesirable to some, and there exist set theories which include a universal set.

It is interesting to note that although the paradox is attributed to Russell, it was Zermelo who had originally discovered it. Unfortunately he didn’t publish this idea and Russell independently discovered the paradox a year later and published it.

7. Axiom of infinity There is a set $Z$ such that:
(i) $\emptyset\in Z$.
(ii) If $x\in Z$ then $\{x\}\in Z$.

This axiom guarantees the existence of an infinite set. It is present to ensure that ZF is powerful enough to contain the collection of all numbers (a concept which we do not wish to make precise at this juncture). Once we have an infinite set we can also make precise the idea of a smallest infinite set as given by this axiom. This is illustrated by the following theorem:

Theorem: Let $Z$ be the set provided by the above axiom. Let $X$ be the set of subsets of $Z$ that satisfy properties (i) and (ii) above. Let $Z_0=\cap X$. Then:
(a) $X$ exists.
(b) $Z_0$ exists.
(c) If any set $T$ satisfies properties (i) and (ii) of the axiom of infinity then $Z_0\subset T$.

Proof: (a) The set $\{x\in \mathcal{P}(Z): x$ satisfies (i) and (ii) $\}$ exists by the axiom of separation and equals $X$.
(b) The intersection is well defined if $X\ne \emptyset$ which it is as $Z\in X$.
(c) Note that $Z_0\cap T$ satisfies (i) and (ii) of the axiom of infinity and is in $X$ so by definition of intersection $Z_0\subset Z_0\cap T$. If we suppose $Z_0\nsubseteq T$ then $\exists z_0\in Z_0$ such that $z_0\notin T$ following which $z_0\notin Z_0\cap T$. This contradicts $Z_0\subset Z_0\cap T$.$\Box$

We may define $Z_0$ as the set of whole numbers and show that the members of this set match our intuitive understanding of the members of $\{0,1,2,3...\}$. However such a discussion is presently out of our scope.

We have used the notion of the empty set in the statement of the axiom of infinity. The axiom of infinity can be rephrased so as not to actually assume that the empty set exists. We could have stated (i) as $\forall y(\exists$ no $x\in y\rightarrow y\in Z)$. Then the axiom of infinity a priori only asserts the existence of a set meeting this description, without actually asserting that it contains anything. Once we had got the set $Z$ we could have used the axiom of separation to define a set $\emptyset=\{x\in Z:x\ne x\}$. Hence in this formulation the axiom of the empty set would have ceased to be an axiom and would have become a theorem of ZF. However, only a superficial difference exists in these two versions of ZF and we find no significant advantage in choosing one over the other.

8. Axiom of replacement: Informally this axiom states that ranges of functions restricted to sets exists. More formally, if $\psi(x,y)$ is a formula such that $\forall x\forall y\forall z(\psi(x,y)\land \psi(x,z)\rightarrow y=z)$ then for every set $D$ there is a set $R$ such that $\forall y(y\in R\leftrightarrow \exists x(x\in D\land \psi(x,y)))$. The formula $\psi(x,y)$ can now used to define the function governed by the relation $f(x)=y$ as per the usual definition.

This axiom was not part of Zermelo’s original axioms and was added later, since it was found to aid immensely in establishing certain intuitively supported results in set theory. It is rarely used directly though, (one example is here), and within set theory its use is primarily in proving that certain large sets (such as those which are uncountable) theoretically exist in ZF.

9. Axiom of Regularity: This axiom states that every non empty set $x$ contains an element $y$ such that $x\cap y=\emptyset$.

This axiom disallows a set being a member of itself because of the following result:

Theorem: For all $x$, $x\notin x$.

Proof: By way of contradiction assume $x\in x$. By the theorem proved in the discussion on the axiom of pairing the set $\{x\}$ exists. By the axiom of regularity there is an element $y\in \{x\}$ such that $y\cap\{x\}=\emptyset$. Since $\{x\}$ is a singleton so $x=y$ and so $x\cap\{x\}=\emptyset$. This is a contradiction as by our hypothesis $x\in x\cap\{x\}$.$\Box$

Just like the previous axiom it is also a not obvious why this axiom must be true a priory. It is arguably the least useful ingredient of ZF, since most results in the branches of mathematics based on set theory hold even in the absence of regularity. However within mathematics we hardly ever encounter sets which contain themselves as a member, and this axiom is essential in outlawing such behavior.

That ends the list of the Zermelo Fraenkel axioms. One more axiom, probably the most famous of all, called the axiom of choice is usually added to ZF, and the resulting list of axioms is then referred to as ZFC.