Tags: , , , ,

## Disjoint set data structure

A disjoint set data structure or union and find maintains a collection 𝑆 = { 𝑆1, 𝑆2, ⋯ , 𝑆𝑛 } of disjoint dynamic sets. Subsets are said to be disjoint if intersection between them is NULL. For example, set {1,2,3} and {4,5,6} are disjoint sets, but {1,2,3} and {1,3,5} are not as intersection is {1,3} which is not null. Another important thing about the disjoint set is that every set is represented by a member of that set called as representative.

Operations on this disjoint set data structure:
1. Make Set: Creates a new set with one element x, since the sets are disjoint, we require that x not already be in any of the existing sets.
2. Union: Merges two sets containing x and y let’s say Sx and Sy and destroys the original sets.
3.Find: Returns the representative of the set which element belongs to.

Let’s take an example and see how disjointed sets can be used to find the connected components of an undirected graph.

To start with, we will make a set for each vertex by using make-set operation.

```for each vertex v in G(V)
do makeSet(v)
```

Next process all the edges in the graph (u,v) and connect set(u) and set(v) if the representatives of the set which contains u and set which contains v are not same.

```for each edge (u,v) in 𝐺(E)
do if findSet(u) != findSet(v)
then union(u, v)
```

Once above preprocessing steps have run, then we can easily find answer if two vertices u and v are part of same connected component or not?

```boolean isSameComponent(u, v)
if findSet(u)==findSet(v)
return True
else
return False
```

To find how many components are there, we can look at how many disjoint sets are there and that will give us the number of connected components in a graph. Let’s take an example and see how it works. Below table shows the processing of each edge in the graph show figure above. Now, how can we implement sets and quickly do union and find operations? There are two ways to do it.

### Disjoint set representation using an array

Simple implementation of disjoint set is using an array which maintains their representative of element i in A[i]. To this implementation to work, it is must that all the element in the set are in range 0 to N-1 where N is size of the array.

Initially, in makeSet() operation, set `A[i]=i`, for each i between 0 and N-1 and create the initial versions of the sets. ```for (int i=0; i<N; i++) A[i] = i;
```

Union operation for the sets that contain integers u and v, we scan the array A and change all the elements
that have the value A[u] to have the value A[v]. For example, we if want to connect an edge between 1 and 2 in the above set, the union operation will replace A with A. Now, if want to add an edge between 3 and 1. In this case, u = 3 and v = 1. A = 3 and A = 1. So, we will replace all the indices of A where A[i] = 1. So final array looks like this. Similarly, if want to add an edge from 6 to 7. ```//change all elements from A[u] to A[v].
void union(int A[], int u, int v){
int temp = A[u];
for(int i=0; i<A.length; i++){
if(A[i] == temp)
A[i] = A[v];
}
}
```

findSet(v) operation returns the value of A[v].

```int findSet(int A[], int v){
return A[v]
}
```

The complexity of makeSet() operation is `O(n)` as it initializes the entire array. Union operation take every time `O(n)` operations if we have to connect n nodes, then it will be `O(n2)` operations. FindSet() operation has constant time complexity.

We can represent a disjoint set using linked list too. In that case, each set will be a linked list, and head of the linked list will be the representative element. Each node contains two pointers, one to its next element it the set and other points to the representative of the set.

To initialize, each element will be added to a linked list. To union (u, v), we add the linked list which contains u to end of the linked list which contains v and change representation pointer of each node to point to the representation of list which contained v.

The complexity of union operation is again `O(n)`. Also, find operation can be `O(1)` as it returns the representative of it.

### Disjoint set forest

The disjoint-forests data structure is implemented by changing the interpretation of the meaning of the element of array A. Now each A[i] represents an element of a set and points to another element of that set. The root element points to itself. In short, A[i] now points to the parent of i.

Makeset operation does not change, as to start with each element will be the parent of itself.
Union operation will change, if we want to connect u and v with an edge, we update A[root of u] with the root of v. How to find the root of an element? As we have the relationship that A[i] is the parent of i, we can move up the chain until we find a case where `A[i] == i`, that case, i is the root of v.

```//finding root of an element
int root(int A[],int i){
while(A[i] != i){
i = A[i];
}
return i;
}

/*Changed union function where we connect
the elements by changing the root of
one of the elements
*/

int union(int A[] ,int u ,int v){
int rootU = root(A, u);
int rootV = root(A, v);
A[rootU] = rootV ;
}
```

This implementation has a worst-case complexity of `O(n)` for union function. And also we made the worst complexity of findSet operation as `O(n)`.

However, we can do some ranking on the size of trees which are being connected. We make sure that always root of smaller tree point to the root of the bigger tree.

```void union(int[] A, int[] sz, u, v){

//Finding roots
for (int i = u; i != A[i]; i = A[i]) ;
for (int j = v; j != A[j]; j = A[j]) ;

if (i == j) return;
//Comparing size of tree to put smaller tree root under
// bigger tree's root.
if (sz[i] < sz[j]){
A[i] = j;
sz[j] += sz[i];
}
else {
A[j] = i;
sz[i] += sz[j];
}
}
```

In the next few posts, we will be discussing applications of this method to solve different problems on graphs.
Please share if there is something wrong or missing. If you are preparing for an interview, and want coaching sessions to prepare for it, please signup for free demo session.