Cycle in undirected graph using disjoint set

In post disjoint set data structure, we discussed the basics of disjoint sets. One of the applications of that data structure is to find if there is a cycle in a directed graph.

In graph theory, a cycle is a path of edges and vertices wherein a vertex is reachable from itself.

For example, in the graph shown below, there is a cycle formed by path : 1->2->4->6->1. Disjoint-set data structure has two operations: union and find. Union operation merges two sets into one, whereas find operation finds the representative of the set a given element belongs to.

Using disjoint set to detect a cycle in directed grah

How can use the data structure and operations on it to find if a given directed graph contains a cycle or not?

We use an array A, which will store the parent of each node. Initialize the array with the element itself, that means to start with every node is the parent of itself.

Now, process each edge(u,v) in the graph and for each edge to the following: Get the root of both vertices u and v of the edge. If the roots of both nodes are different, update the root of u with the root of v. If roots are same, that means they belong to the same set and hence this edge creates a cycle.

How can we find the root of a vertex? As we know A[i] represents the parent of i; we start with i= u and go up till we find A[i] = i. It means there is no node parent of i and hence i is the root of the tree to which u belongs.

Let’s take an example and see how does it work. Below is the given directed graph and we have to if there is a cycle in it or not? To start with, we initialize array A with the elements themselves. Now, we process each node of the graph one by one. First is edge(1,2). The root of node(1) is 1 and the root of node(2) is 2. Since the roots of two vertices are different, we update the parent of the root of 2 which is A to the root of 1 which is 1. Next, we process edge(2,3), here root of the node(2) is 1, whereas the root node(3) is 3. Again they differ, hence update A[root of 3] with root 2, i.e A = 1; Now, process edge(2,4), it will end up with A = 1, can you deduce why? And similarly edge(4,6) will also lead to update A = 1. Now, we process the edge(6,1). Here, root of node(6) is 1 and also the root of node(1) is 1. Both the nodes have same root, that means there is a cycle in the directed graph. To detect a cycle in direct graph : Implementation

package com.company.Graphs;

import java.util.*;

/**
* Created by sangar on 21.12.18.
*/
public class AdjacencyList {
private Map<Integer, ArrayList<Integer>> G;
private boolean isDirected;
private int count;

this.G = new HashMap<>();
this.isDirected = isDirected;
}

public void addEdge(int start, int dest){

if(this.G.containsKey(start)){
}else{
this.G.put(start, new ArrayList<>(Arrays.asList(dest)));
}

if(!this.G.containsKey(dest)) {
this.G.put(dest, new ArrayList<>());
}
//In case graph is undirected
if(!this.isDirected) {
}
}

public boolean isEdge(int start, int dest){
if(this.G.containsKey(start)){
return this.G.get(start).contains(dest);
}

return false;
}

public boolean isCycleWithDisjointSet() {
int[] parent = new int[this.G.size() + 1];

for (int u = 1; u < this.G.size() + 1; u++) {
//Process edge from each node.

//Find root of u
int i, j;

//Worst complexity is O(V)
for(i=u; i != parent[i]; i = parent[i]);

/*This loop will run for O(E) times for all
the vertices combined. */
for(int v: this.G.get(u)){
for(j=v; j != parent[j]; j = parent[j]);

if(i == j){
System.out.println("Cycle detected at
("+ u + "," + v + ")");
return true;
}

parent[i] = j;
}
}
return false;
}
}

Test cases

package test.Graphs;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
* Created by sangar on 21.12.18.
*/
public class AdjacencyListTest {
@Test
public void detectCycleInDirectedGraphTest() {

assertEquals(true, tester.isEdge(3,4));
assertEquals(false, tester.isEdge(1,4));

assertEquals(true, tester.isCycleWithDisjointSet());

}
}

Complexity of this algorithm is O(EV) where E is number of edges and V is vertices, where as union function in disjoint set can take linear time w.r.t to vertices and it may run for number of edge times.

Please share if there is something wrong or missing. If you are preparing for interview and interested in personalized coaching, please signup for free demo class.

Disjoint set data structure

A disjoint set data structure or union and find maintains a collection 𝑆 = { 𝑆1, 𝑆2, ⋯ , 𝑆𝑛 } of disjoint dynamic sets. Subsets are said to be disjoint if intersection between them is NULL. For example, set {1,2,3} and {4,5,6} are disjoint sets, but {1,2,3} and {1,3,5} are not as intersection is {1,3} which is not null. Another important thing about the disjoint set is that every set is represented by a member of that set called as representative.

Operations on this disjoint set data structure:
1. Make Set: Creates a new set with one element x, since the sets are disjoint, we require that x not already be in any of the existing sets.
2. Union: Merges two sets containing x and y let’s say Sx and Sy and destroys the original sets.
3.Find: Returns the representative of the set which element belongs to.

Let’s take an example and see how disjointed sets can be used to find the connected components of an undirected graph.

To start with, we will make a set for each vertex by using make-set operation.

for each vertex v in G(V)
do makeSet(v)

Next process all the edges in the graph (u,v) and connect set(u) and set(v) if the representatives of the set which contains u and set which contains v are not same.

for each edge (u,v) in 𝐺(E)
do if findSet(u) != findSet(v)
then union(u, v)

Once above preprocessing steps have run, then we can easily find answer if two vertices u and v are part of same connected component or not?

boolean isSameComponent(u, v)
if findSet(u)==findSet(v)
return True
else
return False

To find how many components are there, we can look at how many disjoint sets are there and that will give us the number of connected components in a graph. Let’s take an example and see how it works. Below table shows the processing of each edge in the graph show figure above. Now, how can we implement sets and quickly do union and find operations? There are two ways to do it.

Disjoint set representation using an array

Simple implementation of disjoint set is using an array which maintains their representative of element i in A[i]. To this implementation to work, it is must that all the element in the set are in range 0 to N-1 where N is size of the array.

Initially, in makeSet() operation, set A[i]=i, for each i between 0 and N-1 and create the initial versions of the sets. for (int i=0; i<N; i++) A[i] = i;

Union operation for the sets that contain integers u and v, we scan the array A and change all the elements
that have the value A[u] to have the value A[v]. For example, we if want to connect an edge between 1 and 2 in the above set, the union operation will replace A with A. Now, if want to add an edge between 3 and 1. In this case, u = 3 and v = 1. A = 3 and A = 1. So, we will replace all the indices of A where A[i] = 1. So final array looks like this. Similarly, if want to add an edge from 6 to 7. //change all elements from A[u] to A[v].
void union(int A[], int u, int v){
int temp = A[u];
for(int i=0; i<A.length; i++){
if(A[i] == temp)
A[i] = A[v];
}
}

findSet(v) operation returns the value of A[v].

int findSet(int A[], int v){
return A[v]
}

The complexity of makeSet() operation is O(n) as it initializes the entire array. Union operation take every time O(n) operations if we have to connect n nodes, then it will be O(n2) operations. FindSet() operation has constant time complexity.

We can represent a disjoint set using linked list too. In that case, each set will be a linked list, and head of the linked list will be the representative element. Each node contains two pointers, one to its next element it the set and other points to the representative of the set.

To initialize, each element will be added to a linked list. To union (u, v), we add the linked list which contains u to end of the linked list which contains v and change representation pointer of each node to point to the representation of list which contained v.

The complexity of union operation is again O(n). Also, find operation can be O(1) as it returns the representative of it.

Disjoint set forest

The disjoint-forests data structure is implemented by changing the interpretation of the meaning of the element of array A. Now each A[i] represents an element of a set and points to another element of that set. The root element points to itself. In short, A[i] now points to the parent of i.

Makeset operation does not change, as to start with each element will be the parent of itself.
Union operation will change, if we want to connect u and v with an edge, we update A[root of u] with the root of v. How to find the root of an element? As we have the relationship that A[i] is the parent of i, we can move up the chain until we find a case where A[i] == i, that case, i is the root of v.

//finding root of an element
int root(int A[],int i){
while(A[i] != i){
i = A[i];
}
return i;
}

/*Changed union function where we connect
the elements by changing the root of
one of the elements
*/

int union(int A[] ,int u ,int v){
int rootU = root(A, u);
int rootV = root(A, v);
A[rootU] = rootV ;
}

This implementation has a worst-case complexity of O(n) for union function. And also we made the worst complexity of findSet operation as O(n).

However, we can do some ranking on the size of trees which are being connected. We make sure that always root of smaller tree point to the root of the bigger tree.

void union(int[] A, int[] sz, u, v){

//Finding roots
for (int i = u; i != A[i]; i = A[i]) ;
for (int j = v; j != A[j]; j = A[j]) ;

if (i == j) return;
//Comparing size of tree to put smaller tree root under
// bigger tree's root.
if (sz[i] < sz[j]){
A[i] = j;
sz[j] += sz[i];
}
else {
A[j] = i;
sz[i] += sz[j];
}
}

In the next few posts, we will be discussing applications of this method to solve different problems on graphs.
Please share if there is something wrong or missing. If you are preparing for an interview, and want coaching sessions to prepare for it, please signup for free demo session.

In the last post, we discussed depth first traversal of a graph. Today, we will discuss another way to traverse a graph, which is breadth first traversal. What is breadth first traversal? Unlike depth-first traversal, where we go deep before visiting neighbors, in breadth-first search, we visit all the neighbors of a node before moving a level down. For example, breadth first traversal of the graph shown below will be [1,2,5,3,4,6] In breadth first search, we finish visiting all the nodes at a level before going further down the graph. For example, the graph used in the above example can be divided into three levels as shown. We start with a node in level 1 which is node(1). Then visit all the nodes which are one level below node(1) which are node(2) and node(5). Then we visit all the node at level 3 which are node(3), node(4) and node(6).

Breadth First Traversal: algorithm

1. Start with the given node u, put node u to queue
2. While queue is not empty, repeat below steps:
1. Dequeue fro queue and print node u.
2. For each neighbor of u, node v
3. If v is not visited already: add v to the queue
4. mark v as visited

Let’s take an example and see how it works. Below is the graph and we have to find BFS for this graph. We start from node(1), and put it in the queue. While the queue is not empty, we should pop from it and print the node. In this case, node(1) will be printed. Next, we go through all the neighbors of node(1) and put all the unvisited node on the queue. node(2) and node(5) will go on to the queue and marked as visited. Traversal = {1} Again, we dequeue from the queue and this time we get node(2). We print it and go for all the neighbor node, node(3) and node(4) and mark them as visited. Traversal = {1,2} node(5) is dequeued next and printed. Here, even though node(4) is a neighbor of node(5), it is already visited and hence not put on to the queue again. But node(6) is not yet visited, so put it on to the queue. Traversal = {1,2,5} Now, we pop node(3) and print it, however, node(4) is already visited. Hence, nothing is added to the queue. Traversal = {1,2,5,3} Next, node(4) is taken out from queue and printed, nothing goes on to queue. Traversal = {1,2,5,3,4} Last, we pop node(6) and print it. Traversal = {1,2,5,3,4,6}. At this point, the queue is empty and we stop traversal.

Breadth first traversal: implementation

boolean[] visited = new boolean[this.G.length];
ArrayList<Integer> traversal = new ArrayList<>();

Queue<Integer> q = new LinkedList<>();

//This is start node
visited = true;

while(!q.isEmpty()){
int u = (int)q.remove();

for(int i=1; i< this.G.length; i++){
if(this.G[u][i] && !visited[i]){
visited[i]= true;
}
}
}
System.out.println(traversal);
return traversal;

}

The complexity of this code is O(V2) as at least V nodes will go in queue and for each nodes internal for loop runs V times.

Implementation of breadth-first search on graph represented by adjanceny list

boolean[] visited = new boolean[this.G.size()];
ArrayList<Integer> traversal = new ArrayList<>();

Queue<Integer> q = new LinkedList<>();

//This is start node
visited = true;

//This loop will run for V times, once for each node.
while(!q.isEmpty()){
int u = (int)q.remove();

/*This loop has a worst-case complexity of O(V), where
node has an edge to every other node, but
the total number of times this loop will run is E times
where E number of edges.
*/
for(int v : this.G.get(u)){
if(!visited[v]){
visited[v]= true;
}
}
}
System.out.println(traversal);
return traversal;

}

The complexity of Breadth First Search is O(V+E) where V is the number of vertices and E is the number of edges in the graph.

The complexity difference in BFS when implemented by Adjacency Lists and Matrix occurs due to this fact that in Adjacency Matrix, to tell which nodes are adjacent to a given vertex, we take O(|V|) time, irrespective of edges. Whereas, in Adjacency List, it is immediately available to us, takes time proportional to adjacent vertices itself, which on summation over all vertices |V| is |E|. So, BFS by Adjacency List gives O(|V| + |E|).

StackOverflow

When a graph is strongly connected, O(V + E) is actually O(V2)

Applications of Breadth first traversal

1. To find shortest path between two nodes u and v
2. To test bipartite-ness of a graph
3. To find all nodes within one connected component

Please share if there is something wrong or missing. If you are preparing for an interview and want a free coaching session to guide you through it, please reach out to us at communications@algorithmsandme.com

Topological sorting in a graph

Given a directed acyclic graph G (V,E), list all vertices such that for all edges (v,w), v is listed before w. Such an ordering is called topological sorting and vertices are in topological order. For example, topological sort for below graph would be: 1,2,3,5,4,6 A topological ordering is not unique and a DAG can have more than one topological sort. For example, for above graph, 1,5,2,3,6,4 is also correct topological sort.

How to do a topological sort on a graph?

To start topological sort, we need a node which has zero incoming edges. If there is no such node in the graph, the graph is not directed acyclic graph, DAG. DAG will have at least one such node. Let’s G0 is the graph and V0 is the vertex with zero incoming node. So, V0 becomes the first vertex in order.
Now if we take out V0 and all the edges coming out from it, it leaves a graph G1, there must be a vertex V1 in G1 which has zero incoming edges. V1 is added to topological order.
TheV1 is removed from G1 which leaves us with G2 with TheV2 as candidate node. This goes on until there is no nodes remaining in the original graph G.

At any point in time, we cannot move forward, when there is no node with zero incoming nodes, it means there is a cycle in the graph and given graph is not a DAG.

Above description actually gives away the implementation detail too. We look for a vertex with zero incoming edges and take that. Then we remove all the edges coming out of that node from the graph, which will decrease the number of edges coming on to the neighbor nodes. Again select a node which has zero incoming edges and continues by removing edges.

Let’s take an example and work it out. Topological sort: implementation

package com.company.Graphs;

import java.util.*;

/**
* Created by sangar on 21.12.18.
*/
public class AdjacencyMatrix {
private boolean [][] G;
private boolean isDirected;

public AdjacencyMatrix(int numNodes, boolean isDirected){
this.G  = new boolean[numNodes+1][numNodes+1];
this.isDirected = isDirected;
}

public void addEdge(int start, int dest){
this.G[start][dest] = true;
if(!isDirected){
this.G[dest][start] = true;
}
}

public boolean isEdge(int start, int dest){
return this.G[start][dest];
}

public ArrayList<Integer> topologicalSort(){

int[] inDegree = new int[this.G.length];
boolean[] visited = new boolean[this.G.length];
ArrayList<Integer> toplogicalOrder = new ArrayList<>();

//Complexity of O(n^2)
for(int i=1; i<this.G.length; i++) {
for (int j = 1; j < this.G.length; j++) {
if (this.G[i][j]) {
inDegree[j] += 1;
}
}
}
/* We will traverse each node */
for(int i=1; i<this.G.length; i++){
/* find a node with zero in degree */
int u  = findNextNode(inDegree, visited);
/* If there is no node with zero in degree,
topological sort not possible */
if(u != -1){
visited[u] = true;
/* decrease in degree of each node adjacent
to node wih zero in degree */
for(int v=1; v<this.G.length; v++) {
if (this.G[u][v]) {
inDegree[v]--;
}
}
}
else{
System.out.println("\n Topological sort no possible");
}
}
}

private int findNextNode(int indegree[], boolean[] visited){
for( int i=1; i< this.G.length; i++){
if(indegree[i] == 0 && !visited[i]){
return i;
}
}
return -1;
}
}

The complexity of topological sort implementation with adjacency matrix representation is O(V2). For each vertex we find the vertex with zero in-degree, hence the quadratic time.

Topological sort adjacency list represented graph

package com.company.Graphs;

import java.util.*;

/**
* Created by sangar on 21.12.18.
*/
public class AdjacencyList {
private Map<Integer, ArrayList<Integer>> G;
boolean isDirected;

this.G = new HashMap<>();
this.isDirected = isDirected;
}

public void addEdge(int start, int dest){

if(this.G.containsKey(start)){
}else{
this.G.put(start, new ArrayList<>(Arrays.asList(dest)));
}

if(!this.G.containsKey(dest)) {
this.G.put(dest, new ArrayList<>());
}
//In case graph is undirected
if(!this.isDirected) {
}
}

public boolean isEdge(int start, int dest){
if(this.G.containsKey(start)){
return this.G.get(start).contains(dest);
}

return false;
}

public ArrayList<Integer> topologicalSort(){

int[] inDegree = new int[this.G.size()+1];
boolean[] visited = new boolean[this.G.size()+1];
ArrayList<Integer> topologicalOrder = new ArrayList<>();

//Complexity of O(V + E)
for(int u: this.G.keySet()){
for (int v : this.G.get(u)){
inDegree[v] += 1;
}
}

/* We will traverse each node */
for(int i=1; i<this.G.size()+1; i++){
/* find a node with zero in degree */
int u  = findNextNode(inDegree, visited);
/* If there is no node with zero in degree,
topological sort not possible */
if(u != -1){
visited[u] = true;
/* decrease in degree of each node adjacent
to node wih zero in degree */
for(int v : this.G.get(u)) {
inDegree[v]--;
}
}
else{
System.out.println("\n Topological sort no possible");
}
}
}

private int findNextNode(int inDegree[], boolean[] visited){
for( int i=1; i< this.G.size()+1; i++){
if(inDegree[i] == 0 && !visited[i]){
return i;
}
}
return -1;
}
}

The complexity of this implementation is also O(V2) as still we scan all vertices to find the vertex with zero indegree.

Whenever we are updating the in-degree of all the adjacent node, we can store all the vertices for which in-degree becomes zero in a queue. So we don’t need to separately search for the node with zero in-degree, we can simply take the front of the queue, which will reduce the time complexity to O(V + E) with an additional space complexity of O(V). This is called Kahn’s algorithm.

Below is the code for topological sorting using Depth First traversal. We are using a global variable here count. Idea is that all the nodes which are descendant of node u will come after vertex u in topological order. If we fill the array in reverse order, all the descendants will be filled up first and then the starting node.

private void topologicalSortDFS( int u, boolean[] visited,
int[] topologicalOrder) {
// Do a DFS starting from u
if ( ! visited[u] ) {
visited[u] = true;
for (int v: this.G.get(u)) {
topologicalSortDFS(v, visited, topologicalOrder);
}
this.count++;
topologicalOrder[ this.G.size() + 1 - this.count ] = u;
}
}

public void topologicalOrder() {

boolean[] visited = new boolean[this.G.size() + 1];
int[] topologicalOrder = new int[this.G.size() + 1];

for (int i = 1; i < this.G.size() + 1; i++) {
if (!visited[i]) {
topologicalSortDFS(i, visited, topologicalOrder);
}
}
Arrays.stream(topologicalOrder).forEach(i -> System.out.println(i));
}

The complexity of above implementation is O(V + E) with adjacency list represented graph.

Please share if there is something wrong or missing. If you are preparing for an interview and want to have personalized coaching for your preparation, please signup for free demo class.

Dijkstra’s Algorithm to find shortest path

Given a graph, directed or undirected and two nodes, find shortest path between these two nodes.
This is a standard problem and we don’t need to figure out what to do. We will have adjacency list representation of graph. Algorithm is widely published and is as below.

1. Initialize distance of all nodes from start node as INFINITE and all nodes as not finalized.
2. Take source node to start with, let’s say u.  Distance from source or start node to itself will be zero.
3. Mark u as considered and distance finalized.
4. Now for all neighbor nodes v of it, update the distance if the current distance is more than distance of  u + weight of (u,v).
For example, if the current node A is marked with a distance of 6, and the edge connecting it with a neighbor Bhas length 2, then the distance to B (through A) will be 6 + 2 = 8. If B was previously marked with a distance greater than 8 then change it to 8. Otherwise, keep the current value.{From Wikipedia}
5. Now select a node for which distance is not finalized and distance is minimum till now and make it u. Go to step 3.

Let’s work out an example and see how it works. Input graph is

To figure out the path which was followed to reach destination from source, we can have an array to keep track of the parent node whenever distance to the node is updated. By reverse tracking parents from destination to source, we can figure out the path.

Dijkstrat’s algorithm to find shortest path : Implementation

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

#define NUM_NODE 7
#define NUM_WORDS 10
#define NUM_CHAR 4
#define true 1
#define false 0

#define INFINITE 1000

typedef struct node{
int value;
int wt;
struct node *next;
}Node;

Node *graph[NUM_NODE + 1];

void add_edge_2(int i, int j, int wt);

int find_minimum_node(int visited[], int dist[]){
int min = INFINITE;
int index = -1;
int i;
for(i=1; i<= NUM_NODE; i++){
if(visited[i] == false && min>dist[i]){
min = dist[i];
index = i;
}
}
return index;
}

void dijstras(Node * graph[], int start, int end ){
int i;
int parent[NUM_NODE +1];
int distance[NUM_NODE+1];
int visited[NUM_NODE+1];

for(i=1; i<=NUM_NODE; i++){
visited[i] = false;
distance[i] = INFINITE;
parent[i] = -1;
}
// Mark distance of start as 0.
distance[start] =0;
for(i=1; i<=NUM_NODE; i++){
int index  = find_minimum_node(visited, distance);
if(index != -1){
Node * node = graph[index];
Node * current  = node->next;

while(current){
/*If neihbour node is not visited and its current distance is
more than distance of current node + cost of edge between
current node and this node, update the distance */
if(visited[current->value] == false && distance[current->value] >
distance[node->value] + current->wt){

distance[current->value] = distance[node->value] + current->wt;
parent[current->value] = node->value;
}
current = current->next;
}
visited[node->value] = true;
if(node->value == end)
break;
}
else{
break;
}
}

printf("\nDistance between %d and %d : %d", start , end, distance[end]);

// Printing path in reverse order,using stack, we can print it normal order to.
printf("\nPath is  (In reverse order): ");
int cur_parent =0;
while(cur_parent != -1){
printf("%d ", end );
cur_parent = parent[end];
end = cur_parent;
}
printf("\n");
}

Node *createNode(int j, int wt){

Node * new_node = (Node *)malloc(sizeof(Node));
if(new_node){
new_node->value = j;
new_node->next = NULL;
new_node->wt = wt;
}
else{
printf("\n Node cannot be allocated");
}
return new_node;
}

void addEdge(int i, int j, int wt){

Node * temp = graph[i];
if(temp == NULL){
graph[i] = createNode(j, wt);
}
else{
while(temp->next){
temp = temp->next;
}
temp->next = createNode(j, wt);
}
}

//driver program
int main(){

int i,j;
for(i=1; i<=NUM_NODE; i++){
graph[i] = createNode(i,0);
}
// creating graph with weighted edges.

dijstras(graph, 1, 6);

return 0;
}

The complexity of above code is O(V2) where V is number of vertices in graph. This can be reduced to O(E log V) by using heaps. Heaps will reduce complexity of searching minimum weight cost from O(V) to O(log V).
Limitation of algorithm1. It does not work with negative weights.

Please share if there is anything wrong or missing.

Word ladder or word golf

Let me explain problem in detail first. We are given two words A and B and a dictionary of words. Task is to transform first word into second word by changing one character at a time. Character should be changed in such a way that the new word should be in dictionary. Find minimum number of such hops from one word to reach the second word.

For example, lets say given word are  CAT and DOG The sequence of steps will be CAT –> COT–> COG–> DOG, hence three substitutions and we are done.
To convert COLD to WARM (from Wikipedia)
Sequence of steps will be COLD–>CORD–>CARD–>WARD–>WARM.

There are two strings/words given as input. And hoping can be done only if there is one condition : the destination of hop should differ at most one character with source. So if two words differ only in one character, we have a relationship between them.
So if take those two words as two nodes and connect them only if they differ by one character, we get a graph with two nodes and one edge.
Apply same principle to all words in dictionary and create a graph, nodes representing words and edges the relationship.
Once we have created graph as per above conditions, start breadth first traversing from the node which has values as first word and go on scanning till we get a node with value as second word. That’s all. Simple analysis! Implementation is bit tricky. Let’s look at implementation. For each word, if we go an scan every word in dictionary to figure out if they differ by one character in order to create edge will be too inefficient with complexity of O(n2). If we create buckets of words with one character of that replaced with _ and put all similar words into that bucket, we will segregate words with one character different. Let’s take an example.

Dictionary given is  {“CAT”, “BAT”, “COT”, “COG”, “COW”, “RAT”, “BUT”,”CUT”, “DOG”, “WEB”}. Now for word CAT, there are three buckets possible. Now take word BAT, if we replace B with _ it becomes _AT, for which we already have a bucket, so we can put CAT and BAT in same bucket. Similarly when we replace second character of COT, it becomes C_T, for which we already have a bucket, hence CAT and COT are placed in once bucket. With above given words buckets will be like below. Now connect all words in same buckets with each other. Final graph will be like below: #include<stdio.h>
#include<stdlib.h>
#include<string.h>

#define NUM_WORDS 10
#define NUM_CHAR 4
#define true 1
#define false 0

typedef struct node_s{
char value[NUM_CHAR];
int wt;
struct node_s *next;
}Node_S;

Node_S *graph_s[NUM_WORDS + 1];

Node_S * create_node_s(char *temp){

Node_S * new_node = (Node_S *)malloc(sizeof(Node_S));
if(new_node){
strcpy(new_node->value, temp);
new_node->next = NULL;
}
else{
printf("\n Node cannot be allocated");
}
return new_node;
}
void add_edge_3(int i, char *str){

Node_S * temp = graph_s[i];
if(temp == NULL){
graph_s[i] = create_node_s(str);
}
else{
while(temp->next){
temp = temp->next;
}
temp->next = create_node_s(str);
}
}
/* Check if there is already exisitng bucket */
int search_buckets(char buckets[NUM_WORDS * NUM_CHAR][NUM_CHAR], int count, char *temp){
int i;
for(i=0; i<count; i++){
if(!strcmp(buckets[i], temp))
return i;
}
return -1;
}

/* This adds string to a bucket */
void  add_string(Node_S *pre_graph[], int index, char * str){

Node_S * temp = pre_graph[index];

if(!temp){
pre_graph[index] = create_node_s(str);
}
else{
while(temp->next){
temp  = temp->next;
}
temp->next = create_node_s(str);
}
}

/* This function finds the index of the word in dictionary
Inefficient, can be done using binary search as words are in sorted
order in dictionary */

int find_index(char dict[NUM_WORDS][NUM_CHAR], char * temp){
int i;
for(i=0; i<NUM_WORDS; i++){
if(!strcmp(dict[i], temp))
return i;
}
return -1;
}

/* This function links all words in buckets to create edges of graph */

void create_graph(Node_S *pre_graph[NUM_WORDS * NUM_CHAR],
char dict[NUM_WORDS][NUM_CHAR], int count){
int i;
int index, index_1, index_2;
for(i=0; i<count; i++){
Node_S * current =  pre_graph[i];
while(current){
index  = find_index(dict, current->value);
if(!graph_s[index]){
}

next = current->next;
while(next){
index_1  = find_index(dict, next->value);
if(!graph_s[index_1])
next = next->next;
}
current = current->next;
}
}
}

/*This function creates buckets and using those buckets also
creates graph where all neighbour nodes differ by at most one
character */
void create_buckets(char buckets[NUM_WORDS * NUM_CHAR][NUM_CHAR],
char dict[NUM_WORDS][NUM_CHAR]){

int i,j;
int count = 0;
Node_S * pre_graph[NUM_WORDS * NUM_CHAR];

int key[NUM_WORDS * NUM_CHAR];

for(i=0; i<NUM_WORDS * NUM_CHAR; i++){
pre_graph[i] = NULL;
}
for(i=0; i<NUM_WORDS; i++){
for(j = 0; j<NUM_CHAR-1; j++ ){
char temp[NUM_CHAR];
strcpy(temp, dict[i]);
temp[j] = '_';

int index = search_buckets(buckets, count, temp);
if(index != -1){
}
else{
strcpy(buckets[count], temp);
count++;
}
}
}
create_graph(pre_graph, dict, count);
// Adjancancy matrix based representation
for(i=0; i<NUM_WORDS; i++){
printf("Nodes adjacent to %d : ", i);

Node_S * current  = graph_s[i];
while(current){
printf("%s ", current->value);
current = current->next;
}
printf("\n");
}
}

//Driver program
int main(){
char buckets[NUM_WORDS * NUM_CHAR][NUM_CHAR];
char dict[NUM_WORDS][NUM_CHAR] = {"CAT", "BAT", "COT", "COG", "COW",
"RAT", "BUT","CUT", "DOG", "WEB"};

create_buckets(buckets, dict);

bfs(graph_s);
return 0;
}

This above code needs optimization, please provide suggestions to improve. Breadth First Search implementation is as follows

typedef struct queue{
Node_S *front;
Node_S *rear;
}Queue;

int is_empty(Queue *q){
if(q->front == NULL && q->rear == NULL) return true;
return false;
}

void enqueue(Queue *q, char value[]){
Node_S * temp = (Node_S *)malloc(sizeof(Node_S));
strcpy(temp->value, value);
if(is_empty(q)){
q->front = temp;
q->rear = temp;
}
else{
if(q->rear){
q->rear->next = temp;
q->rear = temp;
}
else{
printf("\n Error ");
}
}
}

void dequeue(Queue *q, char str[]){
if(!is_empty(q)){
strcpy(str, q->front->value);
Node_S * curr = q->front;
if(q->front == q->rear){
q->rear = NULL;
q->front = NULL;
}
else{
q->front = curr->next;
}
free(curr);
}
}
void init_queue(Queue **q){
*q = (Queue *)malloc(sizeof(Queue));
(*q)->front = NULL;
(*q)->rear = NULL;
}

void bfs_1(Node_S * current, char *second, char dict[NUM_WORDS][NUM_CHAR]){

Queue *q = NULL;

init_queue(&q);
char str[NUM_CHAR];
int index ;
if(!current) return;

enqueue(q, current->value);
while(!is_empty(q)){
dequeue(q, str);
printf("\n%s ", str);
index = find_index(dict, str);
visited[index] = true;
if(!strcmp(str, second)){
return;
}
Node_S *temp = graph_s[index]->next;
while(temp){

index = find_index(dict, temp->value);
if(visited[index] == false){
visited[index] = true;
enqueue(q, temp->value);
}
temp = temp->next;
}
}

}

Minimum spanning tree Prim’s algorithm

In the last post, we discussed how to find minimum spanning tree in a graph using Kruskal algorithm. Let’s take a look at another algorithm to find the minimum spanning tree in a graph, this algorithm is called Prim’s algorithm. Problem statement remains the same as the Kruskal algorithm, given a graph with weighted edges, find MST in it.

Prim’s algorithm for minimum spanning tree

Basic idea pf Prim’s algorithm is to maintain two sets of nodes of the graph, one contains all the nodes which are already included into the minimum spanning tree and the other contains all nodes which are not included in MST yet. Now find an edge with least weight which connects any node in the first set to any node in the second set. Such an edge is commonly called as cut.

Let two sets be A which includes all vertices of the graph which are included in MST and V is set which contains all vertices which are not included in MST yet. Let S be set of all vertices of a graph.

To start with A will be empty and V will be equal to S. Start with any one vertex and add it to A. Also, the cost of reaching any node initially is INFINITE. Once we add the vertex to A, update the cost of all neighbor vertex based on cost to reach vertex added in A and cost of edge between this vertex and neighboring vertices. Remove the vertex from V.

Now, select an edge which is minimum and connects one vertex in A with a vertex in V, it does not form a cycle. (If both the ends of an edge are in same component of the graph there are bound form a cycle.)

Let’s see prim’s algorithm and then we will see how can we implement it.

1. Initialize cost of each vertex with INFINITE. Except from the first vertex v where we will start.
2. For each neighbor u, cost(u) < cost(v) + w(u,v), update the cost(u).
3. Select a minimum cost neighbor vertex among these vertex and see if edge forms a cycle.
4. If it forms a cycle take another vertex till we considered every neighboring vertices.
5. If it does not form cycle, add it to MST and go to step 2.
6. Repeat all above steps till all the vertices are not included in tree.

See the working of algorithm.  Minimum spanning tree algorithm's implementation.

First thing we need to do is to keep track of two sets. We can do so by keeping track of each node in MST. Create an array and every time we add the node in MST, mark it as added. While considering any edge in future, just check if MST[i] is true, if it is then node is already added in MST and should not be added again.Second thing is to keep track of the cost of each node. Initially all nodes are at INFINITE cost.

As we traverse the graph, update each node's cost from the current node. Keep an array of integers initialized with INFINITE and every time we add a node in MST, update distance of neighboring nodes based on cost of edge from current node to them. Last step is to find the minimum cost edge between two sets. We can scan through the distance array which we discussed in above paragraph and get a node which has least cost and not already in MST (using the array values in discussed in paragraph 1)
Finding minimum can be implemented using min heap or priority queues, that would reduce the complexity of overall algorithm.

#include<stdio.h>
#include<stdlib.h>

#define NUM_NODE 6
#define MAX_VALUE 999
#define true 1
#define false 0

//Definition of node in adjacency list formation
typedef struct node{
int value;
int wt;
struct node *next;
}Node;

//Edge definition for sorting and storing minimum spanning tree
typedef struct edge{
int start;
int end;
int wt;
}Edge;

Node *graph[NUM_NODE + 1];

//This function creates a node in adjancency list representation with weight
Node * createNode(int j, int wt){

Node *newNode = (Node *)malloc(sizeof(Node));
if(newNode){
newNode->value = j;
newNode->next = NULL;
newNode->wt = wt;
}
else{
printf("\n Node cannot be allocated");
}
return newNode;
}
//Adds edge in graph
void addEdge(int i, int j, int wt){

Node *currentNode = graph[i];
if(currentNode == NULL){
graph[i] = createNode(j, wt);
}
else{
while(currentNode->next){
currentNode = currentNode->next;
}
currentNode->next = createNode(j, wt);
}
}

void updateWeights(int currentVertex, int weights[],
int predecessors[], int visited[]){

Node *currentNode = graph[currentVertex];
if(currentNode){
//update distance for all neighbor nodes
Node * itr = currentNode->next;
while(itr){
if(weights[itr->value] >  itr->wt && visited[itr->value] == false ){
weights[itr->value] =  itr->wt;
predecessors[itr->value] = currentVertex;
}
itr = itr->next;
}
}
}

int getMinimumVertex( int weights[], int visited[], int numNodes ){
int currentMinWeight = MAX_VALUE;
int currentMinVertex = 1;
for( int i = 2; i<=numNodes; i++ ){
if( !visited[i] && weights[i] < currentMinWeight  ){
currentMinWeight = weights[i];
currentMinVertex = i;
}
}
return currentMinVertex;
}

//Real implementation of prims algorithm
int primsMSTAlgorithm(Node * graph[], int numNodes){

//Step 1 : Initialize all arrays
int weights[numNodes+1];
int visited[numNodes+1];
int predecessors[numNodes+1];

for( int i=1; i<=numNodes; i++ ){
weights[i] = MAX_VALUE;
}
for( int i=1; i<=numNodes; i++ ){
visited[i] = 0;
}
for( int i=1; i<=numNodes; i++ ){
predecessors[i] = MAX_VALUE;
}

weights = 0;
int totalCost = 0;
//Step 2 :
for( int i=0; i <=numNodes; i++ ){
//Greedy strategy
int closestVertex      = getMinimumVertex( weights, visited, numNodes );
totalCost+= weights[closestVertex];
visited[closestVertex] = true;
updateWeights( closestVertex, weights, predecessors, visited  );
}
}

//Driver program
int main(){

int i;
for(i=1; i<=NUM_NODE; i++){
graph[i] = createNode(i,0);
}

int totalCost = primsMSTAlgorithm(graph, NUM_NODE);
printf("Cost of minimum spanning tree : %d", totalCost);
return 0;
}

The complexity of Prim's algorithm to find minimum spanning tree in graph is O(V2).

Please share if there is anything is wrong or missing. If you are preparing for an interview and need 1 to 1 coaching for preparation, please reach out to us at communications@algorithmsandme.com or book a free session with us.

Minimum Spanning Tree – Kruskal Algorithm

Given a weighted connected undirected graph, find a minimum spanning tree in the graph. Let’s first understand what is a spanning tree?

A spanning tree T of an undirected graph G is a subgraph that is a tree which includes all of the vertices of G, with the minimum possible number of edges.

Wikipedia
When the graph is weighted i.e each edge of the graph has some weight to move from one node to another, a spanning tree with minimum cost is called the minimum spanning tree. If all the edges contain distinct weights, there will be a unique minimum spanning tree for the graph, however, if two edges can have same weight then there can be more than one minimum spanning tree for the given graph.

For the graph shown below, second figure shows spanning trees of it.  Applications of minimum spanning trees

1. Broadcast tree preparation for broadcasting on the internet. It is implemented using spanning tree protocol.
2.  Preparing telecommunication networks, water supply networks etc.

There are two widely used algorithms to find a minimum spanning tree for a graph, both based on the greedy algorithm. The basic idea of the greedy algorithm is that at every step we chose the option which gives us maximum profit at that step without worrying about future steps. These algorithms are:
1. Kruskal’s algorithm: Basic idea of the kruskal algorithm to find the minimum spanning tree in the graphs is that we take each edge one by one in increasing order of their weights. For each edge check if it makes a cycle in the existing tree? If it does not create a cycle, add it to the minimum spanning tree formed till now. If it forms a cycle, discard the edge and move to the next edge. Repeat above steps until all nodes are added in the spanning tree. Resulting tree will be the minimum spanning tree.

Kruskal algorithm to find minimum spanning tree

1. Sort all edges based on weights
2. Start with minimum cost edge. Add it to T.
3. For each edge in graph, repeat following steps.
4. If current edge forms a cycle, discard the edge.
5. If current edge does not form a cycle, add it to T.

Kruskal algorithm: implementation

#include<stdio.h>
#include<stdlib.h>

#define NUM_NODE 5
#define true 1
#define false 0

//Definition of a node in adjacency list formation
typedef struct node{
int value;
int wt;
struct node *next;
}Node;

//Edge definition for sorting and storing minimum spanning tree
typedef struct edge{
int start;
int end;
int wt;
}Edge;

Node *graph[NUM_NODE + 1];

//This function creates a node in adjancency list representation with weight
Node * createNode(int j, int wt){

Node * new_node = (Node *)malloc(sizeof(Node));
if(new_node){
new_node->value = j;
new_node->next = NULL;
new_node->wt = wt;
}
else{
printf("\n Node cannot be allocated");
}
return new_node;
}
//Adds edge in graph
void addEdge(int i, int j, int wt){

Node * currentNode = graph[i];
if(!currentNode){
graph[i] = createNode(j, wt);
}
else{
while(currentNode->next){
currentNode = currentNode->next;
}
currentNode->next = createNode(j, wt);
}
}

//Below functions are used for Union and Find method on the disjoint set
int find(int rep[], int value){
if(rep[value] == -1 || rep[value] == value)
return value;
else find(rep, rep[value]);
}

void unionSet(int rep[], int x, int y){

int xroot = find(rep, x);
int yroot = find(rep, y);

rep[xroot] = yroot;
}

void makeSet(int rep[]){
int i;
for(i=0; i<= NUM_NODE; i++){
rep[i] = -1;
}
}
//This copies all edges to array so as to sort them on weight
void copyEdges(Edge edges[], Node *graph[]){
int i, count =0;

for(i=1; i<= NUM_NODE; i++){
Node * u = graph[i];
if(u){
Node * v = u->next;
while(v){
edges[count].start =  u->value;
edges[count].end =v->value;
edges[count].wt = v->wt;
count++;
v = v->next;
}
}
}
}

//Implementation of quick sort
void swap(Edge edges[], int i, int j){
int temp_start = edges[i].start;
int temp_end = edges[i].end;
int wt = edges[i].wt;

edges[i].start  = edges[j].start ;
edges[i].end  = edges[j].end ;
edges[i].wt  = edges[j].wt;

edges[j].start  = temp_start;
edges[j].end  = temp_end ;
edges[j].wt  = wt;
}
int partition(Edge edges[], int start, int nEdges){
int i,j;

int pivot =  edges[start].wt;
i = start+1;
j = nEdges;
while(i<=j){
while(i<=nEdges && edges[i].wt < pivot)
i++;
while(j>start && edges[j].wt >= pivot)
j--;

if(i<j){
swap(edges, i, j);
}
}
if(j>start)
swap(edges,start, j);

return j;
}

void quicksort(Edge edges[], int start, int nEdges){

if(start >=nEdges) return;
int pivot = partition(edges, start, nEdges);

quicksort(edges, start, pivot);
quicksort(edges, pivot+1, nEdges);

}

//Real implementation of krutskal algorithm
void kruskal(Node * graph[], int num_edges){

int i, count =0, wt=0;

Edge edges[num_edges];

Edge mst[num_edges];

copyEdges(edges, graph);

quicksort(edges, 0, num_edges-1);

int rep[NUM_NODE + 1];

makeSet(rep);
for(i=0; i<num_edges; i++){
//Check if edge creates cycle in the tree, If not add that to tree
if(find(rep, edges[i].start) != find(rep, edges[i].end)){
unionSet(rep, edges[i].start, edges[i].end);
mst[count++] = edges[i];
wt += edges[i].wt;
}
}
// Print the minimum spanning tree
for(i=0; i<count; i++){
printf("(%d, %d)\n", mst[i].start, mst[i].end);
}
}
//Driver program
int main(){

int i;
for(i=1; i<=NUM_NODE; i++){
graph[i] = createNode(i,0);
}

kruskal(graph, 5);

return 0;
}

The complexity of the above algorithm is dominated by the sorting which takes O(E log E) where E is the number of edges. To detect whether edge forms a cycle or not, we can use the Union and Find method.

2.Prim’s algorithm: It is also based on the greedy algorithm, discussed in the next post

Please share if there is anything is wrong or missing. If you are preparing for an interview and need 1 to 1 coaching for preparation, please reach out to us at communications@algorithmsandme.com or book a free session with us.

Connected components in graph

Given an undirected graph, find the number of connected components in it.

A connected component of an undirected graph is a maximal set of nodes such that each pair of nodes is connected by a path

Every vertex of the graph lies in a connected component that consists of all the vertices that can be reached from that vertex, together with all the edges that join those vertices. If an undirected graph is connected, there is only one connected component. Every graph has at least one connected component that is the graph itself.

For example in below graph, there are two connected components {1,2,3,4} and {5, 6}. How to find the number of connected components in a graph?

We can find the number of Connected components in a graph by traversing the graph, either depth first traversal or breadth first traversal.

If we start from a node v and do a traversal of the graph, we will be able to reach all the nodes which are connected to this node through any path. These are the vertices which are part of this connected component.
Now, if there are some vertices still unvisited, then there must be another component of the graph. We start traversal again with the first unvisited vertex. We continue this until all the vertices are visited.

In the following implementation, we are using Depth First traversal of the graph to find the number of components.

Connected components of a graph: implementation

Implementation with adjacency list representation of graph

package com.company.Graphs;

import java.util.*;

/**
* Created by sangar on 21.12.18.
*/
public class AdjacencyList {
private Map<Integer, ArrayList<Integer>> G;
boolean isDirected;

this.G = new HashMap<>();
this.isDirected = isDirected;
}

public void addEdge(int start, int dest){

if(this.G.containsKey(start)){
}else{
this.G.put(start, new ArrayList<>(Arrays.asList(dest)));
}

//In case graph is undirected
if(!this.isDirected) {
if (this.G.containsKey(dest)) {
} else {
this.G.put(dest, new ArrayList<>(Arrays.asList(start)));
}
}
}

private void DFS(int node, boolean[] visited,
ArrayList<Integer> traversal){
if(!visited[node]){
visited[node] = true;

this.G.get(node).forEach(v -> DFS(v, visited, traversal));
}
}

public int connectedComponents(){

boolean[] visited = new boolean[this.G.size()];
int components = 0;

for(int i=1; i<this.G.size(); i++) {
ArrayList<Integer> traversal = new ArrayList<>();
if(!visited[i]) {
DFS(i, visited, traversal);
System.out.println(traversal);
components++;
}
}

return components;
}
}

Implementation using adjacency matrix representation of graph

package com.company.Graphs;

import java.util.ArrayList;
import java.util.Queue;
import java.util.Stack;

/**
* Created by sangar on 21.12.18.
*/
public class AdjacencyMatrix {
private boolean [][] G;
private boolean isDirected;

public AdjacencyMatrix(int numNodes, boolean isDirected){
this.G  = new boolean[numNodes+1][numNodes+1];
this.isDirected = isDirected;
}

public void addEdge(int start, int dest){
this.G[start][dest] = true;
if(!isDirected){
this.G[dest][start] = true;
}
}

private void DFS(int node, boolean[] visited,
ArrayList<Integer> traversal){
if(!visited[node]){
visited[node] = true;

for(int i=1; i< this.G.length; i++){
if(this.G[node][i]){
DFS(i, visited, traversal);
}
}
}
}

public int connectedComponents(){

boolean[] visited = new boolean[this.G.length];
int components = 0;

for(int i=1; i<this.G.length; i++) {
ArrayList<Integer> traversal = new ArrayList<>();
if(!visited[i]) {
DFS(i, visited, traversal);
System.out.println(traversal);
components++;
}
}
return components;
}
}

If an adjacency list representation is used, the for loop will run for i from 0
to V each time it is executed, so the runtime for the adjacency list representation is O(V2).

For an adjacency list representation of a graph, run time for the algorithm is O(V + E).

Please share if there is something wrong or missing. If you are preparing for an interview and want to get a free coaching session, please reach out to us at communications@algorithmsandme.com

Detect cycle in undirected graph

Given an undirected graph, find if there is a cycle in that undirected graph. For example, below graph has cycles as 2->3->4->2 and 5->4->6->5 and a few more. A simple definition of a cycle in an undirected graph would be: If while traversing the graph, we reach a node which we have already traversed to reach the current node, then there is a cycle in the graph.

How can we detect the above condition? It’s an application of Depth First Search. Do a depth-first traversal of a graph. In depth first traversal, we visit the current node and go deep to one of its unvisited neighbors. If a neighbor of the current node is already visited, that means there is a possibility of a cycle. However, make sure that the already visited neighbor is not the parent of the current node, otherwise, there will be always a cycle of two nodes in an undirected graph.

While depth first traversing a graph, keep track of the nodes which are on the stack at present. If there is an edge from the current node to any one of the nodes which are on recursion stack, we say there is a cycle in undirected graph.

To avoid additional space, we can pass the parent pointer to the recursive call and check if the visited node is the parent.

Detect cycle in undirected graph: implementation

package com.company.Graphs;

import java.util.*;

/**
* Created by sangar on 21.12.18.
*/
public class AdjacencyList {
private Map<Integer, ArrayList<Integer>> G;
private boolean isDirected;
private int count;

this.G = new HashMap<>();
this.isDirected = isDirected;
}

public void addEdge(int start, int dest){

if(this.G.containsKey(start)){
}else{
this.G.put(start, new ArrayList<>(Arrays.asList(dest)));
}

if(!this.G.containsKey(dest)) {
this.G.put(dest, new ArrayList<>());
}
//In case graph is undirected
if(!this.isDirected) {
}
}

public boolean isEdge(int start, int dest){
if(this.G.containsKey(start)){
return this.G.get(start).contains(dest);
}

return false;
}

private boolean isCycleUtil(int u, boolean[] visited, int parent){

for(int v: this.G.get(u)) {
if(!visited[v]){
visited[v] = true;
return isCycleUtil(v, visited, u);
}
else if(v != parent){
return true;
}
}

return false;
}

public boolean isCycle(){
boolean[] visited = new boolean[this.G.size() + 1];

for(int i = 1; i < this.G.size() + 1; i++) {
if (!visited[i]) {
return isCycleUtil(i, visited, -1);
}
}
return false;
}
}

The complexity of the DFS approach to finding a cycle in an undirected graph is O(V+E) where V is the number of vertices and E is the number of edges.

Please share if there is something wrong or missing. If you are preparing for an interview, please singup for free interview preparation material.