Print unique rows in a boolean matrix

Unique rows in boolean matrix

Given a matrix of 0 and 1, print all unique rows in boolean matrix. For example, if the matrix is show below:
0 1 0 0 1
1 0 1 1 1
0 1 0 0 1
1 0 1 0 0
Output should be
 0 1 0 0 1
 1 0 1 1 1
1 0 1 0 0

The first brute force solution which has a complexity of O(n2 m) where is n is the number of rows and m is the number of columns is as follows:
For each row in the boolean matrix, check all the rows if it matches with any one of them. If yes, skip printing, else print the row.
Another way which is better than the above approach is to convert each row in equivalent decimal and then check for duplicates. Again it has a complexity of O(n * m) where n is the number of rows and m as the number of columns.

Can we do better than this? There is a standard technique to check if a particular pattern is already seen or not. That’s using tries. Each pattern is added to trie, when we try to add a duplicate pattern, we will end up at the leaf node which is already marked as leaf due to an earlier pattern.
With given info, we can say that we will insert each row pattern in the trie. We will modify the insert function of trie to return us whether the row is added newly or it was already present in the trie. As explained above it can be ascertained by the fact that if the last node is marked as leaf node already, then it is duplicate row as we must have traveled the same nodes above. If the last node is not a leaf node already, then there is at least one digit which is different and hence row becomes unique. Mark last node of this pattern as a leaf node, so that same patterns will be detected as duplicates.

Add a row in the trie.
If the last node while entering the pattern is the leaf node, return false. It’s not a unique row.
If the last node is not a leaf node, mark it as a leaf node and return true.
If insert operation returns true, print the row.
Else, skip printing row.

#include<stdio.h>
#include<stdlib.h>

#define MAX_SIZE 2

#define LEAF_NODE  1
#define true 1
#define false 0

typedef struct trie_node_t {
    int value;
    struct trie_node_t * children[MAX_SIZE];
}Node;

typedef struct trie{
    int count ;
    Node *root;
}trie;

void initialize(trie *t){
    t->root = (Node *)malloc(sizeof(Node));
    t->count =0;
}

int insert_1(trie *t, int key[], int col){

    int i;

    if(t->root == NULL){
        printf("\n Trie is not initialized\n");
    }

    Node * current = t->root;
    for(i=0; i<col; i++){
        int index  =  key[i];
        if(current->children[index] == NULL){
            current->children[index] = (Node *)malloc(sizeof(Node));
        }
        current = current->children[key[i]];
    }

    /* To mark it as leaf */
    if(current->value != LEAF_NODE){
        current->value = LEAF_NODE;
        return true;
    }
    else{
        return false;
    }
}

void print_unique_rows(int a[5][5], int col, int row){

    int i,j;
    int key[col];

    trie t;
    initialize(&t);
    for(i=0; i<row; i++){
        for(j=0; j<col; j++){
            key[j] = a[i][j];
        }
        if(insert_1(&t, key, col)){
            for(j=0;j<col; j++){
                printf("%d", key[j]);
            }
            printf("\n");
        }
    }
}
//Driver program
int main(){

    int a[4][5] = {{0, 1, 0, 0, 1},
                    {1, 0, 1, 1, 0},
                    {0, 1, 0, 0, 1},
                    {1, 0, 1, 0, 0}
                  };

    print_unique_rows(a, 5, 4 );

    return 0;
}

The complexity of trie based solution is O(n * m).

Find words in maze

Find words in a maze

We learned basics of tries in last post. Let’s see if we could apply some of those properties in solving some real problems. Today’ problem at hand is to find all the words which are present in a maze of characters of size NxN. The condition is that characters of words should be adjacent to each other either horizontally or vertically.
For example, below maze contains cat, dog and word words in it.

What is the brute force solution? Simple, find all the words which can be formed using these characters in maze starting of length 1 to N*N. For each word check if the given dictionary contains the word, if yes, store that word in the result.
What is the search space here? There will be (n2!) words to be looked into the dictionary. Not a practical solution. Optimizing search in the dictionary using sorting and binary search will not be of much help. (Good point to mention at interview though).
So how can we reduce the search space? Let’s first take an example and see what is happening in the brute force approach.
If we look closely, we are checking string length N, even though we already know that N-1 character string leading to it is not a valid string. Can we avoid it? Yes, using tries. This is how we can do it.

Store all words in the given dictionary in a trie. If there is no prefix in trie with length = i; there cannot be the word with i+1 length having this prefix of length i. So just abort the lookup of words with this prefix. Else increase the string length by 1 and repeat the above step. Trie fits perfectly in this algorithm as we are dealing with prefixes. We need to keep track that what we have already visited and if there can be more words with the visited prefix. If we take the example of the above maze, we start from ‘a’, as we see that there is no word starting with ‘a’ in trie by looking at children array of the root, we can safely avoid calculating words starting with ‘a’ of any length from there.

Find words in a maze : Implemenetation

#include<stdio.h>
#include<stdlib.h>

#define N 4

int search_key(Node *current, char key){
   int index  =  GET_CHAR_INDEX(key);

   if(current->children[index] == NULL) return false;

   return true;
}

enum direction {
        NORTH =1,
        SOUTH,
        LEFT,
        RIGHT
};

void update_params(int row, int col, int *new_row, int *new_col, int dir){
   switch(dir){
      case NORTH:
           *new_row = --row;
           *new_col = col;
      break;
      case SOUTH:
           *new_row = ++row;
           *new_col = col;
      break;
      case LEFT:
           *new_col =  ++col;
           *new_row = row;
      break;
      case RIGHT:
           *new_col =  --col;
           *new_row = row;
      break;
     }
}


void find_words_wrapper(trie *t, char maze[][N]){

  int i,j,len, prefix_found = false;
  for(i=0; i<N; i++){
     for(j=0; j<N; j++){

    /*Consider all length words which can be formed stating with maze[i][j] char */

      for(len =1; len <N*N; len++){
        /* To check if we need to check for further length 
           1. if prefix_found = false, don't check, 
           no words possible for larger length 
           2. if prefix_found = true, continue looking*/

           prefix_found = false;

           /* If finally reached at the leaf of trie starting 
              with maze[i][j] and length = len */

              if(find_words(t->root, maze, len,i,j, &prefix_found)){
                 printf(" Word found at (%d %d)\n", i,j);
              }
              else if(prefix_found == false)
                 break;
       }
     }
   }
}

int valid_position(int row, int col){
        if(row<0 || row>N-1 || col <0 || col>N-1) return false;

        return true;
}

int find_words(Node *t, char maze[][N], int curr_len, 
               int curr_row, int curr_col, int *prefix){
       
       int new_row, new_col;
       int  dir,i;
       char key = maze[curr_row][curr_col];

       Node * current = t->children[GET_CHAR_INDEX(key)];

    /* Before finish the prefix we hit the null, prefix is not present */
       if(current == NULL) return false;

       /* If reach the prefix of len = curr_len but its not a word, 
        we can look forward with len = curr_len +1 */
       if(curr_len == 1 && current->value != LEAF_NODE){
                *prefix = true;
                return false;
       }
     /* If we reach at the leaf node, for this length, 
      we found a word with length = curr_len */
       if(curr_len == 1 && current->value == LEAF_NODE)
                return true;

       /* For every character look in all direction */
       for(dir = NORTH; dir<= RIGHT; dir++){

         /* if the key is present */
         if(search_key(t, key)){
         /* Move to next character based on direction of movement */
            update_params(curr_row, curr_col, &new_row, &new_col, dir);

            /*Validate that we are good in maze */
             if(valid_position(new_row, new_col)){

             /*Find word with len -1 in remaining trie */
              if(find_words(current, maze, curr_len-1, new_row, new_col, prefix)) {
                   return true;
               }
             }
          }
          else{
              return false;
          }
       }
}
void main(){

        trie t;
        initialize(&t);
        int i;

        char *dict []  = {"cat", "dog", "word"};

        char maze[N][N]  = { 'a' , 'c', 'a', 't',
                             'd' , 'w', 'o', 'r',
                             'o',  'g', 'd', 'd',
                             'p', 'p',  'p', 'p'
                           } ;
        for(i =0; i <3; i++){
                insert(&t, dict[i]);
        }

         find_words_wrapper(&t, maze);
}

Even though we have reduced the search space, we may end up looking at all the words i.e (n2!). There is a way to reduce it by using dynamic method approach, we look at that in the future.

Tries as data structure

Tries as data structure

Tries is a data structure which is usually neglected by programmers while designing solutions. Tries are very useful in cases where strings are involved with a good amount of duplication of their prefixes. What are tries? Trie is a data structure which stores information, referred to as key, in such a manner that the common information is stored only once. Usually keys are strings. No node in trie stores the whole key, but the position of the node gives us information that which key it is part of. All the descendant node in trie have the same prefix, hence trie is also know as prefix trees.Word trie is taken from retrieval word.

Another property of trie is that every node of trie has multiple children. In case of string (assuming lower case), there will be 26 children of each node. Take an example: We have the following strings: “House”, “Home”, “Tree” I need to store this information, Trie for the above words would look like:

Trie data structure example

Insertion in trie

Before insertion, we need to understand that how can we represent a node of a trie. Since intermediate nodes of a key do not store any information at them, we need to distinguish between the leaf node and the intermediate node. For that we can store a field called as the value inside each node, If the value is zero, it’s an intermediate node, if non-zero, it’s a leaf node.
In our case, we are assuming MAX_SIZE of 26.
Basic initialization

#define MAX_SIZE 26

#define GET_CHAR_INDEX(ch)\
        ((int) ch - (int)'a' )

#define LEAF_NODE  1
#define true 1
#define false 0

typedef struct trie_node_t {
        int value;
        struct trie_node_t * children[MAX_SIZE];
}Node;

typedef struct trie{
        int count ;
        Node *root;
}trie;

void initialize(trie *t){
        t->root = (Node *)malloc(sizeof(Node));
        t->count =0;
}

Now we can go and see insertion part.
Scan through the string character one by one. For each character, check if it has child nodes. The character acts as an index in children array. If the node at the char index has a child node, move to the child node and take the next character of the string. Repeat step 2 and 3 If the node at the char index doesn’t have a child node, create a child node and add it to trie. Move the character in the string and move down to a newly created child. Go to step 2.

void insert(trie *t, char key[]){

        int len = strlen(key);
        int i;

        if(t->root == NULL){
                printf("\n Trie is not initialized\n");
        }

        Node * current = t->root;
        for(i=0; i<len; i++){

             /* Get the index of char in children array */
             int index  =  GET_CHAR_INDEX(key[i]);

             /* If the children array does not contain the pointer
                at index pointed by char, allocate new node */

             if(current->children[index] == NULL){
                current->children[index] = (Node *)malloc(sizeof(Node));
             }
             /* Else traverse down the trie. */
             current = current->children[GET_CHAR_INDEX(key[i])];
        }

        /* To mark it as leaf */
        current->value = LEAF_NODE;
}

Search in trie
The best part of the trie is its search. It is done in O(M) where M is the length of the input string.
Start from the root. and take the first character of the string. If the array index pointed by the character is NULL, return false. Else move to the child node and next character. Repeat step 2 and 3 till the end to the input string. If we reach till the end of the string and the current node is a leaf node (identified using value), then return true.

int search(trie *t, char key[]){
        int len = strlen(key);
        int i;
        if(t->root == NULL){
                printf("\n Trie is not initialized\n");
        }

        Node * current = t->root;
        for(i=0; i<len; i++){
                int index  =  GET_CHAR_INDEX(key[i]);

                /* If characters are left in key and we have 
                reached at NULL, there is no key present */
                if(current->children[index] == NULL) return false;
                 
                /* Else traverse down the trie */
                current = current->children[index];
        }

        /* If we have reached the lead=f node, key is present */
        if(current && current->value == LEAF_NODE )
                return true;

        return false;
}

The only drawback with tries is that it takes a lot more space (K *  MAX_SIZE * N) as compared to the binary search tree. It is evident from the fact that every node in trie has to have a place for all the possible characters. In time complexity terms, insertion and search both are O(M) where M is length of string being entered. In this post we saw insertion and search operation in tries. We will use these operations in the future post and solve some real problems.