## Unique rows in boolean matrix

Given a matrix of 0 and 1, print all unique rows in boolean matrix. For example, if the matrix is show below:

0 1 0 0 1

1 0 1 1 1

0 1 0 0 1

1 0 1 0 0

Output should be

0 1 0 0 1

1 0 1 1 1

1 0 1 0 0

The first brute force solution which has a complexity of O(n^{2} m) where is n is the number of rows and m is the number of columns is as follows:

For each row in the boolean matrix, check all the rows if it matches with any one of them. If yes, skip printing, else print the row.

Another way which is better than the above approach is to convert each row in equivalent decimal and then check for duplicates. Again it has a complexity of O(n * m) where n is the number of rows and m as the number of columns.

Can we do better than this? There is a standard technique to check if a particular pattern is already seen or not. That’s using tries. Each pattern is added to trie, when we try to add a duplicate pattern, we will end up at the leaf node which is already marked as leaf due to an earlier pattern.

With given info, we can say that we will insert each row pattern in the trie. We will modify the insert function of trie to return us whether the row is added newly or it was already present in the trie. As explained above it can be ascertained by the fact that if the last node is marked as leaf node already, then it is duplicate row as we must have traveled the same nodes above. If the last node is not a leaf node already, then there is at least one digit which is different and hence row becomes unique. Mark last node of this pattern as a leaf node, so that same patterns will be detected as duplicates.

Add a row in the trie.

If the last node while entering the pattern is the leaf node, return false. It’s not a unique row.

If the last node is not a leaf node, mark it as a leaf node and return true.

If insert operation returns true, print the row.

Else, skip printing row.

#include<stdio.h>
#include<stdlib.h>
#define MAX_SIZE 2
#define LEAF_NODE 1
#define true 1
#define false 0
typedef struct trie_node_t {
int value;
struct trie_node_t * children[MAX_SIZE];
}Node;
typedef struct trie{
int count ;
Node *root;
}trie;
void initialize(trie *t){
t->root = (Node *)malloc(sizeof(Node));
t->count =0;
}
int insert_1(trie *t, int key[], int col){
int i;
if(t->root == NULL){
printf("\n Trie is not initialized\n");
}
Node * current = t->root;
for(i=0; i<col; i++){
int index = key[i];
if(current->children[index] == NULL){
current->children[index] = (Node *)malloc(sizeof(Node));
}
current = current->children[key[i]];
}
/* To mark it as leaf */
if(current->value != LEAF_NODE){
current->value = LEAF_NODE;
return true;
}
else{
return false;
}
}
void print_unique_rows(int a[5][5], int col, int row){
int i,j;
int key[col];
trie t;
initialize(&t);
for(i=0; i<row; i++){
for(j=0; j<col; j++){
key[j] = a[i][j];
}
if(insert_1(&t, key, col)){
for(j=0;j<col; j++){
printf("%d", key[j]);
}
printf("\n");
}
}
}
//Driver program
int main(){
int a[4][5] = {{0, 1, 0, 0, 1},
{1, 0, 1, 1, 0},
{0, 1, 0, 0, 1},
{1, 0, 1, 0, 0}
};
print_unique_rows(a, 5, 4 );
return 0;
}

The complexity of trie based solution is O(n * m).