First non repeated character in string

First non repeated character in string

Given a string, find first non repeated character in a stringFor example, string is abcbdbdebab, the first non repeating character would be c. Even though e is also non repeating in string, c is output as it is first non repeating character.

Non repeating character : thoughts

What does it mean to be non-repeating character? Well, the character should occur in string only once. How about we scan the string and find what is count for each character? Store character and count in map as key value pair.
Now, that we have <character, count> key value pair for all unique characters in string, how can we find first non repeating character? Refer back to original string; scan the string again and for each character, check the corresponding count and if it is 1 return the character.

package com.company;

import java.util.HashMap;

/**
 * Created by sangar on 4.10.18.
 */
public class FirstNonRepeatingChar {

    HashMap<Character, Integer> characterCount = new HashMap<>();

    public char firstNonRepeatingCharacter(String s){
        //Best to discuss it with interviewer, what should we return here?
        if(s == null) return ' ';

        if(s.length() == 0) return ' ';

        for (char c: s.toCharArray()){
            if(!characterCount.containsKey(c)){
                characterCount.put(c,1);
            }
            else {
                characterCount.put(c, characterCount.get(c) + 1);
            }
        }
        for (char c: s.toCharArray()) {
            if(characterCount.get(c) == 1) return c;
        }

        return ' ';
    }
}
package test;

import com.company.FirstNonRepeatingChar;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 28.8.18.
 */
public class FirstNonRepeatingCharTest {

    FirstNonRepeatingChar tester = new FirstNonRepeatingChar();

    @Test
    public void firstNonRepeatingCharExists() {
        String s = "abcbcbcbcbad";
        assertEquals('d', tester.firstNonRepeatingCharacter(s));
    }

    @Test
    public void firstNonRepeatingCharDoesNotExists() {
        String s = "abcbcbcbcbadd";
        assertEquals(' ', tester.firstNonRepeatingCharacter(s));
    }

    @Test
    public void firstNonRepeatingCharWithEmptyString() {
        String s = "";
        assertEquals(' ', tester.firstNonRepeatingCharacter(s));
    }

    @Test
    public void firstNonRepeatingCharWithNull() {
        assertEquals(' ', tester.firstNonRepeatingCharacter(null));
    }
}

Complexity of this method to find first non-repeating character in a string is O(n) along with space complexity of O(1) to store the character to count map.

There is always some confusion about space complexity of above method, we think as 256 characters are used, should it not be counted as space complexity? Definitely. But in asymptotic notations, this space is independent of size of input, so space complexity remains O(1).

One more thing, even though time complexity is O(n), input string is scanned twice, first time to get count of characters and second time to find first non repeating character.

Optimization

Consider a case when string is too large with millions of characters, most of them repeated, above solution may turn slow in last where we look for character with count 1 in map.  How can we avoid scanning array second time?
How about we store some information with character in map along with count, so that we can figure out if the character is first non repeating or not.
Or we can have two maps, one stores the count and other stores the first index of character.

Once, we have created two maps as mentioned above, go through the first map and find the all characters with count 1. For each of these characters, check which one has the minimum index on second map and return that character.

Complexity of algorithm remains same, however, second scan of string is now not required. In other words, second scan is now independent of size of input as it depends on the size of first map, which is constant to 256 as that’s the number of unique 8 bit characters possible.

Find first non repeating character : Implementation

package com.company;

import java.util.HashMap;

/**
 * Created by sangar on 4.10.18.
 */
public class FirstNonRepeatingChar {
    public char firstNonRepeatingCharacterOptimized(String s){
        HashMap<Character, Integer> characterCount = new HashMap<>();
        HashMap<Character, Integer>characterIndex = new HashMap<>();
        //Best to discuss it with interviewer, what should we return here?
        if(s == null) return ' ';

        if(s.length() == 0) return ' ';

        for (int i=0; i<s.length(); i++){
            char c  = s.charAt(i);
            if(!characterCount.containsKey(c)){
                characterCount.put(c,1);
                characterIndex.put(c,i);
            }
            else {
                characterCount.put(c, characterCount.get(c) + 1);
            }
        }
        char nonRepeatedCharacter = ' ';
        int prevIndex = s.length();
        for (char c : characterCount.keySet()) {
            if(characterCount.get(c) == 1 
			&& characterIndex.get(c) < prevIndex){
                prevIndex = characterIndex.get(c);
                nonRepeatedCharacter = c;
            }
        }
        return nonRepeatedCharacter;
    }
}
package test;

import com.company.FirstNonRepeatingChar;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 28.8.18.
 */
public class FirstNonRepeatingCharTest {

    FirstNonRepeatingChar tester = new FirstNonRepeatingChar();

    @Test
    public void firstNonRepeatingOptimizedCharExists() {
        String s = "aebcbcbcbcbad";
        assertEquals('e', tester.firstNonRepeatingCharacterOptimized(s));
    }

    @Test
    public void firstNonRepeatingCharOptimizedDoesNotExists() {
        String s = "abcbcbcbcbadd";
        assertEquals(' ', tester.firstNonRepeatingCharacterOptimized(s));
    }

    @Test
    public void firstNonRepeatingCharOptimizedWithEmptyString() {
        String s = "";
        assertEquals(' ', tester.firstNonRepeatingCharacterOptimized(s));
    }

    @Test
    public void firstNonRepeatingCharOptimizedWithNull() {
        assertEquals(' ', tester.firstNonRepeatingCharacterOptimized(null));
    }
}

Please share if there is anything wrong or missing. If you are interested in taking personalized coaching sessions by our expert teachers, please signup to website and get first session free.


First non repeated character in stream

First non repeated character in stream

In last post, we discussed first non repeated character in string, where all characters where available in memory beforehand. Let’s take another variation of the problem : Given a stream of characters, find first non repeated character in stream in stat a given point of time.

For example, if stream till now is a,a,b,c,c,d; and it is asked what is first non repeated character at this point, it will be b. Next character in stream is let’s say b, then first non repeated character in stream at this point will be d.

Non repeated character in stream : thoughts

If set of characters is already present in memory, then it is very simple as we discussed here. Idea is to create a hash map with character as key and number of times is occurs in string as value; increase the count when character is seen during scan of string.
Once all characters are processed, go through string again and return first character which has count 1. However, this approach does not work with continuous stream of characters.

Why a queue?

Problem is to return first non repeating character at a given point of time. A character which was unique at a certain point of time, may not be unique at another point of time in future, that character may have be duplicated in stream coming in during those instances of time. We need to store characters in order of their arrival, however remove them as they occur more than once. At any specific time, we want the first arrived character till that point and has only one occurrence. Are you able to pattern?

First in first out pattern and queues are the best data structure to implement first in first out pattern.

One thing to be noted that even though first in first out pattern applies, removal of character is ad-hoc between two queries when stream flows in.  Characters which happen to occur twice or more between queries will be removed from the queue. Why mentioned this because then we cannot have array based queue. What data structure is best to remove a random node? Well, it’s doubly linked list. Given a node, it’s O(1) operation to remove it.

Why a hash map?

Another problem is, how can we map a character coming out of stream to a node in queue? We scan the doubly linked list based queue and delete the node which contains the character. Even though it will be O(1) operation (as max queue can grow 256 nodes why?), it is more expensive computationally to do for all the characters coming from stream.

What can quickly map a node to character, obviously a hash map. So, key value pair is store in hash map with character as key and node address as value.
Hash map actually now servers another purpose too, which is to maintain the count of occurrence. If a character is present in hash, that means it has corresponding node in queue, which implies that character has already been seen once. So we remove it from the queue if there is value for character in map.
If there is not value for the character in hash map, that implies, this is the first time we have seen this element and should be added to queue.

First non repeated character in stream : Algorithm

  1. Process next character from stream.
  2. Check in hash map if there is corresponding node in queue.
  3. If there is node already in hash map
    1. Delete the node corresponding node from queue
    2. go to step 1.
  4. If there is no node in hash map for this character
    1. Create a node and push that at the tail of queue.
    2. Update hash map with char, node address as key value pair

Algorithm looks fine, but there is an implementation problem here, can you see? We are freeing the node from queue however, we are keeping it in hash, to  indicate that character is already seen.
In ideal world, we should not store references to free memory which can be
accessed accidentally and cause segmentation fault. Idea would be to separate storing node pointer and seen functionalities with two hashes.

Create an additional hash map, character as key and bool as value, which indicates if character is seen twice already in stream before. With this hash map, we can remove the entry from other hash map which store node address of particular character, so no dangling pointers.

Modified algorithm

  1. Process next character from stream.
  2. Check in visited hash map if character was ever seen twice before.
  3. If true, go to step 1.
  4. If false, check the node address hash map, if there is node address for character
    1. Delete the node corresponding node from queue and remove entry from hash.
    2. Update visited hash map for this char as true.
    3. go to step 1.
  5. If there is no node in hash map for this character
    1. Create a node and push that at the tail of queue.
    2. Update hash map with char, node address as key value pair
package com.company;

import java.util.HashMap;

/**
 * Created by sangar on 8.10.18.
 */
public class NonRepeatedCharInStream {

    private HashMap<Character, Boolean> visited =  new HashMap<>();
    private HashMap<Character, ListNode> address =  new HashMap<>();
    private Queue<Character> queue = new QueueImplementation<>();

    public char getNonRepeatedCharacter(){
        return queue.isEmpty() ? ' ' : queue.peek().getData();
    }

    public void process(char c){
        if(visited.containsKey(c)) return;

        if(address.containsKey(c)){
            queue.deleteNode(address.get(c));
            address.remove(c);
            visited.put(c, true);
        }
        else{
            address.put(c,queue.enqueue(c));
        }
    }
}

Queue implementation using Doubly linked list

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public class QueueImplementation<E> implements Queue<E>{
    ListNode<E> head;
    ListNode<E> tail;

    public QueueImplementation(){
        head = null;
        tail = null;
    }

    @Override
    public ListNode<E> deleteNode(ListNode<E> node){
        if(this.isEmpty()) {
            return null;
        }

        if(this.head == node){
            if(this.head.getNext() != null) this.head.getNext().setPrev(null);
            this.head = this.head.getNext();
            return node;
        }

        if(this.tail == node){
            if(this.tail.getPrev() != null) this.tail.getPrev().setNext(null);
            this.tail = this.tail.getPrev();
            return node;
        }
        /*
            We are deleting node in between. So following things happen
            1. If node has prev, set node.prev.next = node.next.
            2. If node has next, set node.next.prev = node.prev
        */
        if(node.getPrev() != null) node.getPrev().setNext(node.getNext());
        if(node.getNext() != null) node.getNext().setPrev(node.getPrev());

        return node;
    }


    @Override
    public ListNode peek() {
        if(this.isEmpty()) {
            return null;
        }
        return this.head;
    }

    @Override
    public ListNode remove() {
        if(this.isEmpty()) {
            return null;
        }
        /*
            We are deleting node at head. So following things happen
            1. Set temporary node point to head.
            2. Move head to next of node.
            3. Set prev of new head to NULL.
            4. Free the temp node.
          */
        ListNode<E> tempNode = this.head;
        this.head.setPrev(null);
        this.head = this.head.getNext();

        return tempNode;
    }

    @Override
    public ListNode enqueue(E data) {
        if(this.isEmpty()) {
            this.head = new ListNode<E>(data);
            this.tail = this.head;
            return this.head;
        }
        ListNode<E> newNode = new ListNode<E>(data,null, this.tail);
        this.tail.setNext(newNode);
        this.tail = newNode;

        return newNode;
    }

    @Override
    public boolean isEmpty() {
        return this.head == null;
    }
}

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public interface Queue<E> {
    public ListNode<E> peek();
    public ListNode<E> remove();
    public ListNode<E> enqueue(E data);
    public ListNode<E> deleteNode(ListNode<E> node);
    public boolean isEmpty();
}

Test cases (add more and see if that works)

package test;

import com.company.NonRepeatedCharInStream;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 23.9.18.
 */
public class NonRepeatedCharInStreamTest {

    NonRepeatedCharInStream tester = new NonRepeatedCharInStream();

    @Test
    public void testNormalCase() {

        tester.process('a');
        tester.process('b');
        tester.process('c');

        assertEquals('a', tester.getNonRepeatedCharacter());

        tester.process('a');
        assertEquals('b', tester.getNonRepeatedCharacter());

        tester.process('c');
        tester.process('d');
        tester.process('b');

        assertEquals('d', tester.getNonRepeatedCharacter());
    }

    @Test
    public void testNoUniqueCharCase() {

        tester.process('a');
        tester.process('b');
        tester.process('c');
        tester.process('a');
        tester.process('b');
        tester.process('c');

        assertEquals(' ', tester.getNonRepeatedCharacter());
    }
}

Let’s see how this algorithm works with an example. To start with both hash maps (visited and address) and queue are empty. Stream starts with character c, new node is added to queue, also, entry is entered into the address hash map to store address of node in queue.

first character added to queue and hash map

Next d comes from stream,  we check if it is on visited map, it is not, so we check if it on address map. It is not there, it means, we saw c first time. Add it to queue and store pointer to node in address node.

d is added to address map and queue

Next character is e, same process followed as d and it gets added to queue and address map.

Imagine that stream puts d out now. We will check in visited map, it is not, however, in address map, it is present. This is where we know that d has occurred twice. We delete it from queue and address map. However, we add it to visited map.

 first non repeated character at this moment is c 

Now comes the character f, it is not present in visited map nor in address map, so we add it to queue and store pointer in the address map.

Next comes the character d, we check visited map it true and so do not add it anywhere and skip it.

What if next character is c,  it is present in address map, so remove it from queue and map. Add it to visited map.

first non repeated character in stream
At this point of time, first non repeated character is e

Complexity to find first non repeated character in stream is O(1) as number of characters processed from stream does not impact the processing time. Space complexity is also O(1) as it independent of number of characters processed.

Please share if there is something wrong or missing. Please reach out to us at communications@algorithmsandme.com if you have doubt or need any help with your interview preparations.

Convert excel column to equivalent number

Convert excel column to equivalent number

Every column is assigned a value, A =1 , B=2 … and Z= 26. Again AA = 27, AB = 28…. and AZ = 52. Given a string S, find out the equivalent number. For example for AAA it will be 703.

This problem has simple solution. We just need to come up with the mathematical formula for it. So let’s work out some examples so that we can identify some pattern and then figure out formula. A  = 1 = 1 * (26 ^0) AB = 2 + 26 ^1 =  28 AAB = 2 + 26^1 + 26^2 = 704  B  = 2 = 2 * (26 ^0) BB = 2 + 2 * (26^1) =  54 ABB = 2 + 2* (26^1) + 1 *( 26^2) = 730 C = 3  = 3 * (26 ^0)  AC  = 3 * (26^0) + 1 * (26^1) = 29 ACC = 3 * (26^0) + 3 * (26^1) + 1 * (26^2)=757

Do you see some pattern?  Yes, we need to get the power of 26 based on the place the character is in string , starting from right hand side. For example, in ABB, we took (26^2) because A is at second position from the right side, starting from zero. What else? We multiply that power to the value assigned to that character. For example, for C we multiply with 3. Value of single character can be easily found out using char -‘A’ + 1 expression. Rest is to just add each of the value to get the result. Hence our formula is
result  = result + S[i] * power(26, len-i-1) where i = 0 to len-1
It is fairly simple to code it.

int power(int i){

   if(i==0)
      return 1;
   if(i == 1) return 26;
   else{
        int temp =  power(i/2);
        if(i%2){
              return 26 * temp *temp;
        }
        else
              return temp * temp;
   }
}
void excel_column(char *s){

 int i ;
 int len  = strlen(s);
 int rank =0;
 for(i=0; i<len;i++){
        rank += power(len-i-1) * ((s[i]-'A')+1);
 }
 printf("%d", rank);
}

The complexity to convert excel column to an equivalent number code is O(N) where N is the size of the string.

Find words in maze

Find words in a maze

We learned basics of tries in last post. Let’s see if we could apply some of those properties in solving some real problems. Today’ problem at hand is to find all the words which are present in a maze of characters of size NxN. The condition is that characters of words should be adjacent to each other either horizontally or vertically.
For example, below maze contains cat, dog and word words in it.

What is the brute force solution? Simple, find all the words which can be formed using these characters in maze starting of length 1 to N*N. For each word check if the given dictionary contains the word, if yes, store that word in the result.
What is the search space here? There will be (n2!) words to be looked into the dictionary. Not a practical solution. Optimizing search in the dictionary using sorting and binary search will not be of much help. (Good point to mention at interview though).
So how can we reduce the search space? Let’s first take an example and see what is happening in the brute force approach.
If we look closely, we are checking string length N, even though we already know that N-1 character string leading to it is not a valid string. Can we avoid it? Yes, using tries. This is how we can do it.

Store all words in the given dictionary in a trie. If there is no prefix in trie with length = i; there cannot be the word with i+1 length having this prefix of length i. So just abort the lookup of words with this prefix. Else increase the string length by 1 and repeat the above step. Trie fits perfectly in this algorithm as we are dealing with prefixes. We need to keep track that what we have already visited and if there can be more words with the visited prefix. If we take the example of the above maze, we start from ‘a’, as we see that there is no word starting with ‘a’ in trie by looking at children array of the root, we can safely avoid calculating words starting with ‘a’ of any length from there.

Find words in a maze : Implemenetation

#include<stdio.h>
#include<stdlib.h>

#define N 4

int search_key(Node *current, char key){
   int index  =  GET_CHAR_INDEX(key);

   if(current->children[index] == NULL) return false;

   return true;
}

enum direction {
        NORTH =1,
        SOUTH,
        LEFT,
        RIGHT
};

void update_params(int row, int col, int *new_row, int *new_col, int dir){
   switch(dir){
      case NORTH:
           *new_row = --row;
           *new_col = col;
      break;
      case SOUTH:
           *new_row = ++row;
           *new_col = col;
      break;
      case LEFT:
           *new_col =  ++col;
           *new_row = row;
      break;
      case RIGHT:
           *new_col =  --col;
           *new_row = row;
      break;
     }
}


void find_words_wrapper(trie *t, char maze[][N]){

  int i,j,len, prefix_found = false;
  for(i=0; i<N; i++){
     for(j=0; j<N; j++){

    /*Consider all length words which can be formed stating with maze[i][j] char */

      for(len =1; len <N*N; len++){
        /* To check if we need to check for further length 
           1. if prefix_found = false, don't check, 
           no words possible for larger length 
           2. if prefix_found = true, continue looking*/

           prefix_found = false;

           /* If finally reached at the leaf of trie starting 
              with maze[i][j] and length = len */

              if(find_words(t->root, maze, len,i,j, &prefix_found)){
                 printf(" Word found at (%d %d)\n", i,j);
              }
              else if(prefix_found == false)
                 break;
       }
     }
   }
}

int valid_position(int row, int col){
        if(row<0 || row>N-1 || col <0 || col>N-1) return false;

        return true;
}

int find_words(Node *t, char maze[][N], int curr_len, 
               int curr_row, int curr_col, int *prefix){
       
       int new_row, new_col;
       int  dir,i;
       char key = maze[curr_row][curr_col];

       Node * current = t->children[GET_CHAR_INDEX(key)];

    /* Before finish the prefix we hit the null, prefix is not present */
       if(current == NULL) return false;

       /* If reach the prefix of len = curr_len but its not a word, 
        we can look forward with len = curr_len +1 */
       if(curr_len == 1 && current->value != LEAF_NODE){
                *prefix = true;
                return false;
       }
     /* If we reach at the leaf node, for this length, 
      we found a word with length = curr_len */
       if(curr_len == 1 && current->value == LEAF_NODE)
                return true;

       /* For every character look in all direction */
       for(dir = NORTH; dir<= RIGHT; dir++){

         /* if the key is present */
         if(search_key(t, key)){
         /* Move to next character based on direction of movement */
            update_params(curr_row, curr_col, &new_row, &new_col, dir);

            /*Validate that we are good in maze */
             if(valid_position(new_row, new_col)){

             /*Find word with len -1 in remaining trie */
              if(find_words(current, maze, curr_len-1, new_row, new_col, prefix)) {
                   return true;
               }
             }
          }
          else{
              return false;
          }
       }
}
void main(){

        trie t;
        initialize(&t);
        int i;

        char *dict []  = {"cat", "dog", "word"};

        char maze[N][N]  = { 'a' , 'c', 'a', 't',
                             'd' , 'w', 'o', 'r',
                             'o',  'g', 'd', 'd',
                             'p', 'p',  'p', 'p'
                           } ;
        for(i =0; i <3; i++){
                insert(&t, dict[i]);
        }

         find_words_wrapper(&t, maze);
}

Even though we have reduced the search space, we may end up looking at all the words i.e (n2!). There is a way to reduce it by using dynamic method approach, we look at that in the future.

Remove particular combination of characters

Remove particular combination of characters from string.

Given a string, we need to remove a particular combination of characters, for example, if s = “aacbbdccaac” and if combination is “ac” then our output will be “abbdcca”. If s = “aacaaccd” and if combination is “ac”, then output should be “acd”. Following will be the execution  aacaaccd ——-> aacd ——>ad 
Note that this has to be done in linear time and without any extra space.

Here there are two subparts of the problem: One, we need to keep track we have already encountered the first character in the combination of characters needed to be removed.
Second we need to keep track of the position where the next character should be placed in the string if it is not to be eliminated. First problem can be easily solved by maintaining a state machine, where we have two states and one moves from one state to another based on the input. In current example, we have states like state “INITIAL” and “MOVED”, Whenever we encountered “a”, we move from INITIAL to MOVED. Then if we get “c” in “MOVED” state we are sure that we have encountered the whole pattern and they need to be removed. If we are in “MOVED” state and if we don’t get “c”, we move back to “INITIAL” state.

For the second part, we can simply follow the approach we have used in this post, that is to maintain two pointers, one to point to the character to be processed, and other to point position where current character to be placed if it is not be eliminated.
The problem can be extended with multiple characters in pattern, with change in state machine accordingly.Same state machine method can be used to count number of words in a given string. Whenever we encounter the separator, we move to OUT state and as soon as we see first character in OUT state we move to IN state till we again see separator.

#define INITIAL 1
#define MOVED 2
char * remove_pattern(char *s, char * pattern){
    int state = INITIAL;
    if(s == NULL) return NULL;
    if(pattern == NULL) return s;

    char * source = s;
    char * destination  = s;
    while(*source != ''){

        /*If character is not equal to the first character of the pattern,
          copy it as such */
          if(state == INITIAL && *source != *pattern){
               *destination = *source;
               destination++;
          }
          else if( *source == *pattern && state == INITIAL){
              /* Move state to MOVED */
              state = MOVED;
          }else if(state == MOVED && *source != *(pattern + 1)){
                /* If next character is not as per pattern, 
                   copy previous character first. */
                *destination =  *pattern;
                destination++;
                /* If next character is first character of the pattern,
                   move state to INITIAL */
                   if(*source != *pattern){
                       *destination  = *source;
                       destination++;
                       state =  INITIAL;
                   }
             }
         /* If we hit the pattern, start again and move state to INITIAL */
          else if(state == MOVED && *source == *(pattern + 1)){
                state = INITIAL;
          }
          /* After moving the characters, check if they 
            don't make pattern again */
          if((int)(destination-s) >= 2
              && *(destination -2) == *pattern
              && *(destination-1) == *(pattern +1)){
              destination -=2;
          }
          source++;
    }
    *destination = '';
    return s;
}

Test cases
1. When pattern is present
s= “aaacacacbbb” pattern = “ac”

2. When pattern is not pattern
s =”aaaabdcaaa” pattern  = “ac”
3. When movement leads to pattern again
s= “abacaccdc” pattern =”ac”

4. When string is NULL or pattern is NULL s= NULL pattern = NULL
5. When return string is empty s = “acacacac” pattern =”ac”

6. Just first character in pattern s= “aaaaaaaaa” pattern =”ac”

The complexity of the above code is O(N) and no extra space used.