First non repeated character in stream

Tags: , , , ,

First non repeated character in stream

In last post, we discussed first non repeated character in string, where all characters where available in memory beforehand. Let’s take another variation of the problem : Given a stream of characters, find first non repeated character in stream in stat a given point of time.

For example, if stream till now is a,a,b,c,c,d; and it is asked what is first non repeated character at this point, it will be b. Next character in stream is let’s say b, then first non repeated character in stream at this point will be d.

Non repeated character in stream : thoughts

If set of characters is already present in memory, then it is very simple as we discussed here. Idea is to create a hash map with character as key and number of times is occurs in string as value; increase the count when character is seen during scan of string.
Once all characters are processed, go through string again and return first character which has count 1. However, this approach does not work with continuous stream of characters.

Why a queue?

Problem is to return first non repeating character at a given point of time. A character which was unique at a certain point of time, may not be unique at another point of time in future, that character may have be duplicated in stream coming in during those instances of time. We need to store characters in order of their arrival, however remove them as they occur more than once. At any specific time, we want the first arrived character till that point and has only one occurrence. Are you able to pattern?

First in first out pattern and queues are the best data structure to implement first in first out pattern.

One thing to be noted that even though first in first out pattern applies, removal of character is ad-hoc between two queries when stream flows in.  Characters which happen to occur twice or more between queries will be removed from the queue. Why mentioned this because then we cannot have array based queue. What data structure is best to remove a random node? Well, it’s doubly linked list. Given a node, it’s O(1) operation to remove it.

Why a hash map?

Another problem is, how can we map a character coming out of stream to a node in queue? We scan the doubly linked list based queue and delete the node which contains the character. Even though it will be O(1) operation (as max queue can grow 256 nodes why?), it is more expensive computationally to do for all the characters coming from stream.

What can quickly map a node to character, obviously a hash map. So, key value pair is store in hash map with character as key and node address as value.
Hash map actually now servers another purpose too, which is to maintain the count of occurrence. If a character is present in hash, that means it has corresponding node in queue, which implies that character has already been seen once. So we remove it from the queue if there is value for character in map.
If there is not value for the character in hash map, that implies, this is the first time we have seen this element and should be added to queue.

First non repeated character in stream : Algorithm

  1. Process next character from stream.
  2. Check in hash map if there is corresponding node in queue.
  3. If there is node already in hash map
    1. Delete the node corresponding node from queue
    2. go to step 1.
  4. If there is no node in hash map for this character
    1. Create a node and push that at the tail of queue.
    2. Update hash map with char, node address as key value pair

Algorithm looks fine, but there is an implementation problem here, can you see? We are freeing the node from queue however, we are keeping it in hash, to  indicate that character is already seen.
In ideal world, we should not store references to free memory which can be
accessed accidentally and cause segmentation fault. Idea would be to separate storing node pointer and seen functionalities with two hashes.

Create an additional hash map, character as key and bool as value, which indicates if character is seen twice already in stream before. With this hash map, we can remove the entry from other hash map which store node address of particular character, so no dangling pointers.

Modified algorithm

  1. Process next character from stream.
  2. Check in visited hash map if character was ever seen twice before.
  3. If true, go to step 1.
  4. If false, check the node address hash map, if there is node address for character
    1. Delete the node corresponding node from queue and remove entry from hash.
    2. Update visited hash map for this char as true.
    3. go to step 1.
  5. If there is no node in hash map for this character
    1. Create a node and push that at the tail of queue.
    2. Update hash map with char, node address as key value pair
package com.company;

import java.util.HashMap;

/**
 * Created by sangar on 8.10.18.
 */
public class NonRepeatedCharInStream {

    private HashMap<Character, Boolean> visited =  new HashMap<>();
    private HashMap<Character, ListNode> address =  new HashMap<>();
    private Queue<Character> queue = new QueueImplementation<>();

    public char getNonRepeatedCharacter(){
        return queue.isEmpty() ? ' ' : queue.peek().getData();
    }

    public void process(char c){
        if(visited.containsKey(c)) return;

        if(address.containsKey(c)){
            queue.deleteNode(address.get(c));
            address.remove(c);
            visited.put(c, true);
        }
        else{
            address.put(c,queue.enqueue(c));
        }
    }
}

Queue implementation using Doubly linked list

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public class QueueImplementation<E> implements Queue<E>{
    ListNode<E> head;
    ListNode<E> tail;

    public QueueImplementation(){
        head = null;
        tail = null;
    }

    @Override
    public ListNode<E> deleteNode(ListNode<E> node){
        if(this.isEmpty()) {
            return null;
        }

        if(this.head == node){
            if(this.head.getNext() != null) this.head.getNext().setPrev(null);
            this.head = this.head.getNext();
            return node;
        }

        if(this.tail == node){
            if(this.tail.getPrev() != null) this.tail.getPrev().setNext(null);
            this.tail = this.tail.getPrev();
            return node;
        }
        /*
            We are deleting node in between. So following things happen
            1. If node has prev, set node.prev.next = node.next.
            2. If node has next, set node.next.prev = node.prev
        */
        if(node.getPrev() != null) node.getPrev().setNext(node.getNext());
        if(node.getNext() != null) node.getNext().setPrev(node.getPrev());

        return node;
    }


    @Override
    public ListNode peek() {
        if(this.isEmpty()) {
            return null;
        }
        return this.head;
    }

    @Override
    public ListNode remove() {
        if(this.isEmpty()) {
            return null;
        }
        /*
            We are deleting node at head. So following things happen
            1. Set temporary node point to head.
            2. Move head to next of node.
            3. Set prev of new head to NULL.
            4. Free the temp node.
          */
        ListNode<E> tempNode = this.head;
        this.head.setPrev(null);
        this.head = this.head.getNext();

        return tempNode;
    }

    @Override
    public ListNode enqueue(E data) {
        if(this.isEmpty()) {
            this.head = new ListNode<E>(data);
            this.tail = this.head;
            return this.head;
        }
        ListNode<E> newNode = new ListNode<E>(data,null, this.tail);
        this.tail.setNext(newNode);
        this.tail = newNode;

        return newNode;
    }

    @Override
    public boolean isEmpty() {
        return this.head == null;
    }
}

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public interface Queue<E> {
    public ListNode<E> peek();
    public ListNode<E> remove();
    public ListNode<E> enqueue(E data);
    public ListNode<E> deleteNode(ListNode<E> node);
    public boolean isEmpty();
}

Test cases (add more and see if that works)

package test;

import com.company.NonRepeatedCharInStream;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 23.9.18.
 */
public class NonRepeatedCharInStreamTest {

    NonRepeatedCharInStream tester = new NonRepeatedCharInStream();

    @Test
    public void testNormalCase() {

        tester.process('a');
        tester.process('b');
        tester.process('c');

        assertEquals('a', tester.getNonRepeatedCharacter());

        tester.process('a');
        assertEquals('b', tester.getNonRepeatedCharacter());

        tester.process('c');
        tester.process('d');
        tester.process('b');

        assertEquals('d', tester.getNonRepeatedCharacter());
    }

    @Test
    public void testNoUniqueCharCase() {

        tester.process('a');
        tester.process('b');
        tester.process('c');
        tester.process('a');
        tester.process('b');
        tester.process('c');

        assertEquals(' ', tester.getNonRepeatedCharacter());
    }
}

Let’s see how this algorithm works with an example. To start with both hash maps (visited and address) and queue are empty. Stream starts with character c, new node is added to queue, also, entry is entered into the address hash map to store address of node in queue.

first character added to queue and hash map

Next d comes from stream,  we check if it is on visited map, it is not, so we check if it on address map. It is not there, it means, we saw c first time. Add it to queue and store pointer to node in address node.

d is added to address map and queue

Next character is e, same process followed as d and it gets added to queue and address map.

Imagine that stream puts d out now. We will check in visited map, it is not, however, in address map, it is present. This is where we know that d has occurred twice. We delete it from queue and address map. However, we add it to visited map.

 first non repeated character at this moment is c 

Now comes the character f, it is not present in visited map nor in address map, so we add it to queue and store pointer in the address map.

Next comes the character d, we check visited map it true and so do not add it anywhere and skip it.

What if next character is c,  it is present in address map, so remove it from queue and map. Add it to visited map.

first non repeated character in stream
At this point of time, first non repeated character is e

Complexity to find first non repeated character in stream is O(1) as number of characters processed from stream does not impact the processing time. Space complexity is also O(1) as it independent of number of characters processed.

Please share if there is something wrong or missing. Please reach out to us at communications@algorithmsandme.com if you have doubt or need any help with your interview preparations.