System design basics – III: Consistent hashing

Consistent Hashing

To understand consistent hashing, we first have to understand traditional hashing and its limitations in large-scale distributed systems.

Hashing, in plain terms, is nothing but a key-value store where, given a key, the associated value can be found very efficiently. Example: let’s say we want to find the name of a street in a city given its zip code. The idea is to store this information as a key-value map from zip code to street name.
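A minimal sketch of such a store in Java (the zip codes and street names are made-up sample data):

import java.util.HashMap;
import java.util.Map;

public class ZipLookup {
    public static void main(String[] args) {
        // Key-value store: zip code -> street name (sample data).
        Map<String, String> streetByZip = new HashMap<>();
        streetByZip.put("94107", "Townsend Street");
        streetByZip.put("10001", "West 28th Street");

        // Given a key, the value is found in O(1) on average.
        System.out.println(streetByZip.get("94107")); // Townsend Street
    }
}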

The problem becomes more interesting when the data is too big to store on one node or machine, and multiple nodes or machines are required; for example, a system which uses a number of web caches. The first question: how do we decide which key goes on which node? The simplest solution is a modulo function: given a key, take the hash of the key, take it modulo the number of nodes in the system, and put the key on that node. Similarly, when fetching a key, hash it, take the modulo with the number of nodes, and go to that node to fetch the value. The picture depicts conventional hashing in a multi-node system.

[Figure: conventional hashing in a multi-node system]
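A minimal sketch of this modulo scheme, with an illustrative node list:

import java.util.List;

public class ModuloSharding {
    // Pick a node for a key: hash the key, then take the hash
    // modulo the number of nodes.
    static int nodeFor(String key, int nodeCount) {
        // Math.floorMod keeps the index non-negative even when
        // hashCode() is negative.
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-0", "node-1", "node-2");
        System.out.println(nodes.get(nodeFor("user:42", nodes.size())));
    }
}

Note that the chosen index depends on nodeCount, which is exactly why changing the number of nodes invalidates almost every existing placement, as described next.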

Failures are common in commodity-hardware-driven distributed multi-node systems. Any node can die without prior notice, and the expectation is that the system keeps running, with only a slight cost in performance. What happens in the system described above when a node goes down? In traditional hashing, the total number of nodes has decreased, so the function determining which node a key goes on, or is fetched from, changes. New keys will work fine. What happens to existing keys? They are all on the wrong nodes as per the new function.

[Figure: traditional hashing after a node goes down]

To fix this problem, we have to redistribute all existing keys across the remaining nodes, which can be a very costly operation with detrimental effects on the running system. And what happens when the node comes back? Repeat what was done when it went down. This can result in a thrashing effect: if a node misbehaves regularly, the system does no useful work except redistributing keys.

How to solve this challenge? This is where consistent hashing comes into the picture.

Consistent hashing in distributed multi-node systems

Consistent hashing is based on a very simple idea: put nodes and keys in the same id space, unlike traditional hashing where node ids and keys live in two different id spaces. A node id can be the hash of its IP address, and the same hash function is applied to keys to determine which node a key goes on or is fetched from.

A critical requirement for a consistent hashing implementation is a hash function which is consistent irrespective of each node’s view of the system, and which maps keys roughly uniformly across all machines.

Choose any base hash function that maps the keyspace to integers in the range [0..M]. Dividing by M maps every hash on to the unit circle, so each key, once hashed, represents a point on this circle.

[Figure: keys and nodes hashed on to the unit circle]

How exactly does a key map to a node? A key is hashed and then put on the first node found while moving clockwise from that point. Simple enough, huh? To find a key, take its hash and go to the first node found while moving clockwise on the unit circle.

[Figure: a key is mapped to the first node found moving clockwise]

How does it solve the problem of scale? Let’s say my system is receiving 5x the load; what happens to the nodes, and how can I balance or reduce the load? Simply add more nodes, uniformly distributed on the unit circle, and the problem is solved. Consistent hashing is built for scale.

What happens when a node goes down? All the keys which were on that node are reallocated to the next successor node on the circle; all other keys remain unchanged. This is far more optimal than redistributing all keys on the failure of one node.

As mentioned earlier, we assume that the hash function distributes keys uniformly across nodes, which is not realistic. To reduce this non-uniformity, virtual nodes are introduced: each node is hashed with K different hash functions, which map it to K different points on the circle. Each node still gets roughly 1/N of the keys on average, but virtual nodes reduce the variance of key load significantly.

One challenge still remains: how to efficiently find the successor node for a given key, i.e., the node s with the smallest h(s) > h(k) for key k. The intuitive choice is a hash table, but hashes do not maintain any ordering information. The next best bet is a binary search tree, which maintains order; however, the successor lookup is proportional to the depth of the tree, which can be O(N). We can reduce that to O(log N) by using a balanced binary search tree such as a red-black tree. A sketch combining virtual nodes with a red-black-tree-backed ring follows.
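Below is a minimal sketch of such a ring in Java, using TreeMap (backed by a red-black tree) for the successor lookup. The mixing hash, replica count, and node names are illustrative assumptions; a production system would use a stronger hash such as MD5 or MurmurHash.

import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int replicas; // virtual nodes per physical node

    public ConsistentHashRing(int replicas) {
        this.replicas = replicas;
    }

    // Illustrative 64-bit polynomial hash; a stand-in for a
    // stronger hash like MD5 or MurmurHash.
    private long hash(String s) {
        long h = 1125899906842597L;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Each physical node is placed at K points on the circle.
    public void addNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Removing a node only remaps the keys that pointed at it;
    // all other keys stay where they are.
    public void removeNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Successor lookup: first node clockwise from the key's point,
    // wrapping around to the smallest point if none is greater.
    public String nodeFor(String key) {
        if (ring.isEmpty()) return null;
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(3);
        ring.addNode("10.0.0.1");
        ring.addNode("10.0.0.2");
        System.out.println(ring.nodeFor("user:42"));
    }
}

The TreeMap keeps virtual node positions ordered, so adding a node, removing a node, and looking up a key each cost O(log N) in the number of virtual nodes.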

Where is consistent hashing used?

Consistent hashing is used in Memcached, Cassandra, Amazon Dynamo, etc.

If you find this article useful, please share it. If there is something missing or wrong in the article, please let us know and we will correct it.

Reference:

http://theory.stanford.edu/~tim/s16/l/l1.pdf
http://www8.org/w8-papers/2a-webserver/caching/paper2.html

Most frequent words in file

In the last post, find K smallest elements in an array, we learned some concepts for finding the top or bottom elements in a given set. Let’s apply those concepts to a different problem: most frequent words in a file.

Given a text file which contains strings/words, find the n most frequently occurring words in it, i.e., the n words whose counts are the maximum.

For example, if the given file is: one, two, two, three, three, three, four, four, four, four, five, five, five, five, five, six, six, six, six, six, six and we are looking for the three most frequent words, the output should be: six, five, four.

Line of thoughts

A brute force solution would be really easy: scan the whole file and get the count of each unique word in the file. Then sort the words by count in descending order and return the first n, as sketched below.
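A quick sketch of this brute force in Java, assuming the words have already been read into a list (the class and method names are illustrative):

import java.util.*;
import java.util.stream.Collectors;

public class BruteForceTopWords {
    // Count every word, sort entries by count descending,
    // and keep the first n words.
    static List<String> topN(List<String> words, int n) {
        Map<String, Long> counts = words.stream()
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("one", "two", "two", "three",
            "three", "three", "four", "four", "four", "four");
        System.out.println(topN(words, 2)); // [four, three]
    }
}

The sort dominates the cost here; the heap-based approach described below improves on it.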

This problem has three parts. First, read all words from the file; second, create a map which stores the frequency count of each word in the file; third, get the top n words from that map.

Reading a file is a pretty standard operation in any language (brush up on Java basics here); we will not focus on that, as it is not the intention of this question.
Assuming we can read from the file, how can we store frequency counts against words? Simple: use a hash map. We read a word from the file and store it in the hash map with count 1 if it is not already there. If the word is already in the hash map, we increment its count by 1.

After this step, we have a hash map with the count of occurrences of each word. Now comes the challenging part: how to find the top n most frequent words from this hash map. Mind you, a hash map does not store elements in sorted or any other order.

The rule of thumb is: to find the top or bottom elements of a collection, heaps are the best bet. To find the top n most frequently occurring words, let’s try a heap.
Which heap would be best? We need to maintain a limited set, with free entry into that set until it holds n words (as we need the top n words). Every further word has to pass a test before it enters the set.

If a new word is less frequent than the least frequent word in the set, what can be said about it? Well, it cannot be in the top n words, right?
If a new word is more frequent than the word with the least frequency in the set, the new word should enter the set and the word with the least frequency should move out.
What we just described is a typical use of a min heap, as it gives the least element at the root. Below is the algorithm to find the top n most frequent words in a file.

Most frequent words in file : algorithm

  1. Take the first n words from the map along with their counts and create a min heap with them.
  2. One by one, read the remaining words from the hash map and check whether the frequency of the new word is more than that of the least frequent word on the heap, i.e., the word at the root.
  3. If yes, remove the root and add the new word to the min heap. If not, continue with the next word.
  4. When done with all words in the hash map, the min heap contains the n most frequently occurring words in the file.

Implementation

package com.company;

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;

/**
 * Created by sangar on 1.10.18.
 */
public class FrequentWords {

	public static HashMap<String, Integer> readFile(String fileName)
			throws IOException {
		HashMap<String, Integer> wordMap = new HashMap<>();

		Path path = Paths.get(fileName);
		try (Scanner scanner = new Scanner(path)) {
			// Treat any run of non-letters as a separator, so both
			// one-word-per-line and comma-separated files work.
			scanner.useDelimiter("[^a-zA-Z]+");
			while (scanner.hasNext()) {
				String word = scanner.next();
				wordMap.put(word, wordMap.getOrDefault(word, 0) + 1);
			}
		}
		return wordMap;
	}

	public static ArrayList<String> mostFrequentWords(String fileName, int n) {
		ArrayList<String> topWords = new ArrayList<>();

		try {
			HashMap<String, Integer> wordMap = readFile(fileName);
			// Min heap keyed on frequency; the root is the least
			// frequent word among the current candidates.
			PriorityQueue<Map.Entry<String, Integer>> pq =
				new PriorityQueue<>(n, (x, y) -> x.getValue().compareTo(y.getValue()));

			for (Map.Entry<String, Integer> entry : wordMap.entrySet()) {
				if (pq.size() < n) {
					// Free slots: the first n words go straight on to the heap.
					pq.add(entry);
				} else if (pq.peek().getValue() < entry.getValue()) {
					// More frequent than the least frequent candidate:
					// evict the root and add this word instead.
					pq.poll();
					pq.add(entry);
				}
			}

			while (!pq.isEmpty()) {
				topWords.add(pq.poll().getKey());
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
		return topWords;
	}

	public static void main(String[] args) {
		System.out.println(mostFrequentWords("/home/sangar/Documents/test.txt", 3));
	}
}

Reading m words from the file requires O(m) time, whereas building the initial n-element heap takes O(n). Scanning the remaining words and inserting them into the heap costs O((m-n) log n). The overall complexity to find the top n most frequent words in the file is therefore O(m log n).


First non repeated character in stream

In the last post, we discussed first non repeated character in string, where all characters were available in memory beforehand. Let’s take another variation of the problem: given a stream of characters, find the first non repeated character in the stream at a given point in time.

For example, if the stream so far is a,a,b,c,c,d and we ask for the first non-repeated character at this point, it will be b. If the next character in the stream is, let’s say, b, then the first non-repeated character at that point will be d.

Non repeated character in stream : thoughts

If the set of characters is already present in memory, then it is very simple, as we discussed here. The idea is to create a hash map with the character as key and the number of times it occurs in the string as value, increasing the count whenever the character is seen while scanning the string.
Once all characters are processed, go through the string again and return the first character whose count is 1. However, this approach does not work with a continuous stream of characters.
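For contrast, here is a quick sketch of that in-memory, two-pass approach; the class and method names are illustrative:

import java.util.HashMap;
import java.util.Map;

public class FirstUniqueChar {
    // Two passes: count every character, then return the first
    // one whose count is exactly 1.
    static char firstNonRepeated(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray())
            counts.put(c, counts.getOrDefault(c, 0) + 1);
        for (char c : s.toCharArray())
            if (counts.get(c) == 1) return c;
        return ' '; // no unique character
    }

    public static void main(String[] args) {
        System.out.println(firstNonRepeated("aabccd")); // b
    }
}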

Why a queue?

The problem is to return the first non-repeating character at a given point in time. A character which was unique at a certain point in time may not be unique at a later point: it may have been duplicated by the stream that arrived in between. We need to store characters in order of their arrival, and remove them once they occur more than once. At any specific time, we want the character that arrived first among those with only one occurrence so far. Can you see the pattern?

That is a first in first out pattern, and queues are the best data structure to implement it.

One thing to note: even though the first in first out pattern applies, removal of a character is ad hoc between two queries while the stream flows in. Characters which happen to occur twice or more between queries will be removed from the queue, which is why we cannot use an array-based queue. What data structure is best to remove a random node? A doubly linked list: given a node, it is an O(1) operation to remove it.

Why a hash map?

Another problem: how can we map a character coming out of the stream to a node in the queue? We could scan the doubly-linked-list-based queue and delete the node which contains the character. Even though that is O(1) in a sense (the queue can grow to at most 256 nodes, one per distinct character; why?), it is computationally expensive to do for every character coming from the stream.

What can quickly map a node to a character? Obviously a hash map. So a key-value pair is stored in the hash map with the character as key and the node address as value.
The hash map actually serves another purpose too, which is to track occurrence. If a character is present in the hash map, that means it has a corresponding node in the queue, which implies the character has already been seen once; so we remove it from the queue.
If there is no value for the character in the hash map, this is the first time we have seen this character, and it should be added to the queue.

First non repeated character in stream : Algorithm

  1. Process the next character from the stream.
  2. Check in the hash map if there is a corresponding node in the queue.
  3. If there is already a node in the hash map
    1. Delete the corresponding node from the queue.
    2. Go to step 1.
  4. If there is no node in the hash map for this character
    1. Create a node and push it at the tail of the queue.
    2. Update the hash map with (char, node address) as the key-value pair.

The algorithm looks fine, but there is an implementation problem here; can you see it? We are freeing the node from the queue; however, we keep its entry in the hash map to indicate that the character has already been seen.
Ideally, we should not store references to freed memory, which could be accessed accidentally and cause a segmentation fault. The idea is to separate the node-pointer storage and the seen-before flag into two hash maps.

Create an additional hash map with the character as key and a boolean as value, which indicates whether the character has already been seen twice in the stream. With this hash map, we can remove the entry from the other hash map which stores the node address of the character, so there are no dangling pointers.

Modified algorithm

  1. Process the next character from the stream.
  2. Check in the visited hash map if the character was ever seen twice before.
  3. If true, go to step 1.
  4. If false, check the node address hash map; if there is a node address for the character
    1. Delete the corresponding node from the queue and remove the entry from the address hash map.
    2. Mark this character as true in the visited hash map.
    3. Go to step 1.
  5. If there is no node in the hash map for this character
    1. Create a node and push it at the tail of the queue.
    2. Update the hash map with (char, node address) as the key-value pair.

Implementation

package com.company;

import java.util.HashMap;

/**
 * Created by sangar on 8.10.18.
 */
public class NonRepeatedCharInStream {

    // True for characters already seen at least twice in the stream.
    private HashMap<Character, Boolean> visited =  new HashMap<>();
    // Maps a character seen exactly once to its node in the queue.
    private HashMap<Character, ListNode> address =  new HashMap<>();
    private Queue<Character> queue = new QueueImplementation<>();

    public char getNonRepeatedCharacter(){
        // Head of the queue is the oldest character seen exactly once.
        return queue.isEmpty() ? ' ' : queue.peek().getData();
    }

    public void process(char c){
        // Seen twice or more already: nothing to do.
        if(visited.containsKey(c)) return;

        if(address.containsKey(c)){
            // Second occurrence: remove it from the queue and mark it visited.
            queue.deleteNode(address.get(c));
            address.remove(c);
            visited.put(c, true);
        }
        else{
            // First occurrence: enqueue it and remember its node address.
            address.put(c,queue.enqueue(c));
        }
    }
}

Queue implementation using Doubly linked list

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public class QueueImplementation<E> implements Queue<E>{
    ListNode<E> head;
    ListNode<E> tail;

    public QueueImplementation(){
        head = null;
        tail = null;
    }

    @Override
    public ListNode<E> deleteNode(ListNode<E> node){
        if(this.isEmpty()) {
            return null;
        }

        if(this.head == node){
            this.head = this.head.getNext();
            if(this.head != null) this.head.setPrev(null);
            else this.tail = null; // queue became empty, reset the tail too
            return node;
        }

        if(this.tail == node){
            if(this.tail.getPrev() != null) this.tail.getPrev().setNext(null);
            this.tail = this.tail.getPrev();
            return node;
        }
        /*
            We are deleting node in between. So following things happen
            1. If node has prev, set node.prev.next = node.next.
            2. If node has next, set node.next.prev = node.prev
        */
        if(node.getPrev() != null) node.getPrev().setNext(node.getNext());
        if(node.getNext() != null) node.getNext().setPrev(node.getPrev());

        return node;
    }


    @Override
    public ListNode peek() {
        if(this.isEmpty()) {
            return null;
        }
        return this.head;
    }

    @Override
    public ListNode remove() {
        if(this.isEmpty()) {
            return null;
        }
        /*
            We are deleting the node at head. So following things happen
            1. Keep a temporary reference to head.
            2. Move head to the next node.
            3. Set prev of the new head to NULL (or reset tail if the
               queue became empty).
        */
        ListNode<E> tempNode = this.head;
        this.head = this.head.getNext();
        if(this.head != null) this.head.setPrev(null);
        else this.tail = null;

        return tempNode;
    }

    @Override
    public ListNode enqueue(E data) {
        if(this.isEmpty()) {
            this.head = new ListNode<E>(data);
            this.tail = this.head;
            return this.head;
        }
        ListNode<E> newNode = new ListNode<E>(data,null, this.tail);
        this.tail.setNext(newNode);
        this.tail = newNode;

        return newNode;
    }

    @Override
    public boolean isEmpty() {
        return this.head == null;
    }
}

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public interface Queue<E> {
    public ListNode<E> peek();
    public ListNode<E> remove();
    public ListNode<E> enqueue(E data);
    public ListNode<E> deleteNode(ListNode<E> node);
    public boolean isEmpty();
}

Test cases (add more and see if they work)

package test;

import com.company.NonRepeatedCharInStream;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 23.9.18.
 */
public class NonRepeatedCharInStreamTest {

    NonRepeatedCharInStream tester = new NonRepeatedCharInStream();

    @Test
    public void testNormalCase() {

        tester.process('a');
        tester.process('b');
        tester.process('c');

        assertEquals('a', tester.getNonRepeatedCharacter());

        tester.process('a');
        assertEquals('b', tester.getNonRepeatedCharacter());

        tester.process('c');
        tester.process('d');
        tester.process('b');

        assertEquals('d', tester.getNonRepeatedCharacter());
    }

    @Test
    public void testNoUniqueCharCase() {

        tester.process('a');
        tester.process('b');
        tester.process('c');
        tester.process('a');
        tester.process('b');
        tester.process('c');

        assertEquals(' ', tester.getNonRepeatedCharacter());
    }
}

Let’s see how this algorithm works with an example. To start with, both hash maps (visited and address) and the queue are empty. The stream starts with character c: a new node is added to the queue, and an entry is put into the address hash map storing the address of that node.

[Figure: first character c added to the queue and address hash map]

Next, d comes from the stream. We check if it is in the visited map; it is not. Then we check if it is in the address map; it is not there either, which means we are seeing d for the first time. Add it to the queue and store a pointer to its node in the address map.

[Figure: d added to the address map and queue]

The next character is e; the same process as for d is followed, and it gets added to the queue and the address map.

Now imagine the stream produces d again. We check the visited map: it is not there. However, it is present in the address map, and this is how we know that d has occurred twice. We delete it from the queue and the address map, and add it to the visited map.

[Figure: the first non-repeated character at this moment is c]

Now comes the character f. It is present in neither the visited map nor the address map, so we add it to the queue and store its pointer in the address map.

Next comes the character d again. We find it in the visited map, so we do not add it anywhere and skip it.

What if the next character is c? It is present in the address map, so we remove it from the queue and the address map, and add it to the visited map.

[Figure: at this point in time, the first non-repeated character is e]

The complexity of finding the first non-repeated character in the stream is O(1) per query, and processing each incoming character is O(1) too; the number of characters already processed does not impact the processing time. Space complexity is also O(1), as the queue and hash maps are bounded by the number of distinct characters (at most 256), independent of the stream length.

Please share if there is something wrong or missing. Please reach out to us at [email protected] if you have doubt or need any help with your interview preparations.

LRU cache implementation

This is a commonly asked question in interviews, especially at Microsoft and Amazon. The problem statement is very simple:

Implement an LRU cache, or Least Recently Used cache

Before going further into the solution, let’s first understand what a cache is. In computer architecture terms, a cache is a small buffer of pages the OS maintains in order to avoid more expensive main memory accesses.

Cache accesses are usually faster than main memory accesses, improving overall performance. Whenever a process needs content from a specific memory location, it searches for that page in the cache first. If it finds the page in the cache, it uses it and does not access main memory. This is called a cache hit.

However, since caches are very small compared to main memory, there is a probability that the page requested by the process is not present in the cache. In that case, the new page is moved into the cache and one of the existing pages is swapped out. This is called a cache miss.

Caching happens at the application layer too; for example, a browser caching visited pages, or frequently accessed data from a database being cached in in-memory stores like Redis.

Based on how a cache miss is handled, there are different types of caches, such as first in first out (FIFO) cache, least recently used (LRU) cache, and least frequently used (LFU) cache.

In a FIFO cache, in the event of a cache miss, the entity which came into the cache first is evicted first, whereas in an LRU cache, the page which was used least recently gets evicted. In the same vein, in an LFU cache, the page which is least frequently used among all the pages in the cache gets evicted.

LRU cache implementation

Consider that you have a cache with space for an additional page. If cache miss happens, we bring a page from memory and put it in the cache for future access.

Now, if the cache is full and a cache miss happens, we have to bring in the new page and evict a page from the cache. In an LRU cache, this will be the page which was accessed the longest time ago.

What if a page was accessed at the start, and then again just before the cache miss? Is it the least recently used page? On the contrary, it is the most recently used page and hence should be the last to be evicted. Which page should we evict then? In this case, the page which came right after that first page goes out.

If a page is brought into the cache first, it is the first candidate for eviction, provided it is not accessed again before the cache miss happens.

Why queue?

In principle, an LRU cache is a first in first out cache with a special case: if a page is accessed again, it moves to the end of the eviction order. Which data structure is best to implement a FIFO pattern? Of course, it’s a queue. So our LRU cache will be a queue where each node stores a page. This queue has a specific size, as the cache has a limited size. Whenever a new page is brought in, it is added at the rear of the queue. When eviction happens, it happens at the front of the queue.

Why hash?

There is one requirement of an LRU cache which does not map directly to the queue data structure: moving the node corresponding to a recently accessed page to the end of the queue. This poses two problems: first, how to find the node in the queue corresponding to the page id being accessed? Second, how to move it to the end as efficiently as possible? Both problems are very similar to what we solved in first non-repeated character in stream.

We will use a hash map which stores, for each page id, the address of its node in the queue. This gives us immediate access to the node to be reshuffled.

Why doubly linked list-based queue?

Still, the problem remains of moving a node around without shifting all elements of the queue. Which data structure removes an element in O(1), given the element? A doubly linked list. If we implement the queue as a doubly linked list, removing and adding pages are O(1) operations.

LRU cache algorithm

  • If the page is present in the hash map, it’s a cache hit; if not, it’s a cache miss.
  • Cache miss happens:
    • If the cache has a free slot, enqueue the page.
    • If the cache is full, remove the page at the front of the queue and add the new page at the end.
  • Cache hit happens:
    • Delete the node from its current location in the queue.
    • Enqueue the page at the end of the queue.

Queue interface

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public interface Queue<E> {
    public ListNode<E> peek();
    public ListNode<E> remove();
    public ListNode<E> enqueue(E data);
    public ListNode<E> deleteNode(ListNode<E> node);
    public boolean isEmpty();
    public int size();
}

Queue implementation

package com.company;

/**
 * Created by sangar on 8.10.18.
 */
public class QueueImplementation<E> implements Queue<E>{
    ListNode<E> head;
    ListNode<E> tail;
    int size;

    public QueueImplementation(){
        head = null;
        tail = null;
        this.size = 0;
    }

    @Override
    public ListNode<E> deleteNode(ListNode<E> node){
        if(this.isEmpty()) {
            return null;
        }

        if(this.head == node){
            this.head = this.head.getNext();
            if(this.head != null) this.head.setPrev(null);
            else this.tail = null; // queue became empty, reset the tail too
            this.size--;
            return node;
        }

        if(this.tail == node){
            if(this.tail.getPrev() != null) this.tail.getPrev().setNext(null);
            this.tail = this.tail.getPrev();
            this.size--;
            return node;
        }
        /*
            We are deleting node in between. So following things happen
            1. If node has prev, set node.prev.next = node.next.
            2. If node has next, set node.next.prev = node.prev
        */
        if(node.getPrev() != null) node.getPrev().setNext(node.getNext());
        if(node.getNext() != null) node.getNext().setPrev(node.getPrev());

        this.size--;
        return node;
    }


    @Override
    public ListNode peek() {
        if(this.isEmpty()) {
            return null;
        }
        return this.head;
    }

    @Override
    public ListNode remove() {
        if(this.isEmpty()) {
            return null;
        }
        /*
            We are deleting the node at head. So following things happen
            1. Keep a temporary reference to head.
            2. Move head to the next node.
            3. Set prev of the new head to NULL (or reset tail if the
               queue became empty).
        */
        ListNode<E> tempNode = this.head;
        this.head = this.head.getNext();
        if(this.head != null) this.head.setPrev(null);
        else this.tail = null;

        this.size--;
        return tempNode;
    }

    @Override
    public ListNode enqueue(E data) {
        if(this.isEmpty()) {
            this.head = new ListNode<E>(data);
            this.tail = this.head;
            this.size++;
            return this.head;
        }
        ListNode<E> newNode = new ListNode<E>(data,null, this.tail);
        this.tail.setNext(newNode);
        this.tail = newNode;

        this.size++;
        return newNode;
    }

    @Override
    public boolean isEmpty() {
        return this.head == null;
    }

    @Override
    public int size() {
        return this.size;
    }
}

LRU cache implementation in Java

package com.company;

import java.util.ArrayList;
import java.util.HashMap;

/**
 * Created by sangar on 9.10.18.
 */
public class LRUCache {
    private Queue<Integer> queue;
    private HashMap<Integer, ListNode> pages;
    private int cacheSize;

    public LRUCache(int cacheSize){
        this.cacheSize = cacheSize;
        queue = new QueueImplementation<>();
        pages = new HashMap<>();
    }

    /* This function implements the LRU cache */
    public void readPage(int pageId) {

        /*Cache Miss and we can add the page in the cache */
        if (!pages.containsKey(pageId) && queue.size() < cacheSize) {
            pages.put(pageId, queue.enqueue(pageId));
            return;
        }

        /* Cache Miss and we cannot add new page to cache,
        remove the LRU page */
        if (!pages.containsKey(pageId) && queue.size() >= cacheSize) {
            //First remove it from the queue.
            ListNode evictedPage = queue.remove();
            //Remove node from hash table
            pages.remove(evictedPage.getData());
            //Enqueue new node and add it at tail
            pages.put(pageId, queue.enqueue(pageId));
            return;
        }

        /* Cache is Hit */
        if (pages.containsKey(pageId)) {
            updateCache(pageId);
        }

        return;
    }

    /* This function modifies queue when there is cache hit */
    public void updateCache(int pageId){

        /* Case where queue may be empty - defensive programming */
        if(queue.isEmpty() && queue.size() < cacheSize){
            pages.put(pageId,queue.enqueue(pageId));
        }
        /* Update the pointers of neighbor nodes and tail in dummy node */
        else{
            ListNode node = queue.deleteNode(pages.get(pageId));
            if(node != null){
                queue.enqueue((Integer)node.getData());
            }
        }
    }

    public ArrayList<Integer> cacheState(){
        ListNode current = queue.peek();
        ArrayList<Integer> cacheState = new ArrayList<>();
        while(current != null){
            cacheState.add((Integer)current.getData());
            current = current.getNext();
        }
        return cacheState;
    }
}

Test case for LRU cache implementation

package test;

import com.company.LRUCache;
import org.junit.jupiter.api.Test;

import java.util.ArrayList;
import java.util.Arrays;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 23.9.18.
 */
public class LRUCacheTest {

    LRUCache tester = new LRUCache(5);

    @Test
    public void testCacheInsertion() {

        ArrayList<Integer> cacheState = new ArrayList<>(Arrays.asList(1,2,3,4,5));

        tester.readPage(1);
        tester.readPage(2);
        tester.readPage(3);
        tester.readPage(4);
        tester.readPage(5);

        assertEquals(cacheState, tester.cacheState());
    }

    @Test
    public void testCacheHit() {

        ArrayList<Integer> cacheState = new ArrayList<>(Arrays.asList(1,4,5,2,3));

        tester.readPage(1);
        tester.readPage(2);
        tester.readPage(3);
        tester.readPage(4);
        tester.readPage(5);
        tester.readPage(2);
        tester.readPage(3);

        assertEquals(cacheState, tester.cacheState());
    }

    @Test
    public void testCacheMiss() {

        ArrayList<Integer> cacheState = new ArrayList<>(Arrays.asList(2,3,4,5,6));

        tester.readPage(1);
        tester.readPage(2);
        tester.readPage(3);
        tester.readPage(4);
        tester.readPage(5);
        tester.readPage(6);

        assertEquals(cacheState, tester.cacheState());
    }

    @Test
    public void testEvictionAndInsertion() {

        ArrayList<Integer> cacheState = new ArrayList<>(Arrays.asList(3,4,5,6,1));

        tester.readPage(1);
        tester.readPage(2);
        tester.readPage(3);
        tester.readPage(4);
        tester.readPage(5);
        tester.readPage(6);
        tester.readPage(1);

        assertEquals(cacheState, tester.cacheState());
    }


    @Test
    public void testEmptyCache() {
        ArrayList<Integer> cacheState = new ArrayList<>();

        assertEquals(cacheState, tester.cacheState());
    }
}

Let’s take an example and see how the whole thing works. Say we have a cache of size 5, which is empty to start with. The application accesses pages 1, 2, 3, 4 and 5. All are cache misses to start with, and as there is space in the cache, all these pages are brought in.

[Figure: cache state after reading the first 5 pages]

Now, the application accesses page 6. We have a cache miss. What happens? As the cache is full, we evict the page at the front of the queue and add page 6 at the rear.

[Figure: page 1 evicted, page 6 added to the queue]

The application accesses page 2 again, which is already present in the cache: a cache hit. As page 2 is now the most recently used page, it moves to the end of the queue.

[Figure: page 2 moves to the end of the queue]

Let’s say the next page accessed is page 7; it is a cache miss. We evict the page at the front of the queue, which is page 3.

[Figure: page 7 added at the end after evicting page 3 from the front]

Insertion into and removal from the queue are O(1) operations, so every page access is handled in constant time.

Please share if there is something wrong or missing. If you are interested in taking personal coaching from our expert teachers, please contact us at [email protected]