Print last n lines of a file

Tags: , , , ,

Print last n lines of file

A lot of times, when we are debugging production systems, we go through the logs being generated by systems. To see the logs which are most recent, we commonly use tail -n functionality of Unix.

Tail -n functionality prints the last n lines of each FILE to standard output

After going through many interview experiences at Microsoft, I found that this question regularly features in the majority of interviews. Let’s take an example and see what to expect out of the functionality.

print last n lines of a file
This is given file content

tail -f
This is output of tail functionality

The first thing we notice about this problem is that we have to print the last n lines. It means we have to maintain some kind of order. If we want the last line first, this is typical LIFO, which is implemented using the stack data structure.

However, another constraint is that we have to print most n lines. In that case, if the number of lines on stack goes more than n, we will remove some lines from it. Which lines should be removed? We will remove the lines which came first. Unstack all the lines from the stack and removed the first line and then put all lines back on to the stack.
When you read, we just read from the top of the stack till stack is empty which will give us last n lines of the file.

Also, tail functionality of Unix prints the line in forwarding order rather than reverse order. If we are implementing true tail functionality, the order will be FIFO rather than LIFO. But make sure that you clarify this with the interviewer.

The complexity of reading n lines is O(n) and putting a new line also takes O(n) complexity. If the stack is implemented using linked list, we do not require additional memory.

What if the file is continuously written on, and tail happens occasionally. As mentioned above, the stack solution has O(n) complexity to put every line, which is not ideal in here. Tail -f actually requires that output grows as things are added to the file.

       -f, --follow[={name|descriptor}]
       output appended data as the file grows;

What if we optimize the writing part using queues to store the last n lines of the file. Imagine a case, when a queue has last n lines of the file at a point of time. Now if a new line comes, we can add its tail of the queue and remove them from the front. If we keep track of tail of queue, insertion and removal operation both become O(1).

To read lines, we have to read the queue in reverse order. This should give us the idea that a doubly linked list should be used to implement queue. Using doubly linked list, if we have the tail pointer, we can always traverse queue in reverse order. The complexity of reading n lines will still be O(n). Real Tail does not require it, you can print the entire queue in FIFO manner, howver, it is good to mention in interview why you chose DLL over singly linked list to implement queue.

Print last n lines of a file: Algorithm

  1. For every line being added to the file, do the following:
  2. If size of queue is less than n, we simply enqueue the line in queue.
  3. If size of queue is greater than n, dequeue line from front and enqueue new line at the end.

If you are tailing an existing file, then read the whole file line by line and do the last two operations in the algorithm.

Print last n lines: implementation

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SIZE 500
#define true 1
#define false 0

typedef struct queue_l{
	char data[MAX_SIZE];
	struct queue_l *next;
	struct queue_l *prev;
}Queue;

typedef struct dummyNode{
	int size;
	struct queue_l *front;
	struct queue_l *tail;
}dummyNode;

/* Below are the routine function for init queue, enqueue, 
dequeue, queue_empty etc */
void initializeQueue(dummyNode **q){
	*q  = (dummyNode *)malloc(sizeof(dummyNode));
	if(*q){
		(*q)->front = NULL;
		(*q)->tail = NULL;
		(*q)->size = 0;
	}
}

int isEmpty(dummyNode *q){
	if( !(q->size))
		return true;
	
	return false;
}

Queue * enqueue(dummyNode *q, char * elem){
	Queue *newNode= (Queue *) malloc(sizeof(Queue));
	if(newNode){
		strcpy(newNode->data, elem);
		newNode->next = NULL;
		newNode->prev = q->tail;
		
		if(q->tail){
			q->tail->next = newNode;
		}
		q->tail = newNode;
		if(!q->front)
			q->front = newNode;
		q->size++;
	}
	return newNode;
}

char * dequeue(dummyNode *d){
	
	if(isEmpty(d)){
		printf("\n Queue is empty");
		return NULL;
	}

	Queue *q  = d->front;
	d->front = q->next;
        
	if(q->next)
		q->next->prev = NULL;
	else
		d->tail = NULL;

	char * deletedNode = q->data;
	free(q);
	d->size--;
	
	return deletedNode;
}

void update_lines(dummyNode *d, char *s, int n){
    if(d->size <n){
        enqueue(d, s);
    }
    else{
        dequeue(d);
        enqueue(d, s);
    }
}

int main(){
	
	dummyNode *d =  NULL;
	
	int n=10;

	initializeQueue(&d);

	char line[MAX_SIZE], *result;
	FILE *stream;
	/* Open the file */
	stream  = fopen("problems.txt","rb");

	/*Read lines one by one */
	while((result =fgets(line, MAX_SIZE, stream)) != NULL){
		update_lines(d, line,n);
	}

	fclose(stream);
        print_queue(d);
	
	return 0;
}

Please share if there is something wrong or missing. If you are preparing for an interview and want coaching session to prepare you fast, please book a free session with us.