TIL - CS50x Week 5 Data Structure

Until this lesson, the term “data structure” reminded me only of the “data types” that variables can have. But I learned that data structure rather referred to how multiple data is stored and accessed in different methods.

Array

Though I’ve used arrays in so many different languages, since there were a lot of simplification applied to the arrays in Javascript and Python, I only came to know of the restrictions that arrays have.

The size of an array is predetermined and fixed. This was actually one of the biggest challenges of using arrays in C. Unlike other high level languages, where the additional memory allocation takes place behind the scenes, in C this is either impossible or very inefficient. This means that in order to correctly use arrays, one must know beforehand, how much memory the array should allocate. Additionally it is rather complex to add or delete items to and from an array.

However, in comparison to the data structures that are about to follow, it is rather easy and efficient to sort and search, since the array contains only the data itself and is located consequently.

Linked List

The predominant advantage of a linked list is the efficiency in adding items to a list. Because the list contains not only the value of the actual data but also the pointer, of which the next data is located, when a new item is added to the list, this data can be stored in an arbitrary location which can be connected into the string of nodes, practically anywhere in the list.

However, since a single data only is capable of referring to the next data, search is linear and inefficient when the size of the data grows. (O(n)) Moreover, due to the complexity of referring to the data that points to a certain data, deletion is can also be a problem.

Hash Table

Hash Function

Hash function is a function when a string is passed through the function, a number is returned as an output.

Using this hash function, hash table stores strings in the form of numbers. When several of the strings are converted into the same number using the hash function, these data is then stretched into linked lists or the table itself stretches to encompass a larger variety of the numbers.

While this is in practice faster in search compared to some of the previous data structures, in theory, when the data size grows indefinitely, the time of search asymptotically reaches O(n). In addition, the memory required to store the data is larger than the previous data structures, since in order for the table to be complete, it requires memory allocations for every possible hash number, whether or not a data is actually stored.

Trie

Trie is short for re’trie’val. As the origin of the name suggests, search of data in this data structure is highly efficient. For every character of the string a separate array of 26 letters is created, similar to a combination of arrays and trees.

Though this might require more memory than all the other data structures, the search for a data in a trie is highly efficient, because there is only a single trail to follow down to the destination. Therefore the time spent on the search would equal to the length of the string at the most, in other words O(1).

Higher level data constructs

Queue

Like the queue in real life, queue is based on the idea of “First In First Out(FIFO)”. This means that when an item is added to the queue (enqueue), it is added to the very back of the current items, and when an item is taken from the queue (dequeue), this is done from the very front of the list of items.

Stack

On the contrary, stack is based on the idea of “Last In First Out(LIFO)”, also similar to the stacks in real life. When a data is added to this list (push), it is done so at the very end of the line, as in the queue. However, when an item is removed from this list, the latest item, on the back of the list, is removed first.