Introduction
In the previous tutorial, we learned about dynamic memory which lets you create arrays which have a size that are only known at run time (and not when the program is written). One way that this occurs is if you have to read in an image. If you want to support images of multiple sizes, you cannot know how big of an array will be required to store the image. One option would be to create a really big array, and then hope that all images will fit in that array. This has two problems: it wastes a lot of memory in the case of a small image and it might to be big enough if the user has a really big image. RAM and image sizes get bigger every year, so the program might not work in the future. In this case, it makes sense to use a dynamic array so that you aren’t wasting space or creating a program that won’t work when you really need it.
In this tutorial, we are going to take our list one step further. What if we don’t know how many items we are going to have until we are done? An example of this would be searching through a text file and finding all of the words that start with the letter a. In this case, there is no way of knowing how many items there will be in the array until we are finished processing the file. Sure, there are some things that we could do. We could declare an array that has a length that is equal to the number of words in the file. This way we would be guaranteed to have an array that is big enough. Another option is to process the file twice. The first time we would count the number of words. Then we will know how long to make the array and the second time we would store off the words. Also, we could first declare an empty array. Then when we find a matching word, we declare an array that is one element larger and then copy all of the data from the old array into the new array, and add the new element on to the end, making sure to delete the old array so that we don’t create a memory leak. This turns out to be a good suggestion, and an even better one if instead of adding one new element to the end, we double the size of the array. However, in this tutorial we are going to talk about a common solution to this problem called a linked list. A linked list is stored differently than an array, but it ends up functioning similarly. Instead of storing all of the values in the array sequentially in memory, it stores a value and then a pointer to the next value.
Code
So the basic building block of a linked list looks like this:
typedef struct LinkedListItem
{
int Number;
LinkedListItem* Next;
} LinkedListItem;
This stores a number and contains a pointer to the next value in the list, which is the same type (LinkedListItem) as the current entry. This allows you to link together multiple of them to form a list. So, here is how we would go about doing this:
LinkedListItem *head = NULL;
head = new LinkedListItem();
(*head).Number = 1;
(*head).Next = NULL;
(*head).Next = new LinkedListItem();
(*(*head).Next).Number = 2;
(*(*head).Next).Next = NULL;
(*(*head).Next).Next = new LinkedListItem();
(*(*(*head).Next).Next).Number = 3;
(*(*(*head).Next).Next).Next = NULL;
And so on. So, admittedly, the code looks pretty ugly, but we’ll clean it up in a second. The first thing to notice is that we always hold on to a reference to the first element in the list. This is not an accident. We need to hold on to the first element, so that we know where the list starts. This is because we only have information about the next element in the list, so we need to have a reference to the first element. We know that an element is the last one in the list when the pointer to the next element is equal to NULL. This is a great way to signal that is the end of the list. If you notice, when we add a new item to the list, we always set the Next element in the list to NULL. This is just a good habit. We could have excluded this for the first two entries in the list in this case because we knew we were adding three elements to the array. In general, we don’t know this, that’s why we are using a linked list, so it is a good habit to always set the next element to NULL. So, the first thing that we can do to clean it up is to change the definition of the LinkedListItem to add a constructor:
typedef struct LinkedListItem
{
int Number;
LinkedListItem* Next;
LinkedListItem(int number)
{
Number = number;
Next = NULL;
}
} LinkedListItem;
This way, now the next variable is always set to NULL and the number value is specified on creation. This makes it a little easier, so that you don’t forget to set it when you create it. Now the list creation code looks like this:
LinkedListItem *head = NULL;
head = new LinkedListItem(1);
(*head).Next = new LinkedListItem(2);
(*(*head).Next).Next = new LinkedListItem(3);
So, this is already looking much better. Since the operation of (*pointer).variable is so common, there is a special operator that can be used for it: pointer->variable. Using this makes the code even easier to follow:
LinkedListItem *head = NULL;
head = new LinkedListItem(1);
head->Next = new LinkedListItem(2);
head->Next->Next = new LinkedListItem(3);
Now we should test our result, to make sure that the list contains the correct numbers (1, 2, 3). To do this, let’s write a function to print out the values in the list:
void Print(LinkedListItem *head)
{
LinkedListItem* curr = head;
while(curr != NULL)
{
cout << curr->Number << endl;
curr = curr->Next;
}
}
This function starts at the first element in the list and prints it out. Then it moves to the next element in the list and if it exists (is not equal to NULL), then it repeats the loop.
Performance
Linked lists are a great option because adding items to them is fairly easy and very fast. (Much faster than the suggestion in the introduction to copy all of the values from the old array into a new array.) The drawback is that accessing an element in a linked list is not as fast as accessing an element in an array. In order to find an element in an array, we just have to compute an offset in memory from the first elements location. This is because all of the elements are stored sequentially in memory. This computation is very efficient. Linked lists are not stored this way. In order to find an element in the list, you need to start at the first element and follow all of the Next pointers until you get to the element that you want. This is much more time consuming.
This might not be as big of a problem as it might seem in the first place. If you are accessing each element in an array in order:
for(int i = 0; i < length; i++)
{
cout << array[i] << endl;
}
Then the linked list will perform as well, even though the loop looks a little different:
LinkedListItem* curr = head;
while(curr != NULL)
{
cout << curr->Number << endl;
curr = curr->Next;
}
This is because we don’t have to start at the first element in the list and search until we find the one that we want. We can just advance to the next item in the list. So, the real problem is randomly accessing elements in the list/array:
for (int i = 0; i < 10; i++)
{
int idx = rand() % length;
cout << array[idx] << endl;
}
Something like this would be much slower for a linked list than an array.
Another consideration is memory. Arrays require a sequential block of memory. For very large arrays, it might not be possible to find a spot big enough to store everything in one place. Since linked lists don’t have to be sequential, each element can be stored individually. So, even though a linked list will be slightly larger, since we have to store a number and a pointer for each element, it’s easier to fit into memory. Think about fitting a ten foot long board in the trunk of a compact car. It’s not going to happen. However, ten boards that are one foot long will fit no problem.
Summary
In this tutorial, we learned about how to create list of objects for the times when we don’t know how many we are going to have until we are done. We also talked about some of the tradeoffs to this approach.
In the next tutorial, we will be expanding on the linked list concept and implementing some of the common operations that can be performed on a list like adding an item to the end, inserting an item in the middle, and removing an item.
If you have any questions or comments about what was covered here, post them to the comments. I watch them closely and will respond and try to help you out.