Data Structure - An intentional arrangement of data
We naturally think in collections of information.
A recipe is a data structure, as is a shopping list, a flight schedule, bank statement, etc.
Focus of this course is data structures created and held in memory in a running computer program.
Basic need for data structure: Want to enforce systematic organization, group variables together and treat it as one item
In computer science:
In math:
How do we implement records/fields/tuples in a programming language?
// define the struct
struct Book {
string title;
double price;
bool isPublished;
bool isHardback;
};
// create a variable with that struct type
Book first;
// set member variables
first.title = "Dark and Stormy Night";
first.price = 12.95
first.isPublished = true;
first.isHardback = false;
What's the difference between a struct and a class?
struct class only data - no behavior behavior and data simple creation explicit instantiation (new, alloc) value types reference types no object-oriented-features polymorphism, inheritance, etc. "Plain Old Data Structure" (PODS)
struct Point {
int x;
int y;
};
Point startPosition;
startPosition.x = 50
startPosition.y = 50;
Point finishPosition;
finishPosition.x = 500;
finishPosition.y = 100;
myObject.animate(startPosition, finishPosition);
struct Color {
int red;
int green;
int blue;
int alpha;
};
Color backgroundColor;
backgroundColor.red = 255;
backgroundColor.green = 0;
backgroundColor.blue = 0;
backgroundColor.alpha = 255;
myWindow.setBackground(backgroundColor);
Language Support Objective-C As in C, used in many Apple frameworks C# /other .NET Also allows basic behavior to be added Java Do not exist - closest equivalent is lightweight class Python Do not exist Ruby Exist, though implemented as lightweight class
Misconception: Simple Arrays are just from the olden days, should use dynamic arrays without data restrictions if you can.
Flexibility introduces overhead
Access elements (row index, column index) Useful for representing real-world situations
Choosing to do a little more work up front to allow logic to not have to be added later Having internal arrays of differing sizes Example: If you need to represent days in a month, there are varying numbers of days in a month
Pseudocode:
int[][] ticketSales = new int[12][]
for each month in ticketSales
if april, june, september, november
create array of 30 elements
else if february and leap year
create array of 29 elements
else if february and not leap year
create array of 28 elements
else
create array of 31 elements
end if
add array to ticketSales[month]
end for
String[] fixedArray = new String[3];
fixedArray[0] = "This";
fixedArray[1] = "Cannot";
fixedArray[2] = "Grow";
// need to import
import java.util.*;
// create arraylist of strings
List<String> resizeable = new ArrayList<String>();
resizeable.add("This");
resizeable.add("Is");
resizeable.add("Resizeable");
//NSArray used for arrays of objects - fixed size
NSArray \*myFiexedArray = @[@"one", @"two", @"three"];
// NSMutableArray is the resizeable version
NSMutableArray *resizeable = [[NSMutableArray alloc]init];
[resizeable addObject:@"one"];
[resizeable addObject:@"two"];
[resizeable addObject:@"three"];
Adding anywhere else requires moving other elements around
Language Method Java add(value) Objective-C addObject:value JavaScript push(value) Ruby push(value) Python append(value)
Language Method Java add(index, value) Objective-C addObject:value atIndex:index JavaScript splice(index, items_to_remove, items_to_insert) Ruby insert(index, value) Python insert(index, value)
Language Method Java remove(index) Objective-C removeObjectAtIndex:index JavaScript pop / slice Ruby pop / delete_at Python pop / remove
Often won't get all five
Will typically use the sort built in to the language because it's battle-tested
Built-in sorting tends to look at the length of the array in order to figure out what implementation to use
Most languages will attempt to sort in-place
There are a few that create a sorted copy of the array
Need to understand which one your language is doing
Sorting is always computationally intensive - keeping conscious of how much data you have and how often you need it sorted may lead you to changing data structure
If you have an array of objects with multiple pieces of data, you want to control how that's sorted
Will need to provide a little bit of logic, comparator or compare function
Sorting is hard, comparing is easy
PseudoCompare (Employee a, Employee b)
if a.lastName < b.lastname return -1 // less than
if a.lastname > b.lastname return 1 // greater than
if a.lastname == b.lastname
if a.firstname < b.firstname return -1 // less than
if a.firstname > b.firstname return 1 // greater than
if a.firstname == b.firstname return 0 // equal
end if
end
set i to 0
while i < array.length
if array[i] == 99
return true
end if
add 1 to i
end while
return false
Best case: element at [0] Worst case: Not in array
If you have a simple array and the items can be in any order, a linear search may be what you have to do.
If there's no predictable order, no other option than to check all the items
If the array is ordered, there are better ways of searching
If searching is something you're going to want to do, having an order may be important
Asking a data structure to sort is computationally demanding.
if (myArray.contains(99) ) {
log("Yes, it exists")
}
// for specific location
int result = myArray.indexOf(99);
if ( result != -1) {
log ("The value is located at position: " + result);
} else {
log ("The object is not in the array")
end if
Language Method for existence Method for location Java contains indexOf Objective-C containsObject indexOfObject JavaScript indexOf Ruby index / find_index Python index C# contains indexOf
If the values in your array are in descending or ascending order
If not:
Language Method Java binarySearch C# Array.BinarySearch JavaScript n/a Ruby bsearch C++ binary_search Objective-C indexOfObject:inSortedRange:
Although binary search is great for sorted array, we'll see it again going forward because it's not just useful for arrays
In Python, fundamental data type
In Java, list is an interface (abstract definition)
In Objective-C, programmers write it themselves
In Ruby, unlikely to talk about lists
Both arrays and lists are collections (ways to collect items into one group with a name)
If arrays are about direct access, lists are about sequential access.
Structure does not have strict numeric index
Elements can be stored anywhere in memory
When you access a list, you get the first element (first node).
A list node is a simple wrapper object (struct) that holds the element and the link to the next node.
Last element contains a terminal/sentinel node or null reference
Downside, have to access elements sequentially
While adding/removing in the middle of an array requires moving everything past that element
To add a node at the start of a list, create a new node and point it to the previous start of the list
To add to the end, create a node at the end of the list, pointing the previous end of the list to the new node.
Removing is just as easy, changing pointers.
Arrays Linked Lists Direct Access **Good**
fixed time O(1)**Poor**
linear time O(n)Adding/Removing **Poor**
linear time O(n)**Good**
fixed time O(1)Searching O(n) linear search
O(log n) binary searchO(n) linear search
Singly linked lists - reference to next node
Doubly linked lists - reference to previous and next node
Circular doubly linked list - last element points to first element, first element's previous is last element
Java - interface (describes behavior that another class can implement)
Language Support Java LinkedList in java.util C# LinkedList in System.Collections.Generic Objective-C n/a Ruby n/a Python n/a - "lists" are dynamic arrays, **not** linked lists C++ std::list
Last element added to the stack is the first one out (LIFO/FILO)
Things you can do with a stack:
Language Support Java Stack (push / pop / peek) C# Stack (Push / Pop / Peek) Objective-C use NSMutableArray Ruby use Array (push / pop) Python use lists (append / pop) C++ std::stack (push / pop)
Stacks and Queues
Abstract data types are not the same as an abstract class
First element in is the first element out (FIFO)
Sending jobs to a printer is an example of a queue
Used in multithreading and concurrency situations
Language Support Java LinkedList (add / remove) C# Queue (enqueue / dequeue) Objective-C NSMutableArray (addObject / removeObjectAtIndex:0) Ruby use Array (push / shift) Python queue (put / get) C++ std::queue (push_back / pop_front)
Typically requires a comparator or compare function
Allows you to add elements to the array that can be moved closer to the front of the queue if they have a higher priority than the elements already in the queue.
Language Support Java PriorityQueue C# n/a Objective-C CFBinaryHeap Ruby n/a Python n/a C++ std::priority_queue
NOTE: Course says n/a for Python, but Python website indicates there is a PriorityQueue class:
Priority Queue
class asyncio.PriorityQueue
A variant of Queue; retrieves entries in priority order (lowest first).
Entries are typically tuples of the form (priority_number, data).
Can add to either end
Can remove from front or end
Can behave like a stack or a queue
Caution Deque (specialized kind of queue) vs. Dequeue (method to remove item from queue)
Language Support Java LinkedList implements Deque C# n/a - use LinkedList for equivalent Objective-C n/a - use NSMutableArray Ruby n/a - use Array Python collections.deque C++ std::deque
Language Support Java HashMap, HashTable C# Hashtable, Dictionary Objective-C NSDictionary Ruby Hash Python dict C++ std::unordered_map Javascript objects
Behind the scenes, most associative arrays are implemented using a hash table
To understand a hash table, we first need to understand a hash.
A way to take data, run it through a hash function that will manipulate that data and output a short, simplified reference generated from that data
Hashing functions are typically one-way
Information is lost when hashing
Public Class Person {
String firstname;
String lastname;
Data birthDate;
@Override
public int hashCode() {
// code to add all numeric values
// take letters in all the names, give them a 1-26 A-Z representation
// add all those numbers up
// take all the numbers in the date, add them up
// add the numbers from the name to the numbers in the date
// return number
return hashvalue;
}
}
/* Example:
Sam // 19 1 13 (33) +
Jones // 10 15 14 5 (44)
// (77)
04/04/1990 // 04 04 1990 (1998)
// hash: 2075
When two objects result from different objects
Being able to take a complex object and boil it down to a single integer representation is useful because you can use that hash value to get to a certain location.
Typical way of implementing an associative array
Benefit over linked lists and arrays is speed
Created with multiple empty buckets
Pass a key, value pair to hash table
Will take key and run it through the hash function, getting an integer out
When we need to get an object from the hash table, it can run it through the hash function and then access the element based on the index result of the hash function
Separate chaining:
Other ways of managing collisions:
Language Method Java hashCode() C# GetHashCode() Objective-C -hash Ruby hash() Python hash() C++ std::hash
Default equality behavior checks identity
Can be overridden to check internal state
If you override the equality of your class, you have to redefine hashing
This behavior is already provided for string objects
Language Support Java HashTable, HashMap, ConcurrentHashMap C# Hashtable, StringDictionary, Dictionary<>, etc. Objective-C NSDictionary, NSMutableDictionary Ruby Hash Python dict C++ std::unordered_map
A set is an unordered collection of objects
No index, sequence, or key
No duplicates (you cannot add the same object twice to the same set)
Designed for fast lookup (checking membership of a collection)
Key is the value
Checking membership, usually use a contains method
Would never use a set to index to a specific object and retrieve it
To go to any specific object in a set, already need the object
Language Support Java HashSet C# HashSet Objective-C NSSet, NSMutableSet Ruby Set Python set / frozenset C++ std::set
In C++, sets are implemented not with hash tables but with binary search trees
As with linked list, there is a starting point of the tree structure
Root node could contain value, also contains next/child nodes
Child nodes contain next/child nodes
Child nodes with the same parent are called siblings
A node with no children is a leaf node
A sorted/ordered tree
A node can have 0 child nodes (leaf), 1, or 2.
Every node will have two links, left child and right child, which may point to another node or to null
A left child node must be less than its parent
A right child node must be more than its parent
More nodes on right than left - Unbalanced BST
Self-Balancing algorithms include:
Binary Search Tree Hash Table Fast insertion, fast retrieval Fast insertion, fast retrieval Stays sorted - iterate elements in sequence Retrieval not guaranteed order
Language Implementation Java TreeMap C# SortedDictionary Python n/a Ruby n/a Objective-C n/a C++ std::set
Heaps are a collection of objects
Items are always added top to bottom, left to right
Heaps are implemented as binary trees
Min heap: Lowest value at the top
Rule: Child must always be larger than parent
Max heap: Highest value at the top
Rule: Child must always be smaller than parent
A heap is not a fully sorted data structure
Good for implementing priority queue
Language Support Java PriorityQueue C# n/a Python heapq Ruby n/a Objective-C CFBinaryHeap C++ std::priority_queue
Collection of nodes where any node can connect to any other node
Any time you need to describe a complex system of interconnected points
Vertices (nodes)
Links (Edges)
Directed - connections are one-way
Undirected - connections are two-way
Weighted graphs - associating a number with the connection of two vertices
When thinking about data structures, think about the data you have
Strengths
Weaknesses
Strengths
Weaknesses
Strengths
Weaknesses
One of the best uses of a stack in programming is when parsing code or expressions where you need to do something like validate you have the right amount of brackets/parentheses, etc.
Strengths
Weaknesses
Strengths
Do not use for
Strengths
Weaknesses
Choose a fixed (immutable) version where possible