Binary Search Trees (BST) P.J. Drongowski 13 November 2004 Binary search tree * A binary search tree is an ADT that: + Stores values (keys) + Retrieves information associated with a value (key) * The BST values are stored using a total ordering + The ordering must be preserved at all times + The ordering speeds up the retrieval of information + The ordering complicates deletion of values * A BST typifies many common tree operations * Operations + Insert - Inserts a new value into the tree + Find - Finds a target value in the tree + Print - Traverses the tree in order and displays its contents + Remove - Removes a value from the tree Total order * For all a, b and c in set S, a total order on a set S is a binary relation, R, that is: + Reflexive: (a R a) + Antisymmetric: If (a R b) and (b R a), then a = b + Transitive: If (a R b) and (b R c), then (a R c) + Total: (a R b) or (b R a) * Example: Less than or equals relation, <=, on the natural numbers + If (a <= b) and (a != b), then (a < b) * Example: Lexicographic (alphabetic) order on strings + strcmp(s1,s2) * Any pair of elements are mutually comparable * A totally ordered set is sometimes called a "chain" + A totally ordered set is a partially ordered set (poset) which is total, i.e., either (a R b) or (b R a) must be true + A total order is a "well-order," if every nonempty subset of S has a least element Total ordering applied to a BST * The values at any two nodes in a BST can be compared * For any node n in the BST, the following two properties must hold: + Every value in the left subtree must be less than or equal to the value of n + Every value in the right subtree must be greater than the value of n * All operations must preserve these properties * The ordering can be effectively exploited during search (find) Recursive algorithm for search (find) * Given a pointer to the root of a BST, root, and a value to be found, target: + If the BST is empty (root == NULL), then return false + If value of the root equals target, return true + If target is less than the value of the root, search the left subtree + If target is greater then the root, search the right subtree 45 45 / \ / \ / \ / \ / \ => find(52) => / \ 9 53 9 53 / \ / \ / \ / \ / \ / \ / \ / \ 3 17 52 54 3 17 52 54 Example from [Main and Savitch] Algorithm for insert * Insertion is similar to search * Given a pointer to the root of a BST, root, and a value to be inserted, new_value: + If the BST is empty (root == NULL), then insert a node with new_value as the root + If value of the root equals new_value, do nothing + If new_value is less than the value of the root, insert it into the left subtree + If new_value is greater then the root, insert new_value into the right subtree 45 45 / \ / \ / \ / \ / \ => insert(20) => / \ 9 53 9 53 / \ / \ / \ / \ / \ / \ / \ / \ 3 17 52 54 3 17 52 54 \ 20 45 45 / \ / \ / \ / \ / \ => insert(16) => / \ 9 53 9 53 / \ / \ / \ / \ / \ / \ / \ / \ 3 17 53 54 3 17 53 54 \ / \ 20 16 20 * What sequence(s) of values generate the original BST if they are inserted one-by-one as they appear in the sequence? Algorithm for print * Print uses an inorder traversal * Given a pointer to the root node of a BST, root: + If the BST is empty (root == NULL), return + Traverse the left subtree + Display the value of the root node + Traverse the right subtree * What sequence of values does an inorder traversal produce? 45 / \ / \ / \ => print() => { ??, ??, ??, ... } 9 53 / \ / \ / \ / \ 3 17 52 54 Algorithm for remove * Given a pointer to the root node and a target value, first find the node to be removed * Then, consider and apply one of the three cases: 1. If the node is a leaf node, remove the node immediately and set the parent's pointer to the node to NULL 2. If the node has only one child, then replace the node with the child + Make the parent point to the child instead of the node + The root node is a special case since it doesn't have a parent 3. The node to be removed has two children + Replace the *value* in the node with the least value in the right subtree of the node + The replacement value is the one that would follow the original value in an in-order traversal of the BST + Replacement preserves the total ordering over the BST + Remove the node with the least value - The least value node does not have a left child - It will have at most one child making removal simple * A variant of this algorithm replaces the value of the node to be removed with the largest element in the left subtree Example: Remove a leaf node (case 1) 45 45 / \ / \ / \ / \ / \ => remove(3) => / \ 9 53 9 53 / \ / \ \ / \ / \ / \ \ / \ 3 17 52 54 17 52 54 Example: Remove a node with one child (case 2) 45 45 / \ | \ / \ | \ / \ => remove(9) => | \ 9 53 | 53 \ / \ | / \ \ / \ | / \ 17 52 54 17 52 54 Example: Remove a node with two children (case 3) 45 Find least node 45 / \ / \ / \ / \ / \ => remove(45) => / \ 9 53 9 53 / \ / \ / \ / \ / \ / \ / \ / \ 3 17 52 54 3 17 >52< 54 45 Replace target >52< / \ / \ / \ / \ / \ => remove(45) => / \ 9 53 9 53 / \ / \ / \ / \ / \ / \ / \ / \ 3 17 52 54 3 17 >52< 54 45 Delete least 52 / \ node / \ / \ / \ / \ => remove(45) => / \ 9 53 9 53 / \ / \ / \ \ / \ / \ / \ \ 3 17 52 54 3 17 54 Sources: Data Structures and Algorithms in Java, Goodrich and Tamassia, 2001 Data Structures and other Objects Using C++, Main and Savitch, 2005 Data Structures and Problem Solving Using Java, Weiss, 2002 Problem Solving with Java, Koffman and Wolz, 2002 ****************************************************************** bst.h ****************************************************************** #ifndef BST_H #define BST_H #include class BSTNode { friend class BST ; public: BSTNode() { value = 0 ; left = right = NULL ; } private: int value ; BSTNode *left ; BSTNode *right ; } ; typedef BSTNode *PtrBSTNode ; class BST { public: BST() ; ~BST() ; void insert(int new_value) ; bool find(int target) ; void remove(int target) ; void print() ; private: PtrBSTNode inner_insert(PtrBSTNode node, int new_value) ; PtrBSTNode inner_find(PtrBSTNode node, int target) ; void inner_print(PtrBSTNode node) ; PtrBSTNode findMin(PtrBSTNode node) ; PtrBSTNode findMax(PtrBSTNode node) ; PtrBSTNode removeMin(PtrBSTNode node) ; PtrBSTNode inner_remove(PtrBSTNode node, int target) ; PtrBSTNode root ; } ; #endif ****************************************************************** bst.cpp ****************************************************************** #include using namespace std ; #include "bst.h" BST::BST() { root = NULL ; } BST::~BST() { // Your code to deallocate the tree here! } PtrBSTNode BST::inner_insert(PtrBSTNode node, int new_value) { if (node == NULL) { node = new BSTNode ; node->value = new_value ; } else if (new_value < node->value) { node->left = inner_insert(node->left, new_value) ; } else if (new_value > node->value) { node->right = inner_insert(node->right, new_value) ; } else { // Equality! } return( node ) ; } void BST::insert(int new_value) { root = inner_insert(root, new_value) ; } PtrBSTNode BST::inner_find(PtrBSTNode node, int target) { while( node != NULL ) { if (target < node->value) { node = node->left ; } else if (target > node->value) { node = node->right ; } else { // Equality - target was found! return( node ) ; } } return( NULL ) ; } bool BST::find(int target) { if (inner_find(root, target) != NULL) { return( true ) ; } else { return( false ) ; } } PtrBSTNode BST::findMin(PtrBSTNode node) { if (node != NULL) { while (node->left != NULL) { node = node->left ; } } return( node ) ; } PtrBSTNode BST::findMax(PtrBSTNode node) { if (node != NULL) { while (node->right != NULL) { node = node->right ; } } return( node ) ; } PtrBSTNode BST::removeMin(PtrBSTNode node) { if (node == NULL) { // Mininum was not found return( NULL ) ; } else if (node->left != NULL) { // Move down to minimum node in left subtree node->left = removeMin(node->left) ; return( node ) ; } else { // Only one child to the right, return a // pointer to it return( node->right ) ; } } PtrBSTNode BST::inner_remove(PtrBSTNode node, int target) { if (node == NULL) { // target not found return( NULL ) ; } else if (target < node->value) { // Find item to remove in left subtree node->left = inner_remove(node->left, target) ; return( node ) ; } else if (target > node->value) { // Find item to remove in right subtree node->right = inner_remove(node->right, target) ; return( node ) ; } else if ((node->left != NULL) && (node->right != NULL)) { // Node to be removed has two children node->value = (findMin(node->right))->value ; node->right = removeMin(node->right) ; return( node ) ; } else { // Node to be removed has only one child if (node->left != NULL) { return( node->left ) ; } else { return( node->right ) ; } } } void BST::remove(int target) { root = inner_remove(root, target) ; } void BST::inner_print(PtrBSTNode node) { if (node == NULL) return ; else { inner_print(node->left) ; cout << node->value << endl ; inner_print(node->right) ; } } void BST::print() { inner_print(root) ; } void fail(char *msg, int value) { cout << "FAIL: " << msg << " " << value << endl ; } char *find_failed = "Find failed" ; int main() { BST bst ; bst.insert(7) ; bst.insert(2) ; bst.insert(9) ; bst.insert(1) ; bst.insert(5) ; bst.insert(3) ; bst.insert(4) ; cout << "Before\n" ; bst.print() ; if (! bst.find(7)) fail(find_failed, 7) ; if (! bst.find(2)) fail(find_failed, 2) ; if (! bst.find(9)) fail(find_failed, 9) ; if (! bst.find(1)) fail(find_failed, 1) ; if (! bst.find(5)) fail(find_failed, 5) ; if (! bst.find(3)) fail(find_failed, 3) ; if (bst.find(0)) fail(find_failed, 0) ; if (bst.find(10)) fail(find_failed, 10) ; bst.remove(2) ; cout << "After\n" ; bst.print() ; return( 0 ) ; }