COMP15 Spring 20XX
Due: Monday, May 2, 11:00PM
This project is a follow-up to Project #7 in which we used a binary search tree to implement a dictionary abstract data type (ADT.) In this project, we will keep the same basic interface to the dictionary ADT, but will use a hash table to implement the dictionary. We will again measure the execution speed of insertion and look up operations -- to increase the speed of the hash table implementation and to compare the hash table method with the binary search tree implementation.
The objectives for this project are:
Several files are provided:
You will need to implement the dictionary ADT using the files dictionary.h and dictionary.cpp. The implementation must use a hash table to maintain the contents of the dictionary. Dictionary items are represented using HashNode objects which must be allocated and deallocated dynamically. To simplify, a HashNode elides the value part of a key-value pair and only stores the symbolic key in the form of a C-style string. The key symbol must be dynamically allocated and deallocated, too.
It is not necessary to implement a copy constructor or overloaded assignment operator for the Dictionary class.
A new function, printStats, has been added to the Dictionary class interface. This function computes and displays statistics on hash table utilization and the length of the table's lists.
You will need to reuse the test and measurement driver from Project #7. To avoid confusion when building and making comparisons with the binary search tree method, the test driver for Project #8 is contained in the file hash_search.cpp. If you were careful in Project #7, you should be able to reuse your existing test driver and just add a call to compute and display hash table statistics. Please see the sample output below.
Number of words in dictionary: 354984 Measurements for reading dictionary word list Elapsed time: 0.343066 seconds User time: 0.320000 seconds System time: 0.040000 seconds Measurements for inserting words into dictionary Elapsed time: 14.136600 seconds User time: 14.200000 seconds System time: 0.230000 seconds Number of words to lookup: 30065 Measurements for reading list of words to be found Elapsed time: 0.024554 seconds User time: 14.230000 seconds System time: 0.230000 seconds Starting lookups Lookups finished Measurements for finding words in dictionary Elapsed time: 1.043691 seconds User time: 15.260000 seconds System time: 0.240000 seconds Statistics for comparisons per find operation Samples: 30065 Minimum: 1.000000 Maximum: 405.000000 Mean: 173.646133 Variance: 39344.784834 Std dev: 198.355199 Coeff of var: 1.142296 Hash table utilization statistics Hash table entries: 1031 Hash table entries used: 1031 Hash table utilization: 1.000000 Hash table list length statistics Samples: 1031 Minimum: 291.000000 Maximum: 407.000000 Mean: 344.309408 Variance: 118783.291980 Std dev: 344.649520 Coeff of var: 1.000988
The test and measurement driver takes two command line arguments: the name of the file containing words to be place into the dictionary and the name of the file containing words to be looked-up. The usage synopsis for the driver is:
Usage: hash_search wordfile lookupfileThe test and measurement driver must check for the correct number of command line arguments.
Two test data files are provided:
These are the same files that were used in Project #7. When doing performance studies, it's important to use the same data sets and to take measurements under the same conditions in order to make meaningful comparisons.
Here are some additional requirements to be satisfied:
Use the provide system to submit your finished program:
/local/bin/provide comp15 p8 makefile hash_node.h ...The files to be submitted are:
hash_node.h Declaration/definition of a hash table node dictionary.h Declaration of the dictionary ADT interface dictionary.cpp Definition of the dictionary ADT hash_search.cpp Test and measurement driver stats.h Declaration of the statistics interface stats.cpp Definition of the statistics module timer.h Declaration of the timing interface timer.cpp Definition of the timer module makefile makefile to build everythingDo not submit the test files moby.scrambled and lookup.
Be sure you are completely satisfied with your work before submitting! Build and test your program on comp15.cs.tufts.edu before submitting your work. This system is the target build and test platform for all course projects. Projects which do not build correctly receive a score of zero.
Space-time efficiency of programs is important and as engineers, we must make trade-offs between execution time and memory space. One metric for space-time efficiency is the product of execution time and memory space. For this assignment, the efficiency metric is the product of CPU time (user+system) and hash table size.
Extra credit will be awarded for the top five programs (one program per student) with the most efficient find operation. Extra credit will be awarded on a graduated scale: 1 point (10% extra) for the top program, 0.8 point for second place, 0.6 point for third place, etc. To qualify for extra credit, your program must compute and display the product of CPU time for the lookup (find) phase of operation and hash table size (i.e., the number of entries in the hash array itself.)
(User CPU + System CPU) * table_sizeThe ExecutionTimer function printProcessTimes saves the user CPU time and system CPU time. After calling printProcessTimes, you may retrieve the current user CPU time and system CPU time by calling the functions getUserTime and getSystemTime, respectively. Remember, you need to compute the user and system CPU time for the lookup phase, so you will need to "snap" the CPU times before and after the lookup phase of operation.
The decision of the judges is final. Contest void where prohibited by law.
There is one new method function: printStats. This function computes and prints statistics on hash table utilization and the length of the hash table lists. These statistics can be used to judge the effectiveness of the hash table design.
Function: Dictionary() Purpose: Constructor for the Dictionary class. Initializes the dictionary to its empty state. Parameters: None Returns: Nothing Error conditions: None Function: ~Dictionary() Purpose: Deallocates (deletes) any items in the dictionary Parameters: None Returns: Nothing Error conditions: None Function: insert(char *new_symbol) Purpose: Inserts a new entry into the dictionary with the key specified by new_symbol Parameters: new_symbol: The key (a C-style string) Returns: Nothing Error conditions: Memory space exhausted Function: find(char *target) Purpose: Performs a lookup to find an item in the dictionary with the specified key Parameters: target: The key (a C-style string) Returns: True if the item was found, otherwise false Error conditions: Memory space exhausted Function: remove(char *target) Purpose: Finds and removes the item with the specified key from the dictionary Parameters: target: The key (a C-style string) Returns: Nothing Error conditions: None Function: print Purpose: Prints (dumps) the contents of the dictionary; For debugging purposes only Parameters: None Returns: Nothing Error conditions: None Function: printStats Purpose: Prints hash table utilization statistics Parameters: None Returns: Nothing Error conditions: None Function: clear_number_of_comparisons Purpose: Resets the running count of symbol (string) comparisons Parameters: None Returns: Nothing Error conditions: None Function: inc_number_of_comparisons Purpose: Increments the running count of symbol (string) comparisons Parameters: None Returns: Nothing Error conditions: None Function: get_number_of_comparisons Purpose: Retrieves the running count of symbol (string) comparisons Parameters: None Returns: Current running count of symbol comparisons Error conditions: NoneCopyright © 2004-2013 Paul J. Drongowski