COMP15 Spring 20XX
Due: Monday, May 2, 11:00PM
This project is a follow-up to Project #7 in which we used a binary search tree to implement a dictionary abstract data type (ADT.) In this project, we will keep the same basic interface to the dictionary ADT, but will use a hash table to implement the dictionary. We will again measure the execution speed of insertion and look up operations -- to increase the speed of the hash table implementation and to compare the hash table method with the binary search tree implementation.
The objectives for this project are:
Several files are provided:
You will need to implement the dictionary ADT using the files dictionary.h and dictionary.cpp. The implementation must use a hash table to maintain the contents of the dictionary. Dictionary items are represented using HashNode objects which must be allocated and deallocated dynamically. To simplify, a HashNode elides the value part of a key-value pair and only stores the symbolic key in the form of a C-style string. The key symbol must be dynamically allocated and deallocated, too.
It is not necessary to implement a copy constructor or overloaded assignment operator for the Dictionary class.
A new function, printStats, has been added to the Dictionary class interface. This function computes and displays statistics on hash table utilization and the length of the table's lists.
You will need to reuse the test and measurement driver from Project #7. To avoid confusion when building and making comparisons with the binary search tree method, the test driver for Project #8 is contained in the file hash_search.cpp. If you were careful in Project #7, you should be able to reuse your existing test driver and just add a call to compute and display hash table statistics. Please see the sample output below.
Number of words in dictionary: 354984
Measurements for reading dictionary word list
Elapsed time: 0.343066 seconds
User time: 0.320000 seconds
System time: 0.040000 seconds
Measurements for inserting words into dictionary
Elapsed time: 14.136600 seconds
User time: 14.200000 seconds
System time: 0.230000 seconds
Number of words to lookup: 30065
Measurements for reading list of words to be found
Elapsed time: 0.024554 seconds
User time: 14.230000 seconds
System time: 0.230000 seconds
Starting lookups
Lookups finished
Measurements for finding words in dictionary
Elapsed time: 1.043691 seconds
User time: 15.260000 seconds
System time: 0.240000 seconds
Statistics for comparisons per find operation
Samples: 30065
Minimum: 1.000000
Maximum: 405.000000
Mean: 173.646133
Variance: 39344.784834
Std dev: 198.355199
Coeff of var: 1.142296
Hash table utilization statistics
Hash table entries: 1031
Hash table entries used: 1031
Hash table utilization: 1.000000
Hash table list length statistics
Samples: 1031
Minimum: 291.000000
Maximum: 407.000000
Mean: 344.309408
Variance: 118783.291980
Std dev: 344.649520
Coeff of var: 1.000988
The test and measurement driver takes two command line arguments: the name of the file containing words to be place into the dictionary and the name of the file containing words to be looked-up. The usage synopsis for the driver is:
Usage: hash_search wordfile lookupfile
The test and measurement driver must check for the correct number of
command line arguments.
Two test data files are provided:
These are the same files that were used in Project #7. When doing performance studies, it's important to use the same data sets and to take measurements under the same conditions in order to make meaningful comparisons.
Here are some additional requirements to be satisfied:
Use the provide system to submit your finished program:
/local/bin/provide comp15 p8 makefile hash_node.h ...
The files to be submitted are:
hash_node.h Declaration/definition of a hash table node
dictionary.h Declaration of the dictionary ADT interface
dictionary.cpp Definition of the dictionary ADT
hash_search.cpp Test and measurement driver
stats.h Declaration of the statistics interface
stats.cpp Definition of the statistics module
timer.h Declaration of the timing interface
timer.cpp Definition of the timer module
makefile makefile to build everything
Do not submit the test files moby.scrambled and
lookup.
Be sure you are completely satisfied with your work before submitting! Build and test your program on comp15.cs.tufts.edu before submitting your work. This system is the target build and test platform for all course projects. Projects which do not build correctly receive a score of zero.
Space-time efficiency of programs is important and as engineers, we must make trade-offs between execution time and memory space. One metric for space-time efficiency is the product of execution time and memory space. For this assignment, the efficiency metric is the product of CPU time (user+system) and hash table size.
Extra credit will be awarded for the top five programs (one program per student) with the most efficient find operation. Extra credit will be awarded on a graduated scale: 1 point (10% extra) for the top program, 0.8 point for second place, 0.6 point for third place, etc. To qualify for extra credit, your program must compute and display the product of CPU time for the lookup (find) phase of operation and hash table size (i.e., the number of entries in the hash array itself.)
(User CPU + System CPU) * table_size
The ExecutionTimer function printProcessTimes saves
the user CPU time and system CPU time. After calling
printProcessTimes, you may retrieve the current user CPU time
and system CPU time by calling the functions getUserTime and
getSystemTime, respectively. Remember, you need to compute the
user and system CPU time for the lookup phase, so you will need to "snap"
the CPU times before and after the lookup phase of operation.
The decision of the judges is final. Contest void where prohibited by law.
There is one new method function: printStats. This function computes and prints statistics on hash table utilization and the length of the hash table lists. These statistics can be used to judge the effectiveness of the hash table design.
Function: Dictionary()
Purpose:
Constructor for the Dictionary class. Initializes the
dictionary to its empty state.
Parameters:
None
Returns:
Nothing
Error conditions:
None
Function: ~Dictionary()
Purpose:
Deallocates (deletes) any items in the dictionary
Parameters:
None
Returns:
Nothing
Error conditions:
None
Function: insert(char *new_symbol)
Purpose:
Inserts a new entry into the dictionary with the
key specified by new_symbol
Parameters:
new_symbol: The key (a C-style string)
Returns:
Nothing
Error conditions:
Memory space exhausted
Function: find(char *target)
Purpose:
Performs a lookup to find an item in the dictionary
with the specified key
Parameters:
target: The key (a C-style string)
Returns:
True if the item was found, otherwise false
Error conditions:
Memory space exhausted
Function: remove(char *target)
Purpose:
Finds and removes the item with the specified key
from the dictionary
Parameters:
target: The key (a C-style string)
Returns:
Nothing
Error conditions:
None
Function: print
Purpose:
Prints (dumps) the contents of the dictionary; For
debugging purposes only
Parameters:
None
Returns:
Nothing
Error conditions:
None
Function: printStats
Purpose:
Prints hash table utilization statistics
Parameters:
None
Returns:
Nothing
Error conditions:
None
Function: clear_number_of_comparisons
Purpose:
Resets the running count of symbol (string) comparisons
Parameters:
None
Returns:
Nothing
Error conditions:
None
Function: inc_number_of_comparisons
Purpose:
Increments the running count of symbol (string) comparisons
Parameters:
None
Returns:
Nothing
Error conditions:
None
Function: get_number_of_comparisons
Purpose:
Retrieves the running count of symbol (string) comparisons
Parameters:
None
Returns:
Current running count of symbol comparisons
Error conditions:
None
Copyright © 2004-2013 Paul J. Drongowski