After my latest post there was a comment discussion about how to test the efficiency of one method of performing an action versus another. I looked at code profiling for Python, but everything seemed to be too complicated for what I wanted to do. I decided to use my own basic method for comparing the speed of each implementation. The blocks of code in question were:

def collatz(start): print start, if start > 1: if start % 2 == 0: start = start / 2 collatz(start) else: start = start * 3 + 1 collatz(start)

and:

def collatz(start): while start != 1: yield start if start % 2 == 0: start = start / 2 else: start = start * 3 + 1 yield 1

Each block allowed the user to print the numbers in the Collatz Sequence from a given starting point. To test the efficiency of these blocks of code, I decided to stage a scientific(ish) experiment. I figured the best test would be to have each method loop through every number between 0 and 20,000 and print the Collatz Sequence to the screen for each number, recording the time it took to complete the task.

A few notes about the above method:

- Since one of the methods already prints to the screen, the initial test also had to print to the screen with the second method, which would require another logic block in the testing method. Since this is how one would have to access the numbers in the real world, I considered this acceptable
- In order to avoid statistical anomalies in my testing, I performed the test 15 times on each block of code. Normally I would expect more tests in order to provide statistically significant results, but each block of code was taking between 5 and 8 minutes to execute. Given that this was “for-fun” testing, I didn’t feel like wasting that much of my life waiting
- Computers are funny, especially if they are running Windows Vista. I did my best to minimize interference from the OS, but there are certainly a few outliers that suggest that I was not entirely successful. Measures I took to prevent interference included disabling my wireless card (my network often “hiccups”, leading to slowdowns as the OS tries to reconnect), disabling all superfluous programs and services, and running the test over night so that there would not be any interference from me opening files and programs
- The time recording was done by saving the system time to a variable before the operation started, and again as soon as it finished, and comparing the two and writing the difference to a file. This meant no human interaction at all

I ran the initial test with each method (just to make sure my test suite was working correctly), and was surprised by the results: the generator (my code) was slower by a full 2 minutes than the recursive function. I t was then that I noticed that there was a difference in how each was printed to a console window when executed: the generator printing each number on it’s own line, and the recursive function printing across the screen and wrapping. In order to make sure that the difference was not being caused by the print function, I changed my code so it would print like the other code. I ran the code again, and found the two pieces of code performed as follows:

Total time spent: 0:06:38.639000

Total time spent: 0:06:49.947000

Total time spent: 0:06:50.136000

Total time spent: 0:06:50.752000

Total time spent: 0:06:47.260000

Total time spent: 0:06:37.820000

Total time spent: 0:06:33.018000

Total time spent: 0:06:46.054000

Total time spent: 0:06:42.338000

Total time spent: 0:06:37.777000

Total time spent: 0:06:37.989000

Total time spent: 0:06:40.777000

Total time spent: 0:06:37.321000

Total time spent: 0:06:39.519000

Total time spent: 0:06:47.838000

Total time spent: 0:06:27.871000

Total time spent: 0:06:36.712000

Total time spent: 0:06:50.259000

Total time spent: 0:06:50.623000

Total time spent: 0:06:52.055000

Total time spent: 0:06:49.951000

Total time spent: 0:06:38.672000

Total time spent: 0:06:46.564000

Total time spent: 0:06:47.910000

Total time spent: 0:06:40.730000

Total time spent: 0:06:50.151000

Total time spent: 0:06:39.437000

Total time spent: 0:06:40.821000

Total time spent: 0:06:40.915000

Total time spent: 0:06:24.477000

As you can see, the two pieces of code perform virtually neck-and-neck in this capacity. I had to wonder though, if the print statement could make a difference of two minutes based upon whether or not each number had it’s own line what kind of difference would be made by not printing the numbers at all. The results are as follows:

Total time spent: 0:00:09.738000

Total time spent: 0:00:09.724000

Total time spent: 0:00:09.350000

Total time spent: 0:00:09.368000

Total time spent: 0:00:09.731000

Total time spent: 0:00:09.623000

Total time spent: 0:00:09.491000

Total time spent: 0:00:09.563000

Total time spent: 0:00:09.660000

Total time spent: 0:00:09.625000

Total time spent: 0:00:09.725000

Total time spent: 0:00:09.657000

Total time spent: 0:00:09.648000

Total time spent: 0:00:09.299000

Total time spent: 0:00:09.640000

Total time spent: 0:00:10.263000

Total time spent: 0:00:10.479000

Total time spent: 0:00:09.982000

Total time spent: 0:00:09.893000

Total time spent: 0:00:09.951000

Total time spent: 0:00:10.188000

Total time spent: 0:00:10.161000

Total time spent: 0:00:09.953000

Total time spent: 0:00:10.155000

Total time spent: 0:00:10.136000

Total time spent: 0:00:09.848000

Total time spent: 0:00:10.168000

Total time spent: 0:00:10.123000

Total time spent: 0:00:10.112000

Total time spent: 0:00:10.080000

Here, we are able to see not only the performance hit caused by printing the numbers (from 6 minutes to 10 seconds!), but that there is an average difference of about 1 second between the two pieces of code, with the generator pulling ever so slightly ahead.