Science, it works!


After my latest post there was a comment discussion about how to test the efficiency of one method of performing an action versus another. I looked at code profiling for Python, but everything seemed to be too complicated for what I wanted to do. I decided to use my own basic method for comparing the speed of each implementation. The blocks of code in question were:

def collatz(start):
    print start,
    if start > 1:
        if start % 2 == 0:
            start = start / 2
            collatz(start)
        else:
            start = start * 3 + 1
            collatz(start)

and:

def collatz(start):
    while start != 1:
        yield start
        if start % 2 == 0:
            start = start / 2
        else:
            start = start * 3 + 1
    yield 1

Each block allowed the user to print the numbers in the Collatz Sequence from a given starting point. To test the efficiency of these blocks of code, I decided to stage a scientific(ish) experiment. I figured the best test would be to have each method loop through every number between 0 and 20,000 and print the Collatz Sequence to the screen for each number, recording the time it took to complete the task.

A few notes about the above method:

  • Since one of the methods already prints to the screen, the initial test also had to print to the screen with the second method, which would require another logic block in the testing method. Since this is how one would have to access the numbers in the real world, I considered this acceptable
  • In order to avoid statistical anomalies in my testing, I performed the test 15 times on each block of code. Normally I would expect more tests in order to provide statistically significant results, but each block of code was taking between 5 and 8 minutes to execute. Given that this was “for-fun” testing, I didn’t feel like wasting that much of my life waiting
  • Computers are funny, especially if they are running Windows Vista. I did my best to minimize interference from the OS, but there are certainly a few outliers that suggest that I was not entirely successful. Measures I took to prevent interference included disabling my wireless card (my network often “hiccups”, leading to slowdowns as the OS tries to reconnect), disabling all superfluous programs and services, and running the test over night so that there would not be any interference from me opening files and programs
  • The time recording was done by saving the system time to a variable before the operation started, and again as soon as it finished, and comparing the two and writing the difference to a file. This meant no human interaction at all

I ran the initial test with each method (just to make sure my test suite was working correctly), and was surprised by the results: the generator (my code) was slower by a full 2 minutes than the recursive function. I t was then that I noticed that there was a difference in how each was printed to a console window when executed: the generator printing each number on it’s own line, and the recursive function printing across the screen and wrapping. In order to make sure that the difference was not being caused by the print function, I changed my code so it would print like the other code. I ran the code again, and found the two pieces of code performed as follows:

Total time spent: 0:06:38.639000
Total time spent: 0:06:49.947000
Total time spent: 0:06:50.136000
Total time spent: 0:06:50.752000
Total time spent: 0:06:47.260000
Total time spent: 0:06:37.820000
Total time spent: 0:06:33.018000
Total time spent: 0:06:46.054000
Total time spent: 0:06:42.338000
Total time spent: 0:06:37.777000
Total time spent: 0:06:37.989000
Total time spent: 0:06:40.777000
Total time spent: 0:06:37.321000
Total time spent: 0:06:39.519000
Total time spent: 0:06:47.838000

Total time spent: 0:06:27.871000
Total time spent: 0:06:36.712000
Total time spent: 0:06:50.259000
Total time spent: 0:06:50.623000
Total time spent: 0:06:52.055000
Total time spent: 0:06:49.951000
Total time spent: 0:06:38.672000
Total time spent: 0:06:46.564000
Total time spent: 0:06:47.910000
Total time spent: 0:06:40.730000
Total time spent: 0:06:50.151000
Total time spent: 0:06:39.437000
Total time spent: 0:06:40.821000
Total time spent: 0:06:40.915000
Total time spent: 0:06:24.477000

As you can see, the two pieces of code perform virtually neck-and-neck in this capacity. I had to wonder though, if the print statement could make a difference of two minutes based upon whether or not each number had it’s own line what kind of difference would be made by not printing the numbers at all. The results are as follows:

Total time spent: 0:00:09.738000
Total time spent: 0:00:09.724000
Total time spent: 0:00:09.350000
Total time spent: 0:00:09.368000
Total time spent: 0:00:09.731000
Total time spent: 0:00:09.623000
Total time spent: 0:00:09.491000
Total time spent: 0:00:09.563000
Total time spent: 0:00:09.660000
Total time spent: 0:00:09.625000
Total time spent: 0:00:09.725000
Total time spent: 0:00:09.657000
Total time spent: 0:00:09.648000
Total time spent: 0:00:09.299000
Total time spent: 0:00:09.640000

Total time spent: 0:00:10.263000
Total time spent: 0:00:10.479000
Total time spent: 0:00:09.982000
Total time spent: 0:00:09.893000
Total time spent: 0:00:09.951000
Total time spent: 0:00:10.188000
Total time spent: 0:00:10.161000
Total time spent: 0:00:09.953000
Total time spent: 0:00:10.155000
Total time spent: 0:00:10.136000
Total time spent: 0:00:09.848000
Total time spent: 0:00:10.168000
Total time spent: 0:00:10.123000
Total time spent: 0:00:10.112000
Total time spent: 0:00:10.080000

Here, we are able to see not only the performance hit caused by printing the numbers (from 6 minutes to 10 seconds!), but that there is an average difference of about 1 second between the two pieces of code, with the generator pulling ever so slightly ahead.

Advertisements
Leave a comment

5 Comments

  1. Wow, I didn’t think that the print function would cause that much of a difference… Must have been due to the fact that the “same line” output doesn’t require the shell to basically print twice (once for the value, once for the new line).

    I’ll run some stats on the data later today, as well as see if running it on a Mac (and in different shells) makes any difference on the time. Did you just include the printing for loop within your function definition? Also, how exactly did you record the system times?

    Thanks for this post, and I’m glad we were both suitably intrigued by this problem!

    Reply
    • brennydoogles

       /  March 11, 2010

      To do the testing, I put each method in a file, along with a run() method. Here is mine:

      from datetime import datetime
      import os
      def collatz(start):
          while start != 1:
              yield start
              if start % 2 == 0:
                  start = start / 2
              else:
                  start = start * 3 + 1
          yield 1
      
      def run():
          startTime = datetime.now()
          for i in range(2, 20000):
              print "Sequence for %d:" % (i)
              for j in collatz(i):
                  print j,
          endTime = datetime.now()
          totalTime = "Total time spent: %s" % (endTime - startTime)
          print totalTime
          dir = 'C:\\Users\\brendon\\Desktop\\Python\\My Python Scripts\\collatz'
          filename = "mylog.txt"
          path = os.path.join(dir, filename)
          f = open(path, 'a')
          f.write("%s\n" % (totalTime))
      
      

      Here is yours:

      from datetime import datetime
      import os
      def collatz(start):
          print start,
          if start > 1:
              if start % 2 == 0:
                  start = start / 2
                  collatz(start)
              else:
                  start = start * 3 + 1
                  collatz(start)
      
      def run():
          startTime = datetime.now()
          for i in range(2, 20000):
              print "Sequence for %d:" % (i)
              collatz(i)
          endTime = datetime.now()
          totalTime = "Total time spent: %s" % (endTime - startTime)
          print totalTime
          dir = 'C:\\Users\\brendon\\Desktop\\Python\\My Python Scripts\\collatz'
          filename = "hislog.txt"
          path = os.path.join(dir, filename)
          f = open(path, 'a')
          f.write("%s\n" % (totalTime))
      

      And here is the program that alternately ran them:

      import hisCollatzTester
      import myCollatzTester
      
      if __name__ == "__main__":
          counter = 0
          while counter < 30:
              if counter%2 == 0:
                  hisCollatzTester.run()
              else:
                  myCollatzTester.run()
              counter += 1
          print "Test Complete"
      

      This should work cross-platform without any issues, provided you change the path for the log files to something that exists on the system.

      Reply
      • Thanks for posting up your code. Just ran the tests on my machine, and I’m glad that I did! I ran the program for both cases, and this is what my numbers returned:

        Printing to screen:
        Generator: 4.1956077 sec
        Recursion: 4.6507228 sec [~0.5 sec slower]

        Not printing to screen:
        Generator: 0.0295538 sec
        Recursion: 1.0592684 sec

        [Tests were done using Terminal.app running Python 2.6.4 on a Macbook with no other applications running and Airport turned off. Times shown are averages over thirty runs.]

        I’m guessing the long times you encountered during the printing test came from your IDE. I guess this will just be further proof that using generators leads to faster code, which is great to know!

        Thanks also for the info on collecting time data for program runs. I’ll definitely be using it in some of my other projects.

      • brennydoogles

         /  March 11, 2010

        Oddly enough, this was using windows command prompt. I wonder what differences hardware played as well. I am running on an intel centrino processor with 3 GB ram on Vista Basic, which is not a terribly high performing configuration. Most modern Macbooks have quad-core processors and 4-8 GB ram, as well as (snow?) Leopard which I would assume is far lighter in a resting state than Vista. On a side note, I am going to be sending you an email shortly at the address you use for your wordpress account. Fair Warning.

      • Got it, and replied back.

        My Macbook’s closing in on three years of life, plus I got the lowest end model there was (cost reasons), so I shudder to think what the high end ones would do with these programs. It’s an Intel Core 2 Duo with 1 GB RAM, but I did upgrade to Snow Leopard a few months ago, so that might’ve helped…

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s