Science, it works!

After my latest post there was a comment discussion about how to test the efficiency of one method of performing an action versus another. I looked at code profiling for Python, but everything seemed to be too complicated for what I wanted to do. I decided to use my own basic method for comparing the speed of each implementation. The blocks of code in question were:

def collatz(start):
    print start,
    if start > 1:
        if start % 2 == 0:
            start = start / 2
            start = start * 3 + 1


def collatz(start):
    while start != 1:
        yield start
        if start % 2 == 0:
            start = start / 2
            start = start * 3 + 1
    yield 1

Each block allowed the user to print the numbers in the Collatz Sequence from a given starting point. To test the efficiency of these blocks of code, I decided to stage a scientific(ish) experiment. I figured the best test would be to have each method loop through every number between 0 and 20,000 and print the Collatz Sequence to the screen for each number, recording the time it took to complete the task.

A few notes about the above method:

  • Since one of the methods already prints to the screen, the initial test also had to print to the screen with the second method, which would require another logic block in the testing method. Since this is how one would have to access the numbers in the real world, I considered this acceptable
  • In order to avoid statistical anomalies in my testing, I performed the test 15 times on each block of code. Normally I would expect more tests in order to provide statistically significant results, but each block of code was taking between 5 and 8 minutes to execute. Given that this was “for-fun” testing, I didn’t feel like wasting that much of my life waiting
  • Computers are funny, especially if they are running Windows Vista. I did my best to minimize interference from the OS, but there are certainly a few outliers that suggest that I was not entirely successful. Measures I took to prevent interference included disabling my wireless card (my network often “hiccups”, leading to slowdowns as the OS tries to reconnect), disabling all superfluous programs and services, and running the test over night so that there would not be any interference from me opening files and programs
  • The time recording was done by saving the system time to a variable before the operation started, and again as soon as it finished, and comparing the two and writing the difference to a file. This meant no human interaction at all

I ran the initial test with each method (just to make sure my test suite was working correctly), and was surprised by the results: the generator (my code) was slower by a full 2 minutes than the recursive function. I t was then that I noticed that there was a difference in how each was printed to a console window when executed: the generator printing each number on it’s own line, and the recursive function printing across the screen and wrapping. In order to make sure that the difference was not being caused by the print function, I changed my code so it would print like the other code. I ran the code again, and found the two pieces of code performed as follows:

Total time spent: 0:06:38.639000
Total time spent: 0:06:49.947000
Total time spent: 0:06:50.136000
Total time spent: 0:06:50.752000
Total time spent: 0:06:47.260000
Total time spent: 0:06:37.820000
Total time spent: 0:06:33.018000
Total time spent: 0:06:46.054000
Total time spent: 0:06:42.338000
Total time spent: 0:06:37.777000
Total time spent: 0:06:37.989000
Total time spent: 0:06:40.777000
Total time spent: 0:06:37.321000
Total time spent: 0:06:39.519000
Total time spent: 0:06:47.838000

Total time spent: 0:06:27.871000
Total time spent: 0:06:36.712000
Total time spent: 0:06:50.259000
Total time spent: 0:06:50.623000
Total time spent: 0:06:52.055000
Total time spent: 0:06:49.951000
Total time spent: 0:06:38.672000
Total time spent: 0:06:46.564000
Total time spent: 0:06:47.910000
Total time spent: 0:06:40.730000
Total time spent: 0:06:50.151000
Total time spent: 0:06:39.437000
Total time spent: 0:06:40.821000
Total time spent: 0:06:40.915000
Total time spent: 0:06:24.477000

As you can see, the two pieces of code perform virtually neck-and-neck in this capacity. I had to wonder though, if the print statement could make a difference of two minutes based upon whether or not each number had it’s own line what kind of difference would be made by not printing the numbers at all. The results are as follows:

Total time spent: 0:00:09.738000
Total time spent: 0:00:09.724000
Total time spent: 0:00:09.350000
Total time spent: 0:00:09.368000
Total time spent: 0:00:09.731000
Total time spent: 0:00:09.623000
Total time spent: 0:00:09.491000
Total time spent: 0:00:09.563000
Total time spent: 0:00:09.660000
Total time spent: 0:00:09.625000
Total time spent: 0:00:09.725000
Total time spent: 0:00:09.657000
Total time spent: 0:00:09.648000
Total time spent: 0:00:09.299000
Total time spent: 0:00:09.640000

Total time spent: 0:00:10.263000
Total time spent: 0:00:10.479000
Total time spent: 0:00:09.982000
Total time spent: 0:00:09.893000
Total time spent: 0:00:09.951000
Total time spent: 0:00:10.188000
Total time spent: 0:00:10.161000
Total time spent: 0:00:09.953000
Total time spent: 0:00:10.155000
Total time spent: 0:00:10.136000
Total time spent: 0:00:09.848000
Total time spent: 0:00:10.168000
Total time spent: 0:00:10.123000
Total time spent: 0:00:10.112000
Total time spent: 0:00:10.080000

Here, we are able to see not only the performance hit caused by printing the numbers (from 6 minutes to 10 seconds!), but that there is an average difference of about 1 second between the two pieces of code, with the generator pulling ever so slightly ahead.

Collatz Conjecture in Python

After a brief comment thread on a pretty cool blog I decided to post a little bit more about generators. We were discussing the viability of using a generator for finding the sequence of numbers following the Collatz Conjecture for any number. For those unfamiliar with the Collatz Conjecture, it is as follows from Wikipedia:

Consider the following operation on an arbitrary positive integer:
If the number is even, divide it by two.
If the number is odd, triple it and add one.
In modular arithmetic notation, define the function f as follows:

Now, form a sequence by performing this operation repeatedly, beginning with any positive integer, and taking the result at each step as the input at the next. [… It will eventually end in 1 …]

Or more aptly put by

The original code in question was well written, but relied upon recursion to perform it’s function. I proposed that perhaps a generator would be a better way to test the conjecture (which by the way has neither been proven or disproved). Here is the code I came up with for the generator:

def collatz(start):
    while start != 1:
        yield start
        if start > 1:
            if start % 2 == 0:
                start = start / 2
                start = start * 3 + 1
    yield 1

if __name__ == "__main__":
    for i in collatz(10):
        print i

My first “real” Python Program pt. 1

Whenever I start learning a new programming language, I like to start off with a project to get used to the syntax of the language without having to deal with finding holes in my logic. Typically, this program has been a simple “Mad-Libs” style program in which I prompt the user for some words and insert them into a (sometimes) hilarious story. For the most part in the past, these have been statically written stories in which I hard code variable names into a string and write custom prompts for each variable. When starting out in Python, I decided this would not be enough of a challenge. Somewhere between PHP and C# in my language path I hit a point where programming started to make sense on a deeper level. Adding new object types to my repertoire was becoming an everyday occurrence, I began branching out from what I had been taught in programming classes, and creating and dealing with objects dynamically in loops without ever knowing their names became the norm, and no longer blew my mind.

Python “yield” and Project Euler problem 2

When working on Project Euler problem 2, I stumbled on a unique feature of Python: the yield keyword. The yield keyword allows for the creation of generators. Generators are defined much like functions, but have the added bonus of being able to return multiple values over the course of their execution. For example, lets say I wanted to make a function that counted from 0 to infinity, and kept it’s state between uses. The following generator would do just that:

def counter():
     a = 0
     while True:
          yield a
          a += 1

Simple isn’t it? Iterating through it is simple too. Let’s say we want to get the first 10 numbers, and then the next 20, with a break in between:

if __name__ == "__main__":
     x = counter()
     for i in x:
          if i < 10:
               print i
     print "Broken out of counter"
     for i in x:
          if i < 30:
               print i

The output would be:

Broken out of counter

Now to make this relevant. Project Euler problem 2 is as follows:

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

Find the sum of all the even-valued terms in the sequence which do not exceed four million.

So what we need to do is find a way to take the starting values of 0 and 1, and as each iteration passes, set a to the value of b, and b to the value of a+b, returning a each time. Sounds like a job for generators!

def fibonacci(max):
     a, b = 0, 1
     while a < max:
          yield a
          a, b = b, a+b

Next, we loop through each of the returned values, checking if it is even. If so, add it to a running total.

def sumEvenFibonacciNumbers(max):
     number = 0
     for n in fibonacci(max):
          if n%2 == 0:
               number += n
     return number

if __name__ == "__main__":
     print sumEvenFibonacciNumbers(4000000)

There you have it, using generators Project Euler problem 2 was a breeze!

Reading RSS feeds with Magpie RSS

Last semester I was taking a client-side programming class and ended up attempting to design a Twitter-esque web app using PHP and AJAX. The goal of the website was to create a twitter like interface for a news aggregation website. While my team and I were all competent programmers, there was one problem that we were all having a hard time solving at first: How to add realistic looking articles to our database while on a serious time crunch. After a little bit of debate and quite a bit of head-against-wall banging, we came upon the idea of pulling RSS feeds from major news organizations and parsing them out into byte-sized pieces (typically just the headline). We decided to use 3 major news organizations for simplicity’s sake since each RSS feed was just a little bit different, and we ended up going with NPR, CNN, and MSNBC World News.

Getting Started with Project Euler

If you haven’t already, go ahead and sign up at Project Euler. Once you have logged in, head on over to the Problems page. I recommend sorting the problems by ascending difficulty, as this will allow you to tackle the problems in a (more or less) gentle slope, at least at first. Let’s take a look at the first problem:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

So where do we get started? The key to this problem is the modulo operator. For those who are unfamiliar with the modulo operator (notated in many languages as % ), it divides two numbers and returns the remainder. So, if a number divides evenly by 3 or 5, we want to add it to a running total. So, if we were to look at this problem in language-agnostic pseudo-code we would get:

for each number between 0 and 1000
     if number/3 evenly, or number/5 evenly
          add the number to a running total
print the total to the user

With that, you should be able to complete this problem fairly quickly. If you would like to see this problem implemented in a few programming languages, hit the Read link below:

Project Euler and Learning Python

A few months ago I decided it was time to start learning Python. I had heard great things about python,

The Joy of Python

It's the Python

and I couldn’t in good conscience let an opportunity pass to learn a new language. I grabbed a copy of “Dive Into Python” from the internet (see the link on my sidebar) and got started. I spent a few days writing simple apps going along with the book, but got bored fairly quickly. As someone who has conquered several programming languages at this point, it is difficult to find your mind stimulated by programs designed for new programmers. After a few Google searches, I stumbled upon Project Euler. For those who are not familiar with Project Euler, it is a collection of language-agnostic programming/math problems that are designed to help programmers keep (or develop) their edge. Each problem is designed in such a way that an efficient algorithm should provide the correct answer in 60 seconds or less in any language. I decided to give it a go, and have enjoyed myself greatly in the process. I plan on attempting to complete each challenge in a variety of languages, in an attempt to compare and contrast each language. How many of you have tried Project Euler, and what has your experience been?

My Language Path

Everyone has to start somewhere. For me, I started early without knowing it. Somewhere in late Middle School or early High School I had to buy a graphing calculator for school, and ended up with a TI-85 new out of the box. Now, I should have known early on that something was abnormal in my brain because like always, the first thing I did when I opened my calculator was read the Instruction Manual.


Using SQLite with C#


Adding a database to your application can be an easy way to store data and settings between sessions for your program, but it is not always feasible to use a server based DBMS to store your database. SQLite is a small, fast, and reliable database which can be used without the end user having to install anything extra (achieved by referencing a single .dll in your project). There are a few things we as developers must do to get started with SQLite: