Friday, August 30, 2013

Regarding c++ vs. python (in algorithm contests)

Reddit...

Dear people from reddit. Someone linked this blog to you. And it was wrong for it to happen. Because the intended audience of this blog is *not* software developers/ programmers at /r/programming. The intended audience is people who participate in programming contests. Specifically TopCoder. This blog post was made for those people in mind. For example, chances are that you have no idea what an editorial is in this context. The blog is also extremely tongue-in-cheek as I write it using my vexorian persona.

We already know very well about performance. Python runs around 10x slower in comparison to c++ in these contests. Since that's a known fact about the whole thing that's why I didn't even mention it or include "benchmarks".

One thing that you must understand is that the contests are basically designed around C++. This comparison has nothing, absolutely nothing, to do with what language is better than the other for general usage. This is no ammo in some language supremacy flame war. In other words: There is absolutely no need to come to insult me. If you just want to know which language is better for general usage: Just pick the one you are more experienced with and that fits your needs better; If you are undecided, go look for more professional comparisons or something...

--------------------------

So, yesterday I was writing an editorial and things were going ok. I was solving a problem when I decided to include both c++ and python code, like I've been doing plenty of times lately. In this occasion, I wanted the c++ not to rely on libraries or built-in functions, so that someone who just started reading on the syntax understands the code. For the python one, I wanted to make a one-liner, because it was totally possible.

I was aware that this choice would make python 'look good". What I didn't expect was to wake up today to the news that TopCoder's facebook account decided to use those codes as the preview image when talking about my editorial... So now we have this image circulating around that seems to make the point that python allows much shorter code than c++ and it is in part because of me. This was the final trigger that made me write this post, I was planning to write something like this since a while ago.

Python - good?

My relationship with python started on rocky ground. Once upon a time, I was a lame programmer who was obsessed with speed and performance. Python, along with Java and many others was a language that was very slow. So there was no way I would ever use it. Worse, python specifically had syntactically-relevant indentation, which also meant it didn't have brackets or begin/end or something similar. Block depth was implicit. For a good while, this feature made python code make me feel dizzy. It is like the code is floating without something that holds it!. It actually still annoys me, but not as much as before.

Eventually, I did some maturing as a programmer. I learned many languages. Learned about good practices. I even made my own languages as an attempt to fix a very bad language (And discovered many things about what works and what doesn't). So I eventually gave python a chance. It helps to have an editor with code-folding or indentation guides. It removes that feeling of things floating without being held :).

My learning process was slow because I never had any project that I had to write in python. But it did become my main calculator application. I also use it for small scripts and to generate test cases for programming contest problems.

There are many good things about python. There are also some things I am not a fan off. What I like about python is that instead of trying to be a better C++ (*cough* Java and C# *cough*), it tries to be a better language. It is also free and not tied to some evil mega-corporation (*cough* Java and C# *cough*). By the time I heard they were going to add python to the list of languages in Topcoder SRMs, I was excited because I would get to use it in editorials, and it seems like a great tool for them.

Python in editorials

And so, began the journey. I started to use python to solve SRM problems and include the code in editorials. I now have some things to say about using python in TopCoder contests.

Small code

Python really shines in problems that need little code. The stuff other languages need to do the simplest of operations, is usually very easily-implementable if not completely built-in. The code from the problem in the image I first talked about is a good example.

class GooseTattarrattatDiv2:
 def getmin(self, S):
    return len(S) - max( S.count(ch) for ch in S )

It is even self-documented. The result is equal to the length of the string, minus the maximum number of times a character appears in S.

It is actually very fun, and a major source of procrastination, to try to come up with the smallest possible python code to solve a problem.

class InsertZ:
 def canTransform(self, init, goal):
    return "Yes" if (goal.replace('z','') == init) else "No"

If the goal is equal to the init after removing all Zs from it, return Yes.

Making readable code

So the division 2 problem of SRM 588 was a typical "bit masks" problem. You had to iterate through all the subsets of songs. More so, songs had two attributes: length and tone. Typically , the code for this looks very messy in c++ or Java. You end up using bitmasks, which means that you will have beautiful things like (1<<i)& in your code, then for each i, you check if it is in the bitmask, if so, call duration[i], etc, etc... You need a formula between the maximum, minimum tones and the sum of durations and other things

The python code, however, looks like this:

import itertools, collections
 
class GUMIAndSongsDiv2:
 def maxSongs(self, duration, tone, T):
    n = len(tone)
    #making this a named tuple is not needed, but it adds readability:
    Song = collections.namedtuple('Song', ['tone', 'duration'] )
    # Make a list of song objects:
    songs = [ Song(tone[i], duration[i]) for i in range(0,n) ]
    res = 0
    for c in xrange(1, n+1):
        # c is a number from 1 to n, increasing order
        for picked in itertools.combinations(songs, c):
            # "picked" is a set of c songs:
            durationSum = sum( s.duration for s in picked )
            minTone     = min( s.tone     for s in picked )
            maxTone     = max( s.tone     for s in picked )
            if durationSum + maxTone - minTone <= T:
                res = c
    return res

itertools.combinations creates an iterator from the iterator that you pass to it, in this case the songs. The new iterator will contain all the combinations of c songs. So if you try c in increasing order, you can easily get all the subsets. In order to easily divide the songs in subsets, we actually made a list of the song objects. To create and define those objects, we only needed a single line to call collections.namedtuple...

There are many ways in which this code is amazing (to me). And it baffles me how readable it is. It is almost a description of the algorithm for the problem, instead of being code. The alternative in c++ or worse, Java , would be much messier. Check it:

int maxSongs(vector <int> duration, vector <int> tone, int T)
{
    int n = tone.size();
    int best = 0;
    // for each subset of songs represented by a bit mask:
    for (int mask=1; mask<(1<<n); mask++) {
        int maxTone = -1, minTone = 200000, durationSum = 0, c = 0;
        // find the minimum/maximum tone, the sum of durations and the
        // number of songs in the subset:
        for (int i=0; i < n; i++) {
            if (mask & (1<<i)) { //is song i in the subset?
                maxTone = max(maxTone, tone[i]);
                minTone = min(minTone, tone[i]);
                durationSum += duration[i];
                c++;
            }
        }
        // If selected songs in optimal order fit the time constraint, this is valid:
        if ( durationSum + maxTone - minTone <= T) {
            best = std::max(best, c);
        }
    }
    return best;
}

Being able to use functions

Most large algorithms will need you to use many functions and call them. You know what this means? In c++ / Java this means that you will have to copy some variables and the arguments as global variables or class members so that the functions can use them. To me, this is very annoying and has been a source of bugs because if you forget to actually make the copy or assign it correctly, you are screwed. c++11 allows lambdas, and I was able to use them to avoid this issue somehow, like in test SRM #2. But c++ lambdas have a very big issue, you cannot (easily) have recursion with them. You need to work around it somehow. The current best answer to the problem I have is to use std::function explicitly when declaring the lambda and it is very messy.

Python has nested functions instead of lambdas, so making them recursive is not hard at all. It is quite normal. Check the codes in the post about SRM 589 and you will see what I mean.

Some more obvious advantages

Sure, there are some obvious things like how overflow is not possible. That whole stuff about tuples is nice to have. Too many times you are dealing with points or coordinates so you are using tuples, so something like (x,y) = ... makes more sense than x = ... ; y = ... The native set() and dict() types. Etc.

Some bad things

After I started to apply this python knowledge to make code at topcoder, I couldn't avoid thinking. What if this is more convenient during a contest? What if I move to python?

Then I try to use this python thing for a larger problem and start to notice a darker side...

My pet peeve against dynamic languages

This is not something new that I just found, or that is specific to programming contests. I have to mention it. Languages that allow you to declare and use variables in the fly. Languages that are not compiled. They have this annoyance, and it is that your typos won't be discovered until the line with the typo is executed. You write these 100 lines of code and run your program. It is not until a few seconds of execution, that the program reaches a line of code in which you type blok instead of block. Sorry, do it again.

Multidimensional "arrays"?

Python doesn't have that array concept, it has tuples and lists. Lists are one-dimensional, but of course, you can make a list of lists and it can be indexed like a bi-dimensional array - a matrix. You can go out of your way and do the same for a list of lists of lists of lists of lists, and you have something with 5 dimensions!

In the Real World™ Programming world, this is likely not a big issue. In the algorithm/programing contests world, specifically the Topcoder world, it can be quite a pain. We have those dynamic programming problems with 4 dimensions and it is like nothing to write home about. You might even have 6 dimensions.

What in c++ looks like :

int dp[w][h][t][n][2][3];    // Just a typical day in topcoder land
memset(dp, -1, sizeof(dp));
dp[x][y][f][p][1][0] = 6;

In python looks like:

dp = [[[[[ [-1]*3 for i in xrange(2) ] for j in xrange(n)] for k in xrange(t)] for a in xrange(h)] for b in xrange(w)]
dp[x][y][f][p][1][0] = 6

Of course, that's ridiculous. Maybe the real problem is that we are approaching it the wrong way. We could use a dictionary?

dp = dict() 
dp[ (x,y,f,p,1,0) ] = 6

But then, since it is a dictionary it will have overhead in accessing the elements... Maybe a list but we will translate the indexes to a single integer? This is what C arrays do anyway.

dp = [-1] * w*h*t*n*2*3 
dp[ (x*h*t*n*2*3 + y*t*n*2*r + f*n*2*3 +  p*2*3 +  1*3 + 0 ] = 6

Yeah, a bit complicated... too, back to square one.

The petty default recursion limit

Speaking of dynamic programming and memoization in functions. I attempted to use python for FlippingBitsDiv2 and it worked... except that it was actually reaching the recursion depth limit way too quickly. In that problem, the recursion depth is as large as 2500, which doesn't really strike me as too high. Specially not for these contests... But that's the default.

Turns out it is perfectly possible to raise the depth limit with a call to a sys. function. So this problem isn't a big deal. Except that you need to be conscious of it when solving something using recursion.

Memory...

I knew of the obvious weakness. Speed. Python's dynamism comes at that price. Actually, I tried to do some tests and it seems like the time factor is 6 times or even 10 times slower. I haven't actually found a division 1 hard problem that can be solved in python yet.

What I didn't expect was the memory usage. This happened to me while solving GameInDarknessDiv2. It needed a depth-first search on a large graph. If you take a look at the c++ code, there are plenty of implementation bottlenecks. My first idea was to use python code for this solution. Because it can make easier to read code if you work for it. The result is beautiful:

def check(field, moves):
    # useful dictionary from move char to position offset:
    move = {'R':(1,0), 'L':(-1,0), 'U':(0,-1), 'D':(0,1) }
    
    # given a position and a move character, return the new position:
    def doMove( (x,y), ch):
        return (x + move[ch][0], y + move[ch][1])

    # find Alice and Bob's initial positions:
    (w,h) = ( len(field[0]), len(field) )
    for y in range(0,h):
        for x in range(0,w):
            if field[y][x] == 'A':
                (ax,ay) = (x,y)
            elif field[y][x] == 'B':
                (bx,by) = (x,y)

    # Save Alice's positions for each of Bob's turns:
    moves = string.join(moves, '')
    alice = [ doMove( (ax,ay), moves[0] ) ] 
    for ch in moves[1:]:
        alice.append( doMove(alice[-1], ch ) )
    # if in the first move, Alice moves towards Bob, it is game over for Bob:
    if (bx,by) == alice[0]:
        return "Alice wins"

    #And the DFS:
    T = len(alice)
    
    visited = [ [ [False]*(T+1) for i in xrange(0,h) ] for j in xrange(0,w) ]
    winner = ["Alice"]
    
    def dfs( (x,y), t):
        if not visited[x][y][t]:
            visited[x][y][t] = True
            if t == T:
                winner[0] = "Bob"
            else :
                for ch in "URDL":
                    (nx, ny) = doMove( (x,y), ch)
                    if (0 <= nx < w) and (0 <= ny < h) and (field[ny][nx] != '#'):
                        if (alice[t] != (nx,ny) ) and ( (t+1 == T) or (alice[t+1] != (nx,ny)) ):
                            dfs( (nx,ny), t+1 )
                            
    #oops, default recursion limit is too conservative for our DFS:
    sys.setrecursionlimit(T+10)
    #call DFS:
    dfs( (bx,by), 0 )
    #and done:
    return winner[0] + " wins"

The amount of code that is not wasted in boring activities is very relevant, at least to me. The use tuples also makes many things make more sense. It is more self-documenting. However, when I actually run system tests on that solution, I found plenty of issues. The need for a 3D "array" was the first. Then the recursion limit. But the one thing that I couldn't fix was the fact that this code was reaching the 64 MB limit.

The memory usage of this approach is `O(w*w*h*h)`, for `w,h <= 50`, such a memory complexity didn't usually seem like a problem to me when using c++ or Java. So I didn't expect this coming at all. 2500*2500 indexes in a boolean array take around 5.96 MB. In python, however, it turns out that lists holding 2500*2500 elements are too heavy (They are still too heavy even if I use the single-list approach that converts 3D indexes to a single dimension). What is going on?

It is not so surprising. Python is dynamic, a single index in a list can describe plenty of things, not just a number or a boolean. The reserved memory is probably at least 4 bytes for each index, that means 23.84 MB instead of just 5. It is more than that. I discovered a secret python function called __sizeof__, it tells me that a list of 2500*2500 False values is worth 47.68 MB (somehow). That is a good chunk of our 64MB. So apparently, not only is the available time around 10 times smaller, it seems like memory is 3 times smaller in practice too.

In codejam

In google codejam, python is the third most popular language and it is very close to Java, in fact, in round 1A it beat Java in popularity for the top 20% contestants. In TopCoder, python isn't working so well (But at least it has beaten Visual Basic and seems to be on its way to beat C# :) ). Python is quite a recent addition, we have had it for 4 official SRMs so far..., there may still not be many coders who know about its availability and the python coders that didn't give TopCoder a try because of the lack of python probably haven't heard of the change yet. It is no surprise usage is low right now. But will it ever reach a popularity level similar to what it has in google code jam?

Codeforces has had python since inception and python's numbers don't seem to match Codejam's (Although a tool like go-hero.net for codeforces language statistics would definitely be great to have, I couldn't find such thing). Codeforces and TC have an audience that is more focused on ACM-ICPC than the Codejam, though, so that skews things up. I have an alternative theory though: Hard time and memory limits like those in TopCoder and Codeforces just don't work well if you want a language agnostic contests. Making the time limits different is not a good solution either, as it would be difficult to pick good limits. What codejam does well is that, with 4 minutes/8 minutes time limits and using your computer's memory instead of google's, the constant factor is quite irrelevant. That allows plenty of more languages to work, including python.

Move to python?

I think python is a great language. But I don't think I will move to using python during contests. At least not for the time being. I am still not as good with python as I am with c++.

Also, although python has a powerful expression power. c++ has also improved a big deal with c++11. The big advantage I would have assigned to python over c++ would be the support for functions as first-class objects and closures. But now c++ has it too. So...

Remember the initial dilemma? In reality the true c++ version, using more of the available features would look like this:

int getmin(string S) 
{
    int n = S.length(), m = 0;
    for (char ch : S) {
        m = max(m, (int)count(S.begin(), S.end(), ch) );
    }
    return n - m;
}

It is still ugly, but not so bad:)

A bonus: Swapping integers

Last week, there was a image circulating around. How to swap two integers without using an additional variable. The "C version": a=a+b; b=a-b;a=a-b;. Then one of those obnoxious meme faces throws a misogynist slur and says that python has: a,b = b,a. Ahahahaha, so python is better, supposedly.

Let me introduce to you, the c++ way: swap(a,b). It has a nice advantage: It is self-commenting. It also follows [once and only once]. But the reason I like is because of what it says about c++. (a,b) = (b,a) is cool, but it is an builtin language feature that was introduced precisely to fix this problem and similar ones. The std::swap is a library function.

template <class T> void swap ( T& a, T& b )
{
  T c(a); a=b; b=c;
}

It is a result of things like templates, and also by reference arguments. Powerful features that also solve plenty of other problems and not just swap. std::swap works with any data types too - Unlike the C approach. :)

10 comments :

Oleksandr Sochka said...

In C++11 swap uses std::move which is much faster for STL containers and other classes which have move constructors. So now that looks like


T c{std::move(a)};
a = std::move(b);
b = std::move(c);

Eldar said...

Nice post!


You say that you haven't found a d1-hard solvable in python, but for SRM586's hard you claimed your Python code works in time :)

vexorian said...

That's because I tend to forget about a SRM's problems the moment I finish the editorial.

JOmegaCV said...

2500*2500 False values would be 47MB if booleans were stored in 64 bits. My guess is that they are.

In python you can use bytearrays for True/False which appears to actually only require 1 byte per element and only 49 bytes of overhead. Still not sure how to force a list of 4 byte elements.

# Size in MB
print sys.getsizeof(bytearray(2500**2)) / 2.0**20
5.96051120758

#Overhead
print sys.getsizeof(bytearray(2500**2)) - (2500**2)
49

vexorian said...

I think they need 32/64 bits minimum for each value, because a value needs type information and can represent an address too.


bytearray sounds good, the problem is that it turns the code lower level.

Swosh said...

I was expecting some performance comparison too.

vexorian said...

Basically you can't use python in topcoder without noticing it is 10x times slower if not more :/

Paddy3118 said...

The python multiple assignment you mention in `a, b = b, a` can be used for swapping variables as shown, but it is more powerful than that. It is also used for next state assignments where state is held in multiple variables and next state is an update to those same variables based on their current state.


For example:


colour, tag = next_colour(colour, tag), next_tag(tag)


Here the righthand side is computed from present values of both colour and tag *before* being assigned as the new values. It helps to stop those difficult to find errors when, for example, the next tag value is used to compute the next colour value which can happen if the assignment were split.


Given that the multiple assignment works for swapping variables too then there is no need for an explicit swap function. Learn the one thing and apply it in more places.

Awais Kamran said...

In my view, C++ is a bit hard to learn as compare to learning Python. C++ in urdu

Awais Kamran said...

I personally think that Python is much easier to learn as compared to c++ in urdu