Thursday, February 27, 2014

Python in TopCoder 2

The other day someone linked my topcoder preliminary python vs. c++ comparison in reddit (Could you please not *ever* link my blog to reddit? ) Anyway, it reminds me that now that it has been a while since Python was enabled in topcoder SRMs it is a good time to talk more about it.

Poor adoption

It appears that the number of python topcoders stayed as low or lower than when it was released. I would love to have some actual stats, but anecdotally it appears to me C# still has more users than python (And C#'s user share is very low in comparison to Java). Which is a shame, really.

The following is a screenshot of the top 50 in the last SRM. Yellow scores are c++, Green scores are Java, Orange scores are python and White scores are C#:

So yep, it is not statistics, but this is a scenario that tends to repeat itself. The python solutions stand out because they are rare. I was expecting a bit more.

So why is this?

A contest system designed around Java

Let's go to the point, the contest system is just not python-friendly. This all begins because it has been clearly designed with Java solutions in mind. The obvious indication of this is that it requires you to wrap the solution in a class. In Java, this is natural. In C++, it is slightly annoying, but useful because it allows you easy access to the heap. But in python, it is really just a bunch of verbosity that is not helpful at all.

Using classes in python is a bit obnoxious and it gets in the way unless you really need the OOP syntactic sugar. In python, the more declarative/functional approaches tend to be better. This means using functions. But if you put your functions inside the class, that means having to use self. syntax and include self in all your function. Ultimately it means having the self argument in places where it is not really needed. I just wish there was an alternative to this.

What if we wanted to include actual functional languages? You won't be able to force a class in Haskell. And it would be quite cool to have Haskell or even LISP as an option. Actually, I wish we had more languages in general.

But the real aspect of TopCoder that reveals that it is all about Java is the execution time limit. There is a 2.0 time limit for each test case. When problem setting, writers have to provide a Java solution that runs in less than 1.0 seconds. So time constraints are guaranteed for Java and work great for c++ because it is faster. But any language slower than Java is going to have issues and it shows.

The list of Topcoder problems that are impossible in python keeps growing. Even if a problem is possible, if it is not one of the easy problem categories, it will be usually be much harder to come up with a passing python solution than a passing c++ or Java solution. Problems like EllysCandyGame where you can easily pass in c++ with brute force, but need an exponential-time dynamic programming to pass in python.

Poor information

How many people out there know that python is allowed in TC? It seems to still be news to plenty of people. Maybe we need to tell more people about this?

If only

There are a couple of problems that I'd like to mention out of how frustratingly bad they are for python.

EllysPairing an egregious example. This problem has a 16MB memory limit. This means python cannot even run without failing the test case. There is no way around this issue, python simply cannot do anything about it.

What is ironic is that the solution for the problem would be very nicely implemented in python. The problem uses a list of integers that are generated randomly . The memory limit makes you unable to save the list of integers in a list/array/etc. The algorithm that solves it is linear-time majority vote which requires you to visit the sequence twice. Since you cannot save the numbers in a list, this means that you would need to generate the list twice. This makes it very easy to have messy code that repeats the random generation process in two code places. But in python, this wouldn't be a problem, because it has yield.

class EllysPairing:
 def getMax(self, M, count, first, mult, add):
    # generate all the random numbers, return as generator
    def names():
        for i in xrange( len(count) ):
            x = first[i]
            for j in xrange(count[i]):
                yield x
                x = (x * mult[i] + add[i]) & (M - 1);
    p = -1
    q = 0
    for x in names(): #use them
        if (p != x):
            q -= 1
            if (q <= 0):
                p = x
                q = 1
        else:
            q += 1
    
    n = sum(count)
    c = sum( x == p for x in names() ) #use them again
    
    if (c > n / 2):
        return n / 2 - (c - n/2 - n%2)
    else:
        return n / 2

The important part is the for x in names() , that is all that is needed to iterate through all the random numbers in the list without rewriting the code. The top c++ solution by aceofdiamonds repeats the random number generation code.

This cute solution cannot run. Even without the cruel 16 MB limit, it will be too slow in comparison to the Java solution and would time out, taking around 6 seconds in TC's servers. That's too bad.

I also want to talk about a more recent problem: MiningGoldEasy. This problem has a simple dynamic programming / memoization solution. Just notice that if , in each move you limit yourself to rows and columns that contain at least one "event", it should be optimal, so we have an `O(|event|^3)` solution if we remember the current number of step, and the current position. In python, I can code this:

class MiningGoldEasy:
 def GetMaximumGold(self, N, M, event_i, event_j):
    @MemoizedFunction
    def f(x,y,s):
        if s == len(event_i) :
            return 0
        else:
            bx = max( f(nx,y,s+1) for nx in event_i )
            by = max( f(x,ny,s+1) for ny in event_j )
            return max(bx,by) + N + M - abs(event_i[s] - x) - abs(event_j[s] - y)
     
    return max( max( f(x,y,0) for x in event_i) for y in event_j )

The MemoizedFunction is a decorator. A syntactic sugar that means I do f = MemoizedFunction(f) after declaring the function. MemoizedFunction() is a function that takes a function and instead returns a function that behaves exactly the same but uses memoization. It is clever:

def MemoizedFunction(f):
    mem = dict()
    def g(*args):
        if args not in mem:
            mem[args] = f(*args)
        return mem[args]
    return g

Since 90% of Topcoder problems are dynamic programming ones, this idea to automatically turn a function into memoization is very cool. Just take a look to the GetMaximumGold() code up there, it is almost pseudocode. A c++ equivalent looks like this:

int GetMaximumGold(int N, int M, vector<int> event_i, vector<int> event_j)
{
    int t = event_i.size();
    
    int mem[50][50][51];
    memset(mem, -1, sizeof(mem));        
    function<int(int,int,int)> f = [&](int a, int b, int s) {
        int & res = mem[a][b][s];
        if (res == -1) {
            if (s == t) {
                res = 0;
            } else {
                int x = N + M 
                      - abs( event_i[a] - event_i[s])
                      - abs( event_j[b] - event_j[s]);
                res = 0;
                // change row:
                for (int na = 0; na < event_i.size(); na++) {
                    res = std::max(res, x + f(na,b,s + 1) ); 
                }
                // change column:
                for (int nb = 0; nb < event_j.size(); nb++) {
                    res = std::max(res, x + f(a,nb,s + 1) ); 
                }
            }
        }
        return res;
    };
    int mx = 0;
    for (int a = 0; a < event_i.size(); a++) {
        for (int b = 0; b < event_j.size(); b++) {
            mx = std::max(mx, f(a,b,0) );
        }
    }
    return mx;
}

Unfortunately, the python solution above fails system tests whilst the c++ one passes in 0.1 seconds worst-case. Of course, it is likely that the python solution can be optimized to pass. Probably not using dictionaries or adding crops. But this means going lower level and thus losing the advantages python provides.

Google's Code Jam

Again, this all contrasts with the Google Code Jam, in which python has almost as many users as Java. Since in google jam there is no memory limit (as long as it runs in your computer) and the time limit is very generous (4~8 minutes for around 100 test cases as opposed to 2 seconds for each test case). Then python has a more healthy following in Code Jam. Shall TC make changes to become more language agnostic? It is possible for them to do that without losing what makes them unique? I have no idea.

SRM 611-620 Challenge

With all this in mind, let's do something fun! In the next 10 SRMs, I will use only python during each match. Repeat, I will be forbidden from submitting anything in c++. This will be the first time I use a language other than c++ during a match. Let's see what happens... I will try to make detailed blog posts about how it is effecting (helping or being an obstacle) my performance.

If anyone want to join the challenge, just post a comment here or make a blog post of your own.

Saturday, February 08, 2014

vexorian FAQ

I keep getting some questions so I am gonna leave answers here.

Q: What tool do you use to draw your images?

Whenever I have no idea how to explain a problem, I use images in the editorial / problem statement. My images tend to look pretty darn good. So it is not uncommon I get this question. The short answer is : I use inkscape.

The long answer is, that as you might figure out, inkscape is actually just a simple vector graphics software that is not even a full tool like Adobe illustrator. I use Inkscape because it is free and cross platform. Cross platform is important because I use Linux. Free is important because I avoid promoting proprietary software and a very effective way to avoid promoting proprietary SW is not to use it. I am not a graphical designer, so I can easily afford using Inkscape instead of a costly tool with ethically dubious licenses.

But I think the question is really: How do I make images look so good in editorials. Well, first of all, I have had plenty of Inkscape practice before becoming an editorial writer. I had to learn a ton of Inkscape when I was making graphics for a game that is called Xye, eventually, I made my brother the graphics designer make a new, more advanced skin, but I still like the graphics I was able to make... Of course, when I started writing editorials, my inkscape skills weren't so good either. I still needed to have more and more practice. I am far of being an inkscape expert, but I did end up making quite a large number of images that explain algorithms. So there is that.

The other part of the equation besides practice is that I make sure that the images are rendered as vector graphics instead of bitmaps. This makes the images look good regardless of scale. My editorials have plenty of javascript and CSS tweaks that make the images use SVG format and scale them up or down depending on the user's resolution. Albeit, it is a bit pointless at the moment because TopCoder's wiki site does not have a mobile mode :/ But if your PC uses very large resolution, you might notice something :)

Q: Do you even have a job?

(Or variations like "where do you work?", "Do you work for Google"* , "Do you study?" )

I have a job, I am the editorial writer at TopCoder.com !. Really, that and the occasional SRM problem set is all I am doing for a living right now. It is not the most profitable job ever. And I am only an "independent contractor" of TC, meaning I have no fixed salary, benefits, protections or responsibilities (besides delivering "deliverables" in a decent time) In fact, it is getting extremely time-consuming lately with SRMs getting a lot harder and more frequent than they used to be. But it is okay to me, and it fixes my needs. It allows me to earn money by programming in mostly c++, programming cool little problems as opposed to software (I HATE software development).

I also supposedly study. In my case University was a terrible, terrible thing that didn't work out very well. I am extremely late in graduating. I don't even care anymore, to be honest.

* Actual question happened at least 6 times, for some reason.

Q: Why don't you have a red rating in topcoder?

I am just not that good during contests. I actually constantly struggle to solve the division 1 hards and mediums before being able to explain them. Specially lately, every tutorial has been a little miracle. In this very moment I am facing issues solving a division 1 EASY problem!

Q: Why don't you participate in Codeforces / Codechef / ACM / etc

  • I try to participate in Codeforces. I don't love it that much but I sort of like codeforces. The contests are more challenging (because Codeforces' division 1 has higher density of uber good coders than TopCoder's division 1 has). Although the probability to find problems that are repetitive or annoying is also slightly larger. The main reason I don't join more Codeforces contests is time availability. Specially with TC editorials taking a lot of my time. When I am not writing editorials I am a bit tired to be participating in more contests :/. Also, about 1/2 of the time, the time slot is quite incompatible with me. Sunday matches, for example, are very hard for me to do.

  • Codechef has similar issues. It is in third place so it has much less priority than Codeforces - If I only had time available for one contest and I had to choose between Codeforces and Codechef, I would choose Codechef - so it is less likely I'll participate. And the times Codechef picks ARE TERRIBLE FOR ME! Like I mentioned, Sundays are a big no for me, so that means no Cook-Offs. Solving the 15-day monthly challenge sounds like a Herculean task considering I possibly already spend 15 days a month writing editorials :/. I am also not a big fan of them using ACM rules for cook-offs (see below).
  • ACM / ICPC / their online judges. Did I ever mention I hate the ACM contests? They have these terrible format/scoring rules where problems letters are randomly-assigned. The time you take in the first problem is weighted excessively in the final rank; Groups are mandatory; The problems are low quality (They have been improving a bit , but not enough); etc... :/
  • Hackercup: I would need to join facebook to participate in this one. And that's not going to happen.

Q: How do you solve $HOMEWORK_PROBLEM?

I am not a forum, I barely have time to solve problems that I get paid to explain. Could you avoid contacting me personally to solve a specific problem? Unless it is a problem I wrote and no one has submitted a correct solution in the practice room / spoj... Well, I am not saying that you should *never* send these problems, but don't expect me to be very helpful. Many times I do try to solve whatever problem people sent me, but remember: I am not that good at solving problems! Many times I have no idea how to solve the random problems you send me :/