## Tuesday, March 04, 2014

### SRM 611: My first python SRM

The other day, I made the foolish decision to use Python exclusive in SRMs 611 to 620. I suspected it will be bad for my rating at least initially. I am not as experienced with python as I am with c++ and there is always a good chance the admins will include a problem in the set that is unsolvable in python. On the other hand, maybe the 250 has a quick to think solution and then python's conciseness would help me score some points.

## Div1 250: The one with LCM

You are given two sets A, B, the LCM set of a set S is the set of all numbers L such that L is equal to the LCM of the numbers in a subset of S. Given A and B find if their LCM sets are equal. Numbers in each set are between 2 and 10^9, inclusive.

Tough luck, it didn't appear this was going to be a quick problem to code, and at first I had no idea what to do.

Eventually, I thought of something that appears to be a solution. Given a set X, we can get rid of any number that can be created by taking the LCM of other numbers in the set. After doing this, set X becomes the minimal set such that its LCM set will be equal to X. If we find the minimal sets of A and B, they will be equal if and only if the LCM sets are equal.

Implementing was another issue. How can you tell if a number can be made from the other numbers in the set? Well, you would need to process the numbers in decreasing order. I focused on the prime decomposition of each number, if the exponent of at least one prime number is larger than the maximum ever exponent in the numbers smaller than it, then this number shouldn't be removed. I needed to code Eratosthenes sieve in python for the first time, it probably isn't so well coded.

I made a mistake, the first implementation would ignore the smallest number in each of the sets, so it thought {7} had the same LCM set as {8}. The slow submission plus the resubmit made me score incredibly low. I am not 100% sure about this approach, but if I fail system tests it will not make a difference, my ranking in the match will be very low either way.

import math

class LCMSet:
def equal(self, A, B):

# greatest common divisor
def gcd(x,y):
while y != 0:
(x,y) = (y, x%y)
return x

# least common multiple:
def lcm(x,y):
return x * y / gcd(x,y)

# Eratosthenes siege returns an iterator of prime numbers, we don't need
# primes larger than sqrt(max( max(A), max(B) ))
def sieve():
t = int( math.sqrt(max(max(A), max(B))) + 1 )
compos = [ False ] * (t + 1)
for i in xrange(2, t + 1):
if not compos[i]:
yield i
j = i + i
while j <= t:
compos[j] = True
j += i

# make a list of prime numbers
primes = list( sieve() )

# prime decomposition:
def primeDecomp(x):
d = dict()
for p in primes:
c = 0
while x % p == 0:
x /= p
c += 1
if c > 0:
d[p] = c
if x > 1:
d[x] = 1
return d

# "compress" a set, find the minimal set that has a LCM set equal to X
def compress(X):
decomp = [ primeDecomp(x) for x in reversed(X) ]
n = len(X)
c = []
for i in xrange(n - 1):
good = False
# for each p^r in the prime decomposition:
for (p , r) in decomp[i].items():
if r > max( 0 if p not in decomp[j] else decomp[j][p] for j in xrange(i+1,n) ):
good = True
if good:
c.append( decomp[i] )
#smallest number is always there. Missing this line caused the resubmit
c.append( decomp[n-1] )
return c

return ['Equal','Not equal'][compress(A) != compress(B)]


## Div1 500: The one with spanning tree

You are given a complete graph of up to 20 vertices with weighted edges (It is given in the form of points in a map that you have to connect using a tree topology). Return the minimum standard deviation of the edges of a spanning tree.

Easy, right? Well, the 20 constraint hints me that some heavy dp or brute force is needed. I remembered a match in which we had some sort of brute force/dp through all spanning trees, but I didn't remember enough. Also, the constraint wasn't a good sign for whether the problem is solvable in python or not :/

## The end

Plenty of very different solutions for 250. I have no idea if I will pass, I expect many system test fails.

What about python? Well, this wasn't a great match for comparisons. I think I would have taken very long to code a solution in c++ too. I am currently sleep deprived, I stayed up till very late last night trying to finish something that barely passes as an editorial for SRM 610. I guess there is a good reason I am doing this for 10 SRMs and not just a couple.

## Update: Failed system tests

What a bad day. Div1 250 approach fails the following test case: [{168, 7476, 84, 890, 10, 4, 40, 3560, 37380, 8, 89, 20}, {84, 4, 20, 7476, 3560, 89, 712, 420, 74760, 1780, 14952, 8, 356, 10}], it is a wrong answer so this is likely a mistake with the algorithm and not python's fault.

## Sunday, March 02, 2014

### We all hate the ads

Let's try something. If you like this blog and find it useful, you can use gittip to motivate me. Whenever the weekly tip goes to a value greater than or equal to 3.00 USD, I'll hide the ads in the blog once I notice. If it drops below 3.00, we get ads again. If it never goes anywhere close to 3.00 or even above 0.00, it is all right too, don't worry, don't have much expectations.

## Thursday, February 27, 2014

### Python in TopCoder 2

The other day someone linked my topcoder preliminary python vs. c++ comparison in reddit (Could you please not *ever* link my blog to reddit? ) Anyway, it reminds me that now that it has been a while since Python was enabled in topcoder SRMs it is a good time to talk more about it.

It appears that the number of python topcoders stayed as low or lower than when it was released. I would love to have some actual stats, but anecdotally it appears to me C# still has more users than python (And C#'s user share is very low in comparison to Java). Which is a shame, really.

The following is a screenshot of the top 50 in the last SRM. Yellow scores are c++, Green scores are Java, Orange scores are python and White scores are C#:

So yep, it is not statistics, but this is a scenario that tends to repeat itself. The python solutions stand out because they are rare. I was expecting a bit more.

So why is this?

### A contest system designed around Java

Let's go to the point, the contest system is just not python-friendly. This all begins because it has been clearly designed with Java solutions in mind. The obvious indication of this is that it requires you to wrap the solution in a class. In Java, this is natural. In C++, it is slightly annoying, but useful because it allows you easy access to the heap. But in python, it is really just a bunch of verbosity that is not helpful at all.

Using classes in python is a bit obnoxious and it gets in the way unless you really need the OOP syntactic sugar. In python, the more declarative/functional approaches tend to be better. This means using functions. But if you put your functions inside the class, that means having to use self. syntax and include self in all your function. Ultimately it means having the self argument in places where it is not really needed. I just wish there was an alternative to this.

What if we wanted to include actual functional languages? You won't be able to force a class in Haskell. And it would be quite cool to have Haskell or even LISP as an option. Actually, I wish we had more languages in general.

But the real aspect of TopCoder that reveals that it is all about Java is the execution time limit. There is a 2.0 time limit for each test case. When problem setting, writers have to provide a Java solution that runs in less than 1.0 seconds. So time constraints are guaranteed for Java and work great for c++ because it is faster. But any language slower than Java is going to have issues and it shows.

The list of Topcoder problems that are impossible in python keeps growing. Even if a problem is possible, if it is not one of the easy problem categories, it will be usually be much harder to come up with a passing python solution than a passing c++ or Java solution. Problems like EllysCandyGame where you can easily pass in c++ with brute force, but need an exponential-time dynamic programming to pass in python.

### Poor information

How many people out there know that python is allowed in TC? It seems to still be news to plenty of people. Maybe we need to tell more people about this?

### If only

There are a couple of problems that I'd like to mention out of how frustratingly bad they are for python.

EllysPairing an egregious example. This problem has a 16MB memory limit. This means python cannot even run without failing the test case. There is no way around this issue, python simply cannot do anything about it.

What is ironic is that the solution for the problem would be very nicely implemented in python. The problem uses a list of integers that are generated randomly . The memory limit makes you unable to save the list of integers in a list/array/etc. The algorithm that solves it is linear-time majority vote which requires you to visit the sequence twice. Since you cannot save the numbers in a list, this means that you would need to generate the list twice. This makes it very easy to have messy code that repeats the random generation process in two code places. But in python, this wouldn't be a problem, because it has yield.

class EllysPairing:
def getMax(self, M, count, first, mult, add):
# generate all the random numbers, return as generator
def names():
for i in xrange( len(count) ):
x = first[i]
for j in xrange(count[i]):
yield x
x = (x * mult[i] + add[i]) & (M - 1);
p = -1
q = 0
for x in names(): #use them
if (p != x):
q -= 1
if (q <= 0):
p = x
q = 1
else:
q += 1

n = sum(count)
c = sum( x == p for x in names() ) #use them again

if (c > n / 2):
return n / 2 - (c - n/2 - n%2)
else:
return n / 2


The important part is the for x in names() , that is all that is needed to iterate through all the random numbers in the list without rewriting the code. The top c++ solution by aceofdiamonds repeats the random number generation code.

This cute solution cannot run. Even without the cruel 16 MB limit, it will be too slow in comparison to the Java solution and would time out, taking around 6 seconds in TC's servers. That's too bad.

I also want to talk about a more recent problem: MiningGoldEasy. This problem has a simple dynamic programming / memoization solution. Just notice that if , in each move you limit yourself to rows and columns that contain at least one "event", it should be optimal, so we have an O(|event|^3) solution if we remember the current number of step, and the current position. In python, I can code this:

class MiningGoldEasy:
def GetMaximumGold(self, N, M, event_i, event_j):
@MemoizedFunction
def f(x,y,s):
if s == len(event_i) :
return 0
else:
bx = max( f(nx,y,s+1) for nx in event_i )
by = max( f(x,ny,s+1) for ny in event_j )
return max(bx,by) + N + M - abs(event_i[s] - x) - abs(event_j[s] - y)

return max( max( f(x,y,0) for x in event_i) for y in event_j )


The MemoizedFunction is a decorator. A syntactic sugar that means I do f = MemoizedFunction(f) after declaring the function. MemoizedFunction() is a function that takes a function and instead returns a function that behaves exactly the same but uses memoization. It is clever:

def MemoizedFunction(f):
mem = dict()
def g(*args):
if args not in mem:
mem[args] = f(*args)
return mem[args]
return g


Since 90% of Topcoder problems are dynamic programming ones, this idea to automatically turn a function into memoization is very cool. Just take a look to the GetMaximumGold() code up there, it is almost pseudocode. A c++ equivalent looks like this:

int GetMaximumGold(int N, int M, vector<int> event_i, vector<int> event_j)
{
int t = event_i.size();

int mem[50][50][51];
memset(mem, -1, sizeof(mem));
function<int(int,int,int)> f = [&](int a, int b, int s) {
int & res = mem[a][b][s];
if (res == -1) {
if (s == t) {
res = 0;
} else {
int x = N + M
- abs( event_i[a] - event_i[s])
- abs( event_j[b] - event_j[s]);
res = 0;
// change row:
for (int na = 0; na < event_i.size(); na++) {
res = std::max(res, x + f(na,b,s + 1) );
}
// change column:
for (int nb = 0; nb < event_j.size(); nb++) {
res = std::max(res, x + f(a,nb,s + 1) );
}
}
}
return res;
};
int mx = 0;
for (int a = 0; a < event_i.size(); a++) {
for (int b = 0; b < event_j.size(); b++) {
mx = std::max(mx, f(a,b,0) );
}
}
return mx;
}


Unfortunately, the python solution above fails system tests whilst the c++ one passes in 0.1 seconds worst-case. Of course, it is likely that the python solution can be optimized to pass. Probably not using dictionaries or adding crops. But this means going lower level and thus losing the advantages python provides.