Wednesday, April 16, 2014

SRM 616 recap and python review

It was a fine match. Actually the first SRM problem set by tourist. Well, it is also the sixth match of 10 matches in my python challenge. Having finished 60% of it, I have a new perspective. At this moment, I think that when the challenge ends, in SRM 621, I might actually still use python. At least in problems where I am sure it can shine.

Div1 250 - The one with alarm clocks

Match starts, open the easy problem in division 1. Meet a simulation problem. problem statement. In this problem, you start with a value `S` at time 0. Every second, S is increased by `D`, then the alarms scheduled to run at that second ring. Each alarm that rings will decrease `S` by some special value specific to the alarm. We have the alarms periods (Between 1 and 10), first ring time (between 1 and its period) and volume (the value by which `S` is decreased). Each alarm rings at times: start, start + period, start + 2*period, etc. There are up to 50 alarms. When `S` becomes 0 or less , you wake up immediately. Find the maximum starting value of `S` such that you will eventually wake up. If no matter what value of `S` you pick, you wake up eventually, return -1.

We can combine the cycles together by taking the LCM of all their periods. Because the periods are not more than 10, the combined cycle length is not more than 2520 = 2*2*2*3*3*5*7. You can answer the question by just simulating the first LCM seconds. For more info, read the editorial.

This is a problem that does okay in python. Python can really help keep the code small in comparison to the other languages. I took some advantage of it during the match and got a good score. But in reality, my original code could be improved in many ways. For example, did you know there is a builtin GCD function?.

from fractions import gcd
def lcm(x,y):
    return x * (y / gcd(x,y))

class WakingUp:
 def maxSleepiness(self, period, start, volume, D):
    sleep = 0
    n = 0
    for i in xrange(1, reduce(lcm, period) +1 ):           # LCM is at most 2520
        sleep += D
        for (p,s,v) in zip(period, start, volume):
            if (i - s) % p == 0:
                sleep -= v
        n = min(n, sleep)
    return -1 if (sleep < 0) else -n

Many details, for example using reduce to get the lcm of all the periods. The zip function to iterate through the alarms without having to do much indexing...

A subtle one: To check if an alarm should ring at time i, we use (i - s) % p == 0. In c++ you would need : i % p == s % p. The reason is that python's % operator is not broken for negative numbers. Very handy.

For a comparison, try the fastest c++ submission. The algorithm is exactly the same, but you can see that it has to "weigh" more, it has many sources of complications. Having to use a loop to calculate the LCM of all periods, or using the for j = 0; j < n; j++ to get the special alarms.

Div1 500

After finishing div1 250, I moved to the next problem. It just seemed ... impossible. It turns out that it is just a mathy problem. I am not very good with them.

I skipped this problem during the match. Later I , of course, had to learn it to solve the editorial. Once I understood I implemented a python solution. It actually took a while to code, and I am not sure if it is really using all of python's potential. You can read the code at the editorial.

Div1 1000

I explained this problem in detail before: here

Just like there are topcoder problems that are not my style at all. There are some which are really my style. This was the rare div1 hard I actually solved the instant after reading it. Because I just like grid-bitmasks dp problems a lot.

There is a world of difference between having a correct idea for a problem and actually implementing it. I noticed the `O(n^5)` idea , while correct and possibly able to pass in c++, would fail utterly to run in time in python.

So let's dissect why the following python solution (closest I got to good performance) have much more issues passing than the c++ equivalent you can see in the editorial I linked.

sys.setrecursionlimit(10**6)

class ThreeLLogo:
 def countWays(self, grid):
    n = len(grid)
    m = len(grid[0])
    
    def allValidMasks():
        yield 0
        for i in xrange(m-1):
            yield (1 << i) 
            for j in xrange(i+1, m-1):
                yield (1 << i) | (1 << j)
                for k in xrange(j+1, m-1):
                    yield (1 << i) | (1 << j) | (1 << k)
    
    MAX_MASKS = 4090
    validMasks = [0] * MAX_MASKS
    getMaskId = dict()
    t = 0
    for mask in allValidMasks():
        validMasks[t] = mask
        getMaskId[mask] = t
        t += 1
     
    @MemoizedFunction( [n+1,m+1,4,MAX_MASKS,2] )
    def f(x,y, need, maskid, horz):
        mask = validMasks[maskid]
        if x == n:
            # base case
            return 1 if ( (mask == 0) and (need == 0) ) else 0
        elif y == m:
            return 0 if (horz == 1) else f(x + 1,0, need, maskid, 0)
        elif (1<<y) & mask:
            if (horz == 1) or (grid[x][y] == '#'):
                #must continue the left "L" shape, but also must continue the above one
                return 0
            else:
                # 1) Finish the L shape above.
                nmask = mask ^ (1 << y)
                res  = f(x,y+1, need, getMaskId[nmask], 1)
                # 2) Continue the L shape vertically
                res += f(x,y+1, need, maskid, 0)
                return res
        elif horz == 1:
            return 0 if (grid[x][y] == '#') else sum(f(x,y+1, need, maskid, s) for s in (0,1)) 
        else:
            # 1) Do nothing
            res = f(x,y+1, need, maskid, 0)
            # 2) Do something, Create a new L shape
            if (grid[x][y] == '.') and (need > 0) and (y < m - 1) :
                nmask = mask | (1 << y)
                res += f(x,y+1, need - 1, getMaskId[nmask], 0)
            return res
     
    return f(0,0, 3, 0, 0) 


# decorator that adds memoization to a function using a list of bounds
# ej: @MemoizedFunction( [10, 10, 10] )
def MemoizedFunction(bounds):
    def deco(f):
        mem = [-1] * reduce( lambda a,b: a*b, bounds )
        def g(*args):
            p = 1
            k = 0
            for i in xrange(len(args)):
                (a,b) = (args[i], bounds[i])
                k = a * p + k
                p = p * b  
            if mem[k] == -1:
                mem[k] = f(*args)
            return mem[k]
        return g
    return deco

Well, things that were already mentioned, the default recursion limit is a bit large. Python is just slower than c++, mostly because its code does much more than c++. It has this dynamism that makes it so powerful but also slower. In an i5 computer, it takes around 10 seconds, while the goal is 3 seconds in TC's servers... Another large issue was the memory usage. Even with the clever decorator that makes all of the dp table contents exist within a single list, we would need around 768 MB to make all fit in memory.

During the match, I began to code a solution to the problem anyway. Because of the python challenge, I couldn't use c++. I still wanted to solve it because it was fun and also, if the python solution works in small inputs, it would mean I at least understood the problem, which would be helpful later and it would be a moral victory :). So I tried to do it, but I had bugs. Story of my life, I couldn't really finish the code until after I spent a couple of days working in the rest of the editorial..

My first code (once fixed) was actually more compact and slow than the one above:

import sys
sys.setrecursionlimit(10**6)
class ThreeLLogo:
 def countWays(self, grid):
    n = len(grid)
    m = len(grid[0])
    @MemoizedFunction
    def f(x,y, need, arms, horz):
        if x == n:
            # base case
            return 1 if ( (arms == () ) and (need == 0) ) else 0
        elif y == m:
            return 0 if (horz == 2) else f(x + 1,0, need, arms, 0)
        elif y in arms:
            if (horz == 2) or (grid[x][y] == '#'):
                #must continue the left "L" shape, but also must continue the above one
                return 0
            else:
                # 1) Finish the L shape above.
                narms = tuple( z for z in arms if z != y )
                res  = f(x,y+1, need, narms, 2)
                # 2) Continue the L shape vertically
                res += f(x,y+1, need, arms, 0)
                return res
        elif horz == 2:
            return 0 if (grid[x][y] == '#') else f(x,y+1, need, arms, 1)
        else:
            # 1) Do nothing
            res = f(x,y+1, need, arms, 0)
            # 2) Do something
            if (grid[x][y] == '.'):
                # 2.1) Create a new L shape
                if need > 0:
                    narms = tuple( sorted(arms + (y,)) )
                    res += f(x,y+1, need - 1, narms, 0)
                # 2.2) Continue the left L shape:
                if horz == 1:
                    res += f(x,y+1, need, arms, 1)
            return res
     
    return f(0,0, 3, (), 0) 

However, this helped me discover something cool. After writing this code, it was easy to optimize it and later translate to c++. Behaving as a human python-c++ compiler. When the python challenge ends, I might try this more often. Code a prototype in python making sure the logic works, then translate to c++ :).

The other ones

Then I wrote the editorial. Besides of the div1 hard that appears to be impossible in python, the remaining problems work just fine.

A notable example: This 3-liner solution for div2 500 which could be made into a one-liner.

class ColorfulCoinsEasy:
 def isPossible(self, values):
    limits = sorted([ (values[i+1]/values[i]) for i in range(len(values) -1) ])   ## get list of values[i+1] / values[i], sort it
    condition = all(i+1 < limi for (i,limi) in enumerate(limits))                 ## Required: for each i: i + 1 < limits[i].
    return "Possible" if condition else "Impossible"

No comments :