Wednesday, April 16, 2014

SRM 616 recap and python review

It was a fine match. Actually the first SRM problem set by tourist. Well, it is also the sixth match of 10 matches in my python challenge. Having finished 60% of it, I have a new perspective. At this moment, I think that when the challenge ends, in SRM 621, I might actually still use python. At least in problems where I am sure it can shine.

Div1 250 - The one with alarm clocks

Match starts, open the easy problem in division 1. Meet a simulation problem. problem statement. In this problem, you start with a value `S` at time 0. Every second, S is increased by `D`, then the alarms scheduled to run at that second ring. Each alarm that rings will decrease `S` by some special value specific to the alarm. We have the alarms periods (Between 1 and 10), first ring time (between 1 and its period) and volume (the value by which `S` is decreased). Each alarm rings at times: start, start + period, start + 2*period, etc. There are up to 50 alarms. When `S` becomes 0 or less , you wake up immediately. Find the maximum starting value of `S` such that you will eventually wake up. If no matter what value of `S` you pick, you wake up eventually, return -1.

We can combine the cycles together by taking the LCM of all their periods. Because the periods are not more than 10, the combined cycle length is not more than 2520 = 2*2*2*3*3*5*7. You can answer the question by just simulating the first LCM seconds. For more info, read the editorial.

This is a problem that does okay in python. Python can really help keep the code small in comparison to the other languages. I took some advantage of it during the match and got a good score. But in reality, my original code could be improved in many ways. For example, did you know there is a builtin GCD function?.

from fractions import gcd
def lcm(x,y):
    return x * (y / gcd(x,y))

class WakingUp:
 def maxSleepiness(self, period, start, volume, D):
    sleep = 0
    n = 0
    for i in xrange(1, reduce(lcm, period) +1 ):           # LCM is at most 2520
        sleep += D
        for (p,s,v) in zip(period, start, volume):
            if (i - s) % p == 0:
                sleep -= v
        n = min(n, sleep)
    return -1 if (sleep < 0) else -n

Many details, for example using reduce to get the lcm of all the periods. The zip function to iterate through the alarms without having to do much indexing...

A subtle one: To check if an alarm should ring at time i, we use (i - s) % p == 0. In c++ you would need : i % p == s % p. The reason is that python's % operator is not broken for negative numbers. Very handy.

For a comparison, try the fastest c++ submission. The algorithm is exactly the same, but you can see that it has to "weigh" more, it has many sources of complications. Having to use a loop to calculate the LCM of all periods, or using the for j = 0; j < n; j++ to get the special alarms.

Div1 500

After finishing div1 250, I moved to the next problem. It just seemed ... impossible. It turns out that it is just a mathy problem. I am not very good with them.

I skipped this problem during the match. Later I , of course, had to learn it to solve the editorial. Once I understood I implemented a python solution. It actually took a while to code, and I am not sure if it is really using all of python's potential. You can read the code at the editorial.

Div1 1000

I explained this problem in detail before: here

Just like there are topcoder problems that are not my style at all. There are some which are really my style. This was the rare div1 hard I actually solved the instant after reading it. Because I just like grid-bitmasks dp problems a lot.

There is a world of difference between having a correct idea for a problem and actually implementing it. I noticed the `O(n^5)` idea , while correct and possibly able to pass in c++, would fail utterly to run in time in python.

So let's dissect why the following python solution (closest I got to good performance) have much more issues passing than the c++ equivalent you can see in the editorial I linked.


class ThreeLLogo:
 def countWays(self, grid):
    n = len(grid)
    m = len(grid[0])
    def allValidMasks():
        yield 0
        for i in xrange(m-1):
            yield (1 << i) 
            for j in xrange(i+1, m-1):
                yield (1 << i) | (1 << j)
                for k in xrange(j+1, m-1):
                    yield (1 << i) | (1 << j) | (1 << k)
    MAX_MASKS = 4090
    validMasks = [0] * MAX_MASKS
    getMaskId = dict()
    t = 0
    for mask in allValidMasks():
        validMasks[t] = mask
        getMaskId[mask] = t
        t += 1
    @MemoizedFunction( [n+1,m+1,4,MAX_MASKS,2] )
    def f(x,y, need, maskid, horz):
        mask = validMasks[maskid]
        if x == n:
            # base case
            return 1 if ( (mask == 0) and (need == 0) ) else 0
        elif y == m:
            return 0 if (horz == 1) else f(x + 1,0, need, maskid, 0)
        elif (1<<y) & mask:
            if (horz == 1) or (grid[x][y] == '#'):
                #must continue the left "L" shape, but also must continue the above one
                return 0
                # 1) Finish the L shape above.
                nmask = mask ^ (1 << y)
                res  = f(x,y+1, need, getMaskId[nmask], 1)
                # 2) Continue the L shape vertically
                res += f(x,y+1, need, maskid, 0)
                return res
        elif horz == 1:
            return 0 if (grid[x][y] == '#') else sum(f(x,y+1, need, maskid, s) for s in (0,1)) 
            # 1) Do nothing
            res = f(x,y+1, need, maskid, 0)
            # 2) Do something, Create a new L shape
            if (grid[x][y] == '.') and (need > 0) and (y < m - 1) :
                nmask = mask | (1 << y)
                res += f(x,y+1, need - 1, getMaskId[nmask], 0)
            return res
    return f(0,0, 3, 0, 0) 

# decorator that adds memoization to a function using a list of bounds
# ej: @MemoizedFunction( [10, 10, 10] )
def MemoizedFunction(bounds):
    def deco(f):
        mem = [-1] * reduce( lambda a,b: a*b, bounds )
        def g(*args):
            p = 1
            k = 0
            for i in xrange(len(args)):
                (a,b) = (args[i], bounds[i])
                k = a * p + k
                p = p * b  
            if mem[k] == -1:
                mem[k] = f(*args)
            return mem[k]
        return g
    return deco

Well, things that were already mentioned, the default recursion limit is a bit large. Python is just slower than c++, mostly because its code does much more than c++. It has this dynamism that makes it so powerful but also slower. In an i5 computer, it takes around 10 seconds, while the goal is 3 seconds in TC's servers... Another large issue was the memory usage. Even with the clever decorator that makes all of the dp table contents exist within a single list, we would need around 768 MB to make all fit in memory.

During the match, I began to code a solution to the problem anyway. Because of the python challenge, I couldn't use c++. I still wanted to solve it because it was fun and also, if the python solution works in small inputs, it would mean I at least understood the problem, which would be helpful later and it would be a moral victory :). So I tried to do it, but I had bugs. Story of my life, I couldn't really finish the code until after I spent a couple of days working in the rest of the editorial..

My first code (once fixed) was actually more compact and slow than the one above:

import sys
class ThreeLLogo:
 def countWays(self, grid):
    n = len(grid)
    m = len(grid[0])
    def f(x,y, need, arms, horz):
        if x == n:
            # base case
            return 1 if ( (arms == () ) and (need == 0) ) else 0
        elif y == m:
            return 0 if (horz == 2) else f(x + 1,0, need, arms, 0)
        elif y in arms:
            if (horz == 2) or (grid[x][y] == '#'):
                #must continue the left "L" shape, but also must continue the above one
                return 0
                # 1) Finish the L shape above.
                narms = tuple( z for z in arms if z != y )
                res  = f(x,y+1, need, narms, 2)
                # 2) Continue the L shape vertically
                res += f(x,y+1, need, arms, 0)
                return res
        elif horz == 2:
            return 0 if (grid[x][y] == '#') else f(x,y+1, need, arms, 1)
            # 1) Do nothing
            res = f(x,y+1, need, arms, 0)
            # 2) Do something
            if (grid[x][y] == '.'):
                # 2.1) Create a new L shape
                if need > 0:
                    narms = tuple( sorted(arms + (y,)) )
                    res += f(x,y+1, need - 1, narms, 0)
                # 2.2) Continue the left L shape:
                if horz == 1:
                    res += f(x,y+1, need, arms, 1)
            return res
    return f(0,0, 3, (), 0) 

However, this helped me discover something cool. After writing this code, it was easy to optimize it and later translate to c++. Behaving as a human python-c++ compiler. When the python challenge ends, I might try this more often. Code a prototype in python making sure the logic works, then translate to c++ :).

The other ones

Then I wrote the editorial. Besides of the div1 hard that appears to be impossible in python, the remaining problems work just fine.

A notable example: This 3-liner solution for div2 500 which could be made into a one-liner.

class ColorfulCoinsEasy:
 def isPossible(self, values):
    limits = sorted([ (values[i+1]/values[i]) for i in range(len(values) -1) ])   ## get list of values[i+1] / values[i], sort it
    condition = all(i+1 < limi for (i,limi) in enumerate(limits))                 ## Required: for each i: i + 1 < limits[i].
    return "Possible" if condition else "Impossible"

Monday, April 14, 2014

SRM 616 hard problem: ThreeLLogo

This editorial is Since Topcoder insist on restricting the wiki for non-registered users, I will post the explanation here.

Link to problem statement

The first part of the explanation explains what the problem asks us to count. Just notice that the constraints say there will be at most 30 columns and at most 30 rows:

A change in format

This explanation will interpret the question slightly different from the statement. The purpose is that I find it easier to make explanation graphics under the following interpretation: We have a grid, some cells (white) which allow us to put shapes on them and some cells (black) that don't the "L" shapes occupy multiple cells, at least three.

Fill the grid

There are different approaches to this problem. The one intended by the problem setter is to see it as a dynamic programming problem. Instead of focusing on there being a need for just 3 L shapes and trying ad hoc counting, perhaps it is more straightforward to see this problem as a grid/bitmask dynamic programming one. Much like FourBlocks or FloorBoards

The idea is to fill the grid in row-major order. In each cell, we can leave it empty, or choose to start a L shape or perhaps continue an already started L shape. So given a row `x` and a column `y`, assume that all the previous cells (in row-major order) have already been filled.

In this example, the current cell we are deciding is green and the remaining undecided cells are red. Because there is an unfinished vertical arm of an L shape above this cell, we have two options:

  • Use cell `(x,y)` to continue the vertical arm of the L shape. We move to the next cell:

  • Use cell `(x,y)` to finish it. Then we also move to the next cell:

Assume we did the latter, now we need to decide what to do with the new cell `(x,y)`, this time the cell has a horizontally-adjacent L shape which must be finished. We have two options:

  • Continue this horizontal shape, we will need to add more of this shape in later steps:

    If we make this decision, the next step will have a blocked cell as `(x,y)`, we cannot put any shape on this cell, but due to the decision we are forced to place something, so the number of ways to fill the remaining cells correctly is 0.

  • Finish this horizontal shape:

    Let's pick this decision

After doing this, the new `(x,y)` position is a blocked cell, our only choice is to do nothing and leave this cell empty. Move to process the next cell, you will find a situation similar to the first example, we can choose to continue the above L shape or finish it, this time we'll pick to continue it.

This is another interesting situation, `(x,y)` is an empty cell, there is no vertical or horizontal L-shape that we must continue. We can leave this cell empty:

Or, because there are currently only two L shapes, we can start a whole new one:

Dynamic programming

Once we understand how is it possible to fill the grid in row-major order, we can see that, in order to count the number of valid ways, there are few things to consider:

  • `(x,y)` : The current position in the row-major order. There are `O(nm)` such pairs. We assume that all earlier cells have already been chosen.
  • `k` : The number of L shapes we still haven't created.
  • `V` : A set of the columns currently containing vertical arms of (currently) incomplete L-shapes. This is a subset of the `m` columns, but it can contain at most 3 elements, thus there are `O(m^3)` such sets. More specifically, you shouldn create L shapes at the right-most column, so `V` is only a subset of the first `m-1` columns and with some calculation you can find that there are 4090 such sets in total for `m = 30`.
  • The state of the cell to the left of `(x,y)`, does it contain an unfinished horizontal L arm? We can use the variable `"horz"` for this, if `"horz"` is 1, we must continue the horizontal shape else it is 0.

In other words, the number of ways to fill the cells painted red in the following two examples ( `(x,y)` and the later cells in row-major order ) should be exactly the same:

This allows an `O(nm^4)` approach through memoization of a recursive implementation of the idea to count the ways to fill the grid. But that is not the whole story. We need to be careful: In practice, there are 4 values for `k`, 2 values for `"horz"`, 4090 values for the `V` set and 31*31 values for the `(x,y)` (we need the base case `x = n` and the special case `y = m` to keep the code simple). Also, the result needs to be a 64 bits integer. Thus the memory usage is 4*2*4090*31*31 * 8 bytes ~= 239 MB, quite close to the 256 MB limit and thus we should avoid overhead. Also notice that slight modifications to the logic we used could have made us require more states. There is a 3 seconds limit but we should also avoid having too much of a constant factor. A key aspect of the implementation is how to deal with the `V` set. As is usual, we can use bitmasks, but of course we cannot downright use 30 bit bitmasks to store results in memory, a workaround is to index the bit masks.


vector<string> grid;
int n, m;

long mem[31][31][4][MAX_MASKS][2];
map<int,int> getMaskId;
int validMasks[MAX_MASKS];

long f(int x, int y, int need, int maskid, int horz) {
    long & res = mem[x][y][need][maskid][horz];
    if (res != -1) {
        return res;
    int mask = validMasks[maskid];
    if (x == n) {
        // base case
        res = ( ( (mask == 0) && (need == 0) ) ? 1: 0 );
    } else if (y == m) {
        res = ( (horz == 1) ? 0 : f(x + 1,0, need, maskid, 0) );
    } else if ( (1<<y) & mask ) {
        if ( (horz == 1) || (grid[x][y] == '#') ) {
            // if (horz == 1) must continue the left "L" shape,
            //  but also must continue the above one
            res = 0;
        } else {
            //1) Finish the L shape above.
            int nmask = mask ^ (1 << y);
            res  = f(x,y+1, need, getMaskId[nmask], 1);
            //2) Continue the L shape vertically
            res += f(x,y+1, need, maskid, 0);
    } else if (horz == 1) {
        if (grid[x][y] == '#') {
            res = 0;
        } else {
            res = f(x,y+1, need, maskid, 1) + f(x,y+1, need, maskid, 0);
    } else {
        //1) Do nothing
        res = f(x,y+1, need, maskid, 0);
        //2) Create a new L shape
        if ((grid[x][y] == '.') && (need > 0) && (y + 1 < m) ) {
            //2.1) Create a new L shape
            int nmask = mask | (1 << y);
            res += f(x,y+1, need - 1, getMaskId[nmask], 0);
    return res;

long countWays(vector<string> grid)
    this->grid = grid;
    n = grid.size(), m = grid[0].size();
    // generate all bitmasks with <= 3 turned on bits:
    int t = 0;
    auto addMask = [&](int mask) {  // don't mind the lambda syntax, it
        getMaskId[mask] = t;       // simplifies having to repeat this code
        validMasks[t++] = mask;    // for each number of 1 bits.
    for (int i = 0; i + 1 < m; i++) {
        addMask( 1 << i );
        for (int j = i + 1; j + 1 < m; j++) {
            addMask( (1 << i) | (1 << j) );
            for (int k = j + 1; k + 1 < m; k++) {
                addMask( (1 << i) | (1 << j) | (1 << k) );
    cout << t << " masks found for " << m << endl; 
    // run the recursive solution:
    memset(mem, -1, sizeof(mem));
    return f(0,0, 3, 0, 0);

Friday, April 04, 2014

SRM 615: Recursive, recursive

Already reached the half of this python experiment. This time , python hasn't really helped me and it wasn't an obstacle either. Quite mundane, really.

Div1 250: The one with ameba

First of all, this is a 7:00 AM match so I had a bit of issues getting brain to work at first. I didn't really understand the statement very well initially. Then the problem itself is quite special, ad hoc and cool. So it took a while. Even so, I was surprised when it turned out that tons of people had a much better score than I.

The problem (after tons of translation) is as follows:

An amoeba has size `s` , a integer. There is a list `X` of amounts of gel it met in chronologically order, so first it met `X[0]` gel, then `X[1]` gel, etc. `X` has at most 200 elements. Whenever the amoeba meets a gel with size exactly equal to its current size `s`. The ameoba is forced to eat the gel and thus double its size. `s' = 2s`. We don't know the initial size `s_0`, but we wonder how many integers `u` exist such that the final size of the ameoba cannot be equal to `u`.

All integers that do not belong to `X` are possible final sizes. For example, take a integer `a` that is not in `X` and make it the starting size. Since none of the gels the amoeba will meet is exactly `a`, the amoeba will never double its size and the final size is `a`.

So we should only care about the integers that are already in `X`. For each of those integers, answer: "Can the final size be equal to `X[i]`?".

Given a integer `a`, how do you know if it is a possible final size? Focus on the last integer in the sequence: `X[n-1]`. There are two possibilities in which the final size is `a`:

  • The size was already equal to `a` before the amoeba met gel `X[n-1]` and the amoeba didn't eat it. This requires two conditions to be true:
    • `a != X[n-1]`, because the amoeba must not eat it.
    • It is possible for the amoeba to reach size `a` after meeting the first `n-1` gels.
  • The size of the amoeba became `a` after meeting `X[n-1]`:
    • This means that `2X[n-1] = a`.
    • And also `X[i]` should be a possible size after meeting the first `n-1` gels.

So we have a recursive idea. The issue is that `n = 200`, so let's get clever. First find the set of impossible integers for the first 1 elements (easy), then use that set to build the set of impossible integers for the first 2 elements, then the first 3 elements and so and so. E.g: If you know the set of impossible elements for the first `n-1` elements, it is easy to know if `a` or `X[i]` are impossible or not to have after meeting the first `n-1` elements. This yields an `O(n^2log(n))` solution if you use a logarithmic set or `O(n^2)` if you use a hash table.

class AmebaDiv1:
 def count(self, X):
    impossible = { X[0] }
    for i in range(1, len(X)):
        impossible2 = set()
        # which are impossible for X[:i+1] ?
        for j in xrange(0,i+1):
            a = X[j]
            # is a possible?
            # a) ignore X[i],
            poss = False
            if X[i] != a and a not in impossible:
                poss = True
            # b) use X[i] to double
            if 2*X[i] == a and X[i] not in impossible:
                poss = True
            if not poss:
        impossible = impossible2
    return len(impossible)

So I said python didn't help that much. It is a very procedural algorithm and well, I think that if I did it in c++ it would take the same amount of lines. There are possibly much better python implementations.

The rest

Tried the other problems, not very thrilled about having to write editorial because div1 550 seems hard and div1 950 is incredibly complex in what it asks for. Currently waiting for the challenge phase to end. Blog post will be posted automatically once it does so.

While watching the challenge phase, I learned a new solution and the reason everyone had such fast scores. Remember that either `a` is possible in the first `n-1` or `X[i]` is possible. It follows that the only starting integers you should try and simulate are those already in `X[i]`, if you test each of those integers as the starting one, you can easily find which `X[i]` are possible. The solution is then to simply do that.