Monday, April 16, 2012

To solve Google Codejam: Hall of Mirrors I

This is probably be the problem I'll remember the most out of those solved this year. (Don't get me wrong, this is not the hardest problem I solved this year nor it is the hardest to implement, but it was cool and I had some breakthroughs while solving it). I initially had the right idea initially (try angles, simulate movement, count correct) but just the right idea is hardly enough to solve this problem. You need to think things througj and it also helps a lot to understand many things about this problem.

The "easy" D-small

Something the official contest analysis does not say well is ... something that is supposedly explained in the statement, but you do not really understand it until trying to solve and debugging the third example case. What does it mean to count the number of images of yourself you can see? In reality, this problem asks you to count the number of starting angles for the trajectory of the light such that they will return to the starting point before traveling a distance of at most D units. Once this is understood the official analysis is useful enough.

This is supposedly easier than D-large, and in a way it is, but it is not very easy either. There were at least 15000 coders in the qualification round, but the number of people who solved this problem is around 500. And we all had 24 hours to do it.

I supposedly solved this problem during the match, and I did. But not exactly well. I discovered that there was a way to find the possible distances through a formula. But it seems that I discovered the formula in an improvised way. It turns out that the formula based on the number of bounces I explained before was wrong. Yet the overall solution is correct. Curious thing eh? Well, the thing is that although the formula is wrong, what it does is return the wrong final position (dx,dy) for each combination of vertical and horizontal bounces (bx,by), but it does so in a way that it returns the correct (dx,dy) for a different (bx,by) and that it still finds all needed pairs (dx,dy). As you can see, this mistake did not stop me from solving D-small, but in a way, it made it harder for me to solve D-large, you will see why.

The actual way to solve this problem is, again by first taking the number of horizontal bounces (bx,by). The official contest analysis does a great job explaining both how to think of the formula and why it works. A horizontal bounce can be seen as folding the whole board. You can choose to fold left or right, and thus when bx is negative, you can see it as folding left instead of right. Same about folding up or down. So, you try all pairs (bx,by) and for each of them, you find the appropriate (dx,dy). It is important not to take a pair (dx,dy) that is a multiple or divisor of a pair already picked, so that you do not pick a same direction more than once.

If you are interested in knowing what my mistake was , this is the correct formula:

             int dx, dy;
             if (bx & 1) {
                 dx = W * (bx- sign(bx) ) + 2*( (bx >= 0)?(W-x0):(-x0));
             } else {
                 dx = bx*W;
             }
             if (by & 1) {
                 dy = H * (by - sign(by) ) + 2*( (by >= 0)?(H-y0):(-y0));
             } else {
                 dy = by * H;
             }

Every pair of horizontal folds, move the point by twice W. If there is an odd amount of folds, then you have to increase (W-x0) or decrease (-x0) , depending if the folds are negative or not. For the y axis, the logic is similar.

What we can learn from D-small to solve D-large

The solution to D-small is important and the analysis made in the official site is also. Not because we will actually use something like that to solve D-large, but because we need to take some conclusions from it.

First conclusion:: When solving D-small, it is best to multiply all values by 2, so that the starting point is at a lattice point instead of fractioned coordinates. It is also useful in the large problem. Thus from now on, we will assume that D is at most 100, not 50.

Second conclusion:From D-small, you can see that the final direction (dx,dy) will always go towards another point that is also at the center of a square. And the distance between (0,0) and (dx,dy) must be less than D. This actually means that (dx,dy) will always be integer and that the integers are between -100 and 100. More so, since the distance has to be at most 100, then we are only interested in lattice (dx,dy) such that their distance to (0,0) is at most 100. This greatly bounds the number of directions we have to try so the idea to just simulate all light directions is viable. We will from now on call (dx,dy) the direction vector. You know, for every dx units we move in the x-axis we will move dy units in the y axis. Note that dx and dy must be co-prime, so because for example, (2*dx, 2*dy) represents the same direction and should not be counted again.

The direction vector determines the slope of our direction, right. But since abs(dx) and abs(dy) are at most 100. This means two different slopes are never very close to each other. This is very useful, this means that precision is not a big deal. When starting this problem, I was very worried about precision, because if we simulate the movement of light until it returns to a point, we need to know when is light intersecting with a point. An error in calculation can make the point not count as intersected. Generally, using floats/dobules to answer a question that has YES or NO as an answer (And the question, [Does going towards this direction makes you return this very point?] is one of such questions) is very risky because floats are not exact numbers. As I mentioned in the previous post, I was about to try fractions from the GMP library so I had arbitrary precision and avoid this risk, but they were too slow for this problem.

You will see that in most of the operations when simulating the trajectory of the light, the most evil floating point operation we do is division. And division will be performed against dx or dy which both are ints with abs(dx) at most 100. This makes division rather stable. Since slopes cannot be very different, then we can have a generous Epsilon when comparing floats/doubles. In fact, the epsilon I ended up using was 1e-4 !, and it works.

Another thing that shall convince you about precision not being a big problem is that the definition from the statement actually asks us not to consider light bounces when they hit the limits of a segment (mirror). There are some exceptions that we need to consider, like those corners that make the light return in the opposite direction and those corners that "destroy" the light. Though.

Part 2 - the algorithm

After we extract all those ideas from the small version, we can try to solve the large one. I decided to split this post at this place. So that it does not get too large.