Tuesday, January 29, 2013

SRM 568: Last second

This was hard. The division 1 medium problem was solved by very few people. It seems that problems are becoming harder.

Div1 500: The one with a matrix

A matrix with numbers in it is called nice if every perfect matching between rows and columns gives the same sum of matched cells. In other words, in an NxN matrix pick N cells such that no two cells share a row or a column; every possible set of picked cells must yield the same sum of the values written in the picked cells.

You are given an NxN matrix, where each cell contains a digit from 0 to 9 or a - character. Count the number of ways to fill each - in the matrix with a non-negative (possibly greater than 9) integer such that the matrix is nice. It is guaranteed that the number of ways is finite.

... finite. It seems that for the number of ways to be finite, there should initially be a set of N cells that do not share rows/columns and that do not contain any - character. The sum of this set will be the same sum shared by all the sets, and it can be at most 50*9 = 450. That's about the only progress I could make.

In order for there to be only one possible sum, the min-cost matching and the max-cost matching of the matrix must be equal. That's it. There are probably many other ideas and the solution is hopefully simple, but I have no idea.
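To at least make the definition concrete, here is a minimal brute-force check of the "nice" property for a small, completely filled matrix. It simply tries every perfect matching with std::next_permutation; isNice is a name I made up and this is obviously not a solution to the counting problem:

#include <algorithm>
#include <numeric>
#include <vector>
using namespace std;

// A matrix is "nice" if every permutation (perfect matching of rows to columns)
// yields the same sum of picked cells. Exponential, only sensible for tiny N.
bool isNice(const vector< vector<int> > &m)
{
    int n = m.size();
    vector<int> perm(n);
    iota(perm.begin(), perm.end(), 0); // perm[r] = column matched to row r
    long long firstSum = -1;           // cell values are non-negative, so -1 is a safe sentinel
    do {
        long long sum = 0;
        for (int r = 0; r < n; r++) {
            sum += m[r][ perm[r] ];
        }
        if (firstSum == -1) {
            firstSum = sum;
        } else if (sum != firstSum) {
            return false;
        }
    } while (next_permutation(perm.begin(), perm.end()));
    return true;
}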

I tried many things during the match. Kept thinking. Opened div1 1000. Eventually I spent my time until the last 10 minutes of the coding phase watching the division summary. It was clear that I was not the only one having difficulty with this problem. Petr was the only coder to have a solution for a looong while. Some blue coders were solving the hard problem, but I doubted those solutions would stay.

Div1 250: The one with colored balls

Less than 10 minutes left. Faithful to my suicidal strategy, I open the 250. It seemed like most people would only have a score in this problem, and thus speed would be quite crucial. From the scores it seemed that taking less than 10 minutes was doable, so I put in all my effort.

There are N boxes, each with at least 1 ball and at most 1000000 of each color (red, green or blue). A single step consists of taking a ball from a box and putting it in another box. What is the minimum number of steps needed to make it so no box contains balls of more than one color?

The first idea you can have is, for each box, to pick the color that has the maximum number of balls and move the other balls to other boxes. The problem is: where to? You can assume that you just move those balls to boxes that will end up with that color only, so you can add up, for each box, the number of balls of the colors that are not the maximum in the box. There is an issue with this logic. For example, what happens when all boxes have more red balls than the other colors? Then we cannot move the blue and green balls from each box to somewhere else, because there is no such box.

We can overcome this issue by first fixing three boxes: one that will forcefully have only red balls, another for green balls and another for blue balls. We first need to take the balls of the wrong color out of each of these boxes. Then we can continue with the remaining boxes. For each box i, we can take the green balls to the fixed green box, the red balls to the red box, and the blue balls to the blue box. But we can leave one of the groups of balls, the maximum for each box, unmoved. This minimizes the number of balls moved. Repeat for every possible assignment of 3 boxes to red, green and blue, and the minimum over all assignments is the answer.

I finished this when there were 6 minutes left; too bad there was some mistake in the code stopping me from passing the example cases. I had no idea what it was. Until the timer reached 20 seconds, and I suddenly noticed! In one part I had res += instead of c +=. I corrected the bug, compiled and submitted without testing (20 seconds left!). I was able to submit it almost at the last second! And it passed.

int minOperations(vector <int> red, vector <int> green, vector <int> blue)
{
    pair<int,int> res = make_pair(2100000000,-1); //std::pair is good like this
    int n = red.size();
    for (int fr = 0; fr < n; fr++) {
        for (int fg = 0; fg < n; fg++) if (fr != fg) {
            for (int fb = 0; fb < n; fb++) if (fb != fg && fb != fr) {
                int moves = 0;
                // fr is a box that is forced to be red
                // fg is a box that is forced to be green
                // fb is a box that is forced to be blue

                // Move balls of the wrong color from each of the picked boxes:
                moves += green[fr] + blue[fr];
                moves += green[fb] + red[fb];
                moves += blue[fg] + red[fg];
                // For each of the other boxes, leave the maximum color and move
                // the rest to their assigned boxes:
                for (int i=0; i<n; i++) if (i != fr && i != fg && i != fb) {
                    int x = std::max(red[i], std::max(blue[i] , green[i]) );
                    moves += red[i] + blue[i] + green[i] - x;
                }

                res = std::min(res, make_pair(moves,moves) );
            }
        }
    }
    return res.second;
}

This is exactly what I wanted to happen ever since I started using that strategy. If I had opened the problems in the [Correct] order, I would probably have spent more time on the 250 instead of less than 10 minutes (the pressure of the timer reaching 0:00 made me wake up and notice the issue), so I would have gotten a far worse score. Then I would not even have had a chance to try the medium problem.

Comments?

Saturday, January 26, 2013

HTTPS Everywhere ruleset for topcoder (Updated)

At this moment topcoder is my only source of revenue. Things like how, even if you manually use https, it tends to redirect you to a non-https version through some links, really bother and worry me.

Today I thought of Firefox's HTTPS Everywhere plugin (also available for Chrome) and tried to figure out how to make it do its magic on TopCoder.com and its subdomains. Surprise! It works:

<ruleset name="TopCoder">
<target host="topcoder.com" />
<target host="*.topcoder.com" />

<!-- https://forums.topcoder.com does not exist, they should remove that link...-->
<rule from="^https?://forums\.topcoder\.com/" to="https://apps.topcoder.com/forums/"/>
<!-- Will the earth still spin if there were no regular expressions? -->
<rule from="^http://([^.]+\.)?topcoder\.com/" to="https://$1topcoder.com/"/>
</ruleset>

For info about how to install it, take a look at this guide: https://www.eff.org/https-everywhere/rulesets.

Update: I knew it was way too easy; there were bugs with the forums subdomain. This new version should fix it.

Thursday, January 24, 2013

Editorial for TopCoder SRM 567

We have changed the way I release topcoder editorials: now I first submit it to the wiki, then the admins publish it. This saves the time that was spent between me sending the editorial by email and an admin manually submitting it to the wiki, uploading pictures and fixing the links himself... So editorials shall be available a bit faster than before. In fact, you can access http://apps.topcoder.com/wiki/display/tc/SRM+567 right now to read the editorial without waiting for [[rng_58]]'s post.

Regarding the editorial: I think the explanation for div1 1000 really sucks. It is a really challenging problem to explain. The solution is seemingly easy and even trivial (just memoization). But trying to prove that it works, or to explain how to come up with the idea, really broke my brain.

Since the last two editorials had huge delays, I decided not to wait until I could find the perfect explanation for this problem and instead release something that is mostly ok (but sucks). At least this time we did not wait a week for the editorial...

Monday, January 21, 2013

SRM 567: Not blue yet

I finished it on Saturday morning and it was published on Sunday afternoon. It was a long week. So of course, there was little to no time before SRM 567. Another editorial I will have to write... BTW, stay tuned, because I might publish the editorial in parts and will announce in this blog whenever the easier parts of the editorial are ready.

I was wondering if I would get blue rating today. I knew that if I got another 0.00 score using my new self-destructive "strategy" I would most likely become blue today. But I was lucky that today was one of those matches in which I get to solve the division 1 medium...

Div1 medium: The one with strings.

Two players play a game using a set of strings/words. At the beginning of the game, player 1 picks one of the strings and can permute it. He also gets to permute the alphabet. Then player 2 picks another string, which he can also permute. If player 1's picked string is lexicographically earlier (USING THE PERMUTATION OF THE ALPHABET) than player 2's, then player 1 wins. Given an array of strings, return the indexes of the strings which, if picked by player 1, will allow player 1 to win, assuming he picks the best possible alphabet and player 2 plays optimally.

It is all about the way player 1 permutes the alphabet. Afterwards, the optimal move for both players is to sort their strings using the picked alphabet. Player 2 has the advantage that he can basically sort all of the strings (other than player 1's) and then pick the string that is best when sorted.

Let us say the alphabet is picked and a[i] is the i-th letter of the alphabet. Then the string that is lexicographically earliest will be the one that, at the earliest index i such that count(string1, a[i]) != count(string2, a[i]), has the character a[i] the largest number of times. If both strings have each letter an equal number of times, then player 1 cannot win.
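Just to make that observation concrete, here is a tiny sketch (a hypothetical helper, not the match code) of how two strings compare once the alphabet permutation is fixed and both strings are sorted according to it. It only needs the letter counts:

#include <string>
#include <vector>
using namespace std;

// alpha is the chosen alphabet permutation (alpha[0] is the "smallest" letter).
// count1[c] and count2[c] hold how many times letter ('a' + c) appears in each string.
bool firstIsEarlier(const string &alpha, const vector<int> &count1, const vector<int> &count2)
{
    for (int i = 0; i < (int)alpha.size(); i++) {
        int a = alpha[i] - 'a';
        if (count1[a] != count2[a]) {
            // At the first letter (in alphabet order) where the counts differ,
            // the string that has MORE copies of it is lexicographically earlier:
            return count1[a] > count2[a];
        }
    }
    return false; // equal multisets of letters: player 1 cannot beat this string
}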

At first it seemed like it is always better for player 1 to give smaller indexes in the permutation to the letters that appear a larger number of times in his string. This approach is wrong. It also felt wrong; I had the feeling it would not be so easy. Yet it passes the example cases, which made me realize that wrong solutions would be easy to find.

A challenge case for the wrong approach: {"xvvvvv","vvvvxx"}. When player 1 is given the second string, it would be unwise for him to pick alphabet "vx..."; he would instead win with "xv...".

My quickest idea was: for each index i, pick a letter that appears the smallest number of times in string 1 and still does not cause a failing condition (one of the remaining strings being lexicographically earlier). It is actually not necessary to pick the letter that appears the smallest number of times. After picking a letter, delete/disable the strings that are already lexicographically larger than player 1's string. This simple approach is enough to pass. In fact, I think that many people failed the problem because of attempts to make the algorithm stronger.

Proof? When a character appears more times in another string than in player 1's string, then we cannot use this character until we use another character that removes that other string (i.e., makes it lexicographically larger than ours). If it is possible to remove the string, then we will eventually do it. If the character appears an equal number of times, then using it or not will not really help us remove the string.

vector <int> getWinningStrings(vector <string> S)
{
    int n = S.size();
    int counts[n][26];
    for (int i=0; i<n; i++) {
        fill(counts[i], counts[i]+26, 0);
        for (int j=0; j<S[i].size(); j++) {
            counts[i][ S[i][j] - 'a' ]++;
        }
    }
    vector<int> res;
    for (int i=0; i<n; i++) {
        vector<bool> usedAlpha(26, false);
        vector<bool> active(n, true);
        active[i] = false;
        int t = 0;
        while (t < S[i].size()) {
            int pick = -1;
            // Find a good letter we can use:
            for (int a=25; a>=0; a--) if (!usedAlpha[a]) {
                bool good = true;
                for (int j=0; j<n; j++) if (active[j]) {
                    good &= (counts[i][a] >= counts[j][a]);
                }
                if (good) {
                    t += counts[i][a];
                    usedAlpha[a] = true;
                    pick = a;
                    for (int j=0; j<n; j++) {
                        // disable any string that has this letter
                        // a smaller number of times:
                        active[j] = active[j] && (counts[i][a] == counts[j][a]);
                    }
                    break;
                }
            }
            if (pick == -1) {
                break;
            }
        }
        bool won = true;
        for (int j=0; j<n; j++) {
            won &= !active[j];
        }
        if (won) {
            res.push_back(i);
        }
    }
    return res;
}

Div1 250: The one with square roots

I had free time after solving div1 500. But I did not open div1 250. The rules are that I must only open div1 250 when there are less than 10 minutes of coding phase left, no matter if I solved the other problems. I tried to solve div1 1000. I also had breakfast and those things, making sure not to open this problem.

Given N, M: count the number of pairs (a,b) such that (1 <= a <= N <= 77777), (1 <= b <= M <= 77777) and (sqrt(a) + sqrt(b))^2 is an integer.

By expanding the square we have that (sqrt(a) + sqrt(b))^2 = a + b + 2*sqrt(a*b), which will be an integer if and only if a*b is a perfect square (the square root of an integer is either an integer or irrational). So we need to count the number of pairs (a,b) in which a*b is a perfect square.

I made the mistake of wasting my only 10 minutes on what seems to be the wrong direction for this problem: I tried to first generate the perfect squares between 1 and 77777*77777 and then count the pairs (a,b). I could not really optimize it much in 10 minutes.

The solution that I like takes a as a starting point instead. You pick a number a from 1 to N and then try to count the number of valid values of b that would make a*b a perfect square.

The prime factorization of a perfect square is such that every exponent is even (every prime factor appears an even number of times). It is easy to factorize a, and you will find that some prime factors appear an odd number of times in a. b must be a multiple of the product of these prime factors for it to work. So let us say that b = c * (product of the necessary prime factors).

The total product is a * c * (necessary prime factors). We know that a*(necessary prime factors) is a perfect square. Thus c will also have to be a perfect square. In fact, it is easy to try all numbers 1, 2, ..., sqrt(M / (necessary prime factors)), square each of them to find c and thus generate b.

int countPairs(int N, int M)
{
    int result = 0;
    for (int a=1; a<=N; a++) {
        int x = a;
        int req = 1;
        // Find the prime factors of a (and their exponents)
        // in O(sqrt(a)):
        for (int y = 2; y <= x/y; y++) {
            if (x % y == 0) {
                int e = 0;
                do {
                    x /= y;
                    e++;
                } while (x % y == 0);
                // req will hold the product of the prime factors
                // that appear an odd number of times:
                if (e % 2 == 1) {
                    req *= y;
                }
            }
        }
        // If x > 1, then x is another prime factor of a, that appears once (odd)
        if (x > 1) {
            req *= x;
        }
        // b = req * (a perfect square)
        // try each y (square roots of the perfect squares)
        for (int y = 1; y <= M/y && y*y <= M/req; y++) {
            result++;
        }
    }
    return result;
}

A complexity analysis: extracting the prime factors for each a is O(N*sqrt(N)) overall. Note that result++ runs exactly as many times as the final result for (N,M). The number of operations is O(N*sqrt(N) + f(N,M)), where f(N,M) is the number of pairs that make perfect squares. We can test that f(77777,77777) is around 50000. So all is fine. The approach needs 0.1 seconds in the worst case.
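If you want to double-check the fast solution against something, a naive O(N*M) reference (only usable for small N and M; countPairsBrute is just a made-up name) would look like this:

#include <cmath>

// Naive reference: directly test whether a*b is a perfect square for every pair.
int countPairsBrute(int N, int M)
{
    int result = 0;
    for (long long a = 1; a <= N; a++) {
        for (long long b = 1; b <= M; b++) {
            long long p = a * b;
            long long s = (long long) sqrtl( (long double) p );
            // adjust for possible floating point error around the integer square root:
            while (s * s > p) s--;
            while ((s + 1) * (s + 1) <= p) s++;
            if (s * s == p) result++;
        }
    }
    return result;
}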

Thursday, January 17, 2013

In case you are wondering why that editorial was taking so long

A few TopCoder matches ago, I was given the responsibility of being the editorial writer for all TopCoder matches. Too bad that my last two experiences did not go very well in regards to the time these editorials are taking.

In SRM 565, the main cause of issues was the division 1 hard. I think it was one of the hardest problems of 2012, if not the most confusing one. SRM 565 finished on a Thursday, and I intended to get the editorial ready by the next Friday. Because, mind you, it was a very complicated time of the year and I had to do a ton of shopping on Saturday and stuff...

The plan failed. I could not really make much progress on understanding the hard problem until Sunday. Sunday was a day in which I was forced, beyond my will, to spend a good chunk of time away from a computer with extended family. Meh. But the good thing is that the empty time was a good chance to think about the problem. Points, trees, paths, distances. I got something that *looked* like an idea. I could only begin to test it that Sunday night, though. If you head to the editorial, the part that I got correct that day was the section [Variation with a fixed center].

When I implemented it, I noticed some issues with the other variation, and I tried hard to fix them. I actually got it correct. But I was failing one of the big, random test cases. I spent the next two days working on it. Those two days happened to be Christmas Eve and Christmas :).

It turns out that I underestimated the "2 roots" version greatly. The recursive part from the editorial was not in the first draft of the solution. During Christmas I finally found the challenge case that would defeat my first, wrong solution, and then it was ok.

SRM 566 now turns out to have a similar fate. But at least I have finally solved the div1 hard.

Div1 hard: The one with penguins and polygons

There are numPosts (at most 222) posts scattered around the border of a circle at equidistant angles. There are also penguins (at most 50), possibly of different colors, at some fixed positions. The objective is to create closed polygons (using the posts as vertices) in such a way that: a) every polygon contains at least one penguin; b) no two segments cross; c) all the penguins of each color are inside one of the polygons (a polygon may contain various colors). Return the total number of ways to create these polygons, modulo 100007. Two ways to create polygons are different if a segment is present in one and not in the other.

The solution is not that hard conceptually. It is typical dynamic programming. But it is hard to think of it, understand it and also cover all possible corner cases. When you solve a counting problem using dynamic programming and there is a circle involved, you need to be careful not to count the same thing twice...

I will not explain it fully right now because that belongs to the editorial. But the solution requires two main ideas:

  • There are O(numPosts^2) possible segments. If a segment separates two penguins of the same color (one penguin is on one side of the segment and the other is on the other side), then this segment is certainly invalid. In fact, it does not matter what you do with the polygons: as long as you only use segments that are not invalid, any polygon that contains a penguin of some color will contain all the penguins of that color. Once you identify the invalid segments (see the small geometric sketch after this list), the problem becomes about counting the ways to create polygons such that each polygon uses only valid segments and contains at least one penguin.
  • To solve the new problem, think about how to create a function f(l,r) that counts the total number of ways to have polygons using only the posts with indexes between l and r, inclusive. An easy way to avoid issues is to pick the index of the smallest vertex that will be used and the index of the vertex it will be connected to. This already creates a subproblem of the form (nr + 1, r) (more details in the editorial). But then you have to solve g(l,r): the total number of ways to create a polygon that starts at l and finishes at r. While you create this polygon, you create new subproblems f(l,r).
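Here is the geometric sketch mentioned in the first idea. I am assuming a concrete representation that is not necessarily the one from the statement: post p sits at angle 2*pi*p/numPosts on a unit circle, and the penguins come as coordinate arrays px, py plus a color array (findInvalidSegments is a made-up name). A chord is invalid if two penguins of the same color end up strictly on opposite sides of it. The quadruple loop is not optimized, but it is fine for these limits:

#include <cmath>
#include <vector>
using namespace std;

vector< vector<bool> > findInvalidSegments(int numPosts, const vector<double> &px,
                                           const vector<double> &py, const vector<int> &color)
{
    const double PI = acos(-1.0);
    int m = px.size();
    vector< vector<bool> > invalid(numPosts, vector<bool>(numPosts, false));
    for (int a = 0; a < numPosts; a++) {
        double ax = cos(2 * PI * a / numPosts), ay = sin(2 * PI * a / numPosts);
        for (int b = a + 1; b < numPosts; b++) {
            double bx = cos(2 * PI * b / numPosts), by = sin(2 * PI * b / numPosts);
            for (int i = 0; i < m; i++) {
                for (int j = i + 1; j < m; j++) {
                    if (color[i] != color[j]) continue;
                    // The sign of the cross product tells on which side of chord (a,b)
                    // each penguin lies; opposite signs mean the chord separates them:
                    double si = (bx - ax) * (py[i] - ay) - (by - ay) * (px[i] - ax);
                    double sj = (bx - ax) * (py[j] - ay) - (by - ay) * (px[j] - ax);
                    if (si * sj < 0) {
                        invalid[a][b] = invalid[b][a] = true;
                    }
                }
            }
        }
    }
    return invalid;
}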

I was barely able to come up with this solution yesterday, and I had to spend a lot of time debugging. Until I ran out of ideas about how to fix the solution. I figured out many corner cases in the process, but I reached a moment in which the only example case I was failing was a big one that I could not debug.

I spent all this time, from last afternoon until now, trying to come up with the reason. One last logic flaw in my polygon counting? One last small bug, maybe? The more time I spent, the more convinced I was that the logic was correct. So it had to be a small issue. I could not help but notice that I was only failing large cases, and could not find a small case that was wrong.

- Maybe it is overflow?

-"NAH" I foolishly replied myself. When I first read the statement, I noticed that the modulo value is 100007 instead of 1000000007. Such a small value gave me the idea that the admins decided to use a small value because that would allow you to use only integer operations (for product) and since the processors in TopCoder are 32 bits, this would speed up things. Maybe they intend the solution to be a bit tight.

I spent more hours looking for issues. Until just now, when I finally noticed: 100006 * 100006 can still overflow a 32-bit int! All this time I skipped the overflow possibility because of a lame assumption. Right now I have no idea why they picked a smaller value than usual for the modulo. If it was 10007, then indeed we could have avoided 64-bit operations. But 100007 is quite a mystery.
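For reference, the usual fix is to do the product in 64 bits and only then reduce (mulmod is just a hypothetical helper name):

const int MOD = 100007;

// 100006 * 100006 is about 10^10, which does not fit in a 32-bit int,
// so do the multiplication in 64 bits and reduce afterwards:
int mulmod(int a, int b)
{
    return (int)( (long long)a * b % MOD );
}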

It is a bit discouraging that after all this time I still manage to get problems wrong because of overflow. Such is life, I guess.

Div1 hards cause too many editorial delays

Looking back, most of the time I spend making an editorial is spent trying to solve the div1 hard. Mind you, I am only a yellow (soon to be blue) coder, and some of these problems are not even solved by guys like Petr. If the best coders cannot solve a problem, then it is likely I will need quite some time to do it.

We are working on new ways to deliver editorials that might allow the editorial for the easier problems to appear in less than a day (at best), so that at least those problems get delivered faster. Let us see how well this goes in the next SRM.

Saturday, January 12, 2013

TopCoder SRM 566: The road to blue

If you recall, I am going with a "suicidal" strategy: never open the division 1 easy before there are less than 10 minutes left in the coding phase. The results so far are not very surprising. I have the feeling that 250s are lately becoming longer. This is my second 0.00 in a row.

Div1 1000: The one with penguins, a circle and colors

Skipped this one quickly after reading the statement. Seems to be the typical div1 hard that I won't solve until after a couple of hours and with admin help. This one will stop me from finishing the editorial quickly...

Div1 500: The one with a mountain and cities

You have numCities cities in total, all on the border of a circle. You start at city 0. You make numSteps steps in total. In step 1, you move 1 city clockwise or counterclockwise. In step 2, two cities clockwise or counterclockwise... In step k, k cities clockwise or counterclockwise... Count the number of different sequences of numSteps steps such that you end back in city 0, modulo 1000000007.

At first I thought I was going to solve this problem. It seemed like a standard matrix multiplication one. Then I noticed that there can be at most 350 cities. That changes things... Let us explain the matrix approach though.

When the step number k is too big, the clockwise or counterclockwise distance between your current city and the new one is (k % numCities). This means that every numCities steps, we basically repeat the same thing. Over and over again.

Let us describe a transition matrix for step 1: For each i, there is one way to move from vertex i to vertex (i+1)%numCities and one way to move from vertex i to vertex (i-1+numCities)%numCities. This works for step k: One way to move from vertex i to (i+k)%numCities and one way to move from i to (i-k+numCities)%numCities. Note that if the two vertices are the same, there is only one way in total (Two paths are different not because of the direction you move but because of the city you move to).

If we multiply together the first numCities matrices, we get a matrix that represents numCities moves. You can raise this one to the x-th power, where x = numSteps / numCities. Then you can multiply the result by the product of the first (numSteps % numCities) matrices. This will be the total transition matrix.
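A sketch of the construction described above, in its slow form with a plain O(n^3) multiplication (Mat, MOD, stepMatrix and mul are names I am making up for illustration):

#include <vector>
using namespace std;

typedef vector< vector<long long> > Mat;
const long long MOD = 1000000007;

// Transition matrix for step number k (k already reduced modulo numCities):
// from city i you can move k cities clockwise or k cities counterclockwise,
// and there is only one way if both destinations coincide.
Mat stepMatrix(int k, int numCities)
{
    Mat M(numCities, vector<long long>(numCities, 0));
    for (int i = 0; i < numCities; i++) {
        int cw  = (i + k) % numCities;
        int ccw = ((i - k) % numCities + numCities) % numCities;
        M[i][cw] += 1;
        if (ccw != cw) {
            M[i][ccw] += 1;
        }
    }
    return M;
}

// Plain O(n^3) matrix multiplication modulo MOD:
Mat mul(const Mat &A, const Mat &B)
{
    int n = A.size();
    Mat C(n, vector<long long>(n, 0));
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < n; k++) {
            if (A[i][k] == 0) continue;
            for (int j = 0; j < n; j++) {
                C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % MOD;
            }
        }
    }
    return C;
}

You would multiply the step matrices for steps 1 through numCities together with mul, raise that product to the (numSteps / numCities)-th power, and finish by multiplying in the first (numSteps % numCities) step matrices.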

The problem is that there are numCities matrix multiplications we have to do just to build that product. This yields an O(n^4) algorithm and it is too slow for n = 350. In fact, n seems to be crafted with this in mind.

The solution is that, if you take a look at these matrices, you will see that each element inside the matrix is at most 1, and each row contains at most 2 ones. If you tweak the multiplication formula a bit, you can actually do each of these multiplications in O(n^2) instead of O(n^3). This is very nice and clever...
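The tweak could look like this: instead of a generic multiplication, apply one step matrix directly to the accumulated matrix, using the fact that each row of a step matrix has at most two ones (this reuses the made-up Mat and MOD definitions from the previous sketch, and k is assumed to be already reduced modulo the number of cities):

// (Acc * Step_k)[i][j] only receives contributions from Acc[i][c] with
// j = (c + k) % n or j = (c - k + n) % n, so the whole product is O(n^2):
Mat applyStep(const Mat &Acc, int k)
{
    int n = Acc.size();
    Mat R(n, vector<long long>(n, 0));
    for (int i = 0; i < n; i++) {
        for (int c = 0; c < n; c++) {
            if (Acc[i][c] == 0) continue;
            int cw  = (c + k) % n;
            int ccw = ((c - k) % n + n) % n;
            R[i][cw] = (R[i][cw] + Acc[i][c]) % MOD;
            if (ccw != cw) {
                R[i][ccw] = (R[i][ccw] + Acc[i][c]) % MOD;
            }
        }
    }
    return R;
}

Applying this numCities times brings building the "numCities moves" matrix down to O(n^3) total instead of O(n^4).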

When I finished coding that optimization I began to have issues. Apparently, my classic matrix exponentiation library has some seg fault bug that happens when the matrices are larger than usual. Then I had the other issue that it was timing out; apparently the number of steps is so large that even O(log(numSteps) * numCities^3) is too much.

I thought of the solution for the time-out issue when it was too late to code it. It turns out that the product of the numCities matrices tends to generate very dull matrices, containing the same numbers repeated over and over again and following some pattern. There is also symmetry in the matrix, so the same row is repeated over and over again, just shifted a bit in each new row... I think that you can turn the matrix idea into a vector idea and turn the whole thing into O(n^2).

Div1 250: The one with points and crossing

With less than 10 minutes left I open this problem, and it turns out to be a bit tough to read (long).

Ok, so you have at most 50 points. And some segment definitions that connect pairs of the points. You do not know the actual positions of the points. You can remove 0 or more segments. Count the total number of subsets of segments such that it is GUARANTEED that no two segments will ever cross, regardless of the position of the points.

After thinking for a while and looking at the statement images you can make some conclusions:

  • The empty set (no segments at all) is always possible and valid.
  • Whenever the subset has more than one connected component of segments, it is possible for segments to cross (take two separated segments and make them cross).
  • When there is only one segment, it is valid.
  • Star patterns are valid: one point acts as a center and is connected to some other points, by one segment each.

We can count the patterns (single segment, star) that can be made using only the input segments. It is easy to count the segments. The stars are a bit harder: pick a center point, then count the number of points that are initially connected to that center; each combination (use the binomial coefficient) of 2 or more points connected to it makes a star pattern.

I coded that stuff and had a few debugging issues (again). Some seg faults and stuff that I could only fix after the end of the coding phase. That is when I noticed that the logic was failing the last test case...

It turns out that I forgot yet another valid pattern: a single polygon of exactly three connected points (a triangle, a cycle of 3, etc). It is impossible to make this pattern cross either. So you also count the number of triples in which each pair is connected. This fixes the issue, you pass system tests, and all is good.

long long countDesigns(int numCheckpoints, vector <int> checkpoint1, vector <int> checkpoint2)
{
    int n = numCheckpoints;

    // make an adjacency matrix. The original input format is 1-indexed, which is a bit hazardous:
    bool con[n][n];
    memset(con, 0, sizeof(con));
    for (int i=0; i<checkpoint1.size(); i++) {
        con[ checkpoint1[i] - 1 ][ checkpoint2[i] - 1 ] = true;
        con[ checkpoint2[i] - 1 ][ checkpoint1[i] - 1 ] = true;
    }
    // Pascal's triangle to generate the binomials we will need:
    long long C[50][50];
    memset(C, 0, sizeof(C));
    for (int i=0; i<50; i++) {
        C[i][0] = 1;
        for (int j=1; j<=i; j++) {
            C[i][j] = C[i-1][j] + C[i-1][j-1];
        }
    }

    long long res = 1 + checkpoint1.size(); // the empty set + the number of segments,
                                            // all are valid

    // Count stars with center i:
    for (int i=0; i<n; i++) {
        int x = 0;
        for (int j=0; j<n; j++) {
            if (con[i][j]) {
                x++;
            }
        }
        for (int y = 2; y <= x; y++) {
            res += C[x][y];
        }
    }
    // Count triangles:
    for (int i=0; i<n; i++) {
        for (int j=i+1; j<n; j++) {
            for (int k=j+1; k<n; k++) {
                if (con[i][j] && con[j][k] && con[i][k]) {
                    res++;
                }
            }
        }
    }
    return res;
}

Conclusion

The problems were good and interesting. My rating will suffer. Any comments?