Tuesday, December 28, 2010

SRM 492, comments and explanation for div1 250

Before this match, I noticed how it was actually plausible I would reach 2100+ rating this time, and get to see a red sector in my rating graph. But it did not happen.

Div1 250 - TimeTravellingGardener

Once I noticed this problem, I knew it was going to take me a long time.

The idea is actually simple: imagine that at least two of the trees will stay in their positions. Then we can find two such trees and generate a straight line from them. Then we can iterate through all the other trees and see if their tops are already on the line, or else verify that their tops can be reduced so that they touch the line. Finally, pick the pair of trees that requires the least amount of cuts (time traveling).

But for that, we need to ask: is a case possible in which fewer than two trees stay intact? As a matter of fact, it is, and I only found out about it after coding the two-trees solution; example 1) actually has a case in which only one tree is left intact.

There is good news though: if we assume that exactly one tree will stay intact, then we can change all of the other trees' heights. The simplest thing to do in this case is to set those trees' heights to be equal to the tree that will stay intact. In fact, we can just cut all the trees so that their heights become equal to the minimum, and then we will have a straight line that goes through their tops. Since every sequence of heights has a minimum, we can use n-1 as the result in case it is not possible to pick two trees that form a valid straight line (previous step).
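As a sanity check on this fallback (a sketch of mine, not the contest code): cut every tree down to the minimum height; only the trees strictly above the minimum need to travel, and since at least one tree sits at the minimum, that is at most n-1 trees.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Number of cuts needed by the "cut everything to the minimum" fallback.
// Trees already at the minimum height stay intact.
int fallbackCuts(const std::vector<int>& heights) {
    int mn = *std::min_element(heights.begin(), heights.end());
    int cuts = 0;
    for (int h : heights)
        if (h > mn) cuts++;
    return cuts;
}
```

For heights {4, 1, 3, 1} only the trees of heights 4 and 3 must be cut, so the fallback needs 2 cuts, never more than n-1.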

Back to that previous step, the checks needed for this iteration are tricky. Imagine trees i and j were picked to be part of the straight line, and we want to check if tree k belongs to that line. Assume that i < j (without loss of generality). Also assume we have arrays x[] and y[] that hold the top points' (x,y) coordinates for each tree (x would be the position of the tree and y its height). Then we can say that (dx = x[j]-x[i]) and (dy = y[j]-y[i]). We can say that the slope is dy/dx. Then what about (tx = x[k]-x[i])? Well, for the point to belong to the straight line, (ty = y[k]-y[i]) must be equal to tx*(dy/dx)...

Then we have an equation:
ty = tx*(dy/dx)

It is always advisable not to use floating point numbers when they are not necessary; many people actually failed system tests because of precision errors. We can rearrange the last equation to:

ty * dx = tx * dy

Then we do not need floating point numbers for that. If (ty * dx == tx * dy) is true then it is not necessary to cut the tree. Otherwise it is necessary to cut the tree. There are still two things to consider, and this is the part in which I had the most trouble: a) It is not possible to "cut" a tree so that its height becomes larger than its original value. And b) It is not possible to "cut" a tree so that its height becomes negative.


For the first condition, note that it translates into: y[k] >= y[i] + tx*(dy/dx). Because tx is the difference x[k]-x[i], the term tx*(dy/dx) is the required difference between the tree's new height and y[i], so y[i] + tx*(dy/dx) is the height tree k must be cut down to.

For the second condition, note that it similarly translates into: 0 <= y[i] + tx*(dy/dx).

We can get rid of floating point calculations by multiplying both inequalities by dx; note that we have ensured dx is positive (since j > i), so the directions of the inequalities do not change.

y[k]*dx >= y[i]*dx + tx*dy
(y[k]-y[i])*dx >= tx*dy
ty*dx >= tx*dy

0 <= y[i]*dx + tx*dy

If those two previous conditions are true, then it is possible to cut the tree to match the straight line, else it is not possible, so there is no solution in which both trees i and j stay intact.
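These checks can be collected into a small helper (a sketch of mine; the names are not from the contest code). Given the line through the tops of trees i and j (dx = x[j]-x[i] > 0, dy = y[j]-y[i]) and a third tree k (tx = x[k]-x[i], ty = y[k]-y[i], with yi = y[i]), it classifies tree k using integers only:

```cpp
#include <cassert>

// Returns 0 if tree k's top is already on the line,
//         1 if it can be cut down to reach the line,
//        -1 if matching the line is impossible (would require growing
//           the tree or giving it a negative height).
int classifyTree(int dx, int dy, int tx, int ty, int yi) {
    if (ty * dx == tx * dy) return 0;        // already on the line
    if (ty * dx > tx * dy &&                 // required height <= current
        yi * dx + tx * dy >= 0)              // required height >= 0
        return 1;
    return -1;
}
```

For example, with the line through tops (0,3) and (2,5) we get dx=2, dy=2; a tree at x=1 with height 4 lies exactly on the line, while one with height 10 can be cut down to it, and one with height 2 would need to grow.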

Adding things up, we can code a solution...


#include <vector>
#include <algorithm>
using namespace std;

int determineUsage(vector<int> distance, vector<int> y)
{
    int n = y.size();
    if (n <= 2) {
        return 0;
    }
    // Prepare x[]; note that we just renamed the height array to y[].
    vector<int> x(n);
    x[0] = 0;
    for (int i = 1; i < n; i++) {
        x[i] = x[i-1] + distance[i-1];
    }

    int res = n - 1;
    // It is always possible to leave at least one tree intact: just
    // pick the minimum height tree, and make the other trees' heights
    // match it.

    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            // Pick two trees i and j, j > i:
            int r = n - 2; // r is the number of trees we send back in time...
            bool can = true;
            for (int k = 0; k < n; k++) {
                if (k != i && k != j) {
                    int dx = x[j] - x[i];
                    int dy = y[j] - y[i];
                    int ty = y[k] - y[i];
                    int tx = x[k] - x[i];
                    // The conditions we found...
                    if (ty*dx == dy*tx) {
                        r--;
                    } else if ( (ty*dx < tx*dy) || (y[i]*dx + tx*dy < 0) ) {
                        can = false;
                    }
                }
            }
            if (can) {
                res = std::min(res, r);
            }
        }
    }
    return res;
}


During the match, I noticed most of the aspects needed for the solution early, except the second condition (that trees must not end up with negative height after the time travel), which cost me a long time of debugging.

Once I submitted the 250, I barely had time to look at the 550 problem. It seemed very interesting, but it is too bad the 250 was such a time sink; I just didn't have enough time to think it through.

I ended up losing plenty of rating, but at least the loss wasn't larger than the amount of rating I won in the previous match.

Sunday, December 19, 2010

Member SRM 491 - Div1 600, PrefixTree

At first it was easy to be misguided by the examples and think that maybe sorting each of the words and then building a trie would work, but that is not the case. One of the examples actually fails with this solution.

Anyway, once I noticed that was not going to work, I just took a look at the constraints. There can be at most 16 words in the input, which suggested to me that the intended solution is likely exponential.

I think the main trick is to notice the relation between intersection and the optimal solution. For example, with two words "aaaxy" and "yaaz", there is an intersection "aay" (order does not matter). It should be easy to see that the optimal trie will be formed as long as we make sure that the intersection is a prefix of both words: for example, "aayxa", "aayz". Note that the order inside the intersection does not matter and neither do the suffixes; for example "ayaax" and "ayaz" will also give optimal prefix trees.
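The intersection here is a multiset intersection: each letter appears as many times as the smaller of its two counts. A quick sketch (my own helper, not from the contest code) computes its size with per-letter counts:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Size of the multiset intersection of two lowercase words:
// each letter contributes the minimum of its counts in a and b.
int intersectionSize(const std::string& a, const std::string& b) {
    int ca[26] = {0}, cb[26] = {0};
    for (char c : a) ca[c - 'a']++;
    for (char c : b) cb[c - 'a']++;
    int s = 0;
    for (int i = 0; i < 26; i++) s += std::min(ca[i], cb[i]);
    return s;
}
```

For "aaaxy" (a:3, x:1, y:1) and "yaaz" (a:2, y:1, z:1) the minimum counts are a:2 and y:1, giving the intersection "aay" of size 3.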

From the previous paragraph, we can assume that if we had plenty of words in a trie, and the trie was optimal, then the total intersection of all the words must be a common prefix between them. (Note that such a condition is necessary but not sufficient). So, let us for example say that we are merging two different optimal tries, each generated from a different list of words. For all words in list A the common prefix will be equal to their intersection, and the same is true for list B. We want to merge both tries such that the final trie will be optimal. For this, we would need all of the words in both A and B to have a common prefix equal to the intersection of all words in A and B. This is actually possible, because if A1,A2,...,An were the elements of A, and B1,B2,...,Bm the elements of B, then we can assume that the common prefix among A will consist of the characters in (A1 ∩ A2 ∩ ... ∩ An) and the common prefix among elements of B will consist of the characters in (B1 ∩ B2 ∩ ... ∩ Bm). The total intersection is: (A1 ∩ A2 ∩ ... ∩ An ∩ B1 ∩ B2 ∩ ... ∩ Bm). The key here is that this total intersection is a subset of (A1 ∩ A2 ∩ ... ∩ An) and also a subset of (B1 ∩ B2 ∩ ... ∩ Bm); this is a known fact of set theory: ( (A ∩ B) ⊆ B ).

Because for optimal tries we want the prefixes to consist of the characters in the intersection, but the order does not matter, we can assume that the prefixes of A1,A2,...,An all start with the characters that are in (A1 ∩ A2 ∩ ... ∩ An ∩ B1 ∩ B2 ∩ ... ∩ Bm). We can do something similar for the elements in B. Then we can say that all the elements in both A and B will have a prefix in common that consists of the characters in that total intersection. This common prefix will become a common path in both tries. So, we can combine the tries to form a single one. The size of the merged trie will be: (size of trie A) + (size of trie B) - (number of nodes in common). The number of nodes in common is equal to the size of (A1 ∩ A2 ∩ ... ∩ An ∩ B1 ∩ B2 ∩ ... ∩ Bm) plus 1 (because every prefix tree has a root node that represents the empty string).
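We can verify the merge formula on the earlier example: "aayxa" alone needs a trie of 6 nodes (5 characters plus the root), "aayz" needs 5, and their intersection has size 3, so the merged trie should have 6 + 5 - (3 + 1) = 7 nodes. A tiny node-counting trie (a sketch of mine, not the contest code) confirms it:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A minimal trie that only counts its nodes. nodes[v] maps a character
// to the index of the corresponding child node; node 0 is the root.
struct Trie {
    std::vector<std::map<char, int>> nodes;
    Trie() : nodes(1) {}
    void insert(const std::string& w) {
        int v = 0;
        for (char c : w) {
            auto it = nodes[v].find(c);
            if (it == nodes[v].end()) {
                nodes.push_back({});            // create the child node
                nodes[v][c] = (int)nodes.size() - 1;
                v = (int)nodes.size() - 1;
            } else {
                v = it->second;
            }
        }
    }
    int size() const { return (int)nodes.size(); }
};
```

Inserting "aayxa" and then "aayz" shares the path a→a→y and branches afterwards: root + 3 shared nodes + 2 for "xa" + 1 for "z" = 7 nodes, matching the formula.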


Did you see that? This means that in order to know the minimum trie size of a trie that combines two tries from lists A and B, we only need the optimal trie size for the words in A, the optimal trie size for the words in B, and the size of the intersection between all words in A and B. So, if we are given a list of words, we just need to somehow split it into two parts A and B, then find the optimal trie sizes for A and B and use that procedure. Correctly guessing which parts A and B to pick is difficult though. But we do not need to find a way to guess it, we can just try all possible pairs A and B that partition the set of words in two parts, and then pick the pair that would yield the minimum size after merging the tries.

The last paragraph implied that we have a recursion. We need a function trieMin() that takes a set of words, and returns its minimum trie size. What it will do is try all subsets A, find the complement B, call trieMin(A) and trieMin(B) to find out the optimal trie sizes of A and B and then pick the minimum total size. You should note that no matter which two complementary sets A and B we pick, the total intersection is always the total intersection among all the words in the set originally given to trieMin.

trieMin will always take a subset of the original words array in the input. That means that if its size was N, then there are at most 2^N subsets this function can take. With N=16, we can just use memoization for this function, so it never evaluates the same call twice. Note that A and B will always be smaller than the argument set, so the recursion is acyclic (which is necessary for the memoization to work).

Inside the trieMin function, we need to iterate through all possible subsets of the given set. If we iterate through all possible subsets of each subset of a set of size N, the total complexity is O(3^N). As a quick proof, try counting the number of steps. Begin by counting the number of steps caused by subsets of size 0. That is: C(N,0) * 2^0 (because there are C(N,0) subsets of size 0, and for each of them you'll need 2^0 steps). For the subsets of size 1, you'll need C(N,1) * 2^1, and we can repeat:


C(N,0) * 2^0 + C(N,1) * 2^1 + ... + C(N,N-1) * 2^(N-1) + C(N,N) * 2^N


Let us just use a little imagination, and add a couple of powers of 1:

C(N,0) * 2^0 * 1^N + C(N,1) * 2^1 * 1^(N-1) + ... + C(N,N-1) * 2^(N-1) * 1^1 + C(N,N) * 2^N * 1^0

If we know anything about the binomial theorem, then we can tell that the previous sum is equal to:

(2 + 1)^N = 3^N.
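This count can be double-checked by brute force for small N; the sketch below (mine, not part of the solution) literally performs the subsets-of-subsets iteration and counts its steps:

```cpp
#include <cassert>

// Count the total steps of iterating over all subsets of every subset
// of an N-element set. The binomial argument says this is exactly 3^N.
long long countSteps(int N) {
    long long steps = 0;
    for (int mask = 0; mask < (1 << N); mask++) {
        for (int sub = mask; ; sub = (sub - 1) & mask) {
            steps++;               // one step per (mask, sub) pair
            if (sub == 0) break;   // 0 is the last subset enumerated
        }
    }
    return steps;
}
```

Each mask with k bits set contributes 2^k subsets, so the total over all masks is the binomial sum above: countSteps(4) is 81 = 3^4.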

Therefore, the complexity of our algorithm is O(3^N), which runs in time for N=16. But we still need to actually implement it. First of all, we need a way to find the size of the intersection of the characters in a list of words; note that words may contain the same character more than once, so we need a multiset-like structure to represent each word and the intersection. Then we need to represent a subset of the list of words and also to iterate through all the subsets of a subset of the list. In both cases, bit operations are very useful. We can represent a set by a bitmask integer such that if the i-th bit is 1, the subset includes element i. In order to iterate through all the subsets A of a set, just use the ( i = (i - 1) & superset ) trick explained in bmerry's tutorial. And to get the complementary subset B, just negate A and intersect it with the original bitmask.
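The subset trick can be sketched in isolation (a hypothetical helper of mine): list every nonempty proper subset of a mask together with its complement inside the mask, which is exactly how the solution below splits a word set into the two parts A and B.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// List every (sub, complement) split of mask into two nonempty parts,
// using the sub = (sub - 1) & mask enumeration trick. Starting from
// (mask - 1) & mask skips the full set itself.
std::vector<std::pair<int, int>> partitions(int mask) {
    std::vector<std::pair<int, int>> out;
    for (int sub = (mask - 1) & mask; sub > 0; sub = (sub - 1) & mask) {
        out.push_back({sub, mask & ~sub});
    }
    return out;
}
```

For mask = 0b101 (words 0 and 2) this yields the two splits (0b100, 0b001) and (0b001, 0b100).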

#include <vector>
#include <string>
#include <algorithm>
#include <numeric>
#include <cstring>
using namespace std;

int n;
// Our set structure is simply a size [26] array that holds the count
// of each character in the word.
int swords[16][26];

// The memoization array...
int mem[1<<16];

// Returns the optimal trie size for the subset of
// words represented by the bits in mask.
int trieMin(int mask) {
    int & res = mem[mask];
    if (res == -1) {
        res = 851; // largest trie size is smaller than 17*50

        // Get the intersection size...
        int inters[26];
        fill(inters, inters+26, 50);
        int t = 0;
        for (int i=0; i<n; i++) {
            if (mask & (1<<i)) {
                for (int j=0; j<26; j++) {
                    inters[j] = std::min(inters[j], swords[i][j]);
                }
                t++;
            }
        }
        // (To intersect two sets, just take the minimum count for
        // each character.) The sum of these counts is the size
        // of the intersection.
        int isize = accumulate(inters, inters+26, 0) + 1;
        // +1 because of the root node, which also belongs to the
        // intersection.

        if (t == 1) {
            // Only one word; its size is the optimal trie size.
            res = isize;
        } else {
            // Try all possible subsets and pair them with their
            // complements. Note the (mask-1)&mask start, so that sub
            // is always a proper subset of mask.
            for (int sub=(mask-1)&mask; sub>0; sub=(sub-1)&mask) {
                int nsub = mask & (~sub);
                int A = trieMin(sub);  // optimal trie size for the subset
                int B = trieMin(nsub); // ... for its complement.

                // The total size when merging these tries is A+B - isize.
                // Keep the minimum one.
                res = std::min(res, A + B - isize);
            }
        }
    }
    return res;
}

int getNumNodes(vector<string> words)
{
    n = words.size();
    // Convert the words to the set format:
    for (int i=0; i<n; i++) {
        fill(swords[i], swords[i]+26, 0);
        for (int j=0; j<(int)words[i].size(); j++) {
            swords[i][words[i][j]-'a']++;
        }
    }
    // Initialize the mem array.
    memset(mem, -1, sizeof(mem));

    // Recurse...
    return trieMin( (1<<n)-1 );
}

Saturday, December 18, 2010

Member SRM 491 commentary and explanation for the div1 easy problem.

FoxMakingDice
The 250, 600, 900 distribution threatened to make this match repeat the pattern of SRM 490. So I knew I had to be fast on this problem, and I tried hard to be... It didn't work.

Ok... so we always have 6 faces on the die. We want to count the number of ways to assign the faces such that the sum of each pair of opposite faces is the same and greater than or equal to K, and the faces are distinct numbers from 1 to N.

Err, wait, how are two dice different exactly? The statement said that they are equal if you can rotate one to become the other. Well, that was a little confusing to me. What really helped was to notice that the first example had N=6, K=7 (a normal 6-face die) and the result was 2.

Why two? Well, there are 3 pairs of numbers that give 7 as a sum, so there is C(3,3)=1 way to pick the pairs; then, according to word of God, there are two ways to assign these pairs to the die. I decided to trust the problem setter on this.

So with a fixed sum, we can count the number t of available pairs that give such a sum; then 2*C(t, 3) is the number of ways to make dice whose opposite faces add up to that sum.

Now we can just iterate from sum=K to 2*N (minimum and maximum sum value possible), and add up the totals.

Finding the number of pairs is easy: we can just iterate starting from x=1. The opposite face must be (sum-x), so we need (sum-x > x) (because otherwise we already counted this pair) and also (1 <= sum-x <= N). We just need to count the total values of x that follow those rules. This takes O(N) steps. The iteration over the sum value takes another O(N) steps, so the total complexity is O(N*N). The solution is then "simple", except for calculating C(t, 3). For some silly reason I used Pascal's triangle, which introduced a silly bug: I had a [2001][3] array when I needed a [2001][4] array. Lame, lame. Instead, it is better to just use t*(t-1)*(t-2) / 6 (that's what you get from manually expanding C(t,3)).

long long theCount(int N, int K)
{
    long long ways = 0;
    // Iterate through all possible face sums:
    for (int s=K; s<=N+N; s++) {
        long long t = 0;
        // Count the number of face pairs that yield s as a sum:
        for (int x=1; (s-x>x) && (x<=N); x++) {
            if (s-x <= N) {
                t++;
            }
        }
        if (t >= 3) {
            // Add C(t,3)*2 to the result; it neatly translates to:
            ways += (t*(t-1)*(t-2)) / 3;
        }
    }
    return ways;
}




PrefixTree
I was wrong; unlike SRM 490, this medium turned out to be approachable without massive implementation issues. It was also, in my opinion, beautiful; I will elaborate later.
Update: Explanation for div1 600


Hard problem (whatever the name)
I had enough time to open it, but was clueless. During the challenge phase I found out it could be solved with min-cost max flow and binary search; that's crazy.

Challenge phase
This should underline neal_wu's awesomeness. Before the challenge phase, I noticed there was a 600-pointer submission by a blue coder. So I rushed to open it quickly at the start of the challenge phase, and just as I finished reading the first three lines, it was already challenged by neal_wu. He later said he quickly noticed the guy wasn't picking all the subsets, so he just gave it a random large challenge case.

Rating and issues
Well, I had a chance to see for a second what my rating would become if this match is rated: 2035 ! That would mean I recovered from the losses of last match. Unfortunately, it appears that it won't be rated due to issues during the challenge phase :(.

Thursday, December 16, 2010

The latest topcoder editorials I've written

Hello all (I wonder if anybody actually reads this blog; I have made a good effort not to post links to it or talk about it in public). I have decided to rebrand this blog. Its name is now Doomed to debugging, and its focus will be programming and computers, but I will try to use it just for stuff related to me learning programming and try as hard as possible to keep opinions and rants out. (I cannot give guarantees though).

I could not help but notice that this blog has had no entries since October. As much as I would like to say I was busy, I really wasn't, though lately, more than usual, I have been writing editorials for TopCoder matches. Let us review all that has happened since October.

TCO Semifinals and wild card rounds.
This was fun. In total, I had to write explanations for 7 problems (the explanations for 2 problems were already done), all of which were of semifinal level (the TCO is a worldwide tournament). I have started to think that the editorial writer is the one that has to do the dirty work. You know, the problem setter has to think of clever problems, the testers have to find flaws in them, and the director has to decide which problems are appropriate and which are not. Who's left? The editorial writer! The guy that has to actually write an explanation of the problems that people will be able to understand...

I'll have to admit, writing the editorial for the semifinals and wild card rounds felt like less work than usual. Usually, when writing the editorial for an SRM, I spend most of the time actually trying to solve the problems, which is not easy at all. It is in fact nigh impossible sometimes. This time, I had quick explanations from the problem writers and also the most helpful bits of text that Petr Mitrichev had in his own blog. Actually, after reading the stuff Petr wrote, I couldn't help but feel like a third wheel. It was not clear to me why it was necessary for me to make much longer versions of what Petr said and turn them into three editorials...

Another thing that made it easier than a usual SRM's editorial was that I did not have to write the match summaries - those paragraphs at the top of each editorial that supposedly explain what happened during the match - they are incredibly hard for me to write.

TCO 2010 Semifinal round 1 (Explanation for the easy problem was written by Ivan Metelsky)
TCO 2010 Semifinal round 2 (Explanation for the medium problem was written by Ivan Metelsky)
TCO 2010 wildcard round

Please note that the editorials I write are subsequently posted to a wiki which every TC member can edit, so anything that resembles correct English probably came from a helpful editor and not me.

The most difficulty I had while writing those editorials was actually with the hard problem in the wildcard round. I was unable to make it run in time in Java, and I ran out of time to work on a faster version. But it is supposed to work well in theory and it implements the correct ideas. The second issue I had was with the Semifinal 2 hard problem; until that day I had had little to no experience with range trees.


SRM editorials
SRM 487
SRM 488
SRM 490

I have broken a record and written three SRM editorials in a row! I have to make a clarification, the reason I get to write editorials so frequently is not exactly because of the score they get in the feedback post that usually accompanies editorials when they are posted. The reason is actually that I am usually available to write editorials when other approved editorial writers are not. Of course, the positive feedback does help and I guess it would be possible for me to lose my approved editorial writer position if I get consistently bad feedback. I must confess I am usually shocked by the good feedback my editorials receive :).

Writing editorials for SRMs is very hard, because you must actually understand the solutions to the problems before being able to explain them, and SRM problems have gotten very hard lately. So I spend most of the time actually attempting to solve the problems; otherwise I have to ask the contest director for help or see if there are useful hints in the forums. Some behind the scenes:

SRM 487: I actually reverse engineered the division 1 hard problem's solution from the source code of the top placed coders. During the match I tried to solve this problem and elaborated on many approaches that were wrong. This match was very enjoyable to me both as a coder and as the editorial writer. Especially the graph coloring problem was just great.

SRM 488: This SRM... I must say that I seriously ran out of time when writing this editorial, because the division 2 and division 1 hard problems had intended solutions I was not able to understand. In the end, things turned out all right, I think.

SRM 490: I hated this SRM while I was participating in it. By the time I opened the 250 it was pretty clear to me that it was going to be yet another mathy problem that would take ages for me to solve, and I already knew the medium problem was worth 550 points, which probably meant it was out of my league. In the end I got a very low score on the 250 problem and had almost no time to solve the 550 problem. I had the right preliminary idea for the 550 but was hours of debugging away from solving it. I lost many rating points thanks to this SRM, and I once again failed to maintain a 2000+ rating for more than one match.

Writing the editorial for SRM 490 actually greatly improved my opinion of it. The 250 actually makes sense once you manage to picture how it works and explain it in text. The 550 was a very hard to implement problem, but it was the kind of problem that rewarded you for thinking before implementing. But the real savior was the 1000 pointer. I actually did not even open it during the match; that was a mistake. The 1000 pointer was a maze problem (I love maze problems), and it was very interesting AND it had something similar to a linear recurrence... It has many elements I love. Though TopCoder's contest system tends to sometimes be excessively punitive of slow submissions.

Oh, and I like this editorial because I managed to finish it even though I had to study for and take a "Group work and mnemonics 5"... err, I mean "Software engineering" final exam during the first 8 hours of the deadline. In the end, things turned out all right.

SRM next Saturday
Member SRM 491 is set for next Saturday. Note that a large group of people are entering holiday and vacation periods, plus Saturday matches tend to be very active. I have the feeling this SRM will have many, many contestants, so I hope the problem set is fun and I also hope I recover my 2000+ rating.

That's how this entry ends. I'll return to debugging err... programming some new features for Xye.