Sunday, March 11, 2012

Codeforces: VK Cup round 1: Problem C: Abracadabra

This problem haunted me all day and I kept solving it inside my head without really wanting to. At the end, the solution I thought of all day was sort of incomplete, but a little dirty trick can make it pass. Later I saw some other codes and finally noticed the hard reality...

The divide and conquer
First of all, let us say that we care about the string present after a number of steps equal to step (30 for the string we want to solve). We can tell the size of the string (t = 2^step - 1). We can know the position of them middle character m = t/2 + 1. Please note that due to how the string is constructed, the middle character is unique - this character does not appear in any other position of the string at this number of steps.

In fact, that's really all we need to know how to use divide and conquer in this problem. Let us say that the result greatest common substring contains this middle character, then it is forcefully the overlap between the two intervals. If the best substring is not the overlap between the two intervals, then it won't contain the middle character.

In fact, we can split each of the intervals into two maximal parts (possibly empty) that do not contain the middle character. And each of these four parts will match an interval in the string for step # (step - 1). (The positions of the intervals that appear after the middle, shall get the middle position subtracted).

The result is thus one of five possibilities: It can be the overlap of the two intervals. Or it can be the largest substring (same sub-problem) between one of the four possible pairs between strings that come from the first interval and strings that come from the second.

This solution as is has a lot of branching. Worst case may call the divide and conquer recurrence 4^30 times.

Dirty tricks
First of all, we will handle trivial cases such as when one of the intervals is empty. The result is obviously always 0 in those cases.

But there are ways to optimize that original 4^30 idea. What I did was to make sure that it is 2^30, avoiding to branch the recurrence more than two times per instance, this requires some tricks like not doing the split when both intervals begin at the first position or end at the last (It can be proven the overlap is the best result in this case). Even at 2^30, it is too slow, the second dirty trick was to do an array mem[80][80][80][80], and if both extremes are less than 80, memoize the results of the recurrence. This will make the number of calls from 2^30 to 2^30 / (80^4) + 80^4. This is enough to solve the problem.

The better solution
A smarter solution involves noticing, what if one interval is completely inside the other? In this case, the overlap is 100% surely the best answer (And it is actually equal to the size of the smallest string). It is easy to know that this will give a correct answer. It is slightly more complicated to notice that this little optimization will make the algorithm run very fast. In fact, the maximum number of branches is 2, so the result is O(step). Really.

As a slight proof, let us think of a pair of intervals such that one is not a subset of the other.

1) If neither interval touches the middle letter, the only pair that is not trivial (does not have an empty string) will be one that matches the two intervals again but in the previous step. No branching.
2) If both intervals touch the middle letter, there will be branching. Yes. But 2 of the pairs of intervals generated will be cases in which one interval is a subset of the other. So the case will branch 2 times, and both times the case will an interval that begins at position 1 with another interval that ends at position t.
3) Handling Case 2 will either result in a sub-problem in which one interval is a subset or the other, or in Case 2 again. Try it.

int solve(int l1, int r1, int l2, int r2, int step=30) 
{
if (l1 > r1) {
return 0;
}
if (l2 > r2) {
return 0;
}
//overlap
int ovl = max(l1, l2);
int ovr = min(r1, r2);
int overlap = ( (ovr >= ovl)? (ovr - ovl + 1) : 0 );

int t = (1<<step) - 1;

int m = t/2 + 1; //the middle

// should we continue in depth? (Is the result not necessarily the overlap?)
if ( (l1 <= l2 && r1 >= r2) || (l2 <= l1 && r2 >= r1)) { //one is a subset of the other
return overlap;
}
// split each interval in two halves that don't touch the middle.
int l[4] = { min(l1, m) , max(m+1, l1) - m, min(l2, m) , max(m+1, l2) - m};
int r[4] = { min(r1, m-1), max(m, r1) - m, min(r2, m-1), max(m, r2) - m};
int res = overlap;
// try each pair of interval.
for (int i=0; i<2; i++) {
for (int j=2; j<4; j++) {
res = max(res, solve(l[i],r[i], l[j], r[j], step-1) );
}
}
return res;

}

1 comment :

Praveendhinwa said...

thank you so much vexorian , I really keep waiting for your post and your editorial in Topcoder ...
Thank you very much for sharing this ...