Monday, May 16, 2011

std::pair is good

The editorial for the Topcoder Open 2011 qualifcation round 1 is up, but there is something that I don't like about it. To avoid having to explain off-topic STL features, I didn't make the solution for 1000 as simple as it should have been. The culprit is this function:
// Compares two sequences, returns the better one using the tie-breaking
// describing in the statement.
string better( const string& a, const string &b) {
if (a == "...") {
return b;
} else if (b == "..." ) {
return a;
}
if (a.length() < b.length()) {
return a;
}
if (a.length() > b.length() ) {
return b;
}
if (a < b) {
return a;
} else {
return b;
}
}



Innocent enough eh? But isn't it the most repetitive, boring code you will ever have to type? Everything with non-trivial tie breaking rules involves doing something like that. In this problem's case, you need to tie break between "..." , smaller strings and lexicographically-first ones.

For a long time I was skeptical of std::pair and I didn't use it. Neither did I get why the people in the top kept using it. Let us accept it, vector<pair< pair<int,int>, string > > looks almost like obfuscation - Why not use a struct containing named ints and a string?. I eventually learned what was so great about.

  • 1. STL pairs can be created very easily with make_pair.

  • 2. STL pairs implement the < and > operators, and they do it in a rather standard way - First compare the first element, and if the two first elements are equal, use the second for tie breaking. They also implement == and = in a logical way.

  • As a case of great synergy, std::min works when comparing any type that supports = and <.


Where does this take us? Well, for starters it makes pairs very useful for these over-complicated tie breaking days. The following is a way to implement the better() function:

// Compares two sequences, returns the better one using the tie-breaking
// describing in the statement.
string better( const string& a, const string &b) {
if (a == "...") {
return b;
} else if (b == "..." ) {
return a;
}
return std::min(make_pair(a.length(),a), make_pair(b.length(), b) ).second;
}



Explaining: make_pair(a.length(), a) . Will magically make a pair< string::size_t*, string > that has a.length() as first element and a as second element.

* string::size_t is a fancy way to say unsigned int. It causes a lot of issues that things like .length() is declared as unsigned int instead of just a simple int. So many that perhaps this was the main reason Java does not support unsigned.

std::min, will compare both pair< string::size_t, string >s, if the first elements are equal it will compare the second elements. Then it will return the one pair that is the smallest. In other words, if the lengths of the strings are different, it will return the pair with the smaller string, and if they are equal, it will return the pair with the lexicographically-first string. (Because std::string implements < that does lex-first check).


But there is more, what about dealing with "..." ? The main problem with doing that comparison is that "..." has three characters, when it should be the equivalent to the maximum possible result. So, how about we make "..." a pair that first contains a very high number and second "..." ? Then if we use pair<int,string>s to represent possible results, we would just need to return the second element of the best pair we found. Such as the following code:

const int MAX_SEQUENCE_LENGTH = 199;
const int MAX_SQUARE_SIZE = 149;

typedef pair<int,string> result;
const result IMPOSSIBLE = make_pair(2* MAX_SEQUENCE_LENGTH,string("..."));

struct SquareSeries
{
// We use these tables to memoize the results.
bool visited[MAX_SEQUENCE_LENGTH+1][MAX_SQUARE_SIZE+1][26];
result mem[MAX_SEQUENCE_LENGTH+1][MAX_SQUARE_SIZE+1][26];

// constants
string pattern;
int rightStart, lastLength;

//----------
// Returns a sequence that:
// * starts at position p of the pattern.
// * If the square previous to the sequence had size prevsz and
// color prevc, the last square of the sequence
// will have size = lastLength
//
result rec(int p, int prevsz, char prevc) {
//memoize the results:
result & x = mem[p][prevsz][prevc-'A'];
if ( visited[p][prevsz][prevc-'A'] ) {
return x;
}
visited[p][prevsz][prevc - 'A'] = true;

x = IMPOSSIBLE;

if(p == pattern.size()) {
//The end of the pattern. The last size
// should match the one we want.
x = (prevsz == lastLength )? make_pair(0,string("")) : IMPOSSIBLE;
return x;
}

bool alloww = false, allowb = false;
if (pattern[p] != '?') {
//forced to use pattern[p] color
alloww = (pattern[p]=='W');
allowb = ! alloww;
} else { //?
//we can skip to right side:
x = rec(rightStart, prevsz, prevc);
// Allow any of the colors
alloww = allowb = true;
}
for (char ch = 'B'; ch<='W'; ch+='W'-'B') {
// If not allowed to use a color, don't use it
if ( ((ch=='B') && !allowb) || ((ch=='W') && !alloww) ) {
continue;
}
// Change the size according to last color.
int sz = prevsz + ( (ch!=prevc) ? 1: -1 );
if ( sz == 0 || sz > MAX_SQUARE_SIZE ) {
//Skip when size is invalid.
continue;
}
// Continue the recursion for the next character positions
result y = rec(p+1, sz, ch);
x = std::min(x, make_pair(y.first + 1, ch + y.second) );
}

return x;
}
string completeIt(string pattern, int lastLength)
{
// Find the question mark.
int q = 0;
while ( pattern[q] != '?' ) {
q ++;
}
//Split in left and right parts.
string left = pattern.substr(0,q);
string right = pattern.substr(q+1);

//max size of added sequence:
int qw = MAX_SEQUENCE_LENGTH - left.size() - right.size();

//Position at which the right side starts
rightStart = q + qw;

//The new pattern with extra '?' for each possible
//location of a new character.
this->pattern = left + string(qw,'?') + right;

this->lastLength = lastLength;

//Fill visited with 0s.
memset(visited,0,sizeof(visited));

//Starting at position 0, the last square had size 0 and color 'Y'
//the wanted last size is lastLength.
return rec(0, 0,'Y').second;
}


};


pitfalls
make_pair does the type checking automatically. But the problem is that implicit type casts between different std::pairs do not work correctly (or at all). For example:

string y = "hello" ; // This works fine, "hello" is a const char*
// but const char*s can get typecasted as std::strings thanks to constructors.

pair<int, string> p = make_pair(0, "fail"); // This does not work fine.
// make_pair(0, "fail") believes it has to make a pair<int, const char*>
// pair<int, const char*> does not get casted automatically as
// pair<int, string>.

//Possible alternatives:
pair<int,string> q = make_pair<int, string>(0, "fail"); //explicitely use
//the correct make_pair

pair<int,string> r = make_pair(0, (string)"fail");
//cast the const char* to string before passing it to make_pair

pair<int,string> r = make_pair(0, string("fail") );
//a nicer looking way to cast it...



1 comment :

vexorian said...

Did anyone notice that the pair to pair conversion actually works?


I am sure that when I tested this it didn't work. Anyone know what happened here? Maybe old gcc wasn't able to do that.