private string DecodeParameter(string query, string parameterName) {
    // Decode as UTF-8 first; invalid byte sequences are replaced with U+FFFD.
    string valueUtf8 = HttpUtility.ParseQueryString(query, Encoding.UTF8)[parameterName];
    const char invalidUtf8Character = (char) 0xFFFD;
    // A missing parameter yields null, so guard before searching the value.
    if (valueUtf8 != null && valueUtf8.Contains(invalidUtf8Character)) {
        const int latin1 = 0x6FAF; // code page 28591, ISO-8859-1
        var valueLatin1 = HttpUtility.ParseQueryString(query, Encoding.GetEncoding(latin1))[parameterName];
        return valueLatin1;
    }
    return valueUtf8;
}
Refactorings
Ants
November 17, 2010 06:11
I would recommend writing latin1 in decimal (28591) instead of hexadecimal, because that is the form more commonly documented in MSDN and other sources.
Personally, I think it's already too late once the query string is in Unicode, because how do you know that the conversion into Unicode used the correct encoding?
Anyway, here's my approach if the query string is all I've got. I'm not sure it is correct to expect the invalid UTF-8 bytes to still be found in place at that point.
Here's my alternative refactoring:
static readonly char[] invalidUtf8Bytes = new char[] {
    '\xC0', '\xC1', // over long encoding
    '\xF5', '\xF6', '\xF7', // restricted: start of 4 byte sequence
    '\xF8', '\xF9', '\xFA', '\xFB', // restricted: start of 5 byte sequence
    '\xFC', '\xFD', // restricted: start of 6 byte sequence
    '\xFE', '\xFF' // invalid: not defined by UTF-8 spec
};
private string DecodeParameter(string query, string parameterName)
{
    Encoding encoding;
    if (query.IndexOfAny(invalidUtf8Bytes) >= 0)
        encoding = Encoding.GetEncoding("iso-8859-1");
    else
        encoding = Encoding.UTF8;
    return HttpUtility.ParseQueryString(query, encoding)[parameterName];
}
Firefox encodes the entire URL as Latin-1 (ISO-8859-1) when it can, and as UTF-8 if the URL cannot be encoded in Latin-1. This method decodes such parameters in an ASP.NET MVC application.
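The same "UTF-8 first, Latin-1 otherwise" idea can also be tried on the raw bytes, before anything has been converted to Unicode. What follows is only a minimal sketch, not the original poster's method: the class QueryValueDecoder and the helper DecodeValue are hypothetical names, and the sketch assumes the caller has already isolated a single percent-encoded value made up of ASCII characters only. It uses a strict UTF8Encoding, which throws on invalid byte sequences instead of substituting U+FFFD.

```csharp
using System;
using System.Text;
using System.Web;

// Hypothetical helper class; not part of ASP.NET or of the code above.
static class QueryValueDecoder
{
    // Strict UTF-8: throws DecoderFallbackException on invalid byte sequences
    // instead of silently replacing them with U+FFFD.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Code page 28591 is ISO-8859-1 (Latin-1), i.e. 0x6FAF in hexadecimal.
    private static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Assumption: percentEncodedValue is one query value, still percent-encoded
    // and containing only ASCII characters.
    public static string DecodeValue(string percentEncodedValue)
    {
        // Undo the percent-encoding without choosing a text encoding yet.
        byte[] raw = HttpUtility.UrlDecodeToBytes(percentEncodedValue, Encoding.ASCII);
        try
        {
            return StrictUtf8.GetString(raw);
        }
        catch (DecoderFallbackException)
        {
            // The bytes are not valid UTF-8, so treat them as Latin-1.
            return Latin1.GetString(raw);
        }
    }
}
```

Under these assumptions, DecodeValue("caf%C3%A9") and DecodeValue("caf%E9") would both give "café". The check remains a heuristic: some Latin-1 byte sequences also happen to be valid UTF-8 (the bytes C3 A9 read as "Ã©" in Latin-1), so a Latin-1 value can still be misread as UTF-8.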