<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <id>tag:www.refactormycode.com,2007:users611friends</id>
  <link type="application/atom+xml" href="http://www.refactormycode.com/users/611/friends" rel="self"/>
  <title>Chris Jester-Young friends</title>
  <updated>2009-11-16T04:25:38-08:00</updated>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor361815</id>
    <published>2009-11-16T04:25:38-08:00</published>
    <title>[C#] On Strip Html Comments</title>
    <content type="html">&lt;p&gt;That &amp;quot;invalid HTML tag&amp;quot; case is really tough, a real edge condition. If you discard that one, it's pretty easy. I'd recommend doing a first pass to remove all invalid HTML tags. Like.. er.. &amp;lt;&#231;123&amp;gt;.&lt;/p&gt;

&lt;pre&gt;    public static string StripHtmlComments(string html)
    {
        if (html == null)
        {
            throw new ArgumentNullException(&amp;quot;html&amp;quot;);
        }

        if (html.IndexOf(&amp;quot;&amp;lt;!&amp;quot;, StringComparison.Ordinal) &amp;lt; 0)
        {
            return html;
        }

        return Regex.Replace(html, &amp;quot;(?&amp;lt;!='|=\&amp;quot;|=)&amp;lt;![^&amp;gt;]+&amp;gt;&amp;quot;, &amp;quot;&amp;quot;);
    }&lt;/pre&gt;</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/597-strip-html-comments/refactors/361815" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor14126</id>
    <published>2008-07-30T20:45:34-07:00</published>
    <title>[C#] On Balance HTML Tags</title>
    <content type="html">&lt;p&gt;&amp;gt; I would worry that you're using the wrong tool for the job, and focusing too much on micro performance increases.&lt;/p&gt;

&lt;p&gt;Well, to play Devil's Advocate: I would worry that you're writing far too much code, and overengineering a solution with too many complex, unnecessary external dependencies. :)&lt;/p&gt;

</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/360-balance-html-tags/refactors/14126" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor14125</id>
    <published>2008-07-30T20:43:13-07:00</published>
    <title>[C#] On Balance HTML Tags</title>
    <content type="html">&lt;p&gt;&amp;gt; Tags will not get balanced if 'ignore' tags and regular tags are incorrectly nested.&lt;/p&gt;

&lt;p&gt;This is by design; I don't view incorrect nesting as a problem that would prevent page rendering, like an unclosed &amp;lt;div&amp;gt; or &amp;lt;td&amp;gt; would. Am I wrong?&lt;/p&gt;

</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/360-balance-html-tags/refactors/14125" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor13028</id>
    <published>2008-07-13T07:37:40-07:00</published>
    <title>[C#] On Balance HTML Tags</title>
    <content type="html">&lt;p&gt;It doesn't break Corbin's code, it breaks mine. The point was that a single concatenation is fairly cheap, to the tune of 15ms over 4,000 iterations.&lt;/p&gt;

&lt;p&gt;In general, yes, concatenation is expensive. But one per loop isn't really significant in my benchmarking. As to why the List&amp;lt;&amp;gt;.Contains is 25% slower than String.Contains, I dunno. But it is..&lt;/p&gt;

</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/360-balance-html-tags/refactors/13028" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor12965</id>
    <published>2008-07-12T00:46:05-07:00</published>
    <title>[C#] On Balance HTML Tags</title>
    <content type="html">&lt;p&gt;&amp;gt; Another problem is the concatenation (&amp;quot;&amp;lt;&amp;quot; + tagname). Doing that every iteration adds up, too.&lt;/p&gt;

&lt;p&gt;The difference is fairly small.. even if I (erroneously) use just the tagname, no concatenation, improvement is marginal. Result of 4,000 iterations:&lt;/p&gt;

&lt;p&gt;tagname (broken!!)&lt;/p&gt;

&lt;p&gt;543 ms
&lt;br /&gt;546 ms
&lt;br /&gt;534 ms&lt;/p&gt;

&lt;p&gt;&amp;quot;&amp;lt;&amp;quot; + tagname&lt;/p&gt;

&lt;p&gt;569 ms
&lt;br /&gt;558 ms
&lt;br /&gt;551 ms&lt;/p&gt;

&lt;p&gt;String.Concat(&amp;quot;&amp;lt;&amp;quot;, tagname)&lt;/p&gt;

&lt;p&gt;581 ms
&lt;br /&gt;555 ms
&lt;br /&gt;551 ms&lt;/p&gt;

&lt;pre&gt;// ** DO NOT USE **
// THIS IS BAD BROKEN CODE, ONLY DISPLAYED FOR BENCHMARKING PURPOSES!
if (!tagpaired[i] &amp;amp;&amp;amp; !ignoredtags.Contains(tagname))

// as coded
if (!tagpaired[i] &amp;amp;&amp;amp; !ignoredtags.Contains(&amp;quot;&amp;lt;&amp;quot; + tagname))

// alternate
if (!tagpaired[i] &amp;amp;&amp;amp; !ignoredtags.Contains(String.Concat(&amp;quot;&amp;lt;&amp;quot;, tagname)))

&lt;/pre&gt;</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/360-balance-html-tags/refactors/12965" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor12964</id>
    <published>2008-07-12T00:39:12-07:00</published>
    <title>[C#] On Balance HTML Tags</title>
    <content type="html">&lt;p&gt;&amp;gt; I would think that a list's Contains method should be at least as fast to evaluate, depending on the algorithm that String.Contains uses.&lt;/p&gt;

&lt;p&gt;And yet it is not. Result of 4,000 iterations:&lt;/p&gt;

&lt;p&gt;use List&amp;lt;string&amp;gt; for ignored tags&lt;/p&gt;

&lt;p&gt;650 ms
&lt;br /&gt;641 ms
&lt;br /&gt;635 ms&lt;/p&gt;

&lt;p&gt;use String.Contains for ignored tags&lt;/p&gt;

&lt;p&gt;585 ms
&lt;br /&gt;565 ms
&lt;br /&gt;549 ms&lt;/p&gt;

&lt;pre&gt;var ignoredtags = new List&amp;lt;String&amp;gt; { &amp;quot;p&amp;quot;, &amp;quot;img&amp;quot;, &amp;quot;br&amp;quot; };

if (!tagpaired[i] &amp;amp;&amp;amp; !ignoredtags.Contains(tagname))
{  &lt;/pre&gt;</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/360-balance-html-tags/refactors/12964" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor12962</id>
    <published>2008-07-12T00:30:52-07:00</published>
    <title>[C#] On Balance HTML Tags</title>
    <content type="html">&lt;p&gt;Wow, Corbin, thank you. That's really nice -- it's even faster, and does more! Testing with 4,000 iterations on medium sized input:&lt;/p&gt;

&lt;p&gt;my BalanceTags:&lt;/p&gt;

&lt;p&gt;569 ms
&lt;br /&gt;549 ms
&lt;br /&gt;549 ms&lt;/p&gt;

&lt;p&gt;your (improved) BalanceTags&lt;/p&gt;

&lt;p&gt;566 ms
&lt;br /&gt;537 ms
&lt;br /&gt;524 ms&lt;/p&gt;

&lt;p&gt;Ah, yes, a List of Matches, e.g. List&amp;lt;Match&amp;gt;! Why didn't I think of that..&lt;/p&gt;

</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/360-balance-html-tags/refactors/12962" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor12921</id>
    <published>2008-07-11T10:44:23-07:00</published>
    <title>[C#] On Balance HTML Tags</title>
    <content type="html">&lt;p&gt;&amp;gt; What's wrong with a list or set here?&lt;/p&gt;

&lt;p&gt;Much, much, MUCH slower in my benchmarking. The simple String.Contains() test is extremely fast to evaluate, which matters when we're calling it in a loop.&lt;/p&gt;

&lt;p&gt;&amp;gt; Example that actually fixes up the tags.&lt;/p&gt;

&lt;p&gt;The agility pack is roughly 3.5x slower than this routine in my testing -- result of 4,000 runs on medium sized input:&lt;/p&gt;

&lt;p&gt;Regex based Balance
&lt;br /&gt;563 ms
&lt;br /&gt;560 ms
&lt;br /&gt;555 ms&lt;/p&gt;

&lt;p&gt;HtmlAgilityPack balance
&lt;br /&gt;2048 ms
&lt;br /&gt;1990 ms
&lt;br /&gt;1974 ms&lt;/p&gt;

&lt;p&gt;Also, HtmlAgilityPack may fix up orphaned &amp;lt;begin&amp;gt; tags but it strips out orphaned &amp;lt;/end&amp;gt; tags just like I do..&lt;/p&gt;

&lt;p&gt;I need to add the &amp;lt;li&amp;gt; elements to the ignore list as well; those don't need to be closed. I know I never close mine..
&lt;/p&gt;

</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/360-balance-html-tags/refactors/12921" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor12919</id>
    <published>2008-07-11T10:28:06-07:00</published>
    <title>[C#] On Sanitize HTML</title>
    <content type="html">&lt;p&gt;&amp;gt; your overall approach is more like a blacklist: look inside this string for &amp;quot;bad&amp;quot; stuff and erase it&lt;/p&gt;

&lt;p&gt;I see what you mean vs. HTMLEncoding() the whole thing, and then writing regular expressions to replace the escaped whitelisted entities with valid HTML. I just never thought of it this way.&lt;/p&gt;

&lt;p&gt;The other advantage of that approach is that content is not deleted.&lt;/p&gt;

</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/333-sanitize-html/refactors/12919" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor12905</id>
    <published>2008-07-11T09:25:44-07:00</published>
    <title>[C#] On Balance HTML Tags</title>
    <content type="html">&lt;p&gt;&amp;gt; p isn't self closing&lt;/p&gt;

&lt;p&gt;Well, it should be. :) We have a lot of data with unclosed &amp;lt;p&amp;gt; tags for testing. Also, since I am using the ban-hammer of REMOVAL, it's not a fun tag to make mistakes with: people who have JavaScript disabled can input anything in our editor textbox, and some of them enter a lot of unclosed &amp;lt;p&amp;gt; tags.&lt;/p&gt;

&lt;p&gt;&amp;gt; iterating over a list in reverse is plain ugly (because you can't use 'foreach').&lt;/p&gt;

&lt;p&gt;That's necessary so we can continue to manipulate the string while the size is changing -- otherwise it's a PITA when we go forward; the original tag match locations downstream of each replace are no longer valid and must be adjusted with an offset for every deletion.&lt;/p&gt;

</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/360-balance-html-tags/refactors/12905" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Code360</id>
    <published>2008-07-11T08:40:17-07:00</published>
    <updated>2011-05-09T00:30:26-07:00</updated>
    <title>[C#] Balance HTML Tags</title>
    <content type="html">&lt;p&gt;For the subset of HTML tags we care about, this function ensures that all tags in the input HTML are balanced (but not necessarily correctly nested) -- it does this by *removing* any extra opening or closing tags it finds in the HTML string. Note: this routine is NOT designed to be an XSS sanitizer! It assumes it is running on safe, pre-sanitized HTML.&lt;/p&gt;

&lt;p&gt;UPDATED: 2009-11-15 small bugfix with nested tags; 30% performance improvement&lt;/p&gt;

&lt;pre&gt;private static Regex _namedtags = new Regex
    (@&amp;quot;&amp;lt;/?(?&amp;lt;tagname&amp;gt;\w+)[^&amp;gt;]*(\s|$|&amp;gt;)&amp;quot;,
    RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled);

/// &amp;lt;summary&amp;gt;
/// attempt to balance HTML tags in the html string
/// by removing any unmatched opening or closing tags
/// IMPORTANT: we *assume* HTML has *already* been 
/// sanitized and is safe/sane before balancing!
/// 
/// CODESNIPPET: A8591DBA-D1D3-11DE-947C-BA5556D89593
/// &amp;lt;/summary&amp;gt;
public static string BalanceTags(string html)
{
    if (String.IsNullOrEmpty(html)) return html;

    // convert everything to lower case; this makes
    // our case insensitive comparisons easier
    MatchCollection tags = _namedtags.Matches(html.ToLowerInvariant());

    // no HTML tags present? nothing to do; exit now
    int tagcount = tags.Count;
    if (tagcount == 0) return html;

    string tagname;
    string tag;
    const string ignoredtags = &amp;quot;&amp;lt;p&amp;gt;&amp;lt;img&amp;gt;&amp;lt;br&amp;gt;&amp;lt;li&amp;gt;&amp;lt;hr&amp;gt;&amp;quot;;
    int match;
    var tagpaired = new bool[tagcount];
    var tagremove = new bool[tagcount];

    // loop through matched tags in forward order
    for (int ctag = 0; ctag &amp;lt; tagcount; ctag++)
    {
        tagname = tags[ctag].Groups[&amp;quot;tagname&amp;quot;].Value;

        // skip any already paired tags
        // and skip tags in our ignore list; assume they're self-closed
        if (tagpaired[ctag] || ignoredtags.Contains(&amp;quot;&amp;lt;&amp;quot; + tagname + &amp;quot;&amp;gt;&amp;quot;))
            continue;

        tag = tags[ctag].Value;
        match = -1;

        if (tag.StartsWith(&amp;quot;&amp;lt;/&amp;quot;))
        {
            // this is a closing tag
            // search backwards (previous tags), look for opening tags
            for (int ptag = ctag - 1; ptag &amp;gt;= 0; ptag--)
            {
                string prevtag = tags[ptag].Value;
                if (!tagpaired[ptag] &amp;amp;&amp;amp; prevtag.StartsWith(&amp;quot;&amp;lt;&amp;quot; + tagname, StringComparison.InvariantCulture))
                {
                    // minor optimization; we do a simple possibly incorrect match above
                    // the start tag must be &amp;lt;tag&amp;gt; or &amp;lt;tag{space} to match
                    if (prevtag.StartsWith(&amp;quot;&amp;lt;&amp;quot; + tagname + &amp;quot;&amp;gt;&amp;quot;) || prevtag.StartsWith(&amp;quot;&amp;lt;&amp;quot; + tagname + &amp;quot; &amp;quot;))
                    {
                        match = ptag;
                        break;
                    }
                }
            }
        }
        else
        {
            // this is an opening tag
            // search forwards (next tags), look for closing tags
            for (int ntag = ctag + 1; ntag &amp;lt; tagcount; ntag++)
            {
                if (!tagpaired[ntag] &amp;amp;&amp;amp; tags[ntag].Value.Equals(&amp;quot;&amp;lt;/&amp;quot; + tagname + &amp;quot;&amp;gt;&amp;quot;, StringComparison.InvariantCulture))
                {
                    match = ntag;
                    break;
                }
            }
        }

        // we tried, regardless, if we got this far
        tagpaired[ctag] = true;
        if (match == -1)
            tagremove[ctag] = true; // mark for removal
        else
            tagpaired[match] = true; // mark paired
    }

    // loop through tags again, this time in reverse order
    // so we can safely delete all orphaned tags from the string
    for (int ctag = tagcount - 1; ctag &amp;gt;= 0; ctag--)
    {
        if (tagremove[ctag])
        {
            html = html.Remove(tags[ctag].Index, tags[ctag].Length);
            System.Diagnostics.Debug.WriteLine(&amp;quot;unbalanced tag removed: &amp;quot; + tags[ctag]);
        }
    }

    return html;
}&lt;/pre&gt;</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/360-balance-html-tags" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor11424</id>
    <published>2008-06-22T02:29:47-07:00</published>
    <title>[C#] On Sanitize HTML</title>
    <content type="html">&lt;p&gt;&amp;gt; I'm just guessing; since you write C#, I can't actually run your code. I'm supposing something like this will happen:&lt;/p&gt;

&lt;p&gt;Well, you don't need C#; you could use a regex testing tool of your choice to see what actually happens. Remember, our initial tag match is &amp;quot;&amp;lt;[^&amp;gt;]*?(&amp;gt;|$)&amp;quot;&lt;/p&gt;

&lt;p&gt;&amp;quot;&amp;lt;foo&amp;lt;bar&amp;gt;&amp;lt;scr&amp;lt;bar&amp;gt;ipt/&amp;gt;&amp;quot;&lt;/p&gt;

&lt;p&gt;remove first &amp;lt;foo&amp;lt;bar&amp;gt; ==&amp;gt; &amp;quot;&amp;lt;scr&amp;lt;bar&amp;gt;ipt/&amp;gt;&amp;quot;
&lt;br /&gt;remove first &amp;lt;scr&amp;lt;bar&amp;gt; ==&amp;gt; &amp;quot;ipt/&amp;gt;&amp;quot;&lt;/p&gt;

&lt;p&gt;resulting string is &amp;quot;ipt/&amp;gt;&amp;quot;.&lt;/p&gt;

&lt;p&gt;(and anyway, &amp;quot;first match&amp;quot; is moot; I added 2 lines of code so it actually replaces the correct location for each match now, to prevent any possibility of unwanted side-effect IndexOf() matches.)&lt;/p&gt;

</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/333-sanitize-html/refactors/11424" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor11386</id>
    <published>2008-06-21T16:35:49-07:00</published>
    <title>[C#] On Sanitize HTML</title>
    <content type="html">&lt;p&gt;&amp;gt; but what if somebody copies and pastes from the web&lt;/p&gt;

&lt;p&gt;Nick, I don't think that's the use case here. &lt;/p&gt;

&lt;p&gt;1) If you are pasting a code sample it'll be inside a &amp;lt;pre&amp;gt;&amp;lt;code&amp;gt; block (in the WMD editor, highlight the code snippet and use CTRL+K or press the &amp;quot;code&amp;quot; toolbar button) and thus fully escaped, so it'll appear as-is.&lt;/p&gt;

&lt;p&gt;2) If you are composing question or answer text, you'd probably paste from the textual content of the web page, not the view source.&lt;/p&gt;

&lt;p&gt;&amp;gt; By replacing the first match you introduce another class of (more subtle) vulnerabilities&lt;/p&gt;

&lt;p&gt;You're implying that a later bad tag would somehow appear earlier in the string as a substring, but I cannot for the life of me come up with an example of that working. If a tag is contained within another tag, it'll get stripped by the new tag match that uses either &amp;gt; or $ (end of text) as the end match.&lt;/p&gt;

&lt;p&gt;Can you provide an example of this that works, given the new tag match regex? I can't come up with one.&lt;/p&gt;

</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/333-sanitize-html/refactors/11386" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Refactor11342</id>
    <published>2008-06-21T04:48:35-07:00</published>
    <title>[C#] On Sanitize HTML</title>
    <content type="html">&lt;p&gt;Hi Chris -- those are both *excellent* points, and I verified both. Working on it.. the second one is fairly easy (just replace the one instance) but the unclosed tag thing is rough. Suggestions?&lt;/p&gt;

</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/333-sanitize-html/refactors/11342" rel="alternate"/>
  </entry>
  <entry>
    <id>tag:www.refactormycode.com,2007:Code333</id>
    <published>2008-06-20T08:24:46-07:00</published>
    <updated>2012-01-18T20:24:54-08:00</updated>
    <title>[C#] Sanitize HTML</title>
    <content type="html">&lt;p&gt;Takes a provided HTML string and removes any potentially dangerous XSS HTML tags using a whitelist approach. Useful when you want to allow a small subset of &amp;quot;safe&amp;quot; HTML tags in user content.&lt;/p&gt;

&lt;p&gt;UPDATED: July 11th to reflect all refactorings, plus optimizing for speed. Now 2x faster!
&lt;br /&gt;UPDATED: Sept 1st, bugfixes
&lt;br /&gt;UPDATED: May 14th, simplify and improve whitelist strictness&lt;/p&gt;

&lt;pre&gt;private static Regex _tags = new Regex(&amp;quot;&amp;lt;[^&amp;gt;]*(&amp;gt;|$)&amp;quot;,
    RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled);
private static Regex _whitelist = new Regex(@&amp;quot;
    ^&amp;lt;/?(b(lockquote)?|code|d(d|t|l|el)|em|h(1|2|3)|i|kbd|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)&amp;gt;$|
    ^&amp;lt;(b|h)r\s?/?&amp;gt;$&amp;quot;,
    RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
private static Regex _whitelist_a = new Regex(@&amp;quot;
    ^&amp;lt;a\s
    href=&amp;quot;&amp;quot;(\#\d+|(https?|ftp)://[-a-z0-9+&amp;amp;@#/%?=~_|!:,.;\(\)]+)&amp;quot;&amp;quot;
    (\stitle=&amp;quot;&amp;quot;[^&amp;quot;&amp;quot;&amp;lt;&amp;gt;]+&amp;quot;&amp;quot;)?\s?&amp;gt;$|
    ^&amp;lt;/a&amp;gt;$&amp;quot;,
    RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
private static Regex _whitelist_img = new Regex(@&amp;quot;
    ^&amp;lt;img\s
    src=&amp;quot;&amp;quot;https?://[-a-z0-9+&amp;amp;@#/%?=~_|!:,.;\(\)]+&amp;quot;&amp;quot;
    (\swidth=&amp;quot;&amp;quot;\d{1,3}&amp;quot;&amp;quot;)?
    (\sheight=&amp;quot;&amp;quot;\d{1,3}&amp;quot;&amp;quot;)?
    (\salt=&amp;quot;&amp;quot;[^&amp;quot;&amp;quot;&amp;lt;&amp;gt;]*&amp;quot;&amp;quot;)?
    (\stitle=&amp;quot;&amp;quot;[^&amp;quot;&amp;quot;&amp;lt;&amp;gt;]*&amp;quot;&amp;quot;)?
    \s?/?&amp;gt;$&amp;quot;,
    RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);


/// &amp;lt;summary&amp;gt;
/// sanitize any potentially dangerous tags from the provided raw HTML input using 
/// a whitelist based approach, leaving the &amp;quot;safe&amp;quot; HTML tags
/// CODESNIPPET:4100A61A-1711-4366-B0B0-144D1179A937
/// &amp;lt;/summary&amp;gt;
public static string Sanitize(string html)
{
    if (String.IsNullOrEmpty(html)) return html;

    string tagname;
    Match tag;

    // match every HTML tag in the input
    MatchCollection tags = _tags.Matches(html);
    for (int i = tags.Count - 1; i &amp;gt; -1; i--)
    {
        tag = tags[i];
        tagname = tag.Value.ToLowerInvariant();
        
        if(!(_whitelist.IsMatch(tagname) || _whitelist_a.IsMatch(tagname) || _whitelist_img.IsMatch(tagname)))
        {
            html = html.Remove(tag.Index, tag.Length);
            System.Diagnostics.Debug.WriteLine(&amp;quot;tag sanitized: &amp;quot; + tagname);
        }
    }

    return html;
}&lt;/pre&gt;</content>
    <author>
      <name>Jeff Atwood</name>
      <email>jatwood@codinghorror.com</email>
    </author>
    <link type="text/html" href="http://www.refactormycode.com/codes/333-sanitize-html" rel="alternate"/>
  </entry>
</feed>

