Regex Precompilation

I was watching Twitter today and saw that Elijah Manor had posted a link to a post titled "Improve your code: Regex creation is expensive".  The article discusses keeping Regexes as class-level static readonly members and using RegexOptions.Compiled.  I agree with the compilation part, just not in the way the article intends.

I disagree that creating Regexes as static members at initialization is the best way to do this.  Let's say you have 10 different regular expressions, all declared as static members of the same class.  You likely will not need all 10 for whichever static member you touch first, but all 10 will be created and initialized at that time (point #1: lazy initialization).  Using RegexOptions.Compiled is good, but it is not the best way to go about this.  Regex precompilation is the way to go for best performance.  See Base Class Library Performance Tips and Tricks for more details on the precompilation steps.  What it boils down to is having an assembly in which your regular expressions are already compiled, prepared, and ready to use.
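For readers who have not seen the precompilation step, here is a minimal sketch of how such an assembly might be produced with Regex.CompileToAssembly.  This API exists on the .NET Framework and is not supported on .NET Core and later.  The `\s+` pattern, the `Spacer` type name, the `YourCodebase.RegexLibrary` namespace and the `BuildRegexLibrary` class are assumptions chosen to line up with the example further down; they do not come from the original article.

```csharp
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical build-time tool: run it once to emit the regex assembly.
public static class BuildRegexLibrary
{
    public static void Main()
    {
        // Describe one regex to precompile. The pattern is an assumed placeholder.
        var spacer = new RegexCompilationInfo(
            @"\s+",                       // pattern
            RegexOptions.None,            // options
            "Spacer",                     // generated type name
            "YourCodebase.RegexLibrary",  // generated namespace
            true);                        // make the generated type public

        // Writes YourCodebase.RegexLibrary.dll to the current directory.
        var assemblyName = new AssemblyName("YourCodebase.RegexLibrary");
        Regex.CompileToAssembly(new[] { spacer }, assemblyName);
    }
}
```

Once the generated DLL is referenced from your project, each entry becomes a type derived from Regex that you can construct with `new`.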

After you have your assembly prepared, you can create and call the regular expressions directly or make a lazy-loaded, thread-safe wrapper for them.  An example of how to use this in code is below.  Note that I am showing you bare-bones code here, to make the mechanics of the example clear.

using System;
using System.Text.RegularExpressions;
using RL = YourCodebase.RegexLibrary;

namespace YourCodebase.Components
{
  public class CSRegex
  {
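    // Guards the one-time creation of each cached Regex.
    private static readonly object locker = new object();

    static CSRegex() {}
    private CSRegex() {}

    // Cached instance; stays null until SpacerRegex() is first called.
    private static Regex __Spacer;

    public static Regex SpacerRegex()
    {
      // Every caller takes the lock, so only one thread can create the instance.
      lock(locker)
      {
        if( __Spacer == null )
        {
          // RL.Spacer comes from the precompiled regex assembly.
          __Spacer = new RL.Spacer();
        }

        return __Spacer;
      }
    }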
  }
}

This is lazy loaded and thread safe.  Perhaps I might use some sort of Singleton<T> to clean this up.  What you save here is compilation at runtime, either when your site/app is starting or when a class that contains these regexes is accessed for the first time.  Think of the CPU time that would otherwise be spent on regular expressions that may or may not be used on your current code path.  This can shave seconds off startup time, and that translates into time saved for your clients.
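One way to tidy up the locking, in the spirit of the Singleton<T> idea above, might be Lazy<T>, which is available from .NET 4.0 onward.  The sketch below is only an illustration: the `LazyRegexes` class name is hypothetical, and a regular `new Regex(@"\s+", RegexOptions.Compiled)` with an assumed pattern stands in for `new RL.Spacer()` so that the snippet compiles without the precompiled assembly.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper; the name and the pattern are placeholders.
public static class LazyRegexes
{
    // Lazy<T> with a factory delegate is thread safe by default:
    // the factory runs at most once, on first access to Value.
    private static readonly Lazy<Regex> spacer =
        new Lazy<Regex>(() => new Regex(@"\s+", RegexOptions.Compiled));
        // With the precompiled assembly this would be: () => new RL.Spacer()

    // Nothing is constructed until a caller actually reads this property.
    public static Regex Spacer
    {
        get { return spacer.Value; }
    }
}
```

A call such as `LazyRegexes.Spacer.Replace(input, " ")` creates the Regex on first use and reuses the same instance afterwards, with no explicit lock in your own code.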

You see, what I'd actually like is for Microsoft to bake this kind of functionality into the framework.  Instead of having compilation happen on every site restart, have it happen just once, or better yet, when you compile your codebase the first time.

3 Comments

  1. Corneliu Says:

    Hi David,

    Thanks for this. Very nice piece of code.

    Your example improves the start-up performance of the application by lazy loading the RegEx but introduces more moving parts, threading locks, and way more code just to increase the start-up performance.

    What you have is a standard way of improving start-up by lazy loading that you just happen to apply to a RegEx. The technique can be applied to anything.

    In my example I was simply trying to move people away from really bad RegEx usage to something better (not perfect) with the least amount of changes.

    Also, unless you have lots and lots of RegExes, pre-compiling would be a complex and useless step just making your build process more complex without that much of a benefit.

    Regards,

    Corneliu.

  2. David L. Penton Says:

    Corneliu,

    Thanks for the comment. You have a very valid point about applying the lazy-load technique to Regexes. But, as you know, that optimization is usually missed as well. I happen to develop applications that have many regexes in them, and this pattern works well in that scenario. Depending on the hardware that your website is running on, the compilation of a single Regex can take upwards of 1 to 2 seconds. It may not matter for a single regex or even two, but it eventually adds up. Also, your build process doesn't have to be that much more complicated. I am posting a follow-up to this post to illustrate how you could slip this into a build process.

    Thanks,

    David

  3. Rrivate Live Says:

    Thanks. I just need this code! It's useful!!

