Regex Precompilation in Practice

Following up on my previous post on this subject, I'd like to explain in detail how to put regex precompilation to use in a project. We implemented it in Community Server for the 2008 release and saw a significant improvement in our startup times. We had well over 100 regular expressions throughout our codebase, and not all of them were static or set to compile (where appropriate), so there was some work to be done in those areas as well.

Our process consists of an XML file that describes the regular expressions, namespaces, and output locations, plus a standalone executable that generates the assembly and a wrapper class to call into it. Remember how, in the previous article, I said I'd want a thread-safe way to get at each instance? That is what the wrapper is for. I'll show the basics of what we are doing and provide a sample framework so you can do the same in your projects.

A sample of the XML definition file:

<?xml version="1.0" encoding="utf-8" ?>
<RegEx namespace="RegexLibrary">
    <Configuration>
        <ASSEMBLY_OUTNAME>RegexLibrary</ASSEMBLY_OUTNAME>
        <CS_OUTNAME>YourNamespace.Components</CS_OUTNAME>
        <CS_OUTNAME_FILE>YNSRegex</CS_OUTNAME_FILE>
        <ASSEMBLY_FOLDERPATH>.\\outfile\\</ASSEMBLY_FOLDERPATH>
        <CS_FOLDERPATH>.\\outfile\\</CS_FOLDERPATH>
    </Configuration>

    <Item id="Spacer" description="DESCRIPTION" options="NONE">
        <![CDATA[\s{2,}]]>
    </Item>
</RegEx>

ASSEMBLY_OUTNAME is the name of the generated assembly; in this case, RegexLibrary.dll.
CS_OUTNAME is the namespace of the wrapper class through which you access your regular expressions.
CS_OUTNAME_FILE is the file name of the wrapper file. It is also the class name of the wrapper.
ASSEMBLY_FOLDERPATH is the destination folder for the assembly.
CS_FOLDERPATH is the destination folder for the wrapper file.
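The post doesn't show how the definition file is parsed. As a rough sketch, assuming .NET 3.5 or later for LINQ to XML, the file above could be read like this. RegexCompilationInfoExt here is a hypothetical stand-in for our extended class that only adds the description attribute; the RegexDefinitionReader name, and the assumption that the root namespace attribute becomes the namespace of the generated Regex classes, are illustrative as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical extended class: RegexCompilationInfo plus the "description"
// attribute, which a wrapper generator could emit as a doc comment.
public class RegexCompilationInfoExt : RegexCompilationInfo
{
    public string Description { get; private set; }

    public RegexCompilationInfoExt(string pattern, RegexOptions options, string name,
                                   string fullNamespace, string description)
        : base(pattern, options, name, fullNamespace, true) // true = generate a public class
    {
        Description = description;
    }
}

// Hypothetical reader for the XML definition file.
public static class RegexDefinitionReader
{
    // Maps each child of <Configuration> (ASSEMBLY_OUTNAME, CS_OUTNAME, ...) to its text.
    public static Dictionary<string, string> ReadConfiguration(XDocument doc)
    {
        return doc.Root.Element("Configuration").Elements()
                  .ToDictionary(e => e.Name.LocalName, e => e.Value.Trim());
    }

    // Turns every <Item> into one compilation entry. Assumption: the root
    // "namespace" attribute is the namespace of the generated Regex classes.
    public static List<RegexCompilationInfoExt> ReadItems(XDocument doc)
    {
        string ns = (string)doc.Root.Attribute("namespace");
        var result = new List<RegexCompilationInfoExt>();

        foreach (XElement item in doc.Root.Elements("Item"))
        {
            // Prefer the CDATA section so the whitespace around it is not part of the pattern.
            XCData cdata = item.Nodes().OfType<XCData>().FirstOrDefault();
            string pattern = cdata != null ? cdata.Value : item.Value;

            // "NONE" or a comma-separated list such as "IgnoreCase, Multiline";
            // parsing ignores case, so "NONE" maps to RegexOptions.None.
            var options = (RegexOptions)Enum.Parse(typeof(RegexOptions),
                                                   (string)item.Attribute("options"), true);

            result.Add(new RegexCompilationInfoExt(pattern, options,
                (string)item.Attribute("id"), ns, (string)item.Attribute("description")));
        }
        return result;
    }
}
```

Using CDATA for the pattern body means regex metacharacters such as < and & never need XML escaping, which is likely why the definition file uses it.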

The generated wrapper file follows the same structure I showed in the previous post on this topic:

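using System;
using System.Text.RegularExpressions;
using RL = RegexLibrary;

namespace YourNamespace.Components
{
    public class YNSRegex
    {
        // Shared lock object guarding the lazy creation of every regex instance.
        private static readonly object locker = new object();

        // Explicit static constructor keeps the type from being marked beforefieldinit.
        static YNSRegex() {}

        // Private constructor: the class only exposes static accessors.
        private YNSRegex() {}

        private static Regex __Spacer;

        // Returns the single precompiled Spacer instance, creating it on first use.
        public static Regex SpacerRegex()
        {
            lock (locker)
            {
                if (__Spacer == null)
                {
                    // RL.Spacer is the Regex subclass emitted into RegexLibrary.dll.
                    __Spacer = new RL.Spacer();
                }
                return __Spacer;
            }
        }
    }
}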

Now for the method that does the real work. Note that we use an extended class (RegexCompilationInfoExt) for the regular expression data; it carries the additional information needed to generate the wrapper file.

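// ASSEMBLY_OUTNAME, ASSEMBLY_FOLDERPATH, CS_FOLDERPATH and CS_OUTNAME_FILE are
// static string fields populated from the <Configuration> section of the XML file.
// Requires: System, System.Collections.Generic, System.IO, System.Reflection,
// System.Text.RegularExpressions.
static void Build()
{
    // CompileToAssembly writes the DLL to the current directory; it is copied to its destination afterwards.
    string assemblyInitialPath = Path.Combine(Environment.CurrentDirectory, ASSEMBLY_OUTNAME + ".dll");
    string assemblyFullPath = Path.GetFullPath(
        Path.Combine(Path.Combine(Environment.CurrentDirectory, ASSEMBLY_FOLDERPATH), ASSEMBLY_OUTNAME + ".dll"));
    string outFileFullPath = Path.GetFullPath(
        Path.Combine(Path.Combine(Environment.CurrentDirectory, CS_FOLDERPATH), CS_OUTNAME_FILE + ".cs"));

    List<RegexCompilationInfoExt> lrci = GetRegexInfo(ASSEMBLY_OUTNAME);
    Console.WriteLine("Total of {0} regular expressions will be created.", lrci.Count);

    if (File.Exists(assemblyFullPath))
        File.Delete(assemblyFullPath);

    // Build the assembly: one Regex-derived class per entry in lrci.
    Regex.CompileToAssembly(lrci.ToArray(), new AssemblyName(ASSEMBLY_OUTNAME));

    // Copy the assembly to its configured folder, creating the folder if it doesn't exist yet.
    Directory.CreateDirectory(Path.GetDirectoryName(assemblyFullPath));
    File.Copy(assemblyInitialPath, assemblyFullPath, true);

    // Generate the wrapper class source file.
    OutFile(outFileFullPath, lrci);
}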
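OutFile isn't shown in the post either. Here is a minimal sketch of what it might do, assuming the wrapper layout shown earlier: emit one lock-protected accessor per regex and write the text to disk. WrapperWriter, BuildSource, and this OutFile signature are hypothetical (the real OutFile takes only the path and the list, presumably reading the namespaces from the configuration); it accepts plain RegexCompilationInfo items so it stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class WrapperWriter
{
    // Builds the C# source of the wrapper class: one lazily created,
    // lock-protected instance per precompiled Regex type.
    public static string BuildSource(string wrapperNamespace, string className,
                                     string regexNamespace, IEnumerable<RegexCompilationInfo> items)
    {
        var sb = new StringBuilder();
        sb.AppendLine("using System.Text.RegularExpressions;");
        sb.AppendLine("using RL = " + regexNamespace + ";");
        sb.AppendLine();
        sb.AppendLine("namespace " + wrapperNamespace);
        sb.AppendLine("{");
        sb.AppendLine("    public class " + className);
        sb.AppendLine("    {");
        sb.AppendLine("        private static readonly object locker = new object();");
        sb.AppendLine("        static " + className + "() {}");
        sb.AppendLine("        private " + className + "() {}");

        foreach (RegexCompilationInfo info in items)
        {
            string field = "__" + info.Name;
            sb.AppendLine();
            sb.AppendLine("        private static Regex " + field + ";");
            sb.AppendLine("        public static Regex " + info.Name + "Regex()");
            sb.AppendLine("        {");
            sb.AppendLine("            lock (locker)");
            sb.AppendLine("            {");
            sb.AppendLine("                if (" + field + " == null)");
            sb.AppendLine("                    " + field + " = new RL." + info.Name + "();");
            sb.AppendLine("                return " + field + ";");
            sb.AppendLine("            }");
            sb.AppendLine("        }");
        }

        sb.AppendLine("    }");
        sb.AppendLine("}");
        return sb.ToString();
    }

    // Hypothetical OutFile-style helper: writes the generated source to disk.
    public static void OutFile(string path, string wrapperNamespace, string className,
                               string regexNamespace, IEnumerable<RegexCompilationInfo> items)
    {
        Directory.CreateDirectory(Path.GetDirectoryName(Path.GetFullPath(path)));
        File.WriteAllText(path, BuildSource(wrapperNamespace, className, regexNamespace, items));
    }
}
```

Plain string building keeps the generator dependency-free; CodeDOM would also work but is considerably more verbose for a file this regular.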

In our environment, each of these options was necessary. I don't expect everyone to want or need all of this, but the point of this exercise is to be mindful of what you initialize in your static classes, or in any class for that matter. If something should be lazy-loaded, make it lazy-loaded, and employ the Singleton pattern where appropriate. Also note that the precompilation step has to run before your main build, because the generated assembly is referenced by your main project.

If your project has only a few regular expressions, by all means just use RegexOptions.Compiled and skip the precompilation (but at least use lazy loading). Keep in mind, though, that as your project grows in complexity, your needs will grow with it. Consider this when designing your next large-scale project.
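For the small-project case, here is a minimal sketch of lazy loading without a precompiled assembly. It swaps the explicit lock used in the wrapper for Lazy&lt;T&gt;, which requires .NET Framework 4.0 or later (so it was not an option for Community Server 2008). The AppRegex class and its Spacer and SpacerCreated members are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading;

public static class AppRegex
{
    // The Regex is not constructed (or compiled to IL) until first use.
    // ExecutionAndPublication ensures only one instance is ever created,
    // even when several threads ask for it at the same time.
    private static readonly Lazy<Regex> spacer = new Lazy<Regex>(
        () => new Regex(@"\s{2,}", RegexOptions.Compiled),
        LazyThreadSafetyMode.ExecutionAndPublication);

    public static Regex Spacer
    {
        get { return spacer.Value; }
    }

    // Exposed only to show that construction really is deferred.
    public static bool SpacerCreated
    {
        get { return spacer.IsValueCreated; }
    }
}
```

Usage is the same as with the generated wrapper, e.g. AppRegex.Spacer.Replace(text, " ") to collapse runs of whitespace. The cost of RegexOptions.Compiled is still paid at runtime, but only for the expressions that are actually used.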

[ Download a sample project for building these files ]
