Jon Michael Galindo

~ writing, programming, art ~

3 January 2016

(Note: This blog will now update at least once per week, and sometimes more often.)


JavaScript: Regular Expressions Part 2


Picking up where we left off with a script color-coder, let's see if we can match "var" a little more reliably.

The Backslash: \


In RegExp, using a backslash before a character changes its meaning.

Specifically, "b" normally matches the character "b". However, "\b" instead matches a "word boundary": any position between a word character (a letter, a digit, or an underscore) and a non-word character, or at the start or end of the string next to a word character. That works perfectly for identifying "var":

var d = document.getElementById('code');
// Wrap every standalone "var" in a blue span.
d.innerHTML = d.innerHTML.replace(/\bvar\b/g, "<span style='color:blue;'>var</span>");
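To try the pattern without a web page, here is a small sketch that applies the same replacement to plain strings. The helper name colorVar is hypothetical, not part of the coloring script. Notice that the underscore counts as a word character, so "my_var" is left alone.

```javascript
// Hypothetical helper: wrap every standalone "var" in a blue span,
// using the same pattern as the innerHTML example.
function colorVar(src) {
  return src.replace(/\bvar\b/g, "<span style='color:blue;'>var</span>");
}

console.log(colorVar("var x = 1;"));
// <span style='color:blue;'>var</span> x = 1;
console.log(colorVar("variable = 2;")); // unchanged: "var" is followed by a letter
console.log(colorVar("my_var = 3;"));   // unchanged: "_" is a word character too
```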

Now "var" turns blue only when it stands alone. But there are still problems. For example:

var t = "this is a var";

The RegExp I mentioned above will match "var" in that string, even though we don't want it to.
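You can confirm this outside a web page with match(), which, given the g flag, returns every match in an array. The sketch below treats that line of code as a plain string; matchAll() (available in modern browsers and Node.js) also reports where each match starts.

```javascript
// The line of code from above, stored as a plain string.
const line = 'var t = "this is a var";';

// With the g flag, match() returns an array of every match.
const hits = line.match(/\bvar\b/g);
console.log(hits.length); // 2: the keyword AND the word inside the quotes

// matchAll() yields one result per match; .index is where it starts.
const starts = [...line.matchAll(/\bvar\b/g)].map((m) => m.index);
console.log(starts); // [0, 19]
```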

The truth is, there's no good way around this with regular expressions alone. They are not context-aware: they find patterns, but they can't do something like "match var only when it stands alone and is preceded by an even number of quotation marks, ignoring those which occur between /* and */ delimiters."

That being said, the coloring script I use on this page relies almost exclusively on regular expressions. It uses a single variable to track multi-line comments, but everything else is RegExp, and you can see that the "var" in that string is gray, not blue.

To get that, we have to cheat a little, and that's just fine. In fact, the RegExp I used did match the "var" in that string. The HTML it produced looks like this:

<span class="gray"> ... <span class="blue">var</span> ... </span>

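I just added a CSS rule that keeps any blue span nested inside a gray span gray, so the blue "var" inside the string doesn't show up. (This only works if the spans use class names rather than inline style attributes, because an inline style would override the stylesheet.)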

span.gray { color: gray; }
span.blue { color: blue; }
/* A blue span inside a gray span stays gray. */
span.gray span.blue { color: gray; }

And it works pretty well.
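How do the nested spans come about? Here is a minimal sketch, assuming a hypothetical two-rule colorizer (the function name colorize is made up; it is not the script this page uses). It wraps double-quoted strings first and keywords second. Two pieces are additions for this example: the pattern /"[^"]*"/g (a quote, any run of non-quote characters, then a closing quote) and the $& replacement code, which replace() treats as "insert the whole match here".

```javascript
// Hypothetical two-rule colorizer, used only to show how nesting happens.
function colorize(src) {
  // Rule 1: wrap each double-quoted string in a gray span.
  // [^"]* means "any run of characters except a quote"; $& inserts the whole match.
  let out = src.replace(/"[^"]*"/g, '<span class="gray">$&</span>');
  // Rule 2: wrap each standalone "var" in a blue span, even inside strings.
  out = out.replace(/\bvar\b/g, '<span class="blue">var</span>');
  return out;
}

console.log(colorize('var t = "this is a var";'));
// <span class="blue">var</span> t = <span class="gray">"this is a <span class="blue">var</span>"</span>;
```

The order of the rules matters here: if the keyword rule ran first, the quotes in class="blue" would become part of the text, and the string rule would start matching them.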

So, now that you know we have to cheat, let's build the simplest cheat possible. To do it, we'll need three more tools. First:

(Capturing Parentheses)


The string replace() method can actually take pieces of the matched text and insert them into the replacement text. Remember our nonsense string? What if we wanted to replace "moo", "foo", and the rest with "maa", "faa", and so on?

We want to keep the first letter and change the vowels, but we can't just match "oo", because we want to leave things like "Hoover" alone. We'll use capturing parentheses, and a replacement code:

var nonsense = "moo foo boo too Hoover";
// Group 1 captures the first letter; "$1aa" writes it back followed by "aa".
nonsense = nonsense.replace(/([mfbt])oo/g, "$1aa");
// nonsense now reads "maa faa baa taa Hoover"

The two things to note are the parentheses around [mfbt] and the $1 in the replacement string. $1 tells replace(): insert whatever the first set of capturing parentheses matched here. $2 would mean the second set, $3 the third, and so on.
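To see $2 in action, here is a small variation (made up for illustration) that captures the first letter and the vowels in separate groups and writes them back in swapped order:

```javascript
const nonsense = "moo foo boo too Hoover";
// Group 1 captures the first letter, group 2 captures "oo".
// "$2$1" writes the groups back in the opposite order.
const swapped = nonsense.replace(/([mfbt])(oo)/g, "$2$1");
console.log(swapped); // "oom oof oob oot Hoover"
```

"Hoover" is still untouched, because "H" is not in [mfbt].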

If we had put parentheses around the whole thing, we could do this:

var nonsense = "moo foo boo too Hoover";
// Group 1 now captures the whole match, so "aa" is appended after it.
nonsense = nonsense.replace(/([mfbt]oo)/g, "$1aa");
// nonsense now reads "mooaa fooaa booaa tooaa Hoover"

Before we can really cheat, we need just two more simple tools:

The . and the +


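The period (dot) matches any single character except a line break, and the + matches the preceding item (a character, a character class like [mfbt], or a parenthesized group) one or more times. Put together, .+ matches a run of one or more characters on a single line.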
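A few quick checks make the two rules concrete. The pattern /a.+c/ below is just an illustration: an "a", at least one non-line-break character, then a "c".

```javascript
// "a", then one or more non-line-break characters, then "c".
const re = /a.+c/;

console.log(re.test("abc"));   // true:  "." matches "b", "+" is satisfied once
console.log(re.test("abbbc")); // true:  "+" repeats "." three times
console.log(re.test("ac"));    // false: "+" needs at least one character
console.log(re.test("a\nc"));  // false: "." does not match a line break
```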

So now we can write a regular expression for shorthand coloring tags:

var html = "@B This is some text for a blue span. B@";
// Capture everything between "@B" and "B@" and wrap it in a blue span.
html = html.replace(/@B(.+)B@/g, "<span class='blue'>$1</span>");
// html now reads "<span class='blue'> This is some text for a blue span. </span>"
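One caveat, sketched below with made-up text: .+ is "greedy", meaning it grabs as many characters as it can and backs up only as far as it must. With two tags on one line, a single match therefore runs from the first @B to the last B@. Putting a ? after the + makes it "lazy" (standard JavaScript syntax, though not covered in this series yet), so each match stops at the nearest B@.

```javascript
const two = "@B one B@ and @B two B@";
const span = "<span class='blue'>$1</span>";

// Greedy: a single match from the first "@B" to the last "B@".
const greedy = two.replace(/@B(.+)B@/g, span);
console.log(greedy);
// <span class='blue'> one B@ and @B two </span>

// Lazy: each match stops at the nearest "B@".
const lazy = two.replace(/@B(.+?)B@/g, span);
console.log(lazy);
// <span class='blue'> one </span> and <span class='blue'> two </span>
```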

Honestly, that's basically good enough. You can color-code your text as much as you want using a few regular expressions, and you don't have to write out the HTML code every time.

Of course, it's not really what we wanted, but it works. For the complicated parts, where you can't figure out an appropriate regular expression (or where none exists), you now know how to cheat.

Nevertheless, we still want to match as much as we can without cheating. So, on to a bit of programming and some lists of words! Next week... :-)



© Jon Michael Galindo 2015