Published: Thu 21 January 2021
This is the story of a hack.
It's not anything wrong or bad. It works quite well, but it just has that... quality. The one where you see it and you laugh in amused disgust.
This is the story of how I made python do regexes when it shouldn't do regexes.
I've blogged about littlecheck before. It's fish's script test driver. The way it works is that you write a script, and then you write the output you expect into # CHECK: comments inside the script.
Littlecheck then lets whatever interpreter you picked run the script and compares its output to all the # CHECK: lines. This is super simple and works quite well in practice. Here's an example:
# CHECK: Hello!
# CHECK: Goodbye
echo No check for this
# ^^ Oh no, that one will fail.
Only... when things went wrong, littlecheck did this naive comparison where it complained about the first line that was wrong, and then let you figure out the context of all of that.
Was it a superfluous line of output? A # CHECK too many? Or actually a line that was different from what was expected?
It didn't tell you. It just said "this line doesn't look like this CHECK on line XYZ, also here's the rest of the output".
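To make the problem concrete, here is a minimal sketch of that kind of naive comparison. It is not littlecheck's actual code: the function name `first_mismatch` is made up for illustration, and I'm assuming each check is a regex that has to match the whole line.

```python
import re


def first_mismatch(output_lines, check_patterns):
    """Return the index of the first output line that fails its check, or None."""
    # Walk both lists in lockstep and stop at the first pair that disagrees.
    for index, (line, pattern) in enumerate(zip(output_lines, check_patterns)):
        if re.fullmatch(pattern, line) is None:
            return index
    # Leftover output or leftover checks also count as a failure.
    if len(output_lines) != len(check_patterns):
        return min(len(output_lines), len(check_patterns))
    return None


output = ["Hello!", "an unexpected extra line", "Goodbye"]
checks = ["Hello!", "Goodbye"]
failing_index = first_mismatch(output, checks)
```

Here `failing_index` points at the extra line, but nothing in the result says whether that line is superfluous or just different. The "Goodbye" that would have matched one line later never gets considered.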
Now, if you've been around unix a few times, you might know what this is a problem for: diff!
You have a bunch of lines on one side, a bunch on the other and you want to know what the diff-erence is between the two, so you run diff on them!
Well, yeah, you would. Only there's a problem: Littlecheck does regexes.
The diff utility doesn't handle regexes, and we wouldn't want to launch it anyway.
But Littlecheck is written in python, and that has a lot of stuff in the standard library. Maybe there's a diffing tool?
Oh, there's a difflib module.
And python can do regexes - that's what we use to match them in the first place. So let's just pass a comparator to the "SequenceMatcher" thing and be done with it.
SequenceMatcher doesn't take a comparator function, or key, or something comparable (hah!).
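A quick sketch of what that means in practice, using plain strings on both sides. The sample lines are invented; the only real API here is difflib's `SequenceMatcher(isjunk, a, b, autojunk)` constructor and its `get_opcodes()` method.

```python
from difflib import SequenceMatcher

# Real output on one side, the expected lines (one of them a regex) on the other.
output = ["Hello!", "took 42 ms"]
expected = ["Hello!", r"took \d+ ms"]

# There is no comparator or key parameter to hand a regex matcher to;
# elements are compared with plain equality.
matcher = SequenceMatcher(None, output, expected, autojunk=False)
opcodes = matcher.get_opcodes()
```

The literal line comes back as "equal", but the regex line is reported as a "replace", because the pattern text is compared as a string and never run as a pattern.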
Where we're going we don't need comparator functions.
It turns out SequenceMatcher takes its arguments as a list. So how about instead of passing strings we pass the regex objects? No, that won't work because the other side is still strings, so when it compares the two it'll just always be false.
So what if we overload the list's __contains__ function? Well, no, that won't work because SequenceMatcher takes the list and puts the elements into a dictionary that we don't control.
So... how about we override the equality operator? Well, not quite. Since it's a dictionary it first tries hash comparisons. Luckily, that goes via the __hash__ function, so I can write the most awful python I have ever written:
def __hash__(self):
    # Chosen by fair diceroll
    # No, just kidding.
    # Any constant will do: every object has to land in the same bucket.
    return 0
This makes the hash comparison always collide, so whenever python checks if something is in the dictionary it'll have to take the long route and do an actual comparison. After that, we hack the __eq__ function to do a regex match:
def __eq__(self, other):
    if other is None:
        return False
    if isinstance(other, CheckCmd):
        # Compare against a check by running its regex over our text
        return other.regex.match(self.text)
    if isinstance(other, Line):
        # We only compare the text here so SequenceMatcher can reshuffle these
        return self.text == other.text
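Put together, the trick can be tried out in isolation. This is a standalone sketch, not the code from littlecheck: `Check` is a hypothetical stand-in for CheckCmd, both classes are stripped down to the bare minimum, and I'm assuming a check has to match the whole line.

```python
import re
from difflib import SequenceMatcher


class Check:
    """Hypothetical stand-in for an expected-output pattern."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.regex = re.compile(pattern)

    def __hash__(self):
        # Same constant as Line, so dictionary lookups fall through to __eq__.
        return 0

    def __eq__(self, other):
        if isinstance(other, Check):
            return self.pattern == other.pattern
        if isinstance(other, Line):
            return self.regex.fullmatch(other.text) is not None
        return NotImplemented


class Line:
    """One line of actual output."""

    def __init__(self, text):
        self.text = text

    def __hash__(self):
        return 0

    def __eq__(self, other):
        if isinstance(other, Check):
            return other.regex.fullmatch(self.text) is not None
        if isinstance(other, Line):
            return self.text == other.text
        return NotImplemented


lines = [Line("Hello!"), Line("surprise"), Line("took 42 ms")]
checks = [Check(r"Hello!"), Check(r"took \d+ ms")]

matcher = SequenceMatcher(a=lines, b=checks, autojunk=False)
opcodes = matcher.get_opcodes()
```

With this in place, SequenceMatcher reports the first and last lines as "equal" to their checks and the middle one as a "delete", which is exactly the "superfluous line of output" answer the naive comparison couldn't give. The price is that every dictionary lookup degrades into a linear scan of regex matches, which is fine for a test driver and awful for anything else.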
From there it's smooth sailing putting lipstick on this pig... boat, and this is what it looks like:
And that's the way I like it.