TIL: Or "Today I Remembered" the role of a comma in a tuple
I hope I am not the only one who encloses all or most of their tuples with parentheses, to the point that seeing something like this:
tuple_of_things = "one_thing", "another_thing", "final_thing"
makes me feel uneasy.
Well, today I learned (or rather, remembered) the role of a comma in a tuple.
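As a quick refresher, here is a small sketch of that role. The variable names are made up for illustration; the point is that the parentheses only group, while the comma is what builds the tuple.

```python
# The comma builds the tuple; the parentheses are optional grouping.
without_parens = "one_thing", "another_thing"
with_parens = ("one_thing", "another_thing")

# A trailing comma is enough to make a one-element tuple...
single = "one_thing",
# ...while parentheses alone leave the value as a plain string.
not_a_tuple = ("one_thing")

print(type(without_parens).__name__)  # tuple
print(without_parens == with_parens)  # True
print(type(single).__name__)          # tuple
print(type(not_a_tuple).__name__)     # str
```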
Panic sets in when the regular expression doesn't match
Imagine my surprise when I was shown a regular expression that failed to extract an S3 bucket ID. By convention, the application used bucket IDs composed of a well-known prefix followed by a dash and 5 to 12 digits. This seemed like something a straightforward regular expression pattern could match:
import re
haystack = "arn:aws:s3:::bucket-1234567890/path/to/key"
bucket_prefix = "bucket"
result = re.search(f"{bucket_prefix}-[0-9]{5,12}", haystack)
print(result)
No match, and no exception raised either: the script just prints None.
Looking for help
Naturally, I did what everyone does when debugging regular expressions: I went to the first regular expression tester that popped up online, made sure it supported the Python flavor, and pasted in what I thought was the pattern I was matching. Of course, I had to resolve the f-string by hand when copying the pattern to the online tool, which gave me bucket-[0-9]{5,12}, and the tester matched it without complaint.
Good! I am still worthy… At least for simple regular expressions like this. But this doesn't answer the original question: Why is the code not matching then?
Of course! The comma makes it a tuple!
Finally, the realization hit as I noticed that the special characters {5,12} were missing an extra set of curly braces. Without the doubled braces for escaping, the f-string treats {5,12} as a replacement field and resolves the string as "bucket-[0-9](5, 12)", because 5,12 is a tuple!
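One way to catch this sooner, sketched here with the same values as the snippet above, is to print the resolved f-string before handing it to re.search. The names broken_pattern and intended_pattern are mine, for illustration only.

```python
import re

haystack = "arn:aws:s3:::bucket-1234567890/path/to/key"
bucket_prefix = "bucket"

# Single braces: Python evaluates the expression 5,12 (a tuple) and formats it.
broken_pattern = f"{bucket_prefix}-[0-9]{5,12}"
# What I pasted into the online tester: a plain string, no f-string involved.
intended_pattern = "bucket-[0-9]{5,12}"

print(broken_pattern)    # bucket-[0-9](5, 12)
print(intended_pattern)  # bucket-[0-9]{5,12}

# The broken pattern asks for one digit followed by the literal text "5, 12".
print(re.search(broken_pattern, haystack))  # None
```

Seeing the two strings side by side makes the mismatch obvious in a way that staring at the f-string did not.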
For completeness' sake, here is the working code:
import re
haystack = "arn:aws:s3:::bucket-1234567890/path/to/key"
bucket_prefix = "bucket"
result = re.search(f"{bucket_prefix}-[0-9]{{5,12}}", haystack)
print(result)
Which now gives us a match:
<re.Match object; span=(13, 30), match='bucket-1234567890'>
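If doubled braces feel easy to forget, one alternative is to keep the quantifier out of the f-string entirely. This is my own sketch, not what the application in question did, and the helper name bucket_id_pattern is hypothetical. It also escapes the prefix with re.escape, on the assumption that a prefix might one day contain a regex metacharacter.

```python
import re


def bucket_id_pattern(prefix: str) -> re.Pattern:
    """Build the pattern by concatenation so no braces pass through an f-string."""
    # re.escape guards against prefixes containing regex metacharacters.
    return re.compile(re.escape(prefix) + r"-[0-9]{5,12}")


haystack = "arn:aws:s3:::bucket-1234567890/path/to/key"
match = bucket_id_pattern("bucket").search(haystack)
print(match.group(0) if match else None)  # bucket-1234567890
```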
Wrapping up
Not quite a "Today I Learned" but a "Today I Remembered" as I am pretty sure I learned very early on in my Python studies that it is the comma that makes the tuple, not the parentheses. What's more, the docs call it out explicitly, and I have those pages open pretty much all day! However, this detail doesn't come up a lot when writing code, at least with my policy of wrap-all-tuples-in-parentheses.
This post is not about a ground-breaking discovery, but I felt good when remembering a little fact I learned all those years ago when I was an aspiring Pythonista.
Always be (re-)learning!