Regular expressions reference

How to search, extract and replace text in Gyana.

Regular expressions (or "regex") are a system for working with text data.

If you've hit the limitations of functions like substitute, upper or strip, you might find regular expressions useful. Think of it like advanced search and replace.

💡 There is a learning curve, but you'll unlock the ability to do pretty much anything you can imagine with text, and maybe even save the day.

In this guide, we'll give you a brief overview of how regular expressions work (with links), and show you how use the Gyana functions regex_extract, regex_replace regex_search.

Regular expressions

Suppose you have the sentence "The quick brown fox jumps over the lazy dog", and you wanted to replace all occurrences of the word "dog" with "cat". You can do that with a standard search and replace, e.g. using the substitute function.

But now suppose you wanted to replace "all words that start with 'd'" with "cat". Standard search replace won't work, because there are lots of words that start with 'd'.

That's where we need a regular expression. A regular expression is how we describe something like "all words that start with 'd'", in a way a computer can understand. It's a language for describing patterns in text, which the computer will then go and search for.

In this case, the regular expression we want is \sd[a-z]+. Here's why:

\s matches a space (typically before a new word)
d matches the letter d
[a-z]+ matches a string of lowercase letters one or more times

Taken together, the computer will look for "a space, followed by a d, followed by one or more lowercase letters" - which is the same as saying "all words that start with 'd'".

Designing a regex for your specific problem will take a few minutes of trial and error. Here are a few resources we recommend:

A guide for learning regex at RegexOne
Regex101, an interactive editor where you can prototype your regex
A syntax reference for re2, the regex engine used by Gyana

Regex functions

Now that we've covered the basics, here's how to think about the regex functions in Gyana.

regex_search will return true if the text matches the regex pattern. Primarily useful for filtering data, e.g. to filter out records that contain a valid zip code or email address.

regex_extract will extract the specific piece of text in the regex pattern. This is great for cleaning data, e.g. extracting emails or phone numbers. Since a regex can have multiple matches, you use the index argument to decide which match to keep (e.g. first, second, ...).

regex_replace will replace the matched text with a replacement you define - think of it as advanced search and replace. If you want to refer to the original text in your replacement, use \0 - for example, if the matched text is dog and the replacement is \0s, the final result is dogs.

If you ever get stuck, bear in mind that even seasoned programmers get tripped up by regex - you're not alone! If you have any issues, just reach out via the support.