# Introduction

A Finite State Automaton (or just FSA) is a mathematical model of computation. A regular expression (or RegEx for short) is the algebraic representation of an FSA.

In less abstract terms, a regular expression is a description for a pattern of text.

Regular expressions are helpful when it comes to checking for certain patterns. With patterns, you can do many things such as complex replacing (way more powerful than Ctrl+F), extremely simple input validation, finding specific patterns amongst extremely large amounts of data very quickly, and data extraction / parsing. You can also write cool, unintelligible expressions like

^(?:\s*(?:=.*?=|<.*?>|$.*?]|$$.*?$$|\{.*?})\s*)* (?:[^\[|$(){}<>=]*\s*\|\s*)?([^$|$(){}<>=]*)

## 2. Which of the following strings would be accepted / matched by the regex below? List all correct answers.

/abb*(a|b*)a(b*|a*)/

1. ababbaab
2. ababa
3. aaabb
4. abbbbab
5. abbaababbaa

We can do this by just looking at all 5 of the strings.

Let's look at #1. /abb*(a|b*)a/ can match aba just fine. However, /(b*|a*)/, which produces only as or only bs, cannot match bbaab. Hence, #1 doesn't match.

A similar idea applies to #2; /(b*|a*)/ cannot match ba.

As for #3, it doesn't match because the string must start with ab.

For #4, it would match. /abb*(a|b*)a/ can display abbbba. The final b can be displayed by /(b*|a*)/, so #4 matches.

The last option would not be valid because babbaa cannot be produced by /(b*|a*)/.

So, only #4 would actually be accepted.

Another (probably better) way to do this problem would be to translate the regex into plain English and do it intuitively. This particular regex means:

Regex English
/abb*/ Matches ab literally, followed by b 0 or more times
/(a|b*)/ Matches either a literally or b 0 or more times. We can simplify this into just an /a?/ because we know /b*b*/ (if we choose 2nd option) is just /b*/, meaning that this part matcher either a or nothing.
/a/ Matches a literally
/(b*|a*)/ Matches either a 0 or more times or b 0 or more times

Then we can compare our plain-English description against the different strings, in left-to-right order.

Here is a brief description of how you might do this intuitively:

String Explanation
ababbaab ab literally, followed by b 0 times. Although the next character is an a, we will not match with that and will instead match with b 0 times so that we don't fail the next part. Then a literally. Last part can't match remaining bbaab.
ababa ab literally, followed by b 0 times. Although the next character is an a, we will not match with that and will instead match with b 0 times so that we don't fail the next part. Then a literally. Last part can't match remaining ba.
aaabb Can't match ab literally.
abbbbab ab literally, followed by b 3 times. Although the next character is an a, we will not match with that and will instead match with b 0 times so that we don't fail the next part. Then a literally. Last part matches remaining b.
abbaababbaa ab literally, followed by b 1 time. Match with optional a and not b*. Then a literally. Last part can't match remaining babbaa.

## 3. Determine what strings would be accepted by the following FSA. Make your answer general. (So basically, give us a regular expression.)

The fork in the FSA represents a union. If we were to take the upper "path", we would have a string of /10*1101*10*0/. If we took the lower path, we'd have /10*110*10*0/. Notice where they are the same and where they differ.

They both share /10*11/ at the beginning as well as /10*0/ at the end. The middle conflicting portion can be written as a union in our final regex.

So, we would have /10*11(01*|0*)10*0/ as our regex.

## 4. Which of the given strings would be accepted by the following FSA?

1. /a(ba*aba|aba)|(aba*b)/
2. /a(ba*a(ba|ab)a)|aba*ab/
3. /a((ba*a(ba|ab)a)|(aba*ab))/
4. /a((ba*aba|aba)|aba*ab)/
5. None of the above

For this problem, it would be easier to analyze the FSA rather than looking at the strings one by one. In this FSA, there are two unions that we would have to keep in mind.

Regardless of what "path" is taken, a would always start the string. Now, we have a union; we can analyze the two subpaths separately. The upper path would produce either /ba*abaa/ or /ba*aaba/; we can write this as /ba*a(ab|ba)a/.

The lower path would produce /aba*ab/.

Now, we can put these together to get our overall regex of /a((ba*a(ab|ba)a)|(aba*ab))/. Note that many parentheses were used to clarify what the different union alternatives are. This regex matches what is written for #3; hence, #3 is our answer.

Authors: Kelly Hong, Raymond Zhao