$\newcommand{\defeq}{\mathrel{\mathop:}=}$

## 2008/08/03

### Regular Expressions and Basic Computability Theory in Haskell

1. x ∈ Σ 是個 regular expression；
2. ε 是個 regular expression；
3. ∅ 是個 regular expression；
4. RS 是 regular expressions，則 R ∪ S 也是個 regular expression；
5. RS 是 regular expressions，則 RS 也是個 regular expression；
6. R 是個 regular expression，則 R* 也是個 regular expression。

data RegExp a = Literal a
| Epsilon
| Empty
| Union (RegExp a) (RegExp a)
| Append (RegExp a) (RegExp a)
| Kleene (RegExp a)


### Regular Expression Matching

Literal x 這樣的 regex 只能匹配「恰有一個元素 x 的 list」。其他的 lists 都不對。

match (Literal x) [y] | y == x = True
match (Literal x) _ = False


match Epsilon xs = null xs


match Empty _ = False


match (r Union s) xs = match r xs || match s xs


match (Kleene r) xs = match (Epsilon Union (r Append Kleene r)) xs


splitAt 4 [1..10] = ([1,2,3,4],[5,6,7,8,9,10])


[splitAt 0 xs, splitAt 1 xs, ..., splitAt (length xs) xs]

flip 改寫一下
[flip splitAt xs 0, flip splitAt xs 1, ..., flip splitAt xs (length xs)]


map (flip splitAt xs) [0..length xs]


map (\(xs, ys) -> match r xs && match s ys)


map (uncurry (&&) . (match r *** match s))


match (r Append s) xs =
or $map (\(xs, ys) -> match r xs && match s ys)$
map (flip splitAt xs) [0..length xs]


### A Semantic Function

match :: (Eq a) => RegExp a -> [a] -> Bool


match :: (Eq a) => RegExp a -> ([a] -> Bool)


### Another Semantic Function

enumerate :: RegExp a -> [[a]]


enumerate (Literal x) = [[x]]
enumerate Epsilon = [[]]
enumerate Empty = []

Kleene star 也用和 match 相同的伎倆。
enumerate (Kleene r) = enumerate (Epsilon Union (r Append Kleene r))


enumerate (r Union s) = enumerate r ++ enumerate s


enumerate (Kleene (Literal '0') Union Literal '1')


enumerate (Kleene (Literal '0'))


finite (Literal _) = True
finite Epsilon = True
finite Empty = True
finite (Union r s) = finite r && finite s
finite (Append r s) = finite r && finite s
finite (Kleene _) = False

finite rfinite s 有一個成立時，就把有窮的那一個 list 放在前面；如果兩個 lists 都是 infinite，我們就交替走訪這兩個 lists。如此便可達成 enumerability，讓每一個元素在有窮時間內出現。
enumerate (r Union s)
| finite r  = enumerate r ++ enumerate s
| finite s  = enumerate s ++ enumerate r
| otherwise = alternate (enumerate r) (enumerate s)
where alternate (x : xs) (y : ys) = x : y : alternate xs ys


enumerate (r Union s) = alternate (enumerate r) (enumerate s)
where alternate (x : xs) (y : ys) = x : y : alternate xs ys
alternate xs [] = xs
alternate [] ys = ys


enumerate (r Append s) = [xs ++ ys | xs <- enumerate r, ys <- enumerate s]


enumerate renumerate s 視為兩個座標軸，我們的目標是讓平面上的任一點在有窮時間內出現。這正是有理數的列舉方式，即「有理數集可數」證明的核心。然而這樣的 traversal 在 Haskell 該怎麼寫呢？

[1,1,2,1,2,3,1,2,3,4,...]


['A','B','A','C','B','A','D','C','B','A',...]

zip 把這兩個 lists 拉在一起就是我們要的 enumeration。如果兩個 lists 都是 infinite lists，這兩個觀察順序就很單純。對於一個 list xs，第一種觀察順序可以寫成
concat $map (flip take xs) [1..]  第二種則可以寫成 concat$ map (reverse . flip take xs) [1..]


concat $unfoldr f (xs, []) where f (y : ys, zs) = Just (y : zs, (ys, y : zs))  對於 finite lists，這兩個觀察順序就比較麻煩，特別是當兩個 lists 都有窮的時候。但只要有一個 finite list，我們就可以把它放在內層迴圈而輕鬆達成 enumerability。所以整個 r Append s 的情況就寫成 enumerate (r Append s) | finite s = [xs ++ ys | xs <- enumerate r, ys <- enumerate s] | finite r = [xs ++ ys | ys <- enumerate s, xs <- enumerate r] | otherwise = zigzag (enumerate r) (enumerate s) where zigzag xs ys = zipWith (++) (concat$ map (flip take xs) [1..])
(concat \$ unfoldr f (ys, []))
f (y : ys, zs) = Just (y : zs, (ys, y : zs))


### Matching via Enumeration

match' r xs = xs elem enumerate r


--

--

Labels: ,