Parsec Tutorial: How to Build a Text Parser in Haskell
Parsing the output of derived Show
The polymorphic function show returns a string representation of any data type that is an instance of the type class Show. The easiest way to make a data type an instance of a type class is to use the deriving clause.
    show :: Show a => a -> String

    data D = D ... deriving (Show)

    d :: D
    d = D ...

    str :: String
    str = show d -- string representation of the value d
In this tutorial we build a parser that takes the output of show and returns it in XML format.
    parseShow :: String -> String

    xml = parseShow $ show res
Example data type
First we create a data type PersonRecord:
    data PersonRecord = MkPersonRecord {
        name :: String,
        address :: Address,
        id :: Integer,
        labels :: [Label]
    } deriving (Show)
The types Address and Label are defined as follows:
    data Address = MkAddress {
        line1 :: String,
        number :: Integer,
        street :: String,
        town :: String,
        postcode :: String
    } deriving (Show)

    data Label = Green | Red | Blue | Yellow deriving (Show)
We derive Show using the deriving clause. The compiler will automatically create the show function for this data type. Our parser will parse the output of this automatically derived show function. Then we create some values of type PersonRecord:
    rec1 = MkPersonRecord
        "Wim Vanderbauwhede"
        (MkAddress "School of Computing Science" 17 "Lilybank Gdns" "Glasgow" "G12 8QQ")
        557188
        [Green, Red]

    rec2 = MkPersonRecord
        "Jeremy Singer"
        (MkAddress "School of Computing Science" 17 "Lilybank Gdns" "Glasgow" "G12 8QQ")
        42
        [Blue, Yellow]
Printing the list of records shows the derived format:
    main = putStrLn $ show [rec1,rec2]

    [wim@workai HaskellParsecTutorial]$ runhaskell test_ShowParser_1.hs
    [MkPersonRecord {name = "Wim Vanderbauwhede", address = MkAddress {line1 = "School of Computing Science", number = 17, street = "Lilybank Gdns", town = "Glasgow", postcode = "G12 8QQ"}, id = 557188, labels = [Green,Red]},MkPersonRecord {name = "Jeremy Singer", address = MkAddress {line1 = "School of Computing Science", number = 17, street = "Lilybank Gdns", town = "Glasgow", postcode = "G12 8QQ"}, id = 42, labels = [Blue,Yellow]}]
The derived Show format can be summarized as follows:
- Lists: [ ... comma-separated items ... ]
- Records: constructor name followed by { ... comma-separated key = value pairs ... }
- Strings: "..."
- Algebraic data types: the constructor name of the variant
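To see each of these shapes concretely, here is a small sketch. The types Colour and Point are hypothetical and not part of the tutorial; they exist only to exercise every case of the derived format. The comments show what the derived show produces.

```haskell
-- Hypothetical types, used only to illustrate the derived Show format.
data Colour = Crimson | Teal deriving (Show)

data Point = MkPoint { px :: Integer, label :: String } deriving (Show)

main :: IO ()
main = do
  putStrLn (show [1, 2, 3 :: Integer])    -- [1,2,3]
  putStrLn (show (MkPoint 4 "origin"))    -- MkPoint {px = 4, label = "origin"}
  putStrLn (show "quoted")                -- "quoted"
  putStrLn (show [Crimson, Teal])         -- [Crimson,Teal]
  putStrLn (show (1 :: Integer, "pair"))  -- (1,"pair")
```

Note that the derived format puts no space after the commas in lists and tuples, but does put spaces around = and after the commas inside records. A parser for this format therefore has to tolerate optional whitespace between tokens.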
Building the parser
We create a module ShowParser which exports a single function parseShow:
    module ShowParser ( parseShow ) where
Some boilerplate:
    import Text.ParserCombinators.Parsec
    import qualified Text.ParserCombinators.Parsec.Token as P
    import Text.ParserCombinators.Parsec.Language
The Parsec.Token module provides a number of basic parsers. Each of these takes as argument a lexer, generated by makeTokenParser using a language definition. Here we use emptyDef from the Language module. It is convenient to create a shorter name for the predefined parsers you want to use, e.g.
    parens = P.parens lexer
    -- and similar
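The lexer setup described here could look like the sketch below. The tutorial only shows parens, so the definition of lexer and the other short names (brackets, braces, commaSep, integer) are assumptions about what "and similar" covers; the type signatures are added so the definitions stay polymorphic.

```haskell
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)

-- A lexer built from the empty language definition (assumed definition).
lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Short names for the token parsers; each one also skips trailing whitespace.
parens, brackets, braces :: Parser a -> Parser a
parens   = P.parens lexer
brackets = P.brackets lexer
braces   = P.braces lexer

commaSep :: Parser a -> Parser [a]
commaSep = P.commaSep lexer

integer :: Parser Integer
integer = P.integer lexer

main :: IO ()
main = print (parse (brackets (commaSep integer)) "" "[1, 2, 3]")
-- prints: Right [1,2,3]
```

Because the token parsers are lexeme parsers, the spaces after the commas in the input are skipped without any explicit whitespace handling.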
The parser
The function parseShow takes the output from show (a String) and produces the corresponding XML (also a String). It is composed of the actual parser showParser and the function run_parser which applies the parser to a string.
    parseShow :: String -> String
    parseShow = run_parser showParser

    showParser :: Parser String

    run_parser :: Parser a -> String -> a
    run_parser p str = case parse p "" str of
        Left err -> error $ "parse error at " ++ (show err)
        Right val -> val
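run_parser aborts the program with error when parsing fails. As an alternative, the following sketch keeps the failure as a value so the caller can decide what to do with it. The name runParserEither is hypothetical and not part of the tutorial's module.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical variant of run_parser: a parse failure is returned
-- as Left with a message instead of raising an exception.
runParserEither :: Parser a -> String -> Either String a
runParserEither p str = case parse p "" str of
  Left err  -> Left ("parse error at " ++ show err)
  Right val -> Right val

main :: IO ()
main = do
  print (runParserEither (many1 digit) "2024")  -- Right "2024"
  print (runParserEither (many1 digit) "abc")   -- a Left value carrying the error message
```

Which style fits better depends on the caller: error is convenient in a short script, while an Either result is easier to handle in a larger program.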
The XML format
We define an XML format for a generic Haskell data structure. We use some helper functions to create XML tags with and without attributes.
Header:
<?xml version="1.0" encoding="utf-8"?>
xml_header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
Tags:
<tag> ... </tag>
    otag t = "<"++t++">"
    ctag t = "</"++t++">"
    tag t v = concat [otag t,v,ctag t]
Attributes:
<tag attr1="..." attr2="...">
    tagAttrs :: String -> [(String,String)] -> String -> String
    tagAttrs t attrs v =
        concat [
            otag (unwords $ [t]++(map (\(k,val) -> concat [k,"=\"",val,"\""]) attrs)),
            v,
            ctag t
        ]
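A quick way to check these helpers is to run them on small inputs. The sketch below repeats the definitions so that it is self-contained; the local helper attr is only a readability change, and the sample arguments are made up.

```haskell
otag, ctag :: String -> String
otag t = "<" ++ t ++ ">"
ctag t = "</" ++ t ++ ">"

tag :: String -> String -> String
tag t v = concat [otag t, v, ctag t]

tagAttrs :: String -> [(String, String)] -> String -> String
tagAttrs t attrs v =
  concat [otag (unwords (t : map attr attrs)), v, ctag t]
  where
    -- Render one attribute as key="value".
    attr (k, val) = concat [k, "=\"", val, "\""]

main :: IO ()
main = do
  putStrLn (tag "adt" "Green")                     -- <adt>Green</adt>
  putStrLn (tagAttrs "elt" [("key", "id")] "42")   -- <elt key="id">42</elt>
```

These helpers do no escaping, so a value containing characters such as < or & would end up in the output unchanged. That is fine for the example data, but worth keeping in mind for other input.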
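We also use some functions to join strings. From the Prelude:
    concat :: [[a]] -> [a] -- join lists
    unwords :: [String] -> String -- join words using spaces
In addition we define joinNL:
    joinNL :: [String] -> String -- join lines using "\n"
joinNL is defined with intercalate rather than with unlines from the Prelude, just to illustrate the use of intercalate and the Data.List module.
Parsers for the derived Show format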
Lists
[ ..., ..., ... ]
XML: <list><list-elt>...</list-elt>...</list>
    list_parser = do
        ls <- brackets $ commaSep showParser
        return $ tag "list" $ joinNL $ map (tag "list-elt") ls
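The list parser can be tried out on its own with a reduced element parser. In the sketch below, element is a stand-in for showParser that accepts only integers and nested lists, and joinNL is an assumed definition based on the remark that it uses intercalate.

```haskell
import Data.List (intercalate)
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)

lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

tag :: String -> String -> String
tag t v = "<" ++ t ++ ">" ++ v ++ "</" ++ t ++ ">"

-- Assumed definition: join lines with "\n" between them.
joinNL :: [String] -> String
joinNL = intercalate "\n"

-- Stand-in for showParser: integers and nested lists only.
element :: Parser String
element = listParser <|> fmap show (P.integer lexer)

-- Same structure as list_parser, with the lexer passed explicitly.
listParser :: Parser String
listParser = do
  ls <- P.brackets lexer (P.commaSep lexer element)
  return (tag "list" (joinNL (map (tag "list-elt") ls)))

main :: IO ()
main = putStrLn (either show id (parse listParser "" "[1,[2,3]]"))
```

The recursion between element and listParser mirrors the recursion between showParser and list_parser: a list element can be any value, including another list.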
Tuples
( ..., ..., ... )
XML: <tuple><tuple-elt>...</tuple-elt>...</tuple>
    tuple_parser = do
        ls <- parens $ commaSep showParser
        return $ tag "tuple" $ unwords $ map (tag "tuple-elt") ls
Record types
Rec { k=v, ... }
XML: <record name="Rec"><elt key="k">v</elt>...</record>
key-value pairs: k = v -- v can be anything
    record_parser = do
        ti <- type_identifier
        ls <- braces $ commaSep kvparser
        return $ tagAttrs "record" [("name",ti)] (joinNL ls)

    kvparser = do
        k <- identifier
        symbol "="
        t <- showParser
        return $ tagAttrs "elt" [("key",k)] t

    -- a constructor name: an upper-case letter followed by alphanumeric characters
    type_identifier = do
        first <- oneOf ['A' .. 'Z']
        rest <- many alphaNum
        whiteSpace
        return $ first:rest
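To make the structure of a parsed record visible, the following sketch returns the constructor name and the key-value pairs as plain data instead of XML. It is a variation for illustration, not the tutorial's code: value is a stand-in for showParser that accepts only integers and string literals, and the names typeIdentifier, field and record are hypothetical.

```haskell
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)

lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Constructor name: upper-case letter, then alphanumerics, then optional whitespace.
typeIdentifier :: Parser String
typeIdentifier = do
  c  <- upper
  cs <- many alphaNum
  P.whiteSpace lexer
  return (c : cs)

-- Stand-in for showParser: integers or string literals, rendered back with show.
value :: Parser String
value = fmap show (P.integer lexer) <|> fmap show (P.stringLiteral lexer)

-- One "key = value" pair.
field :: Parser (String, String)
field = do
  k <- P.identifier lexer
  _ <- P.symbol lexer "="
  v <- value
  return (k, v)

-- Constructor name followed by the fields in braces.
record :: Parser (String, [(String, String)])
record = do
  ti <- typeIdentifier
  fs <- P.braces lexer (P.commaSep lexer field)
  return (ti, fs)

main :: IO ()
main = print (parse record "" "MkAddress {number = 17, town = \"Glasgow\"}")
-- prints: Right ("MkAddress",[("number","17"),("town","\"Glasgow\"")])
```

One limitation shared with type_identifier: a constructor name containing an underscore or a prime would not be accepted, because only alphanumeric characters are allowed after the first letter.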
Algebraic data types
e.g. Label
XML: <adt>Label</adt>
    adt_parser = do
        ti <- type_identifier
        return $ tag "adt" ti
Quoted strings and numbers
    quoted_string = do
        s <- stringLiteral
        return $ "\""++s++"\""

    number = do
        n <- integer
        return $ show n
Complete parser
Combine all parsers using the choice combinator <|>.
    showParser :: Parser String
    showParser =
        list_parser <|> -- [ ... ]
        tuple_parser <|> -- ( ... )
        try record_parser <|> -- MkRec { ... }
        adt_parser <|> -- MkADT ...
        number <|> -- signed integer
        quoted_string <?> "Parse error"
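The reason record_parser is wrapped in try can be checked in isolation. The sketch below uses hypothetical mini-parsers (recordLike and adtLike) that, like record_parser and adt_parser, both begin by reading a constructor name.

```haskell
import Text.ParserCombinators.Parsec

-- Both alternatives start with a constructor name.
constructorName :: Parser String
constructorName = do
  c  <- upper
  cs <- many alphaNum
  return (c : cs)

-- Succeeds only when the name is followed by '{'.
recordLike :: Parser String
recordLike = do
  n <- constructorName
  _ <- char '{'
  return ("record " ++ n)

adtLike :: Parser String
adtLike = fmap ("adt " ++) constructorName

withoutTry, withTry :: Parser String
withoutTry = recordLike <|> adtLike
withTry    = try recordLike <|> adtLike

main :: IO ()
main = do
  print (parse withoutTry "" "Green")  -- Left: recordLike consumed "Green" before failing
  print (parse withTry "" "Green")     -- Right "adt Green"
```

In Parsec, p <|> q only runs q when p fails without consuming input. Wrapping p in try makes a failure look as if nothing was consumed, so the second alternative gets a chance on the same input.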
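Remember that try is used to avoid consuming the input on failure: record_parser and adt_parser both start with a constructor name, so if record_parser reads a name that is not followed by {, try rewinds the input and adt_parser can parse the same name.
Main program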
Import the parser module:
    import ShowParser (parseShow)

    rec_str = show [rec1,rec2]
    main = putStrLn $ parseShow rec_str
    [wim@workai HaskellParsecTutorial]$ runhaskell test_ShowParser.hs
    <?xml version="1.0" encoding="UTF-8"?>
    <list><list-elt><record name="MkPersonRecord"><elt key="name">"Wim Vanderbauwhede"</elt><elt key="address"><record name="MkAddress"><elt key="line1">"School of Computing Science"</elt><elt key="number">17</elt><elt key="street">"Lilybank Gdns"</elt><elt key="town">"Glasgow"</elt><elt key="postcode">"G12 8QQ"</elt></record></elt><elt key="id">557188</elt><elt key="labels"><list><list-elt><adt>Green</adt></list-elt><list-elt><adt>Red</adt></list-elt></list></elt></record></list-elt><list-elt><record name="MkPersonRecord"><elt key="name">"Jeremy Singer"</elt><elt key="address"><record name="MkAddress"><elt key="line1">"School of Computing Science"</elt><elt key="number">17</elt><elt key="street">"Lilybank Gdns"</elt><elt key="town">"Glasgow"</elt><elt key="postcode">"G12 8QQ"</elt></record></elt><elt key="id">42</elt><elt key="labels"><list><list-elt><adt>Blue</adt></list-elt><list-elt><adt>Yellow</adt></list-elt></list></elt></record></list-elt></list>
Summary
- Parsec makes it easy to build powerful text parsers from building blocks using predefined parsers and parser combinators.
- The basic structure of a Parsec parser is quite generic and reusable: a lexer, a set of small parsers and a choice between them.
- The example shows how to parse structured text (output from Show) and generate an XML document containing the same information.
Functional Programming in Haskell: Supercharge Your Coding
