How to Read a File Line by Line in Haskell
Chapter 7. I/O
It should be obvious that most, if not all, programs are devoted to gathering data from outside, processing it, and providing results back to the outside world. That is, input and output are key.
Haskell's I/O system is powerful and expressive. It is easy to work with and important to understand. Haskell strictly separates pure code from code that could cause things to occur in the world. That is, it provides a complete isolation from side effects in pure code. Besides helping programmers to reason about the correctness of their code, it also permits compilers to automatically introduce optimizations and parallelism.
We'll begin this chapter with simple, standard-looking I/O in Haskell. Then we'll discuss some of the more powerful options, as well as provide more detail on how I/O fits into the pure, lazy, functional Haskell world.
Classic I/O in Haskell
Let's get started with I/O in Haskell by looking at a program that looks surprisingly similar to I/O in other languages such as C or Perl.
```haskell
-- file: ch07/basicio.hs
main = do
       putStrLn "Greetings! What is your name?"
       inpStr <- getLine
       putStrLn $ "Welcome to Haskell, " ++ inpStr ++ "!"
```
You can compile this program to a standalone executable, run it with runghc, or invoke main from within ghci. Here's a sample session using runghc:
```
$ runghc basicio.hs
Greetings! What is your name?
John
Welcome to Haskell, John!
```
That's a fairly simple, obvious result. You can see that putStrLn writes out a String, followed by an end-of-line character. getLine reads a line from standard input. The <- syntax may be new to you. Put simply, it binds the result from executing an I/O action to a name.[15] We use the simple list concatenation operator ++ to join the input string with our own text.
Let's take a look at the types of putStrLn and getLine. You can find that information in the library reference, or just ask ghci:
```
ghci> :type putStrLn
putStrLn :: String -> IO ()
ghci> :type getLine
getLine :: IO String
```
Notice that both of these types have IO in their return value. That is your key to knowing that they may have side effects, or that they may return different values even when called with the same arguments, or both. The type of putStrLn looks like a function. It takes a parameter of type String and returns a value of type IO (). Just what is an IO (), though?
Anything that has type IO is an I/O action. You can store it and nothing will happen. I could say writefoo = putStrLn "foo" and nothing happens right then. But if I later use writefoo in the middle of another I/O action, the writefoo action will be executed when its parent action is executed; I/O actions can be glued together to form bigger I/O actions. The () is an empty tuple (pronounced "unit"), indicating that there is no return value from putStrLn. This is similar to void in Java or C.[16]
Tip | |
---|---|
Actions can be created, assigned, and passed anywhere. However, they may only be performed (executed) from inside another I/O action. |
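To make the tip concrete, here is a minimal sketch in which actions are stored in a list and passed to a function before anything is performed. The names actions and twice are made up for illustration; only putStrLn and sequence_ come from the Prelude.

```haskell
-- A sketch: I/O actions are ordinary values. Defining them performs no I/O;
-- they run only when main (or ghci) executes them.
writefoo :: IO ()
writefoo = putStrLn "foo"

-- An action can be stored in a list like any other value.
actions :: [IO ()]
actions = [writefoo, putStrLn "bar", putStrLn "baz"]

-- An action can be passed to a function (twice is a hypothetical helper).
twice :: IO () -> IO ()
twice act = do
  act
  act

main :: IO ()
main = do
  twice writefoo     -- prints "foo" two times
  sequence_ actions  -- prints foo, bar, baz, in list order
```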
Let's look at this with ghci:
```
ghci> let writefoo = putStrLn "foo"
ghci> writefoo
foo
```
In this example, the output foo is not a return value from putStrLn. Rather, it's the side effect of putStrLn actually writing foo to the terminal.
Notice one other thing: ghci actually executed writefoo. This means that, when given an I/O action, ghci will perform it for you on the spot.
What Is An I/O Action? | |
---|---|
Actions:
|
The type of getLine may look strange to you. It looks like a value, rather than a function. And in fact, that is one way to look at it: getLine is storing an I/O action. When that action is performed, you get a String. The <- operator is used to "pull out" the result from performing an I/O action and store it in a variable.
main itself is an I/O action with type IO (). You can only perform I/O actions from within other I/O actions. All I/O in Haskell programs is driven from the top at main, which is where execution of every Haskell program begins. This, then, is the mechanism that provides isolation from side effects in Haskell: you perform I/O in your IO actions, and call pure (non-I/O) functions from there. Most Haskell code is pure; the I/O actions perform I/O and call that pure code.
do is a convenient way to define a sequence of actions. As you'll see later, there are other ways. When you use do in this way, indentation is significant; make sure you line up your actions properly.
You only need to use do if you have more than one action that you need to perform. The value of a do block is the value of the last action executed. For a complete description of do syntax, see the section called "Desugaring of do blocks".
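As a preview of that section, here is a sketch of the basicio.hs program written both with do and with the operators it roughly stands for, (>>) and (>>=). The helper greeting and the names mainDo and mainPlain are our own, introduced only for this comparison.

```haskell
-- A sketch: the greeting program from basicio.hs, written twice.
greeting :: String -> String
greeting name = "Welcome to Haskell, " ++ name ++ "!"

-- With do-notation: one action per line, indentation aligned.
mainDo :: IO ()
mainDo = do
  putStrLn "Greetings! What is your name?"
  inpStr <- getLine
  putStrLn (greeting inpStr)

-- Without do-notation: (>>) runs one action after another and ignores the
-- first result; (>>=) feeds the first result into a function for the next.
mainPlain :: IO ()
mainPlain =
  putStrLn "Greetings! What is your name?" >>
  getLine >>= \inpStr ->
  putStrLn (greeting inpStr)

main :: IO ()
main = mainPlain
```

Both versions behave the same when run; the do form is simply easier to read once there are several steps.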
Let's consider an example of calling pure code from within an I/O action:
```haskell
-- file: ch07/callingpure.hs
name2reply :: String -> String
name2reply name =
    "Pleased to meet you, " ++ name ++ ".\n" ++
    "Your name contains " ++ charcount ++ " characters."
    where charcount = show (length name)

main :: IO ()
main = do
       putStrLn "Greetings once again. What is your name?"
       inpStr <- getLine
       let outStr = name2reply inpStr
       putStrLn outStr
```
Notice the name2reply function in this example. It is a regular Haskell function and obeys all the rules we've told you about: it always returns the same result when given the same input, it has no side effects, and it operates lazily. It uses other Haskell functions: (++), show, and length.
Down in main, we bind the result of name2reply inpStr to outStr. When you're working in a do block, you use <- to get results from IO actions and let to get results from pure code. When used in a do block, you should not put in after your let statement.
You can see here how we read the person's name from the keyboard. Then, that data got passed to a pure function, and its result was printed. In fact, the last two lines of main could have been replaced with putStrLn (name2reply inpStr). So, while main did have side effects (it caused things to appear on the terminal, for instance), name2reply did not and could not. That's because name2reply is a pure function, not an action.
Let's examine this with ghci:
```
ghci> :load callingpure.hs
[1 of 1] Compiling Main             ( callingpure.hs, interpreted )
Ok, modules loaded: Main.
ghci> name2reply "John"
"Pleased to meet you, John.\nYour name contains 4 characters."
ghci> putStrLn (name2reply "John")
Pleased to meet you, John.
Your name contains 4 characters.
```
The \n within the string is the end-of-line (newline) character, which causes the terminal to begin a new line in its output. Just calling name2reply "John" in ghci will show you the \n literally, because ghci uses show to display the return value. But putStrLn sends the string to the terminal, and the terminal interprets \n to start a new line.
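The difference between displaying a value with show and printing it with putStrLn can be seen in a short sketch. The names reply and asLiteral are chosen here for illustration.

```haskell
-- A sketch of the difference between showing a string and printing it.
reply :: String
reply = "Pleased to meet you, John.\nYour name contains 4 characters."

-- show renders the string as a Haskell literal: quotes are added and the
-- newline character becomes the two characters \ and n.
asLiteral :: String
asLiteral = show reply

main :: IO ()
main = do
  putStrLn asLiteral  -- one line, with quotes and a visible \n
  putStrLn reply      -- two lines: the terminal acts on the newline
```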
What do you think will happen if you just type main at the ghci prompt? Give it a try.
After looking at these example programs, you may be wondering if Haskell is really imperative rather than pure, lazy, and functional. Some of these examples look like a sequence of actions to be followed in order. There's more to it than that, though. We'll discuss that question later in this chapter in the section called "Is Haskell Really Imperative?" and the section called "Lazy I/O".
Pure vs. I/O
As a way to help with understanding the differences between pure code and I/O, here's a comparison table. When we speak of pure code, we are talking about Haskell functions that always return the same result when given the same input and have no side effects. In Haskell, only the execution of I/O actions escapes these rules.
Table 7.1. Pure vs. Impure
Pure | Impure |
---|---|
Always produces the same result when given the same parameters | May produce different results for the same parameters |
Never has side effects | May have side effects |
Never alters state | May alter the global state of the program, system, or world |
Why Purity Matters
In this section, we've discussed how Haskell draws a clear distinction between pure code and I/O actions. Most languages don't draw this distinction. In languages such as C or Java, there is no such thing as a function that is guaranteed by the compiler to always return the same result for the same arguments, or a function that is guaranteed to never have side effects. The only way to know if a given function has side effects is to read its documentation and hope that it's accurate.
Many bugs in programs are caused by unanticipated side effects. Still more are caused by misunderstanding circumstances in which functions may return different results for the same input. As multithreading and other forms of parallelism grow increasingly common, it becomes more difficult to manage global side effects.
Haskell's method of isolating side effects into I/O actions provides a clear boundary. You can always know which parts of the system may alter state and which won't. You can always be sure that the pure parts of your program aren't having unanticipated results. This helps you to think about the program. It also helps the compiler to think about it. Recent versions of ghc, for example, can provide a level of automatic parallelism for the pure parts of your code, something of a holy grail for computing.
For more discussion on this topic, refer to the section called "Side Effects with Lazy I/O".
Working With Files and Handles
So far, you've seen how to interact with the user at the computer's terminal. Of course, you'll often need to manipulate specific files. That's easy to do, too.
Haskell defines quite a few basic functions for I/O, many of which are similar to functions seen in other programming languages. The library reference for System.IO provides a good summary of all the basic I/O functions, should you need one that we aren't touching upon here.
You will generally begin by using openFile, which will give you a file Handle. That Handle is then used to perform specific operations on the file. Haskell provides functions such as hPutStrLn that work just like putStrLn but take an additional argument, a Handle, that specifies which file to operate upon. When you're done, you'll use hClose to close the Handle. These functions are all defined in System.IO, so you'll need to import that module when working with files. There are "h" functions corresponding to virtually all of the non-"h" functions; for instance, there is print for printing to the screen and hPrint for printing to a file.
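As a quick illustration of the "h" pattern, here is a minimal sketch that opens a file, writes to it with hPutStrLn and hPrint, and closes it. The function writeReport and the file name report.txt are hypothetical; the I/O functions are the System.IO ones described above.

```haskell
import System.IO

-- A sketch of the "h" functions: each takes a Handle as its first argument.
writeReport :: FilePath -> IO ()
writeReport path = do
  h <- openFile path WriteMode
  hPutStrLn h "Report"         -- like putStrLn, but writes to h
  hPrint h (6 * 7 :: Int)      -- like print, but writes to h
  hClose h                     -- flush and release the handle

main :: IO ()
main = do
  writeReport "report.txt"     -- hypothetical output file
  readFile "report.txt" >>= putStr
```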
Let's start with an imperative way to read and write files. This should seem similar to a while loop that you may find in other languages. This isn't the best way to write it in Haskell; later, you'll see examples of more Haskellish approaches.
```haskell
-- file: ch07/toupper-imp.hs
import System.IO
import Data.Char(toUpper)

main :: IO ()
main = do
       inh <- openFile "input.txt" ReadMode
       outh <- openFile "output.txt" WriteMode
       mainloop inh outh
       hClose inh
       hClose outh

mainloop :: Handle -> Handle -> IO ()
mainloop inh outh =
    do ineof <- hIsEOF inh
       if ineof
           then return ()
           else do inpStr <- hGetLine inh
                   hPutStrLn outh (map toUpper inpStr)
                   mainloop inh outh
```
Like every Haskell program, execution of this program begins with main. Two files are opened: input.txt is opened for reading, and output.txt is opened for writing. Then we call mainloop to process the file.
mainloop begins by checking to see if we're at the end of file (EOF) for the input. If not, we read a line from the input. We write out the same line to the output, after first converting it to uppercase. Then we recursively call mainloop again to continue processing the file.[17]
Notice that return call. This is not really the same as return in C or Python. In those languages, return is used to terminate execution of the current function immediately, and to return a value to the caller. In Haskell, return is the opposite of <-. That is, return takes a pure value and wraps it inside IO. Since every I/O action must return some IO type, if your result came from pure computation, you must use return to wrap it in IO. As an example, if 7 is an Int, then return 7 would create an action stored in a value of type IO Int. When executed, that action would produce the result 7. For more details on return, see the section called "The True Nature of Return".
Let's try running the program. We've got a file named input.txt that looks like this:
```
This is ch08/input.txt
Test Input
I like Haskell
Haskell is great
I/O is fun
123456789
```
Now, you can use runghc toupper-imp.hs and you'll find output.txt in your directory. It should look like this:
```
THIS IS CH08/INPUT.TXT
TEST INPUT
I LIKE HASKELL
HASKELL IS GREAT
I/O IS FUN
123456789
```
More on openFile
Let's use ghci to check on the type of openFile:
```
ghci> :module System.IO
ghci> :type openFile
openFile :: FilePath -> IOMode -> IO Handle
```
FilePath is simply another name for String. It is used in the types of I/O functions to help clarify that the parameter is being used as a filename, and not as regular data.
IOMode specifies how the file is to be managed. The possible values for IOMode are listed in Table 7.2, "Possible IOMode Values".
Table 7.2. Possible IOMode Values
IOMode | Can read? | Can write? | Starting position | Notes |
---|---|---|---|---|
ReadMode | Yes | No | Beginning of file | File must exist already |
WriteMode | No | Yes | Beginning of file | File is created if it didn't exist; otherwise, it is truncated (completely emptied) |
ReadWriteMode | Yes | Yes | Beginning of file | File is created if it didn't exist; otherwise, existing data is left intact |
AppendMode | No | Yes | End of file | File is created if it didn't exist; otherwise, existing data is left intact |
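Two rows of the table can be compared directly with a short sketch. It uses withFile from System.IO, which opens a file, runs an action on the Handle, and closes it afterwards; the function name modesDemo and the file name modes.txt are hypothetical.

```haskell
import System.IO

-- A sketch contrasting WriteMode and AppendMode.
modesDemo :: FilePath -> IO String
modesDemo path = do
  -- WriteMode creates the file, or empties it if it already exists.
  withFile path WriteMode (\h -> hPutStrLn h "first")
  -- AppendMode keeps the existing data and starts writing at the end.
  withFile path AppendMode (\h -> hPutStrLn h "second")
  readFile path

main :: IO ()
main = modesDemo "modes.txt" >>= putStr  -- hypothetical file name
```

Running it repeatedly always yields the same two lines, because each run starts with WriteMode, which discards whatever the previous run left behind.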
While we are mostly working with text examples in this chapter, binary files can also be used in Haskell. If you are working with a binary file, you should use openBinaryFile instead of openFile. Operating systems such as Windows process files differently if they are opened as binary instead of as text. On operating systems such as Linux, there is no newline translation, but openFile still decodes text using the locale's character encoding, while openBinaryFile reads raw bytes. Either way, it is wise to always use openBinaryFile if you will be dealing with binary data.
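Here is a minimal sketch of a byte-for-byte copy using binary handles, so that neither newline translation nor character decoding alters the data. The function name copyBinary is our own; each Char read from a binary handle represents one byte.

```haskell
import System.IO

-- A sketch: copy a file byte-for-byte using binary-mode handles.
copyBinary :: FilePath -> FilePath -> IO ()
copyBinary src dst = do
  inh  <- openBinaryFile src ReadMode
  outh <- openBinaryFile dst WriteMode
  contents <- hGetContents inh   -- lazily read; one Char per byte
  hPutStr outh contents          -- forces the whole read while writing
  hClose outh                    -- flush the output
  hClose inh
```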
Closing Handles
You've already seen that hClose is used to close file handles. Let's take a moment and think about why this is important.
As you'll see in the section called "Buffering", Haskell maintains internal buffers for files. This provides an important performance boost. However, it means that until you call hClose on a file that is open for writing, your data may not be flushed out to the operating system.
Another reason to make sure to hClose files is that open files take up resources on the system. If your program runs for a long time, and opens many files but fails to close them, it is conceivable that your program could even crash due to resource exhaustion. All of this is no different in Haskell than in other languages.
When a program exits, Haskell will normally take care of closing any files that remain open. However, there are some circumstances in which this may not happen,[18] so once again, it is best to be responsible and call hClose all the time.
Haskell provides several tools for you to use to easily ensure this happens, regardless of whether errors are present. You can read about finally in the section called "Extended Example: Functional I/O and Temporary Files" and bracket in the section called "The acquire-use-release cycle".
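As a preview of bracket, here is a minimal sketch that counts the lines of a file and is guaranteed to close the Handle even if reading fails. The name countLines is hypothetical; bracket comes from Control.Exception.

```haskell
import Control.Exception (bracket)
import System.IO

-- A sketch: bracket runs the release action (hClose) even if the action in
-- the middle throws an exception.
countLines :: FilePath -> IO Int
countLines path =
  bracket (openFile path ReadMode)  -- acquire the handle
          hClose                    -- release it, no matter what
          (\h -> do                 -- use it
             s <- hGetContents h
             -- $! forces the count before hClose runs; otherwise it could be
             -- computed after the handle is closed, when the unread data is
             -- no longer available.
             return $! length (lines s))
```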
Seek and Tell
When reading and writing from a Handle that corresponds to a file on disk, the operating system maintains an internal record of the current position. Each time you do another read, the operating system returns the next chunk of data that begins at the current position, and increments the position to reflect the data that you read.
You can use hTell to find out your current position in the file. When the file is initially created, it is empty and your position will be 0. After you write out 5 bytes, your position will be 5, and so on. hTell takes a Handle and returns an IO Integer with your position.
The companion to hTell is hSeek. hSeek lets you change the file position. It takes three parameters: a Handle, a SeekMode, and a position.
SeekMode can be one of three different values, which specify how the given position is to be interpreted. AbsoluteSeek means that the position is a precise location in the file. This is the same kind of information that hTell gives you. RelativeSeek means to seek from the current position. A positive number requests going forwards in the file, and a negative number means going backwards. Finally, SeekFromEnd interprets the position relative to the end of the file, so a negative number moves that many bytes back from the end. hSeek handle SeekFromEnd 0 will take you to the end of the file. For an example of hSeek, refer to the section called "Extended Example: Functional I/O and Temporary Files".
Not all Handles are seekable. A Handle usually corresponds to a file, but it can also correspond to other things such as network connections, tape drives, or terminals. You can use hIsSeekable to see if a given Handle is seekable.
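The following sketch puts these functions together to read the last few bytes of a file. The name tailBytes is hypothetical; it uses a binary handle (via withBinaryFile) so that positions count raw bytes, and checks hIsSeekable before seeking.

```haskell
import System.IO

-- A sketch: read the last n bytes of a file with hFileSize, hSeek, and hTell.
-- Returns the absolute position reached and the bytes read from there.
tailBytes :: Integer -> FilePath -> IO (Integer, String)
tailBytes n path = withBinaryFile path ReadMode $ \h -> do
  seekable <- hIsSeekable h
  if not seekable
    then ioError (userError (path ++ " is not seekable"))
    else do
      size <- hFileSize h
      let k = min n size               -- never seek before the start
      hSeek h SeekFromEnd (negate k)   -- k bytes back from the end
      pos <- hTell h                   -- the same spot, as an absolute position
      s <- hGetContents h
      length s `seq` return (pos, s)   -- finish reading before the handle closes
```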
Standard Input, Output, and Error
Earlier, we pointed out that for each non-"h" function, there is usually also a corresponding "h" function that works on any Handle. In fact, the non-"h" functions are nothing more than shortcuts for their "h" counterparts.
There are three predefined Handles in System.IO. These Handles are always available for your use. They are stdin, which corresponds to standard input; stdout for standard output; and stderr for standard error. Standard input normally refers to the keyboard, standard output to the monitor, and standard error also normally goes to the monitor.
Functions such as getLine can thus be trivially defined like this:
```haskell
getLine = hGetLine stdin
putStrLn = hPutStrLn stdout
print = hPrint stdout
```
Earlier, we told you what the three standard file handles "normally" correspond to. That's because some operating systems let you redirect the file handles to come from (or go to) different places: files, devices, or even other programs. This feature is used extensively in shell scripting on POSIX (Linux, BSD, Mac) operating systems, but can also be used on Windows.
It often makes sense to use standard input and output instead of specific files. This lets you interact with a human at the terminal. But it also lets you work with input and output files, or even combine your code with other programs, if that's what's requested.[19]
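A common way to take advantage of redirection is a small filter, sketched below. It reads stdin, writes its result to stdout, and sends a diagnostic message to stderr so that the message stays out of any redirected output. The function name shout is our own.

```haskell
import Data.Char (toUpper)
import System.IO

-- A sketch of a filter: data comes from stdin and goes to stdout, while a
-- diagnostic goes to stderr.
shout :: String -> String
shout = map toUpper

main :: IO ()
main = do
  hPutStrLn stderr "shout: reading standard input"
  input <- hGetContents stdin   -- the same as getContents
  hPutStr stdout (shout input)  -- the same as putStr
```

With a shell command such as echo hello | runghc shout.hs > out.txt, the diagnostic still appears on the terminal, while out.txt receives only the uppercased data.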
As an example, we can provide input to callingpure.hs in advance like this:
```
$ echo John | runghc callingpure.hs
Greetings once again. What is your name?
Pleased to meet you, John.
Your name contains 4 characters.
```
While callingpure.hs was running, it did not wait for input at the keyboard; instead it received John from the echo program. Notice also that the output didn't contain the word John on a separate line as it did when this program was run at the keyboard. The terminal normally echoes everything you type back to you, but that is technically input, and is not included in the output stream.
Deleting and Renaming Files
So far in this chapter, we've discussed the contents of the files. Let's now talk a bit about the files themselves.
System.Directory provides two functions you may find useful. removeFile takes a single argument, a filename, and deletes that file.[20] renameFile takes two filenames: the first is the old name and the second is the new name. If the new filename is in a different directory, you can also think of this as a move. The old filename must exist prior to the call to renameFile. If the new file already exists, it is removed before the rename takes place.
Like many other functions that take a filename, if the "old" name doesn't exist, renameFile will raise an exception. More information on exception handling can be found in Chapter 19, Error handling.
There are many other functions in System.Directory for doing things such as creating and removing directories, finding lists of files in directories, and testing for file existence. These are discussed in the section called "Directory and File Information".
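Here is a minimal sketch that exercises renameFile and removeFile, using doesFileExist (also from System.Directory) to check the results. The function renameDemo and the file names are hypothetical.

```haskell
import System.Directory (doesFileExist, removeFile, renameFile)

-- A sketch: create a file, rename it, then delete it, checking existence
-- along the way. Returns (old exists?, new exists?, new exists after removal?).
renameDemo :: FilePath -> FilePath -> IO (Bool, Bool, Bool)
renameDemo old new = do
  writeFile old "hello\n"          -- writeFile closes the file when done
  renameFile old new               -- throws if old does not exist
  oldThere <- doesFileExist old
  newThere <- doesFileExist new
  removeFile new
  gone <- doesFileExist new
  return (oldThere, newThere, gone)

main :: IO ()
main = renameDemo "old-name.txt" "new-name.txt" >>= print  -- hypothetical names
```

After the rename only the new name exists, and after removeFile neither does, so the program prints (False,True,False).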
Temporary Files
Programmers often need temporary files. These files may be used to store large amounts of data needed for computations, data to be used by other programs, or any number of other uses.
While you could craft a way to manually open files with unique names, the details of doing this in a secure way differ from platform to platform. Haskell provides a convenient function called openTempFile (and a corresponding openBinaryTempFile) to handle the difficult bits for you.
openTempFile takes two parameters: the directory in which to create the file, and a "template" for naming the file. The directory could simply be "." for the current working directory. Or you could use System.Directory.getTemporaryDirectory to find the best place for temporary files on a given machine. The template is used as the basis for the file name; it will have some random characters added to it to ensure that the result is truly unique. In fact, openTempFile guarantees that it will be working on a unique filename.
The return type of openTempFile is IO (FilePath, Handle). The first part of the tuple is the name of the file created, and the second is a Handle opened in ReadWriteMode over that file. When you're done with the file, you'll want to hClose it and then call removeFile to delete it. See the following example for a sample function to use.
Extended Example: Functional I/O and Temporary Files
Here's a larger example that puts together some concepts from this chapter, from some earlier chapters, and a few you haven't seen yet. Take a look at the program and see if you can figure out what it does and how it works.
```haskell
-- file: ch07/tempfile.hs
import System.IO
import System.Directory(getTemporaryDirectory, removeFile)
import System.IO.Error(catchIOError)
import Control.Exception(finally)

-- The main entry point.  Work with a temp file in myAction.
main :: IO ()
main = withTempFile "mytemp.txt" myAction

{- The guts of the program.  Called with the path and handle of a temporary
   file.  When this function exits, that file will be closed and deleted
   because myAction was called from withTempFile. -}
myAction :: FilePath -> Handle -> IO ()
myAction tempname temph =
    do -- Start by displaying a greeting on the terminal
       putStrLn "Welcome to tempfile.hs"
       putStrLn $ "I have a temporary file at " ++ tempname

       -- Let's see what the initial position is
       pos <- hTell temph
       putStrLn $ "My initial position is " ++ show pos

       -- Now, write some data to the temporary file
       let tempdata = show [1..10]
       putStrLn $ "Writing one line containing " ++
                  show (length tempdata) ++ " bytes: " ++
                  tempdata
       hPutStrLn temph tempdata

       -- Get our new position.  This doesn't actually modify pos
       -- in memory, but makes the name "pos" correspond to a different
       -- value for the remainder of the "do" block.
       pos <- hTell temph
       putStrLn $ "After writing, my new position is " ++ show pos

       -- Seek to the beginning of the file and display it
       putStrLn $ "The file content is: "
       hSeek temph AbsoluteSeek 0

       -- hGetContents performs a lazy read of the entire file
       c <- hGetContents temph

       -- Copy the file byte-for-byte to stdout, followed by \n
       putStrLn c

       -- Let's also display it as a Haskell literal
       putStrLn $ "Which could be expressed as this Haskell literal:"
       print c

{- This function takes two parameters: a filename pattern and another
   function.  It will create a temporary file, and pass the name and Handle
   of that file to the given function.

   The temporary file is created with openTempFile.  The directory is the one
   indicated by getTemporaryDirectory, or, if the system has no notion of
   a temporary directory, "." is used.  The given pattern is passed to
   openTempFile.

   After the given function terminates, even if it terminates due to an
   exception, the Handle is closed and the file is deleted. -}
withTempFile :: String -> (FilePath -> Handle -> IO a) -> IO a
withTempFile pattern func =
    do -- The library ref says that getTemporaryDirectory may raise an
       -- exception on systems that have no notion of a temporary directory.
       -- So, we run getTemporaryDirectory under catchIOError.  catchIOError
       -- takes two functions: one to run, and a different one to run if the
       -- first raised an I/O exception.  If getTemporaryDirectory raised an
       -- exception, just use "." (the current working directory).
       tempdir <- catchIOError getTemporaryDirectory (\_ -> return ".")
       (tempfile, temph) <- openTempFile tempdir pattern

       -- Call (func tempfile temph) to perform the action on the temporary
       -- file.  finally takes two actions.  The first is the action to run.
       -- The second is an action to run after the first, regardless of
       -- whether the first action raised an exception.  This way, we ensure
       -- the temporary file is always deleted.  The return value from finally
       -- is the first action's return value.
       finally (func tempfile temph)
               (do hClose temph
                   removeFile tempfile)
```
Let's start looking at this program from the end. The withTempFile function demonstrates that Haskell doesn't forget its functional nature when I/O is introduced. This function takes a String and another function. The function passed to withTempFile is invoked with the name and Handle of a temporary file. When that function exits, the temporary file is closed and deleted. So even when dealing with I/O, we can still find the idiom of passing functions as parameters to be convenient. Lisp programmers might find our withTempFile function similar to Lisp's with-open-file function.
There is some exception handling going on to make the program more robust in the face of errors. You normally want the temporary files to be deleted after processing completes, even if something went wrong. So we make sure that happens. For more on exception handling, see Chapter 19, Error handling.
Let's return to the start of the program. main is defined simply as withTempFile "mytemp.txt" myAction. myAction, then, will be invoked with the name and Handle of the temporary file.
myAction displays some information to the terminal, writes some data to the file, seeks to the beginning of the file, and reads the data back with hGetContents.[21] It then displays the contents of the file byte-for-byte, and also as a Haskell literal via print c. That's the same as putStrLn (show c).
Let's look at the output:
```
$ runhaskell tempfile.hs
Welcome to tempfile.hs
I have a temporary file at /tmp/mytemp8572.txt
My initial position is 0
Writing one line containing 22 bytes: [1,2,3,4,5,6,7,8,9,10]
After writing, my new position is 23
The file content is:
[1,2,3,4,5,6,7,8,9,10]

Which could be expressed as this Haskell literal:
"[1,2,3,4,5,6,7,8,9,10]\n"
```
Every time you run this program, your temporary file name should be slightly different since it contains a randomly generated component. Looking at this output, there are a few questions that might occur to you:
- Why is your position 23 after writing a line with 22 bytes?
- Why is there an empty line after the file content display?
- Why is there a \n at the end of the Haskell literal display?
You might be able to guess that the answers to all three questions are related. See if you can work out the answers for a moment. If you need some help, here are the explanations:
- That's because we used hPutStrLn instead of hPutStr to write the data. hPutStrLn always terminates the line by writing a \n at the end, which didn't appear in tempdata.
- We used putStrLn c to display the file contents c. Because the data was written originally with hPutStrLn, c ends with the newline character, and putStrLn adds a second newline character. The result is a blank line.
- The \n is the newline character from the original hPutStrLn.
As a final note, the byte counts may be different on some operating systems. Windows, for example, uses the two-byte sequence \r\n as the end-of-line marker, so you may see differences on that platform.
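To check the first explanation directly, the following sketch writes the same 22 characters with hPutStr, records the position, and only then writes the newline. The name positions and the template pos.txt are our own; the expected (22,23) holds on Linux, while on Windows the second number may be larger because a text-mode handle writes \r\n.

```haskell
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- A sketch: compare hTell before and after writing just the newline.
positions :: IO (Integer, Integer)
positions = do
  dir <- getTemporaryDirectory
  (path, h) <- openTempFile dir "pos.txt"
  let tempdata = show [1 .. 10 :: Int]  -- "[1,2,3,4,5,6,7,8,9,10]", 22 characters
  hPutStr h tempdata
  p1 <- hTell h                         -- no newline written yet
  hPutStr h "\n"
  p2 <- hTell h                         -- one more byte where \n is not translated
  hClose h
  removeFile path
  return (p1, p2)

main :: IO ()
main = positions >>= print  -- (22,23) on Linux
```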
Lazy I/O
So far in this chapter, you've seen examples of fairly traditional I/O. Each line, or block of data, is requested individually and processed individually.
Haskell has another approach available to you as well. Since Haskell is a lazy language, meaning that any given piece of data is only evaluated when its value must be known, there are some novel ways of approaching I/O.
hGetContents
One novel way to approach I/O is the hGetContents function.[22] hGetContents has the type Handle -> IO String. The String it returns represents all of the data in the file given by the Handle.[23]
In a strictly evaluated language, using such a function is often a bad idea. It may be fine to read the entire contents of a 2KB file, but if you try to read the entire contents of a 500GB file, you are likely to crash due to lack of RAM to store all that data. In these languages, you would traditionally use mechanisms such as loops to process the file's entire data.
But hGetContents is different. The String it returns is evaluated lazily. At the moment you call hGetContents, nothing is actually read. Data is only read from the Handle as the elements (characters) of the list are processed. As elements of the String are no longer used, Haskell's garbage collector automatically frees that memory. All of this happens completely transparently to you. And since you have what looks like (and, really, is) a pure String, you can pass it to pure (non-IO) code.
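Here is a minimal sketch of that laziness at work: only the first n lines are ever demanded from the String returned by hGetContents. The name firstLines is hypothetical; evaluate from Control.Exception forces those lines before the Handle is closed. Reading still happens in buffer-sized chunks, so a little more than n lines may be read from the operating system.

```haskell
import Control.Exception (evaluate)
import System.IO

-- A sketch of lazy reading: hGetContents returns at once, and data is read
-- from the handle only as the list is consumed.
firstLines :: Int -> FilePath -> IO [String]
firstLines n path = do
  h <- openFile path ReadMode
  s <- hGetContents h                        -- nothing has been read yet
  let wanted = take n (lines s)
  _ <- evaluate (sum (map length wanted))    -- force just those lines
  hClose h                                   -- later chunks are never read
  return wanted
```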
Let's take a quick look at an example. Back in the section called "Working With Files and Handles", you saw an imperative program that converted the entire content of a file to uppercase. Its imperative algorithm was similar to what you'd see in many other languages. Here now is the much simpler algorithm that exploits lazy evaluation:
```haskell
-- file: ch07/toupper-lazy1.hs
import System.IO
import Data.Char(toUpper)

main :: IO ()
main = do
       inh <- openFile "input.txt" ReadMode
       outh <- openFile "output.txt" WriteMode
       inpStr <- hGetContents inh
       let result = processData inpStr
       hPutStr outh result
       hClose inh
       hClose outh

processData :: String -> String
processData = map toUpper
```
Notice that hGetContents handled all of the reading for us. Also, take a look at processData. It's a pure function since it has no side effects and always returns the same result each time it is called. It has no need to know, and no way to tell, that its input is being read lazily from a file in this case. It can work perfectly well with a 20-character literal or a 500GB data dump on disk.
You can even verify that with ghci:
ghci> :load toupper-lazy1.hs
[1 of 1] Compiling Main             ( toupper-lazy1.hs, interpreted )
Ok, modules loaded: Main.
ghci> processData "Hello, there!  How are you?"
"HELLO, THERE!  HOW ARE YOU?"
ghci> :type processData
processData :: String -> String
ghci> :type processData "Hello!"
processData "Hello!" :: String
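Because processData is pure and lazy, it does not care whether its input is short, huge, or even infinite. The following sketch (the name endless is made up for illustration) applies it to a never-ending string and asks for only five characters; only those five are ever computed.

```haskell
import Data.Char (toUpper)

-- processData from toupper-lazy1.hs, repeated so this file stands alone.
processData :: String -> String
processData = map toUpper

-- A hypothetical infinite input: "ab" repeated forever.
endless :: String
endless = cycle "ab"

main :: IO ()
main = do
  putStrLn (processData "Hello, there!")   -- a short literal works as usual
  putStrLn (take 5 (processData endless))  -- only 5 characters are ever computed
```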
Warning: If we had tried to hang on to …
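As a sketch of why holding on to a lazily read string matters for memory, compare the two hypothetical functions below. In streaming, each part of the input can be freed once it has been uppercased and written. In retaining, the same string is used again after writeFile, so every character read has to stay in memory until length has walked the whole string. The file names are placeholders.

```haskell
import Data.Char (toUpper)

-- Streams: once a chunk of inpStr has been written, nothing refers to it,
-- so the garbage collector can reclaim it while the rest is processed.
streaming :: FilePath -> FilePath -> IO ()
streaming inPath outPath = do
  inpStr <- readFile inPath
  writeFile outPath (map toUpper inpStr)

-- Retains: inpStr is needed again after writeFile, so the entire
-- contents stay reachable until the length has been computed.
retaining :: FilePath -> FilePath -> IO Int
retaining inPath outPath = do
  inpStr <- readFile inPath
  writeFile outPath (map toUpper inpStr)
  return $! length inpStr

main :: IO ()
main = do
  streaming "input.txt" "output.txt"
  n <- retaining "input.txt" "output2.txt"
  print n
```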
This program was a bit verbose to make it clear that there was pure code in use. Here's a slightly more concise version, which we will build on in the next examples:
-- file: ch07/toupper-lazy2.hs
import System.IO
import Data.Char(toUpper)

main = do
       inh <- openFile "input.txt" ReadMode
       outh <- openFile "output.txt" WriteMode
       inpStr <- hGetContents inh
       hPutStr outh (map toUpper inpStr)
       hClose inh
       hClose outh
You are not required to ever consume all the data from the input file when using hGetContents. Whenever the Haskell system determines that the entire string hGetContents returned can be garbage collected, which means it will never again be used, the file is closed for you automatically. The same principle applies to data read from the file. Whenever a given piece of data will never again be needed, the Haskell environment releases the memory it was stored in. Strictly speaking, we wouldn't have to call hClose at all in this example program. However, it is still a good habit to get into, as later changes to a program could make the call to hClose important.
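A minimal sketch of partial consumption, assuming an input.txt in the current directory and a made-up helper called firstLines: only as much of the file as is needed to produce three lines is requested (the handle's buffer may read somewhat further ahead), and the remainder is never read.

```haskell
import System.IO

-- Hypothetical helper: the first n lines of a string.
firstLines :: Int -> String -> [String]
firstLines n = take n . lines

main :: IO ()
main = do
  inh <- openFile "input.txt" ReadMode
  inpStr <- hGetContents inh
  -- Printing forces just enough of inpStr to produce three lines.
  mapM_ putStrLn (firstLines 3 inpStr)
  hClose inh
```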
Warning: When using …
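A common mistake with hGetContents is closing the Handle before the lazy String has been consumed. The sketch below (both function names are hypothetical) shows the problem and one usual fix: force the whole string, here by taking its length, before calling hClose. Depending on the GHC version, the unforced version may give you a truncated string or raise an error such as "delayed read on closed handle" once the string is finally used.

```haskell
import System.IO

-- Pitfall: nothing has been read yet when hClose runs, so the
-- returned string cannot deliver the file's contents.
readTooEarly :: FilePath -> IO String
readTooEarly path = do
  h <- openFile path ReadMode
  s <- hGetContents h
  hClose h
  return s

-- Fix: force the entire string (via its length) before closing.
readForced :: FilePath -> IO String
readForced path = do
  h <- openFile path ReadMode
  s <- hGetContents h
  length s `seq` hClose h
  return s

main :: IO ()
main = do
  s <- readForced "input.txt"
  putStr s
```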
readFile and writeFile
Haskell programmers use hGetContents as a filter quite often. They read from one file, do something to the data, and write the result out elsewhere. This is so common that there are some shortcuts for doing it. readFile and writeFile are shortcuts for working with files as strings. They handle all the details of opening files, closing files, reading data, and writing data. readFile uses hGetContents internally.
Can you guess the Haskell types of these functions? Let's check with ghci:
ghci> :type readFile
readFile :: FilePath -> IO String
ghci> :type writeFile
writeFile :: FilePath -> String -> IO ()
Now, here's an example program that uses readFile and writeFile:
-- file: ch07/toupper-lazy3.hs
import Data.Char(toUpper)

main = do
       inpStr <- readFile "input.txt"
       writeFile "output.txt" (map toUpper inpStr)
Look at that: the guts of the program take up only two lines! readFile returned a lazy String, which we stored in inpStr. We then took that, processed it, and passed it to writeFile for writing.
Neither readFile nor writeFile ever provides a Handle for you to work with, so there is nothing to ever hClose. readFile uses hGetContents internally, and the underlying Handle will be closed when the returned String is garbage-collected or all the input has been consumed. writeFile will close its underlying Handle when the entire String supplied to it has been written.
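One consequence of readFile's lazy closing is worth knowing: reading a file and then writing back to the same path does not work directly, because the read Handle is usually still open when writeFile tries to open the file. GHC typically reports this as a "resource busy (file is locked)" error. A minimal sketch of a workaround, assuming a hypothetical file notes.txt and a made-up function name, forces the entire contents first so that the read Handle is closed before writing begins.

```haskell
import Data.Char (toUpper)

-- Overwrite a file with an uppercase copy of itself.
-- Forcing the length reads to end-of-file, which closes readFile's
-- Handle, so writeFile can then open the same path for writing.
upcaseInPlace :: FilePath -> IO ()
upcaseInPlace path = do
  contents <- readFile path
  length contents `seq` writeFile path (map toUpper contents)

main :: IO ()
main = upcaseInPlace "notes.txt"
```

Note that this reads the whole file into memory, giving up the constant-memory streaming described above; writing to a separate file avoids that cost.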
A Word On Lazy Output
By now, you should understand how lazy input works in Haskell. But what about laziness during output?
As you know, nothing in Haskell is evaluated before its value is needed. Since functions such as writeFile and putStr write out the entire String passed to them, that entire String must be evaluated. So you are guaranteed that the argument to putStr will be evaluated in full.[24]
But what does that mean for laziness of the input? In the examples above, will the call to putStr or writeFile force the entire input string to be loaded into memory at once, just to be written out?
The answer is no. putStr (and all the similar output functions) write out data as it becomes available. They also have no need to keep around data already written, so as long as nothing else in the program needs it, the memory can be freed immediately. In a sense, you can think of the String between readFile and writeFile as a pipe linking the two. Data goes in one end, is transformed in some way, and flows back out the other.
You can verify this yourself by generating a large input.txt for toupper-lazy3.hs. It may take a while to process, but you should see constant, and low, memory usage while it is being processed.
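One way to produce such a test file is a tiny generator program. The sketch below is an assumption, not one of the chapter's examples: it writes 2,000,000 copies of one line (about 30 MB) to input.txt, overwriting any existing file. Because writeFile consumes its String lazily as well, the generator itself also runs in little memory. You can then watch memory use with a system monitor such as top while toupper-lazy3.hs runs.

```haskell
-- Hypothetical generator: n copies of one 15-byte line (14 characters + newline).
makeInput :: Int -> String
makeInput n = unlines (replicate n "I like Haskell")

main :: IO ()
main = writeFile "input.txt" (makeInput 2000000)  -- about 30 MB
```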
interact
You learned that readFile and writeFile address the common situation of reading from one file, making a conversion, and writing to a different file. There's a situation that's even more common than that: reading from standard input, making a conversion, and writing the result to standard output. For that situation, there is a function called interact. The type of interact is (String -> String) -> IO (). That is, it takes one argument: a function of type String -> String. That function is passed the result of getContents, that is, standard input read lazily. The result of that function is sent to standard output.
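interact can be understood as a thin wrapper around getContents and putStr. The sketch below defines a hypothetical myInteract that behaves the same way, so the data flow is visible: all of standard input is read lazily, passed through a pure function, and written to standard output.

```haskell
import Data.Char (toUpper)

-- Hypothetical stand-in for interact: lazily read stdin,
-- apply a pure function, and write the result to stdout.
myInteract :: (String -> String) -> IO ()
myInteract f = do
  input <- getContents
  putStr (f input)

-- The pure part, which can be tested without any I/O.
shout :: String -> String
shout = map toUpper

main :: IO ()
main = myInteract shout
```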
We can convert our example program to operate on standard input and standard output by using interact. Here's one way to do that:
-- file: ch07/toupper-lazy4.hs
import Data.Char(toUpper)

main = interact (map toUpper)
Look at that: one line of code to achieve our transformation! To achieve the same effect as with the previous examples, you could run this one like this:
$ runghc toupper-lazy4.hs < input.txt > output.txt
Or, if you'd like to see the output printed to the screen, you could type:
$ runghc toupper-lazy4.hs < input.txt
If you want to see that Haskell output truly does write out chunks of data as soon as they are received, run runghc toupper-lazy4.hs without any other command-line parameters. You should see each character echoed back out as soon as you type it, but in uppercase. Buffering may change this behavior; see the section called "Buffering" later in this chapter for more on buffering. If you see each line echoed as soon as you type it, or even nothing at all for a while, buffering is causing this behavior.
You can also write simple interactive programs using interact. Let's start with a simple example: adding a line of text before the uppercase output.
-- file: ch07/toupper-lazy5.hs
import Data.Char(toUpper)

main = interact (map toUpper . (++) "Your data, in uppercase, is:\n\n")
Here we add a string at the beginning of the output. Can you spot the problem, though?
Since we're calling map on the result of (++), that header itself will appear in uppercase. We can fix that in this way:
-- file: ch07/toupper-lazy6.hs
import Data.Char(toUpper)

main = interact ((++) "Your data, in uppercase, is:\n\n" . map toUpper)
This moved the header outside of the map.
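The same header-then-uppercase transformation can also be written with an operator section, ("..." ++), instead of (++) "...". Pulling it out into a named function (addHeader is a hypothetical name) also makes it easy to check in ghci without any I/O.

```haskell
import Data.Char (toUpper)

-- Uppercase the input first, then prepend the header unchanged.
-- ("header" ++) is the same function as (++) "header".
addHeader :: String -> String
addHeader = ("Your data, in uppercase, is:\n\n" ++) . map toUpper

main :: IO ()
main = interact addHeader
```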
Filters with interact
Another common use of interact is filtering. Let's say that you want to write a program that reads a file and prints out every line that contains the character "a". Here's how you might do that with interact:
-- file: ch07/filter.hs
main = interact (unlines . filter (elem 'a') . lines)
This may have introduced three functions that you aren't familiar with yet. Let's inspect their types with ghci:
ghci> :type lines
lines :: String -> [String]
ghci> :type unlines
unlines :: [String] -> String
ghci> :type elem
elem :: (Eq a) => a -> [a] -> Bool
Can you guess what these functions do just by looking at their types? If not, you can find them explained in the section called "Warming up: portably splitting lines of text" and the section called "Special string-handling functions". You'll frequently see lines and unlines used with I/O. Finally, elem takes an element and a list and returns True if that element occurs anywhere in the list.
Try running this over our standard example input:
$ runghc filter.hs < input.txt
I like Haskell
Haskell is great
Sure enough, you got back the two lines that contain an "a". Lazy filters are a powerful way to use Haskell. When you think about it, a filter, such as the standard Unix program grep, sounds a lot like a function. It takes some input, applies some computation, and generates a predictable output.
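Taking the grep comparison one step further, here is a sketch of a tiny grep-like filter. It reads the pattern from the command line with getArgs (covered later in this chapter) and keeps the lines containing that substring, using isInfixOf from Data.List. The function name grepLines and the usage message are made up for this example, and unlike real grep it matches plain substrings, not regular expressions.

```haskell
import Data.List (isInfixOf)
import System.Environment (getArgs)

-- Keep only the lines that contain the given substring.
grepLines :: String -> String -> String
grepLines needle = unlines . filter (needle `isInfixOf`) . lines

main :: IO ()
main = do
  args <- getArgs
  case args of
    [needle] -> interact (grepLines needle)
    _        -> putStrLn "usage: mygrep PATTERN < FILE"
```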
The IO Monad
You've seen a number of examples of I/O in Haskell by this point. Let's take a moment to step back and think about how I/O relates to the broader Haskell language.
Since Haskell is a pure language, if you give a certain function a specific argument, the function will return the same result every time you give it that argument. Moreover, the function will not change anything about the program's overall state.
You may be wondering, then, how I/O fits into this picture. Surely if you want to read a line of input from the keyboard, the function to read input can't possibly return the same result every time it is run, right? Moreover, I/O is all about changing state. I/O could cause pixels on a terminal to light up, cause paper to start coming out of a printer, or even cause a package to be shipped from a warehouse on a different continent. I/O doesn't just change the state of a program. You can think of I/O as changing the state of the world.
Actions
Most languages do not make a distinction between a pure function and an impure one. Haskell has functions in the mathematical sense: they are purely computations which cannot be altered by anything external. Moreover, the computation can be performed at any time, or even never, if its result is never needed.
Clearly, then, we need some other tool to work with I/O. That tool in Haskell is called actions. Actions resemble functions. They do nothing when they are defined, but perform some task when they are invoked. I/O actions are defined within the IO monad. Monads are a powerful way of chaining functions together purely and are covered in Chapter 14, Monads. It's not necessary to understand monads in order to understand I/O. Just understand that the result type of actions is "tagged" with IO. Let's take a look at some types:
ghci> :type putStrLn
putStrLn :: String -> IO ()
ghci> :type getLine
getLine :: IO String
The type of putStrLn is just like any other function. The function takes one parameter and returns an IO (). This IO () is the action. You can store and pass actions in pure code if you wish, though this isn't frequently done. An action doesn't do anything until it is invoked. Let's look at an example of this:
-- file: ch07/actions.hs
str2action :: String -> IO ()
str2action input = putStrLn ("Data: " ++ input)

list2actions :: [String] -> [IO ()]
list2actions = map str2action

numbers :: [Int]
numbers = [1..10]

strings :: [String]
strings = map show numbers

actions :: [IO ()]
actions = list2actions strings

printitall :: IO ()
printitall = runall actions

-- Take a list of actions, and execute each of them in turn.
runall :: [IO ()] -> IO ()
runall [] = return ()
runall (firstelem:remainingelems) =
    do firstelem
       runall remainingelems

main = do str2action "Start of the program"
          printitall
          str2action "Done!"
str2action is a function that takes one parameter and returns an IO (). As you can see at the end of main, you could use this directly in another action and it will print out a line right away. Or, you can store (but not execute) the action from pure code. You can see an example of that in list2actions: we use map over str2action and return a list of actions, just like we would with other pure data. You can see that everything up through printitall is built up with pure tools.
Although we define printitall, it doesn't get executed until its action is evaluated somewhere else. Notice in main how we use str2action as an I/O action to be executed, but earlier we used it outside of the I/O monad and assembled results into a list.
You could think of it this way: every statement, except let, in a do block must yield an I/O action which will be executed.
The call to printitall finally executes all those actions. Actually, since Haskell is lazy, the actions aren't generated until here either.
When you run the program, your output will look like this:
Data: Start of the program
Data: 1
Data: 2
Data: 3
Data: 4
Data: 5
Data: 6
Data: 7
Data: 8
Data: 9
Data: 10
Data: Done!
We can actually write this in a much more compact way. Consider this revision of the example:
-- file: ch07/actions2.hs
str2message :: String -> String
str2message input = "Data: " ++ input

str2action :: String -> IO ()
str2action = putStrLn . str2message

numbers :: [Int]
numbers = [1..10]

main = do str2action "Start of the program"
          mapM_ (str2action . show) numbers
          str2action "Done!"
Notice in str2action the use of the standard function composition operator. In main, there's a call to mapM_. This function is similar to map. It takes a function and a list. The function supplied to mapM_ is an I/O action that is executed for every item in the list. mapM_ throws out the result of the function, though you can use mapM to return a list of I/O results if you want them. Take a look at their types:
ghci> :type mapM
mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
ghci> :type mapM_
mapM_ :: (Monad m) => (a -> m b) -> [a] -> m ()
Tip: These functions actually work for more than just I/O; they work for any Monad.
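To see the difference between mapM and mapM_, and the point that they work for any Monad, here is a small sketch. In IO, mapM runs each action and collects the results; in Maybe, the same mapM stops at the first Nothing. The helper halve is hypothetical.

```haskell
-- Hypothetical helper: halve an even number, fail on an odd one.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

main :: IO ()
main = do
  -- In IO, mapM runs each action and collects the results.
  doubled <- mapM (\n -> return (n * 2)) [1, 2, 3 :: Int]
  print doubled                  -- [2,4,6]
  -- In Maybe, mapM succeeds only if every element succeeds.
  print (mapM halve [2, 4, 6])   -- Just [1,2,3]
  print (mapM halve [2, 3, 4])   -- Nothing
  -- mapM_ keeps only the effects and discards the results.
  mapM_ print [1, 2 :: Int]
```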
Why a mapM when we already have map? Because map is a pure function that returns a list. It doesn't, and can't, actually execute actions directly. mapM is a utility that lives in the IO monad and thus can actually execute the actions.[25]
Going back to main, mapM_ applies (str2action . show) to every element in numbers. show converts each number to a String and str2action converts each String to an action. mapM_ combines these individual actions into one big action that prints out lines.
Sequencing
do blocks are actually shortcut notations for joining together actions. There are two operators that you can use instead of do blocks: >> and >>=. Let's look at their types in ghci:
ghci> :type (>>)
(>>) :: (Monad m) => m a -> m b -> m b
ghci> :type (>>=)
(>>=) :: (Monad m) => m a -> (a -> m b) -> m b
The >> operator sequences two actions together: the first action is performed, then the second. The result of the computation is the result of the second action. The result of the first action is thrown away. This is similar to simply having a line in a do block. You might write putStrLn "line 1" >> putStrLn "line 2" to test this out. It will print out two lines, discard the result from the first putStrLn, and provide the result from the second.
The >>= operator runs an action, then passes its result to a function that returns an action. That second action is run as well, and the result of the entire expression is the result of that second action. As an example, you could write getLine >>= putStrLn, which would read a line from the keyboard and then display it back out.
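The relationship between the two operators can be shown directly: >> is >>= with a function that ignores its argument. The following sketch defines a hypothetical andThen that way and uses both operators in IO; because it is written for any Monad, it can also be checked purely with Maybe.

```haskell
-- Hypothetical name: (>>) expressed with (>>=) by ignoring the first result.
andThen :: Monad m => m a -> m b -> m b
andThen first second = first >>= \_ -> second

main :: IO ()
main = do
  putStrLn "line 1" >> putStrLn "line 2"
  putStrLn "line 3" `andThen` putStrLn "line 4"
  -- (>>=) passes a result onward: read a line, then echo it back.
  getLine >>= putStrLn
```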
Let's rewrite one of our examples to avoid do blocks. Remember this example from the beginning of the chapter?
-- file: ch07/basicio.hs
main = do
       putStrLn "Greetings! What is your name?"
       inpStr <- getLine
       putStrLn $ "Welcome to Haskell, " ++ inpStr ++ "!"
Let's write that without a do block:
-- file: ch07/basicio-nodo.hs
main =
    putStrLn "Greetings! What is your name?" >>
    getLine >>=
    (\inpStr -> putStrLn $ "Welcome to Haskell, " ++ inpStr ++ "!")
The Haskell compiler internally performs a translation just like this when you define a do block.
The True Nature of Return
Earlier in this chapter, we mentioned that return is probably not what it looks like. Many languages have a keyword named return that aborts execution of a function immediately and returns a value to the caller.
The Haskell return function is quite different. In Haskell, return is used to wrap data in a monad. When speaking about I/O, return is used to take pure data and bring it into the IO monad.
Now, why would we want to do that? Remember that anything whose result depends on I/O must be within the IO monad. So if we are writing a function that performs I/O, then a pure computation, we will need to use return to make this pure computation the proper return value of the function. Otherwise, a type error would occur. Here's an example:
-- file: ch07/return1.hs
import Data.Char(toUpper)

isGreen :: IO Bool
isGreen =
    do putStrLn "Is green your favorite color?"
       inpStr <- getLine
       return ((toUpper . head $ inpStr) == 'Y')
We have a pure computation that yields a Bool. That computation is passed to return, which puts it into the IO monad. Since it is the last value in the do block, it becomes the return value of isGreen, but this is not because we used the return function.
Here's a version of the same program with the pure computation broken out into a separate function. This helps keep the pure code separate, and can also make the intent clearer.
-- file: ch07/return2.hs
import Data.Char(toUpper)

isYes :: String -> Bool
isYes inpStr = (toUpper . head $ inpStr) == 'Y'

isGreen :: IO Bool
isGreen =
    do putStrLn "Is green your favorite color?"
       inpStr <- getLine
       return (isYes inpStr)
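Note that head fails on an empty list, so if the user just presses Enter, isYes as written will crash the program. A minimal sketch of a total version uses pattern matching instead; treating an empty answer as "no" is an assumption about the desired behavior.

```haskell
import Data.Char (toUpper)

-- Pattern matching covers the empty string explicitly,
-- so an empty answer counts as "no" instead of crashing.
isYes :: String -> Bool
isYes (c:_) = toUpper c == 'Y'
isYes []    = False

isGreen :: IO Bool
isGreen = do
  putStrLn "Is green your favorite color?"
  inpStr <- getLine
  return (isYes inpStr)

main :: IO ()
main = isGreen >>= print
```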
Finally, here's a contrived example to show that return truly does not have to occur at the end of a do block. In practice, it usually does, but it need not.
-- file: ch07/return3.hs
returnTest :: IO ()
returnTest =
    do one <- return 1
       let two = 2
       putStrLn $ show (one + two)
Notice that we used <- in combination with return, but let in combination with the simple literal. That's because we needed both values to be pure in order to add them, and <- pulls things out of monads, effectively reversing the effect of return. Run this in ghci and you'll see 3 displayed, as expected.
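To underline that return does not end a do block early, here is a small sketch (notEarlyExit is a made-up name). The first return wraps a value that is then discarded, execution simply continues, and the block's result comes from its final action.

```haskell
-- return only wraps a value in IO; it does not exit the do block.
notEarlyExit :: IO Int
notEarlyExit = do
  _ <- return 1              -- wraps 1, which is then ignored
  putStrLn "still running"   -- this line still executes
  return 2                   -- the block's result comes from its last action

main :: IO ()
main = notEarlyExit >>= print
```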
Is Haskell Actually Imperative?
These do blocks may look a lot like an imperative language. After all, you're giving commands to run in sequence most of the time.
But Haskell remains a lazy language at its core. While it is necessary to sequence actions for I/O at times, this is done using tools that are already part of Haskell. Haskell also achieves a clean separation of I/O from the rest of the language through the IO monad.
Side Effects with Lazy I/O
Earlier in this chapter, you read about hGetContents. We explained that the String it returns can be used in pure code.
We need to get a bit more specific about what side effects are. When we say Haskell has no side effects, what exactly does that mean?
At a certain level, side effects are always possible. A poorly-written loop, even if written in pure code, could cause the system's RAM to be exhausted and the machine to crash. Or it could cause data to be swapped to disk.
When we speak of no side effects, we mean that pure code in Haskell can't run commands that trigger side effects. Pure functions can't modify a global variable, request I/O, or run a command to take down a system.
When you have a String from hGetContents that is passed to a pure function, the function has no idea that this String is backed by a disk file. It will behave just as it always would, but processing that String may cause the environment to issue I/O commands. The pure function isn't issuing them; they are happening as a result of the processing the pure function is doing, just as with the example of swapping RAM to disk.
In some cases, you may need more control over exactly when your I/O occurs. Perhaps you are reading data interactively from the user, or via a pipe from another program, and need to communicate directly with the user. In those cases, hGetContents will probably not be appropriate.
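When you need that control, one option is to read a file strictly, one line at a time, with hIsEOF and hGetLine from System.IO. The sketch below assumes an input.txt in the current directory; forEachLine and readLinesStrict are hypothetical helper names. Since every read happens inside the loop, using withFile to close the Handle afterwards is safe here, unlike with hGetContents.

```haskell
import Control.Monad (unless)
import System.IO

-- Handle one line at a time; each hGetLine happens exactly when the
-- loop reaches it, not lazily behind the scenes.
forEachLine :: Handle -> (String -> IO ()) -> IO ()
forEachLine h act = do
  eof <- hIsEOF h
  unless eof $ do
    line <- hGetLine h   -- the trailing newline is removed
    act line
    forEachLine h act

-- The same loop, collecting the lines into a list instead.
readLinesStrict :: Handle -> IO [String]
readLinesStrict h = do
  eof <- hIsEOF h
  if eof
    then return []
    else do
      line <- hGetLine h
      rest <- readLinesStrict h
      return (line : rest)

main :: IO ()
main = withFile "input.txt" ReadMode $ \h ->
  forEachLine h (\line -> putStrLn ("> " ++ line))
```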
Buffering
The I/O subsystem is one of the slowest parts of a modern computer. Completing a write to disk can take thousands of times as long as a write to memory. A write over the network can be hundreds or thousands of times slower yet. Even if your operation doesn't directly communicate with the disk, perhaps because the data is cached, I/O still involves a system call, which slows things down by itself.
For this reason, modern operating systems and programming languages both provide tools to help programs perform better where I/O is concerned. The operating system typically performs caching: storing frequently-used pieces of data in memory for faster access.
Programming languages typically perform buffering. This means that they may request one large chunk of data from the operating system, even if the code underneath is processing data one character at a time. By doing this, they can achieve remarkable performance gains because each request for I/O to the operating system carries a processing cost. Buffering allows us to read the same amount of data with far fewer I/O requests.
Haskell, too, provides buffering in its I/O system. In many cases, it is even on by default. Up till now, we have pretended it isn't there. Haskell is usually good at picking a reasonable default buffering mode, but this default is rarely the fastest. If you have speed-critical I/O code, changing buffering could have a significant impact on your program.
Buffering Modes
There are three different buffering modes in Haskell. They are defined as the BufferMode type: NoBuffering, LineBuffering, and BlockBuffering.
NoBuffering does just what it sounds like: no buffering. Data read via functions like hGetLine will be read from the OS one character at a time. Data written will be written immediately, and also often will be written one character at a time. For this reason, NoBuffering is usually a very poor performer and not suitable for general-purpose use.
LineBuffering causes the output buffer to be written whenever the newline character is output, or whenever it gets too large. On input, it will usually attempt to read whatever data is available in chunks until it first sees the newline character. When reading from the terminal, it should return data immediately after each press of Enter. It is often a reasonable default.
BlockBuffering causes Haskell to read or write data in fixed-size chunks when possible. This is the best performer when processing large amounts of data in batch, even if that data is line-oriented. However, it is unusable for interactive programs because it will block input until a full block is read. BlockBuffering accepts one parameter of type Maybe: if Nothing, it will use an implementation-defined buffer size. Or, you can use a setting such as Just 4096 to set the buffer to 4096 bytes.
The default buffering mode is dependent upon the operating system and Haskell implementation. You can ask the system for the current buffering mode by calling hGetBuffering. The current mode can be set with hSetBuffering, which accepts a Handle and BufferMode. As an example, you can say hSetBuffering stdin (BlockBuffering Nothing).
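A sketch of querying and changing the buffering mode follows. The helper withBuffering is hypothetical: it uses bracket from Control.Exception to switch a Handle to a temporary mode and restore the previous one afterwards, even if the action throws an exception.

```haskell
import Control.Exception (bracket)
import System.IO

-- Run an action with a temporary buffering mode on a Handle,
-- then restore whatever mode was in effect before.
withBuffering :: Handle -> BufferMode -> IO a -> IO a
withBuffering h mode action =
  bracket (hGetBuffering h)   -- acquire: remember the current mode
          (hSetBuffering h)   -- release: put the old mode back
          (\_ -> hSetBuffering h mode >> action)

main :: IO ()
main = do
  hGetBuffering stdout >>= print
  withBuffering stdout (BlockBuffering (Just 4096)) $
    hGetBuffering stdout >>= print
  hGetBuffering stdout >>= print
```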
Flushing The Buffer
For any type of buffering, you may sometimes want to force Haskell to write out any data that has been saved up in the buffer. There are a few times when this will happen automatically: a call to hClose, for instance. Sometimes you may want to call hFlush instead, which will force any pending data to be written immediately. This could be useful when the Handle is a network socket and you want the data to be transmitted immediately, or when you want to make the data on disk available to other programs that might be reading it concurrently.
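A very common case for hFlush is an interactive prompt that does not end in a newline. With LineBuffering on stdout, the prompt text may sit in the buffer while the program waits for input, so it is flushed explicitly. The prompt helper below is a hypothetical sketch.

```haskell
import System.IO

-- Print a question without a newline and make sure it is visible
-- before blocking on input.
prompt :: String -> IO String
prompt question = do
  putStr question
  hFlush stdout   -- push the pending prompt text out of the buffer
  getLine

main :: IO ()
main = do
  name <- prompt "What is your name? "
  putStrLn ("Hello, " ++ name)
```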
Reading Command-Line Arguments
Many command-line programs are interested in the parameters passed on the command line. System.Environment.getArgs returns IO [String] listing each argument. This is the same as argv in C, starting with argv[1]. The program name (argv[0] in C) is available from System.Environment.getProgName.
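Here is a minimal sketch that reports the program name and its arguments. The pure helper describeArgs is made up for this example; keeping the formatting pure means it can be tested without actually passing command-line arguments.

```haskell
import System.Environment (getArgs, getProgName)

-- Hypothetical pure helper that formats the program name and arguments.
describeArgs :: String -> [String] -> String
describeArgs prog args =
  prog ++ " was called with " ++ show (length args)
       ++ " argument(s): " ++ unwords args

main :: IO ()
main = do
  prog <- getProgName   -- like argv[0] in C
  args <- getArgs       -- like argv[1] onwards
  putStrLn (describeArgs prog args)
```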
The System.Console.GetOpt module provides some tools for parsing command-line options. If you have a program with complex options, you may find it useful. You can find an example of its use in the section called "Command line parsing".
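For a sense of what System.Console.GetOpt looks like, here is a small sketch with two hypothetical flags, -v/--verbose and -o/--output FILE. getOpt with Permute returns the recognised flags, the remaining non-option arguments, and any error messages; usageInfo builds a help text from the same option list. The Flag type, the option names, and the usage string are assumptions for illustration.

```haskell
import System.Console.GetOpt
import System.Environment (getArgs)

-- Hypothetical flags for a small filter program.
data Flag = Verbose | Output String
  deriving (Eq, Show)

options :: [OptDescr Flag]
options =
  [ Option ['v'] ["verbose"] (NoArg Verbose)        "print extra information"
  , Option ['o'] ["output"]  (ReqArg Output "FILE") "write results to FILE"
  ]

-- Split the command line into flags and remaining arguments,
-- or produce the error messages followed by usage text.
parseArgs :: [String] -> Either String ([Flag], [String])
parseArgs argv =
  case getOpt Permute options argv of
    (flags, rest, []) -> Right (flags, rest)
    (_, _, errs)      ->
      Left (concat errs ++ usageInfo "Usage: prog [OPTION...] FILE..." options)

main :: IO ()
main = do
  argv <- getArgs
  either putStr print (parseArgs argv)
```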
Environment Variables
If you need to read environment variables, you can use one of two functions in System.Environment: getEnv or getEnvironment. getEnv looks for a specific variable and raises an exception if it doesn't exist. getEnvironment returns the whole environment as a [(String, String)], and then you can use functions such as lookup to find the environment entry you want.
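The sketch below combines getEnvironment with lookup, falling back to a default instead of raising an exception the way getEnv does. envOr is a hypothetical helper. Newer versions of System.Environment also provide lookupEnv, which returns a Maybe String directly.

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (getEnvironment)

-- Hypothetical helper: find a variable in an environment list,
-- or use the fallback when it is not set.
envOr :: String -> String -> [(String, String)] -> String
envOr fallback name env = fromMaybe fallback (lookup name env)

main :: IO ()
main = do
  env <- getEnvironment
  putStrLn ("HOME is " ++ envOr "(unset)" "HOME" env)
```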
Setting environment variables is not defined in a cross-platform way in Haskell. If you are on a POSIX platform such as Linux, you can use putEnv or setEnv from the System.Posix.Env module. Environment setting is not defined for Windows. (Later versions of the base library added setEnv and unsetEnv to System.Environment itself, and those also work on Windows.)
Source: http://book.realworldhaskell.org/read/io.html