How to Read a File Line by Line in Haskell

Chapter 7. I/O

It should be obvious that most, if not all, programs are devoted to gathering data from outside, processing it, and providing results back to the outside world. That is, input and output are key.

Haskell's I/O system is powerful and expressive. It is easy to work with and important to understand. Haskell strictly separates pure code from code that could cause things to occur in the world. That is, it provides a complete isolation from side-effects in pure code. Besides helping programmers to reason about the correctness of their code, it also permits compilers to automatically introduce optimizations and parallelism.

We'll begin this chapter with simple, standard-looking I/O in Haskell. Then we'll discuss some of the more powerful options as well as provide more detail on how I/O fits into the pure, lazy, functional Haskell world.

Classic I/O in Haskell

Let's get started with I/O in Haskell by looking at a program that looks surprisingly similar to I/O in other languages such as C or Perl.

-- file: ch07/basicio.hs
main = do
       putStrLn "Greetings!  What is your name?"
       inpStr <- getLine
       putStrLn $ "Welcome to Haskell, " ++ inpStr ++ "!"

You can compile this program to a standalone executable, run it with runghc, or invoke main from within ghci. Here's a sample session using runghc:

$ runghc basicio.hs
Greetings!  What is your name?
John
Welcome to Haskell, John!

That's a fairly simple, obvious result. You can see that putStrLn writes out a String, followed by an end-of-line character. getLine reads a line from standard input. The <- syntax may be new to you. Put simply, that binds the result from executing an I/O action to a name. [15] We use the simple list concatenation operator ++ to join the input string with our own text.

Let's take a look at the types of putStrLn and getLine. You can find that information in the library reference, or just ask ghci:

ghci> :type putStrLn
putStrLn :: String -> IO ()
ghci> :type getLine
getLine :: IO String

Notice that both of these types have IO in their return value. That is your key to knowing that they may have side effects, or that they may return different values even when called with the same arguments, or both. The type of putStrLn looks like a function. It takes a parameter of type String and returns a value of type IO (). Just what is an IO () though?

Anything that is type IO something is an I/O action. You can store it and nothing will happen. I could say writefoo = putStrLn "foo" and nothing happens right then. But if I later use writefoo in the middle of another I/O action, the writefoo action will be executed when its parent action is executed; I/O actions can be glued together to form bigger I/O actions. The () is an empty tuple (pronounced "unit"), indicating that there is no return value from putStrLn. This is similar to void in Java or C.[16]

[Tip] Tip

Actions can be created, assigned, and passed anywhere. However, they may only be performed (executed) from within another I/O action.

Let's look at this with ghci:

ghci> let writefoo = putStrLn "foo"
ghci> writefoo
foo

In this example, the output foo is not a return value from putStrLn. Rather, it's the side effect of putStrLn actually writing foo to the terminal.

Notice one other thing: ghci actually executed writefoo. This means that, when given an I/O action, ghci will perform it for you on the spot.

[Note] What Is An I/O Action?

Actions:

  • Have the type IO t

  • Are first-class values in Haskell and fit seamlessly with Haskell's type system

  • Produce an effect when performed, but not when evaluated. That is, they only produce an effect when called by something else in an I/O context.

  • Any expression may produce an action as its value, but the action will not perform I/O until it is executed inside another I/O action (or it is main)

  • Performing (executing) an action of type IO t may perform I/O and will ultimately deliver a result of type t
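To make the points in this list concrete, here is a minimal sketch. The name greetings is ours, chosen for illustration; sequence_ comes from the Prelude. Building the list of actions performs no I/O. The actions only run once they are performed from within main.

```haskell
-- Actions are ordinary values: building this list performs no I/O.
greetings :: [IO ()]
greetings = map putStrLn ["one", "two", "three"]

main :: IO ()
main = do
    -- Counting the actions evaluates the list but performs none of them.
    putStrLn ("Built " ++ show (length greetings) ++ " actions; nothing printed yet.")
    -- sequence_ glues the actions into one bigger action and performs them in order.
    sequence_ greetings
```

Only the call to sequence_ inside main causes the three strings to appear on the terminal.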

The type of getLine may look strange to you. It looks like a value, rather than a function. And in fact, that is one way to look at it: getLine is storing an I/O action. When that action is performed, you get a String. The <- operator is used to "pull out" the result from performing an I/O action and store it in a variable.

main itself is an I/O action with type IO (). You can only perform I/O actions from within other I/O actions. All I/O in Haskell programs is driven from the top at main, which is where execution of every Haskell program begins. This, then, is the mechanism that provides isolation from side effects in Haskell: you perform I/O in your IO actions, and call pure (non-I/O) functions from there. Most Haskell code is pure; the I/O actions perform I/O and call that pure code.

do is a convenient way to define a sequence of actions. As you'll see later, there are other ways. When you use do in this way, indentation is significant; make sure you line up your actions properly.

You only need to use do if you have more than one action that you need to perform. The value of a do block is the value of the last action executed. For a complete description of do syntax, see the section called "Desugaring of do blocks".

Let's consider an example of calling pure code from within an I/O action:

-- file: ch07/callingpure.hs
name2reply :: String -> String
name2reply name =
    "Pleased to meet you, " ++ name ++ ".\n" ++
    "Your name contains " ++ charcount ++ " characters."
    where charcount = show (length name)

main :: IO ()
main = do
       putStrLn "Greetings once again.  What is your name?"
       inpStr <- getLine
       let outStr = name2reply inpStr
       putStrLn outStr

Notice the name2reply function in this example. It is a regular Haskell function and obeys all the rules we've told you about: it always returns the same result when given the same input, it has no side effects, and it operates lazily. It uses other Haskell functions: (++), show, and length.

Down in main, we bind the result of name2reply inpStr to outStr. When you're working in a do block, you use <- to get results from IO actions and let to get results from pure code. When used in a do block, you should not put in after your let statement.

You can see here how we read the person's name from the keyboard. Then, that data got passed to a pure function, and its result was printed. In fact, the last two lines of main could have been replaced with putStrLn (name2reply inpStr). So, while main did have side effects (it caused things to appear on the terminal, for example), name2reply did not and could not. That's because name2reply is a pure function, not an action.

Let's examine this with ghci :

ghci> :load callingpure.hs
[1 of 1] Compiling Main             ( callingpure.hs, interpreted )
Ok, modules loaded: Main.
ghci> name2reply "John"
"Pleased to meet you, John.\nYour name contains 4 characters."
ghci> putStrLn (name2reply "John")
Pleased to meet you, John.
Your name contains 4 characters.

The \n within the string is the end-of-line (newline) character, which causes the terminal to begin a new line in its output. Just calling name2reply "John" in ghci will show you the \n literally, because it is using show to display the return value. But using putStrLn sends it to the terminal, and the terminal interprets \n to start a new line.

What do you think will happen if you just type main at the ghci prompt? Give it a try.

After looking at these example programs, you may be wondering if Haskell is really imperative rather than pure, lazy, and functional. Some of these examples look like a sequence of actions to be followed in order. There's more to it than that, though. We'll discuss that question later in this chapter in the section called "Is Haskell Really Imperative?" and the section called "Lazy I/O".

Pure vs. I/O

As a way to help with understanding the differences between pure code and I/O, here's a comparison table. When we speak of pure code, we are talking about Haskell functions that always return the same result when given the same input and have no side effects. In Haskell, only the execution of I/O actions avoids these rules.

Table 7.1. Pure vs. Impure

Pure | Impure
Always produces the same result when given the same parameters | May produce different results for the same parameters
Never has side effects | May have side effects
Never alters state | May alter the global state of the program, system, or world

Why Purity Matters

In this section, we've discussed how Haskell draws a clear distinction between pure code and I/O actions. Most languages don't draw this distinction. In languages such as C or Java, there is no such thing as a function that is guaranteed by the compiler to always return the same result for the same arguments, or a function that is guaranteed to never have side effects. The only way to know if a given function has side effects is to read its documentation and hope that it's accurate.

Many bugs in programs are caused by unanticipated side effects. Still more are caused by misunderstanding circumstances in which functions may return different results for the same input. As multithreading and other forms of parallelism grow increasingly common, it becomes more difficult to manage global side effects.

Haskell's method of isolating side effects into I/O actions provides a clear boundary. You can always know which parts of the system may alter state and which won't. You can always be sure that the pure parts of your program aren't having unanticipated results. This helps you to think about the program. It also helps the compiler to think about it. Recent versions of ghc, for instance, can provide a level of automatic parallelism for the pure parts of your code, something of a holy grail for computing.

For more discussion on this topic, refer to the section called "Side Effects with Lazy I/O".

Working With Files and Handles

So far, you've seen how to interact with the user at the computer's terminal. Of course, you'll often need to manipulate specific files. That's easy to do, too.

Haskell defines quite a few basic functions for I/O, many of which are similar to functions seen in other programming languages. The library reference for System.IO provides a good summary of all the basic I/O functions, should you need one that we aren't touching upon here.

You will generally begin by using openFile, which will give you a file Handle. That Handle is then used to perform specific operations on the file. Haskell provides functions such as hPutStrLn that work just like putStrLn but take an additional argument, a Handle, that specifies which file to operate upon. When you're done, you'll use hClose to close the Handle. These functions are all defined in System.IO, so you'll need to import that module when working with files. There are "h" functions corresponding to virtually all of the non-"h" functions; for example, there is print for printing to the screen and hPrint for printing to a file.
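As a small illustration of the "h" functions, the following sketch writes two lines to a file through a Handle and then reads them back. The file name notes.txt is an assumption made for this example; readFile and putStr come from the Prelude.

```haskell
import System.IO

main :: IO ()
main = do
    -- "notes.txt" is a hypothetical file name chosen for this sketch.
    outh <- openFile "notes.txt" WriteMode
    hPutStrLn outh "first line"   -- like putStrLn, but writes to the Handle
    hPrint outh (6 * 7 :: Int)    -- like print, but writes to the Handle
    hClose outh                   -- done with the file, so close the Handle

    -- Read the file back and copy it to the terminal.
    contents <- readFile "notes.txt"
    putStr contents
```

The only difference from the terminal versions is the extra Handle argument that says where the output should go.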

Let's start with an imperative way to read and write files. This should seem similar to a while loop that you may find in other languages. This isn't the best way to write it in Haskell; later, you'll see examples of more Haskellish approaches.

-- file: ch07/toupper-imp.hs
import System.IO
import Data.Char(toUpper)

main :: IO ()
main = do
       inh <- openFile "input.txt" ReadMode
       outh <- openFile "output.txt" WriteMode
       mainloop inh outh
       hClose inh
       hClose outh

mainloop :: Handle -> Handle -> IO ()
mainloop inh outh =
    do ineof <- hIsEOF inh
       if ineof
           then return ()
           else do inpStr <- hGetLine inh
                   hPutStrLn outh (map toUpper inpStr)
                   mainloop inh outh

Like every Haskell program, execution of this program begins with main. Two files are opened: input.txt is opened for reading, and output.txt is opened for writing. Then we call mainloop to process the file.

mainloop begins by checking to see if we're at the end of file (EOF) for the input. If not, we read a line from the input. We write out the same line to the output, after first converting it to uppercase. Then we recursively call mainloop again to continue processing the file.[17]

Notice that return call. This is not really the same as return in C or Python. In those languages, return is used to terminate execution of the current function immediately, and to return a value to the caller. In Haskell, return is the opposite of <-. That is, return takes a pure value and wraps it inside IO. Since every I/O action must return some IO type, if your result came from pure computation, you must use return to wrap it in IO. As an example, if 7 is an Int, then return 7 would create an action stored in a value of type IO Int. When executed, that action would produce the result 7. For more details on return, see the section called "The True Nature of Return".
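The following sketch illustrates both halves of that explanation. The names parity and describe are hypothetical helpers invented for this example. Note in particular that the return () in the middle of main does not end the block.

```haskell
-- Pure helper: there is no IO in its type.
parity :: Int -> String
parity n = if even n then "even" else "odd"

-- return wraps the pure result so that it fits the type IO String.
describe :: Int -> IO String
describe n = return (show n ++ " is " ++ parity n)

main :: IO ()
main = do
    seven <- return (7 :: Int)   -- <- unwraps what return wrapped
    return ()                    -- wraps (); execution simply continues
    msg <- describe seven
    putStrLn msg
```

Running this prints a single line, 7 is odd, which shows that the lines after return () were still executed.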

Let's try running the program. We've got a file named input.txt that looks like this:

This is ch08/input.txt

Test Input
I like Haskell
Haskell is great
I/O is fun

123456789

Now, you can use runghc toupper-imp.hs and you'll find output.txt in your directory. It should look like this:

THIS IS CH08/INPUT.TXT

TEST INPUT
I LIKE HASKELL
HASKELL IS GREAT
I/O IS FUN

123456789

More on openFile

Let's use ghci to check on the type of openFile:

ghci> :module System.IO
ghci> :type openFile
openFile :: FilePath -> IOMode -> IO Handle

FilePath is simply another name for String. It is used in the types of I/O functions to help clarify that the parameter is being used as a filename, and not as regular data.

IOMode specifies how the file is to be managed. The possible values for IOMode are listed in Table 7.2, "Possible IOMode Values".

Table 7.2. Possible IOMode Values

IOMode | Can read? | Can write? | Starting position | Notes
ReadMode | Yes | No | Beginning of file | File must exist already
WriteMode | No | Yes | Beginning of file | File is truncated (completely emptied) if it already existed
ReadWriteMode | Yes | Yes | Beginning of file | File is created if it didn't exist; otherwise, existing data is left intact
AppendMode | No | Yes | End of file | File is created if it didn't exist; otherwise, existing data is left intact.
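The difference between WriteMode and AppendMode is easiest to see by opening the same file several times. In this sketch the file name log.txt (bound to the hypothetical name logPath) is an assumption made for illustration.

```haskell
import System.IO

-- "log.txt" is a hypothetical file name used only for this sketch.
logPath :: FilePath
logPath = "log.txt"

main :: IO ()
main = do
    -- WriteMode truncates, so the file starts out empty.
    h1 <- openFile logPath WriteMode
    hPutStrLn h1 "first entry"
    hClose h1

    -- AppendMode keeps the existing data and writes at the end.
    h2 <- openFile logPath AppendMode
    hPutStrLn h2 "second entry"
    hClose h2

    -- ReadMode requires that the file already exists.
    h3 <- openFile logPath ReadMode
    line1 <- hGetLine h3
    line2 <- hGetLine h3
    hClose h3
    print [line1, line2]
```

Both entries survive because the second open used AppendMode; opening with WriteMode a second time would have discarded the first entry.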

While we are mostly working with text examples in this chapter, binary files can also be used in Haskell. If you are working with a binary file, you should use openBinaryFile instead of openFile. Operating systems such as Windows process files differently if they are opened as binary instead of as text. On operating systems such as Linux, both openFile and openBinaryFile open the file in the same way (although recent versions of GHC also decode text-mode Handles using the locale's character encoding, which binary Handles skip). Nevertheless, for portability, it is still wise to always use openBinaryFile if you will be dealing with binary data.

Closing Handles

You've already seen that hClose is used to close file handles. Let's take a moment and think about why this is important.

As you'll see in the section called "Buffering", Haskell maintains internal buffers for files. This provides an important performance boost. However, it means that until you call hClose on a file that is open for writing, your data may not be flushed out to the operating system.

Another reason to make sure to hClose files is that open files take up resources on the system. If your program runs for a long time, and opens many files but fails to close them, it is conceivable that your program could even crash due to resource exhaustion. All of this is no different in Haskell than in other languages.

When a program exits, Haskell will normally take care of closing any files that remain open. However, there are some circumstances in which this may not happen[18], so once again, it is best to be responsible and call hClose all the time.

Haskell provides several tools for you to use to easily ensure this happens, regardless of whether errors are present. You can read about finally in the section called "Extended Example: Functional I/O and Temporary Files" and bracket in the section called "The acquire-use-release cycle".
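As a brief preview of those tools, here is a minimal sketch that uses bracket from Control.Exception to guarantee that hClose runs. The helper name writeLines and the file name closed.txt are assumptions made for this example, not part of any library.

```haskell
import Control.Exception (bracket)
import System.IO

-- Write some lines, closing the Handle even if writing throws an exception.
writeLines :: FilePath -> [String] -> IO ()
writeLines path ls =
    bracket (openFile path WriteMode)        -- acquire the Handle
            hClose                           -- release it; this always runs
            (\h -> mapM_ (hPutStrLn h) ls)   -- use it

main :: IO ()
main = do
    -- "closed.txt" is a hypothetical file name for this sketch.
    writeLines "closed.txt" ["alpha", "beta"]
    -- The Handle has been closed, so the buffered data has been flushed.
    contents <- readFile "closed.txt"
    putStr contents
```

Because the close happens inside bracket, callers of writeLines cannot forget it.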

Seek and Tell

When reading and writing from a Handle that corresponds to a file on disk, the operating system maintains an internal record of the current position. Each time you do another read, the operating system returns the next chunk of data that begins at the current position, and increments the position to reflect the data that you read.

You can use hTell to find out your current position in the file. When the file is initially created, it is empty and your position will be 0. After you write out 5 bytes, your position will be 5, and so on. hTell takes a Handle and returns an IO Integer with your position.

The companion to hTell is hSeek. hSeek lets you change the file position. It takes three parameters: a Handle, a SeekMode, and a position.

SeekMode can be one of three different values, which specify how the given position is to be interpreted. AbsoluteSeek means that the position is a precise location in the file. This is the same kind of information that hTell gives you. RelativeSeek means to seek from the current position. A positive number requests going forwards in the file, and a negative number means going backwards. Finally, SeekFromEnd will seek to the specified number of bytes before the end of the file. hSeek handle SeekFromEnd 0 will take you to the end of the file. For an example of hSeek, refer to the section called "Extended Example: Functional I/O and Temporary Files".

Not all Handles are seekable. A Handle usually corresponds to a file, but it can also correspond to other things such as network connections, tape drives, or terminals. You can use hIsSeekable to see if a given Handle is seekable.
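Here is a short sketch that exercises hIsSeekable, hTell, and hSeek with each SeekMode on a ten-character file. The file name seek-demo.txt is an assumption made for this example, and the contents are plain ASCII so that characters and bytes coincide.

```haskell
import System.IO

main :: IO ()
main = do
    -- "seek-demo.txt" is a hypothetical file name for this sketch.
    writeFile "seek-demo.txt" "0123456789"
    h <- openFile "seek-demo.txt" ReadMode
    seekable <- hIsSeekable h
    putStrLn ("Seekable: " ++ show seekable)

    start <- hTell h                -- position right after opening
    hSeek h AbsoluteSeek 4          -- jump to position 4
    c1 <- hGetChar h                -- read one character, moving to position 5
    hSeek h RelativeSeek 2          -- skip two more, moving to position 7
    c2 <- hGetChar h
    hSeek h SeekFromEnd 0           -- jump to the end of the file
    end <- hTell h
    hClose h

    putStrLn ("Started at " ++ show start ++ ", read " ++ [c1, c2]
              ++ ", ended at " ++ show end)
```

With the file above, the two characters read should be 4 and 7, and the final position should equal the file's length of 10.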

Standard Input, Output, and Error

Earlier, we pointed out that for each non-"h" function, there is usually also a corresponding "h" function that works on any Handle. In fact, the non-"h" functions are nothing more than shortcuts for their "h" counterparts.

There are three pre-defined Handles in System.IO. These Handles are always available for your use. They are stdin, which corresponds to standard input; stdout for standard output; and stderr for standard error. Standard input normally refers to the keyboard, standard output to the monitor, and standard error also normally goes to the monitor.

Functions such as getLine can thus be trivially defined like this:

getLine = hGetLine stdin
putStrLn = hPutStrLn stdout
print = hPrint stdout

Earlier, we told you what the three standard file handles "normally" correspond to. That's because some operating systems let you redirect the file handles to come from (or go to) different places: files, devices, or even other programs. This feature is used extensively in shell scripting on POSIX (Linux, BSD, Mac) operating systems, but can also be used on Windows.

It often makes sense to use standard input and output instead of specific files. This lets you interact with a human at the terminal. But it also lets you work with input and output files, or even combine your code with other programs, if that's what's requested.[19]

As an example, we can provide input to callingpure.hs in advance like this:

$ echo John|runghc callingpure.hs
Greetings once again.  What is your name?
Pleased to meet you, John.
Your name contains 4 characters.

While callingpure.hs was running, it did not wait for input at the keyboard; instead it received John from the echo program. Notice also that the output didn't contain the word John on a separate line as it did when this program was run at the keyboard. The terminal normally echoes everything you type back to you, but that is technically input, and is not included in the output stream.
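To see all three standard Handles at work, consider this sketch of a filter that reads standard input line by line. The names shout and loop are hypothetical helpers written for this example. Converted text goes to stdout, while a summary goes to stderr so that it stays visible even when the output is redirected.

```haskell
import Data.Char (toUpper)
import System.IO

-- Pure conversion applied to each line.
shout :: String -> String
shout = map toUpper

-- Copy stdin to stdout one line at a time, returning the number of lines seen.
loop :: Int -> IO Int
loop n = do
    eof <- hIsEOF stdin
    if eof
        then return n
        else do
            line <- hGetLine stdin
            hPutStrLn stdout (shout line)
            loop (n + 1)

main :: IO ()
main = do
    count <- loop 0
    -- Diagnostics go to standard error, not into the data stream.
    hPutStrLn stderr ("Processed " ++ show count ++ " lines")
```

Saved under a hypothetical name such as upper.hs, it could be run as echo hello | runghc upper.hs > out.txt. The uppercase text lands in out.txt while the line count still appears on the terminal, because only standard output was redirected.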

Deleting and Renaming Files

So far in this chapter, we've discussed the contents of the files. Let's now talk a bit about the files themselves.

System.Directory provides two functions you may find useful. removeFile takes a single argument, a filename, and deletes that file.[20] renameFile takes two filenames: the first is the old name and the second is the new name. If the new filename is in a different directory, you can also think of this as a move. The old filename must exist prior to the call to renameFile. If the new file already exists, it is removed before the rename takes place.

Like many other functions that take a filename, if the "old" name doesn't exist, renameFile will raise an exception. More information on exception handling can be found in Chapter 19, Error handling.

There are many other functions in System.Directory for doing things such as creating and removing directories, finding lists of files in directories, and testing for file existence. These are discussed in the section called "Directory and File Information".
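A quick sketch of renameFile and removeFile follows, using doesFileExist from the same module to observe the effect of each call. Both file names are assumptions made for this example.

```haskell
import System.Directory (doesFileExist, removeFile, renameFile)

main :: IO ()
main = do
    -- Both file names are hypothetical, chosen for this sketch.
    writeFile "old-name.txt" "some data\n"

    -- After the rename, only the new name should exist.
    renameFile "old-name.txt" "new-name.txt"
    oldExists <- doesFileExist "old-name.txt"
    newExists <- doesFileExist "new-name.txt"
    print (oldExists, newExists)

    -- After the removal, the file is gone entirely.
    removeFile "new-name.txt"
    stillThere <- doesFileExist "new-name.txt"
    print stillThere
```

The program creates the file itself before renaming it, so the "old" name is guaranteed to exist and renameFile does not raise an exception.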

Temporary Files

Programmers frequently need temporary files. These files may be used to store large amounts of data needed for computations, data to be used by other programs, or any number of other uses.

While you could craft a way to manually open files with unique names, the details of doing this in a secure way differ from platform to platform. Haskell provides a convenient function called openTempFile (and a corresponding openBinaryTempFile) to handle the difficult bits for you.

openTempFile takes two parameters: the directory in which to create the file, and a "template" for naming the file. The directory could simply be "." for the current working directory. Or you could use System.Directory.getTemporaryDirectory to find the best place for temporary files on a given machine. The template is used as the basis for the file name; it will have some random characters added to it to ensure that the result is truly unique. It guarantees that it will be working on a unique filename, in fact.

The return type of openTempFile is IO (FilePath, Handle). The first part of the tuple is the name of the file created, and the second is a Handle opened in ReadWriteMode over that file. When you're done with the file, you'll want to hClose it and then call removeFile to delete it. See the following example for a sample function to use.

Extended Example: Functional I/O and Temporary Files

Here's a larger example that puts together some concepts from this chapter, from some earlier chapters, and a few you haven't seen yet. Take a look at the program and see if you can figure out what it does and how it works.

-- file: ch07/tempfile.hs
import System.IO
import System.Directory(getTemporaryDirectory, removeFile)
-- catchIOError stands in for the old System.IO.Error.catch, which newer
-- versions of the base library no longer export.
import System.IO.Error(catchIOError)
import Control.Exception(finally)

-- The main entry point.  Work with a temp file in myAction.
main :: IO ()
main = withTempFile "mytemp.txt" myAction

{- The guts of the program.  Called with the path and handle of a temporary
   file.  When this function exits, that file will be closed and deleted
   because myAction was called from withTempFile. -}
myAction :: FilePath -> Handle -> IO ()
myAction tempname temph =
    do -- Start by displaying a greeting on the terminal
       putStrLn "Welcome to tempfile.hs"
       putStrLn $ "I have a temporary file at " ++ tempname

       -- Let's see what the initial position is
       pos <- hTell temph
       putStrLn $ "My initial position is " ++ show pos

       -- Now, write some data to the temporary file
       let tempdata = show [1..10]
       putStrLn $ "Writing one line containing " ++
                  show (length tempdata) ++ " bytes: " ++
                  tempdata
       hPutStrLn temph tempdata

       -- Get our new position.  This doesn't actually modify pos
       -- in memory, but makes the name "pos" correspond to a different
       -- value for the remainder of the "do" block.
       pos <- hTell temph
       putStrLn $ "After writing, my new position is " ++ show pos

       -- Seek to the beginning of the file and display it
       putStrLn $ "The file content is: "
       hSeek temph AbsoluteSeek 0

       -- hGetContents performs a lazy read of the entire file
       c <- hGetContents temph

       -- Copy the file byte-for-byte to stdout, followed by \n
       putStrLn c

       -- Let's also display it as a Haskell literal
       putStrLn $ "Which could be expressed as this Haskell literal:"
       print c

{- This function takes two parameters: a filename pattern and another
   function.  It will create a temporary file, and pass the name and Handle
   of that file to the given function.

   The temporary file is created with openTempFile.  The directory is the one
   indicated by getTemporaryDirectory, or, if the system has no notion of
   a temporary directory, "." is used.  The given pattern is passed to
   openTempFile.

   After the given function terminates, even if it terminates due to an
   exception, the Handle is closed and the file is deleted. -}
withTempFile :: String -> (FilePath -> Handle -> IO a) -> IO a
withTempFile pattern func =
    do -- The library ref says that getTemporaryDirectory may raise an
       -- exception on systems that have no notion of a temporary directory.
       -- So, we run getTemporaryDirectory under catchIOError, which takes
       -- two arguments: an action to run, and a handler to run if the
       -- action raised an I/O exception.  If getTemporaryDirectory raised
       -- an exception, just use "." (the current working directory).
       tempdir <- catchIOError getTemporaryDirectory (\_ -> return ".")
       (tempfile, temph) <- openTempFile tempdir pattern

       -- Call (func tempfile temph) to perform the action on the temporary
       -- file.  finally takes two actions.  The first is the action to run.
       -- The second is an action to run after the first, regardless of
       -- whether the first action raised an exception.  This way, we ensure
       -- the temporary file is always deleted.  The return value from finally
       -- is the first action's return value.
       finally (func tempfile temph)
               (do hClose temph
                   removeFile tempfile)

Let's start looking at this program from the end. The withTempFile function demonstrates that Haskell doesn't forget its functional nature when I/O is introduced. This function takes a String and another function. The function passed to withTempFile is invoked with the name and Handle of a temporary file. When that function exits, the temporary file is closed and deleted. So even when dealing with I/O, we can still find the idiom of passing functions as parameters to be convenient. Lisp programmers might find our withTempFile function similar to Lisp's with-open-file function.

There is some exception handling going on to make the program more robust in the face of errors. You normally want the temporary files to be deleted after processing completes, even if something went wrong. So we make sure that happens. For more on exception handling, see Chapter 19, Error handling.

Let's return to the start of the program. main is defined simply as withTempFile "mytemp.txt" myAction. myAction, then, will be invoked with the name and Handle of the temporary file.

myAction displays some information to the terminal, writes some data to the file, seeks to the beginning of the file, and reads the data back with hGetContents.[21] It then displays the contents of the file byte-for-byte, and also as a Haskell literal via print c. That's the same as putStrLn (show c).

Let's look at the output:

$ runhaskell tempfile.hs
Welcome to tempfile.hs
I have a temporary file at /tmp/mytemp8572.txt
My initial position is 0
Writing one line containing 22 bytes: [1,2,3,4,5,6,7,8,9,10]
After writing, my new position is 23
The file content is: 
[1,2,3,4,5,6,7,8,9,10]

Which could be expressed as this Haskell literal:
"[1,2,3,4,5,6,7,8,9,10]\n"

Every time you run this program, your temporary file name should be slightly different since it contains a randomly-generated component. Looking at this output, there are a few questions that might occur to you:

  1. Why is your position 23 after writing a line with 22 bytes?

  2. Why is there an empty line after the file content display?

  3. Why is there a \n at the end of the Haskell literal display?

You might be able to guess that the answers to all three questions are related. See if you can work out the answers for a moment. If you need some help, here are the explanations:

  1. That's because we used hPutStrLn instead of hPutStr to write the data. hPutStrLn always terminates the line by writing a \n at the end, which didn't appear in tempdata.

  2. We used putStrLn c to display the file contents c. Because the data was written originally with hPutStrLn, c ends with the newline character, and putStrLn adds a second newline character. The result is a blank line.

  3. The \n is the newline character from the original hPutStrLn.

As a final note, the byte counts may be different on some operating systems. Windows, for instance, uses the two-byte sequence \r\n as the end-of-line marker, so you may see differences on that platform.

Lazy I/O

So far in this chapter, you've seen examples of fairly traditional I/O. Each line, or block of information, is requested individually and processed individually.

Haskell has another approach available to you as well. Since Haskell is a lazy language, meaning that any given piece of data is only evaluated when its value must be known, there are some novel ways of approaching I/O.

hGetContents

One novel way to approach I/O is the hGetContents function.[22] hGetContents has the type Handle -> IO String. The String it returns represents all of the data in the file given by the Handle.[23]

In a strictly-evaluated language, using such a function is often a bad idea. It may be fine to read the entire contents of a 2KB file, but if you try to read the entire contents of a 500GB file, you are likely to crash due to lack of RAM to store all that data. In these languages, you would traditionally use mechanisms such as loops to process the file's entire data.

But hGetContents is different. The String it returns is evaluated lazily. At the moment you call hGetContents, nothing is actually read. Data is only read from the Handle as the elements (characters) of the list are processed. As elements of the String are no longer used, Haskell's garbage collector automatically frees that memory. All of this happens completely transparently to you. And since you have what looks like (and, really, is) a pure String, you can pass it to pure (non-IO) code.

Allow'south take a quick look at an instance. Back in the section chosen "Working With Files and Handles", you saw an imperative plan that converted the entire content of a file to uppercase. Its imperative algorithm was like to what you'd see in many other languages. Here now is the much simpler algorithm that exploits lazy evaluation:

    -- file: ch07/toupper-lazy1.hs
    import System.IO
    import Data.Char(toUpper)

    main :: IO ()
    main = do
        inh <- openFile "input.txt" ReadMode
        outh <- openFile "output.txt" WriteMode
        inpStr <- hGetContents inh
        let result = processData inpStr
        hPutStr outh result
        hClose inh
        hClose outh

    processData :: String -> String
    processData = map toUpper

Notice that hGetContents handled all of the reading for us. Also, take a look at processData. It's a pure function since it has no side effects and always returns the same result each time it is called. It has no need to know, and no way to tell, that its input is being read lazily from a file in this case. It can work perfectly well with a 20-character literal or a 500GB data dump on disk.

You can even verify that with ghci:

    ghci> :load toupper-lazy1.hs
    [1 of 1] Compiling Main             ( toupper-lazy1.hs, interpreted )
    Ok, modules loaded: Main.
    ghci> processData "Hello, there!  How are you?"
    "HELLO, THERE!  HOW ARE YOU?"
    ghci> :type processData
    processData :: String -> String
    ghci> :type processData "Hello!"
    processData "Hello!" :: String

[Warning] Warning

If we had tried to hang on to inpStr in the above example, past the one place where it was used (the call to processData), the program would have lost its memory efficiency. That's because the compiler would have been forced to keep inpStr's value in memory for future use. Here it knows that inpStr will never be reused, and frees the memory as soon as it is done with it. Just remember: memory is only freed after its last use.
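As an illustration of the warning, here is a sketch of a program that does hang on to inpStr. The helper summary is a hypothetical name, and the program assumes an input.txt in the current directory. Because inpStr is used a second time after length has walked all of it, the whole file has to stay in memory at once.

```haskell
import Data.Char (toUpper)
import System.IO

-- Pure helper: a summary that needs to walk the whole string.
summary :: String -> String
summary s = "Characters read: " ++ show (length s)

main :: IO ()
main = do
    inh <- openFile "input.txt" ReadMode
    inpStr <- hGetContents inh
    -- First use: length forces the entire string to be read.
    putStrLn (summary inpStr)
    -- Second use: inpStr is still referenced here, so nothing read so far
    -- could be freed while length was running.
    putStr (map toUpper inpStr)
    hClose inh
```

With a small file you will not notice a difference; with a very large one, memory use should grow with the file size instead of staying flat.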

This program was a bit verbose to make it clear that there was pure code in use. Here's a more concise version, which we will build on in the next examples:

    -- file: ch07/toupper-lazy2.hs
    import System.IO
    import Data.Char(toUpper)

    main = do
        inh <- openFile "input.txt" ReadMode
        outh <- openFile "output.txt" WriteMode
        inpStr <- hGetContents inh
        hPutStr outh (map toUpper inpStr)
        hClose inh
        hClose outh

You are not required to ever consume all the data from the input file when using hGetContents. Whenever the Haskell system determines that the entire string hGetContents returned can be garbage collected (which means it will never again be used), the file is closed for you automatically. The same principle applies to data read from the file. Whenever a given piece of data will never again be needed, the Haskell environment releases the memory it was stored within. Strictly speaking, we wouldn't have to call hClose at all in this example program. However, it is still a good practice to get into, as later changes to a program could make the call to hClose important.

[Warning] Warning

When using hGetContents, it is important to remember that even though you may never again explicitly reference the Handle directly in the rest of the program, you must not close the Handle until you have finished consuming its results via hGetContents. Doing so would cause you to miss some or all of the file's data. Since Haskell is lazy, you generally can assume that you have consumed input only after you have output the result of the computations involving the input.
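One way to stay on the safe side of this warning is to force the part of the String you need before closing the Handle. The sketch below is an assumption about how you might do that, not code from the chapter: readFirstLine and firstLineOf are hypothetical helpers, and evaluate comes from Control.Exception.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Pure helper: everything up to (not including) the first newline.
firstLineOf :: String -> String
firstLineOf = takeWhile (/= '\n')

-- Hypothetical helper: return the first line of a file.
readFirstLine :: FilePath -> IO String
readFirstLine path = do
    h <- openFile path ReadMode
    contents <- hGetContents h
    let firstLine = firstLineOf contents
    -- Force every character of firstLine while the Handle is still open.
    _ <- evaluate (length firstLine)
    hClose h                      -- safe now: the data we need is in memory
    return firstLine

main :: IO ()
main = readFirstLine "input.txt" >>= putStrLn
```

If the evaluate line were removed, firstLine would only be demanded after hClose, which is exactly the situation the warning describes.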

readFile and writeFile

Haskell programmers use hGetContents as a filter quite often. They read from one file, do something to the data, and write the result out elsewhere. This is so common that there are some shortcuts for doing it. readFile and writeFile are shortcuts for working with files as strings. They handle all the details of opening files, closing files, reading data, and writing data. readFile uses hGetContents internally.

Can you guess the Haskell types of these functions? Let's check with ghci:

    ghci> :type readFile
    readFile :: FilePath -> IO String
    ghci> :type writeFile
    writeFile :: FilePath -> String -> IO ()

Now, here's an example program that uses readFile and writeFile:

    -- file: ch07/toupper-lazy3.hs
    import Data.Char(toUpper)

    main = do
        inpStr <- readFile "input.txt"
        writeFile "output.txt" (map toUpper inpStr)

Look at that: the guts of the program take up only two lines! readFile returned a lazy String, which we stored in inpStr. We then took that, processed it, and passed it to writeFile for writing.

Neither readFile nor writeFile ever provides a Handle for you to work with, so there is nothing to ever hClose. readFile uses hGetContents internally, and the underlying Handle will be closed when the returned String is garbage-collected or all the input has been consumed. writeFile will close its underlying Handle when the entire String supplied to it has been written.

A Word On Lazy Output

By now, you should understand how lazy input works in Haskell. But what about laziness during output?

As you know, nothing in Haskell is evaluated before its value is needed. Since functions such as writeFile and putStr write out the entire String passed to them, that entire String must be evaluated. So you are guaranteed that the argument to putStr will be evaluated in full.[24]

But what does that mean for laziness of the input? In the examples above, will the call to putStr or writeFile force the entire input string to be loaded into memory at once, just to be written out?

The answer is no. putStr (and all the similar output functions) write out data as it becomes available. They also have no need for keeping around data already written, so as long as nothing else in the program needs it, the memory can be freed immediately. In a sense, you can think of the String between readFile and writeFile as a pipe linking the two. Data goes in one end, is transformed some way, and flows back out the other.

You can verify this yourself by generating a big input.txt for toupper-lazy3.hs. It may take a bit to process, but you should see a constant, and low, memory usage while it is being processed.
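If you don't have a large file handy, a small generator along these lines could produce one. This is only a sketch: sampleLine, bigInput, and the line count of five million are all assumptions chosen for illustration, so adjust them to suit your disk.

```haskell
-- Hypothetical generator for a large input.txt.

-- One line of filler text, 15 characters including the newline.
sampleLine :: String
sampleLine = "I like Haskell\n"

-- n copies of the sample line, produced lazily as they are consumed.
bigInput :: Int -> String
bigInput n = concat (replicate n sampleLine)

main :: IO ()
main = writeFile "input.txt" (bigInput 5000000)
```

To watch the memory behavior, you could compile toupper-lazy3.hs with ghc -rtsopts and run the resulting executable with +RTS -s, which prints memory statistics when the program exits.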

interact

You learned that readFile and writeFile address the common situation of reading from one file, making a conversion, and writing to a different file. There's a situation that's even more common than that: reading from standard input, making a conversion, and writing the result to standard output. For that situation, there is a function called interact. The type of interact is (String -> String) -> IO (). That is, it takes one argument: a function of type String -> String. That function is passed the result of getContents, that is, standard input read lazily. The result of that function is sent to standard output.

We can convert our example program to operate on standard input and standard output by using interact. Here's one way to do that:

    -- file: ch07/toupper-lazy4.hs
    import Data.Char(toUpper)

    main = interact (map toUpper)

Look at that: one line of code to achieve our transformation! To achieve the same effect as with the previous examples, you could run this one like this:

    $ runghc toupper-lazy4.hs < input.txt > output.txt

Or, if you'd like to see the output printed to the screen, you could type:

    $ runghc toupper-lazy4.hs < input.txt

If you want to see that Haskell output truly does write out chunks of data as soon as they are received, run runghc toupper-lazy4.hs without any other command-line parameters. You should see each character echoed back out as soon as you type it, but in uppercase. Buffering may change this behavior; see the section called "Buffering" later in this chapter for more on buffering. If you see each line echoed as soon as you type it, or even nothing at all for a while, buffering is causing this behavior.

You can also write simple interactive programs using interact. Let's start with a simple example: adding a line of text before the uppercase output.

    -- file: ch07/toupper-lazy5.hs
    import Data.Char(toUpper)

    main = interact (map toUpper . (++) "Your data, in uppercase, is:\n\n")

Here we add a string at the beginning of the output. Can you spot the problem, though?

Since we're calling map on the result of (++), that header itself will appear in uppercase. We can fix that in this way:

    -- file: ch07/toupper-lazy6.hs
    import Data.Char(toUpper)

    main = interact ((++) "Your data, in uppercase, is:\n\n" .
                     map toUpper)

This moved the header outside of the map.

Filters with interact

Another common use of interact is filtering. Let's say that you want to write a program that reads a file and prints out every line that contains the character "a". Here's how you might do that with interact:

    -- file: ch07/filter.hs
    main = interact (unlines . filter (elem 'a') . lines)

This may have introduced three functions that you aren't familiar with yet. Let's inspect their types with ghci:

    ghci> :type lines
    lines :: String -> [String]
    ghci> :type unlines
    unlines :: [String] -> String
    ghci> :type elem
    elem :: (Eq a) => a -> [a] -> Bool

Can you guess what these functions do just by looking at their types? If not, you can find them explained in the section called "Warming up: portably splitting lines of text" and the section called "Special string-handling functions". You'll frequently see lines and unlines used with I/O. Finally, elem takes an element and a list and returns True if that element occurs anywhere in the list.

Try running this over our standard example input:

    $ runghc filter.hs < input.txt
    I like Haskell
    Haskell is great

Sure enough, you got back the two lines that contain an "a". Lazy filters are a powerful way to use Haskell. When you think about it, a filter (such as the standard Unix program grep) sounds a lot like a function. It takes some input, applies some computation, and generates a predictable output.
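To push the grep comparison one step further, here is a sketch of a filter that keeps lines containing a whole word instead of a single character. The helper keepMatching and the search term "Haskell" are assumptions for illustration; isInfixOf comes from Data.List.

```haskell
import Data.List (isInfixOf)

-- Hypothetical helper: keep only the lines that contain needle.
keepMatching :: String -> String -> String
keepMatching needle = unlines . filter (needle `isInfixOf`) . lines

main :: IO ()
main = interact (keepMatching "Haskell")
```

Because keepMatching is a pure String -> String function, you can try it in ghci on a literal before ever pointing it at a file.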

The IO Monad

You've seen a number of examples of I/O in Haskell by this point. Let's take a moment to step back and think about how I/O relates to the broader Haskell language.

Since Haskell is a pure language, if you give a certain function a specific argument, the function will return the same result every time you give it that argument. Moreover, the function will not change anything about the program's overall state.

You may be wondering, then, how I/O fits into this picture. Surely if you want to read a line of input from the keyboard, the function to read input can't possibly return the same result every time it is run, right? Moreover, I/O is all about changing state. I/O could cause pixels on a terminal to light up, cause paper to start coming out of a printer, or even cause a package to be shipped from a warehouse on a different continent. I/O doesn't just change the state of a program. You can think of I/O as changing the state of the world.

Actions

Most languages do not make a distinction between a pure function and an impure one. Haskell has functions in the mathematical sense: they are purely computations which cannot be altered by anything external. Moreover, the computation can be performed at any time, or even never, if its result is never needed.

Clearly, then, we need some other tool to work with I/O. That tool in Haskell is called actions. Actions resemble functions. They do nothing when they are defined, but perform some task when they are invoked. I/O actions are defined within the IO monad. Monads are a powerful way of chaining functions together purely and are covered in Chapter 14, Monads. It's not necessary to understand monads in order to understand I/O. Just understand that the result type of actions is "tagged" with IO. Let's take a look at some types:

    ghci> :type putStrLn
    putStrLn :: String -> IO ()
    ghci> :type getLine
    getLine :: IO String

The type of putStrLn is just like any other function. The function takes one parameter and returns an IO (). This IO () is the action. You can store and pass actions in pure code if you wish, though this isn't frequently done. An action doesn't do anything until it is invoked. Let's look at an example of this:

    -- file: ch07/actions.hs
    str2action :: String -> IO ()
    str2action input = putStrLn ("Data: " ++ input)

    list2actions :: [String] -> [IO ()]
    list2actions = map str2action

    numbers :: [Int]
    numbers = [1..10]

    strings :: [String]
    strings = map show numbers

    actions :: [IO ()]
    actions = list2actions strings

    printitall :: IO ()
    printitall = runall actions

    -- Take a list of actions, and execute each of them in turn.
    runall :: [IO ()] -> IO ()
    runall [] = return ()
    runall (firstelem:remainingelems) =
        do firstelem
           runall remainingelems

    main = do str2action "Start of the program"
              printitall
              str2action "Done!"

str2action is a function that takes one parameter and returns an IO (). As you can see at the end of main, you could use this directly in another action and it will print out a line right away. Or, you can store (but not execute) the action from pure code. You can see an example of that in list2actions: we use map over str2action and return a list of actions, just like we would with other pure data. You can see that everything up through printitall is built up with pure tools.

Although we define printitall, it doesn't get executed until its action is evaluated somewhere else. Notice in main how we use str2action as an I/O action to be executed, but earlier we used it outside of the I/O monad and assembled results into a list.

You could think of it this way: every statement, except let, in a do block must yield an I/O action which will be executed.

The call to printitall finally executes all those actions. Actually, since Haskell is lazy, the actions aren't generated until here either.

When you run the program, your output will look like this:

    Data: Start of the program
    Data: 1
    Data: 2
    Data: 3
    Data: 4
    Data: 5
    Data: 6
    Data: 7
    Data: 8
    Data: 9
    Data: 10
    Data: Done!

We can actually write this in a much more compact fashion. Consider this revision of the example:

    -- file: ch07/actions2.hs
    str2message :: String -> String
    str2message input = "Data: " ++ input

    str2action :: String -> IO ()
    str2action = putStrLn . str2message

    numbers :: [Int]
    numbers = [1..10]

    main = do str2action "Start of the program"
              mapM_ (str2action . show) numbers
              str2action "Done!"

Notice in str2action the use of the standard function composition operator. In main, there's a call to mapM_. This function is similar to map. It takes a function and a list. The function supplied to mapM_ is an I/O action that is executed for every item in the list. mapM_ throws out the result of the function, though you can use mapM to return a list of I/O results if you want them. Take a look at their types:

    ghci> :type mapM
    mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
    ghci> :type mapM_
    mapM_ :: (Monad m) => (a -> m b) -> [a] -> m ()

[Tip] Tip

These functions actually work for more than just I/O; they work for any Monad. For now, wherever you see "m", just think "IO". Also, functions that end with an underscore typically discard their result.

Why a mapM when we already have map? Because map is a pure function that returns a list. It doesn't, and can't, actually execute actions directly. mapM is a utility that lives in the IO monad and thus can actually execute the actions.[25]

Going back to main, mapM_ applies (str2action . show) to every element in numbers. show converts each number to a String and str2action converts each String to an action. mapM_ combines these individual actions into one big action that prints out lines.
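The difference between mapM_ and mapM is easiest to see side by side. In this small sketch, describe and echoLength are hypothetical names introduced only for the illustration.

```haskell
-- Pure helper: the message for one item.
describe :: Int -> String
describe n = "Item " ++ show n

-- Hypothetical action: print a string, then yield its length.
echoLength :: String -> IO Int
echoLength s = putStrLn s >> return (length s)

main :: IO ()
main = do
    -- mapM_ runs the action for each element and discards the results.
    mapM_ (putStrLn . describe) [1, 2, 3]
    -- mapM runs an action for each element and collects the results.
    lengths <- mapM echoLength ["one", "three"]
    print lengths
```

Here the last line prints the collected list of lengths; with mapM_ in its place there would be nothing to bind to lengths.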

Sequencing

do blocks are actually shortcut notations for joining together actions. There are two operators that you can use instead of do blocks: >> and >>=. Let's look at their types in ghci:

    ghci> :type (>>)
    (>>) :: (Monad m) => m a -> m b -> m b
    ghci> :type (>>=)
    (>>=) :: (Monad m) => m a -> (a -> m b) -> m b

The >> operator sequences two actions together: the first action is performed, and then the second. The result of the computation is the result of the second action. The result of the first action is thrown away. This is similar to simply having a line in a do block. You might write putStrLn "line 1" >> putStrLn "line 2" to test this out. It will print out two lines, discard the result from the first putStrLn, and provide the result from the second.

The >>= operator runs an action, then passes its result to a function that returns an action. That second action is run as well, and the result of the entire expression is the result of that second action. As an example, you could write getLine >>= putStrLn, which would read a line from the keyboard and then display it back out.

Let's re-write one of our examples to avoid do blocks. Remember this example from the beginning of the chapter?

    -- file: ch07/basicio.hs
    main = do
        putStrLn "Greetings!  What is your name?"
        inpStr <- getLine
        putStrLn $ "Welcome to Haskell, " ++ inpStr ++ "!"

Let's write that without a do block:

    -- file: ch07/basicio-nodo.hs
    main =
        putStrLn "Greetings!  What is your name?" >>
        getLine >>=
        (\inpStr -> putStrLn $ "Welcome to Haskell, " ++ inpStr ++ "!")

The Haskell compiler internally performs a translation just like this when you define a do block.

The True Nature of Return

Earlier in this chapter, we mentioned that return is probably not what it looks like. Many languages have a keyword named return that aborts execution of a function immediately and returns a value to the caller.

The Haskell return function is quite different. In Haskell, return is used to wrap data in a monad. When speaking about I/O, return is used to take pure data and bring it into the IO monad.

Now, why would we want to do that? Remember that anything whose result depends on I/O must be within the IO monad. So if we are writing a function that performs I/O, then a pure computation, we will need to use return to make this pure computation the proper return value of the function. Otherwise, a type error would occur. Here's an example:

    -- file: ch07/return1.hs
    import Data.Char(toUpper)

    isGreen :: IO Bool
    isGreen =
        do putStrLn "Is green your favorite color?"
           inpStr <- getLine
           return ((toUpper . head $ inpStr) == 'Y')

We have a pure computation that yields a Bool. That computation is passed to return, which puts it into the IO monad. Since it is the last value in the do block, it becomes the return value of isGreen, but this is not because we used the return function.

Here's a version of the same program with the pure computation broken out into a separate function. This helps keep the pure code separate, and can also make the intent more clear.

    -- file: ch07/return2.hs
    import Data.Char(toUpper)

    isYes :: String -> Bool
    isYes inpStr = (toUpper . head $ inpStr) == 'Y'

    isGreen :: IO Bool
    isGreen =
        do putStrLn "Is green your favorite color?"
           inpStr <- getLine
           return (isYes inpStr)
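One thing worth knowing about isYes is that head fails on an empty list, so pressing Enter without typing anything would raise an error. A possible variation, sketched here under the hypothetical name isYesSafe, pattern-matches instead and treats empty input as "no". Having the pure part in its own function is what makes this kind of change easy to make and to test.

```haskell
import Data.Char (toUpper)

-- Hypothetical total variant of isYes: empty input yields False
-- instead of an error from head.
isYesSafe :: String -> Bool
isYesSafe (c:_) = toUpper c == 'Y'
isYesSafe []    = False

isGreen :: IO Bool
isGreen = do
    putStrLn "Is green your favorite color?"
    inpStr <- getLine
    return (isYesSafe inpStr)

main :: IO ()
main = isGreen >>= print
```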

Finally, here's a contrived example to show that return truly does not have to occur at the end of a do block. In practice, it usually is, but it need not be so.

    -- file: ch07/return3.hs
    returnTest :: IO ()
    returnTest =
        do one <- return 1
           let two = 2
           putStrLn $ show (one + two)

Notice that we used <- in combination with return, but let in combination with the simple literal. That's because we needed both values to be pure in order to add them, and <- pulls things out of monads, effectively reversing the effect of return. Run this in ghci and you'll see 3 displayed, as expected.

Is Haskell Actually Imperative?

These do blocks may look a lot like an imperative language. After all, you're giving commands to run in sequence most of the time.

But Haskell remains a lazy language at its core. While it is necessary to sequence actions for I/O at times, this is done using tools that are part of Haskell already. Haskell achieves a nice separation of I/O from the rest of the language through the IO monad as well.

Side Effects with Lazy I/O

Earlier in this chapter, you read about hGetContents. We explained that the String it returns can be used in pure code.

We need to get a bit more specific about what side effects are. When we say Haskell has no side effects, what exactly does that mean?

At a certain level, side effects are always possible. A poorly written loop, even if written in pure code, could cause the system's RAM to be exhausted and the machine to crash. Or it could cause data to be swapped to disk.

When we speak of no side effects, we mean that pure code in Haskell can't run commands that trigger side effects. Pure functions can't modify a global variable, request I/O, or run a command to take down a system.

When you have a String from hGetContents that is passed to a pure function, the function has no idea that this String is backed by a disk file. It will behave just as it always would, but processing that String may cause the environment to issue I/O commands. The pure function isn't issuing them; they are happening as a result of the processing the pure function is doing, just as with the example of swapping RAM to disk.

In some cases, you may need more control over exactly when your I/O occurs. Perhaps you are reading data interactively from the user, or via a pipe from another program, and need to communicate directly with the user. In those cases, hGetContents will probably not be appropriate.
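When you do want that control, one option is to go back to reading a line at a time, so that each read happens exactly where you wrote it. The following is a minimal sketch of such a loop; shout and loop are hypothetical names, and the uppercase transformation simply mirrors the earlier examples.

```haskell
import Control.Monad (unless)
import Data.Char (toUpper)
import System.IO

-- Pure transformation applied to each line.
shout :: String -> String
shout = map toUpper

-- Read one line at a time until end of input. Each read happens when
-- hGetLine is executed, not whenever a lazy String happens to be demanded.
loop :: Handle -> IO ()
loop h = do
    eof <- hIsEOF h
    unless eof $ do
        line <- hGetLine h
        putStrLn (shout line)
        loop h

main :: IO ()
main = loop stdin
```

The price is a little more code than interact; the benefit is that you decide when the program blocks waiting for input.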

Buffering

The I/O subsystem is one of the slowest parts of a modern computer. Completing a write to disk can take thousands of times as long as a write to memory. A write over the network can be hundreds or thousands of times slower yet. Even if your operation doesn't directly communicate with the disk (perhaps because the data is cached), I/O still involves a system call, which slows things down by itself.

For this reason, modern operating systems and programming languages both provide tools to help programs perform better where I/O is concerned. The operating system typically performs caching: storing frequently used pieces of data in memory for faster access.

Programming languages typically perform buffering. This means that they may request one large chunk of data from the operating system, even if the code underneath is processing data one character at a time. By doing this, they can achieve remarkable performance gains because each request for I/O to the operating system carries a processing cost. Buffering allows us to read the same amount of data with far fewer I/O requests.

Haskell, too, provides buffering in its I/O system. In many cases, it is even on by default. Up till now, we have pretended it isn't there. Haskell usually is good about picking a good default buffering mode. But this default is rarely the fastest. If you have speed-critical I/O code, changing buffering could make a significant impact on your program.

Buffering Modes

There are three different buffering modes in Haskell. They are defined as the BufferMode type: NoBuffering, LineBuffering, and BlockBuffering.

NoBuffering does just what it sounds like: no buffering. Data read via functions like hGetLine will be read from the OS one character at a time. Data written will be written immediately, and also often will be written one character at a time. For this reason, NoBuffering is usually a very poor performer and not suitable for general-purpose use.

LineBuffering causes the output buffer to be written whenever the newline character is output, or whenever it gets too large. On input, it will usually attempt to read whatever data is available in chunks until it first sees the newline character. When reading from the terminal, it should return data immediately after each press of Enter. It is often a reasonable default.

BlockBuffering causes Haskell to read or write data in fixed-size chunks when possible. This is the best performer when processing large amounts of data in batch, even if that data is line-oriented. However, it is unusable for interactive programs because it will block input until a full block is read. BlockBuffering accepts one parameter of type Maybe: if Nothing, it will use an implementation-defined buffer size. Or, you can use a setting such as Just 4096 to set the buffer to 4096 bytes.

The default buffering mode is dependent upon the operating system and Haskell implementation. You can ask the system for the current buffering mode by calling hGetBuffering. The current mode can be set with hSetBuffering, which accepts a Handle and BufferMode. As an example, you can say hSetBuffering stdin (BlockBuffering Nothing).
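Here is a small sketch that queries and then changes the buffering mode of stdout. The helper describeMode is a hypothetical name added for readable output; what the first line reports will depend on your platform and on whether stdout is a terminal.

```haskell
import System.IO

-- Pure helper: a human-readable label for a buffering mode.
describeMode :: BufferMode -> String
describeMode NoBuffering               = "no buffering"
describeMode LineBuffering             = "line buffering"
describeMode (BlockBuffering Nothing)  = "block buffering, default size"
describeMode (BlockBuffering (Just n)) = "block buffering, " ++ show n ++ " bytes"

main :: IO ()
main = do
    before <- hGetBuffering stdout
    putStrLn ("stdout starts with: " ++ describeMode before)
    hSetBuffering stdout (BlockBuffering (Just 4096))
    after <- hGetBuffering stdout
    putStrLn ("stdout now uses: " ++ describeMode after)
```

Running it once directly and once with output redirected to a file is a quick way to see how the default can differ.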

Flushing The Buffer

For any type of buffering, you may sometimes want to force Haskell to write out any data that has been saved up in the buffer. There are a few times when this will happen automatically: a call to hClose, for instance. Sometimes you may want to instead call hFlush, which will force any pending data to be written immediately. This could be useful when the Handle is a network socket and you want the data to be transmitted immediately, or when you want to make the data on disk available to other programs that might be reading it concurrently.
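A common everyday case is a prompt that doesn't end in a newline. With line buffering, the prompt may sit in the buffer while the program is already waiting for input. This sketch, with the hypothetical helper greeting, flushes stdout explicitly so the prompt shows up first.

```haskell
import System.IO

-- Pure helper: build the greeting.
greeting :: String -> String
greeting name = "Welcome to Haskell, " ++ name ++ "!"

main :: IO ()
main = do
    putStr "What is your name? "   -- no newline, so a line buffer may hold it
    hFlush stdout                  -- push the prompt out before blocking on input
    name <- getLine
    putStrLn (greeting name)
```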

Reading Command-Line Arguments

Many command-line programs are interested in the parameters passed on the command line. System.Environment.getArgs returns IO [String] listing each argument. This is the same as argv in C, starting with argv[1]. The program name (argv[0] in C) is available from System.Environment.getProgName.
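As a quick illustration, the following sketch prints the program name followed by each argument with its position. The helper numberArgs is a hypothetical name; the numbering starts at 1 to match argv[1] in C.

```haskell
import System.Environment (getArgs, getProgName)

-- Pure helper: one numbered line per argument.
numberArgs :: [String] -> [String]
numberArgs = zipWith (\i a -> show i ++ ": " ++ a) [1 :: Int ..]

main :: IO ()
main = do
    prog <- getProgName
    args <- getArgs
    putStrLn ("Program: " ++ prog)
    mapM_ putStrLn (numberArgs args)
```

You could try it with something like runghc args.hs one two three, where args.hs is whatever file name you saved it under.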

The System.Console.GetOpt module provides some tools for parsing command-line options. If you have a program with complex options, you may find it useful. You can find an example of its use in the section called "Command line parsing".

Environment Variables

If you need to read environment variables, you can use one of two functions in System.Environment: getEnv or getEnvironment. getEnv looks for a specific variable and raises an exception if it doesn't exist. getEnvironment returns the whole environment as a [(String, String)], and then you can use functions such as lookup to find the environment entry you want.
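Since getEnv raises an exception for a missing variable, going through getEnvironment and lookup is a simple way to supply a default instead. In this sketch, lookupOr is a hypothetical helper and HOME is just an example variable that may not be set on every platform.

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (getEnvironment)

-- Pure helper: look a variable up in an environment list, with a fallback.
lookupOr :: String -> String -> [(String, String)] -> String
lookupOr def key env = fromMaybe def (lookup key env)

main :: IO ()
main = do
    env <- getEnvironment
    putStrLn ("HOME is " ++ lookupOr "(unset)" "HOME" env)
```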

Setting environment variables is not defined in a cross-platform way in Haskell. If you are on a POSIX platform such as Linux, you can use putEnv or setEnv from the System.Posix.Env module. Environment setting is not defined for Windows.

brownwoorst.blogspot.com

Source: http://book.realworldhaskell.org/read/io.html
