Online Book Reader

Home Category

Learn You a Haskell for Great Good! - Miran Lipovaca [88]

By Root 540 0
drawback: It tends to be slow. Lists are really lazy. Remember that a list like [1,2,3,4] is syntactic sugar for 1:2:3:4:[]. When the first element of the list is forcibly evaluated (say by printing it), the rest of the list 2:3:4:[] is still just a promise of a list, and so on. We call that promise a thunk.

A thunk is basically a deferred computation. Haskell achieves its laziness by using thunks and computing them only when it must, instead of computing everything up front. So you can think of lists as promises that the next element will be delivered once it really has to be, and along with it, the promise of the element after it. It doesn’t take a big mental leap to conclude that processing a simple list of numbers as a series of thunks might not be the most efficient technique in the world.

That overhead doesn’t bother us most of the time, but it turns out to be a liability when reading big files and manipulating them. That’s why Haskell has bytestrings. Bytestrings are sort of like lists, only each element is one byte (or 8 bits) in size. The way they handle laziness is also different.

Strict and Lazy Bytestrings


Bytestrings come in two flavors: strict and lazy. Strict bytestrings reside in Data.ByteString, and they do away with the laziness completely. There are no thunks involved. A strict bytestring represents a series of bytes in an array. You can’t have things like infinite strict bytestrings. If you evaluate the first byte of a strict bytestring, you must evaluate the whole thing.

The other variety of bytestrings resides in Data.ByteString.Lazy. They’re lazy, but not quite as lazy as lists. Since there are as many thunks in a list as there are elements, they are kind of slow for some purposes. Lazy bytestrings take a different approach. They are stored in chunks (not to be confused with thunks!), and each chunk has a size of 64KB. So if you evaluate a byte in a lazy bytestring (by printing it, for example), the first 64KB will be evaluated. After that, it’s just a promise for the rest of the chunks. Lazy bytestrings are kind of like lists of strict bytestrings, with a size of 64KB. When you process a file with lazy bytestrings, it will be read chunk by chunk. This is cool because it won’t cause the memory usage to skyrocket, and the 64KB probably fits neatly into your CPU’s L2 cache.

If you look through the documentation for Data.ByteString.Lazy, you will see that it has a lot of functions with the same names as the ones from Data.List, but the type signatures have ByteString instead of [a] and Word8 instead of a. These functions are similar to the ones that work on lists. Because the names are the same, we’re going to do a qualified import in a script and then load that script into GHCi to play with bytestrings:

import qualified Data.ByteString.Lazy as B

import qualified Data.ByteString as S

B has lazy bytestring types and functions, whereas S has strict ones. We’ll mostly be using the lazy versions.

The pack function has the type signature pack :: [Word8] -> ByteString. This means that it takes a list of bytes of type Word8 and returns a ByteString. You can think of it as taking a list, which is lazy, and making it less lazy, so that it’s lazy only at 64KB intervals.

The Word8 type is like Int, but it represents an unsigned 8-bit number. This means that it has a much smaller range of only 0 to 255. And just like Int, it’s in the Num type class. For instance, we know that the value 5 is polymorphic in that it can act like any numeric type, including Word8.

Here’s how we pack lists of numbers into bytestrings:

ghci> B.pack [99,97,110]

Chunk "can" Empty

ghci> B.pack [98..120]

Chunk "bcdefghijklmnopqrstuvwx" Empty

We packed only a handful of values into a bytestring, so they fit inside one chunk. Empty is like [] for lists—they both represent an empty sequence.

As you can see, you don’t need to specify that your numbers are of type Word8, because the type system can make numbers choose that type. If you try to use a big number like 336 as a Word8, it will just wrap around to 80.

When we need

Return Main Page Previous Page Next Page

®Online Book Reader