Learning Scala from a Haskell perspective
Next term, we will be doing a group compiler project. We’ve agreed on using either Scala or Kotlin (we want a JVM language, just not Java). I’m leaning towards Scala for its emphasis on functional programming, so this will be my exploration of Scala where I note down interesting things. I will attempt to make comparisons between Haskell and Scala concepts, and talk about where the comparison breaks down too. This will mostly act as a reference for myself, but hopefully other people might find it useful.
SBT - Scala Build Tool
SBT is a project manager like Stack - it manages the Scala version and dependencies to be used by a project, as well as provide an interface for compiling, running and testing your project. Here are the commands and their stack versions (though they’re mostly the same):
SBT | Stack |
---|---|
sbt clean | stack clean |
sbt compile | stack build |
sbt test | stack test |
sbt run | stack run |
sbt console | stack ghci |
Values and Variables
val
identifies an immutable variable (value) whereas var
identifies a
mutable one:
val x = 1 //immutable
= 10 //illegal
x
var y = 0 //mutable
= 2 //legal y
In this example, types are implicitly inferred from the values contained in the variables. However, variable types can be explicitly given if desired:
val x: Int = 1
var y: String = "hello"
The type of a var is fixed upon declaration, but if you allow Scala to infer the type then it will infer the most specific type possible (it seems).
Sticking to val
would be good for doing pure functional programming ala
Haskell, since they cannot be reassigned (although it might still be possible to
assign a mutable object to a val - in which case its not immutable “all the
way”). On the other hand, var
s may be reassigned. The new value must have a
type matching the variable, which might be implicitly derived from the original
value.
var y = 1
= "hello"
y
: type mismatch;
error: String("hello")
found : Int required
Control Structures
All control structures in Scala return a value 1, because Scala is an
expression oriented language. This allows for them to be used flexibly either as
“pure” values, or as imperative-style constructs (in which case the unit
type
is returned, signifying a non-meaningful value). Although, you can always be
evil by including side-effects and return a value at the same time.
If-Else & Case expressions
As a first example of flexibility, we can use if-else ala Haskell to define a value.
// Note to self: remember to bracket your conditional and not use a "then"
val x = if (y < 10) y else "too big"
The type of x
will be inferred to be Any
, the most specific type
supertyping both Int
and String
. If both alternatives had the same type
(e.g. Int
), then x
will be of type Int
.
However, we can also use if-else as an imperative top-level construct:
// Use brackets to run multiple expressions in body
// last expression in brackets is the one returned
if (y < 10)
println(y)
else {
println("This number is:")
println("too big")
}
Although we don’t use it, this returns the unit
type as printing returns no
meaningful value.
This flexibility also applies for case expressions, which uses the keyword
match
:
// Use it like Haskell case expressions
val x = y match {
case 1 | 2 | 3 => "small"
case 4 | 5 | 6 => "medium"
case _ => "large"
}
// Or use it imperatively
match {
y case 1 | 2 | 3 => println("small")
case 4 | 5 | 6 => println("medium")
case _ => println("large")
}
One additional feature is to match different subtypes on each case. This is not possible in Haskell where there is no subtyping. To do this, just add a type annotation on the pattern match:
match {
y case n: Int => n + 1
case str: String => str + " + 1"
case x => x // otherwise, don't modify it
}
Loops
While loops are purely imperative: they always return the unit
type. With the
exception of having a (vacuous) return value, they seem to just be good ol’ Java
while loops.
while (y < 10) {
println(y)
+= 1
y // This return value is ignored, while-loop always evaluates to Unit
y }
In Scala, the only for loops are foreach loops. They also return the unit type, although they can pattern match on collection elements unlike Java foreach loops:
val numberPairs = List((1,2),(3, 4))
for ((left, right) <- numberPairs) println(s"$left VS $right")
At first glance, this pattern matching notation (with the arrow and all that)
really screams list comprehension. Lo and behold, we can use foreach loops
as list comprehensions by adding the keyword yield
:
val numberPairs = List((1,2),(3, 4))
for ((left, right) <- numberPairs) yield left + right
This will return a collection of left + right
s. And if you really want to be
evil? Do other stuff inside the yield before returning:
for ((left, right) <- numberPairs) yield {
println(left)
println(right)
+= lef≡t
x + right
left }
For loops and do notation
It turns out, the list comprehension-y feel of for expressions makes a lot of sense. This is because they are just syntactic sugar for a series of calls to the following methods:
List[A].map[B]: (A => B) => List[B]
List[A].flatMap[B]: (A => Iterable[B]) => List[B]
List[A].filter: (A => Boolean) => List[A]
List[A].foreach: (A => Any) => Unit
I’m not sure if these are the actual types of the methods (or if they’re even
syntactically correct), but I think they carry the concept just fine. These
functions are encapsulated in the Iterable trait, which means any class
extending the trait can be for looped (notice that the type of flatMap
is the
same as (>>=)
). You can also just implement these methods without becoming an
iterable. Anyway, here are some
examples
of desugaring that I found on StackOverflow.
[for(x <- c1; y <- c2; z <-c3) {...}]
desugar.foreach(x => c2.foreach(y => c3.foreach(z => {...})))
≡ c1
[for(x <- c1; y <- c2; z <- c3) yield {...}]
desugar.flatMap(x => c2.flatMap(y => c3.map(z => {...})))
≡ c1
[for(x <- c; if cond) yield {...}]
desugar.filter(x => cond).map(x => {...})
≡ c
[for(x <- c; y = ...) yield {...}]
desugar.map(x => (x, ...)).map((x,y) => {...}) ≡ c
In particular, notice the parallel between the second example and Haskell’s do-notation:
for(
x <- c1
y <- c2
z <- c3
) yield {...}
do
x <- c1
y <- c2
z <- c3
return ...
and their desugarings:
.flatMap(x => c2.flatMap(y => c3.map(z => {...})))
c1
>>= (x -> c2 >>= (y -> c3 >>= (z -> return ...))) c1
These are just monads! They’re even exactly the same length! Although the condition does seem to be stronger as filter is required as well. From these examples, I tried to piece together the desugaring mechanism:
[for (a <- ma; b1; b2; ...; bk; ...) {...}]
desugar.desugar[b1]
≡ ma.desugar[b2]
...
.desugar[bk]
.foreach((a, c1, ..., cj) => desugar[for(...) {...}])
where each b
has to match either c = expr
or if cond
. c1, ..., cj
is the
list of assigned variable c
s that appear in the b
s. Now, here’s how b
s are
desugared:
// in this context, c1 .. cj are any previous assignments
[c = expr] ≡ map ((a, c1, ..., cj) => (a, c1, ..., cj, expr))
desugar[if cond] ≡ filter((a, c1, ..., cj) => cond) desugar
We also need a base case:
[for (a <- ma; b1; b2; ...; bk) {...}]
desugar.desugar[b1]
≡ ma.desugar[b2]
...
.desugar[bk]
.foreach((a, c1, ..., cj) => {...})
For loops that yield
require the use of flatMap
and map
instead.
[for (a <- ma; b1; b2; ...; bk; ...) {...}]
desugar.desugar[b1]
≡ ma.desugar[b2]
...
.desugar[bk]
.flatMap((a, c1, ..., cj) => desugar[for(...) {...}])
[for (a <- ma; b1; b2; ...; bk) yield {...}]
desugar.desugar[b1]
≡ ma.desugar[b2]
...
.desugar[bk]
.map((a, c1, ..., cj) => {...})
Functions VS Methods
In Scala, functions are values but methods are not. Functions can be defined using anonymous functions
val add = (x: Int, y: Int) => x + y
// If you want a curried version:
val add = (x: Int) => (y: Int) => x + y
or placeholder syntax.2
val add = (_: Int) + (_: Int)
On the other hand, methods are defined using the def
keyword. However, they
cannot be assigned to a val
or var
as they are not values.
def addM(x: Int, y: Int) = x + y
val add = addM
: missing argument list for method addM
errortype is
Unapplied methods are only converted to functions when a function .
expectedthis conversion explicit by writing `addM _` or `addM(_,_)`
You can make . instead of `addM`
As mentioned in the error message, methods cannot be assigned to variables, but
they can be converted to functions. This conversion can be done by either adding
an underscore in front of the function (addM _
) or by applying the method to
placeholders. The former method is called
η-expansion. We
can see how it works by checking the type:
:type addM _
(Int, Int) => Int
// Curried version:
def addM(x: Int)(y: Int) = x + y
:type addM _
Int => (Int => Int)
When passing in a method to a higher-order function, it will automatically expand it for you without having to use an underscore.
This is not technically correct as using
return
is specifically reserved for functions, but its nice to think about it like that.↩︎I have to explicitly type the placeholders as there’s no context for their type here. When passing a placeholdered expression to a higher-order function, Scala should be able to infer the placeholders’ types, removing the need for this.↩︎