I’m going to implement a simple linear regression algorithm in Swift on a data set that maps square footage onto housing values in Portland, Oregon. With this algorithm, I’ll be able to predict housing values given square footage. For this exercise, I’m going to use pure Swift and keep Darwin as my only dependency. If you want to skip right to the code, here’s a Playground.
First, I’m going to need to define a point, with $x$ and $y$ values. If I were using CoreGraphics I could make use of CGPoint, but I won’t add that dependency, and there doesn’t appear to be a Swift Point, which I find a bit surprising. Because small Swift value types are cheap to create and copy, I’m going to make my point a struct.
struct Point {
    var x: Float
    var y: Float
}
Great. Now I’d like to define a collection of points as a type so that I can perform operations on it. I’ll use a Swift Set because my data isn’t ordered.
typealias Data = Set<Point>
Unfortunately, this is where I run into my first problem with Swift: my Point cannot go into a set because it’s not Hashable, and to be Hashable, the struct must also be Equatable. So let’s do some stuff to make the compiler happy:
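// Two points are equal when both coordinates match.
func ==(lhs: Point, rhs: Point) -> Bool {
    return lhs.x == rhs.x && lhs.y == rhs.y
}

extension Point : Hashable {
    internal var hashValue : Int {
        get {
            // Hash the interpolated "x,y" string form of the point.
            return "\(self.x),\(self.y)".hashValue
        }
    }
}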
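For readers on a newer toolchain, the same conformance can be sketched with far less ceremony. This assumes Swift 4.1 or later, where the compiler synthesizes Equatable and Hashable for a struct whose stored properties are all Hashable; the `points` constant is only there for illustration.

```swift
// Assumes Swift 4.1+: == and hashing are synthesized from x and y.
struct Point: Hashable {
    var x: Float
    var y: Float
}

// Equal points collapse into a single set element.
let points = Set([
    Point(x: 1, y: 2),
    Point(x: 1, y: 2),
    Point(x: 3, y: 4)
])
print(points.count)  // 2
```

The hand-written version above remains what older compilers need.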
Now that I have all the preliminaries done, I’d like to define an extension on my new Data type which adds all of the functions I’ll need to perform a linear regression.
extension Data { }
This causes my second run-in with the Swift compiler: it seems that constrained extensions must be declared on the unspecialized generic type, with constraints specified by a where clause. This means that instead of extending my custom Data type, I’ll have to extend Set and constrain its Element to Point structures. Let’s see what happens:
extension Set where Element : Point { }
Unfortunately this also does not work: the compiler is complaining that I’m constraining Element to a non-protocol type Point, which is true. I cannot quite tell, but it seems that this feature may be coming in a future version of Swift, along with the following syntax (which also did not work for me at the time of writing):
extension Set where Generator.Element == Point { }
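As a point of comparison, later Swift releases do accept a same-type constraint here, spelled with Element rather than Generator.Element. The sketch below assumes Swift 4.1 or later (same-type constraints on extensions arrived in 3.1, synthesized Hashable in 4.1). The `xs` property is a hypothetical helper that only shows the extension is picked up.

```swift
// Assumes Swift 4.1+ for the synthesized Hashable conformance.
struct Point: Hashable {
    var x: Float
    var y: Float
}

// Same-type constraint: applies only to sets whose elements are Point.
extension Set where Element == Point {
    // Hypothetical helper: collects the x coordinates of every point.
    var xs: [Float] {
        return map { $0.x }
    }
}

let data = Set([Point(x: 2104, y: 400), Point(x: 1600, y: 330)])
print(data.xs.sorted())  // [1600.0, 2104.0]
```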
In any case, I’ve now found the winning combination to get the functionality I want while keeping the compiler happy: a PointProtocol which defines an x and y, a Point struct which implements PointProtocol, and an extension on Set where the Element conforms to (the admittedly superfluous) PointProtocol:
protocol PointProtocol {
    var x: Float { get }
    var y: Float { get }
}

struct Point : PointProtocol {
    var x: Float
    var y: Float
}

extension Set where Element : PointProtocol { }
Now it’s time to implement the derived values I’ll need to plot a linear regression on my Set of Points. With Andrew Ng’s first three lectures fresh in my mind and a little help from Salman Khan, I came up with the following implementation:
import Darwin  // provides pow

extension Set where Element : PointProtocol {
    // Number of points, as a Float so it can be used as a divisor.
    var size: Float {
        get { return Float(self.count) }
    }
    var avgOfXs: Float {
        get { return self.reduce(0) { $0 + $1.x } / self.size }
    }
    var avgOfYs: Float {
        get { return self.reduce(0) { $0 + $1.y } / self.size }
    }
    // Mean of the products x * y.
    var avgOfXsAndYs: Float {
        get { return self.reduce(0) { $0 + ($1.x * $1.y) } / self.size }
    }
    // Mean of the squared x values.
    var avgOfXsSquared: Float {
        get { return self.reduce(0) { $0 + pow($1.x, 2) } / self.size }
    }
    var slope: Float {
        get {
            return ((self.avgOfXs * self.avgOfYs) - self.avgOfXsAndYs)
                / (pow(self.avgOfXs, 2) - self.avgOfXsSquared)
        }
    }
    var yIntercept: Float {
        get { return self.avgOfYs - self.slope * self.avgOfXs }
    }
    // Predicted y for a given x on the fitted line.
    func f(x: Float) -> Float {
        return self.slope * x + self.yIntercept
    }
}
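For reference, the computed properties above correspond to the usual closed-form least-squares line. The symbols $m$, $b$ and $n$ are shorthand introduced here for slope, yIntercept and size, and a bar denotes an average over the set:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
\overline{xy} = \frac{1}{n}\sum_{i=1}^{n} x_i y_i, \qquad
\overline{x^{2}} = \frac{1}{n}\sum_{i=1}^{n} x_i^{2}
$$

$$
m = \frac{\bar{x}\,\bar{y} - \overline{xy}}{\bar{x}^{2} - \overline{x^{2}}}, \qquad
b = \bar{y} - m\,\bar{x}, \qquad
f(x) = m\,x + b
$$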
I have been trying to find a way to generalize the averaging functionality and pass in just the value I want to use in the summation, but I have yet to find a good way to do that.
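One possible way to generalize the averaging is to pass a closure that extracts the value to be summed. This is only a sketch for a recent Swift toolchain (4.1 or later, for the synthesized Hashable); `average` and `predict` are hypothetical names that are not part of the original playground, and `x * x` stands in for `pow` so that no import is needed.

```swift
protocol PointProtocol {
    var x: Float { get }
    var y: Float { get }
}

struct Point: PointProtocol, Hashable {
    var x: Float
    var y: Float
}

extension Set where Element: PointProtocol {
    /// Mean of whatever value the closure extracts from each element.
    /// Returns NaN for an empty set, because it divides zero by zero.
    func average(_ value: (Element) -> Float) -> Float {
        return reduce(0) { $0 + value($1) } / Float(count)
    }

    var slope: Float {
        let meanX = average { $0.x }
        let meanY = average { $0.y }
        let meanXY = average { $0.x * $0.y }
        let meanXSquared = average { $0.x * $0.x }
        return (meanX * meanY - meanXY) / (meanX * meanX - meanXSquared)
    }

    var yIntercept: Float {
        let meanX = average { $0.x }
        let meanY = average { $0.y }
        return meanY - slope * meanX
    }

    /// Predicted y for a given x on the fitted line.
    func predict(_ x: Float) -> Float {
        return slope * x + yIntercept
    }
}

// Three points that lie exactly on y = 2x + 1.
let sample = Set([
    Point(x: 1, y: 3),
    Point(x: 2, y: 5),
    Point(x: 3, y: 7)
])
print(sample.slope, sample.yIntercept)  // roughly 2.0 and 1.0
```

Each average still walks the whole set, so this trades a little repeated work for less repeated code.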
Now that I have all the tools I’ll need, it’s just a matter of plugging in some data and running the regression:
import XCPlayground  // provides XCPlaygroundPage

let data = Data([
    Point(x: 2104, y: 400),
    Point(x: 1600, y: 330),
    Point(x: 2400, y: 369),
    Point(x: 1416, y: 232),
    Point(x: 3000, y: 540)
])

// Capture the predicted value at each square footage for the playground graph.
for i in [0, 1000, 2000, 3000, 4000, 5000] {
    XCPlaygroundPage.currentPage.captureValue(data.f(Float(i)), withIdentifier: "")
}
This creates a beautiful graph in Xcode’s Playground which reveals to me the profound insight that a house with 0 square footage should be worth about $26,790 in Portland, Oregon (the y values are in thousands of dollars). More interestingly, at the 3000 square footage mark, just as we might expect by cross-referencing with our original data set, my linear regression shows the house should cost $522,146. Take a look for yourself: