# Machine Learning in Swift: Linear Regressions

I’m going to implement a simple linear regression algorithm on a data set which maps square footage onto housing values in Portland, Oregon in Swift. With this algorithm, I’ll be able to predict housing values given square footage. For this exercise, I’m going to use pure Swift and keep my only dependency as Darwin. If you want to skip right to the code, here’s a Playground .

First, I’m going to need to define a point, with \$x\$ and \$y\$ values. If I were using CoreGraphics I could make use of `CGPoint`, but I won’t add that dependency and there doesn’t appear to be a Swift Point, which I find a bit surprising. Because Swift value types are much more efficient, I’m going to make my point a `struct`.

```struct Point {
var x: Float
var y: Float
}```

Great. Now I’d like to define a collection of points as an object so that I can perform operations on it. I’ll use a Swift `Set` because my data isn’t ordered.

```typealias Data = Set
```

Unfortunately this is where I run into my first problem with Swift: my `Point` cannot go into a set because it’s not `Hashable`; and to be `Hashable`, the `struct` must also be `Equatable`. So let’s do some stuff to make the compiler happy:

```func ==(lhs: Point, rhs: Point) -> Bool {
return lhs.x == rhs.x && lhs.y == rhs.y
}

extension Point : Hashable {
internal var hashValue : Int {
get { return "(self.x),(self.y)".hashValue }
}
}```

Now that I have all the preliminaries done, I’d like to define an `extension` on my new custom `Point` type which adds all of the functions I’ll need to perform a linear regression.

```extension Data { }
```

This causes my second run-in with the Swift compiler: it seems that constrained extensions must be declared on the unspecialized generic types with constraints
specified by a `where` clause. This means that instead of using my custom `Data` object, I’ll have to use `Set` and constrain the `Elements` to `Point` structures. Let’s see what happens:

`extension Set where Elements : Point { }`

Unfortunately this also does not work: the compiler is complaining that I’m constraining `Elements` to a non-protocol type `Point`, which is true. I cannot quite tell, but it seems that this feature may be coming in a future version of Swift, along with the following syntax (which also did not work for me this time):

`extension Set where Generator.Element == Point { }`

In any case, I’ve now found the winning combination to get the functionality I want while keeping the compiler happy: a `PointProtocol` which defines an `x` and `y`, a Point struct which implements `PointProtocol`, and an `extension` on `Set` where the `Elements` conform to (the admittedly superfluous) `PointProtocol`:

```protocol PointProtocol {
var x: Float { get }
var y: Float { get }
}

struct Point : PointProtocol {
var x: Float
var y: Float
}

extension Set where Element : PointProtocol { }```

Now it’s time to implement the derivative values I’ll need to plot a linear regression on my Set of Points. With Andrew Ng’s first three lectures fresh in my mind and a little help from Salman Khan, I came up with the following implementation:

```extension Set where Element : PointProtocol {
var size: Float {
get { return Float(self.count) }
}

var avgOfXs: Float {
get { return self.reduce(0) { \$0 + \$1.x } / self.size }
}

var avgOfYs: Float {
get { return self.reduce(0) { \$0 + \$1.y } / self.size }
}

var avgOfXsAndYs: Float {
get { return self.reduce(0) { \$0 + (\$1.x * \$1.y) } / self.size }
}

var avgOfXsSquared: Float {
get { return self.reduce(0) { \$0 + pow(\$1.x, 2) } / self.size }
}

var slope: Float {
get { return ((self.avgOfXs * self.avgOfYs) - self.avgOfXsAndYs) / (pow(self.avgOfXs, 2) - self.avgOfXsSquared) }
}

var yIntercept: Float {
get { return self.avgOfYs - self.slope * self.avgOfXs }
}

func f(x: Float) -> Float {
return self.slope * x + self.yIntercept
}
}```

I have been trying to find a way to generalize the averaging functionality and pass in just the value I want to use in the summation, but I have yet to find a good way to do that.

Now that I have all the tools I’ll need, it’s just the matter of plugging in some data and running the regression:

```var data = Data([
Point(x: 2104, y: 400),
Point(x: 1600, y: 330),
Point(x: 2400, y: 369),
Point(x: 1416, y: 232),
Point(x: 3000, y: 540)
])

for var i in [0, 1000, 2000, 3000, 4000, 5000] {
XCPlaygroundPage.currentPage.captureValue(data.f(Float(i)), withIdentifier: "")
}```

This creates a beautiful graph in Xcode’s Playground which reveals to me the profound insight that a house with 0 square footage should be worth \$267,900 in Portland, Oregon. More interestingly, at the 3000 square footage mark, just like we might expect by cross-referencing with out original data set, my linear regression shows the house should cost \$522,146. Take a look for yourself: 