When we make a model of a system, we want it to recreate some desired behaviour(s).
To do so, we make a cost (also called loss) function that takes in model behaviour and spits out how bad it is. Larger loss <=> less desirable model behaviour.
Of course, the modeller has to define the measure of bad behaviour. For instance, squared deviation from the data the model is supposed to fit (this one is overused).
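For concreteness, here is a minimal sketch of a squared-deviation cost (Python, with a made-up one-parameter decay model; the package itself is Julia):

```python
import numpy as np

# Hypothetical one-parameter model: y(t) = exp(-k * t).
# The cost is the squared deviation between model output and data.
def cost(k, t, data):
    prediction = np.exp(-k * t)
    return np.sum((prediction - data) ** 2)

t = np.linspace(0.0, 5.0, 50)
data = np.exp(-0.7 * t)   # synthetic "measurements" generated with k = 0.7

# Zero cost at the generating parameter; larger cost = worse behaviour.
print(cost(0.7, t, data))                        # 0.0
print(cost(0.9, t, data) > cost(0.8, t, data))   # True
```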
By the way
Most systems one would model in Biology and Engineering (my interests) are input-output systems. You turn the steering wheel, the car veers. You inject current, the neuron fires. You add a cytokine, the cell's metabolic system responds.
Therefore reasonable cost functions should penalise bad model behaviour for multiple inputs. I want my car to turn left when I steer left. I also want my car to turn right when I steer right.
Don't conveniently forget this to make the modelling/analysis easier. Too many biological models do. I'm also guilty. MinimallyDisruptiveCurves.jl allows you to easily sum multiple cost functions (one for each input). It evaluates them in a multithreaded way, so that performance doesn't take such a hit.
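As a sketch of the idea (plain Python, not the MinimallyDisruptiveCurves.jl API; the saturating model and all names are made up), build one cost per input condition and sum them:

```python
import numpy as np

# Hypothetical saturating input-output model: response(u) = theta[0] * u / (theta[1] + u)
def make_cost(u, data):
    def cost(theta):
        response = theta[0] * u / (theta[1] + u)
        return (response - data) ** 2
    return cost

true_theta = np.array([2.0, 0.5])
inputs = [0.1, 1.0, 10.0]                   # three different experimental inputs
datasets = [2.0 * u / (0.5 + u) for u in inputs]

# One cost per input; the total cost penalises bad behaviour across all of them.
costs = [make_cost(u, d) for u, d in zip(inputs, datasets)]
total_cost = lambda theta: sum(c(theta) for c in costs)

print(total_cost(true_theta))   # 0.0 at the generating parameters
```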
Model fitting (also known as training, optimisation, calibration, ...) is running:

$$\theta^* = \arg\min_{\theta} C(\theta)$$

subject to any constraints we impose on the admissible parameters $\theta$.
Since $C(\theta)$ is a scalar function (returns a scalar), we can think of it as a landscape, where height is analogous to $C(\theta)$, and the lateral co-ordinates are the parameters of the problem. Then model fitting is just finding a path to the deepest point of the landscape! (see below)
This gif was taken from an Intro to Optimization blog post, website here
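The descent in the animation can be sketched in a few lines (Python; the quadratic landscape is a toy example of my own):

```python
import numpy as np

# Toy landscape: C(theta) = (theta[0] - 1)^2 + 10 * (theta[1] + 2)^2,
# whose deepest point is theta = [1, -2].
grad = lambda th: np.array([2.0 * (th[0] - 1.0), 20.0 * (th[1] + 2.0)])

theta = np.zeros(2)
for _ in range(500):
    theta -= 0.04 * grad(theta)   # step downhill

print(theta)   # converges to [1, -2]
```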
Recall the goal. Given a fitted model, we want to:
Find functional relationships between model parameters that best preserve model behaviour.
To do this, our curve generator wants to
get as far away as possible from the initial parameters, while keeping the cost function as low as possible.
By the way
Optimisation consists of descending to the lowest point of the cost landscape (without worrying about your particular trajectory).
We, on the other hand, are trying to flow along the valleys of the landscape. Like a river. But one that has enough energy to go uphill to a specified elevation (the momentum parameter) if it can't find a flat/downhill direction.
Let's express this mathematically. We can define a curve on parameter space as a function:

$$\gamma : [0, T] \to \mathbb{R}^n, \qquad s \mapsto \gamma(s)$$

where $n$ is the number of model parameters. The input to the curve is a scalar number that specifies 'how far along' the curve we are. So $\gamma(0)$ is the initial point, and $\gamma(T)$ is the final point.
Any point on the curve has a derivative, $\gamma'(s)$. This is the direction in which it is pointing (i.e. the direction of the tangent). Let's fix
- $\gamma(0) = \theta^*$, the locally optimal parameter vector gained from model fitting.
- $\gamma'(0) = \delta_0$, the initial starting direction of the curve, as a user-defined parameter.
We will denote the set of allowable curves (i.e. those satisfying the above constraints) as $\Gamma$.
By the way
We are going to iteratively add constraints to the set of allowable curves, and thus shrink $\Gamma$.
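A quick numerical sketch (Python, two-parameter toy example of my own) of what membership of $\Gamma$ means so far, using the simplest allowable curve, a straight line:

```python
import numpy as np

theta_star = np.array([1.0, 1.0])               # fitted parameters
delta0 = np.array([1.0, -1.0]) / np.sqrt(2.0)   # chosen initial direction (unit norm)

# A candidate curve gamma: s -> parameter space. A straight line satisfies
# both constraints imposed so far.
gamma = lambda s: theta_star + s * delta0

eps = 1e-6
deriv0 = (gamma(eps) - gamma(0.0)) / eps        # finite-difference gamma'(0)

print(np.allclose(gamma(0.0), theta_star))      # True: curve starts at theta_star
print(np.allclose(deriv0, delta0))              # True: curve starts in direction delta0
```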
Now we want all points on our curve to define low-cost parameter configurations. In other words, we want $C(\gamma(s))$ to be small for any $s \in [0, T]$. How can we distinguish between 'good' and 'bad' curves? How about this:

$$\mathcal{C}(\gamma) = \int_{\gamma} C$$

$\mathcal{C}(\gamma)$ is the line integral of the cost along the curve $\gamma$. If $\mathcal{C}(\gamma)$ is small, then $C(\gamma(s))$ is necessarily small along the entire curve.
By the way
Vector calculus 101: we can expand out the line integral as

$$\int_{\gamma} C = \int_0^T C\big(\gamma(s)\big)\,\|\gamma'(s)\|\,ds$$
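Numerically, the expanded line integral is just a weighted 1-D quadrature (Python sketch; the cost and curve are toy examples, not from the package):

```python
import numpy as np

C = lambda th: th[0] ** 2 + th[1] ** 2               # toy cost
s = np.linspace(0.0, 1.0, 2001)
gamma = np.stack([np.cos(s), np.sin(s)], axis=1)     # unit-speed arc of the unit circle

# integral of C(gamma(s)) * ||gamma'(s)|| ds, via finite differences + trapezoid rule
speeds = np.linalg.norm(np.gradient(gamma, s, axis=0), axis=1)
values = np.array([C(g) for g in gamma]) * speeds
integral = np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(s))

print(integral)   # ~1.0: cost is 1 everywhere on the circle, and the arc has length 1
```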
So now the best curve would be the solution of

$$\gamma^* = \arg\min_{\gamma \in \Gamma} \int_0^T C\big(\gamma(s)\big)\,\|\gamma'(s)\|\,ds$$
What's preventing the curve from just buzzing around $\theta^*$, since we know that $C(\theta^*)$ is as low as can be? Absolutely nothing!
So let's add a constraint that the direction of the curve must always be pointing away from $\theta^*$. Mathematically:

$$\big\langle \gamma'(s),\; \gamma(s) - \theta^* \big\rangle \geq 0 \quad \text{for all } s \in [0, T]$$
But wait again!
What if the curve 'doesn't want to evolve', because every direction increases the cost? What's stopping it from just... staying short?
Absolutely nothing again!
We want a curve that actually changes the parameters appreciably. So let's constrain the length of the curve, by setting how fast it moves as a function of $s$:

$$\|\gamma'(s)\| = 1 \quad \text{for all } s \in [0, T]$$

Now the curve has to have length $T$.
So to wrap up, our allowable curves are:

$$\Gamma = \Big\{ \gamma : [0, T] \to \mathbb{R}^n \;\Big|\; \gamma(0) = \theta^*,\;\; \gamma'(0) = \delta_0,\;\; \|\gamma'(s)\| = 1,\;\; \big\langle \gamma'(s), \gamma(s) - \theta^* \big\rangle \geq 0 \Big\}$$

for some fixed initial direction $\delta_0$. Our mathematical problem is

$$\min_{\gamma \in \Gamma} \int_0^T C\big(\gamma(s)\big)\,ds \tag{10}$$
If equation (10) reminds you of a variational calculus problem, you would be right! Now we just have to solve the Euler-Lagrange equations, right? Not quite: the constraints defining $\Gamma$ make a direct attack unpleasant.
So instead of searching directly over curves, we are going to turn this into a dynamic problem. Stop thinking about curves as static shapes. Start thinking about curves as things that are drawn over time.
This turns it into an optimal control problem:
- Imagine a rocket sitting at $\theta^*$.
- It needs to traverse a trajectory $\theta(t)$ in parameter space.
- At each point in time, we send it a control input $u(t)$ that determines its current direction (i.e. we steer the rocket).
Problem: Design a steering law so that the trajectory traced by the rocket is a (locally) optimal solution of equation (10).
By the way
The rocket analogy is there for a reason. Optimal control theory got going in the 1950s, when the USSR and the USA suddenly got interested in designing aircraft/rocket trajectories that were optimal for doing various military things. No prizes for guessing why.
Two solutions emerged, separated by the iron curtain. Pontryagin's maximum principle, and Bellman's principle of optimality.
Using one of these methods, we shall seize the means of curve generation, Comrade!
Let's mathematically express equation (10) as an optimal control problem:
At each time $t$, we choose an input $u(t) \in \mathbb{R}^n$. The entire input trajectory, from $t = 0$ to $t = T$, we denote $u(\cdot)$. The space of possible input trajectories is $\mathcal{U}$. So $u(\cdot) \in \mathcal{U}$.
The dynamics of the 'rocket' are $\dot{\theta}(t) = u(t)$ (i.e. we directly control rocket velocity).
By transcribing the constraints on $\gamma$ in the previous section to constraints on $u$, we define the space of possible input trajectories as

$$\mathcal{U} = \Big\{ u(\cdot) \;\Big|\; u(0) = \delta_0,\;\; \|u(t)\| = 1,\;\; \big\langle u(t),\; \theta(t) - \theta^* \big\rangle \geq 0 \Big\}$$

Our minimisation problem (10) is now

$$\min_{u(\cdot) \in \mathcal{U}} \int_0^T C\big(\theta(t)\big)\,dt \quad \text{subject to} \quad \dot{\theta}(t) = u(t), \quad \theta(0) = \theta^*$$
You can then apply Pontryagin's maximum principle to get some complicated constraints on the optimal input $u^*(\cdot)$.
You can work through these constraints (see p. 135 of my PhD thesis) and translate them into a differential equation.
By the way
Usually, solving the optimal control problem using Pontryagin's maximum principle requires solution of a boundary value problem. These are annoying to solve.
Luckily, by tinkering with constraints (see PhD thesis reference), we factor out the boundary constraints, and end up with an explicit ordinary differential equation that defines optimal curve evolution.
Now, we are going to write down the differential equations for curve evolution.
They are complicated and you should feel free to skip them. Intuition will be provided subsequently.
...no wait, one more thing.
Pontryagin's maximum principle requires dynamics for not only the state of the system (i.e. the position of the rocket) but also the costate of the system. What is this? Something roughly analogous to the momentum of the rocket, but also affected by all the input constraints we have set.
So we get two curves. One for the state, $\theta(t)$, and the other for the costate, which we shall denote $\lambda(t)$.
So now, the coupled dynamics for state and costate. Angle brackets denote inner (dot) products:

$$\dot{\theta}(t) = \frac{\lambda(t) + \mu_2(t)\,\big(\theta(t) - \theta^*\big)}{2\mu_1(t)}$$

$$\dot{\lambda}(t) = -\nabla C\big(\theta(t)\big) - \mu_2(t)\,\dot{\theta}(t)$$

Here $\mu_1(t) > 0$ is chosen so that $\|\dot{\theta}(t)\| = 1$, and $\mu_2(t) \geq 0$ is the multiplier enforcing the away-pointing constraint: it is zero whenever $\big\langle \lambda(t),\; \theta(t) - \theta^* \big\rangle \geq 0$, and otherwise takes the smallest value for which $\big\langle \dot{\theta}(t),\; \theta(t) - \theta^* \big\rangle = 0$.
Let $C_f$ be the final cost. We set this a priori as a hyperparameter. Curve evolution stops prematurely if $C\big(\theta(t)\big)$ reaches $C_f$.
By the way
Pre-setting $C_f$ is the trick that turns this from a boundary value problem into a (much easier) ODE problem.
Define $H(\theta, \lambda) = C(\theta) + \|\lambda\|$, and impose the constraint $H\big(\theta(t), \lambda(t)\big) = C_f$ along the curve.
Given $C(\theta)$ (the current cost) and $\nabla C(\theta)$ (its gradient), we can solve this as a differential equation in the variables $\theta$ and $\lambda$.
By the way
Get some intuition by setting $\mu_2 = 0$ in your head, then reviewing the equations. This occurs when the curve is currently pointing away from $\theta^*$.
Then you see that $\dot{\theta}$, the curve direction, is proportional to $\lambda$, the costate ($\approx$ momentum).
Meanwhile, the costate is integrating the negative cost gradient: $\dot{\lambda}(t) = -\nabla C\big(\theta(t)\big)$.
Notice the analogy with optimisation algorithms based on gradient descent with momentum, if you're into that kind of thing.
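To make this concrete, here is a toy sketch (Python, crude Euler integration, a valley-shaped cost of my own invention; not the package's implementation) of the $\mu_2 = 0$ dynamics $\dot\theta = \lambda/\|\lambda\|$, $\dot\lambda = -\nabla C(\theta)$:

```python
import numpy as np

# Toy cost C(theta) = (theta[0]*theta[1] - 1)^2: its minimum is the entire
# valley theta[0]*theta[1] = 1, so the curve has somewhere low-cost to go.
C = lambda th: (th[0] * th[1] - 1.0) ** 2
gradC = lambda th: 2.0 * (th[0] * th[1] - 1.0) * np.array([th[1], th[0]])

theta_star = np.array([1.0, 1.0])                 # fitted parameters, C = 0
delta0 = np.array([1.0, -1.0]) / np.sqrt(2.0)     # initial direction, tangent to the valley
C_f = 0.01                                        # final-cost hyperparameter

theta = theta_star.copy()
lam = (C_f - C(theta_star)) * delta0              # initial costate, aligned with delta0
dt = 1e-3
for _ in range(1500):                             # draw a curve of length ~1.5
    theta = theta + dt * lam / np.linalg.norm(lam)   # direction = normalised costate
    lam = lam - dt * gradC(theta)                    # costate integrates -grad C

print(np.linalg.norm(theta - theta_star) > 1.0)   # True: far from the fitted parameters
print(C(theta) < C_f)                             # True: the curve hugs the valley
```

Note how the costate steers: whenever the curve drifts off the valley floor, $-\nabla C$ tilts $\lambda$ back towards it, exactly like momentum in a descent algorithm.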
What we are trying to do
We wanted a method to construct curves in parameter space, where
Every point on the curve represents a set of model parameters that is as low cost as possible.
The curves don't 'double back' on themselves (monotonically increasing distance from the initial point $\theta^*$).
The curves have a fixed length $T$.
The user sets the initial direction of the curve.
This resulted in the optimisation problem of equation (10).
How we do it
To make solution of this variational problem tractable, we turned it into an optimal control problem.
We made the analogy of a rocket, with an automatic steering law, traversing parameter space.
We asked what steering law would result in a rocket trajectory corresponding to a curve satisfying the above requirements.
This resulted in a set of differential equations that evolve paired 'position' and 'momentum' variables for the rocket. Position represents the point in parameter space (i.e. the curve).
Solution of this differential equation requires evaluation of the cost function and its gradient at every step.