Deep Learning (DL) can be applied in many fields, such as computer vision, speech recognition and data science. Since we love cutting-edge technology and specialize in control systems, we are interested in how DL can act as an alternative to traditional model-based control algorithms such as LQR, MPC and H∞.
Last spring, a thesis was conducted here at Combine that investigated this issue by applying both an LQR controller and a DL controller, based on a state-of-the-art reinforcement learning method, to a practical system. In this post, we will dig a bit deeper into the comparison and the future capabilities of DL as an alternative to traditional control algorithms.
To begin with, one can compare which systems the algorithms can be applied to. Traditional control algorithms are often based on a linearized system. This implies that the system on which the controller is applied needs to behave approximately linearly in a neighborhood of the operating point, and the controller can only guarantee stability in the vicinity of that point. DL controllers (from here on called policies), on the other hand, are nonlinear controllers, which means that they are in theory not limited to linear-like systems. In practice, however, things might not look as bright for the DL controller. For instance, some systems are sensitive to failures, and training on the real physical system may be hard or even practically infeasible. In the literature, there are many examples of policies controlling simulated systems, but it is hard to find such algorithms deployed in actual real-world applications.
To address this problem, our thesis workers decided to construct such a critical system (a unicycle) and benchmarked the DL policy against a traditional LQR controller. They solved the issue of impractical real-world training by training the policy in simulation and then transferring it to the real system, much like the process of designing a traditional controller. This approach means that one cannot eliminate the need for a mathematical model of the system, which is often a challenge when constructing traditional controllers.
One might ask: if the need for a mathematical model cannot be eliminated, why choose the more complex DL policy over the model-based controller that is optimized for the system? There can be several reasons. First, one needs to keep in mind that the model-based controller is only optimal for the linearized version of the system. If the system is highly nonlinear, one can get better performance using the nonlinear DL approach. Secondly, it is considerably easier to include nonlinear features when designing a DL policy than when designing a traditional controller. Let's dig further into this using the unicycle as an example.
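To make the first point concrete, here is a minimal sketch of how an LQR gain is computed for a linearized model. The matrices and weights below are illustrative placeholders, not the thesis model; the point is that the resulting gain is optimal only for the linear approximation around the operating point.

```python
# A minimal LQR sketch for a linearized model x_dot = A x + B u (illustrative values).
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical linearized dynamics of a balancing system around its upright point.
A = np.array([[0.0, 1.0],
              [9.81, 0.0]])   # inverted-pendulum-like tilt dynamics
B = np.array([[0.0],
              [1.0]])

# State and input weights chosen by the designer.
Q = np.diag([10.0, 1.0])
R = np.array([[0.1]])

# Solve the continuous-time algebraic Riccati equation and form the feedback gain.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P

# The control law u = -K x is optimal only for the linearized model, so its
# performance degrades as the state moves away from the operating point.
control = lambda x: -K @ x
```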
Imagine one were to design the controller to be robust to external disturbances (e.g. a push from the side). For the traditional controllers, one would have to model this push in a linear way and describe it in the frequency domain in order to implement an H∞ controller. This is possible, but as the number of features one would like to add increases, the complexity of implementing them increases significantly. This becomes a problem if one wishes to add another disturbance the unicycle should be robust to, e.g. signal noise. If one were to implement this using DL, the only thing needed is a small subroutine that simulates the feature every now and then during training. As the features can be modeled in a nonlinear way, this is very powerful while keeping the implementation simple.
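As a rough illustration of that idea, the sketch below adds such a disturbance subroutine to a simulated training loop. The environment, indices, probabilities and magnitudes are hypothetical placeholders, not the setup used in the thesis.

```python
# Sketch: inject occasional pushes and constant sensor noise during simulated training.
import numpy as np

def maybe_disturb(state, rng, push_prob=0.02, push_scale=2.0, noise_std=0.01):
    """Occasionally apply a lateral push and add sensor noise to the observation."""
    if rng.random() < push_prob:
        state = state.copy()
        state[1] += rng.normal(0.0, push_scale)   # impulse on a velocity-like state (hypothetical index)
    observation = state + rng.normal(0.0, noise_std, size=state.shape)  # noisy sensor readings
    return state, observation

# Inside a (hypothetical) training loop, the subroutine is called once per step:
# state = env.reset()
# for t in range(horizon):
#     state, obs = maybe_disturb(state, rng)
#     action = policy(obs)
#     state, reward, done = env.step(state, action)
```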
As promised, let's compare the methods when deployed to a real-world application. In the thesis, both methods stabilized the unicycle in a satisfactory fashion. However, the traditional LQR controller outperformed the DL policy in most respects where the hardware of the unicycle did not impose limitations, both in practice and in simulation. This is most likely due to the policy converging to a local optimum. The impression is that developing a high-performing DL policy requires more time and resources than a traditional controller. Since thesis projects are limited in time, our thesis workers had to accept the stabilizing local-optimum solution and evaluate that policy. An interesting next step would be to fine-tune the policy on the actual hardware. This would give the DL controller a chance to train on the real system and increase its performance, something that is not possible with a traditional control method.
Another interesting aspect is how one can guarantee stability of the system. For traditional controllers, one can often verify stability within a state space using well-known concepts. For DL policies it is harder to guarantee stability. One way is to show that the policy stabilizes the system in a large number of simulations. However, if the state space is large, it may be time-consuming to reach and explore all states, both during training and verification. A DL policy may show degraded performance if it reaches states it has not explored. Signs of this were seen in the thesis: when tests were made in unexplored states, the policy performed worse.
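One simple, purely empirical way to build such evidence is to roll out the policy from many sampled initial states and count how often it converges. The sketch below assumes hypothetical `policy`, `simulate` and `sample_state` helpers and is not the verification procedure used in the thesis.

```python
# Sketch: empirical stability check by Monte Carlo rollouts from sampled initial states.
import numpy as np

def empirical_stability_rate(policy, simulate, sample_state,
                             n_trials=1000, horizon=2000, tol=1e-2, seed=0):
    """Return the fraction of sampled initial states from which the policy
    brings the final state within `tol` of the equilibrium at the origin."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(n_trials):
        x0 = sample_state(rng)                      # sample somewhere in the state space
        x_final = simulate(policy, x0, horizon)     # roll out the closed-loop system
        successes += np.linalg.norm(x_final) < tol
    return successes / n_trials
```

Note that even a high success rate only covers the sampled region; states the policy never visited during training or verification remain uncertain, which is exactly the limitation observed in the thesis.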
In conclusion, DL policies can act as an alternative to traditional model-based controllers for certain systems. If one faces a system that will be subject to several sets of known disturbances, or that will reach nonlinear states, one should at least consider a DL approach. On the other hand, if one faces a system that operates in a region where it can be described as linear, a traditional model-based linear controller is recommended. This is also the case for systems with a large state space. Finally, DL policies show signs of great potential, such as fine-tuning the controller on the real system to increase practical performance or training it subject to nonlinear features such as noise. In the future, we see the potential for DL-based controllers to replace traditional controllers on even more systems than is possible today.
If you are interested, you can read the complete thesis work here: https://hdl.handle.net/20.500.12380/300526