# Blog

From April 1, Combine Control Systems AB will be run as an independent unit, including Combine Technology AB. The former parent company, Combine AB, will instead become part of Infotiv AB.

“With an independent unit, we can focus on our areas of expertise, which are control engineering and data science.” – Erik Silfverberg, CEO, Combine Control Systems AB

Because of these structural changes, Combine is also launching a new visual identity, including a logo and this website.

We hope you like our new look!

Enter the Next Level!

Life is complicated, you probably know that. If we take a magnifying glass and look at a living thing from a chemical and biological perspective, it is astonishingly complicated. In this blog post, I will walk through an example of a process that occurs in all living things and how we can study this process with a computer. In fact, I will demonstrate that by using clever approximations, simple statistics and robust software, life does not have to be complicated.

## Setting the stage

The process that we will look at is the transport of small molecules over the cell membrane. That sounds complicated, I know. So let me explain a little bit more. Each cell, in every living organism, is surrounded by a membrane that is essential for cell viability (see Figure below). It is also important for the cell to transport small molecules across the membrane. This can be for example nutrients, waste or signals.

If we can understand this process, we can utilize it to our advantage. We can design new drug molecules that enter the cell, fix the broken cell machinery and heal diseases. We can also design better biofuel-producing cells and assess environmental effects, but that is another story.

Here, we want to estimate how fast a molecule is transported. We will use the following assumptions that make the modeling much easier:

• We will assume that the small molecules cross the membrane by themselves
• We will approximate the membrane as a two-phase system, ignoring any chemical complexity
• We will model one phase as water and the other as octanol, an alcohol (see Figure above)

By making these assumptions, we can reduce our problem to estimating the probability of finding the small molecule in octanol compared to water. In technical language, we are talking about a free energy or a partition coefficient, but it is good to keep in mind that this is nothing but a probability.

## A multivariate regression model

We will use a very simple linear regression model to predict partition coefficients. You will soon see that this is a surprisingly good model.

In a regression model, we are trying to predict an unknown variable Y given some data X. This is done by first training the model on known Xs and Ys. In the lingo of machine learning, the columns of X are called features: properties of the system from which we can predict Y. So, what are the features of our problem?

Recall that we are trying to predict partition coefficients of small molecules, so it is natural to select some features of the small molecules. There are many features available – over one thousand have been used in the literature!

We will use three simple features that are easy to compute:

1. The weight of the molecule
2. The number of possible hydrogen bonds (it will be called Hbonds)
3. The fraction of carbon atoms in the molecules (Nonpolar)

If a molecule consists of many carbon atoms, it does not like to be in the water and prefers octanol. But if the molecule, on the other hand, can make hydrogen bonds it prefers water to octanol.

Our regression equation looks like this:

$$\text{Partition coefficient} = c_0 + c_1 \cdot \text{Weight} + c_2 \cdot \text{Hbonds} + c_3 \cdot \text{Nonpolar}$$

and our task is now to calculate $$c_0$$, $$c_1$$, $$c_2$$, and $$c_3$$. That is just four parameters – didn’t I say that life wasn’t so complicated!

We will use a database of about 600 molecules to estimate the coefficients (training the model). This database consists of experimental measurements of partition coefficients, the known Ys. To evaluate or test our model we will use some 150 molecules from another database with measured partition coefficients.
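Outside Sympathy for Data, the same kind of fit can be sketched in a few lines of NumPy. This is only an illustration: the data below is synthetic, the "true" coefficients are made up, and the function names `fit_linear_model` and `predict` are hypothetical helpers, not part of the original flow.

```python
import numpy as np

def fit_linear_model(X, y):
    """Least-squares fit of y ~ c0 + c1*x1 + c2*x2 + c3*x3; returns [c0, c1, c2, c3]."""
    # A leading column of ones makes the first coefficient the intercept c0.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

def predict(coeffs, X):
    """Evaluate the linear model for a feature matrix X (one row per molecule)."""
    return coeffs[0] + X @ coeffs[1:]

# Synthetic stand-in for the training table (columns: Weight, Hbonds, Nonpolar).
rng = np.random.default_rng(0)
X_train = rng.uniform([50.0, 0.0, 0.0], [500.0, 10.0, 1.0], size=(600, 3))
true_coeffs = np.array([0.5, 0.005, -1.0, 4.0])  # made-up values for the demo
y_train = predict(true_coeffs, X_train) + rng.normal(0.0, 0.1, size=600)

coeffs = fit_linear_model(X_train, y_train)
print(coeffs)  # close to true_coeffs, up to the added noise
```

With real data, `X_train` and `y_train` would instead be read from the training table, and `predict` would be applied to the test molecules.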

## Sympathy for Data

To make and evaluate our model, we will use the open-source software Sympathy for Data. This software can, for example, read data from many sources, perform advanced calculations, and fit machine learning models.

First, we will read in a table of training data from an Excel spreadsheet.

And if one double-clicks on the output port of the Table node, we can have a look at the input data.

The measured partition coefficient is in the Partition column, followed by several feature columns. The ones of interest to us are Weight, HA (heavy atoms), CA (carbon atoms), HBD (hydrogen bond donors) and HBA (hydrogen bond acceptors).

From HA and CA, we can obtain a feature that describes the fraction of carbon atoms, and from HBD and HBA, we can calculate the number of possible hydrogen bonds. We will calculate these feature columns using a Calculator node.

In the Calculator node, one can do a lot of things. Here, we are creating two new columns, Hbonds and Nonpolar, from columns in the input table.
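The post does not spell out the expressions entered in the Calculator node. A plausible reading, and an assumption in the sketch below, is that Hbonds is the sum of donors and acceptors and Nonpolar is the number of carbon atoms divided by the number of heavy atoms. The helper name `derive_features` is hypothetical.

```python
import numpy as np

def derive_features(ha, ca, hbd, hba):
    """Return (Hbonds, Nonpolar) from per-molecule atom and hydrogen-bond counts."""
    ha = np.asarray(ha, dtype=float)
    ca = np.asarray(ca, dtype=float)
    hbonds = np.asarray(hbd, dtype=float) + np.asarray(hba, dtype=float)  # assumed: donors + acceptors
    nonpolar = ca / ha  # assumed: fraction of heavy atoms that are carbon
    return hbonds, nonpolar

# Two small molecules as a sanity check: ethanol (C2H6O) and benzene (C6H6).
hbonds, nonpolar = derive_features(ha=[3, 6], ca=[2, 6], hbd=[1, 0], hba=[1, 0])
print(hbonds, nonpolar)
```

Ethanol can form hydrogen bonds and has a lower carbon fraction than benzene, which matches the intuition that benzene prefers octanol more strongly.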

Next, we use the machine learning capabilities of Sympathy for Data to create a linear model. We select the Weight, Hbonds, and Nonpolar columns as the X and the Partition column as the Y.

If one double-clicks on the output port of the Fit node, we can see the fitted coefficients of the model.

Remember that many hydrogen bonds tell us that the molecule wants to be in the water (a negative partition coefficient) and that many carbon atoms tell us that the molecule wants to be in octanol or the membrane (a positive partition coefficient). Unsurprisingly, we see that the Hbonds column contributes negatively to the partition coefficient (c2 = -1.21) and the Nonpolar column contributes positively to the partition coefficient (c3 = 3.91).

How good is this model? Let’s read the test data and see! There is a Predict node in Sympathy for Data that we can use to evaluate the X data from the test set.

By using another Calculator node, we can compute some simple statistics. The mean absolute deviation between the model and experimental data is 0.86 log units, and the correlation coefficient R is 0.76. The following scatter plot was created with the Figure from Table node.
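The two statistics are simple to compute by hand. The sketch below uses four made-up value pairs rather than the actual test set, and the function names are hypothetical.

```python
import numpy as np

def mean_absolute_deviation(predicted, measured):
    """Average of |predicted - measured| over all molecules."""
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(measured))))

def correlation_coefficient(predicted, measured):
    """Pearson correlation coefficient R between predictions and measurements."""
    return float(np.corrcoef(predicted, measured)[0, 1])

# Made-up numbers, only to show the calculation.
measured = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.5, 1.5, 3.5, 3.5])

mad = mean_absolute_deviation(predicted, measured)
r = correlation_coefficient(predicted, measured)
print(mad, r)
```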

This is a rather good model. First, the mean deviation is less than 1 log unit, which is about the experimental uncertainty. That is, we cannot expect or trust any lower deviation than this because of experimental error sources. Second, the correlation is significant and strong. It is possible to increase it slightly, to 0.85–0.90, by using more features or more advanced ones. But what is the point of that? Here, we are using a very simple set of features that we can easily interpret.

What’s next? You could use this model to predict the partition coefficient of a novel molecule. Say you are designing a new drug molecule and want to know if it has good transport properties. Calculate the three simple features, plug them into the model, and you have your answer!

The data and the Sympathy for Data flows can be obtained from GitHub: https://github.com/sgenheden/linear_logp

The picture of the cell was borrowed from www.how-to-draw-cartoons-online.com/cartoon-cell.html

A company that wants to start taking advantage of the existing and latent information within itself has a long way to go. It is, unfortunately, necessary to wade through a swamp of concepts and buzzwords such as “Business Intelligence,” “Data Science,” “Big Data,” and so on. Management may want to build “Data Warehouses” and store more of the data that the organization produces. But then we wake up from the dream.

All of this data locked into various databases just costs money if no one touches it. The database administrators might not let people interact with it because the database is not capable of executing the required queries. The lucky ones who obtain data can deliver information about the data itself, answering questions like “how many?”, “how often?” and “where?”. Using tools from the Business Intelligence domain, charts and tables are presented as reports in which the user has to gain insights by drilling through heaps and dimensions of information. This is the lowest level of refinement.

So, when embarking on the endeavor to improve the business value of the reports using analysis, companies should not start by looking at how existing data could be utilized, how to collect new data, and how to store it. These issues should, of course, be addressed later on, but they are not the first priority.

Think backward instead. Start by thinking about what the organization needs to know to work better. It is up to the domain experts of the company to find this out, and they should already be aware of some low-hanging fruit. Then continue by sketching the process backward. If the management of the company needs to know something to be able to take better actions, what is required to produce that information? Is the required data available, or can it be created from other data? And so on. Think about what is already there (an inventory might be needed), which new metrics are needed and whether they are measurable, whether new measurement methodologies must be developed, and whether a given measurement method returns what it is expected to.

There are several technical challenges on the way which could involve the “four V:s” of Big Data: volume (how much?), velocity (how often?), veracity (can the data be trusted?) and variety (is the data heterogeneous?). There are also soft challenges which need to be solved like tribalism in the organization, the attitude of people towards change and tearing down walls of organizational data silos.

For a data-driven system to shine, it needs to be able to deliver insights about deviations outside the normal range of operation. Based on these insights, the system should provide a set of recommended actions to improve the situation. The recommendations are presented together with an analysis of the expected impact if the action were applied. This way, decisions are documented along with the facts available at the moment of decision.

Basing decisions on data is nothing new, but new technology helps to make data available sooner than previously. Having an expert system recommending actions and predicting the outcome can be a real time and money saver. Fully automating the decision making is, however, an entirely different beast. Humans like being in control, and it is very hard to verify the correctness of wholly automated decision systems. Humans can help to introduce some suspicion and common sense when recommendations do not make sense. Extra scrutiny can then be called upon to make sure there are no strange artifacts in the underlying data.

Data-driven decisions will most certainly gain traction in the future as increasingly complex decisions can be handled. Technology is seldom a stopper. The number crunching can be solved. Company culture and politics are the primary stoppers and must not be forgotten when setting up a new project.

When building business-critical applications for an enterprise environment, it is common to first gather requirements from domain experts through a business analyst. The business analyst then formulates a set of requirements which are given to an architect. The architect, in turn, creates some design documents which the development team translates into code.

One problem here is that the distance between those who implement the solution and the domain experts is considerable. Information is filtered through many people. An advantage is that we end up with documentation describing the product.

In agile development, the domain experts talk directly to the development team. Changes are fast, and the development pace can be high, but the process is likely to produce some waste as well. The development team might not be proficient in communicating with the domain expert.

The business analyst in the previous example is an expert in interviewing domain experts and writing down requirements. The agile process might also produce less documentation which makes it harder for developers entering the project late to understand the big picture.

In domain-driven development, domain experts, the development team and other stakeholders strive to build a shared mental model of the business process. A programming language with a strong type system also helps to model the domain directly in code, so that when the requirements change and the domain types are updated, code that no longer matches them stops compiling (see “Domain Modeling Made Functional”).

The claimed advantages of aligning the software model with the business domain are a faster time to market, more business value, less waste, and easier maintenance and evolution.

A recommended guideline for working with domain-driven design is to focus on what are called business events and workflows instead of data structures. This way, the business requirements are captured so that hidden requirements are not lost as easily, and the developer does not superimpose his or her technical solutions on the design too early.

The problem domain needs to be partitioned into smaller subdomains such that the subdomains are not too large. The subdomains should match domains in the organization, not necessarily the actual company hierarchy, but instead real domains.

Each subdomain has to be modeled in the solution in such a way that it does not share any persistent data with other subdomains. If a subdomain needs information from another subdomain, it has to ask for it instead of just accessing it directly in the database.

A so-called ubiquitous language needs to be developed. The language is shared among all the participants of the project and in the code as well. There might be local variations in the meaning of words in different domains, and that is okay as long as those differences are understood, otherwise unnecessary conflicts and misunderstandings could arise.

The shared mental model of the domain also allows other stakeholders to understand what is going on, since business processes are described at a high level. The code documents the requirements directly and dictates how data structures should be designed, instead of the other way around.

## Introduction

Functional Principal Component Analysis (FPCA) is a generalization of PCA where entire functions act as samples ($$X \in L^2(\mathcal{T})$$ over a time interval $$\mathcal{T}$$) instead of vectors of scalar values ($$X \in \mathbb{R}^p$$). FPCA can be used to find the dominant modes of variation in a set of functions. One of the central ideas is to replace the scalar product $$\beta^T x = \left \langle \beta, x \right \rangle = \sum_j \beta_j x_j$$ with a functional equivalent $$\left\langle \beta, x \right\rangle = \int_{\mathcal{T}} \beta(s) x(s) ds$$.
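In practice, the functions are only known at sampled points, so the integral is approximated numerically. The sketch below uses the trapezoidal rule on a uniform grid. The helper name `functional_inner_product` and the sine and cosine test functions are illustrative assumptions.

```python
import numpy as np

def functional_inner_product(beta, x, s):
    """Trapezoidal approximation of the integral of beta(s) * x(s) over the grid s."""
    y = np.asarray(beta) * np.asarray(x)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(s)) / 2.0)

s = np.linspace(0.0, 1.0, 1001)
sine = np.sin(2.0 * np.pi * s)
cosine = np.cos(2.0 * np.pi * s)

norm_sq = functional_inner_product(sine, sine, s)  # integral of sin^2 over one period is 1/2
cross = functional_inner_product(sine, cosine, s)  # sine and cosine are orthogonal on [0, 1]
print(norm_sq, cross)
```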

## Temperature in Gothenburg

Using data from SMHI, we are going to look at variations of temperature over the year in Gothenburg.

The data spans from 1961 to today and all measurements have been averaged per month and grouped by year. To be able to do an FPCA we need to remove the mean from the data.

The first principal component, which explains 94% of the total variation, is unsurprisingly the variation over the seasons. It is followed by the second and third principal components at 1.8% and 0.95%, respectively.
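With one curve of twelve monthly values per year, a discretized FPCA can be sketched as an ordinary PCA computed with a singular value decomposition. The data below is synthetic, not the SMHI series. The post does not say which mean is removed, so subtracting the mean curve over all years is an assumption, and the function name `fpca` is hypothetical.

```python
import numpy as np

def fpca(curves, n_components=2):
    """PCA of sampled curves (rows = years, columns = months) on a uniform grid."""
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)          # assumed: mean over years
    centered = curves - mean_curve
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)           # share of total variation per component
    components = vt[:n_components]            # principal component curves
    scores = centered @ components.T          # one score per year and component
    return mean_curve, components, scores, explained[:n_components]

# Synthetic temperatures: a seasonal cycle whose winter depth varies from year to year.
rng = np.random.default_rng(1)
months = np.arange(12)
winter_mode = np.cos(2.0 * np.pi * months / 12.0)
amplitudes = rng.normal(0.0, 2.0, size=40)
curves = (8.0 - 10.0 * winter_mode
          + np.outer(amplitudes, winter_mode)
          + rng.normal(0.0, 0.3, size=(40, 12)))

mean_curve, components, scores, explained = fpca(curves)
print(explained)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the kind of score plot used to compare individual years.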

Looking at the scores for the two first components gives us an idea which years differ the most from each other, i.e., the points which are farthest away from each other.

Along the horizontal axis, that is, the first principal component, the years 1989 and 2010 seem to differ. Apparently, the winter of 1989 was much warmer than that of 2010.

The years 1971 and 2004 are very close to each other which suggests that they should be very similar, and they are.

The second principal component represents a mode where the late winter differs from the autumn/early winter between the years. The year 2006 had a cold early year and a warm late year while 2002 was warm to start with and cold at the end.

## Conclusion

FPCA is a powerful tool for analyzing variations in functional data. It applies to multidimensional functional data as well. Functional data analysis, in general, is a powerful tool that can also be used for categorization, where different clusters of, e.g., motion trajectories need to be found.

#### Introduction

Scheduling constrained resources over time is a tough problem. In fact, the problem is NP-hard. One part of the problem is to find a feasible solution where all constraints are satisfied simultaneously. Another part is to find a solution that also satisfies some measure of optimality. In practice, it is often sufficient to solve the problem partially, such that the solution is better than the first feasible solution found.

Combinatorial scheduling problems can be solved using mixed-integer linear programming (MIP) or heuristic approaches such as tabu search.

#### The Flow Approach

There are several ways to formulate the scheduling problem. The flow formulation presented by Christian Artigues in 2003 is one interesting approach. Assume for simplicity a process with two tasks:

The nodes 1 and 2 are representing the tasks. “S” is the start task and no edges can go back here. “E” is the end task and only incoming edges are allowed. The directed graph shows all possible edges for this configuration per resource type.

The graphs tell us how resources can be transported between different tasks. Hence, “S” can give resources to task 1 and task 2 while also sending resources which are not needed directly to “E.”

Tasks can handle resources in three different ways:

1. A task can consume a resource such that it disappears.
2. A task can produce a new resource making it available for someone else to interact with.
3. A task can pass a resource through for others to use (e.g., a person or a tool).

Assume that we have the following resources available in “S” to start with:

• One operator.
• Two tools.
• Three raw materials.

Task 1 requires the following to be able to produce one product A.

• One operator.
• One tool.
• One raw material.

Task 2 requires the following to produce one product B.

• One operator.
• Two tools.
• One product like the one produced by task 1.
• One raw material.

This problem can easily be solved by hand yielding:

The solution is an acyclic directed graph where resources flow from “S” to “E” fulfilling the constraints given by each node. Based on the solution task 2 must wait for task 1 to finish. Also, note that the edge between 2 and 1 has been removed since it is not needed.

As resource-constrained scheduling problems become more complex, the number of possible combinations increases rapidly, making the problem harder and harder to solve, as can be seen in the full graphs for 3, 6 and 10 tasks:
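The growth can be made concrete by counting the candidate edges of the full graph under the rules stated earlier: “S” has only outgoing edges, “E” has only incoming edges, and every pair of tasks is connected in both directions. The helper `full_graph_edges` is a hypothetical name for this sketch.

```python
def full_graph_edges(n_tasks):
    """List every candidate edge of the full graph for one resource type."""
    tasks = [str(i) for i in range(1, n_tasks + 1)]
    edges = [("S", "E")]                                # unused resources go straight to E
    for a in tasks:
        edges.append(("S", a))                          # S can supply any task
        edges.append((a, "E"))                          # any task can release to E
        edges.extend((a, b) for b in tasks if a != b)   # task-to-task transfers
    return edges

for n in (2, 3, 6, 10):
    print(n, len(full_graph_edges(n)))
```

The count grows as n² + n + 1, and since each edge is either used or not, the number of on/off edge patterns per resource type is bounded by two to the power of that count.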

#### Scheduling

In some cases, it is possible to change the order between tasks or even execute them in parallel without violating any constraints. The graph tells us whether any task must follow any other task(s), either directly or through a chain of events. What is left is to take the duration of each task into consideration to produce the final time schedule.
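Turning the precedence graph into a time schedule can be sketched as a pass over the tasks in topological order, where each task starts as soon as all of its predecessors have finished. The durations below are made up, the edges into “E” are an assumption about the solved graph, and `earliest_start_times` is a hypothetical helper. The sketch needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

def earliest_start_times(predecessors, durations):
    """Earliest start per task; predecessors maps a task to the tasks it must wait for."""
    start = {}
    for task in TopologicalSorter(predecessors).static_order():
        # A task can start when its slowest predecessor has finished (time 0 if it has none).
        start[task] = max(
            (start[p] + durations[p] for p in predecessors.get(task, ())),
            default=0,
        )
    return start

# Two-task example: task 2 needs the product made by task 1.
predecessors = {"1": {"S"}, "2": {"S", "1"}, "E": {"S", "1", "2"}}
durations = {"S": 0, "1": 3, "2": 2, "E": 0}  # made-up durations

start = earliest_start_times(predecessors, durations)
print(start)
```

The start time of “E” is the total length of the schedule.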

#### Conclusion

The resource-constrained scheduling problem is relatively simple to solve if there is a large excess of either tasks or resources compared to the other, since the number of combinations is then much reduced. When the amount of resources is close to the number of tasks involved, the problem gets much harder to solve.

Playing around with the measure of optimality can yield many different results depending on the formulation. For example, some tasks could be prioritized to be executed before others and tasks could be distributed over a time interval to maximize robustness for deviations and so forth.

The problem can be solved using mixed-integer linear programming (MIP) for which there are several mature solvers available on the market.

#### Introduction

Ordinary linear differential equations can be solved as trajectories given some initial conditions. But what if your initial conditions are given as distributions of probability? It turns out that the problem is relatively simple to solve.

#### Transformation of Random Variables

If we have a random system described as

$$\dot{X}(t) = f(X(t),t) \qquad X(t_0) = X_0$$

we can write this as

$$X(t) = h(X_0,t)$$

which is an algebraic transformation of a set of random variables into another representing a one-to-one mapping. Its inverse transform is written as

$$X_0 = h^{-1}(X,t)$$

and the joint density function $$f(x,t)$$ of $$X(t)$$ is given by

$$f(x,t) = f_0 \left[ x_0 = h^{-1}(x,t) \right] \left| J \right|$$

where $$J$$ is the Jacobian

$$J = \left| \frac{\partial x^T_0}{\partial x} \right|$$.
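As a worked example of the formula, consider the scalar system $$\dot{X}(t) = a X(t)$$. This system and the constant $$a$$ are illustrative assumptions, not part of the projectile example that follows. The mapping and its inverse are

$$X(t) = h(X_0,t) = e^{a(t-t_0)} X_0 \qquad X_0 = h^{-1}(X,t) = e^{-a(t-t_0)} X$$

so that

$$J = e^{-a(t-t_0)}$$

and the density at time $$t$$ becomes

$$f(x,t) = f_0\left( e^{-a(t-t_0)} x \right) e^{-a(t-t_0)}$$.

For $$a > 0$$ the initial density is stretched along the $$x$$-axis and scaled down by the same factor, so it still integrates to one.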

#### Solving Linear Systems

For a system of differential equations written as

$$\dot{x}(t) = A x(t) + B u(t)$$

a transfer matrix (also known as the state transition matrix) can be defined

$$\Phi(t,t_0) = e^{A(t-t_0)}$$

which can be used to write the solution as

$$x(t) = \Phi(t,t_0) x(t_0) + \int_{t_0}^{t} {\Phi(t,s) B u(s) ds}$$.

The inverse formulation of this solution is

$$x(t_0) = \Phi^{-1}(t,t_0) x(t) - \Phi^{-1}(t,t_0) \int_{t_0}^{t} {\Phi(t,s) B u(s) ds}$$.

#### Projectile Trajectory Example

Based on the formulations above we can now move on to a concrete example where a projectile is sent away in a vacuum. The differential equations to describe the motion are

$$\left\{ \begin{array}{rcl} \dot{p}_{x_1}(t) & = & p_{x_2}(t) \\ \dot{p}_{x_2}(t) & = & 0 \\ \dot{p}_{y_1}(t) & = & p_{y_2}(t) \\ \dot{p}_{y_2}(t) & = & -g \end{array} \right.$$

where $$p_{x_1}$$ and $$p_{y_1}$$ are the Cartesian coordinates of the projectile in a two-dimensional space, while $$p_{x_2}$$ is the horizontal velocity and $$p_{y_2}$$ is the vertical velocity. Gravity is the only external influence, giving the acceleration $$-g$$, and there is no air resistance, which means that the horizontal velocity will not change.

With the state vector ordered as $$x^T = \left( \begin{array}{cccc} p_{x_1} & p_{y_1} & p_{x_2} & p_{y_2} \end{array} \right)$$, the matrix representation of this system becomes

$$A = \left( \begin{array}{cccc} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right)$$

with

$$B^T = \left( \begin{array}{cccc} 0 & 0 & 0 & 1 \end{array} \right)$$.

The transfer matrix is (matrix exponential, not element-wise exponential)

$$\Phi(t,t_0) = e^{A(t-t_0)} = \left( \begin{array}{cccc} 1 & 0 & t-t_0 & 0 \\ 0 & 1 & 0 & t-t_0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right)$$
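The closed form can be checked numerically. Since $$A^2 = 0$$ for this system, the power series of the matrix exponential stops after the linear term. The sketch below uses a truncated series in NumPy, and `transition_matrix` is a hypothetical helper. For general matrices a dedicated routine such as `scipy.linalg.expm` is a better choice.

```python
import numpy as np

A = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])

def transition_matrix(A, dt, terms=20):
    """Truncated power series I + A*dt + (A*dt)^2/2! + ... for exp(A*dt)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ (A * dt) / k   # builds (A*dt)^k / k! incrementally
        result = result + term
    return result

phi = transition_matrix(A, 0.5)
print(phi)
```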

Calculating the solution of the differential equation gives

$$x(t) = \Phi(t,0) x(0) + \int_0^t {\Phi(t,s) B u(s) ds}$$

where $$u(t) = -g$$ and $$x^T(0) = \left( \begin{array}{cccc} 0 & 0 & v_x & v_y \end{array} \right)$$. The parameters $$v_x$$ and $$v_y$$ are initial velocities of the projectile.

The solution becomes

$$x(t) = \left( \begin{array}{c} v_x t \\ v_y t - \frac{g t^2}{2} \\ v_x \\ v_y - g t \end{array} \right)$$

and the time when the projectile hits the ground is given by

$$p_{y_1}(t) = v_y t - \frac{g t^2}{2} = 0 \qquad t > 0$$

as

$$t_{y=0} = 2 \frac{v_y}{g}$$.

A visualization of the trajectory given $$v_x = 1$$ and $$v_y = 2$$ with gravity $$g = 9.81$$ shows an example of the motion of the projectile:
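The same trajectory can be reproduced numerically from the closed-form solution, using $$v_x = 1$$, $$v_y = 2$$ and $$g = 9.81$$. The function names `state` and `landing_time` are hypothetical helpers for this sketch.

```python
import numpy as np

def state(t, v_x, v_y, g=9.81):
    """Closed-form state (p_x1, p_y1, p_x2, p_y2) of the projectile at time t."""
    return np.array([v_x * t, v_y * t - 0.5 * g * t**2, v_x, v_y - g * t])

def landing_time(v_y, g=9.81):
    """Positive root of v_y*t - g*t^2/2 = 0."""
    return 2.0 * v_y / g

t_land = landing_time(2.0)
x_land = state(t_land, v_x=1.0, v_y=2.0)
print(t_land, x_land)

# Sampled points along the flight, e.g. for plotting height against distance.
trajectory = np.array([state(t, 1.0, 2.0) for t in np.linspace(0.0, t_land, 50)])
```

At landing, the height is zero and the vertical velocity equals the initial vertical velocity with the opposite sign.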

Now, if we assume that the initial state $$x(0)$$ can be described by a joint Gaussian distribution, we can use the formula shown earlier to say that

$$f(x,t) = f_0\left[x(0)=h^{-1}(x,t)\right] \left|J\right| = \frac{\left|J\right|}{\sqrt{\left|2 \pi \Sigma \right|}} e^{-\frac{1}{2}(x(0)-\mu)^T \Sigma^{-1} (x(0)-\mu)}$$,

where $$x(0) = h^{-1}(x,t)$$ is given by the inverse formulation of the solution, $$\left| J \right| = \left| \Phi^{-1}(t,0) \right| = 1$$ (since $$\Phi$$ is triangular with ones on its diagonal), $$\mu^T = \left( \begin{array}{cccc} 0 & 0 & v_x & v_y \end{array} \right)$$ and

$$\Sigma = \left( \begin{array}{cccc} 0.00001 & 0 & 0 & 0 \\ 0 & 0.00001 & 0 & 0 \\ 0 & 0 & 0.01 & 0 \\ 0 & 0 & 0 & 0.01 \end{array} \right)$$

which means that we have high confidence in the firing position but less in the initial velocity.

We are only interested in where the projectile lands, so we can marginalize out the velocities to get:

$$f\left(p_{x_1},p_{y_1},t\right) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,t) dp_{x_2} dp_{y_2}$$

which when plotted gives

Since we have used the landing time for the deterministic trajectory, we get a spread across the y-axis as well (the ground is located at $$p_y = 0$$). We could marginalize the y-direction as well to end up with:

This shows the horizontal distribution of the projectile at the time when the deterministic trajectory of the projectile is expected to hit the ground.
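For this linear system with a Gaussian initial state, the result can be cross-checked without evaluating the integrals. This takes a different route than the density formula used above: an affine map of a Gaussian is again Gaussian, with mean $$\Phi \mu$$ plus the forced response and covariance $$\Phi \Sigma \Phi^T$$, and marginalizing corresponds to picking out sub-blocks. The variable names are chosen for this sketch.

```python
import numpy as np

g, v_x, v_y = 9.81, 1.0, 2.0
t = 2.0 * v_y / g                       # landing time of the deterministic trajectory

phi = np.array([
    [1.0, 0.0, t, 0.0],
    [0.0, 1.0, 0.0, t],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
mu0 = np.array([0.0, 0.0, v_x, v_y])
sigma0 = np.diag([1e-5, 1e-5, 1e-2, 1e-2])

# Integral of Phi(t,s) B u(s) ds with u = -g: gravity acts on the vertical states only.
forced = np.array([0.0, -0.5 * g * t**2, 0.0, -g * t])

mu_t = phi @ mu0 + forced               # mean of the state at time t
sigma_t = phi @ sigma0 @ phi.T          # covariance of the state at time t

pos_mean = mu_t[:2]                     # marginal over the velocities
pos_cov = sigma_t[:2, :2]
std_x = np.sqrt(pos_cov[0, 0])          # spread of the horizontal landing position
print(pos_mean, std_x)
```

The horizontal variance is the initial position variance plus $$t^2$$ times the initial velocity variance, so the uncertainty in the velocity dominates the spread at landing.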

#### Conclusion

Given a set of ordinary differential equations, it is possible to derive the uncertainty of the states given a probability distribution in the initial conditions. There are two other important cases to look into as well: stochastic input signals and random parameters.

The process of testing the software includes both virtual testing, as mentioned above, and actual physical testing with purpose-designed hardware. The whole process of testing can be compared to a multistage rocket, where a sequence of “in-the-loop” tests pinpoints different areas within the software development.

In this post, we will look into this family of “in-the-loop” tests and present the different members. Most of all, we will present the latest sibling in the family, the vHIL, and the introduction of virtual processors into the “in-the-loop” environment.

#### Physical testing

Still, physical testing is important, since the final goal of the software is to control some physical hardware. However, physical testing has its drawbacks. To perform physical testing, one needs plants, i.e., physical test prototypes, which can be both expensive and time-consuming to produce. Because of this, the number of plant objects is highly restricted. They might also have to be shared among multiple development teams, so that the effective time for testing is greatly limited. In addition, the plants are often hard-configured, which means that to test another setup, it is necessary to send the plant to the workshop to be modified, a time-consuming and error-prone procedure.

Physical testing is something of a gamble, since this can be the first time the software, uploaded to the ECU connected to the plant, is in contact with any physical hardware, and the software can have one or several bugs that can damage the plant. Since the demand for test prototypes can be very high, one wants to avoid the risk of damaging the plants. Therefore, physical testing does not take place until late in the development process, when the software has gone through some safety checks; this is where the “in-the-loop” tests come into the picture.

#### MIL – Model in the loop

The whole purpose of the “in-the-loop” process is to have the software as mature as possible when it has its first contact with the physical plant. At the first stage, the control software components, or control models, are tested: one checks the behavior of the control algorithms against a virtual plant model. The purpose of these virtual plant models is to simulate the physical behavior of the physical test objects. Often, a software component is limited to controlling a specific task, which also limits the required scope of the virtual plant model.

The testing starts by considering single components, unit testing, and as they start to function properly, one starts to consider larger and larger groups of components. With the groups, it is possible to test the interfaces and the communications between the components.

When the tests show that the behavior is satisfactory, the control algorithms are implemented. The implementation can be performed either manually, by writing C code, or by using code-generating tools, like Simulink.

#### SIL – Software in the loop

At the software-in-the-loop stage, the implementation of the control algorithms is validated. Here, the compiled code is simulated against the same virtual plant model used in the MIL testing. In this way, it is possible to compare the results from the two stages to check that the implementation has not caused any differences in the behavior of the system.

The problem with SIL is that the C code is compiled for the workstation the engineer is sitting at, not for the actual microcontroller. There are several differences between the two, for example in numerical precision and data types. The microcontroller is also much more limited when it comes to memory and performance.
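Comparing the results of two stages usually comes down to checking recorded signals against each other within a tolerance. The sketch below is a hypothetical helper, not the API of any particular test tool. It compares a double-precision reference trace with the same signal rounded to single precision, as a stand-in for an implementation that uses a narrower data type.

```python
import numpy as np

def back_to_back(reference, candidate, abs_tol=1e-6):
    """Return (passed, max_deviation) for two equally sampled signal traces."""
    ref = np.asarray(reference, dtype=float)
    cand = np.asarray(candidate, dtype=float)
    if ref.shape != cand.shape:
        raise ValueError("traces must have the same shape")
    max_dev = float(np.max(np.abs(ref - cand)))
    return max_dev <= abs_tol, max_dev

t = np.linspace(0.0, 10.0, 1001)
mil_trace = np.sin(t)                          # reference, double precision
sil_trace = np.sin(t).astype(np.float32)       # same signal stored in single precision

ok, max_dev = back_to_back(mil_trace, sil_trace, abs_tol=1e-6)
ok_strict, _ = back_to_back(mil_trace, sil_trace, abs_tol=1e-9)
print(ok, ok_strict, max_dev)
```

The choice of tolerance decides whether the rounding differences count as a change in behavior, so it should be set from the requirements on the controlled system rather than picked after the fact.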

#### PIL – Processor in the loop

The third stage is not that commonly used. The purpose here is to remedy the shortcomings of the SIL environment by compiling the code for the microcontroller architecture. Again, the implementation of the control algorithms as compiled C code is tested, using the same setup of testing against virtual plant models as in the first two “in-the-loop” stages. Besides testing the compiled code, it is also possible to check memory allocation and execution times.

#### HIL – Hardware in the loop

Only at this stage does the code leave the host PC and get uploaded to a real ECU/MC. Once it has left the host PC, the control algorithms can no longer be handled as single units or small groups of units. Instead, they are one component among many in a software stack. Here, they share space with other software components, such as other control algorithms, and with other software layers, such as the OS, drivers, and middleware.

Getting the software stack to compile, and getting its components to function together as a unit, is the first task when setting up a HIL environment. It is not uncommon for this to require multiple iterations through all the previous “in-the-loop” stages before a mature software stack is ready for the HIL environment.

The actual HIL environment consists of an ECU/MC connected to the virtual plant model through a HIL simulation box, which runs the simulation of the virtual plant model in real time. The problem is that this environment suffers from some of the drawbacks of physical testing: the setups are expensive and are shared among multiple teams doing integration and testing.

#### vHIL – Virtual hardware in the loop

Recently, fast simulation models of digital ECU/MC hardware, so-called virtual prototypes or VPs, have appeared on the market. These execute the same binary software as the target MCU and can be simulated against the same virtual plant models that are used for MIL and SIL testing. This new environment is the vHIL.

It should be stated that the vHIL will not replace the HIL. Instead, it smooths the path between the SIL and the HIL. Since it does not require any physical ECU, the vHIL can be up and running early in the process and deliver a more mature software stack to be integrated into the HIL environment. This reduces the bring-up time from weeks to days, which leaves more time for actual HIL testing.

With the VP, one can achieve better control and visibility than when testing on a real ECU/MCU. Many of the simulation platforms for VPs make it possible to debug, analyze and automate the testing procedure. Breakpoints can be set in the software for debugging, and hitting one halts the whole simulation. From these breakpoints, one can step through the software line by line and step in and out of subroutines. The platforms can also visualize the communication paths in the software and report the code coverage achieved during a test.

With the vHIL, more tests are performed early in the process, since one does not need to wait for a physical prototype of the ECU. The vHIL is also easier to scale up and can run at multiple virtual locations at the same time. More testing at an early stage results in higher-quality testing in the HIL environment and in physical testing. With the vHIL, it is also possible to prepare the tests for the upcoming HIL in advance: they can be composed and tested in a virtual environment and be ready when the prototypes of the physical ECUs appear.
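One way to prepare HIL tests in advance is to write each test case against a small interface that both a vHIL adapter and a HIL adapter could implement. The sketch below is purely illustrative: `TargetAdapter`, `FakeVirtualTarget`, the signal names and the fan logic are all invented for the example and do not correspond to the API of any VP platform.

```python
from typing import Protocol


class TargetAdapter(Protocol):
    """Hypothetical interface that both a vHIL and a HIL adapter would implement."""

    def write_signal(self, name: str, value: float) -> None: ...
    def read_signal(self, name: str) -> float: ...
    def advance(self, seconds: float) -> None: ...


class FakeVirtualTarget:
    """Stand-in for a virtual prototype adapter; the fan logic is made up for the example."""

    def __init__(self) -> None:
        self.signals = {"coolant_temp": 20.0, "fan_on": 0.0}

    def write_signal(self, name: str, value: float) -> None:
        self.signals[name] = value

    def read_signal(self, name: str) -> float:
        return self.signals[name]

    def advance(self, seconds: float) -> None:
        # Pretend ECU software: switch the fan on above 95 degrees.
        self.signals["fan_on"] = 1.0 if self.signals["coolant_temp"] > 95.0 else 0.0


def fan_starts_when_hot(target: TargetAdapter) -> bool:
    """Test case written once against the interface, independent of the backend."""
    target.write_signal("coolant_temp", 105.0)
    target.advance(0.1)
    return target.read_signal("fan_on") == 1.0


# Run the test against the virtual stand-in; a HIL adapter could be passed in later.
result = fan_starts_when_hot(FakeVirtualTarget())
```

With this split, only the adapter has to change when the physical ECU prototypes arrive; the test cases themselves stay the same.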

Two examples of VP platforms are the Virtualizer from Synopsys and the Vehicle System Integrator, VSI, from Mentor Graphics.

The process of testing the software includes both virtual testing and actual physical testing with purpose-built hardware. The whole process can be compared to a multistage rocket, where a sequence of “in-the-loop” tests targets different areas of the software development.

In this post, we will look into this family of “in-the-loop” tests and present its different members. Above all, we will present the latest sibling in the family, the vHIL, and the introduction of virtual processors into the “in-the-loop” environment.

#### Physical testing

Still, physical testing is important, since the final goal of the software is to control some physical hardware. However, physical testing has its drawbacks. To perform physical testing, one needs plants, that is, physical test prototypes, which can be both expensive and time-consuming to produce. Because of this, the number of plants is highly restricted. They might also have to be shared among multiple development teams, so the effective time available for testing is greatly limited. In addition, the plants are often hard-configured, which means that testing another setup requires sending the plant to the workshop to be modified, a time-consuming and error-prone procedure.
