Blog | Combine


This blog post is a continuation on a series of posts using Sympathy for Data for doing image processing. Sympathy is an Open-Source tool for graphically programming data-flows which lends itself well for quickly setting up and testing different image processing and classical machine learning algorithms that we can use to classify objects in an industrial setting. We will show how we can perform simple object recognition using only a modicum of feature engineering and a very small dataset and some simple machine learning algorithms. By doing feature engineering on the input data we get a high precision training set for the machine learning algorithm sufficient for classifying objects. This can be contrasted with the shotgun approach of deep learning which requires wast datasets of training examples to solve the same task.

The task

In the previous entry we started on an algorithm for automatically extracting objects in an image taken top-down of objects against a neutral background. These example objects consists of a mix of screws, washers and nuts on a conveyor belt and we would ideally like to classify them in order to sort them in a later step.

The output from our previous step was a list containing the mask for each object found in the input image. We will continue from this step by using image processing to do feature engineering as a pre-processing step to using a simple machine learning algorithm to do the classification.

What is feature engineering?

Many simplistic approaches to object classification using machine learning feed the raw pixel data to machine learning algorithms such as support vector machines, random forest or classical neural networks in order to solve tasks such as the MNIST classification. While these approaches have been successful in small domains such as the 28×28 pixels images of MNIST it is much more problematic to do classification of arbitrarily sized objects in larger images due in part to an explosion of the number of parameters in the models which leads to a need for very large datasets. We cannot reasonably train a model with fewer examples than number of free parameters.

Solutions to this problem include either doing feature learning or feature engineering. While the former is within the purview of deep learning and out of the scope for our solution we can instead use the later by using classical image processing to extract new features that enables the machine learning algorithm to work with the images.

A classical algorithm used for pre-processing images before feeding them to machine learning algorithms is SIFT features (PDF). The original algorithm proposed in 1999 was considered by many to be a large step forward since it allows to extract a number of features for points on a real world object such that the points extracted from two different images of the same object would be close in the feature space regardless of the scale (size) and rotation of the object. This allows us to compare the features from the same object in two different images.

Each feature consisted of the XY position of a keypoint (eg. corners) as well as a multi-dimensional vector that describes that point in such a way that the description is mostly invariant under different scale, rotation and lighting conditions. While this algorithm have been used by many to do successful object recognition it is today not as often used due to being patented and due to many other new alternative algorithms for extracting image features.

One good free alternative to SIFT (and later SURF) is to use the ORB algorithm (PDF) which combines two other algorithms for keypoint detection and for creating a feature descriptor for each such point. We will base our solution around this algorithm.

Using ORB features for object recognition

Step one is to load an image containing only the object we want to detect, in this case an example with a number of screws.  We give this image to the ORB features extractor in the node image statistics.  With the default arguments we get an output that contain a number of XY points (see table below) as well as a feature vector  f0 … f255 describing each such point.  We can draw a small circle around each XY point in order to see which points have been extracted in the image, and we see that we have a number of such points for each object in the image at key locations such as the head and bottom of the screws.

Next we can train a one-class classification algorithm to match these features. Two options that are included in the default Sympathy are the isolation forest  and one-class support vector machine  algorithms. We will use the former to create a machine learning model that matches features that are present in the first image, while rejecting all other features. Note that by having only a single image with a few screws as a training example we are only doing a very light and cheap form of machine learning, and should adjust our expectations for the end result thereafter.

Before feeding the feature points to the classification algorithm we remove the XY coordinates by using the select columns from table node. We use the fit node to train the isolation forest using the features from the training image and we use the predict node to create a prediction for each of the features in a test image containing both screws, washers and nuts.

The output of the predict node is a single column Y with value +1 for features that match the original features and value -1 for all other features. We can use this Y value to determine a color to be drawn on top of each feature in order to see how the model handles each feature in the test image.

As we can see in the images above we have a large number of positive features (Y=1, white circles) for the screws in the image and mostly negative (Y=-1, black circles) for the washers and nuts.  In order to make a final classification we just need to count the number of features that are positive vs. the number of features that are negative for each object identified in the image. If the ratio of positive to negative features exceed a threshold (eg 0.6) then we classify the object as a screw.

We do this by creating a lambda subflow that takes two inputs. The first input should be the table with a y0 prediction for each feature. The second input should be an image mask.  Note on the main flow you can click on your lambda and select add input port to make these two ports visible so you can give a test inputs to them. By connecting the table with features/predictions as well as an input mask to the lambda it allows you to test-run the lambda on these values when you are editing the lambda.

Once we have our inputs to the lambda we take a look at the content of the lambda. You can right click on the lambda sub-flow and select edit just like you would on a normal subflow. The first thing we do inside the lambda is to use morphology to extend the border around each object, we want to get keypoints that are not only inside the objects but also along the border of them.

After that it is a small matter of extracting the value of the mask (true/false) at each keypoint and summing the keypoints that have value y0=1 and y0=-1 respectively in a calculator node. We do this by giving the XY coordinates for each keypoint to the Extract Image Data node, this gives a table with a single column ch0_values that gives the mask value at the XY coordinate for each keypoint. Next we can use the following expression in the calculator node to compute the ratio of positive to negative features for each object:

What this means is that we require a keypoint to both be inside the mask using the column ch0_values, and multiply it with a check if the column y0  had the value 1.  The output of this will be 1 only for the points that had y0=1 and that had True in the input mask. The sum of all these are the correct column which gives the number of features predicted true by the classifier.

Similarly, by doing the same but comparing the column y0 != 1 gives us the number of keypoints predicted false by the classifier.

The final step is to apply this lambda to the classified data and map it on each input mask, in order to get a classification for each object.

Note that we need to use apply first with the table as input since we only have one table that should be used for all invocations of the lambda. We use the map second since we have a list of input masks to check, and want a list with the outputs. Finally we can use the filter list predicate function to only keep the outputs that had a sufficiently high score.

The final output is a list of all the objects that was classified as a screw. Note that with the given threshold of 0.6 we miss two of the screws. You can experiment with different values of the threshold and with different parameters to the basic classifier (isolation forest) to get better results. You can also try to use the One class SVM node instead of an isolation forest.


We have shown how you can use the built-in nodes in Sympathy for Data to solve a simple image classification task using a simple one-class classifier machine learning node with ORB features as a pre-processing step on the image. The final system can work with only a single example of the object to detect, although with a high miss-classification chance. For better classifications more training examples and/or a different machine learning algorithm can be substituted for the isolation forest, but keeping the general ORB features as the feature engineering method.

Read more

Although usually forgotten, it is extremely important to remember that all these technologies run on computers. It doesn’t matter if they are in the cloud, some internal resources or any external company, at the bottom of the stack there is always a computer! The usual approach when the performance of the algorithm cannot be improved any more and more computing power is still required, is to scale the hardware the algorithm is running on, creating massive computer clusters that are often inefficient, expensive and difficult to understand. At this point one question might have already reached your mind. Why instead of trying to increase and scale the hardware we are running on, don’t we try to optimize it? It would be definitely cooler to have an algorithm running on a 20-node computer cluster than just on a big optimized server, but it is also way more expensive! All this brings us to today’s blog post topic: Linux Containers.

For several years, big data computing has been executed inside virtual machines due to mainly two reasons: resources optimization (the same algorithm is hardly ever running 24/7, but computers are, so usually several algorithms are installed and run in batches or simultaneously) and security/integrity (when different kind of algorithms are installed on a same computer, it is crucial that a breach in one of them does not affect the other). However, although modern hypervisors have close-to-native performances in terms of CPU usage, filesystem access is considerably slowed down compared to accessing the filesystem from outside the hypervisor. This problem can often be minimized, but it is impossible to completely fix due to the fact that hypervisor’s filesystem access has to go through the underlying manager before. To avoid this problem (and still with the security and resources optimization goals in mind), a different technique (known as OS virtualization, jails or containers) has been developed through the years but only reaching maturity and mainstream adoption in the last years[1] with the final implementation on Linux and the growth of Docker and Linux Containers. What differentiates containers from virtual machines is that processes are running isolated from the rest of the system, but still sharing the same kernel and in consequence having access to external devices in the same fashion (and thus, same performance) as the native system. Linux Containers is a tool designed with security in mind, able to run a fully functional OS sharing the kernel and a branch of the filesystem with the host but being completely unaware of the host or other containers existence. In consequence, it is an extremely useful tool for sharing resources on a same host. However, the fact that the kernel is shared between the containers and that limiting CPU and memory resources is still not as effective as with virtual machines, makes it not being a feasible implementation for “the Cloud”, which still lacks behind in terms of storage throughput and latency compared to our internal resources built on top of Linux Containers.

[1] As an example, FreeBSD jails were introduced in 2000 and Solaris containers in 2004, but full support was not finished in Linux kernel until 2013 and first user implementations were not usable until a bit later.


Read more

A cat does not like when its environment changes drastically. It might be stressed and start urinating on furniture to protest. Humans may not behave in the same way, but they have ways of showing discontent both directly and indirectly.

To transform an organization we need information to be able to act objectively. A company has a mixture of structured and unstructured data from which historic events up to now and how well we performed can be extracted. This is where most companies stop. Going further requires much more work.

The first step is to find out why we are where we are. Going here requires data-crunching, deduction, and discussions with domain experts. The results are reported to the management and we are done. The natural continuation is to find out what would happen if the historic trends would continue in the future and what to expect. Once we know all of this we need to find out what our next steps should be and decide on some action. An action is a mutation of the current state of affairs and requires a change, either small or big.

The change could involve minor adjustments which fit within existing processes (easy changes) or major changes which would disrupt existing daily patterns of the employees of the company. People are driven by their interests, and if their interests are harmed by change they would try to hold on to or increase their influence according to action theory. They could do so by (Learning to Change, Caluwé & Vermaak):

“…behaving unpredictably; by concealing information or distorting it; by imposing rules for the game, or, on the contrary, simply ignoring them; by forming coalitions; or by blackening somebody’s reputation.”

There are both formal and informal organizations where the latter is undocumented and constantly changing over time. To make people change (not just the formal organization), Caluwé & Vermaak discusses seven different ways to change things of which the last two are inappropriate to use in professional settings.

  1. Yellow: Change using power and processes to get everyone on the same wavelength.
  2. Blue: Rational change using blueprints with a given outcome (waterfall design).
  3. Red: Change using inducements and/or penalties.
  4. Green: Change by letting people grow through education and learning.
  5. White: Change through self-organization and evolution.
  6. Steel: Change using violence and repression.
  7. Silver: Change through circumstances (“if God wants”).

Which method (among the first five) to choose (or which to combine) depends on the company, the individuals within the company, and the culture. Data based change might often end up in the blue category, but depending on the conclusions made from the data and how much impact the change has on humans and company structures (both formal and informal), blue might not be the best choice. The conclusion is to never forget the humans involved because if the basic needs and relations of humans are disrupted major discontent might be the result. Pure rationality and objectivity do not always work out as expected.

Read more

It seems for me that it is a long way to travel to Sweden to do a master’s in engineering, why did you choose Sweden?
Yes, that is correct. The interest to do a master’s started out during my career as a hardware design engineer back in India. For most part of my bachelor’s in electrical and electronics in India I was interested in projects which was more hardware oriented. Right after graduation I did get a job to assemble/test electric control units (ECU) for solar inverters with a Swedish electrical company, ABB. During this time, I also became curious about how the software in the ECUs was created. Thus, I took the decision to chase a master’s degree and Sweden was close at hand due to its good universities and the positive work culture. 

Why did you choose Combine?
Well, when I reached the end of my studies I got involved in several hiring processes and few of them went far. It was the employees’ market at that time and the lack of engineers was striking. I bumped into Combine at a job fair for graduating students and thought that the guys in the booth was both social and technically experienced within the fields where I wanted to make a difference. Moreover, during the hiring process, I did get a completely different experience. The process was completely an eye-opener for me since the manager who was to hire me, forced me to answer difficult questions like, ‘What is your background?’, ‘Why did you choose engineering?’, ‘What do you want to do in your career?’ Those questions created a feeling that this company really cared about their employees and that feeling is still there. 

Tell me something from your assignment that you are proud of?
In my assignment, I do work with software/model-in-the-loop testing and how to streamline the testing process. In the beginning, I did tests by hand which was a tedious process and not creative enough. It didn’t take the team long before we started looking for other approaches to do tests more effective and increase the test coverage of the software. After some time, we stumbled upon an automatic testing framework and adapting this was both interesting and challenging. During the last year, we have tried with the help of Python and existing tools incorporate an automatic testing framework in a larger scale than what I have seen before. It was an intriguing task to set the whole framework up to work and I am really proud of the teamwork and co-operation during setting the framework up. Now my work is focused more on setting up the test requirements individually rather than the framework for the test itself. 

Is the work as an engineer what you had envisioned?
Well, I didn’t have any clear idea of what an engineer should do. With experience, a picture started to take form that an engineer is someone that solves problems and issues. The problems do vary in both shape and size but yes that is what I and my fellow colleagues do every day. I really enjoy what I do and would love to keep facing more challenging tasks in the future. 

You have been in Sweden for a few years now. What do you think about Sweden?
Well to begin with, I am extremely fond of the ‘fika’ culture in Sweden. It’s such a nice custom where in colleagues get together every week, just to have some cakes and coffee! Moreover, I really enjoy working in Sweden because of the support, flexibility and co-operation that you enjoy at your workplace.  The weather can be difficult at times, but after three winters in Sweden, I don’t feel it is going to bother me anymore. 

Read more

Earlier this month WARA PS, a part of the WASP arena, demonstrated an autonomous search-and-rescue system used at sea. This was done with one UAV automatically searching for people in the water while transmitting aerial images to the rescue center. An autonomous boat followed by a second UAV showing the situation closer to the water surface was guided by the first UAV when a person , in the water; thus making it possible for the person in the water to be saved by the boat.

Almost the same procedure is used when finding people on land. One UAV systematically scans a predetermined area for missing people. If a person in need is found, a second UAV is contacted which flies to the destination and releases a help package, such as a phone or medicine. All of this is performed automatically.

In January this year, a UAV dropped lifebuoys to swimmers in Australia who were in trouble, possible saving their lives.

But not only human lives can be saved by UAVs. Project Ngulia who aims to develop technical solutions to monitor rhinos and thus combat poaching are, among other solutions, looking at UAVs to support park rangers. This summer, a new agreement spanning over three years was signed between the Kenya Wildlife Service and Linköping University. Hopefully, UAVs can become a cost efficient and sustainable solution saving the black rhinos from extinction.

Some of the technologies used in the project are tested at Kolmården Wildlife Park, which is located in close proximity to the technical team at Linköping University while offering a realistic savannah. The test site also renders data that are not highly classified, which is the case in real parks and sanctuaries.

These are just a few examples of ongoing projects that uses UAVs to save lives. However, the sectors of UAV usage have increased greatly. They are now used in marketing, professional film making, construction, delivery, imaging, agriculture, family occasions, entertainment, inventory, weather forecast, environment, insurance, policing, sports and more.

The development and enhancements of UAVs are of great interest of Combine since it contains problems in one of our main fields of expertise. But equally interesting is how the UAVs are used, as saving lives and protecting wildlife are important steps in creating a better planet.

Read more


At Combine, we play board games during so-called “Game Nights”. On several occasions, there have been discussions regarding how to shuffle cards efficiently (i.e. having an unpredictable order of the cards).

A deck of ordinary playing cards consists of four suites of 13 cards each giving a total of 52 cards. The total number of permutations of the card deck is given by:

P^n_r = \frac{n!}{(n-r)!}

where \(n = 52\) is the total number of cards and \(r = 52\) is the length of the sequence we want to generate from the \(n\) cards. Since \(n = r\) we obtain \(n! = 52!\), which is approximately \(8 \cdot 10^{67}\) combinations.

How to shuffle cards have been studied before and also discussed in other ways, and these sources have been used as a foundation for this text.

Given a deck of 52 cards, each card is numbered from 0 to 51 in order (\(F_i = i\)). After shuffling the deck we then know the id of each card in the new order. The Shannon Entropy is then calculated by first estimating the distribution of distances between each card in order:

\Delta F_j = F_{j+1} – F_j

The Shannon Entropy is then calculated using

E = \sum_{j=0}^{51} -p_j \log_2(p_j)

The variable \(p_j\) is a normalized histogram of distances between cards.

The maximum possible entropy is \(\log_2(52) = 5.7\) (measured in the unit “bits”), which is useful as a reference.
According to literature, it is enough to cut the deck and riffle shuffle seven times to obtain an unpredictable order.

Overhand Shuffle

Not everyone is able to perform the riffle shuffle and might instead use the overhand shuffle. We experimented with the overhand shuffle, wrote down the order of the cards for each shuffle, and ended up with the following increase in entropy (the red line is the maximum possible entropy).

The first iteration does not increase the entropy at all since the deck was only cut once without any shuffling taking place.

Hash Shuffle

One idea which was discussed at one Game Night was to shuffle cards using a method similar to how hash values are calculated in computer science. This is a deterministic shuffling method, but by adding some random elements, like cutting the deck, we get some interesting results.

The idea is to choose a number of piles and divide the cards between them. This would force cards to interleave with each other, introducing a distance between them. Just doing this once without any random elements gives the following entropies for different numbers of piles:

If we apply the hash shuffle twice and try all combinations between 2 and 10 piles we find that using 5 piles to start with and then 5 piles a second time again gives the highest entropy. In practice, you should cut the deck between each operation as well.

If you want to repeat the hash shuffle for a fixed number of piles you should obviously avoid 2 and 4 piles.

Riffle Shuffle

The riffle shuffle is claimed to be one of the best ways to shuffle a deck of cards. And, indeed, given the Shannon Entropy measure the riffle shuffle is by far the best way to shuffle according to the following figure:

The entropy rises very fast. Using other measures it is claimed that 7 riffle shuffles should be enough, and more than \(2 \log_2(52) = 11.4\) shuffles is not necessary.


When shuffling you should use the riffle shuffle. Just make sure that you cut the deck between each shuffle since the top and bottom cards tend to get stuck otherwise.

Read more

In recent years, the hype around artificial intelligence (AI) has grown a lot. AI is not a new concept and the term has been around since the 1950’s when computer scientist John McCarthy coined the term. But the concept of creating artificial beings has been around much longer and can be found in greek mythology or in Mary Shelley’s Frankenstein.

AI has also been portrayed in numerous books and movies and some are more interesting than others. There are great classics like Stanley Kubrick’s 2001: A Space Odyssey (1968) inspired by a short story by Arthur C. Clarke. In the movie the ship’s computer HAL (Heuristically programmed ALgorithmic computer) 9000 is the main antagonist. This is one of the first movies that portrayed the idea of a human made AI to the masses.

Another classic movie is Blade Runner (1982) based on Philip K. Dick’s novel Do Androids Dream of Electric Sheep?. In the movie humans have designed intelligent androids, called replicants, that the main protagonist, Rick Deckard, must hunt down and retire (terminate).

One common problem in movies, and sci-fi movies in general is how to balance an intricate and interesting story with special effects and action scenes. The sci-fi genre has so much potential when it comes to explore concepts about science, technology, existentialism, human evolution and the possibility of extraterrestrial life. When the boundaries of our current paradigm is not a hard limit, imagination and science can blend in the most interesting ways. The movies and TV shows below are not based around special effects or action but rather trying to explore what it is to be human.

Her (2013)

(image from themoviedb)

Director: Spike Jonze
Starring: Joaquin Phoenix, Amy Adams, Rooney Mara, Olivia Wilde, Scarlett Johansson
Genres: Sci-fi, Romance, Drama

Synopsis: Joaquin Phoenix plays a man who installs a new operating system with artificial intelligence to help him with various tasks and over time their relationship becomes more and more romantic.

Being able to convey a romantic relationship without the need of a materialized body, using only a voice, is beautifully executed. Most other movies that explore romantic human/machine involvement are doing so almost by cheating. Using an android, more or less indistinguishable from a human, makes it much easier to relate to. Human/computer interaction by voice is not something new and has been around for a quite some time, mostly as an accessibility tool for people not able to use standard equipment. In October 2011, Siri was launched (by Apple) as the first smart phone integrated voice controlled virtual assistant. At the time, the actions it could perform was rather limited, problems understanding voice input and could be seen as more of a gimmick than a valuable tool. Since then other companies have released voice controlled virtual assistants and the biggest competitors are Amazon Alexa released in 2014 and the Google Assistant released in 2016. These products are getting better and better and voice controlled virtual assistants are probably here to stay although they are not perfect.. yet.

Her, artificial intelligence and the concept of time

Humans, like animals are evolutionary equipped to experience some basic aspects of time. What happens when an artificial intelligence emerge with the computational power of living years, decades or millenniums every day, hour or second? Their concept of time will be something completely different which could have huge implications. For some perspective, OpenAI Five  recently competed with five bots against five former professional gamers in the competitive game Dota 2. Everyday the AI played 180 years worth of games against itself running on 256 GPUs and 128 000 CPU cores. When summing up each character (a total of five) it amounts to 900 years every day. Time is relative but even more so for an AI running on a giant cluster and there is a great scene about this in the movie.

Ex Machina (2014)

(image from themoviedb)

Director: Alex Garland
Starring: Domhnall Gleeson, Alicia Vikander, Oscar Isaac
Genres: Sci-fi, Drama, Mystery, Thriller

Synopsis: Meet Caleb Smith, a talented programmer at a large tech company, who wins an office competition to stay a week at the CEO’s remotely located house. When he arrives he is introduced to an android who according to Nathan Bateman, the CEO, has passed the Turing test but he wants more validation before going public with the news. Surprise, surprise.. Caleb starts to develop feelings for the android.

How close is humanity to develop an human-like conscious AI and what happens then?

What happens then?, is a question people have been debating for a long time. Some view this as the inevitable future and hope for the human race and others see it as humanity’s demise. This has also been explored in sci-fi books and movies over the years and The Terminator (1984) portrays one of the darker scenarios for mankind. But how close are we to develop an AI able to pass the Turing test? Probably not very close. Even though great progress has been made in the last few years, with everything from AlphaGo being able to beat the top players at Go to Libratus a poker AI able to beat top players in heads up no-limit Texas hold ’em (and of course the OpenAI Dota 2 bots mentioned earlier), the Turing test is something else to beat. The AI’s created today are heavily specialized and contextual but can’t do much outside their specialization. And in order to pass the Turing test an AI would have to behave like a human when exposed to a multitude of different questions that could range from anything to everything.

The idea of sitting down with an AI, like Caleb, having a conversation and slowly become more and more amazed how well it performs, is thrilling. This is something else than the Turing test because it should not be known beforehand whether it is a human or not, but that doesn’t make it less interesting. Problems will arise when the AI’s starting to become too humanlike and at the same time have their own agenda. Sooner or later it would become really difficult to be able to control all the possible outcomes when dealing with an AI.

Black Mirror (2011)

(image from themoviedb)

Creator: Charlie Brooker
Genres: Sci-fi, Drama, Mystery, Thriller

Black Mirror is a TV show exploring new technology, society and possible scenarios from today to a more distant future. It has been called a modern day The Twilight Zone (1959), also a great show but many episodes can feel rather outdated. Every episode is standalone and explore a new theme. This means that viewers don’t have to see the episodes in a chronological order and can cherry pick the ones that seems most interesting, and even skip the first episode in the first season. This episode is mostly built on shocking the audience and is not representative for the rest of the show. Some favourites:

The Entire History of You (2011)

Synopsis: Some people have an implant recording everything they see and hear.

Think Google Glass but built into the body and it is not hard that this could create all kinds of problems. At the same time a lot of people were really excited about the idea of wearing glasses recording everything and honestly it is really not that far from our current reality.

San Junipero (2016)

Synopsis: Two women meet in a California-esque small town named San Junipero in 1987.

This is not a story about relationship with an AI. This is about human relationships, love, consciousness and how future technology could change our lives for the better.

Hang the DJ (2017)

Synopsis: In a maybe dystopian future two people, Amy and Frank, try out a dating/matching service that always puts an expiration date on all relationships. The reason for this is that each short relationship will give “the system” more data to eventually find the optimal match for each individual using the service.

So what if you could see the expiration date for a relationship, is it something you really want to know. And how would this knowledge change your actions? It is not that far from the classic theme of knowing the day you will die.


Read more

You have been working with electrical systems and software in vehicles your entire career. Why? 

Actually, I’m not into cars or any other vehicles specifically. What drives me is mainly to solve problems in a challenging technical environment. It might as well have been airplanes or something else entirely. It just so happened that cars and buses had the right combination of technology and challenges to attract me.

How has your career developed at Combine?

The first step was into the telematics area. From there I moved on to Infotainment, and currently I am helping a customer with IT and processes relating to software development and software management.

When Combine sends you to help a customer, what can the customer expect?

I have realized that I have a knack for understanding how the processes, support systems and organization in a company are meant to facilitate technical development. Once I understand this, I bring out my broomstick and begin clean-up operations so that things work the way they are supposed to. Consequently, my CV is full of activities such as ”responsible for project documentation”, ”task force leader”, ”team leader” and similar. In my current assignment I also have the opportunity to develop improvements and IT that raise the quality of the customers processes.

If you could choose a completely different assignment, what would it be?

I believe that we have the technology, or the ability to develop it, needed to help solve some of the big issues facing us globally, issues like our impact on the climate. Creating the right incentives and mechanisms, as well as developing the solutions themselves, would be really stimulating and interesting.

Does your technical interest spill over outside your job?

It certainly does. I have been brewing beer for many years and I finally started a microbrewery called Sad Robot Brewing. Being who I am I tried to learn as much as possible about the engineering side of all the steps in brewing processes, such as chemistry and thermodynamics. Just like at an assignment I like things to be clean and controlled, so the only solution was to team up with some friends and do it ourselves. It was just like a second job. I spend less time on brewing nowadays, but one of the things I have done lately is to mentor a thesis project at Combine aimed at controlling and monitoring the brewing process (Editor’s note: you can read about this in the Combine blog here).
I also do some acting in theatre and movies, so not everything is technical. And yes, you can probably figure out what kind of books and movies I like from the name of the brewery.

Read more

Image processing in Sympathy for Data

This is the second blog post in a series of posts on image processing using Sympathy for Data, an Open-Source tool for graphically programming data-flows. See the previous entry for an example of how you can read the time from an analog clock using only basic image processing building blocks. No programming required.


The task

When it comes to object recognition today most people think about deep learning and throw vast datasets onto deep machine learning algorithms — hoping that something will stick. One thing that all such algorithms have in common is that they all have a large number of parameters, requiring an even larger number of examples to be trained. There are two major costs associated with this approach: firstly the computational cost in training the datasets, usually using a single or a cluster of high-end graphic cards; and secondly the difficulty in acquiring large enough datasets to do the training with.  Sure, there exists techniques for artificially extending existing datasets into larger ones in order to help against over fitting, but even these cannot handle the case of datasets with only a hand full of examples. With all the hype of deep learning it is easy to forget that earlier approaches to object recognition, while much more limited in what they could solve, did not suffer from these difficulties and can sometimes still be favourable to be used.

If we look back at when image recognition was first considered as a problem to be solved with computers we see that the problem was at-first greatly underestimated. Back in the summer of 1966 a very optimistic project was started at MIT using only the student summer workers that year and with the aim of solving the computer vision problem. As you can read in the PDF the final goal was, in hindsight, a quite ambitious one indeed:

“The final goal is OBJECT IDENTIFICATION which will actually name objects by matching them with a vocabulary of known objects”.

Needless to say, this task proved more complex that what was first imagined, and have since led the the creation of a whole field of research. It is not until recently, more than 50 years after that summer project that we can say that general purpose object recognition is a more or less solved or solvable problem.

In my previous image processing post we looked at a simple image processing task in reading the time from an analog clock, and showed how this could be solved using the image processing tools available in Sympathy for Data, all without having to write a single line of code.  A major factor in this solution was by limiting ourselves only to images acquired in a very specific way. This solution generalizes more to industrial image processing such as eg. reading a pressure valve rather than doing general purpose like reading like a random clock you find on the side of a building.

In this and the upcoming image processing post I will show how we can use the image processing tools and the machine learning tools of Sympathy to similarly solve an object recognition task under well defined circumstances. These circumstances generalizes again more to an industrial setting, such as analysing objects on a conveyor belt, where we can have a clearly defined environment and camera setup.

For this purpose we will have a camera mounted straight above the incoming objects. The objects are photographed against a neutral background (white) clearly distinguishable from the objects themselves (metallic grey). Furthermore we ensure that the lighting is smooth and even over the whole area and that no sharp shadows are cast by the objects themselves or anything else. In the example dataset used here we use pictures of a mix of fasteners, with the target of identifying the screws. Furthermore we ensure that objects are overlapping since it would require more advanced techniques to separate overlapping objects,  a problem almost as hard as object recognition itself. If we would like to do this in an industrial setting we could use a mechanical solution to ensure this before the objects enter the belt, eg. using a suitable hopper.

Segmenting the image

We will start by solving the problem of segmenting and labelling an input image, with the task of deciding which areas of the image correspond to different objects.  The intention here is to pick out individual objects and to classify each found object whether it matches the target object.

Thus our workflow will contain the following steps:

  1. Separate the image into pixels that belong to objects or to the background
  2. Cleanup this image to remove noise and to completely close all objects
  3. Create labels for each pixel
  4. Extract a list of binary image masks, one per found label.

A typical step in many image segmentation tasks is to use a simple thresholding algorithm. We can use simple thresholding and the fact that the metallic grey objects all are darker than the background paper in order to create a binary representation of the pixels that belong to objects. We start by attempting to use a simple basic threshold at the value 0.5.

Note that we added a filtering step that inverts the image by scaling it by a factor of -1 and adding an offset 1 to it before we do the thresholding. Thus we can ensure that a completely dark pixel (value 0) becomes 1.0 before thresholding and is classified as a “true” boolean after the thresholding.

We can also note that the result of the basic thresholding is quite poor, We incorrectly classify the bottom half of the image as belonging to an object. If we raise the threshold until no background is classified as an object, then we instead start losing pixels from the objects that are classified as background. You can see this effect in the images below, where we have a higher threshold on the right side than on the left side.

Furthermore, just using a simple scalar value as a hard-coded threshold will not work very well if there is even the slightest change in global illumination from picture to picture.

We can use one of the automatic thresholding algorithms that automatically finds a scalar suitable for thresholding. The simplest automatic thresholding algorithm is the mean or median which sets the threshold such that half the image will be True and half the image False. This is however seldom good, and most definitively not good for our application since we are almost guaranteed that background (which is more than 50% of the image) is classified as part of the objects.

Other alternatives to automatic thresholding include a number of algorithms that consider the overall distribution of pixel values and tries to find a suitable threshold. For example the Otsu algorithm assumes that the pixel values follows a bi-modal distribution and find a global threshold that minimises the variance within each found class.

The results of Otsu is surprisingly good for most images, as you can see in the image above. However we note that this algorithm still misses some parts of the objects (see the upper edge of the circular washers in the image above). Sometimes, it is impossible to get a good enough result by just setting a single global threshold value.

Other alternatives exists that perform an adaptive threshold that considers a window around each pixel and calculates a threshold value for that pixel based on this window. With this technique we for instance can easily compensate for any unevenness in the overall lighting.

One example of this is an adaptive gaussian thresholding method. Here we first perform a low pass filtering with a gaussian kernel of size 21 and sigma 11. We take the lowpass filtered value and apply an offset (-0.01) before testing if it is higher or lower than the pixel that is being thresholded. We picked the value for the kernel size based on the overall size of the objects (the circular ones are approximately 20 pixels wide). The offset compensates for small irregularities in the background itself.

The noise on the background can be removed in a later stage using morphological opening. Before we progress to this however we consider one more approach which is to instead extract all the edges in the image and to perform morphological operations to close the objects based on the edge data.  We do this by applying a Canny edge detector to the raw input image (no pre-scaling step needed anymore). As we can see below this method generates no false positives and does capture all sides of the objects.

The interior of the objects can filled in using morphological closing after the Canny edge detector. What this does is to perform to perform a dilation operation followed by an erosion operation where the dilation makes all objects “thicker” by a given radius and the erosion makes them correspondingly “thinner”. Each of these operations are done by checking a neighbourhood around each pixel and taking the MAX or MIN value in the neighbourhood, respectively.

Consider the image on the left side below. In this image if we perform dilation then we get a white pixel in the areas marked red and green and only the area marked in blue would get a black pixel. If we instead perform erosion then we get black pixels in the red and blue areas and only the green area stays white. In the right side of the example below we can see the result of performing the erosion operation followed by a dilation operation. It has first made the white objects significantly thinner, followed by thicker.

For many objects making them thicker followed by thinner would not change the overall shape of the object. However, if two edges both become thick enough to touch each other then there is no black areas in the middle that can make them thinner again. Thus the end-result is that the objects have been closed as can be seen in the images below:

One problem that we can spot with the morphologically closed image is that some objects are now touching each other due to the thickening radius being larger that the distance between the objects which have created small bridges between some of the objects. To compensate for this we can perform a morphological opening that removes the small bridges between the objects. This step also removes all the small dots of false positives given by the thresholding algorithm if that one is used instead of the edge detection.

For the final step before we can start working with the objects it to use labeling to create a unique ID for each object. The labeling algorithm takes a binary image as input and creates an image with integers for each pixel. The integer values of a pixel correspond to a unique value for each object. If there were even a single pixel linking two objects to each other then both objects would be assigned the same integer value. We can visualise the result of this step by clicking on the object, this gives a pseudo-colour for each object based on a default colour map.

Note that since objects that are close to each other have similar ID’s then they are mapped to almost the same color. The ID values assigned differs even when not evident in the image below:

One final node that is useful is to create a list of all the found objects. The node Image to List can be used to convert the labeled image into a list of images. Use the configure menu to select “from labels” to do this conversion.

As we can see in the preview window below we have a list that contains many images. Each entry in the list is an image mask that is true only for one single object (as defined by the unique ID’s given by the labeling operation). We will use these images as the inputs to our classification algorithm to detect the individual objects.


In this post we have looked at the segmentation problem and shown how simple thresholding or edge detection algorithms can be used together with morphological operations and labeling to create a list of objects in an input image. This list of consists of a mask singling out each individual object in the image, one at a time. In part 2 we will continue to perform the classification of each found object.

Read more
Contact us