What I like most about Combine is that they take care of their employees and listen to them to create a great workplace. They are also very clear about which competences they possess, so that both new employees and clients know what to expect. My job will focus on creating long-lasting relationships with all parties. The employees shall know that their assignments will be chosen individually, based on their background and experience. Clients shall know what Combine stands for and what we contribute, which is more of a partnership than a customer/supplier relationship.
But then again, we are suffering from Covid-19, which is affecting all industries and companies, not least in the consulting sector. How am I supposed to develop a company under these circumstances? One should be humble when facing this pandemic, but my strategy will be to make the new the normal. Combine as a company is used to digital meetings and projects performed from a distance, for example doing work for American and Asian companies. We have the potential to manage projects like these even on a local scale, which creates even better conditions for satisfied clients.
I will put Combine on the map, at least on a regional scale. I look forward to getting to know all the contacts I will make in my new position. Before the end of 2021 we shall double the number of people tied to our office in Linköping, and during the upcoming years we look forward to doubling the workforce again. I am excited to carry through our vision for Combine together with Rikard Hagman, and I believe that there are people who feel just like me. Let us know so we can do this together instead of waiting for Combine to headhunt you.
The upcoming year will be one of my most thrilling years. I believe that many people will move between companies, and we will work proactively to connect with the people we want to grow with. I want to represent good leadership and convey a great feeling, a transparent way of working and opportunities to develop.
Large industrial plants consist of hundreds if not thousands of sensors, constantly collecting data from the numerous processes that make up the plant. These sensors then transmit this data not only to the control room, but also to controllers placed throughout the plant. In turn, these controllers use this information to calculate the appropriate control signals and send them to the numerous actuators of the plant. These actuators, consisting of pumps, heaters, valves, etc., then apply the control signals to the industrial processes. Hence, a great deal of communication takes place while an industrial plant is running, as signals are constantly sent from sensors to controllers and from controllers to actuators.
These communication requirements pose challenges when designing the plant. Traditionally the communication has been done using wires to connect the sensors, controllers, and actuators where appropriate. However, this is an expensive and space-intensive solution. Furthermore, wired communications are inflexible: moving a single sensor to a different location may necessitate new wiring. Therefore, there are considerable benefits to be gained from using wireless communications in industrial plants.
However, wireless communications have limitations of their own. Chief among them is the question of reliability. Wireless communications are generally considered less reliable than wired communication. This means that with wireless communication, the channels between the controller and actuator and between the sensor and controller may be subject to packet losses and packet delays.
With wireless communication, there may be an unreliable channel between the sensors and the controllers and between the controllers and the actuators.
Control in the case of unreliable channels poses a problem in itself. The unreliability of the channel forces us to consider how to adapt our control scheme to compensate for the fact that signals sent by the controller may arrive at the actuator delayed or may not arrive at all. Furthermore, the delays from unreliable communication channels tend to be stochastic, so conventional control strategies for handling constant delays cannot be used.
Is LQG control the solution?
LQG control is an attractive proposition as it is based on the expected outcome, and hence can be expanded to consider the stochastic nature of the communication channel. Furthermore, if one assumes that acknowledgments ensure that the controller knows which signals have reached the actuators, the separation principle can be proved to hold. This is a key principle of LQG control, which states that the control problem and the state estimation problem can be solved separately. What this means in effect is that we do not need to concern ourselves with the unreliability of the communication channel between the sensors and controllers when deriving our control input. Conversely, we do not need to concern ourselves with the unreliability of the channel between controller and actuator when deriving our state estimates.
For state estimation, only relatively minor modifications to the conventional Kalman filter are required to compensate for the possibility of packet delays and packet losses between the sensor and controller. However, when implementing LQG controllers, the required modifications are somewhat more extensive. In this case, the optimal control signal depends not only on the states (as in conventional LQG control) but also on past inputs that were sent by the controller but have not yet reached the actuator.
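To give a feeling for how small the change to the Kalman filter can be, here is a minimal sketch for a scalar system that only handles packet losses (not delays). It assumes the common approach of always running the prediction step and skipping the measurement update when no packet arrives. The model parameters `a`, `c`, `q`, `r` and the function name are illustrative assumptions, not taken from the work described here.

```python
def kalman_step(x_est, p_est, y, received, a=1.0, c=1.0, q=0.01, r=0.1):
    """One step of a scalar Kalman filter that tolerates lost measurements.

    Assumed model (illustrative): x[k+1] = a*x[k] + w,  y[k] = c*x[k] + v,
    with process noise variance q and measurement noise variance r.
    """
    # Time update (prediction): always performed.
    x_pred = a * x_est
    p_pred = a * p_est * a + q

    if not received:
        # Packet lost: keep the prediction, so the uncertainty grows.
        return x_pred, p_pred

    # Measurement update: only performed when the packet arrived.
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * (y - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    # Same starting point, once with and once without a received packet.
    print(kalman_step(0.0, 1.0, y=5.0, received=False))
    print(kalman_step(0.0, 1.0, y=5.0, received=True))
```

Handling delayed packets requires more bookkeeping (for example, buffering measurements with timestamps), which this sketch deliberately leaves out.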
If we evaluate our optimal LQG control solution (modified to take delays and packet losses into account) and compare it to the standard LQG solution on a model of an inverted pendulum, we get the result shown in the figure below. Compensating for the delays results in considerably less deviation from the reference while requiring far less control input. This shows the importance of taking the unreliability of the channel into account.
Comparing the optimal LQG solution (adapted to take into account the unreliability of the channel) with the standard LQG control.
As the developers have published a version in PyPI, installing the Accelerator now is as simple as running pip! As we are mostly using Python3 and we are testing and developing most of our code locally before sending it to our processing server, we can simply install it like this:
pip3 install --user accelerator
Then, the accelerator tool becomes available with the `ax` command!
Once the accelerator is installed, it’s time to configure it! The new version is fully documented in the manual, but as a starting point it should be enough to run `ax --help` and do the same for the small subset of commands provided. Then, to actually get started, we run `ax init` in the working directory where we wish to create our first project. Usually, we would configure the accelerator locally for every project, and therefore run all the commands from the corresponding directory. As that is not necessarily the case for every developer, one could also set up a global configuration file (e.g. /etc/accelerator.conf) and simply add `alias ax='ax --config /etc/accelerator.conf'` to the .bashrc file (assuming bash as the default shell).
Running `ax init` will create a set of workdirs and files in the directory where it is run. The most important of these is `accelerator.conf`, which contains the basic configuration. There we can configure things like the number of threads that we want to run or where we want the results of our scripts to land. Luckily, the standard one created by `ax init` contains sane defaults and comments that help in understanding what the different options do.
Finally, to start the server process, it is enough to run `ax server` from the project directory. Now the server is listening, and to send it some work, run `ax run "PIPELINE"` from another terminal. Running the tests is a good first step: `ax run tests`.
The new changes to the accelerator do not only concern the installation, but also the user interface. Now, it is possible to get detailed information about the tasks that have been run, their options, datasets, or results from the browser. For that, it is as simple as running `ax board` in the corresponding working directory or setting the `board listen` parameter in accelerator.conf. The board will start a server process that by default listens on localhost:8520, which is of course customizable. There, we can check the status of the accelerator server process and search the results that were returned by our jobs, the options that were passed, or which dataset was used. In addition, it’s even possible to see the source code of the script that was imported to run a specific task. All this together allows for a much better user experience and greatly helps debugging, especially when getting started with the tool.
Some additional improvements include a grep-like command to search for patterns in datasets, the ability to configure the Python interpreter for every method, and the merging of urd’s functionality (the part that keeps track of tasks) into a single tool, so that urd no longer needs to be set up independently.
To understand this, we need to define a broader term, ‘Vehicle Energy Efficiency’. Fuel Efficiency is generally described as a ratio of distance traveled per unit of fuel consumed (source: Wikipedia). Since fuel is a form of energy anyway, this term applies to fuel efficiency as well, albeit in a broader sense. When we consider the vehicle’s energy efficiency as a whole, other factors such as carbon dioxide emissions, output energy over a drive cycle, amongst others, come into play. Let’s face it, owning a car is a luxury and a privilege, irrespective of the number of arguments we stack up against such a statement. “I need the car to pick up and drop off the kids every day”, “Without the car, my wife and I would find it increasingly difficult to manage both our busy schedules”, some would say. Increasingly demanding, yes! But not impossible. Because here’s the thing with cars: THEY LEAVE SOME SERIOUS CARBON FOOTPRINT BEHIND!
According to the European Union official website, passenger cars account for around 12% of the total EU emission of carbon dioxide. There is thus a dire need for more stringent regulations for the emissions of passenger cars. This primarily motivates the auto industry to focus more on “energy efficiency” to meet the target requirements of each new year. Working on an assignment at Volvo Cars within the Vehicle Energy Efficiency team helped me gain a deeper understanding of how reducing emissions and setting stricter targets every year is crucial in the long run to contribute positively toward a cleaner environment.
A typical workday would go like this: ensure that the correct vehicle models had been received for simulation and verify that the road load coefficients matched the models received. Once all the vehicle models with all the necessary parameters were obtained, then came the main and most crucial step: running the simulations to obtain results for Fuel Consumption, CO2 emissions, and other relevant parameters, which give us an idea of the amount of emissions each vehicle model belonging to a fleet or market was releasing. Sometimes, a part of the job description also included something called “Correlation Analysis”. The main goal of such an analysis was to “correlate” the results from the virtual tests we’d performed (via simulations) to the results of the physical tests carried out in the testing lab. This was primarily done using Sympathy For Data, Combine’s internal tool for data processing and data visualization.
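As a rough illustration of what such a comparison between simulated and measured values could look like, here is a small standalone Python sketch. It is not the Sympathy for Data workflow that was actually used; the two metrics (Pearson correlation and mean relative deviation) are assumptions about what a simple correlation check might compute, and the numbers in the example are made-up placeholders, not real test data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_relative_deviation(simulated, measured):
    """Average of |simulated - measured| / measured over all test cases."""
    pairs = list(zip(simulated, measured))
    return sum(abs(s - m) / m for s, m in pairs) / len(pairs)


if __name__ == "__main__":
    # Placeholder values (g CO2/km), invented purely for illustration.
    simulated = [120.0, 135.0, 150.0, 142.0]
    measured = [123.0, 133.0, 155.0, 140.0]
    print("correlation:", pearson(simulated, measured))
    print("mean relative deviation:", mean_relative_deviation(simulated, measured))
```

A high correlation together with a small relative deviation would suggest that the virtual tests track the physical ones well.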
Anyway, the results of the simulations, as well as the correlation analysis, were then presented to the Attributes Realization Engineers, who would compare the models to see which did and which did not meet the required emission certification standards. Keeping the fierce spirit of competition in mind, it was thus a no-brainer that the emission numbers of other auto manufacturers were compared as well. It was also discussed which tuning of parameters could lead to better numbers.
The goal of reducing emissions is of such high priority to these automakers that emission numbers are even compared for vehicles of the same model that differ only slightly in components. In my case, for example, it was generally a comparison between a vehicle with a 7-speed gearbox and a similar vehicle with an 8-speed one. While this change might seem insignificant and small, a different gearbox introduces a change in the vehicle mass, the gear ratios become entirely different, and of course, the gear shifting RPM changes as well, all thus directly or indirectly contributing to a minute change in fuel economy per drive cycle or road trip.
Thus, the underlying goal remained the same: if there was any way, any way at all, that the emissions of the vehicles could possibly be reduced, go for it! Of course, I cannot mention the actual numbers here, as such information is generally considered “hush-hush”. But I’m assuming you get it: cars are something people will keep buying and buying, either for their convenience or even indulgence. And because vehicles, in general, have become so embedded into the web that we like to call life, to think that we will, on a fine Saturday morning, reduce or possibly even stop the usage of passenger cars altogether is a farce. Even the new Electric Vehicles (EVs) currently available in the market have almost twice the carbon footprint during the manufacturing phase compared to conventional ICE vehicles, even though their subsequent emissions during usage are practically negligible. And for the entire world to embrace EVs would still take a significant amount of time, especially in the developing and under-developed countries. This, in turn, means the pressure is back on the manufacturers of conventional ICE (and hybrid) vehicles to amp up their technologies to produce cars that are less damaging to the environment.
I guess what I’m trying to say is this: In a corporate world, where the general impression of people working in the auto industry, put forth by pressure groups, is that of a person who’s “sold their soul”, it is heartening to know that engineers like us, directly or indirectly, are making whatever minuscule efforts we can, in our capacity, to bring about a positive impact as far as the environment and the struggle of achieving climate-change goals is concerned.
Use case 2: Automatic species identification in the Kosterhavet National Park.
Figure 1: Kosterhavet National Park Map
The Kosterhavet National Park is one of Sweden’s most important and unique marine environments and is not only the first but also the most species-rich marine park in all of Sweden. The area obtained official protected status in 2009 and monitoring changes in the marine environment has become a top national priority. Understanding this complex ecosystem and its development in light of a warming planet and increased human activity is crucial to ensuring its survival for generations to come.
Initial problem formulation:
Researchers studying the changes in the marine ecosystem over the last 30 years at Kosterhavet National Park face many challenges, including:
· Storing and accessing observation data in a centralised and standardised way
· Identifying species when most of the captured footage contains little to no fauna/flora
· Analysing past footage with poor visibility conditions and/or low camera resolution
In the past, this footage has been examined and annotated manually by experts in marine biology. Given the advances in data science, this is simply not an efficient use of their time, and we believe that machine learning techniques could help to automate the parts of the process that significantly slow things down.
Initial Research Question
Could we set up the data infrastructure for a highly-performant object detection model to help us detect an important habitat-building marine species (Lophelia Pertusa) in real-time footage?
Figure 2: An example of Lophelia Pertusa
Figure 3: ROV used for monitoring seafloor
For the last 30 years, the scientists at the Tjärnö Marine Laboratory have used Remotely Operated Vehicles (ROVs) and underwater cameras to monitor the area around the Kosterhavet National Park. The movies and images available to us today not only present an otherwise unseen part of the national park, but can also take us 30 years back in time. In our research, we want to study how climate change and human activities influence the fauna in this area and which positive effects the protected status as a national park has had on the seafloor habitats.
Data Annotation Workflow
Citizen science platforms allow researchers to educate others on topical issues such as marine biodiversity and environmental management, whilst also benefiting from crowd-sourced data annotation. In this project, we used the Zooniverse platform, which allowed us to upload the captured footage directly and present these as part of online workflows for citizen scientists.
The data annotation process consists of two separate workflows on Zooniverse. The first provides a 10-second video clip and then guides annotators to select the species they see in the clip with an extensive tutorial (as shown in Figure 4). Annotators may also provide the time of the first species appearance during the clip.
Figure 4: Screenshot from workflow 1 of Koster Seafloor Observatory on Zooniverse
Using the information from the first workflow, we can reduce the amount of footage significantly (by excluding all footage with no identified species) and extract relevant frames by species based on an aggregation of the citizen scientists’ input.
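The aggregation step can be pictured as a simple majority vote per clip. The sketch below is a hedged illustration only: the function names, the `"nothing"` label for empty clips, and the thresholds `min_votes` and `min_agreement` are assumptions made for this example and are not taken from the project's actual aggregation code.

```python
from collections import Counter, defaultdict


def aggregate_labels(classifications, min_votes=3, min_agreement=0.6):
    """Return {clip_id: label} for clips where annotators sufficiently agree.

    `classifications` is an iterable of (clip_id, label) pairs, one per
    annotation made by a citizen scientist.
    """
    votes = defaultdict(list)
    for clip_id, label in classifications:
        votes[clip_id].append(label)

    consensus = {}
    for clip_id, labels in votes.items():
        if len(labels) < min_votes:
            continue  # too few annotations to trust the result
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            consensus[clip_id] = label
    return consensus


def clips_to_keep(consensus, empty_label="nothing"):
    """Clips with an agreed-upon species, i.e. footage worth extracting frames from."""
    return sorted(clip for clip, label in consensus.items() if label != empty_label)


if __name__ == "__main__":
    annotations = [
        ("clip_a", "coral"), ("clip_a", "coral"), ("clip_a", "coral"),
        ("clip_a", "nothing"),
        ("clip_b", "nothing"), ("clip_b", "nothing"), ("clip_b", "nothing"),
        ("clip_c", "coral"),  # only one vote, so it is left undecided
    ]
    print(clips_to_keep(aggregate_labels(annotations)))
```

Clips without consensus are neither kept nor discarded here; in practice they could be sent back for more annotations.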
Figure 5: Screenshot from workflow 2 of Koster Seafloor Observatory on Zooniverse
The second workflow then prompts annotators to draw bounding boxes around the species they identified earlier (see Figure 5). These annotations are then aggregated and used to train the object detection model.
Setting up the infrastructure for storage and retrieval of footage and metadata is a crucial part of ensuring the project’s longevity and scalability. For this purpose, we used a high-performance Linux-based project-specific server hosted by Chalmers University of Technology, Gothenburg. We also created an SQLite database to keep track of the information related to the movies and the classifications provided by both citizen scientists and machine learning algorithms. The database followed the Darwin Core (DwC) standards to maximise the sharing, use and reuse of open-access biodiversity data.
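To make the idea of such a database a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names are hypothetical and much simpler than a real schema would be; `event_date` and `scientific_name` merely echo the Darwin Core terms `eventDate` and `scientificName`, and the actual project database may be organised quite differently.

```python
import sqlite3


def create_schema(conn):
    """Create two illustrative tables: movies and their classifications."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS movies (
            id INTEGER PRIMARY KEY,
            filename TEXT NOT NULL,
            event_date TEXT              -- cf. Darwin Core eventDate
        );
        CREATE TABLE IF NOT EXISTS classifications (
            id INTEGER PRIMARY KEY,
            movie_id INTEGER NOT NULL REFERENCES movies(id),
            scientific_name TEXT NOT NULL,   -- cf. Darwin Core scientificName
            source TEXT NOT NULL CHECK (source IN ('citizen', 'model')),
            start_seconds REAL
        );
        """
    )


def count_by_source(conn, scientific_name):
    """Number of classifications of a species, split by who produced them."""
    rows = conn.execute(
        "SELECT source, COUNT(*) FROM classifications "
        "WHERE scientific_name = ? GROUP BY source ORDER BY source",
        (scientific_name,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    create_schema(conn)
    movie_id = conn.execute(
        "INSERT INTO movies (filename, event_date) VALUES (?, ?)",
        ("example_clip.mov", None),  # placeholder file name
    ).lastrowid
    conn.executemany(
        "INSERT INTO classifications (movie_id, scientific_name, source, start_seconds) "
        "VALUES (?, ?, ?, ?)",
        [
            (movie_id, "Lophelia pertusa", "citizen", 2.0),
            (movie_id, "Lophelia pertusa", "model", 1.5),
        ],
    )
    print(count_by_source(conn, "Lophelia pertusa"))
```

Keeping citizen and model classifications in the same table, distinguished by a `source` column, is one possible design that makes it easy to compare the two.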
Automatic object detection model
The machine learning model chosen for this task is generally referred to as a single-shot detector, which means that the model looks at the entire image and predicts multiple objects inside the image at once, as opposed to region-based models, which first identify regions of interest and then detect features in these regions as a second step. This leads to faster detections in most cases and allows these models to predict in near real-time. For our object detection model, we used the YOLOv3 architecture by Redmon and Farhadi (2018). As the name suggests, this is the third iteration of the YOLO model. It offers improvements while essentially using the same kind of building blocks as its predecessors, with Darknet-53 as the feature detector and useful enhancements such as detection at three different scales to pick up smaller objects. Our model was based on an open-source implementation to ensure that our results could be easily replicated and also be kept up to date with advances in future YOLO releases.
Figure 6: YOLOv3 model architecture
See the model in action (video)
Visualising our overall workflow
Figure 7: Flowchart showing the entire Koster Seafloor Observatory as an end-to-end process
Tools in development and education
In line with our goals at ODF to make our project outputs accessible and maximally flexible, we have created a web app that allows non-technical audiences to upload their own footage of Lophelia Pertusa corals and obtain predictions directly from the model. To increase understanding of the model, users can also tweak important parameters that affect model output, such as the confidence threshold and overlap thresholds of various bounding boxes.
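To illustrate what these two parameters typically do in an object detector, here is a small sketch of confidence filtering followed by greedy non-maximum suppression. This is a generic textbook version written for this post, not the code of the web app or of the YOLOv3 implementation we used; the function names and default thresholds are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_threshold=0.5, iou_threshold=0.5):
    """Keep confident detections and drop boxes that overlap a better one.

    `detections` is a list of (box, score) pairs.
    """
    # 1. Confidence threshold: discard low-scoring boxes.
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    # 2. Overlap threshold: greedy non-maximum suppression.
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    detections = [
        ((0, 0, 10, 10), 0.9),
        ((1, 1, 11, 11), 0.8),    # overlaps heavily with the first box
        ((20, 20, 30, 30), 0.7),
        ((40, 40, 50, 50), 0.2),  # below the confidence threshold
    ]
    for box, score in filter_detections(detections):
        print(box, score)
```

Raising the confidence threshold gives fewer but more certain detections, while raising the overlap threshold lets more neighbouring boxes survive.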
Figure 8: Screenshot from the web app (to be released soon)
We open-source all our code in line with ODF objectives. Here are the repositories associated with the Koster seafloor observatory:
· Our object detection model was able to analyse 150 hours’ worth of footage in just 30 hours.
· Our open-source database is now a valuable resource containing well-documented observation data for future research on marine environment management in this region and beyond.
· There is room to expand to other important habitat-building species as the model has been successfully validated on Lophelia Pertusa, and to continue adding footage as more protected areas get added.
The challenges ahead
No machine learning model is ever perfect, and as such, we still face many challenges that will be addressed in future iterations, including:
· Dealing with low-confidence or low-consensus predictions
· Improving underwater image quality for older footage
· Expanding the model to include more key species
· Improving model confidences by adding feedback from the model into future training data
The ROV we are using has a variety of sensors, such as an IMU, magnetometer, barometer, ultrasonic sensors, and a camera. An HKPilot handles the sensor data, while all the computations have, since last year, been performed on a Raspberry Pi that is also mounted on-board. This change increases the ROV’s autonomy, and the Ethernet connection to the workstation is now only used to transmit commands to the ROV and to watch the live feed from the on-board camera.
Several different autonomous modes have been implemented. It is, for instance, possible to follow an object once the camera has found it (such as a yellow ball). The ROV will then maintain the same distance to the object as it moves. Furthermore, it is possible to position the ROV in the pool and have it maintain its position while being exposed to streams or other disturbances, e.g. from people. Of course, an operator can also control the ROV by hand via the Ethernet connection. With the wide range of sensors mounted in the ROV today, the door is open for future development of even more advanced autonomous modes.
The codebase, as of today, consists of a mixture of manually coded functions as well as generated code from Matlab/Simulink models. After the ongoing project, all code will hopefully be generated, which will give the development environment a lower threshold for new users.
Now, a deep dive (pun intended) into how the controller was updated during the last year. Previously, the control system of the ROV consisted of six different PID controllers, one for each degree of freedom (DOF). This resulted in poor performance when controlling multiple DOFs simultaneously. The generated reference only excited one DOF at a time, which made the pathfinding inefficient. For this reason, a Linear Quadratic Regulator (LQR) was implemented, and the system model was linearized.
The implemented LQR is a multiple-input-multiple-output (MIMO) controller which minimizes the criterion
which should be interpreted as minimizing the difference from requested states as well as the amplitude of the input signal. The model setup for the LQR is
The optimal linear state feedback controller, assuming known states x, is given by
where L is given by
The controller was implemented in Matlab/Simulink, as can be seen below. The central part is the matrix multiplication of the gain matrix L and the control error. The other crucial element is the feed-forward block that derives the input signals for maintaining the current attitude and depth of the ROV. The difference in result between the LQR and the PID control system can be seen in the picture below, although it is hard to visualize a 3D path on paper.
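For readers who want the formulas behind the description above, the textbook continuous-time LQR formulation is given below. This is the standard form, with x interpreted as the deviation from the requested states; the weights Q and R, the matrix P, and the continuous-time setting are assumptions here, and the exact formulation used in the ROV project may differ (for example, it may be in discrete time).

```latex
% Criterion to minimize: state deviation weighted by Q, input amplitude by R
J = \int_0^{\infty} \left( x^{T} Q\, x + u^{T} R\, u \right) dt

% Linearized model
\dot{x} = A x + B u

% Optimal linear state feedback, assuming known states x
u = -L x

% Gain matrix, where P solves the algebraic Riccati equation
L = R^{-1} B^{T} P, \qquad A^{T} P + P A - P B R^{-1} B^{T} P + Q = 0
```

A larger Q penalizes deviations from the requested states more heavily, while a larger R makes the controller more careful with the thrusters.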
Model predictive control (MPC) is the big brother in control theory. One common interpretation is that as long as one has access to good computation power, MPC can control very complicated systems optimally. There are plenty of reports on the web that show simple simulations of cars driving on roads using MPC, which might lead one to think that this is a great way to operate self-driving cars.
We set out to find out just how accurate this claim is and which limitations MPC has within this field. At the heart of the MPC algorithm is the optimization problem. It is the job of the optimizer to calculate the best series of control inputs for the car in its given environment. This thesis has, therefore, evaluated three different algorithms. Two of them are Sequential quadratic programming (SQP) and the Interior-point (IP) method, which are powerful and very common optimizers for non-linear problems. The third optimizer is the Proximal Averaged Newton-type method for Optimal Control (PANOC), which is a new state-of-the-art optimizer.
The optimizers were tested in a simulation where a road was modeled together with some obstacles. The collision cone method was used as a collision-avoidance system. It works by calculating the subset of relative velocities of the car that will lead to a collision. The optimizer then chooses the best velocity that is not within this calculated subset. The collision cone only works if the obstacle is moving linearly. Thus, we developed our own non-linear version. This method calculates the predicted collision point, which is shown as a black dot in the figures below, and applies the collision cone constraint on an obstacle projected onto it.
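The basic (linear) collision cone test can be sketched in a few lines for the 2D case with a circular obstacle. This is a simplified illustration under stated assumptions, not the thesis implementation: the function name, the circular obstacle shape, and the single combined `radius` (car plus obstacle) are all choices made for this example.

```python
def in_collision_cone(rel_pos, rel_vel, radius):
    """True if the relative velocity leads to a collision with a circular obstacle.

    rel_pos: obstacle position minus car position, as (x, y)
    rel_vel: car velocity minus obstacle velocity, as (x, y)
    radius:  combined radius of car and obstacle (assumed circular)
    """
    px, py = rel_pos
    vx, vy = rel_vel
    dist_sq = px * px + py * py
    if dist_sq <= radius * radius:
        return True  # already overlapping the obstacle

    closing = px * vx + py * vy
    if closing <= 0.0:
        return False  # moving away from the obstacle, or not moving relative to it

    # The velocity is inside the cone if its angle to rel_pos is at most the
    # cone half-angle asin(radius / distance). Written without trigonometry:
    speed_sq = vx * vx + vy * vy
    return closing * closing >= speed_sq * (dist_sq - radius * radius)


if __name__ == "__main__":
    obstacle = (10.0, 0.0)  # obstacle 10 m straight ahead
    print(in_collision_cone(obstacle, (1.0, 0.0), radius=1.0))  # heading straight at it
    print(in_collision_cone(obstacle, (1.0, 0.5), radius=1.0))  # steering clearly past it
```

In an MPC setting, a condition like this would enter the optimization problem as a constraint on the candidate velocities rather than as a yes/no check.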
All the optimizers managed to drive the track but with different results. In short, SQP and IP do not perform very well in terms of computation speed, but SQP converges optimally. PANOC, on the other hand, converges very fast, around 33 ms per optimizer call, and is not as sensitive to disturbance. Unlike the others, PANOC can also be initiated from any arbitrary starting point and still manages to converge well.
What follows is a recommendation of application for the optimizers, with this thesis as the foundation for the rating.
IP is recommended for lightly constrained problems. The environment should be linear and smooth. If the solution is located close to or on difficult constraint borders, IP will have a hard time converging to a solution.
SQP is best used in medium to highly constrained environments that are not time-critical. If the algorithm can be given enough computational time, it will improve the quality of the solution and is, therefore, suitable for systems that are unstable or where the need for an optimal solution is high. For mildly constrained environments, the need for a warm start is not that high. The algorithm can be used in non-linear, non-smooth environments, given that it is warm started with a guess close to the solution.
PANOC can be used in time-critical systems due to its superior convergence rate. It will rapidly provide a solution. It can also be used in highly constrained environments and even non-smooth and non-linear ones. It should not be used in said complicated environments if the system is unstable or if the requirements on optimal output are high, since it is not reliable. The output should be checked so that it meets the specified requirements. PANOC is a suitable choice when there is no way to warm start the optimizer with a good initial guess, since it can converge fast from any initial state.
PANOC is superior in terms of computation speed because it is constructed with very cheap operations and cleverly combines the proximal operator with a quasi-Newton method. These two methods together deliver an algorithm that is both fast and, thanks to the proximal operator, can converge in non-smooth, highly non-linear environments. PANOC exhibits a couple of strange behaviors, and the cause of these was not found. It does not converge optimally from time to time, and this behavior does not improve with increased computation time. PANOC is definitely a worthy candidate and will most probably find its way out in the market due to its advantage in the field of non-linear, non-smooth applications such as embedded MPC controllers for autonomous systems.
In general, the environment that has to be modeled for a car to be able to drive in a realistic setting is too complex for current optimizers. MPC cannot be used as a controller on its own. For it to give reliable results, the problem has to be simplified a lot so that response time and convergence can be guaranteed more accurately. One possible solution could be to split the optimization problem into two parts, for example by having one optimization algorithm focus on driving in its lane and avoiding collisions, while another one determines when it’s possible to perform an overtaking maneuver. This could simplify the environment enough to allow the algorithms to behave more reliably.
Combine is proud of the open-source history of Sympathy for Data and of being able to offer the product free of charge for everyone, a tradition we plan to continue in the form of an open-source edition of Sympathy for Data. With the introduction of Sympathy for Data 2.0.0, we decided to introduce a new edition, called Sympathy for Data Enterprise, a commercial product built on top of the open-source edition which extends the functionality with enterprise-tailored features and specialized toolkits. From the start, it includes machine learning, time series and image analysis toolkits. The number of toolkits and the overall enterprise functionality will be extended with future versions.
We will offer the enterprise edition as a subscription-based license, including support, maintenance and continuous updates with new features and functionality. As usual, we are working in a customer-centric way to bring the best and most requested features for the new toolkits as well as the platform to our enterprise customers. To not forget our valued users of the open-source edition, we will also include selected features and improve functionality in this edition.
With the release of Sympathy for Data 2.0.0, we also changed to semantic versioning and a new release cycle which we will describe later. In general, releases of the enterprise edition will follow the releases of the open-source edition.
What is new in Sympathy for Data 2.0.0 and 2.1.0
Time Series Analysis Toolkit (Enterprise only)
The Time Series Analysis Toolkit provides a set of specialized nodes to work with time-series data and to perform signal processing, including dynamic time warping, time-series aggregation (PAA, SAX, APCA), time series transformation between different time and frequency domains, and more generic nodes for finding local min/max and plateaus.
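As an example of what one of these aggregation methods does, here is a minimal sketch of piecewise aggregate approximation (PAA), which replaces a time series with the mean values of a fixed number of consecutive segments. This is a plain-Python illustration of the general technique, not the implementation behind the toolkit's nodes; the function name and signature are assumptions.

```python
def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each of `n_segments` chunks.

    When the length is not divisible by `n_segments`, segment lengths
    differ by at most one sample.
    """
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")

    result = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        segment = series[start:end]
        result.append(sum(segment) / len(segment))
    return result


if __name__ == "__main__":
    print(paa([1, 2, 3, 4, 5, 6], 3))
```

SAX builds on the same idea by additionally mapping each segment mean to a symbol from a small alphabet.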
Machine Learning Toolkit (Enterprise only)
The Machine Learning Toolkit provides nodes to describe, analyze, and model data. It provides nodes for feature selection, dimensionality reduction and data pre-processing as well as supervised and unsupervised machine learning algorithms, including K-nearest neighbours, feed-forward neural networks, support vector machines, decision trees and ridge regression. On the advanced clustering methods side, the toolkit contains nodes for density-based spatial clustering (DBSCAN) and agglomerative clustering amongst others.
Image Analysis Toolkit (Enterprise only)
The Image Analysis Toolkit provides a set of specialized nodes for working with image data, including functionality such as property extraction from labelled images, stereo image analysis, generating noise as well as de-noising images, and object identification using pre-trained Haar-cascades or simple template-matching.
Furthermore, the toolkit contains utility nodes for arranging images, converting figures to images, or converting between different data types in images and nodes for post-processing data, for example for selecting a relevant subset of lines from a Hough transform or matching key points identified in two images.
Indexed ADAF files
From version 2.1.0, we store a structure index in the ADAF files. It makes repeated queries of an ADAF's structure information much more efficient. For operations that primarily query the structure, for example listing the available signals and their types, this leads to significant speedups.
Cursors for ADAF and Table viewer
In the latest version, we added cursors to the ADAF and Table viewer. They provide a quick way to measure the difference, min, max, and mean values of the selected interval. The data between two cursors can also be exported as a CSV file for further analysis elsewhere.
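For readers who want to reproduce this kind of measurement outside the viewer, the sketch below shows one way to do it in plain Python. It is only an illustration under our own assumptions: the function names are hypothetical, and we assume "difference" means the value at the second cursor minus the value at the first. The viewer itself may compute it differently.

```python
import csv
import io
from statistics import mean


def interval_stats(times, values, t_start, t_end):
    """Return the samples between two cursor positions and their statistics.

    Hypothetical helper; "difference" is assumed to be last minus first value.
    """
    selected = [(t, v) for t, v in zip(times, values) if t_start <= t <= t_end]
    if not selected:
        raise ValueError("no samples between the cursors")
    vals = [v for _, v in selected]
    stats = {
        "difference": vals[-1] - vals[0],
        "min": min(vals),
        "max": max(vals),
        "mean": mean(vals),
    }
    return selected, stats


def to_csv(rows):
    """Serialize (time, value) rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["time", "value"])
    writer.writerows(rows)
    return buf.getvalue()


times = [0.0, 0.5, 1.0, 1.5, 2.0]
values = [2.0, 4.0, 3.0, 7.0, 5.0]
rows, stats = interval_stats(times, values, 0.5, 1.5)
print(stats["min"], stats["max"], stats["difference"])  # 3.0 7.0 3.0
print(to_csv(rows))
```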
Signal search for ADAF viewer
Version 2.1.0 adds a new view for searching for signals within an ADAF. It makes it more convenient and faster to locate signals when working with ADAF files containing many different rasters.
New figure wizards
With version 1.6.2, we introduced a new way of configuring new figure nodes with the help of a wizard. In version 2.0.0, we added new wizards for creating histograms, heatmaps, image, and timeline plots. Furthermore, the wizard view is now shown by default when configuring an empty figure node.
Undo in calculator
The powerful Calculator node has finally received Undo/Redo functionality. It is accessible through buttons in the column toolbar as well as keyboard shortcuts. To make room for the extra buttons, the layout has been optimized and now shows each calculation in the list on a single row.
Sympathy for Data can now be installed without requiring elevated permissions (administrator) in all cases.
With the release of version 2.0.0, we changed to semantic versioning (major.minor.patch) with guaranteed backwards compatibility for minor and patch versions. The general plan is to release one major version and one to two minor versions per year.
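As a rough illustration of what this guarantee means in practice, the following sketch checks whether a newer version falls within the same major series as an older one. The function names are hypothetical and not part of Sympathy for Data; the rule encoded here is simply our reading of the policy above: same major version, and an equal or higher minor/patch level.

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_backwards_compatible(created_with, running):
    """True if work created with `created_with` should still run on `running`.

    Hypothetical check: same major version, and `running` is not older.
    """
    old, new = parse_version(created_with), parse_version(running)
    return new[0] == old[0] and new >= old


print(is_backwards_compatible("2.0.0", "2.1.0"))  # True
print(is_backwards_compatible("1.6.2", "2.0.0"))  # False
```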
If you want to know more about Sympathy for Data, its use cases, functionality and the different editions, or if you have comments and suggestions for the product, don’t hesitate to contact us at Combine (email@example.com) or check out our webpage https://www.sympathyfordata.com.
We at Combine saw this situation as an opportunity to build a tool that uses machine learning to break down the language barriers to official information. Using Google’s Universal Sentence Encoder (GUSE), we took advantage of a pre-trained model that has been exposed to texts in 16 different languages. It is also able to perform semantic retrieval, which looks for the closest semantic match (closest in meaning) to a query from a list of known queries. The encoder is published in both a CNN-based and a Transformer-based variant, and the resulting contextual embeddings help us deal with more complex queries. More on the technical details can be found at https://arxiv.org/pdf/1907.04307.pdf.
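The idea behind semantic retrieval can be sketched in a few lines of Python. This is not our production code: the three-dimensional vectors below are invented toy values standing in for the high-dimensional embeddings a real sentence encoder would produce, and the function names are hypothetical. The sketch only shows the matching step, which picks the known question whose embedding has the highest cosine similarity to the query embedding.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def closest_question(query_vec, known):
    """Return the known question whose embedding is most similar to the query."""
    return max(known, key=lambda q: cosine_similarity(query_vec, known[q]))


# Toy embeddings (assumed values, not output from a real encoder).
known_questions = {
    "What are the symptoms?": [0.9, 0.1, 0.0],
    "Is there a vaccine?": [0.1, 0.9, 0.1],
    "Can I go to work?": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of a query written in another language.
query = [0.1, 0.3, 0.8]
print(closest_question(query, known_questions))  # Can I go to work?
```

Because a multilingual encoder maps sentences with similar meaning to nearby vectors regardless of language, the same matching step works whether the query is written in Swedish, Arabic or English.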
Using GUSE, we were able to train a model using the questions and answers found on the website of the Swedish Public Health Agency (Folkhälsomyndigheten). We then created a simple UI that allows anyone to write a query in one of the supported languages and also choose the language in which they would like to read the recommendations. See below for an example of the question “Can I go to work tomorrow?” in Arabic.
Since this model is trained with only around 100 questions, we cannot expect perfect results, but it still performs reasonably well, especially on shorter queries that use targeted keywords such as “vaccine” and “symptoms”. As we continue to add more data, the performance should become more and more reliable.
The live version of our web application can be found at http://c19swe.info. We hope that this will provide a useful service to those looking for information on official recommendations in other languages that reflect Sweden’s linguistic diversity so that we can all work together to flatten the curve.
Another issue for the scientific world is how to gather all the information from different sources and regions. To date, around 29 000 scientific papers focused on Covid-19 have been published (https://www.theregister.co.uk/2020/03/17/ai_covid_19/). Going through, categorizing, analysing and combining the information from all these papers is a huge task if done manually. This is where AI could come into play. Using Natural Language Processing (NLP) techniques, an AI model could go through the papers and gather the important information in a fraction of the time it would take a human to do the same. Without a doubt, more research on the subject will emerge and the dataset will only grow from here, so utilizing the power of our computers and AI methods will be essential to keep growing our knowledge globally going forward. In fact, this work has already been started in the U.S., and an open dataset is now available (https://pages.semanticscholar.org/coronavirus-research). This dataset is meant to be machine readable, to enable machine learning algorithms to utilize the information within.
Lastly, we would like to mention that the largest social networks have announced that they will take up the fight against misinformation on their platforms. As history has already shown, advertisements and misinformation online can really affect the population’s behaviour, so this announcement is welcome.