The life and times of battery cells – a machine learning perspective

"There is no question that the electric vehicle has a great future". Surprisingly, it is not Elon Musk who is credited with these words, but W.C. Anderson, president of an electric car company, a little over a hundred years ago. Unfortunately, they were uttered a bit too early, in 1912 – at the same time as Ford was breaking sales records with its Model T, mobilizing the masses at an unbeatable price of 650 USD (roughly 17000 USD today). The future of electric vehicles looked bleak as the world embraced fossil fuels. And now ours does as well, for very similar reasons. The incredible progress in mobility needs to be put on a sustainable foundation for us not to go the way of Mr. Anderson's products. Of course, it would help not to repeat the mistakes of the past – after all, it was the energy density and abundance that made oil the dominant energy source. Which brings us to batteries.

Kickstarted by the need for portable electronics such as phones and laptops, and fueled by the increasing demand for storage capacity, battery technology has seen unprecedented improvements over the past decade. Nevertheless, even the most advanced battery types share a common trait: they degrade over time, which decreases the amount of energy (or capacity) they can store. After they drop to about 80 % of their nominal capacity, typically after several thousand cycles, they are considered to have reached the end of their so-called remaining useful life (RUL) and are usually discarded. But the end of RUL does not have to mean retirement. Batteries are prime candidates for second-life applications in systems with lower requirements, such as stationary applications or auxiliary units. But how can we determine how long a battery has left until absolute failure? After all, any investment in cycling applications requires a safe assessment of the remaining capacity and the projected degradation trajectory. Enter another promising field experiencing rapid growth: machine learning.

One can argue that the most crucial part of machine learning is acquiring well-annotated data. Even the most complex model will not be able to make sense of a dataset that is too small, is missing key variables or is of poor quality. Without good-quality data, the choice of model or parameter tuning is of little importance. The team behind "Data-driven prediction of battery cycle life before capacity degradation" (Severson et al., 2019) provides an extensive dataset monitoring battery degradation. In their paper, the authors claim that it is the largest publicly available dataset so far, containing close to 97,000 charge/discharge cycles of 124 commercial LiFePO4 battery cells.

During the charge/discharge cycles, the team continuously monitored the voltage, current, temperature and internal resistance of the batteries. External factors influencing battery degradation were limited during data generation by performing the measurements in a temperature-controlled environmental chamber set to 30 °C. The batteries were subjected to varied fast-charging conditions to produce different degradation rates, while the discharge conditions were kept identical.

In the feature engineering process, the authors found a single feature with a linear correlation coefficient of -0.92 with the log of RUL. This feature is the log of the variance of the difference between the discharge capacity curves, as a function of voltage, of the 100th and the 10th cycle. Thus, the engineered feature can be used by itself to predict RUL well after the 100th charge/discharge cycle.
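To make the definition concrete, here is a minimal sketch of how such a feature could be computed. It assumes that both discharge capacity curves have already been interpolated onto a common voltage grid and that the logarithm is base 10; the function name `log_variance_delta_q` and the toy curves are hypothetical and are not taken from the dataset.

```python
import numpy as np

def log_variance_delta_q(q_late, q_early):
    """log10 of the variance of dQ(V) = Q_late(V) - Q_early(V).

    Both inputs are discharge capacity curves sampled on the same voltage grid
    (assumption: the interpolation onto that grid has been done beforehand).
    """
    delta_q = np.asarray(q_late, dtype=float) - np.asarray(q_early, dtype=float)
    return np.log10(np.var(delta_q))

# Toy stand-in curves, NOT real measurements
voltage = np.linspace(3.5, 2.0, 1000)                 # hypothetical voltage grid in V
depth = (3.5 - voltage) / 1.5                         # 0 at start, 1 at end of discharge
q_cycle_10 = 1.10 * depth                             # toy capacity curve in Ah
q_cycle_100 = q_cycle_10 - 0.02 * np.sin(np.pi * depth)  # toy loss after ageing

feature = log_variance_delta_q(q_cycle_100, q_cycle_10)
print(f"log10 var(dQ_100-10) = {feature:.2f}")
```

A cell whose curve changes more between the two cycles yields a larger variance and thus a larger (less negative) feature value, which is consistent with the negative correlation with RUL described above.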

Using an Elastic Net with default parameters, we obtained the following prediction results after the 100th cycle.
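A fit of this kind could be sketched as follows. The article does not state which library or target scale was used, so this sketch assumes scikit-learn's `ElasticNet` with its defaults (`alpha=1.0`, `l1_ratio=0.5`) and the raw cycle count as target. The data are synthetic stand-ins with a made-up linear relationship, not the Severson dataset.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for 124 cells: one feature, negatively related to RUL
n_cells = 124
feature = rng.uniform(-5.0, -2.0, size=n_cells)       # toy log-variance values
rul = 200.0 + 400.0 * (-2.0 - feature) + rng.normal(0.0, 80.0, size=n_cells)

X = feature.reshape(-1, 1)                            # scikit-learn expects a 2-D array
X_train, X_test, y_train, y_test = train_test_split(
    X, rul, test_size=0.25, random_state=0
)

model = ElasticNet()                                  # default parameters (assumption)
model.fit(X_train, y_train)

r2 = r2_score(y_test, model.predict(X_test))
print(f"slope = {model.coef_[0]:.1f}, test R^2 = {r2:.2f}")
```

Note that the default penalty is not scale-invariant: with a single unscaled feature, the fitted slope is shrunk noticeably towards zero compared with the slope used to generate the toy data, so tuning `alpha` or standardising the inputs would be a natural refinement.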


Note that we decided to use the first cycle as the reference when creating the feature, instead of the 10th cycle suggested in the paper. Next, the degradation of five randomly selected batteries is shown together with their predicted RUL. The exact RUL is difficult to predict, but the predictions may be precise enough for simple classification tasks.


While the engineered feature has an outstanding correlation with RUL, it is nevertheless very restrictive. Using such a feature basically means performing 100 charge/discharge cycles in a controlled environment before a prediction is possible. In a commercial setting, such a setup would be too time-consuming and costly, and thus impractical. It is therefore important to find other features that are less restrictive but still offer good predictive performance. For example, we found that the same feature computed with the first cycle as the reference, instead of the 10th, is a suitable candidate for predicting RUL, which increases the commercial viability of the prediction method. The figure below visualizes the correlation coefficient between the feature and RUL for each cycle up to 200 cycles.


The figure tells us that the linear correlation is strongest around 100 cycles. It might still be possible to use the feature after only 40-50 cycles with a small reduction in performance. Keep in mind that linear correlation is most indicative of prediction performance for linear machine learning methods; non-linear methods can find useful information for prediction even when the linear correlation is low.
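The per-cycle correlation analysis described above could be sketched like this. The capacity curves are synthetic (a toy degradation model with measurement noise), so the resulting numbers do not reproduce the figure; the function name `correlation_by_cycle` and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_cycles, n_points = 60, 200, 100

x = np.linspace(0.0, 1.0, n_points)                # normalised voltage axis
loss_shape = np.sin(np.pi * x)                     # toy shape of capacity loss over voltage
rate = rng.uniform(1e-5, 1e-4, size=n_cells)       # toy degradation rate per cycle
log_rul = np.log10(0.05 / rate) + rng.normal(0.0, 0.05, size=n_cells)

cycles = np.arange(1, n_cycles + 1)
# q[i, k, :] is the toy discharge capacity curve of cell i at cycle k + 1
q = (1.1 * x[None, None, :]
     - rate[:, None, None] * cycles[None, :, None] * loss_shape[None, None, :]
     + rng.normal(0.0, 2e-4, size=(n_cells, n_cycles, n_points)))

def correlation_by_cycle(q, log_rul, ref_cycle=0):
    """Pearson correlation between log10 var(Q_k - Q_ref) and log RUL for each cycle k."""
    delta = q - q[:, [ref_cycle], :]               # difference to the reference cycle
    variance = delta.var(axis=2)                   # variance along the voltage axis
    corr = np.full(q.shape[1], np.nan)             # reference cycle stays NaN (variance is 0)
    for k in range(q.shape[1]):
        if k == ref_cycle:
            continue
        corr[k] = np.corrcoef(np.log10(variance[:, k]), log_rul)[0, 1]
    return corr

corr = correlation_by_cycle(q, log_rul, ref_cycle=0)   # first cycle as reference
print(f"cycle 2: {corr[1]:.2f}, cycle 50: {corr[49]:.2f}, cycle 200: {corr[199]:.2f}")
```

In this toy model, the early cycles are dominated by measurement noise, so the correlation is weak at first and grows in magnitude as the accumulated degradation becomes visible.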

Until now, we have only considered one feature for predicting RUL, but several more can be engineered from the charge/discharge cycle data. Introducing more features leads us to another problem, namely feature selection. Some regression methods can report the importance of each feature for the prediction after training. An example of such a method is the Random Forest Regressor (RFR), which is also a non-linear estimator. Supplied below is an example of the feature importances reported by an RFR after fitting.


We selected the smallest subset of features with a combined importance of more than 0.8, which gave the top 6 features, and obtained the following prediction chart on test data.
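The selection rule can be sketched as follows, assuming scikit-learn's `RandomForestRegressor` and its `feature_importances_` attribute (which sums to 1). The data and the helper name `smallest_subset` are hypothetical; the toy target depends on only three of the ten features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def smallest_subset(importances, threshold=0.8):
    """Indices of the fewest features whose combined importance exceeds the threshold."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1]              # most important first
    cumulative = np.cumsum(importances[order])
    # position of the first cumulative value strictly greater than the threshold
    n_keep = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    return order[:n_keep]

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 10))                         # toy feature matrix
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + X[:, 2] + rng.normal(0.0, 0.1, size=400)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

selected = smallest_subset(forest.feature_importances_, threshold=0.8)
print("selected feature indices:", selected)

# The model would then be refitted on the reduced feature set
forest_small = RandomForestRegressor(n_estimators=100, random_state=0)
forest_small.fit(X[:, selected], y)
```

Impurity-based importances are only one possible ranking; permutation importance is a common alternative when features are correlated.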


As can be seen, the predictions are best when the RUL is smaller than 200. Between 250 and 1500, the mean prediction is close to the RUL, but with increasing variance. Only half of the battery cells live longer than 850 cycles, which reduces the amount of training data for larger RULs and introduces a bias towards better predictions for smaller RULs.

The eager reader might be wondering about the elephant in the room by now: what about actual batteries? After all, they would be the focus of any real battery lifetime prediction effort. The discussion above, however, has considered individual cells, tens or even hundreds of which are usually combined into battery packs to obtain the necessary voltage and current in commercial applications. Unfortunately, no public dataset of the same magnitude as the Stanford study is available for packs. As a stopgap measure, we can construct a collection of virtual batteries by simply aggregating the cells to mimic existing products, e.g. a 72 Ah LiFePO4 pack. This method is akin to bootstrapping, in this case choosing 68 cells with replacement from the dataset to obtain a collection of training and testing packs. It is also a poor approximation of reality, since it does not model any of the possible complex interactions within a pack. It thus serves more as a preview of possible future studies. We can then train a Random Forest Regressor on the whole lifetime until failure of the training selection and predict the RUL of the testing collection. The figure below presents the resulting predictions in orange against the blue RUL lines of five selected packs, showing not only their striking similarity but also the high accuracy of our predictions (\(r^2\) ≈ 0.99). It serves as a depiction of our main idea: by aggregating the cells together we effectively "smooth" over their individual impurities, removing uncommon outliers and thus enhancing the predictive capabilities of our model.
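The construction of virtual packs could be sketched as below. The article does not specify how the cells are aggregated, so this sketch assumes that a pack's capacity is simply the sum of its cells' capacities at each cycle, and it uses a toy fade model in place of the measured curves. The function names and all numbers other than the 124 cells and 68 cells per pack are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_cells, cells_per_pack, n_packs, max_cycles = 124, 68, 50, 2500

# Toy cells: each reaches 80 % of its 1.1 Ah nominal capacity at its own cycle life
cycle_life = rng.integers(300, 2000, size=n_cells)
cycles = np.arange(max_cycles)
fade = 0.2 * (cycles[None, :] / cycle_life[:, None]) ** 2
cell_capacity = 1.1 * np.clip(1.0 - fade, 0.0, None)      # shape (n_cells, max_cycles)

def make_virtual_pack(cell_capacity, cells_per_pack, rng):
    """Draw cells with replacement and sum their capacity curves (assumed aggregation)."""
    idx = rng.choice(cell_capacity.shape[0], size=cells_per_pack, replace=True)
    return cell_capacity[idx].sum(axis=0)

def end_of_life(capacity, threshold=0.8):
    """First cycle at which capacity falls below threshold times the initial capacity."""
    return int(np.argmax(capacity < threshold * capacity[0]))

packs = np.stack([make_virtual_pack(cell_capacity, cells_per_pack, rng)
                  for _ in range(n_packs)])
pack_life = np.array([end_of_life(p) for p in packs])

print(f"spread of cell lives: {cycle_life.std():.0f} cycles")
print(f"spread of pack lives: {pack_life.std():.0f} cycles")
```

Because each pack averages over many cells, the pack lifetimes in this toy setup vary far less than the cell lifetimes, which illustrates the smoothing effect described above. It says nothing about real packs, where cell interactions matter.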


Of course, the applicability of this approach needs to be tested extensively against real-world battery health data. Gathering and analyzing it will be the next exciting step on our journey and our contribution toward saving the world. After all, we wholeheartedly agree with Mr. Anderson, and humbly add – the future is electric or not at all.  

AiTree Tech, together with Combine, is eager to contribute to a better environment and a sustainable future. While many actors focus on today's transition from fossil fuels to batteries, we try to look ahead and focus on how to reuse batteries after their first and second life.


AiTree Technology provides a data-driven machine learning solution for predicting the complete lifetime of lithium-ion batteries, from first life to end of life.
Let's start the journey together!

Johan Stjernberg CEO@ AiTree Tech 
+46(0)733 80 08 44 

Erik Silfverberg CTO@ AiTree Tech 
+46(0)730 87 40 20 

AiTree Technology@LinkedIn 
