Machine Learning in Support of Space Weather Prediction

DENSER (DEeply uNderstanding Space weathER)

This page presents the work performed on the DENSER project.

One of the driving principles of DENSER was to demonstrate the relative ease of sharing certain key aspects of ML-enabled time series predictors across different products from different space weather domains.

Project summary

The DENSER (DEeply uNderstanding Space weathER) project team has evaluated the potential value of Machine Learning (ML) technologies, specifically Neural Network techniques, in the Space Weather (SW) domain.

The activity started with a literature study performed by domain experts from the Royal Belgian Institute for Space Aeronomy (BIRA-IASB) and the Finnish Meteorological Institute (FMI). Both institutes are embedded in their respective Expert Service Centres (ESCs), Space Radiation (R-ESC) and Geomagnetic Conditions (G-ESC), which were started within the Space Situational Awareness (SSA) programme of ESA.

In a second phase of the study, a comprehensive list of products for which ML might provide significant advantages over the classical approach of (physical) model design was composed and elaborated upon with respect to potential costs and benefits.

The DENSER project team has developed two products, one from each of the two domains: proton flux prediction (guided by Space Radiation experts at BIRA-IASB) and geomagnetic activity prediction (guided by Geomagnetic Conditions experts at FMI). Similar ML techniques have been applied to both.

Finally, the lessons learnt during the study and development phase were summarised in a document on recommendations and a roadmap (see 'roadmap' tab for a summarising list).


  • ForMaL-SEP

    The first product developed in DENSER is a prediction of the expected maximum flux of protons with energies greater than 10 MeV (E >10 MeV) for various time horizons. Solar Energetic Particle (SEP) events are sudden and large increases in the energetic particle fluxes observed in interplanetary space and can last for days or even weeks. The forecast model provides the expected maximum E >10 MeV proton flux during the next hour, 6 hours and 24 hours. The input parameters have been derived from time series of X-ray and differential proton flux measurements from 1986 to 2015, covering almost 3 solar cycles.

  • ForMaL-Xrange

    The second product that has been developed in DENSER is a forecast of the hourly range of the ground-based magnetic field north component (i.e. RX=Xmax-Xmin). RX is currently used in the AurorasNow! service of FMI as a proxy for enhanced auroral activity. The service forecasts the range of RX for the next hour based on the average values of solar wind velocity and Interplanetary Magnetic Field (IMF) north-south component (Bz) for the previous hour.

    DENSER developed ML-based RX forecasts for three IMAGE stations: KEV (magnetic latitude ~66, standard auroral oval), OUJ (magnetic latitude ~61, expanded oval), and NUR (magnetic latitude ~57, sub-auroral region).

    IMAGE (International Monitor for Auroral Geomagnetic Effects) is a network of magnetometer stations.

    The ForMaL-Xrange system reads Near-Real-Time (NRT) solar wind data and computes its averages, as is currently done in AurorasNow!. As output, the system yields forecasts of RX values for the three stations with lead times of 1, 3 and 6 hours. The RX forecasts naturally show some trends depending on the Universal Time (UT), as the IMAGE magnetometer stations can capture activity only in a limited sector of Magnetic Local Time (MLT).
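The quantities handled by the two products can be illustrated with a minimal Python sketch. It assumes hourly sampled proton flux for ForMaL-SEP and a list of samples within one hour for ForMaL-Xrange. The function names and data layout are hypothetical and are not taken from the DENSER code base.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence

# Forecast horizons of ForMaL-SEP in hours, as listed in the text.
SEP_HORIZONS_H = (1, 6, 24)


def max_flux_targets(flux: List[float], t: int) -> Dict[int, Optional[float]]:
    """Maximum flux in the h hours following index t, for each horizon h.

    `flux` is assumed to be an hourly series of E >10 MeV proton flux.
    A horizon that runs past the end of the series yields None.
    """
    targets: Dict[int, Optional[float]] = {}
    for h in SEP_HORIZONS_H:
        window = flux[t + 1 : t + 1 + h]
        targets[h] = max(window) if len(window) == h else None
    return targets


def hourly_range(x_north: Sequence[float]) -> float:
    """RX = Xmax - Xmin over one hour of north-component samples."""
    return max(x_north) - min(x_north)


def hourly_mean(samples: Sequence[float]) -> float:
    """Average of one hour of solar wind samples (e.g. speed or IMF Bz)."""
    return mean(samples)
```

With these helpers, `max_flux_targets` gives the value a ForMaL-SEP-style model would be trained to predict at each horizon, while `hourly_mean` and `hourly_range` correspond to an input and the target of a ForMaL-Xrange-style model.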



Both products (ForMaL-SEP and ForMaL-Xrange) share a common architecture, i.e. the high-level components use the same code for both products.

The figure hereafter presents the DENSER Architecture:

DENSER Architecture


A design driver for this architecture was to make it generic when possible and to specialize sub-components only when needed.

All major building components of the architecture are dockerized and orchestrated with docker-compose. In the context of the DENSER project, the complete architecture was instantiated on a high-performance server at the premises of Space Applications Services.

This high-level architecture contains the following main functional components:

  • Data storage: Stores all data used and generated in the project, for both products.
    This includes:
    • Input training/validation/testing data.
    • Input data used to generate predictions.
    • Generated predictions and metadata about those predictions. The metadata contains the time needed to run the prediction and the status of the prediction (successful/failed).
  • Data cleaning: Provides functionality to download, extract and pre-process raw data and to store the pre-processed data in the central storage.
  • REST API Provider: This component is central in the architecture as it provides the liaison between the various data producers and data consumers.
  • Front end: End-user interface of both products.
  • Reverse proxy: Provides access to both products on a single server.
  • First-line reverse proxy: Provides the OpenAM agent and HTTPS termination.
  • Deployed ML models: Used for generating predictions.
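The prediction metadata described in the list above (run time and success/failure status) could be captured along the following lines. This is only a sketch: the `PredictionRecord` class and the `run_prediction` helper are hypothetical names, not part of the DENSER REST API or storage schema.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PredictionRecord:
    """One stored prediction with its metadata (hypothetical layout)."""
    product: str                   # e.g. "ForMaL-SEP" or "ForMaL-Xrange"
    values: Optional[List[float]]  # predicted values, None if the run failed
    runtime_s: float               # time needed to run the prediction
    status: str                    # "successful" or "failed"


def run_prediction(product: str, predict: Callable[[], List[float]]) -> PredictionRecord:
    """Run a model callable and record its run time and status."""
    start = time.perf_counter()
    try:
        values: Optional[List[float]] = predict()
        status = "successful"
    except Exception:
        # Any model error is recorded as a failed prediction.
        values = None
        status = "failed"
    return PredictionRecord(product, values, time.perf_counter() - start, status)
```

Recording failures as regular records, rather than raising, would let a front end show that a forecast was attempted but is unavailable.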


Four different models have been considered and implemented. Of these, the Recurrent Neural Network (RNN) based on the LSTM (Long Short-Term Memory) architecture was selected for the advanced validation.

The RNN model is recurrent, meaning that it can learn to use a long history of inputs when that history is relevant to the predictions it makes. The model accumulates a series of timesteps and predicts a series of future timesteps. It can be parametrized with the number of LSTM units; adding too many units can make the model overfit more quickly.
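The way such a model consumes a series of past timesteps and predicts a series of future ones can be sketched as a windowing step over the input series. The function below is an illustrative assumption about how training samples might be formed; it is not the DENSER implementation, and `n_in` and `n_out` are hypothetical parameter names.

```python
from typing import List, Tuple


def make_windows(
    series: List[float], n_in: int, n_out: int
) -> List[Tuple[List[float], List[float]]]:
    """Pair each run of n_in past values with the n_out values that follow it."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        history = series[start : start + n_in]
        future = series[start + n_in : start + n_in + n_out]
        pairs.append((history, future))
    return pairs
```

Each `(history, future)` pair would then serve as one input/target sample for a sequence model such as an LSTM.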


Training Environment

For ML activities, the available (pre-processed) input data is used for three purposes:

  • Training data: to learn the internal weights of an ML model.
  • Validation data: to perform an unbiased evaluation of the model while tuning the model's hyperparameters.
  • Test data: to perform an unbiased evaluation of the final model fit.


Model building and model training are performed in Jupyter notebooks, instantiated in GPU-enabled Docker containers (one per DENSER product) that are based on a single Docker image.

The figure hereafter presents the training environment for the DENSER models:

DENSER Training and Execution environments


Working in notebooks allows for quick development, as blocks of Python code can be (re)evaluated without needing to evaluate a larger codebase each time a change is made. This suits the highly iterative process of ML model development. In addition, Jupyter combines live code, visualizations and explanatory text in a single entity, further inviting and facilitating ML-based experimentation and, afterwards, documentation.

Code that is shared between the various notebooks is put into separate Python files and imported in the notebooks. The final trained models are handled the same way. Starting from the pre-processed input data, initial feature engineering was performed for those features that were expected to benefit all ML models. Using the resulting feature set, the ML model development process started with the implementation of a purged walk-forward cross-validation data splitting mechanism, which allows a model to draw from the resulting sets of training, validation and testing data.
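As an illustration of the splitting mechanism mentioned above, the sketch below shows one common reading of purged walk-forward cross-validation: each fold tests on a later block of the series and trains only on earlier samples, with a gap of `purge` samples removed before the test block. The exact scheme used in DENSER is not documented here, so the block layout and parameter names are assumptions.

```python
from typing import Iterator, List, Tuple


def purged_walk_forward(
    n_samples: int, n_folds: int, purge: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs in chronological order.

    The series is cut into n_folds + 1 equal blocks (0-based). Fold k,
    for k = 1..n_folds, tests on block k and trains on all earlier
    samples except the last `purge` ones, so that windows spanning the
    boundary cannot leak test information into training.
    """
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        test_start = k * block
        test_end = test_start + block
        train_end = max(0, test_start - purge)
        yield list(range(train_end)), list(range(test_start, test_end))
```

Because the training set only ever grows forward in time, no fold is evaluated on data that precedes its training data.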


The application provides a dedicated interface for each product, taking the specific characteristics of that product into account.

Forecasts can be browsed using the calendar. A viewport then displays the prediction and, in the case of a past prediction, possibly the measurements as well.

Markers provide hints such as the storm levels, or the different horizons.

The figures hereunder present the DENSER interface for both the ForMaL-SEP and ForMaL-Xrange products:

ForMaL-SEP interface


ForMaL-Xrange interface


Products performance

The validation of the two products has been an essential part of the DENSER project.


The main conclusion from the validation activities is that ForMaL-SEP has a good POD (Probability of Detection) and AWT (Average Warning Time) compared with other models, but at the cost of a very high FAR (False Alarm Ratio). The model also shows a tendency to underpredict the expected maximum value on shorter time scales and to overpredict it for longer prediction horizons.

Looking at the behaviour of the predictions during individual events, it was observed that for gradual events (flare location not magnetically connected to Earth) a positive prediction is issued first. The predicted event length then decreases, since the eruption is not immediately followed by an increase in proton flux, but once the flux starts rising the predicted event length increases again.

The validation report written for this task also lists several other issues at the model level. One of the main issues is the large class imbalance between active and quiet periods. Based on the results of a first basic validation activity, some changes were applied to the model training to try to address issues with the background subtraction of input data taken with different instruments and with the response to flat, rising and decaying inputs. A number of solutions have been identified to improve the validation results. However, the output did not meet expectations, so further work is needed to determine additional validation methods.
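The verification scores named above can be written out from the counts of a standard contingency table. The sketch below uses the usual definitions of POD and FAR (false alarm ratio); the AWT helper assumes that a warning time is available for each correctly predicted event. The example counts in the usage note are invented for illustration and are not DENSER results.

```python
from typing import Sequence


def pod(hits: int, misses: int) -> float:
    """Probability of detection: fraction of observed events that were forecast."""
    return hits / (hits + misses)


def far(hits: int, false_alarms: int) -> float:
    """False alarm ratio: fraction of issued forecasts with no event."""
    return false_alarms / (hits + false_alarms)


def average_warning_time(warning_times_h: Sequence[float]) -> float:
    """Mean warning time in hours over the correctly predicted events."""
    return sum(warning_times_h) / len(warning_times_h)
```

For example, a model with 8 hits, 2 misses and 24 false alarms would have `pod(8, 2) == 0.8` but `far(8, 24) == 0.75`, which is the pattern of good detection at the cost of many false alarms. All three functions assume non-zero denominators.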


The validation of the ForMaL-Xrange product used historical data collected during the years 1998-2018 and is based on a selection of 21 events of strong geomagnetic activity (Kp>8).

The main conclusion from the event-based validation is that, during strong geomagnetic storms, the 1H forecasts by ForMaL-Xrange do not catch the peak values in the variations of the magnetic field north component, but they perform well in estimating the different activity levels (minor, moderate, strong, severe). This conclusion does not depend significantly on the location of the stations at the storm starting time. Comparison between the 1H forecasts by the AurorasNow! service and the ForMaL-Xrange forecasts shows that the latter are better, in the RMSE sense, than the statistical forecast based on the 5th, 50th and 95th percentiles used in the AurorasNow! service. Aurora enthusiasts today use the 95th percentile from AurorasNow! in their decision-making: if that value is in the category of strong or severe activity, the chances of getting good auroral photos are better than in average conditions. Our conclusion is that adding the 1H forecasts by ForMaL-Xrange to the tables presented in AurorasNow! would improve the accuracy of the service.
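The RMSE comparison referred to above can be sketched as follows. The function computes the root-mean-square error between a forecast series and the observed RX values; how the series were aligned and selected in the actual validation is not described here, so this is only a generic illustration.

```python
import math
from typing import Sequence


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between forecast and observed values."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

Two forecast methods would be compared by evaluating `rmse` for each against the same observations; the method with the lower value is the better one in the RMSE sense.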

Validation details are documented in the DENSER Validation Report.

Recommendations and roadmap

The DENSER project as a whole has led to a list of recommendations and possible future developments:


Short-term developments (ForMaL-SEP)

Input data and features:
  • Flare location based on NOAA real-time reports of solar events and solar region summaries.
  • Active region (AR) McIntosh classification and location based on NOAA-SWPC Solar Region Summary.
  • Magnetic connectivity between the Earth and the ARs based on solar wind speed.
  • Recent flare count (including cumulative peak flux & fluence).
  • Rate of change of X-ray and proton fluxes.
Machine learning algorithms and model training:
  • Debugging with simplified dataset: training on trivial cases.
  • Principal Component Analysis.
  • Hyperparameter tweaking.

Long-term developments (ForMaL-SEP)

Input data and features:
  • Active region characteristics derived from magnetograms (SOHO/MDI, SDO/HMI).
  • CME characteristics.
  • In-situ relativistic electrons.
Machine learning algorithms and model training:
  • Advanced models.
  • Advanced study of model behaviour.
  • Uncertainty estimation.


Short-term developments (ForMaL-Xrange)

The short-term development tracks are the following:
  • Predict in e.g. 15 or 30 min windows instead of, or in addition to, 60 min windows
  • Predict at more ground magnetometer stations
  • More advanced handling of data gaps
  • Predict e.g. 90% range instead of 100% range
  • Data seasonality study
  • Signal stochasticity test

Long-term developments (ForMaL-Xrange)

As a long-term development, the validation work conducted for the Recurrent Neural Network could be expanded to cover the other methods available in the presentation layer of the current ForMaL-Xrange service. Naturally, if better forecasts are achieved with more advanced neural network algorithms or with the other statistical or machine learning algorithms described above, they should be used in the next version of the web service and their performance statistics should be reported there.

More information

Contact the SWE Service Helpdesk to request more information about the ForMaL-SEP and ForMaL-Xrange products.