Project background

Today’s highest-performance computers employed in NWP rank in the top-20 of the 500 most powerful systems and execute computations at petaflop per second rates, ingesting O(100) Mbytes of observational data and producing O(10) Tbytes of model output per day. Future generations of kilometre horizontal scale global NWP models will integrate O(100) prognostic variables over O(5 x 10^8) grid points for O(100) ensemble members with O(seconds) time steps in an O(100)-level atmosphere, also coupled to surface models of somewhat smaller dimensions. Observational data usage will also increase by an order of magnitude due to the internationally coordinated availability of high-resolution spectrometers in low-Earth and geostationary orbits with thousands of spectral channels.
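
To give a sense of scale, the orders of magnitude quoted above can simply be multiplied out. The following is a rough sketch only. It assumes (none of this is stated in the text) that every prognostic variable is a full three-dimensional field, that the quoted grid points are horizontal columns, and that each value is stored as an 8-byte floating point number. The function name is purely illustrative.

```python
# Back-of-envelope size of one snapshot of the full ensemble state, using
# the orders of magnitude quoted in the text. Storage assumptions are
# illustrative and not taken from the document.

def ensemble_state_bytes(n_variables, n_columns, n_levels, n_members,
                         bytes_per_value=8):
    """Return the bytes needed to hold one snapshot of all ensemble members."""
    return n_variables * n_columns * n_levels * n_members * bytes_per_value

state = ensemble_state_bytes(
    n_variables=100,       # O(100) prognostic variables
    n_columns=5 * 10**8,   # O(5 x 10^8) grid points (assumed horizontal)
    n_levels=100,          # O(100) vertical levels
    n_members=100,         # O(100) ensemble members
)
print(f"{state / 1e15:.1f} petabytes per snapshot")
```

Under these assumptions a single snapshot of the ensemble state is already of petabyte size, which is why memory, communication and output volume all become limiting factors.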

However, the expected development of HPC technology imposes new constraints on how the science challenges can be addressed. In the past, processor performance evolved according to Moore’s law, and so did memory capacity. Clock speed also increased while production costs fell. As these trends can no longer be relied upon, even more emphasis will be placed on parallel computing, and this is where the ‘scalability’ of an application becomes important: it indicates the gain in time-to-solution when the application is run on more processors. In parallel computing, the gain from executing parts of the code in parallel is limited by the elements that run sequentially, which fundamentally limits scalability, as does the need to exchange large amounts of data between processors. Making NWP codes more scalable is among the top priorities in NWP for the next 10 years.
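
The limit imposed by sequentially run code is commonly formalized as Amdahl's law, which the text does not name but which matches the argument made here. The sketch below is a minimal illustration, assuming a fixed problem size and ignoring the cost of exchanging data between processors. The parallel fraction of 0.99 is a hypothetical value, not a measurement of any NWP code.

```python
# Amdahl's law: speed-up on n processors when only a fraction of the
# run time can be parallelised. Communication cost is ignored here.

def amdahl_speedup(parallel_fraction, n_processors):
    """Speed-up relative to a single processor for a fixed problem size."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

PARALLEL_FRACTION = 0.99  # hypothetical: 1% of the run time stays sequential

for n in (10, 1_000, 100_000):
    speedup = amdahl_speedup(PARALLEL_FRACTION, n)
    print(f"{n:>7,d} processors: speed-up {speedup:6.1f}")
```

With 1% of the run time remaining sequential, the speed-up can never exceed 100 however many processors are added, which illustrates why reducing the sequential share matters more than adding hardware.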

For NWP centres such as ECMWF, the upper limit for affordable power supply may be about 20 MW. The likely future NWP system will be O(100) times larger as a computational task than today's and, with existing processor technology, would require O(10) times more power than 20 MW. Therefore, a change of paradigm is needed regarding hardware, design of codes, and numerical methods. Energy-efficient algorithms and technology will most likely be chosen at the expense of numerical accuracy and stability.

New technologies will combine and integrate low-power processors with successors of today’s CPUs to obtain the best of both worlds, namely highly parallel compute performance with little data communication at lower clock rates, and CPU-type performance with large memory, a fast data interface and higher clock rates. Code design and algorithm choice must be adapted to this technology, which is a fundamental challenge given that we are dealing with vast heritage codes comprising millions of lines of instructions. In 10 years, global ensemble forecasts will be run on O(10^5) to O(10^6) processors, and fault awareness and resilience management will be needed to produce successful output in operational services, given the certainty of processor failures and the advent of inexact low-energy hardware.
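
Why processor failures become a practical certainty at this scale can be illustrated with a short calculation. This is a minimal sketch, assuming that failures are independent and that every processor has the same small probability of failing during one forecast. The per-processor probability used below is purely hypothetical and is not a figure from this document.

```python
import math

def prob_any_failure(n_processors, p_fail_single):
    """Probability that at least one of n independent processors fails.

    Computes 1 - (1 - p)^n via log1p/expm1 so it stays accurate for tiny p.
    """
    return -math.expm1(n_processors * math.log1p(-p_fail_single))

P_FAIL = 1e-6  # hypothetical chance that one processor fails during a forecast

for n in (10**5, 10**6):  # processor counts quoted in the text
    p_any = prob_any_failure(n, P_FAIL)
    print(f"{n:>9,d} processors: P(at least one failure) = {p_any:.2f}")
```

Under these assumed numbers, a tenfold increase in processor count raises the chance of at least one failure per forecast from roughly 10% to more than 60%, so a failure during production has to be treated as a normal event rather than an exception.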

The computing challenge is compounded by the requirements for data distribution and archiving. While data growth appears slower than compute growth, exabyte data production may be reached earlier than exaflop computing. It is also not obvious that re-computing is less costly than archiving, and thus tackling the data challenge is inevitable. This also implies that more of the overall financial budget available to NWP centres will be spent on data management. As for processor technology, hardware will limit data transfer bandwidth. Occasional hardware failure needs to be actively accounted for by designing resilient storage systems, which has fundamental implications for the design of future workflows. Technically, advanced data compression methods need to be implemented, standardized and supported by the weather and climate community.

The general development towards Earth-system modelling at fine scale for both weather and climate science imposes scalability and operability limits on NWP and climate centres that need to be addressed through fundamentally new scientific and technical methods.

For computing, the key figure is the electric power consumption per floating point operation per second (Watts/FLOP/s), or even the power consumption per forecast, while for I/O it is the absolute data volume to archive and the bandwidth available for transferring the data to the archive during production, and dissemination to multiple users. Both aspects are subject to hard limits, i.e. capacity and cost of power, networks and storage, respectively.
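
The two computing figures named above can be written down directly. The sketch below is illustrative only: the 20 MW value is the power ceiling mentioned earlier in the text, while the sustained compute rate and the wall-clock time per forecast are hypothetical placeholders, and the function names are not taken from any existing tool.

```python
def watts_per_flops(power_watts, sustained_flops):
    """Power drawn per unit of sustained compute rate (W per FLOP/s)."""
    return power_watts / sustained_flops

def energy_per_forecast_mwh(power_mw, wallclock_hours):
    """Energy used by one forecast that occupies the whole machine (MWh)."""
    return power_mw * wallclock_hours

POWER_MW = 20.0         # power ceiling mentioned earlier in the text
SUSTAINED_FLOPS = 1e15  # hypothetical sustained rate of 1 petaflop/s
WALLCLOCK_HOURS = 1.0   # hypothetical time-to-solution for one forecast

print(watts_per_flops(POWER_MW * 1e6, SUSTAINED_FLOPS), "W per FLOP/s")
print(energy_per_forecast_mwh(POWER_MW, WALLCLOCK_HOURS), "MWh per forecast")
```

Expressing the cost per forecast rather than per operation has the advantage that gains from better algorithms, which shorten the time-to-solution, count in the same way as gains from more efficient hardware.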

The urgency of adaptation to highly parallel computing is different for each component of the forecasting system, namely data assimilation, forecasting and data post-processing/archiving. Regarding ECMWF, it is important to keep the integrated aspect of the IFS alive, which means maintaining the approach of a single model and data assimilation system for all applications as opposed to promoting separate components tailored to forecast range and application.

Despite ambitious targets being set for model resolution, complexity and ensemble size, today the bulk of the calculations are not performed with configurations that utilize the maximum possible number of processors. Data assimilation, extended range prediction and research experimentation mostly operate at relatively low resolutions, predominantly for technical and affordability reasons. However, the forecast suites always contain a cutting-edge component that fully exploits the available capabilities.

Firstly, two main data assimilation development streams are being pursued at operational NWP centres, namely long-window 4D-Var and EnVar. Both have scientific and technical advantages and disadvantages, but 4D-Var reaches efficiency limits very soon. Secondly, the next-generation forecast models are being developed now (ICON, Gung-HO, NICAM, GEM), but their scientific and computing performance still needs to be established, and they may only be needed in full operations in 10 years. Lastly, I/O limitations will become effective in the short term, for example because data bandwidth is not growing at the same rate as computing, and data dissemination is becoming impossible for large productions in NWP and climate. I/O optimization and the scientific component in data assimilation research should therefore assume very high priority in the Scalability Programme. For climate research and production, the task of data assimilation for model initialization is only emerging now. Model integrations are substantially longer than in NWP and need to be completed in a realistic time frame, even though there is no critical production path as in NWP. Data storage and dissemination to a large user community are of fundamental concern for climate prediction.

It is recognized that, while scientific choices differ quite substantially between centres, a more coordinated effort to develop common tools can be made and will produce benefits for the community, e.g. regarding libraries, workflows, or efficiency monitoring tools. This also holds for common developments between the NWP and climate prediction communities. While this could be challenging, it may offer the only opportunity for co-development with hardware/software providers and for gaining access to external funding. Other areas identified for collaboration are benchmarking, bit-reproducibility versus fault tolerance, and common strategies for I/O.

Example of floating point peak performance as a function of (here Intel) clockspeed evolution (black curve), and necessary investment in parallel processing to further enhance performance (red curves). (Tor-Magne Stien Hagen 2011, PhD Dissertation, University of Tromso UIT).


Evolution of key indicators of processor design and performance.


Simplified illustration of the number of compute cores (left y-axis) and power (right y-axis) required for a single 10-day model forecast (bottom curves) and a 50-member ensemble forecast (top curves) as a function of model resolution. The shaded area indicates the range covered when assuming perfect scaling (bottom curve) and inefficient scaling (top curve), respectively. Today’s ECMWF single global forecasts operate at 16 km resolution while the ensemble has 32 km resolution. With the next resolution upgrade these will be enhanced to 9 and 18 km, respectively (Bauer et al. 2015, Nature 525, 47–55 (03 September 2015), doi:10.1038/nature14956).