Model Training
RAFT’s intensity model16, a deep learning Multi-Layer Perceptron (MLP), is trained to predict 6-hourly intensity changes (in either the 10-m maximum sustained 1-minute wind speed or the instantaneous maximum wind speed) using global data from the Statistical Hurricane Intensity Prediction Scheme (SHIPS) dataset19. The model’s input features, listed in Table 1, are primarily sourced from the SHIPS dataset. The exception is LP500_t0, the land percentage within 500 km of the storm center. During training, the model receives SHIPS data at each step of a storm and is trained to predict the 6-hour intensity change to the next step. Because the model takes only 10 input variables and is trained on a dataset totaling 75,000 storm steps, a grid search is used to find the optimal model architecture and hyperparameters. To maximize the number of North Atlantic Basin TCs in the training data while preventing data leakage, a Leave-One-Year-Out (LOYO) cross-validation method is employed, as depicted in Fig. 2. For example, to simulate North Atlantic Basin storms in 2005, a model is trained on data from all TC basins with the North Atlantic Basin in 2005 withheld. This ensures that the model has no prior knowledge of the 2005 storms or their environment. The LOYO procedure is repeated for every year in the 40-year period, producing 40 marginally different models. Model performance is evaluated using each year's left-out data as validation. These validation data are provided to the model in the same way as the training data: SHIPS data are supplied at each step, and 6-hourly intensity changes are predicted.
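A minimal sketch of the LOYO split logic, assuming each training sample is a 6-hourly storm step tagged with its year and basin. The `StormStep` record, the `"NA"` basin label, and `loyo_splits` are hypothetical names, not part of RAFT. The default follows the literal reading of the 2005 example (only the North Atlantic data for the left-out year are withheld); the `withhold_all_basins` option covers the stricter reading in which every basin's data for that year are excluded.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class StormStep:
    """One 6-hourly training sample (hypothetical minimal record)."""
    year: int
    basin: str       # basin label; "NA" = North Atlantic (assumed convention)
    features: tuple  # the 10 input features listed in Table 1
    target: float    # 6-hour intensity change to the next step (kt)


def loyo_splits(
    steps: List[StormStep],
    years: Iterable[int],
    basin: str = "NA",
    withhold_all_basins: bool = False,
) -> Iterator[Tuple[int, List[StormStep], List[StormStep]]]:
    """Yield (left-out year, training set, validation set) for each year."""
    for year in years:
        # Validation: the target basin's storm steps in the left-out year.
        val = [s for s in steps if s.year == year and s.basin == basin]
        if withhold_all_basins:
            # Stricter variant: drop every basin's data for that year.
            train = [s for s in steps if s.year != year]
        else:
            # Literal reading of the example: drop only that basin-year.
            train = [s for s in steps
                     if not (s.year == year and s.basin == basin)]
        yield year, train, val
```

Iterating over `range(1979, 2019)` yields the 40 folds, one model per left-out year.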
Historical Baseline
A 40-year simulation of historical storms in the North Atlantic Basin is conducted using observed TC tracks and initial intensities from the International Best Track Archive for Climate Stewardship (IBTrACS) dataset20, along with environmental inputs from the SHIPS dataset and the ERA5 reanalysis21. At each step, SHIPS data are used for the environmental inputs if the IBTrACS latitude and longitude match a SHIPS storm track within 0.1 degrees; otherwise, ERA5 reanalysis data are used. The model predicts a 6-hour intensity change, which is added to the current intensity (the observed initial intensity at the first step) and propagated forward to the next step. This autoregressive process continues until the end of the historical storm track or until the storm’s intensity falls below 10 knots. Because our focus is the North Atlantic Basin, we also end the simulation of storms that travel into the Pacific Ocean. Two further modifications to the original intensity model are Maximum Potential Intensity (VMPI) decay and forced survival. Because VMPI is undefined over land, VMPI decay applies a smooth 10% reduction per 6-hour step while a storm is over land. Forced survival constrains the first four 6-hourly steps (24 hours) of each storm’s simulation to zero or positive intensity change. After the simulation is complete, model performance is evaluated by comparing simulated intensities with 6-hourly IBTrACS intensities. Any step after RAFT has ended a storm early is assigned an intensity of zero.
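The autoregressive loop, with forced survival, the 10-kt termination rule, land VMPI decay, and zero-filling after early termination, can be sketched as follows. This is an illustration under stated assumptions, not the RAFT code: `predict_dv` is a hypothetical stand-in for the trained MLP, inputs are plain dicts, VMPI decay is interpreted as multiplying the last over-water VMPI value by 0.9 at each over-land step, and the storm is assumed to start over water. The SHIPS/ERA5 matching and the Pacific-exit rule are omitted.

```python
from typing import Callable, Dict, List, Sequence


def simulate_storm(
    v0: float,
    env: Sequence[Dict[str, float]],
    over_land: Sequence[bool],
    predict_dv: Callable[[float, Dict[str, float]], float],
    n_survival_steps: int = 4,      # 4 x 6 h = 24 h of forced survival
    min_intensity: float = 10.0,    # kt; the storm ends below this value
    land_vmpi_decay: float = 0.10,  # 10% VMPI reduction per 6-h step over land
) -> List[float]:
    """Autoregressively simulate one storm along a fixed track.

    env[i] holds the environmental inputs at step i; predict_dv (hypothetical)
    returns the 6-hour intensity change (kt) from step i to step i + 1.
    """
    n = len(env)
    intensities = [0.0] * n          # steps after early termination stay 0
    intensities[0] = v0
    vmpi = env[0]["VMPI"]            # assumes the storm starts over water
    for i in range(n - 1):
        inputs = dict(env[i])
        if over_land[i]:
            # VMPI is undefined over land: decay the last value instead.
            vmpi *= 1.0 - land_vmpi_decay
        else:
            vmpi = inputs["VMPI"]
        inputs["VMPI"] = vmpi
        dv = predict_dv(intensities[i], inputs)
        if i < n_survival_steps:
            dv = max(dv, 0.0)        # forced survival: no weakening yet
        v_next = intensities[i] + dv
        if v_next < min_intensity:
            break                    # storm ends; remaining steps stay 0
        intensities[i + 1] = v_next
    return intensities
```

With a constant predicted change of -5 kt, a 30-kt storm holds at 30 kt for the first four predictions and only then begins to weaken.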
Warming Signals
Future change signals from eight GCMs were calculated for five variables: air temperature, relative humidity, northward wind speed, eastward wind speed, and sea surface temperature. Monthly data from 21 ensemble members of these GCMs, available under both the SSP585 and SSP245 scenarios, were used to calculate the future change signal. Although this signal contains both dynamical and thermodynamic components, we refer to it as a warming signal to match the terminology of Jones et al.14. Within the pool of GCMs, four models exhibited greater sensitivity to warming, with higher transient climate response (TCR) and equilibrium climate sensitivity (ECS) values, while the remaining four were less sensitive, with lower TCR and ECS values. These metrics quantify how strongly the global mean temperature responds to increased greenhouse gas concentrations. The distinction between “hot” and “cold” models therefore reflects the models’ tendency to simulate relatively stronger or weaker warming responses, with the hot models projecting more pronounced future temperature increases. These subsets are referred to as the “hot model” and “cold model” groups, respectively, throughout this manuscript. The selection of GCMs and ensemble members, and the methods for calculating warming signals, follow the approach outlined by Jones et al.14. Historical data from 1975 to 2014 and future projections from 2015 to 2100 were extracted from each ensemble and regridded to a common 1-degree latitude-longitude grid. For each variable, ensemble averages were computed first, followed by model averages for the cold and hot model groups. To smooth the data, a moving average with an 11-year window centered on each year was calculated for every year between 1979 and 2098, separately for each calendar month. For example, the value for January 2020 is the mean of the January values from 2015 to 2025.
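A sketch of the centered 11-year moving average applied separately to each calendar month, assuming the monthly fields are stored as a NumPy array of shape `(n_years, 12, ...)` with consecutive years on the first axis. The text does not say how windows are handled at the ends of the record (the 1979 window would reach back to 1974 and the 2098 window forward to 2103, outside the 1975-2100 data); this sketch truncates the window to the available years, which is an assumption. `centered_moving_average` is a hypothetical helper.

```python
import numpy as np


def centered_moving_average(monthly: np.ndarray, half_window: int = 5) -> np.ndarray:
    """Centered (2 * half_window + 1)-year moving average per calendar month.

    monthly has shape (n_years, 12, ...) with consecutive years on axis 0, so
    averaging along axis 0 only ever mixes values of the same calendar month.
    Windows are truncated at the ends of the record (an assumption).
    """
    monthly = np.asarray(monthly, dtype=float)
    n_years = monthly.shape[0]
    smoothed = np.empty_like(monthly)
    for k in range(n_years):
        lo = max(0, k - half_window)
        hi = min(n_years, k + half_window + 1)
        smoothed[k] = monthly[lo:hi].mean(axis=0)
    return smoothed
```

With annual rows from 2010 to 2030, row 10 (2020) averages rows 5 through 15 (2015-2025), matching the January 2020 example. Trailing latitude and longitude axes are carried through unchanged.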
The moving-average values are divided into three 40-year segments: the historical segment (1979-2018), the near-future segment (2019-2058), and the far-future segment (2059-2098). Monthly deltas were then derived for each year of 2019-2058 and of 2059-2098 relative to the corresponding year of 1979-2018 (for example, both 2019 and 2059 are paired with 1979).
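The delta step can be sketched as a difference between aligned 40-year slices of the smoothed array, so that the k-th future year is paired with the k-th historical year. The function name and array layout are illustrative assumptions, and the deltas here are simple additive differences.

```python
import numpy as np

HIST = (1979, 2018)
NEAR = (2019, 2058)
FAR = (2059, 2098)


def monthly_deltas(smoothed: np.ndarray, first_year: int, period: tuple) -> np.ndarray:
    """Future-minus-historical deltas, pairing the k-th year of each segment.

    smoothed: moving-average array of shape (n_years, 12, ...) whose row 0 is
    first_year. Returns an array of shape (40, 12, ...).
    """
    n = HIST[1] - HIST[0] + 1
    start, end = period
    if end - start + 1 != n:
        raise ValueError("future segment must be as long as the historical one")
    h0 = HIST[0] - first_year    # row of 1979
    f0 = start - first_year      # row of 2019 or 2059
    return smoothed[f0:f0 + n] - smoothed[h0:h0 + n]
```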
Maximum Potential Intensity (VMPI in the model inputs) indicates the maximum intensity a storm can attain at a given location based on thermodynamic considerations22. Computing VMPI requires sea surface temperature, mean sea-level pressure, and the full vertical profiles (1000 hPa to 1 hPa) of specific humidity and air temperature, all provided by the GCMs. These four variables were therefore also extracted from each GCM ensemble, and the moving average of each from 1979 to 2098 was computed with the same methodology used for the other variables described above. The tcpyPI Python package, based on the algorithm of ref. 23 with reversible adiabatic ascent assumed, is used to calculate the VMPI values24,25. The monthly VMPI deltas are then derived for each year of the near future (2019-2058) and far future (2059-2098) relative to the corresponding year of the historical period (1979-2018). Figure 1 shows the difference in the 40-year hurricane-season mean between the SSP585 far-future hot-model scenario and the historical period.
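Before a PI routine is called, the GCM fields usually need unit conversion. The sketch below assumes model-native inputs in K, Pa, and kg/kg, and targets the units that the tcpyPI documentation lists for its `pi` function (°C, hPa, and water-vapor mixing ratio in g/kg); these should be checked against the installed version. `to_pi_inputs` is a hypothetical helper, and specific humidity q is converted to mixing ratio with r = q / (1 - q).

```python
import numpy as np


def to_pi_inputs(sst_k, psl_pa, plev_pa, ta_k, hus):
    """Convert GCM fields to the units listed for tcpyPI's pi function.

    Assumed inputs: SST and air temperature in K, pressures in Pa, and
    specific humidity in kg/kg. Outputs: degC, hPa, hPa, degC, g/kg.
    """
    sstc = np.asarray(sst_k, dtype=float) - 273.15
    msl_hpa = np.asarray(psl_pa, dtype=float) / 100.0
    p_hpa = np.asarray(plev_pa, dtype=float) / 100.0
    tc = np.asarray(ta_k, dtype=float) - 273.15
    q = np.asarray(hus, dtype=float)
    r_gkg = 1000.0 * q / (1.0 - q)  # specific humidity -> mixing ratio (g/kg)
    return sstc, msl_hpa, p_hpa, tc, r_gkg
```

The converted arrays would then be passed to tcpyPI's `pi` function, whose `ascent_flag` argument selects reversible (0) or pseudoadiabatic (1) ascent according to the package documentation.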
Using the calculated warming signals as environmental deltas for the eight future scenarios (SSP245 and SSP585, near and far future, hot and cold model groups), RAFT’s intensity model reruns historical storms in the context of future environments, as shown in Fig. 2. In these future simulations, each TC is initialized exactly as in the historical baseline simulation, with the same track and initial intensity. At each time step, however, deltas are applied to the environmental variables, including vertical wind shear, the 200-hPa zonal wind component (u200), VMPI, and relative humidity. As in the baseline simulation, the model predicts 6-hourly intensity changes autoregressively until the storm track ends or the intensity drops below 10 knots.
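The eight scenarios and the per-step perturbation can be sketched as below. The input names (`SHRD`, `U200`, `VMPI`, `RHMD`) are hypothetical SHIPS-style labels standing in for the variables in Table 1, and the lookup that selects the delta for a given month, location, and paired year is assumed to happen elsewhere.

```python
from itertools import product
from typing import Dict, Tuple

SSPS = ("SSP245", "SSP585")
PERIODS = ("near", "far")
GROUPS = ("cold", "hot")
SCENARIOS = list(product(SSPS, PERIODS, GROUPS))  # 2 x 2 x 2 = 8 scenarios

# Hypothetical names for the perturbed inputs (shear, u200, VMPI, humidity).
PERTURBED: Tuple[str, ...] = ("SHRD", "U200", "VMPI", "RHMD")


def perturb_inputs(env_step: Dict[str, float], delta_step: Dict[str, float],
                   perturbed: Tuple[str, ...] = PERTURBED) -> Dict[str, float]:
    """Return a copy of one step's inputs with the warming deltas added.

    Inputs not listed in `perturbed` are left unchanged, and the original
    dict is not modified.
    """
    out = dict(env_step)
    for name in perturbed:
        out[name] = env_step[name] + delta_step[name]
    return out
```

Each perturbed step would then be fed to the same autoregressive loop used for the baseline, so the track and initial intensity are identical across scenarios.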
Overview of Future TC Intensity Results
Following the simulation of historical events from 1979 to 2018, the model inputs are modified by the projected near- and far-future warming signals, and the historical TCs are re-simulated under eight different climate scenarios. We hold initial conditions, storm tracks, and overall TC frequency constant, and we apply a forced survival criterion that prohibits any simulated decrease in intensity during the first 24 hours. This setup assesses how a fixed set of historical storms responds to modified environments; it is not a comprehensive projection of future TC intensity.
We evaluate the distributions of intensity, lifetime maximum intensity, landfall intensity, and intensification rate for each scenario (Fig. 6). Figures 6(a) and 6(b), which show instantaneous and lifetime maximum wind speeds, reveal marginal increases in both the 75th- and 99th-percentile intensities for the SSP245 and SSP585 near-future scenarios, whereas the SSP585 far-future scenarios show a modest decrease. Landfall intensities (Fig. 6(c)) show slight decreases in median values across all future scenarios, with minimal changes in extreme landfall events. Finally, Fig. 6(d) shows minimal changes in rapid-intensification and weakening events in all scenarios except the SSP585 far-future scenarios, which feature fewer occurrences of both.
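A sketch of how such distribution summaries could be computed from simulated 6-hourly intensity series. The rapid-intensification and rapid-weakening thresholds used here (±30 kt in 24 h, i.e. four 6-hourly steps) are a common convention, not values stated in this section, and `summarize` is a hypothetical helper. It assumes each series contains only active-storm steps, without the zero-filled steps after early termination.

```python
import numpy as np


def summarize(tracks, ri_kt=30.0, rw_kt=-30.0, steps_24h=4):
    """Summary statistics over simulated 6-hourly intensity series (kt).

    Assumes each series holds only active-storm steps. The RI/RW thresholds
    are illustrative defaults, not values from the text.
    """
    series = [np.asarray(t, dtype=float) for t in tracks]
    all_v = np.concatenate(series)
    lmi = np.array([s.max() for s in series])  # lifetime maximum intensity
    # 24-hour intensity change at every step that has a value 24 h later.
    dv24 = np.concatenate([s[steps_24h:] - s[:-steps_24h] for s in series])
    return {
        "p75": float(np.percentile(all_v, 75)),
        "p99": float(np.percentile(all_v, 99)),
        "lmi_p75": float(np.percentile(lmi, 75)),
        "lmi_p99": float(np.percentile(lmi, 99)),
        "n_ri": int(np.sum(dv24 >= ri_kt)),
        "n_rw": int(np.sum(dv24 <= rw_kt)),
    }
```

Running the same function on the baseline and on each scenario's re-simulated storms gives directly comparable percentiles and event counts, because the set of storms is held fixed.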
These trends must be interpreted within the context of this study’s fixed-track storyline framework, which forces historical storms to persist through modified environments for at least 24 hours rather than dynamically resolving storm formation or frequency changes. Direct comparisons to probabilistic projections, such as those reported in Knutson et al.2, are problematic due to significant methodological differences. In probabilistic studies, weaker storms may fail to develop or survive in future climates, potentially increasing the relative proportion of intense storms. In contrast, our approach retains weaker storms, leading to conservative intensity estimates.
Despite these limitations, the regional trends mapped in Fig. 8 provide valuable insights into storm-environment interactions. Across most scenarios, ensemble-mean intensity declines in the central Gulf of Mexico, while the U.S. East Coast shows slight intensity increases. Notably, the Caribbean Sea exhibits a pronounced drop in simulated wind speeds (~10-15 kt in several grid cells) under the SSP585 far-future cold and hot model groups (panels f and h). This regional variability underscores how competing climate signals, such as changes in VMPI, shear, and moisture, balance within modified environments.
Finally, while this study focuses on the RAFT intensity model as part of a controlled experimental design, complementary research employing the full RAFT multi-model framework has explored bias-corrected synthetic storms in more freely evolving simulations to examine future changes in tropical cyclone frequency, intensity, and rainfall18,26,27.
