SST coercion
Conventional low-resolution SST datasets fail to resolve localized warming disparities in nearshore zones, whereas kilometre-scale resolution data delineate land–sea thermal gradients with <0.1 °C precision, pinpointing SLB attenuation thresholds to inform climate-resilient urban planning. Existing WRF implementations with static SST fields create oversimplified marine thermal profiles, where persistent 24-h SST averaging degrades land–sea atmosphere coupling simulations in nearshore zones. This study integrates BCC-CSM2-MR CMIP6 SST outputs to characterize diurnally varying thermal signatures across marine domains adjacent to 18 coastal megacities, addressing inherent limitations of coarse-resolution SST inputs and absence of diurnal variability of WRF. Different GCM exhibit varied performance in SST simulations (Supplementary Fig. 4). Compared with observations, we find that the SST from BCC-CSM2-MR shows strong consistency with both reanalysis datasets and satellite assimilation data in capturing monthly variation of SST among several CMIP6 GCM (ACCESS-ESM1.5, BCC-CSM2-MR, CanESM5, CESM2 and EC-Earth3) (Supplementary Fig. 5). Furthermore, statistical analyses confirm that BCC-CSM2-MR results demonstrate high reliability across several metrics (Supplementary Text 5).
This study advances oceanic surface temperature parameterization of WRF model through the following technical workflow. Using the MODIS land-use classification system where water bodies correspond to category 16, land–sea mask data were extracted from meteorological input files (met_em). The marine grid boundaries were accurately identified through geographic coordinate transformation (WRF Lambert projection to WGS84 coordinate system). A series of high-resolution SST data from the CMIP6 were spatially and temporally aligned with WRF marine domains using a spatiotemporal matching algorithm (Supplementary Fig. 6). Daily SST fields were dynamically integrated into the WRF initial conditions, replacing static SST parameters. Daily and diurnal SST in the future under SSP 245 and SSP 585 scenarios were applied to examine the effect of future ocean warming on SLB days in the same way as introduced above for historical case. With the advancement of remote sensing and multisource data assimilation techniques, ultrahigh-resolution SST products (for example, GHRSST) provide opportunities for cross-validation using several datasets. Our comparison shows that reanalysis data exhibit the highest correlation with the BCC simulations, followed by satellite observations from GHRSST42 (Supplementary Text 5). Residual biases in SLB-day magnitude may arise from uncertainties in nearshore SST fields and coastal wind representation, especially for cities with complex coastlines.
Model configurations
The WRF model is a mesoscale weather model and assimilation system. It is widely adopted by most countries around the world as a tool for both operational and research applications in mesoscale weather forecasting. The WRF model used in this study is v.4.2.2, with the specific physical process parameterization settings (Supplementary Text 6).
This study used a controlled-variable experimental design to conduct sensitivity analyses focused on SST, using an enhanced high-resolution SST modelling result coupled with the WRF model. Simulations conducted under control parameterization with fixed land use, anthropogenic heat and surface roughness, took identical meteorological fields as initial and boundary conditions. Comparative analysis of SLB days changes under different SST conditions quantitatively revealed the impact of anomalous SST warming on SLB patterns.
The simulation period spans from 1 January to 31 December for a total of 365 days each year. A 22-days model spin-up was implemented to optimize computational efficiency and minimize initial field interference under the guidance of several initial experiments. Two nested domains were used with spatial resolutions of 27 km and 9 km (Supplementary Text 7). SST data in the years 1970, 2010 and 2050 (SSP245 and SSP585) simulated by BCC-CSM2-MR model were used for forcing at 6-h intervals.
In selecting the points of interest for this study, we considered previous research43 and factors such as city size, latitude and longitude distribution, the continent they are located on and the climate zone they represent. Our goal was to achieve comprehensive coverage to discuss the potential impacts of SST changes on SLB from several perspectives (Supplementary Text 6 and Supplementary Table 1). To select research regions, this study identified global climatic zones predominantly influenced by SLB first. The night-time light data guided the selection of megacities with the highest light intensity indices within these climate zones (Fig. 1)41,44. Final spot selection integrated several criteria, including urban size, prevailing wind patterns, coastal proximity and climatic characteristics. Urban–coastal boundaries were delineated using land-use data, with two nested domains configured to centre on urban land areas adjacent to the coastline. The inner domain was designed to maintain near-equal representation of land and ocean (~50% each), ensuring balanced atmospheric interactions. This balanced configuration minimizes disproportionate boundary condition influences from any single peripheral direction while reducing potential biases introduced by human intervention in experimental design.
Data collection
High-resolution SST data from the CMIP6 (BCC-CSM2-MR model, 0.3°–1° spatial resolution from the equator towards the poles)45 has been used in this research. The historical SST simulation data from BCC spans from 1950 to 2014. The study also selected data for 2050 under two emission scenarios, SSP 245 and SSP 585, to analyse the changes in SLB under different future emission scenarios. Another four high-resolution SST datasets from CMIP6 were used in this study, namely ACCESS-ESM1.5 (~1.25° × 1.875°), CanESM5 (~2.8° × 2.8°), CESM2 (~1.25° × 0.9°) and EC-Earth3 (~1.0° × 1.0°), to evaluate how intermodel differences in SST affect the simulation outcomes. For meteorological data, this study used the final operational global analysis (FNL) dataset provided by the National Centers for Environmental Prediction46. The FNL dataset, generated by the Global Data Assimilation System, includes various meteorological parameters with a spatial resolution of 0.25° × 0.25° and is updated every 6 h. FNL data are widely used in meteorological research and numerical weather prediction models, making it a common data source for numerical simulations. To control for the influence of other variables on the simulation, all other parameters and input data were maintained at their default model settings. The land-use classification followed the MODIS land-use classification standards and simulations were based on the land-use parameters provided with WRF v.4.2.2.
Criteria to identify SLB day
According to the basic definition, SLB day must meet the criteria that both sea breeze and land breeze occur in one day with obvious conversion process observed29. Moreover, only the day meeting all following criteria including the difference in temperature, specific wind direction, wind speed limitation as well as the duration intensity can be identified as an SLB day in this study. The sea breeze was defined as the case when the wind direction fits the sea breeze area and the temperature in the marine observatory was lower than that in the urban observatory. Similarly, the land breeze was defined as the case when the wind direction fit the land breeze area and the temperature in the marine observatory was higher than that in the urban observatory (Supplementary Table 1). Although the above two conditions satisfy the basic definition of SLB day, this method does not use a specific time for the determination of sea breeze or land breeze as described in the previous studies. This method excluded cases with observed wind speed greater than 10 m s−1 in the coastal observatory to prevent disturbances by large-scale weather systems. The time and intensity of SLB were considered in this method for the identification of SLB day. For a typical SLB day, both sea breeze and land breeze must occur for more than 4 h a day. In addition, the wind speed of sea breeze should be greater than 1 m s−1 and the wind speed of land breeze should be greater than 0.5 m s−1. All these criteria have a logical relevance. The temperature difference is the essential cause of the SLB, which is the fundamental criterion. Wind criterion includes direction and speed, which is specialized through the shape of coastline and the distribution of land and sea. After that, the duration intensity is considered to make sure the SLB is obvious.
Unsupervised cluster analysis
This investigation used unsupervised machine learning (k-means clustering) to delineate SST variability–sea breeze day frequency relationships, identifying three distinct coastal response clusters. The clustering analysis was achieved by constructing two-dimensional feature vectors from SST anomalies and sea breeze day frequency, which geometrically characterized the data. Iterative computation of Euclidean distances between data points and cluster centroids assigned samples to clusters on the basis of a minimum distance threshold. To determine the optimal cluster number (k) without prior knowledge of class categories, the evaluating the sum-of-squared-errors method was applied. Through several iterations, k = 3 was identified as the optimal classification number, balancing computational efficiency and cluster stability. The specific mathematical models used are as follows:
$$J=\mathop{\sum }\limits_{i=1}^{k}\mathop{\sum }\limits_{{x}_{j}\in {C}_{i}}{\mathrm{||}{x}_{i}-{\mu }_{i}\mathrm{||}}^{2}$$
(1)
$${\mu }_{i}=\frac{1}{\left|{C}_{i}\right|}\mathop{\sum }\limits_{{x}_{j}\in {C}_{i}}{x}_{j}$$
(2)
$${C}_{i}=\left\{\begin{array}{c}\left.{x}_{i}\right|{\rm{d}}\left({x}_{i},{\mu }_{{\rm{i}}}\right)\le {\rm{d}}\left({x}_{i},{\mu }_{k}\right),\forall k\ne {\rm{i}}\}\end{array}\right.$$
(3)
The objective function J defined in equation (1), quantifies the total within-cluster variance as the sum of squared Euclidean distances between each data point xj and its corresponding cluster centroid μi. k is the number of clusters, Ci means the ith cluster, containing all data points assigned to it. The data point in the feature space is xj and μi centroid (mean position) of the ith cluster, calculated iteratively.
In equation (2), |Ci| denotes the cardinality (number of data points) within-cluster Ci. This metric reflects the statistical weight of the cluster centroid μi, quantifying its capacity to represent the spatial distribution of elements in Ci during each iteration, equation (3) assigns every data point xj to its nearest cluster centroid Ci.
We further validated the clustering outcome using a Monte Carlo resampling method. The average silhouette coefficient across all iterations was 0.82, confirming the overall validity and stability of the identified clusters (Supplementary Text 8).
Spatiotemporal statistical downscaling method
The method to generate dynamically consistent atmospheric and boundary conditions derived from CMIP6 outputs. This approach enables each future scenario to evolve under its corresponding large-scale atmospheric state, rather than assuming static IC/BC fields. Specifically, we established a variable mapping between CMIP6 datasets and WRF input files (Supplementary Text 4 and Supplementary Table 3). On the basis of the temporal stamps in the WRF files, we used a nearest-neighbour temporal matching algorithm to align CMIP6 time slices, followed by bilinear spatial interpolation to transform CMIP6 data from the native grid to the WRF domain. Special longitude adjustments were applied to ensure continuity across the dateline (0°–360° and −180°–180° systems). For vertical interpolation, we performed a log-pressure remapping of pressure-level variables, maintaining monotonicity and ensuring physical consistency across layers. Comprehensive dimensional checks, alignment verification and post-interpolation validation were also implemented to confirm physical plausibility of the downscaled fields.
