Journal of Young Investigators
    Undergraduate, Peer-Reviewed Science Journal
Volume Ten 
    RESEARCH ARTICLE
Issue 3, March 2004

Physical Sciences & Mathematics

Combining Weather Data for a Dataset Sufficient for Generating High-Resolution Weather Prediction Models

Jared Fox
Arizona State University
Advisor: Steven Ghan, Ph.D.
Pacific Northwest National Laboratory

Abstract

Assessments of the effects of climate change typically require information at scales of 10 km or less. In regions with complex terrain, much of the spatial variability in climate (temperature, precipitation, and snow water) occurs on scales below 10 km. Since the grid size of a typical global climate model simulation is more than 200 km, it is necessary to develop models with much higher resolution. Unfortunately, no dataset currently produced is both highly accurate and of sufficiently high resolution. As a result, current global climate models are forced to ignore the important climate variations that occur below the 200 km scale. This predicament prompted the creation of a global hybrid dataset with information for precipitation, temperature, and relative humidity. The resulting dataset illustrates the importance of high-resolution data and shows clearly that regions with complex terrain require a fine-resolution grid for an accurate representation of their climatology. For example, the Andes Mountains in Chile cause a temperature shift of more than 25 °C within the area of a single 2.5° grid cell from the NCEP dataset. Fortunately, the CRU, U.D., GPCP, and NCEP datasets, when hybridized, provide both precision and satisfactory resolution with global coverage. This composite will enable the development of both high-resolution models and quality empirical downscaling methods, both of which are necessary for scientists to predict the effects of global climate change more accurately. Without accurate long-term forecasts, climatologists and policy makers will not have the tools they need to effectively reduce the negative effects that human activity has on the earth.

 

Introduction

Assessments of the effects of climate change typically require information at scales of 10 km or less. In regions with complex terrain, much of the spatial variability in climate (temperature, precipitation, and snow water) occurs on scales below 10 km (Daly et al. 1994; Ghan et al. 2000; Gyalistras et al. 1998; Leung et al. 1996). Current global climate model simulations typically use grid sizes over 200 km due to limits in computational power (Ghan et al. 2000; Delworth et al. 2000; Flato et al. 2000; Emori et al. 1999; Gordon et al. 2000; Russell et al. 1999; Zhang et al. 2000; Washington et al. 2000). Halving the grid size requires about eight times more computing power, an increase that Moore’s Law suggests takes roughly six years to achieve. Reducing the grid size from 200 km to 10 km takes a little more than four halvings and would therefore take about 25 years. This computational limitation has prompted climatologists to develop downscaling techniques, such as “empirical downscaling,” that can produce high-resolution climatological predictions without requiring such a massive increase in computational power. One method of downscaling global climate models adds highly detailed information about local geography to the model; mountains are the most important added feature. As demonstrated later in this paper, mountainous geography has a very significant effect on weather conditions such as temperature and precipitation. Since these two quantities are among the most essential to predict accurately, downscaling techniques may provide great insight into future climatology. However, in order to create and validate models that use these techniques, it is necessary to first have a high-resolution, global dataset. Using this dataset, work in this project will continue by creating a model that generates its data via an empirical downscaling method.
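The 25-year estimate above can be reproduced with a few lines of arithmetic. The sketch below assumes that computing power doubles roughly every two years (a common reading of Moore's Law, not a figure stated in this article) and that each halving of the grid size costs a factor of eight. The function name and its parameters are illustrative only.

```python
import math


def years_to_reach(start_km, target_km, cost_per_halving=8.0, doubling_years=2.0):
    """Estimate the years of hardware growth needed to shrink the grid size.

    cost_per_halving: factor of extra computing power per halving of grid size.
    doubling_years: assumed time for computing power to double (an assumption).
    """
    halvings = math.log2(start_km / target_km)
    years_per_halving = math.log2(cost_per_halving) * doubling_years
    return halvings, years_per_halving, halvings * years_per_halving


halvings, per_halving, total = years_to_reach(200.0, 10.0)
print(f"{halvings:.2f} halvings x {per_halving:.0f} years each = {total:.0f} years")
```

Under these assumptions the result is a little over four halvings at six years each, which is consistent with the figure of about 25 years quoted above.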

Unfortunately, no dataset currently produced is both highly accurate and of sufficiently high resolution. This predicament prompted the author to create a global hybrid dataset with information for precipitation, temperature, and relative humidity. Since no current dataset met the project’s requirements, the new dataset is a hybrid: it takes the best information available from several different datasets and combines it into a highly accurate compilation. One valuable side benefit is that the diversification of data sources increases the accuracy of the hybrid dataset, because each dataset acts as independent validation of the others. The global hybrid dataset discussed in this paper will be used as the base for models that rely on empirical downscaling techniques.

One of the largest concerns about global climate change is its effect on rainfall and snowpack. These two sources of fresh water are essential to life, both for drinking and for irrigation of crops. Where they fall matters a great deal. If snow falls on a different side of a mountain, millions of people could suddenly be without water. If consistent rainfall over agricultural regions moves only 50 km, millions of acres of farmland could become barren. Accurate assessments of the impacts of climate change typically require information at scales of 10 km or less (Ghan et al. 2000). This composite will enable the development of both high-resolution models and quality empirical downscaling methods, both of which are necessary for scientists to predict the effects of global climate change more accurately. Without accurate long-term forecasts, climatologists and policy makers will not have the tools they need to effectively reduce the negative impacts that human activity has on the earth.

 

 

Materials and Methods

To make the hybrid dataset easy to distribute and simple to use, the information was published in Network Common Data Form (netCDF). NetCDF is an interface for array-oriented data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data. The netCDF software was developed at the Unidata Program Center in Boulder, Colorado, and is freely available from the Unidata website (unidata.ucar.edu).

The first dataset used is a global 0.5° (about 55 km) monthly precipitation and air temperature dataset for 1950-1999 (land data only), developed at the University of Delaware (U.D.) by Cort Willmott, Kenji Matsuura, and David Legates from an analysis of station data. This analysis adjusts only air temperature to account for the influence of topography (Willmott and Matsuura 1995; Willmott et al. 1985). The second global land dataset is the Climatic Research Unit (CRU) 10-minute (about 18 km) 1961-1990 climatology, also developed from station data (New et al. 2002). The CRU analysis has more topographic sensitivity than the U.D. method and provides information for precipitation, surface air temperature, relative humidity, and water vapor pressure, but only one climatological value for each month.

Because the two datasets cover different time periods and have different spatial resolutions, the CRU and U.D. analyses were combined by using the U.D. data to provide the temporal variability and the CRU data to provide the spatial variability. Specifically, for each month of each year, the climatological value in each 10-minute CRU grid cell was scaled by a ratio taken from the U.D. grid cell closest to that CRU cell: the U.D. value for that month and year divided by the U.D. climatological mean for that month. For example:
Hybrid(May 1977) = CRUClim(May) × U.D.(May 1977) / U.D.Clim(May)

The closest U.D. grid cell to a specific CRU grid cell is at most 0.25° away in both latitude and longitude. The U.D. climatological mean was determined from the average of the U.D. data over the CRU period, 1961-1990.
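The scaling rule above can be illustrated with a minimal sketch. This is not the original Java code; the function names, the assumed layout of U.D. cell centers, and all numerical values are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def nearest_index(coord, first_center, spacing, count):
    """Index of the grid cell whose center lies closest to coord, clamped to the grid."""
    idx = round((coord - first_center) / spacing)
    return max(0, min(count - 1, idx))


def ud_climatology(values_by_year, first_year=1961, last_year=1990):
    """Mean of one calendar month's U.D. values over the CRU period."""
    years = range(first_year, last_year + 1)
    return sum(values_by_year[year] for year in years) / len(years)


def hybrid_value(cru_clim, ud_value, ud_clim):
    """CRU climatology scaled by the ratio of the U.D. value to the U.D. mean."""
    if ud_clim == 0:
        # Assumption: the article does not say how a zero mean was handled;
        # this sketch simply leaves the CRU value unscaled.
        return cru_clim
    return cru_clim * ud_value / ud_clim


# Hypothetical layout: U.D. cell centers at -89.75, -89.25, ..., 89.75 degrees.
ud_row = nearest_index(45.0 + 5.0 / 60.0, first_center=-89.75, spacing=0.5, count=360)

# Invented May precipitation (mm) for one U.D. cell: 60 in odd years, 40 in even years.
ud_may = {year: (60.0 if year % 2 else 40.0) for year in range(1950, 2000)}
ud_may_clim = ud_climatology(ud_may)

hybrid_may_1977 = hybrid_value(cru_clim=80.0, ud_value=ud_may[1977], ud_clim=ud_may_clim)
print(ud_row, ud_may_clim, hybrid_may_1977)
```

With these invented numbers the U.D. mean for May is 50 mm, May 1977 is 20 percent wetter than the mean, and the CRU value of 80 mm is scaled up to 96 mm. The selected U.D. cell center (45.25°) lies within 0.25° of the CRU cell center, as stated above.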

The next step was to add data for the oceanic regions. Although coarser in resolution than the land data, the National Centers for Environmental Prediction (NCEP) 2.5° temperature and relative humidity climatologies (covering 1948-1998) provided sufficient resolution for this purpose. For each missing value of the CRU/U.D. hybrid dataset, the data from the nearest NCEP grid cell were inserted into the hybrid. For any hybrid grid cell, the NCEP grid cell is at most 1.25° away in both latitude and longitude, except in the immediate vicinity of the north and south poles.

Finally, oceanic precipitation coverage is provided by the Global Precipitation Climatology Project (GPCP) 2.5° V2 monthly precipitation dataset, covering 1979-2002. Again, although coarser than the land data, 2.5° is sufficient resolution for oceanic coverage. For each missing value of the CRU/U.D. hybrid dataset, the data from the nearest GPCP grid cell were inserted into the hybrid. For any hybrid grid cell, the GPCP grid cell is at most 1.25° away in both latitude and longitude.
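The gap-filling step used for both the NCEP and GPCP data can be sketched as follows. This is a simplified illustration rather than the original implementation: the `MISSING` marker, the function name, and the toy grids are assumptions, and longitude wraparound at the date line is ignored.

```python
MISSING = None  # stand-in for the dataset's missing-value flag (an assumption)


def fill_missing(fine, fine_lats, fine_lons, coarse, coarse_lats, coarse_lons):
    """Return a copy of `fine` with missing cells replaced by the nearest coarse cell."""

    def nearest(value, centers):
        # Index of the coarse cell center closest to the given coordinate.
        return min(range(len(centers)), key=lambda i: abs(centers[i] - value))

    filled = [row[:] for row in fine]
    for r, lat in enumerate(fine_lats):
        coarse_row = nearest(lat, coarse_lats)
        for c, lon in enumerate(fine_lons):
            if filled[r][c] is MISSING:
                filled[r][c] = coarse[coarse_row][nearest(lon, coarse_lons)]
    return filled


# Toy example: one row of four fine cells, two of which have no land data,
# filled from a single row of two 2.5-degree cells. All values are invented.
fine = [[5.0, MISSING, MISSING, 7.0]]
filled = fill_missing(
    fine, fine_lats=[1.0], fine_lons=[1.0, 2.0, 3.0, 4.0],
    coarse=[[10.0, 20.0]], coarse_lats=[1.25], coarse_lons=[1.25, 3.75],
)
print(filled)
```

Cells that already hold land data are left untouched, so the fine-resolution CRU/U.D. values are never overwritten by the coarser ocean data.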

The end result is a highly accurate, high-resolution, global hybrid dataset. The dataset contains temperature, precipitation, and relative humidity information at 10-minute resolution (a grid size of about 18 km).
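The approximate grid sizes in kilometers quoted in this section follow from the angular spacing of each grid. The sketch below assumes a spherical Earth with about 111.32 km per degree of arc at the equator; the constant and function names are illustrative.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of arc at the equator


def spacing_km(spacing_deg, latitude_deg=0.0):
    """Approximate (north-south, east-west) cell size in km at a given latitude."""
    north_south = spacing_deg * KM_PER_DEGREE
    # East-west spacing shrinks with the cosine of latitude as meridians converge.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


grids = [("CRU 10-minute", 10.0 / 60.0), ("U.D. 0.5 degree", 0.5), ("NCEP/GPCP 2.5 degree", 2.5)]
for name, degrees in grids:
    north_south, _ = spacing_km(degrees)
    print(f"{name}: about {north_south:.1f} km at the equator")
```

The results are close to the rounded figures used in the text. Away from the equator the east-west size of every cell shrinks, so the quoted values are upper bounds for that direction.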

All of the data were processed on a dual Pentium machine with 2 GB of RAM running Red Hat Linux. The netCDF files were created using the netCDF Java Interface Version 2 and the Java 1.4 specification.

 

 

Results

Plots of the hybrid data best show the success of the new dataset. The plots illustrate the importance of high-resolution data and the close agreement of the source datasets with one another. Figure 1 illustrates how large the difference in resolution is between older models and the current one. Figure 2 shows that the datasets used are indeed quite compatible. Figures 3, 4, and 5 illustrate that fine-resolution datasets are essential to capture the details of weather patterns.

[Image: temperature climatology, January]

Figure 1. Temperature climatology plot. The most obvious item to notice here is the difference in grid cell resolution. The sections over land have data on a 10-minute scale (about 18 km), while the sections over the oceans are on a 2.5° (about 275 km) scale.

 

[Image: temperature climatology, September]

Figure 2. Climate trends over portions of Europe, Russia, Africa, and the Middle East. This view demonstrates that the general climate trends for regions of land alongside the oceans continue out into the oceanic regions. This trait indicates that the various datasets used in the hybrid are in fact compatible, despite their differences in resolution.

 

[Image: temperature climatology, February]

Figure 3. Climate over the Andes Mountains of Chile. This figure illustrates that regions with complex terrain require a fine-resolution grid to give an accurate representation of their climatologies. The Andes Mountains in Chile cause a temperature shift of over 25 °C within the area of a single 2.5° grid cell from the NCEP dataset. The CRU/U.D. hybrid dataset with 10-minute grid cells makes the distinction easy to recognize.

 

[Image: precipitation climatology, February]

Figure 4. Climate over the Andes Mountains of Chile. This figure also illustrates the effect of complex terrain on a region’s climatology. In Figure 3, the Andes Mountains in Chile caused the temperature in that region to fall well below the temperature of the surrounding areas. In this case, the Andes receive little rainfall themselves and instead help to trap large amounts of moisture in the surrounding areas.

[Image: climate over the United States]

Figure 5. Climate over the United States. This figure highlights the importance of fine-resolution grid cells. Many of the high-precipitation areas on land in this image would be hidden if they were contained in one of the massive grid cells just off the eastern coast of the United States.

Discussion

As stated previously, it is essential that the grid cells be small enough to represent the underlying climate conditions accurately. Without sufficiently fine resolution, many important details are omitted, which introduces inaccuracies into any prediction model based upon such a dataset. Additionally, empirical downscaling methods require a highly detailed and highly accurate dataset. Regrettably, none of the currently available datasets provides both suitably detailed resolution and dependable accuracy. Fortunately, the CRU, U.D., GPCP, and NCEP datasets, when hybridized, provide both precision and satisfactory resolution with global coverage. This composite will enable the development of both high-resolution models and quality empirical downscaling methods.

The hybridization process was designed specifically to make the resulting data as accurate as possible. Using multiple data sources enhances the reliability of the data, since each data source provides independent confirmation that the other data sources are accurate. While rounding errors are a potential concern for many computationally generated datasets, the hybridization was done with floating-point numbers and with algorithms that minimized the effects of round-off errors by paying careful attention to the order of the calculations. Additionally, after the hybrid was completed, it was compared to the original data sources and no significant differences were observed. Data for the oceanic regions were not available at high resolution, but climatological differences over water take place over much greater areas than on land, so the oceanic resolution is sufficient for our purposes. It is important to note that translating low-resolution data into its high-resolution equivalent loses no information. In fact, over land the accuracy is increased because the low-resolution data are scaled by an independent high-resolution data source. All of these factors combined to yield a very accurate, high-resolution, global, hybrid dataset.
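The claim that translating low-resolution data onto a high-resolution grid loses no information can be checked with a small round trip: copy each coarse value onto a block of fine cells, then average each block back. The article does not describe such a check, so the sketch below is only an illustration with invented values and hypothetical function names.

```python
def replicate(coarse, factor):
    """Copy each coarse cell value onto a factor x factor block of fine cells."""
    fine = []
    for row in coarse:
        fine_row = [value for value in row for _ in range(factor)]
        fine.extend([fine_row[:] for _ in range(factor)])
    return fine


def block_mean(fine, factor):
    """Average each factor x factor block of fine cells back to one coarse cell."""
    coarse = []
    for r in range(0, len(fine), factor):
        row = []
        for c in range(0, len(fine[0]), factor):
            block = [fine[r + i][c + j] for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


coarse = [[10.0, 20.0], [30.0, 40.0]]  # invented 2.5-degree values
fine = replicate(coarse, 15)           # 2.5 degrees / 10 minutes = 15 cells per side
recovered = block_mean(fine, 15)
print(len(fine), len(fine[0]), recovered == coarse)
```

Because every fine cell in a block carries the same value, averaging the block returns the original coarse value, so the coarse field can always be recovered from the fine grid.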

The topics of global climate change and its most famous component, global warming, are heavily charged issues in both the scientific and political communities. The EPA has estimated that if humans continue to release large amounts of greenhouse gases, the US alone will incur damages in the hundreds of billions of dollars annually, roughly the equivalent of $1 per gallon of oil burned (Titus 1992). This estimate refers only to the cost of crops lost to drought, increased irrigation and desalinization, etc. On the other hand, major corporations make hundreds of billions of dollars annually by continuing to sell products that release these same greenhouse gases. Research has shown recent changes in the global climate and predicts that these trends will continue, but the results have not been conclusive enough to convince corporations and policy makers to look beyond their short-term special interests. The climate prediction models based upon this hybrid dataset have the potential to provide a much clearer picture of our future. If the delicate balance of life-sustaining conditions on Earth is being disrupted, it is vital that humans have as much information about these changes as possible. Without accurate information, we are severely handicapped in our efforts to combat any serious consequences.

 

Acknowledgements

I would like to thank the U.S. Department of Energy and the National Science Foundation for giving me the opportunity to participate in the Science Undergraduate Laboratory Internships program and the chance to have an incredible learning experience at Pacific Northwest National Laboratory. I give special thanks to my mentor, Dr. Steven Ghan, for giving me such significant work to do and for providing positive feedback. Thanks to everyone for letting me work on such an enjoyable and meaningful project.


References

Daly, C et al. (1994) A statistical-topographic model for mapping climatological precipitation over mountainous terrain. Journal of Applied Meteorology 33: 140-158.

Delworth, TL and TR Knutson (2000) Simulation of early 20th century global warming. Science 287: 2246-2250.

Flato, GM et al. (2000) The Canadian Centre for Climate Modelling and Analysis global coupled model and its climate. Climate Dynamics 16: 451-467.

Emori, S et al. (1999) Coupled ocean-atmosphere model experiments of future climate change with an explicit representation of sulfate aerosol scattering. Journal of the Meteorological Society of Japan 77: 1299-1307.

Ghan, SJ et al. (2000) The thermodynamic influence of subgrid orography in a global climate model. Climate Dynamics 20: 31-44.

Gordon, C et al. (2000) The simulation of SST, sea ice extents and ocean heat transports in a version of the Hadley Centre coupled model without flux adjustments. Climate Dynamics 16: 147-168.

Gyalistras, D et al. (1998) Future Alpine climate. In: Cebon, P et al. (eds) Views from the Alps: Regional perspectives on climate change. Cambridge, Massachusetts: MIT Press.

Leung, LR and SJ Ghan (1995) A subgrid parametrization of orographic precipitation. Theoretical and Applied Climatology 52: 95-118.

Leung, LR et al. (1996) Application of a subgrid orographic precipitation/surface hydrology scheme to a mountain watershed. Journal of Geophysical Research 101: 12,803-12,817.

Leung, LR and SJ Ghan (1998) Parametrizing subgrid orographic precipitation and surface cover in climate models. Monthly Weather Review 126: 3271-3291.

New, M et al. (2002) A high-resolution data set of surface climate over global land areas. Climate Research 21: 1-25.

Russell, GL and D Rind (1999) Response to CO2 transient increase in the GISS coupled model: regional coolings in a warming climate. Journal of Climate 12: 531-539.

Titus, JG (1992) The Costs of Climate Change to the United States. In: Global Climate Change: Implications, Challenges, and Mitigation Measures. Pennsylvania Academy of Sciences.

Washington, WM et al. (2000) Parallel climate model (PCM) control and transient simulations. Climate Dynamics 16: 755-774.

Willmott, CJ et al. (1985) Small-scale climate maps: A sensitivity analysis of some common assumptions associated with grid-point interpolation and contouring. American Cartographer 12: 5-16.

Willmott, CJ and K Matsuura (1995) Smart interpolation of annually averaged air temperature in the United States. Journal of Applied Meteorology 34: 2577-2586.

Zhang, XH et al. (eds) (2000) IAP global atmosphere-land system model. Science Press. Beijing, China.


Journal of Young Investigators. 2004. Volume Ten.
Copyright © 2004 by Jared Fox and JYI. All rights reserved.
 

Copyright ©1998-2004 The Journal of Young Investigators, Inc.