Monday, January 28, 2013

Fixing the National Weather Service's Computer Gap

In previous blogs, I have documented the profound inadequacy of the computational resources used for operational numerical weather prediction by the National Weather Service (NWS) and the serious implications this deficiency has for the quality of weather forecasts in the U.S. I have described how the world-leading European Centre for Medium-Range Weather Forecasts (ECMWF) now has more than ten times the computer power of the U.S. Environmental Modeling Center (EMC), how U.S. skill in global prediction is in second or third place, and how the lack of computer resources is crippling the NWS's ability to move forward in probabilistic prediction, the next major area of development.
(Reminder:  EMC, part of the NWS, is the operational weather prediction entity of the U.S.)

I have talked to many people about my blogs and assessment, including meteorologists, both inside and outside the NWS, and highly placed managers and administrators in the NWS:  there is essentially no disagreement that we have a serious problem in numerical weather prediction, and that lack of computer power is a major cause but not the only one.

The new NOAA Fairmont Computer Center has far more capability than EMC's computer center

It is time to fix the NWS's operational computer deficiency, and this blog will describe how it can be done within a year using funds that are already appropriated.  But it will take leadership and a willingness to do things a different way.  And an end to highly dysfunctional relationships in NOAA and the NWS.  This is going to be a very frank assessment of the current situation and will get somewhat technical in places...so please forgive me or skip this blog if you find it tedious.

The Problem is Worse Than I Thought

When U.S. Senator Maria Cantwell learned about the lack of computer power for U.S.  numerical weather prediction at a luncheon I attended, she asked an important question of the head of the NWS:  how can this be when Congress has appropriated large amounts of funds for weather and climate computers?  He did not answer, but the answer is clear: nearly all of these resources have been unavailable for weather prediction--most are used for climate studies.

But the problem is deeper and more disturbing than that... other groups in NOAA are securing bigger computers than the national operational center, EMC.  And some of these groups are actively working to acquire computer resources for themselves rather than EMC.  A good case is the NOAA Earth System Research Lab (ESRL) in NOAA's Office of Oceanic and Atmospheric Research (OAR).  This lab is tasked with doing research to support operational numerical weather prediction (NWP) in the NWS, even though it is not in the NWS.  As noted in an earlier blog, this is a crazy organizational situation, with those running operational NWP in the U.S. unable to control the research that supports them.  ESRL has been able to find funds for very large supercomputers (the "jet" machines) that far eclipse what EMC has to work with.  ESRL has established essentially operational capabilities and wants to expand them further (they called it Regular Research).  Amazingly, two high administrators in OAR/ESRL told me that I should not be working to secure big computers for EMC but rather should get them for THEM!  I was really taken aback by their attitude.  And recently the Hurricane Forecast Improvement Program (HFIP) received a large computer resource (placed at ESRL), and HFIP is using it for operational global and hurricane-scale simulations.

NOAA ESRL in Boulder houses supercomputers more capable than those used by the U.S.'s main numerical weather prediction facility

So we have the nutty situation in which operational NWP is starved for computer resources, undermining progress in weather prediction, while climate studies have massive supercomputers available and NOAA fosters active competitors in its organization that are doing essentially operational weather prediction with far greater resources than EMC, the U.S. operational center.   This screams about poor leadership and management in NOAA.

The other problem is that the NWS is wasting a substantial amount of the limited computer power it does have today.  The graphic below shows how the NWS is using their current computer.  A lot of it does not make sense.  Time is on the X axis (entire day) and the Y axis is the number of nodes (a node is a collection of processors) used.  The various colors represent different models or simulations run on this computer.

The red color on the lower portion, the largest use of the computer, is for the Climate Forecast System, in which they run seasonal forecasts.  But they run these forecasts FOUR TIMES A DAY, which makes no sense.  Why run a seasonal simulation that often?  In contrast, running the global model (the GFS), shown by the dirty green color, only takes a small part of the computer.  Furthermore, they run the GFS out FOUR TIMES A DAY to 384 h--why do they do that?  Most other big centers find it useful to run out only twice a day.  I could find no objective proof in the literature or elsewhere that such frequent runs are useful.  I can go into more detail, but the bottom line is that the use of EMC's computer is inefficient and not well thought out.  A lot is done for legacy reasons.  A rational evaluation of cost and benefit would clearly change allocations substantially.  But even if they used the current small computers rationally, they would not have enough to do what needs to be done.

Production Schedule for EMC's Computer

What do they really need?

For EMC to serve the nation in a reasonable way, I believe they need the computational resources to do the following:

(1) Run a global ensemble system at 12-15 km resolution (currently they are at roughly 50 km). (Remember, an ensemble is when you run a model many times with different starting points and model physics; this allows one to get at the uncertainties in a forecast.)  This ensemble needs to be running the best physics possible, unlike the inferior physics used in the current U.S. global ensemble system.
(2)  Run convection-resolving high-resolution ensembles over the U.S. (1-4 km resolution).  Currently, the U.S. ensemble system is at 16 km resolution.  Many of the runs of the current system use inferior physics to save computer time.
(3)  Run a rapid-update system (like ESRL's HRRR) at 3 km resolution.  Eventually, (2) and (3) should be combined.
(4)  Lowest priority but useful.  Run a global model at 2-4 km resolution.

Doubling resolution takes about 8 times the computer power.  My back-of-the-envelope calculation is that the above is doable if EMC had 5-10 petaflops of computer power (well within the range of machines recently acquired by others).  The plan below will give it to EMC for operational use and maintain high reliability.
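For those who want to see where the "8 times" comes from, here is a rough sketch.  It assumes that cost scales with the cube of the refinement in horizontal grid spacing (two horizontal dimensions, plus a time step that must shrink in proportion).  That cubic rule is a simplification: it ignores changes in vertical levels, ensemble size, model physics, and I/O.  The function name and the 12.5 km value (a point I picked inside the 12-15 km range) are my own illustrative choices.

```python
def cost_factor(old_dx_km: float, new_dx_km: float, exponent: float = 3.0) -> float:
    """Relative compute cost of refining horizontal grid spacing.

    Assumption (not an exact law): cost ~ (old_dx / new_dx) ** exponent,
    where exponent = 3 reflects two horizontal dimensions plus a
    proportionally shorter time step.
    """
    if old_dx_km <= 0 or new_dx_km <= 0:
        raise ValueError("grid spacings must be positive")
    return (old_dx_km / new_dx_km) ** exponent


if __name__ == "__main__":
    # Grid spacings are taken from the list above; 12.5 km is an
    # illustrative value inside the 12-15 km target range.
    cases = [
        ("one doubling of resolution", 2.0, 1.0),
        ("global ensemble, 50 km -> 12.5 km", 50.0, 12.5),
        ("U.S. ensemble, 16 km -> 4 km", 16.0, 4.0),
    ]
    for label, old_dx, new_dx in cases:
        print(f"{label}: about {cost_factor(old_dx, new_dx):.0f}x the compute")
```

Under this assumption, going from 50 km to 12.5 km is two doublings, or roughly 64 times the computer power per ensemble member, which is why a modest upgrade does not get you there.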

How to Fix the Problem Quickly

First, EMC needs to get their house in order and reduce the waste in their current schedule, which I estimate is roughly 25% of their current computer.

EMC will get an upgrade this summer of their two 0.07 petaflop machines (the vendor is IBM; one is operational and one is backup) to 0.2 petaflops.  This is helpful, but not nearly enough.  Congress just passed the Hurricane Sandy relief bill for roughly 50 billion dollars.  Within this bill is 25 million for enhanced hurricane weather prediction and data assimilation and 50 million for hurricane research...money that is going to NOAA.  One thing we learned is that good global weather prediction is the key for hurricane forecasting--that is why the European Center Global Model was the best during Sandy.  So you want to help hurricane forecasting?  USE ALL OF THE 25 MILLION TO UPGRADE EMC's COMPUTER RESOURCES.

The German weather service just purchased a 23 million dollar CRAY supercomputer that dwarfs what the U.S. NWS now uses.

Use the 25 million in Sandy money to acquire (EMC likes to lease) ONE big machine, a computer with 1-3 petaflops or more.  My discussions with several computer vendors suggest that the NWS might be surprised by how much they could get for 25 million.  Perhaps as high as 5-10 petaflops if they play their cards right.  I believe this machine could possess at least 99% reliability, and folks in the NWS computer hierarchy agree.  (Hell...I have a cluster I use for weather forecasting that maintains such reliability and I do it on a shoestring, surely they can as well!)  The recently acquired NOAA Fairmont machine can serve as backup for the new EMC computer, as well as being available for development and research.

Thus, the operational load can be split between the current IBM system, which will increase in size again in 2015 to roughly one petaflop, and the new system purchased with Sandy money.  Using these new resources wisely, NWS operations can jump to world leadership capability in numerical weather prediction and radically improve the products they provide to U.S. users.
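To put the 99% reliability figure and the backup arrangement in perspective, here is a small sketch.  It reads "reliability" as the fraction of the year a machine is available, and it assumes that the primary and backup fail independently and that failover is instantaneous; neither assumption comes from the NWS.  The 99% value for the backup is hypothetical, and the function names are mine.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours per year a system is unavailable."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


def availability_with_backup(primary: float, backup: float) -> float:
    """Availability when either machine can carry the operational load.

    Assumptions: failures are independent and failover is instantaneous,
    so service is lost only when both machines are down at once.
    """
    return 1.0 - (1.0 - primary) * (1.0 - backup)


if __name__ == "__main__":
    primary = 0.99  # the "at least 99%" figure from the text
    backup = 0.99   # hypothetical value for the backup machine
    combined = availability_with_backup(primary, backup)
    print(f"primary alone: {downtime_hours_per_year(primary):.1f} h/yr down")
    print(f"with backup:   {downtime_hours_per_year(combined):.2f} h/yr down")
```

On these assumptions, a single 99% machine is down roughly 88 hours a year, while a 99% primary paired with a 99% backup is down less than an hour.  Real systems share power, networks, and software, so the true gain would be smaller.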

Additional Fixes

There is little doubt EMC could quickly take advantage of the increased computer resources (I have confirmed this by talking with their leadership).  However, as noted earlier, the problems in U.S. numerical weather prediction are deeper than lack of supercomputers (although fixing that deficiency would be a good start).  Management and leadership failures have abounded.  To address these problems, immediate attention should be given to the following items:

1. Establish a numerical weather prediction advisory board for EMC that provides recommendations from experts in the entire community.  A big part of the problem is that the National Weather Service folks have been too isolated from the rest of the meteorological community.  They serve the nation but have generally been unenthusiastic about getting guidance and advice from their users, the private sector, and the research community.  This has led NWS EMC to second/third tier status and must change.  For years, U.S. National Academy committees and others have recommended that EMC establish a representative advisory committee that would act as an active partner.  NWS management has pushed back on this and has done nothing.  Enough is enough....this advisory committee should be established immediately and should serve as a sounding board for deciding on which models are run, how they are run, the computer resources needed, and more.

Japan's weather supercomputer (peak 0.85 petaflops) is roughly ten times larger than that of the U.S. EMC.

2.  Restructure NWP research and development in NOAA/NWS.  The current separation of weather prediction research from operations has been a continuing disaster and must end.  NOAA leadership finally must deal with this mess.  Moving EMC into NOAA and combining with OAR/ESRL under one manager might work.  Or move ESRL folks into the NWS under EMC.

3.  Establish a comprehensive verification program for U.S. models.  To improve weather forecasting models you must know their strengths and weaknesses.  The NWS/EMC model verification program is very weak and superficial.  If you want to see how bad things are, check their very poor model verification web pages.  Ask a simple question:  how well do the models verify over the Northwest?  Better over the mountains or lowlands?  Or how has forecast skill over California changed during the past few decades?  You will be disappointed, I guarantee you.  A lot of the statistics are monthly, making it impossible to determine the trends in model skill.

NOAA money supports the Developmental Testbed Center, which I know quite a bit about (I have been chair of their Science Advisory Board).  The dream was that folks could provide new research innovations that would be tested in an operational-like environment for a wide range of cases.  If successful, they would go into operations.  Sounds good?  After nearly a decade and millions of dollars, this is a dream that never seems to happen.  The DTC should take the testbed role seriously.  Now.

Taiwan's weather bureau has a computer twice as fast as EMC's and has purchased one over 15 times as powerful.

4.  Support a model improvement research program.  The U.S. has the largest meteorological research community in the world, with U.S. universities doing cutting-edge research on numerical weather prediction and related topics.  NOAA/NWS have failed to take advantage of this huge community, maintaining a minuscule extramural research program.  Any new research funds go right into NOAA coffers.  This must change.  Let's start with the 50 million in Sandy research money and use most of it for extramural, university-based research.  NWS/NOAA extramural weather model research should be targeted to the most acute needs of the National Weather Service modeling efforts.  Trust me, money speaks in the research community.

5.  Create a strategic plan with community input and do it.   Currently, there is NO comprehensive and detailed strategic plan by the National Weather Service on the improvement of numerical weather prediction.   This contrasts with foreign meteorological services (such as our neighbors, the Canadians), who have laid out detailed and aggressive roadmaps of their future direction.   You can't go far without a map.  The NWS needs one and the community should be at the table when it is constructed.

6.  Provide decent documentation of what U.S. modeling centers are doing.  Want to figure out the details of the models run by the U.S.?  Good luck.  It is pretty much impossible to do so by going to the web sites of EMC or its parent, NCEP.  Scanty, out-of-date material is all you will find.  Amusingly, what you WILL find is their response to "certain blogs."  You can't imagine whose.

Let me be blunt: the state of operational U.S. numerical weather prediction is an embarrassment to the nation, and it does not have to be this way.  Taiwan, Germany, England, the European Center, Canada, and other nations have more computer power for their weather prediction services.  Our nation has had inferior numerical weather prediction for too long.  New computers are an obvious and relatively easy first step, because they make everything possible.  For the price of a single warplane we could have greatly improved weather prediction that would save lives and property.  Congress and the American public should not accept delays in action.  If this issue were placed before a real leader like President Lincoln, asking him when we should act, I can imagine what he would say (click on the arrow at the bottom of the picture to find out):




