Big data’s biggest challenge comes from climate change

Global sea levels are about eight inches higher today than they were in 1880, and they are expected to rise another two to seven feet during this century. At the same time, some 5 million people in the U.S. live in 2.6 million coastal homes situated less than 4 feet above high tide.

Do the math: Climate change is a problem, whatever its cause.

The problem? Actually making those complex calculations is an extremely challenging proposition. To understand the impact of climate change at the local level, you’ll need more than back-of-the-napkin mathematics.

You’ll need big data technology.

Surging Seas is an interactive map and tool developed by the nonprofit Climate Central that shows in graphic detail the threats from sea-level rise and storm surges to all of the 3,000-plus coastal towns, cities, counties and states in the continental United States. With detail down to neighborhood scale (search for a specific location or zoom in as necessary), the tool matches areas with flooding risk timelines and provides links to fact sheets, data downloads, action plans, embeddable widgets and other items.

It’s the kind of number-crunching that was all but impossible only a few years ago.

‘Just as powerful, just as big’

“Our strategy is to tell people about their climate locally in ways they can understand, and the only way to do that is with big data analysis,” said Richard Wiles, vice president for strategic communications and director of research with Climate Central. “Big data allows you to say simple, clear things.”

There are actually two types of big data in use today to help understand and deal with climate change, Wiles said. The first is relatively recently collected data that is so voluminous and complex that it couldn’t be effectively manipulated before, such as NASA images of heat over cities. This kind of data “literally was too big to handle not that long ago,” he said, “but now you can handle it on a regular computer.”

The second type of big data is older datasets that may be less than reliable, such as historic temperature trends in the United States. This data “was always kind of there,” Wiles said. That kind of dataset is not overly complex, but it can be fraught with gaps and errors. “A guy in Oklahoma may have broken his thermometer back in 1936,” Wiles said, meaning that there could be no measurements at all for two months of that year.
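
A gap like that can be flagged and, where there is a reading on both sides, patched. The sketch below is a minimal illustration, not Climate Central’s actual cleaning procedure (the article does not describe one). It assumes a station record stored as a plain Python list of monthly means, with `None` for a missing month, and fills interior gaps by straight-line interpolation. The names `find_gaps` and `fill_interior_gaps` are hypothetical.

```python
from typing import Optional

# Hypothetical helpers for a monthly station record: one value per month,
# with None standing in for a month that has no measurement.


def find_gaps(monthly: list[Optional[float]]) -> list[int]:
    """Return the positions of months that have no reading."""
    return [i for i, value in enumerate(monthly) if value is None]


def fill_interior_gaps(monthly: list[Optional[float]]) -> list[Optional[float]]:
    """Linearly interpolate missing months that lie between two real readings.

    Gaps at the start or end of the record stay None, because there is no
    reading on one side to interpolate from.
    """
    filled = list(monthly)
    known = [i for i, value in enumerate(monthly) if value is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = monthly[left] + weight * (monthly[right] - monthly[left])
    return filled


record = [10.0, 12.0, None, None, 18.0, None]  # toy values, not real data
print(find_gaps(record))  # [2, 3, 5]
print(fill_interior_gaps(record))
```

Here the two interior months are filled with 14.0 and 16.0, while the trailing gap stays `None` because there is no later reading to anchor it. Interpolation only guesses at what was missed; it cannot recover a cold snap or heat wave that nobody recorded.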

Address those issues, and existing data can be “just as powerful, just as big,” Wiles said. “It makes it possible to make the story very local.”

Climate Central imports data from historical government records to produce highly localized graphics for about 150 local TV weather forecasters across the U.S., illustrating climate change in each station’s particular area. For example, “Junes in Toledo are getting hotter,” Wiles said. “We use these data all the time to try to localize the climate change story so people can understand it.”
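
A claim such as “Junes in Toledo are getting hotter” boils down to fitting a trend to one month’s values, year by year. The article does not say how Climate Central computes its local trends, so the following is only a generic sketch using an ordinary least-squares line. The input format (tuples of year, month and temperature) and the names `june_means` and `linear_trend` are assumptions for illustration.

```python
from collections import defaultdict
from collections.abc import Iterable


def june_means(records: Iterable[tuple[int, int, float]]) -> dict[int, float]:
    """Average every June reading for each year; records are (year, month, temp)."""
    by_year: dict[int, list[float]] = defaultdict(list)
    for year, month, temp in records:
        if month == 6:
            by_year[year].append(temp)
    return {year: sum(temps) / len(temps) for year, temps in sorted(by_year.items())}


def linear_trend(years: list[int], values: list[float]) -> float:
    """Ordinary least-squares slope of values against years (units per year)."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


# Toy input, not real Toledo data; the July reading is ignored.
records = [(2000, 6, 20.0), (2000, 7, 25.0), (2001, 6, 20.5),
           (2002, 6, 21.0), (2003, 6, 21.5)]
means = june_means(records)
slope = linear_trend(list(means), list(means.values()))
print(f"{slope * 10:+.1f} degrees per decade")  # +5.0 degrees per decade
```

A positive slope indicates warming Junes. A short or gappy record can produce a misleading slope, which is one reason the data-quality problems described above matter.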

‘One million hours of computation’

Though the Climate Central map is an effective tool for illustrating the problem of rising sea levels, big data technology is also helping researchers model, analyze and predict the effects of climate change.

“Our goal is to turbo-charge the best science on massive data to create novel insights and drive action,” said Rebecca Moore, engineering manager for Google Earth Engine. Google Earth Engine aims to bring together the world’s satellite imagery (trillions of scientific measurements dating back almost 40 years) and make it available online along with tools for researchers.

Global deforestation, for example, “is a significant contributor to climate change, and until recently you could not find a detailed current map of the state of the world’s forests anywhere,” Moore said. That changed last November, when Science magazine published the first high-resolution maps of global forest change from 2000 to 2012, powered by Google Earth Engine.

“We ran forest-mapping algorithms developed by Professor Matt Hansen of the University of Maryland on almost 700,000 Landsat satellite images, a total of 20 trillion pixels,” she said. “It required more than one million hours of computation, but because we ran the analysis on 10,000 computers in parallel, Earth Engine was able to produce the results in a matter of days.”

On a single computer, that analysis would have taken more than 15 years. Anyone in the world can view the resulting interactive global map on a PC or mobile device.
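
The figures quoted for the forest-mapping run can be checked with back-of-the-envelope arithmetic. This sketch assumes that the work divides perfectly evenly across machines with no coordination overhead (real clusters never quite achieve this) and treats “one million hours” as single-computer hours.

```python
# Back-of-the-envelope check of the Earth Engine forest-mapping figures.
# Assumes perfectly even division of work and zero coordination overhead.

HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365  # ignores leap days


def ideal_wall_clock_hours(total_compute_hours: float, machines: int) -> float:
    """Elapsed time if the work splits evenly across machines (ideal speedup)."""
    return total_compute_hours / machines


compute_hours = 1_000_000  # "more than one million hours of computation"
machines = 10_000          # "10,000 computers in parallel"

parallel_hours = ideal_wall_clock_hours(compute_hours, machines)
serial_years = compute_hours / HOURS_PER_YEAR

print(f"{parallel_hours:.0f} hours (~{parallel_hours / HOURS_PER_DAY:.1f} days) on {machines:,} computers")
print(f"~{serial_years:.0f} years on a single computer")
```

Under those assumptions, the parallel run takes about 100 hours, roughly four days, which matches “a matter of days.” The serial figure of roughly 114 years is consistent with the article’s “more than 15 years.”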

‘We have sensors everywhere’

Rapidly propelling such developments, meanwhile, is the fact that data is being collected today on a larger scale than ever before.

“Big data in climate first means that we have sensors everywhere: in space, looking down via remote sensing satellites, and on the ground,” said Kirk Borne, a data scientist and professor at George Mason University. Those sensors are continually recording information about weather, land use, vegetation, oceans, ice cover, precipitation, drought, water quality and many more variables, he said. They are also tracking correlations between datasets, for example biodiversity changes, invasive species and at-risk species.

Two large monitoring projects of this kind are NEON (the National Ecological Observatory Network) and OOI (the Ocean Observatories Initiative).

“All of these sensors also deliver a vast increase in the rate and the number of climate-related parameters that we are now measuring, monitoring and tracking,” Borne said. “These data give us increasingly deeper and broader coverage of climate change, both temporally and geospatially.”

Climate change is one of the largest examples of scientific modeling and simulation, Borne said. Efforts are focused not on tomorrow’s weather but on decades and centuries into the future.

“Huge climate simulations are now run daily, if not more frequently,” he said. These simulations have increasingly higher horizontal spatial resolution (tens of kilometers, versus hundreds of kilometers in older simulations); higher vertical resolution, referring to the number of atmospheric layers that can be modeled; and higher temporal resolution, zeroing in on minutes or hours as opposed to days or weeks, he added.
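
Counting grid points shows why finer resolution inflates output so quickly. This rough sketch assumes equal square cells covering Earth’s roughly 510 million square kilometers of surface. Real models use latitude-longitude, cubed-sphere or other grids, so treat the numbers as orders of magnitude only. The 30-layer, 30-minute settings in the loop are hypothetical and do not come from the article.

```python
EARTH_SURFACE_KM2 = 510e6  # approximate surface area of Earth in square kilometers


def horizontal_columns(spacing_km: float) -> float:
    """Approximate number of grid columns for square cells spacing_km on a side."""
    return EARTH_SURFACE_KM2 / spacing_km ** 2


def points_per_simulated_day(spacing_km: float, layers: int, step_minutes: float) -> float:
    """Grid columns x vertical layers x time steps in one simulated day.

    This counts values for a single variable; model output multiplies it by
    every variable saved (temperature, wind, humidity and so on).
    """
    steps_per_day = 24 * 60 / step_minutes
    return horizontal_columns(spacing_km) * layers * steps_per_day


for spacing in (200, 100, 25):
    # Hypothetical settings: 30 vertical layers, one time step every 30 minutes.
    points = points_per_simulated_day(spacing, layers=30, step_minutes=30)
    print(f"{spacing:>4} km grid: {horizontal_columns(spacing):>9,.0f} columns, "
          f"{points:,.0f} values per variable per simulated day")
```

Because the column count scales with the inverse square of the spacing, moving from 200 km to 25 km cells multiplies it by (200/25)² = 64, before any extra layers or shorter time steps are added.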

The output of each daily simulation amounts to petabytes of data and requires an assortment of tools for storing, processing, analyzing, visualizing and mining.

‘All models are wrong, but some are useful’

Interpreting climate change data may be the most challenging part.

“When working with big data, it is easy to create a model that explains the correlations that we discover in our data,” Borne said. “But we need to remember that correlation does not imply causation, and so we need to apply systematic scientific methodology.”
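
Here is a small illustration of Borne’s warning. Two series that have nothing in common except that both drift upward over time will still show a strong correlation. The series below are invented for the demonstration. Taking first differences (step-to-step changes) is one simple way to strip out a shared linear trend. It is a common diagnostic, not a test of causation, and the helper names are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def first_differences(values: list[float]) -> list[float]:
    """Change from each step to the next; removes a linear trend."""
    return [b - a for a, b in zip(values, values[1:])]


steps = range(20)
# Two invented series: each is an upward trend plus its own unrelated wiggle.
series_a = [0.5 * t + (1 if t % 2 else -1) for t in steps]
series_b = [3.0 * t + (2 if t % 3 == 0 else 0) for t in steps]

print(f"raw correlation:    {pearson(series_a, series_b):.2f}")
print(f"after differencing: {pearson(first_differences(series_a), first_differences(series_b)):.2f}")
```

The raw correlation is high because both series share a trend. Once the trend is removed, the correlation drops to near zero, because the wiggles were never related.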

It’s also important to heed the maxim that “all models are wrong, but some are useful,” Borne said, quoting statistician George Box. “This is especially critical for numerical computer simulations, where there are so many assumptions and ‘parameterizations of our ignorance.’”

“What fixes that problem, and also addresses Box’s warning, is data assimilation,” Borne said, referring to the process by which “we incorporate the latest and greatest observational data into the current model of a real system in order to correct, adjust and validate. Big data play a vital and essential role in climate prediction science by providing corrective actions through ongoing data assimilation.”
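
At its simplest, data assimilation blends a model’s forecast with a new observation, trusting each in proportion to its certainty. The sketch below shows the textbook one-variable Kalman-filter update. It illustrates the idea only and is not the method of any system mentioned in the article. Operational centers use far larger schemes, such as variational methods and ensemble Kalman filters, over millions of variables. The persistence “model” and all numbers in the usage are hypothetical.

```python
def assimilate(forecast: float, forecast_var: float,
               observation: float, obs_var: float) -> tuple[float, float]:
    """Scalar Kalman update: combine a forecast and an observation.

    Each input comes with an error variance; the gain weights the observation
    more heavily when the forecast is the less certain of the two.
    Returns (analysis, analysis_variance).
    """
    gain = forecast_var / (forecast_var + obs_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var


def run_cycles(state: float, state_var: float, observations: list[float],
               obs_var: float, model_error_var: float) -> list[float]:
    """Alternate forecast and correction steps over a series of observations.

    The hypothetical forecast model is persistence (the next value equals the
    current one), with its uncertainty growing by model_error_var each step.
    """
    analyses = []
    for obs in observations:
        state_var += model_error_var  # forecast step: uncertainty grows
        state, state_var = assimilate(state, state_var, obs, obs_var)  # correction
        analyses.append(state)
    return analyses


# Equal uncertainties: the analysis lands halfway between forecast and observation.
print(assimilate(20.0, 4.0, 22.0, 4.0))  # (21.0, 2.0)
```

Feeding `run_cycles` a stream of observations pulls the state toward the data one step at a time, while the analysis variance tracks how much confidence the blend deserves. This is a toy version of the repeated correct-and-continue loop that the quote describes as ongoing data assimilation.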

‘We are in a data revolution’