Evaluating the suitability of MapReduce for surface temperature analysis codes

Vinay Sudhakaran, Neil Chue Hong

Research output: Contribution to conference › Paper › peer-review

Abstract

Processing large volumes of scientific data requires an efficient and scalable parallel computing framework to obtain meaningful information quickly. In this paper, we evaluate the suitability of a scientific application from the environmental sciences for the MapReduce framework. We consider cccgistemp (a Python reimplementation of the original NASA GISS model for estimating global temperature change), which takes land and ocean temperature records from different sites, removes duplicate records, and adjusts for urbanisation effects before calculating the 12-month running mean global temperature. The application consists of several stages, each displaying differing characteristics, and three stages have been ported to use Hadoop with the mrjob library. We note performance bottlenecks encountered while porting and suggest possible solutions, including modification of data access patterns to overcome uneven distribution of input data.
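
The processing steps named in the abstract (grouping records by site, removing duplicates, and taking a 12-month running mean) can be illustrated with a minimal map/reduce sketch. This is not the cccgistemp code or the Hadoop port described in the paper, and it does not use mrjob: the record layout, the function names (`map_record`, `reduce_station`, `run_local`) and the in-memory shuffle are all assumptions made for illustration. The urbanisation adjustment is omitted, and months are assumed to be contiguous for each station.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (station_id, year, month, temperature_celsius)
Record = Tuple[str, int, int, float]
Reading = Tuple[int, int, float]


def map_record(record: Record) -> Iterator[Tuple[str, Reading]]:
    """Map step: key each reading by its station."""
    station_id, year, month, temp = record
    yield station_id, (year, month, temp)


def running_mean(values: List[float], window: int = 12) -> List[float]:
    """Mean of every full run of `window` consecutive values."""
    if len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


def reduce_station(station_id: str,
                   readings: Iterable[Reading]) -> Tuple[str, List[float]]:
    """Reduce step: drop duplicate months, sort by date, then smooth."""
    unique: Dict[Tuple[int, int], float] = {}
    for year, month, temp in readings:
        # Assumption: the first reading seen for a given month wins.
        unique.setdefault((year, month), temp)
    ordered = [unique[key] for key in sorted(unique)]
    return station_id, running_mean(ordered)


def run_local(records: Iterable[Record]) -> Dict[str, List[float]]:
    """Simulate the shuffle phase in memory and run both steps."""
    grouped: Dict[str, List[Reading]] = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return dict(reduce_station(k, v) for k, v in grouped.items())
```

In a real Hadoop job the framework would perform the grouping that `run_local` simulates, and the map and reduce functions would run on separate workers.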
Original language: English
Number of pages: 10
Publication status: Published - Nov 2011
Event: SuperComputing 2011 (SC11) - Seattle, United States
Duration: 12 Nov 2011 to 18 Nov 2011

Conference

Conference: SuperComputing 2011 (SC11)
Country/Territory: United States
City: Seattle
Period: 12/11/11 to 18/11/11

Keywords

  • Data-intensive
  • MapReduce
  • Hadoop
  • environmental sciences
