Twenty-one land-surface schemes (LSSs) participated in the Project for Intercomparison of Land-surface Parameterizations (PILPS) Phase 2(e) experiment, which used data from the Torne-Kalix Rivers in northern Scandinavia. Atmospheric forcing data (precipitation, air temperature, specific humidity, wind speed, downward shortwave and longwave radiation) for a 20-year period (1979-1998) were provided to the 21 participating modeling groups for the 218 1/4° grid cells that represented the study domain. The first decade (1979-1988) of the period was used for model spin-up. The quality of meteorological forcing variables is of particular concern in high-latitude experiments, and the quality of the gridded dataset was assessed to the extent possible. The lack of sub-daily precipitation, the underestimation of true precipitation, and the necessity to estimate incoming solar radiation were the primary data concerns for this study. The results from two of the three types of runs are analyzed in this, the first of a three-part paper: (1) calibration-validation runs, in which calibration of model parameters against observed streamflow was allowed for two small catchments (570 and 1300 km²) and the parameters were then transferred to two other catchments of roughly similar size (2600 and 1500 km²) to assess the ability of models to represent ungauged areas elsewhere; and (2) reruns using revised forcing data (to resolve an apparent underestimation of solar radiation of approximately 36% and certain other problems with surface wind in the original forcing data). Model results for the period 1989-1998 are used to evaluate the performance of the participating land-surface schemes in a context that allows exploration of their ability to capture key processes spatially. In general, the experiment demonstrated that many of the LSSs are able to capture the limitations imposed on annual latent heat by the small net radiation available in this high-latitude environment.
Simulated annual average net radiation varied between 16 and 40 W/m² for the 21 models, and latent heat varied between 18 and 36 W/m². Among-model differences in winter latent heat due to the treatment of aerodynamic resistance appear to be at least as important as those attributable to the treatment of canopy interception. In many models, the small annual net radiation forced negative sensible heat on average, which varied among the models between -11 and 9 W/m². Even though the largest evaporation rates occur in the summer (June, July and August), model-predicted snow sublimation in winter has proportionately more influence on differences in annual runoff volume among the models. A calibration experiment for four small sub-catchments of the Torne-Kalix basin showed that the model parameters typically adjusted during calibration (those that control storage of moisture in the soil column or on the land surface via ponding) influence the seasonal distribution of runoff but have relatively little impact on annual runoff ratios. Similarly, there was no relationship between annual runoff ratios and the proportion of surface and subsurface discharge for the basin as a whole. © 2003 Elsevier Science B.V. All rights reserved.
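
The link between small annual net radiation and negative sensible heat can be illustrated with the surface energy balance. The sketch below is not taken from any participating scheme. It assumes that, averaged over a year, ground heat flux and other storage terms (including snowmelt energy) are negligible, so sensible heat is simply the residual of net radiation minus latent heat. The function name and the flux pairings are hypothetical; the values are chosen from within the reported ranges, not from any individual model.

```python
def sensible_heat_residual(net_radiation, latent_heat, ground_heat=0.0):
    """Annual-mean sensible heat flux (W/m^2) as the energy-balance residual.

    Assumes Rn = H + LE + G, with G (ground heat and other storage terms)
    defaulting to zero at the annual scale. This is an illustrative
    simplification, not the formulation of any particular LSS.
    """
    return net_radiation - latent_heat - ground_heat


# Hypothetical (net radiation, latent heat) pairs in W/m^2, picked from
# within the ranges quoted in the abstract; NOT actual model results.
illustrative_pairs = [(16.0, 18.0), (30.0, 25.0), (40.0, 36.0)]

for rn, le in illustrative_pairs:
    h = sensible_heat_residual(rn, le)
    sign = "negative" if h < 0 else "non-negative"
    print(f"Rn={rn:5.1f}  LE={le:5.1f}  ->  H={h:5.1f} W/m^2 ({sign})")
```

Under this simplification, whenever simulated latent heat exceeds net radiation, the residual sensible heat must be negative, meaning the atmosphere supplies heat to the surface on average.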