This work is concerned with the convergence of Gaussian process regression. A particular focus is on hierarchical Gaussian process regression, where hyper-parameters appearing in the mean and covariance structure of the Gaussian process emulator are a-priori unknown, and are learnt from the data, along with the posterior mean and covariance. We work in the framework of empirical Bayes, where a point estimate of the hyper-parameters is computed, using the data, and then used within the standard Gaussian process prior to posterior update. We provide a convergence analysis that (i) holds for any continuous function f to be emulated; and (ii) shows that convergence of Gaussian process regression is unaffected by the additional learning of hyper-parameters from data, and is guaranteed in a wide range of scenarios. As the primary motivation for the work is the use of Gaussian process regression to approximate the data likelihood in Bayesian inverse problems, we provide a bound on the error introduced in the Bayesian posterior distribution in this context.