TY - JOUR
T1 - Predicting the neutral hydrogen content of galaxies from optical data using machine learning
AU - Rafieferantsoa, Mika
AU - Andrianomena, Sambatra
AU - Davé, Romeel
N1 - Near final version deposited to arXiv 22.03.18, before e-publication date 05.07.18.
PY - 2018/10/1
Y1 - 2018/10/1
AB - We develop a machine learning-based framework to predict the HI content of galaxies from optical photometry and environmental parameters. We train the algorithm on z = 0-2 outputs from the MUFASA cosmological hydrodynamic simulation, which includes star formation, feedback, and a heuristic model to quench massive galaxies that yields a reasonable match to a range of survey data including HI. We employ a variety of machine learning methods (regressors) and quantify their performance using the slope of the predicted versus true relation, its root mean square error (RMSE), and the Pearson correlation coefficient (r). Training on only Sloan Digital Sky Survey photometry, all regressors give r > 0.8 and RMSE ~ 0.3 at z = 0, led by random forests with r = 0.91, with a deep neural network (DNN) achieving comparable accuracy (r = 0.9). Adding near-IR photometry improves all regressors. All regressors perform worse with increasing redshift, particularly at z ≳ 1. Slope values are generally sub-linear, so that we overpredict HI in HI-poor galaxies and underpredict it in HI-rich galaxies, because the regressors do not fully capture the scatter in the data. We test our framework on REsolved Spectroscopy Of a Local VolumE (RESOLVE) and Arecibo Legacy Fast ALFA (ALFALFA) survey data. Training on a subset of the observations, we find that our machine learning method can reasonably predict HI richnesses in the remaining data (RMSE ~ 0.28 for RESOLVE and ~0.25 for ALFALFA). Training on mock data from MUFASA to predict observed data performs worse (RMSE ~ 0.45 for RESOLVE and ~0.31 for ALFALFA), with the DNN clearly outperforming the other regressors. Our method will be useful for making galaxy-by-galaxy survey predictions and incompleteness corrections for upcoming HI 21 cm surveys on Square Kilometre Array precursors such as MeerKAT, over regions where photometry is already available.
KW - Galaxies: evolution
KW - Galaxies: statistics
KW - Methods: numerical
UR - http://www.scopus.com/inward/record.url?scp=85051464181&partnerID=8YFLogxK
U2 - 10.1093/mnras/sty1777
DO - 10.1093/mnras/sty1777
M3 - Article
AN - SCOPUS:85051464181
SN - 0035-8711
VL - 479
SP - 4509
EP - 4525
JO - Monthly Notices of the Royal Astronomical Society
JF - Monthly Notices of the Royal Astronomical Society
IS - 4
ER -