We develop a machine learning (ML) framework to populate large dark matter-only simulations with baryonic galaxies. Our ML framework takes input halo properties including halo mass, environment, spin, and recent growth history, and outputs central galaxy and halo baryonic properties, including stellar mass (M*), star formation rate (SFR), metallicity (Z), and neutral (HI) and molecular (H2) hydrogen mass. We apply this to the MUFASA cosmological hydrodynamic simulation, and show that it recovers the mean trends of output quantities with halo mass highly accurately, including following the sharp drop in SFR and gas in quenched massive galaxies. However, the scatter around the mean relations is underpredicted. Examining galaxies individually, at z = 0, the stellar mass and metallicity are accurately recovered (σ ≲ 0.2 dex), but SFR and HI show larger scatter (σ ≳ 0.3 dex); these values improve somewhat at z = 1 and 2. Remarkably, ML quantitatively recovers second parameter trends in galaxy properties, e.g. that galaxies with higher gas content and lower metallicity have higher SFR at a given M*. Testing various ML algorithms, we find that none perform significantly better than the others, nor does ensembling improve performance, likely because none of the algorithms reproduce the large observed scatter around the mean properties. For the random forest algorithm, we find that halo mass and nearby (~200 kpc) environment are the most important predictive variables, followed by growth history, while halo spin and ~Mpc-scale environment are not important. Finally, we study the impact of additionally inputting key baryonic properties M*, SFR, and Z, as would be available e.g. from an equilibrium model, and show that providing the SFR in particular enables HI to be recovered substantially more accurately.
- Cosmology: theory
- Galaxies: evolution
- Large-scale structure of Universe