We search for the baryon acoustic oscillation (BAO) feature in the projected cross-correlation function, binned in transverse comoving radius, between the SDSS-IV DR16 eBOSS quasars and a dense photometric sample of galaxies selected from the DESI Legacy Imaging Surveys. We estimate the density of the photometric galaxy sample in the targeted redshift range to be about 2900 deg⁻², from a selection that is deeper than the official DESI emission line galaxy selection, while the density of the spectroscopic quasar sample is about 20 deg⁻². To mitigate the systematics that arise from using different imaging surveys close to their detection limit, we use a neural network approach that accounts for complex dependences between the imaging attributes and the observed galaxy density. We find that the method is limited by the depth of the imaging surveys, which affects the density and purity of the photometric sample as well as its overlap in redshift with the quasar sample. When the photometric galaxies are cross-correlated with quasars in the range 0.6 ≤ z ≤ 1.2, the cross-correlation function provides a tighter constraint on the comoving angular diameter distance D_M (6 per cent precision) than the constraint on the spherically averaged distance D_V (9 per cent precision) obtained from the autocorrelation. Although not yet competitive, this technique will benefit from the deeper photometric data of upcoming surveys, which will allow it to go beyond the limitations identified in this work.
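As an illustration of what binning in transverse comoving radius involves, the sketch below converts an angular quasar-galaxy separation into a transverse comoving separation by placing the photometric galaxy at the quasar redshift, r_p = D_M(z_q) θ. This is a minimal sketch rather than the pipeline used in this work: the function names, the flat ΛCDM model, and the parameter values (h = 0.7, Ω_m = 0.3) are assumptions made for the example.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, h=0.7, omega_m=0.3, n_steps=2048):
    """Comoving distance in Mpc for a flat LCDM model.

    The cosmological parameters are illustrative assumptions,
    not the fiducial values of the analysis.
    """
    zs = np.linspace(0.0, z, n_steps)
    # Dimensionless Hubble rate E(z) for matter plus a cosmological constant.
    e_z = np.sqrt(omega_m * (1.0 + zs) ** 3 + (1.0 - omega_m))
    integrand = 1.0 / e_z
    # Trapezoidal integration of dz / E(z), written out explicitly.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(zs))
    return C_KM_S / (100.0 * h) * integral


def transverse_radius(theta_deg, z_quasar, **cosmo):
    """Transverse comoving separation r_p = D_M(z_q) * theta, in Mpc.

    In a flat universe D_M equals the comoving distance. The galaxy is
    assumed to sit at the quasar redshift, since it has no redshift of its own.
    """
    return comoving_distance(z_quasar, **cosmo) * np.deg2rad(theta_deg)


if __name__ == "__main__":
    # Separation subtended by 1 degree at a quasar redshift of 1.0.
    print(f"r_p = {transverse_radius(1.0, 1.0):.1f} Mpc")
```

Because every pair is assigned the distance of its quasar, a galaxy that is not at the quasar redshift is placed at the wrong transverse separation, which is one way to see why the redshift overlap between the two samples matters for the method.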