TY - JOUR
T1 - A neural network family for systematic analysis of RF size and computational-path-length distribution as determinants of neural predictivity and behavioral performance
AU - Peters, Benjamin
AU - Stoffl, Lucas
AU - Kriegeskorte, Nikolaus
PY - 2022/12/1
Y1 - 2022/12/1
N2 - Deep feedforward convolutional neural network models (FCNNs) explain aspects of the representational transformations in the visual hierarchy. However, particular models implement idiosyncratic combinations of architectural hyperparameters, which limits theoretical progress. In particular, the size of receptive fields (RFs) and the distribution of computational path lengths (CPL; the number of nonlinearities encountered) leading up to a representational stage are confounded across layers of the same architecture (deeper layers have larger RFs) and depend on idiosyncratic choices (kernel sizes, depth, skip connections) across architectures. Here we introduce HBox, a family of architectures designed to break the confounding of RF size and CPL. Like conventional FCNNs, an HBox model contains a feedforward hierarchy of convolutional feature maps. Unlike in FCNNs, each map has a predefined RF size that can result from shorter or longer computational paths, or any combination thereof (through skip connections). We implemented a large sample of HBox models and investigated how RF size and CPL jointly account for neural predictivity and behavioral performance. The model set also provides insight into the joint contribution of deep and broad pathways, which achieve complexity through long or numerous computational paths, respectively. When controlling for the number of parameters, we find that visual tasks with higher complexity (CIFAR-10, ImageNet) and occlusion (Digitclutter; Spoerer et al., 2017) show peak performance in models that trade off breadth to achieve greater depth (higher average CPL). The opposite holds for a simpler task (MNIST). We further disentangle the contributions of CPL and RF size to the match between brain and model representations by assessing the ability of HBox models to predict visual representations in regions of interest in a large-scale fMRI benchmark (Natural Scenes Dataset; Allen et al., 2021). The HBox architecture family illustrates how highly parameterized, task-performing vision models can be used systematically to gain theoretical insights into the neural mechanisms of vision.
AB - Deep feedforward convolutional neural network models (FCNNs) explain aspects of the representational transformations in the visual hierarchy. However, particular models implement idiosyncratic combinations of architectural hyperparameters, which limits theoretical progress. In particular, the size of receptive fields (RFs) and the distribution of computational path lengths (CPL; the number of nonlinearities encountered) leading up to a representational stage are confounded across layers of the same architecture (deeper layers have larger RFs) and depend on idiosyncratic choices (kernel sizes, depth, skip connections) across architectures. Here we introduce HBox, a family of architectures designed to break the confounding of RF size and CPL. Like conventional FCNNs, an HBox model contains a feedforward hierarchy of convolutional feature maps. Unlike in FCNNs, each map has a predefined RF size that can result from shorter or longer computational paths, or any combination thereof (through skip connections). We implemented a large sample of HBox models and investigated how RF size and CPL jointly account for neural predictivity and behavioral performance. The model set also provides insight into the joint contribution of deep and broad pathways, which achieve complexity through long or numerous computational paths, respectively. When controlling for the number of parameters, we find that visual tasks with higher complexity (CIFAR-10, ImageNet) and occlusion (Digitclutter; Spoerer et al., 2017) show peak performance in models that trade off breadth to achieve greater depth (higher average CPL). The opposite holds for a simpler task (MNIST). We further disentangle the contributions of CPL and RF size to the match between brain and model representations by assessing the ability of HBox models to predict visual representations in regions of interest in a large-scale fMRI benchmark (Natural Scenes Dataset; Allen et al., 2021). The HBox architecture family illustrates how highly parameterized, task-performing vision models can be used systematically to gain theoretical insights into the neural mechanisms of vision.
U2 - 10.1167/jov.22.14.4287
DO - 10.1167/jov.22.14.4287
M3 - Meeting abstract
SN - 1534-7362
VL - 22
JO - Journal of Vision
JF - Journal of Vision
IS - 14
ER -