The use of high-quality speech signals has led to considerable breakthroughs in Parkinson's Disease (PD) research in the last decade. These include accurate differentiation of PD from Healthy Controls (HC) and longitudinal monitoring of PD symptom severity. We recently concluded the Parkinson's Voice Initiative (PVI) study, which collected data from a very large cohort under non-controlled acoustic conditions. We acoustically characterized 11,942 recordings from 6,531 US-based participants using 307 dysphonia measures. We selected a robust subset of 30 dysphonia measures using Gram-Schmidt Orthogonalization (GSO). We projected the data onto a two-dimensional representation using t-distributed stochastic neighbor embedding (t-SNE) to facilitate visual exploration, and used hierarchical clustering to understand data homogeneity. We demonstrate that there is considerable overlap between PD and HC in the projected feature space, making the binary classification task particularly challenging. Hierarchical clustering grouped the data into nine clusters, in broad agreement with the projected two-dimensional representation. These results provide new insights into the challenges posed by the PVI project, in which acoustic recording conditions were not controlled.
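As a hedged illustration of the feature selection step, the following dependency-free sketch shows one common formulation of GSO-based forward selection: pick the feature most aligned with the response, then project that feature out of the response and of all remaining features before picking the next. The abstract does not specify the exact GSO variant or the response variable used in the PVI study, so the function names (`gso_select`, `make_demo_data`), the squared-cosine ranking criterion, and the synthetic data are all assumptions made for illustration only.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def center(v):
    """Return a mean-centered copy of v."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def remove_component(v, direction):
    """Return v minus its projection onto direction."""
    denom = dot(direction, direction)
    if denom < 1e-12:  # direction is numerically zero: nothing to remove
        return list(v)
    scale = dot(v, direction) / denom
    return [a - scale * d for a, d in zip(v, direction)]


def squared_cosine(u, v):
    """Squared cosine of the angle between u and v (0.0 if either is ~zero)."""
    nu, nv = dot(u, u), dot(v, v)
    if nu < 1e-12 or nv < 1e-12:
        return 0.0
    return dot(u, v) ** 2 / (nu * nv)


def gso_select(columns, target, n_select):
    """Greedy forward selection via Gram-Schmidt orthogonalization.

    columns: list of feature columns (hypothetical stand-ins for dysphonia measures)
    target:  response values (assumed here; not specified in the abstract)
    Returns the indices of the selected columns, in selection order.
    """
    cols = [center(c) for c in columns]
    y = center(target)
    selected, remaining = [], list(range(len(cols)))
    while remaining and len(selected) < n_select:
        # Rank remaining features by alignment with the current residual target.
        best = max(remaining, key=lambda j: squared_cosine(cols[j], y))
        selected.append(best)
        remaining.remove(best)
        basis = cols[best]
        # Orthogonalize the target and all remaining features against the pick.
        y = remove_component(y, basis)
        for j in remaining:
            cols[j] = remove_component(cols[j], basis)
    return selected


def make_demo_data(n=500, seed=0):
    """Synthetic data: column 1 is a noisy, redundant copy of column 0."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    noisy_a = [x + rng.gauss(0, 1) for x in a]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    target = [2 * x + z for x, z in zip(a, b)]
    return [a, noisy_a, b, noise], target


if __name__ == "__main__":
    columns, target = make_demo_data()
    print(gso_select(columns, target, n_select=2))
```

In this synthetic setup, the redundant column is more strongly correlated with the response than the second informative column, so a plain correlation ranking would select it. Orthogonalizing after each pick removes the information the redundant column shares with the first selection, which is the property that makes this style of selection attractive when many of the 307 measures are likely to be highly correlated.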