Model selection in ADMIXTURE can be inconsistent: proof of the K=2 phenomenon
Do, D.; Terhorst, J.
Abstract
STRUCTURE and ADMIXTURE are two popular methods for detecting population structure in genetic data. They model observed genotypes as mixtures of latent ancestral populations, and the inferred admixture proportions can be used to visualize and summarize population structure. A key parameter in these models is the number of ancestral populations, K. Selecting K is a challenging problem. Perhaps the most widely used method is Evanno's ΔK, which selects K based on the second-order change in log-likelihood as K increases. However, practitioners have often noted that ΔK tends to favor overly small K, frequently returning K=2 even when more meaningful substructure is present. In this paper, we provide a theoretical explanation for this phenomenon: we prove that, under certain conditions, the ΔK method can be inconsistent, meaning that it can fail to identify the true number of populations even with infinite data.
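For readers unfamiliar with the statistic, the following is a minimal sketch of how ΔK is typically computed from replicate runs: the absolute second difference of the log-likelihood in K, divided by the standard deviation of the log-likelihood across runs at that K. The function name `evanno_delta_k` and the toy log-likelihood values are hypothetical and used only for illustration; the second difference is taken here on the per-K means, which is one common reading of the definition and not necessarily the exact form analyzed in the paper.

```python
import numpy as np

def evanno_delta_k(loglik):
    """Compute Evanno's Delta K for each interior K.

    loglik: dict mapping K -> list of log-likelihoods from replicate runs.
    Returns a dict mapping K -> Delta K, defined only where both K-1 and K+1
    were also run and the replicate standard deviation at K is positive.
    """
    ks = sorted(loglik)
    mean = {k: float(np.mean(loglik[k])) for k in ks}
    sd = {k: float(np.std(loglik[k], ddof=1)) for k in ks}  # sample SD across runs
    delta = {}
    for k in ks:
        if (k - 1) in mean and (k + 1) in mean and sd[k] > 0:
            # Absolute second-order difference of the mean log-likelihood in K.
            second_diff = abs(mean[k + 1] - 2.0 * mean[k] + mean[k - 1])
            delta[k] = second_diff / sd[k]
    return delta

# Toy (made-up) log-likelihoods for K = 1..4, two replicate runs each.
toy_loglik = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -901.0],
    3: [-880.0, -882.0],
    4: [-875.0, -878.0],
}
delta = evanno_delta_k(toy_loglik)
selected_k = max(delta, key=delta.get)  # K with the largest Delta K
```

Because the statistic needs both neighbors K-1 and K+1, it is undefined at the smallest and largest K tested, so K=2 is the smallest value it can return when the scan starts at K=1.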