An Expert-Informed Synthetic Animal Data Generator: A Physiology-Consistent Generative Framework for High-Fidelity Animal Digital Twins

This paper is a preprint and has not been certified by peer review.

Authors

Youssef, A.; Sun, C.; Norton, T.

Abstract

Digital twins are increasingly recognized as a transformative technology for precision livestock farming; however, a major bottleneck in their development remains the scarcity of high-quality, high-granularity physiological data. This study introduces the expert-informed conditional diffusion (EICD) framework, a novel approach to synthesizing high-fidelity metabolic time-series trajectories by embedding mechanistic biological principles directly into the generative process. Traditional generative models often prioritize statistical pattern-matching over biological reality, frequently producing physiological hallucinations. The EICD framework instead uses a physiology loss function (PhLF) as a form of mechanistic regularization. This guardrail penalizes samples that contradict expert-defined constraints, such as the laws of porcine bioenergetics, steering the model toward a realistic physiological manifold. The framework was validated on an empirical dataset of growing pigs under varying thermal conditions. Quantitatively, the model achieved high distributional fidelity, with an average Jensen-Shannon divergence (JSD) of 0.062 and a Kullback-Leibler divergence (KLD) of 0.19. The full EICD model produced a mean energy expenditure (EE) of 284.94 ± 38.70 kJ/kg/day, closely matching the empirical average of 281.33 ± 41.58 kJ/kg/day. In contrast, the standard generative diffusion model (i.e., with no physiology guardrail) exhibited significant distributional drift, yielding a mean EE of 334.41 kJ/kg/day. The biological integrity of the model was further assessed using the biological violation rate (BVR), a novel metric defined as the percentage of generated samples that fall outside the physically possible metabolic boundaries established by species-specific laws. While the standard diffusion model produced frequent biological artifacts, the EICD framework suppressed these hallucinations, ensuring that synthetic trajectories remain grounded in mechanistic laws. Despite these advances, limitations remain at physiological extremes where individual stochasticity is high. By offering a reliable method for generating physiology-consistent synthetic data, this framework provides a robust foundation for the next generation of animal digital twins.
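
The evaluation metrics named in the abstract (JSD, KLD, and BVR) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the histogram-based density estimate, the natural-log base, the smoothing constant, and the treatment of each sample as a single scalar EE value are all assumptions made here for illustration. The lower and upper bounds passed to the BVR function are hypothetical placeholders for whatever species-specific limits an expert would supply.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalised equal-width histogram of the values that fall in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside [lo, hi]")
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats; eps is an assumed smoothing term to avoid log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in nats (bounded above by ln 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def biological_violation_rate(samples: Sequence[float], lower: float, upper: float) -> float:
    """Percentage of samples outside the (hypothetical) bounds [lower, upper]."""
    if not samples:
        raise ValueError("samples must be non-empty")
    violations = sum(1 for s in samples if s < lower or s > upper)
    return 100.0 * violations / len(samples)


# Illustrative usage with made-up numbers (not data from the study).
empirical_ee = [250.0, 270.0, 280.0, 290.0, 310.0]
synthetic_ee = [255.0, 275.0, 285.0, 295.0, 420.0]
p = histogram(empirical_ee, 200.0, 450.0, bins=5)
q = histogram(synthetic_ee, 200.0, 450.0, bins=5)
jsd = js_divergence(p, q)
kld = kl_divergence(p, q)
bvr = biological_violation_rate(synthetic_ee, lower=200.0, upper=400.0)
```

In this sketch a lower JSD or KLD indicates that the synthetic distribution sits closer to the empirical one, while the BVR counts hard violations regardless of how well the distributions match, which is why the two kinds of metric complement each other.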
