An improved generic schema for high fidelity data linkage and sample tracing across complex multi-assay medical entomology studies

This paper is a preprint and has not been certified by peer review.

Authors

Kavishe, D. R.; Msoffe, R. V.; Mmbaga, S.; Tarimo, L. J.; Butler, F.; Kaindoa, E. W.; Govella, N. J.; Kiware, S. S.; Killeen, G.

Abstract

Evidence-based decision making on malaria vector control strategies increasingly relies on the triangulation of data, which requires informatics systems that can integrate data from complex, multi-stage studies involving mosquitoes. This manuscript describes a performance evaluation of an extended version of the generic schema underpinning the VBDs360 platform, specifically improved to accommodate multiple distinct entomological assays spanning the field, insectary, and laboratory. The utility of this extension for high-fidelity data linkage and robust sample traceability across complex entomological workflows was evaluated through a case study conducted in southern Tanzania. Wild female mosquitoes were collected from 40 locations across more than 4,000 km² and then reared through multiple generations in an insectary before derived iso-female lineages were tested for phenotypic susceptibility to a pyrethroid insecticide. These multi-generational lineages (F0 to Fn, where n ≥ 2) were propagated to exclude non-heritable maternal effects on phenotype and to produce enough progeny for standard WHO susceptibility assays. All samples were subsequently archived in a molecular laboratory, where all F0 specimens were tested for sibling species identity. A paper-based implementation of the extended schema enabled the successful integration of 77,017 lines of data distributed across six tables, spanning three distinct workflows (field, insectary, and laboratory) carried out by three different teams working in different locations. At each step, fully independent and redundant primary and secondary keys enabled high-fidelity error correction and sample tracing. Consistently perfect linkage between assay design and sample sorting data was achieved for F0 wild-caught adults, with 100% of 66,108 records successfully linked between field capture and morphological categorization.
This complete traceability extended to the propagation of derived Fn lineages: all 100 and 243 records from 9 adult-derived and 13 larval-derived lineages, respectively, were correctly linked. Insecticide susceptibility phenotyping further showed 100% linkage between exposure history and recorded mortality outcomes for 5,654 insectary records. Although cross-cleaned linkages to the sample analysis and storage data recorded by the laboratory team were not perfect and could be improved, they were nevertheless of very high fidelity: 97.3% (1,967/2,022) for F0 samples and 99.3% (437/440) for Fn samples. Overall, this pilot application of the extended generic schema ensured robust data provenance and minimized transcription errors in a complex study distributed across multiple teams and locations. These findings demonstrate how this generic informatics framework may be scaled and adapted to support data integrity across diverse, large-scale, multi-team entomological research workflows.
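The abstract does not spell out how redundant primary and secondary keys are used to correct errors or how linkage rates are computed. The Python sketch below illustrates one plausible approach under stated assumptions: a hypothetical `Record` row with a primary `sample_id` and a secondary key built independently from location, trap, and box position. The field names, the `link` and `linkage_rate` helpers, and the fallback rule (accept a secondary-key match only when it is unique) are illustrative assumptions, not the VBDs360 schema itself. The toy records are invented for demonstration; only the final line uses counts reported above, reproducing the F0 and Fn laboratory linkage percentages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A row in a hypothetical table; all field names are illustrative."""
    sample_id: str  # primary key, e.g. the ID written on a tube or form
    location: str   # secondary key: collection location code
    trap: str       # secondary key: trap or collection-point code
    position: int   # secondary key: e.g. slot number within a storage box

    def secondary(self):
        """Redundant key assembled independently of sample_id."""
        return (self.location, self.trap, self.position)


def link(parents, children):
    """Link child records (e.g. lab results) to parent records (e.g. catches).

    A primary-key match is accepted only if the secondary keys agree too.
    Otherwise the secondary keys alone are tried, and a link is made only
    when exactly one parent matches. Anything else is left for review.
    Returns (links, unlinked), where links is a list of (child_id, parent_id).
    """
    by_id = {p.sample_id: p for p in parents}
    by_secondary = {}
    for p in parents:
        by_secondary.setdefault(p.secondary(), []).append(p)

    links, unlinked = [], []
    for c in children:
        p = by_id.get(c.sample_id)
        if p is not None and p.secondary() == c.secondary():
            links.append((c.sample_id, p.sample_id))  # both keys agree
            continue
        candidates = by_secondary.get(c.secondary(), [])
        if len(candidates) == 1:
            # Primary key missing or inconsistent: corrected via secondary keys.
            links.append((c.sample_id, candidates[0].sample_id))
        else:
            unlinked.append(c)  # ambiguous or no match: flag for manual review
    return links, unlinked


def linkage_rate(n_linked, n_total):
    """Percentage of records linked, rounded to one decimal place."""
    return round(100 * n_linked / n_total, 1)


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    parents = [
        Record("A1", "L01", "T1", 1),
        Record("A2", "L01", "T1", 2),
        Record("B7", "L02", "T3", 1),
    ]
    children = [
        Record("A1", "L01", "T1", 1),  # clean link
        Record("A7", "L02", "T3", 1),  # mistyped ID, recovered as B7
        Record("C9", "L03", "T9", 4),  # no match: flagged for review
    ]
    links, unlinked = link(parents, children)
    print(links)                            # [('A1', 'A1'), ('A7', 'B7')]
    print([c.sample_id for c in unlinked])  # ['C9']

    # Reported F0 and Fn laboratory linkage counts.
    print(linkage_rate(1967, 2022), linkage_rate(437, 440))  # 97.3 99.3
```

A production pipeline would probably also enforce one-to-one links (so two child rows cannot claim the same parent) and keep an audit log of every corrected link; both are omitted here to keep the sketch short.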