About the Role
The Associate, Data and Statistics owns the design, execution, and delivery of statistical analyses across Shamiri’s research and operational portfolio. The role drives data quality systems, leads end-to-end analytic workstreams, and produces rigorous outputs that directly inform publications, grant reporting, internal decision-making, and program design. This is an individual contributor role with clear ownership over specific projects and the analytical infrastructure that underpins them.
Shamiri conducts randomized trials, pilots, longitudinal studies, implementation evaluations, and operational performance tracking. As the organization scales, the volume and complexity of its data increase. Reliable data systems and structured analytic workflows are essential to maintain scientific rigor and enable continuous improvement.
The Associate owns data cleaning and validation pipelines, executes pre-registered and exploratory analyses, builds and maintains reproducible analytic workflows, and produces publication-ready tables and figures. Working across the Research and Learning team, this role takes primary ownership of assigned study datasets and delivers analytics that are scientifically sound and operationally useful.
Responsibilities
Data management and quality systems
Design, own, and maintain reproducible data cleaning pipelines in R for assigned study and operational datasets.
Develop and execute systematic validation protocols covering range checks, missingness patterns, duplicate detection, and cross-source consistency; document findings and drive resolution with field teams.
Own and maintain codebooks, data dictionaries, and metadata documentation as living assets updated with each data collection wave.
Identify, diagnose, and resolve anomalies and inconsistencies in datasets, determining root causes and recommending process improvements to prevent recurrence.
Lead data quality reviews with research management and field teams, setting standards and timelines for issue resolution.
Statistical analysis and modeling
Lead descriptive and exploratory analyses for assigned studies, synthesizing findings into clear summaries for research and operational audiences.
Execute regression analyses, mixed-effects models, and other inferential models end-to-end, from specification through interpretation and write-up.
Take ownership of analysis plans for assigned workstreams, including structural equation models, mediation analyses, or multilevel models where applicable.
Execute pre-registered analysis plans with fidelity, documenting any deviations and their rationale transparently.
Design and conduct robustness checks, sensitivity analyses, and assumption tests as a standard part of analytic delivery, not as an ad hoc add-on.
Reproducibility and documentation
Set and uphold standards for clean, well-commented, reproducible scripting across the data team, establishing shared conventions for structure and documentation.
Own the team’s analytic repository structure and version control practices, ensuring scripts and outputs are consistently organized, tracked, and accessible.
Produce complete analytic logs documenting all decisions, variable transformations, and cleaning steps, creating an audit trail that meets publication standards.
Conduct peer review of other team members’ scripts, providing structured feedback on reproducibility, clarity, and correctness.
Tables, figures, and reporting
Independently produce publication-quality tables, figures, and summary statistics for donor reports, peer-reviewed manuscripts, and internal decision briefs.
Draft and own the methods and results sections of manuscripts and technical reports, working iteratively with co-authors through peer review.
Act as the statistical accuracy lead for Knowledge team outputs, reviewing written materials and flagging misrepresentations before publication.
Data infrastructure and systems
Own and manage structured databases and data repositories, including access controls, schema documentation, and version history.
Design and run systematic quality checks within database systems, maintaining a log of issues and tracking resolution to closure.
Lead integration of data across platforms including survey tools, clinical systems, and operational dashboards, building pipelines that reduce manual handling and improve data freshness.
Cross-functional collaboration
Manage and triage internal analytics requests from delivery, clinics, technology, and product teams, scoping work, setting timelines, and delivering outputs that are directly usable.
Present analytic findings and interpret results for non-technical audiences in leadership, program, and partner meetings.
Collaborate with research and operational teams to align analytic outputs with their priorities.
Capacity building
Lead training for junior staff and interns on data quality, coding standards, and analytic methods, building a consistent skill baseline across the team.
Design and facilitate internal workshops and SOPs on analytics, reproducibility, and statistical literacy, raising the floor for evidence use across the organization.
Key competencies
Advanced proficiency in R (tidyverse, ggplot2, lme4, or equivalent packages); able to write modular, well-documented scripts and conduct independent code review.
Proficiency in SQL for querying, transforming, and managing structured datasets; able to write and optimise multi-table queries independently.
Familiarity with version control (Git or equivalent) and collaborative analytic workflows.
Deep knowledge of quantitative research methods and study design, including RCTs, pre-registered analyses, power calculations, and inferential statistics.
Ability to independently scope, plan, and deliver multi-month analytic workstreams with minimal supervision.
Rigorous attention to detail with a track record of catching and correcting errors before they reach downstream outputs.
Commitment to reproducibility as a non-negotiable standard, not a best-effort practice.
Ability to translate statistical findings into clear, decision-relevant language for non-specialist audiences without losing analytical integrity.
High professional standards in data ethics, confidentiality, and responsible use of participant data in a global health context.
Qualifications
Master’s degree in statistics, data science, epidemiology, public health, economics, or a related quantitative field; or a Bachelor’s degree with equivalent demonstrated research experience.
3 to 5 years of experience in data analysis, applied statistics, or quantitative research, with demonstrated ownership of analytic workstreams from start to publication or delivery.
Strong proficiency in R, including tidyverse, data.table, or equivalent; able to write and review production-quality analytic scripts without supervision.
Proficiency in SQL for querying and managing structured databases; experience with BigQuery or similar cloud platforms is an advantage.
Hands-on experience with RCT, longitudinal, or implementation evaluation datasets, including pre-registered analysis, randomization checks, and intention-to-treat (ITT) and complier average causal effect (CACE) estimation.
Experience contributing to or co-authoring peer-reviewed publications or technical reports is a strong advantage.