Description
We’re hiring a part-time Research Data Scientist to lead end-to-end preparation of complex, large-scale health datasets for peer-reviewed publication. This role centers on cleaning, harmonizing, and structuring messy, multi-source datasets, followed by advanced statistical analysis and machine learning to generate publishable insights.
You’ll work with survey, observational, and real-world health data, building reproducible analytical workflows that meet academic research standards. This role is best suited for a PhD-trained data scientist or quantitative researcher with deep experience in machine learning, advanced statistics, and real-world data analysis.
Key Responsibilities
Data Cleaning & Harmonization
Clean, normalize, and integrate messy datasets from multiple sources (e.g., survey data from longitudinal studies)
Resolve inconsistencies and schema mismatches across datasets
Design scalable approaches to dataset harmonization for cross-study comparability
Data Pipeline Development
Build and maintain reproducible data processing workflows for large-scale datasets
Structure datasets for downstream statistical modeling and publication-ready outputs
Implement version-controlled workflows for data processing and analysis
Statistical Analysis & Machine Learning
Apply advanced statistical methods (e.g., mixed-effects models, causal inference, longitudinal modeling)
Develop, validate, and interpret machine learning models for large-scale observational data as needed
Ensure methodological rigor aligned with peer-reviewed research standards
Research Collaboration
Partner with researchers to refine hypotheses, define analytic strategies, and interpret findings
Translate complex analyses into clear, defensible results for academic publication
Reproducibility & Publication Support
Develop reproducible codebases and documentation (e.g., notebooks, pipelines)
Prepare datasets, figures, and statistical outputs for manuscripts, abstracts, and reports
Contribute to methodological transparency and auditability of analyses
Write publication-ready technical text (e.g., Results and Methods sections for manuscripts)
Requirements
Qualifications
PhD (preferred) in Data Science, Statistics, Biostatistics, Epidemiology, Computer Science, Experimental Psychology, or a related quantitative field
3–5+ years of experience working with large, complex datasets in research, healthcare, or applied data science
Strong expertise in data cleaning, preprocessing, and dataset harmonization at scale
Advanced proficiency in Python or R (e.g., pandas, tidyverse, scikit-learn, statsmodels), or comparable experience with related software or programming languages
Deep experience with machine learning and advanced statistical methods
Strong foundation in reproducible research practices
Ability to communicate technical findings clearly to interdisciplinary teams and to collaborate with team members on high-quality publications
Preferred
Prior experience preparing analyses for peer-reviewed publication
Familiarity with survey data (Qualtrics, REDCap) and/or healthcare data standards (FHIR)
Background in public health, epidemiology, or biostatistics
Experience with causal inference, longitudinal analysis, or real-world evidence studies
Experience working with messy, real-world observational datasets across multiple sources
Familiarity with cloud or distributed data tools (AWS, GCP, or Spark)
Background in, or familiarity with, cannabinoid research