* First authors with equal contribution.
Please see Google Scholar for a full list of publications.
Submitted
-
Mendelian Randomization Analysis Using Multiple Biomarkers of an Underlying Common Exposure
Jin Jin, Guanghao Qi, Zhi Yu, and 1 more author
Under revision for Biometrics, 2021+
Mendelian Randomization (MR) analysis is increasingly popular for testing the causal effect of exposures on disease outcomes using data from genome-wide association studies. In some settings, the underlying exposure, such as systemic inflammation, may not be directly observable, but measurements can be available on multiple biomarkers, or other types of traits, that are co-regulated by the exposure. We propose a method, MRLE, which tests the significance and the direction of the effect of a latent exposure by leveraging information from multiple related traits. The method is developed by constructing a set of estimating functions based on the second-order moments of summary association statistics, under a structural equation model where genetic variants are assumed to have indirect effects through the latent exposure and potentially direct effects on the traits. Simulation studies showed that MRLE has well-controlled type I error rates and increased power compared to single-trait MR tests under various types of pleiotropy. Applications of MRLE using genetic association statistics across five inflammatory biomarkers (CRP, IL-6, IL-8, TNF-α and MCP-1) provided evidence for potential causal effects of inflammation on increased risk of coronary artery disease, colorectal cancer and rheumatoid arthritis, while standard MR analysis for individual biomarkers often failed to detect consistent evidence for such effects.
-
Associations between Cannabis and Alcohol Consumption Polygenic Risk Scores and Ever Misusing Opioids in an Urban, African American Cohort
Jill A. Rabinowitz, Jin Jin, Sally I-Chun Kuo, and 9 more authors
Under revision for PLOS One, 2021+
BACKGROUND: This study leveraged results from large-scale GWAS to examine whether polygenic risk scores (PRS) for lifetime cannabis use and alcohol consumption were associated with ever misusing opioids and whether sex differences existed in these relations in an urban, African American sample. METHODS: Data were drawn from three cohorts of participants (N = 1,103; 45% male) who were recruited in first grade as part of a series of elementary school-based, universal preventive interventions conducted in a Mid-Atlantic region of the U.S. In young adulthood, participants provided a DNA sample and reported on whether they had used heroin or misused prescription narcotic drugs in their lifetime. Lifetime cannabis use PRS were created based on GWAS conducted by Pasman et al. (2018). PRS for alcohol consumption were created based on two GWAS, one conducted by Gelernter et al. (2019) focused on maximum drinking (i.e., largest number of drinks in one day in a typical month), and another conducted by Kranzler et al. (2019) focused on alcohol consumption indexed via the Alcohol Use Disorders Identification Test (AUDIT-C). RESULTS: Higher PRS for lifetime cannabis use and greater maximum drinking and alcohol consumption were associated with a greater likelihood of misusing opioids among the whole sample and males. CONCLUSION: Findings suggest that cannabis and alcohol consumption are pleiotropic with opioid misuse among young adults generally and males specifically, elucidating the genetic architecture of this public health problem, though replication of our findings is needed.
Refereed Publications
-
T2-DAG: A Powerful Test for Differentially Expressed Gene Pathways via Graph-informed Structural Equation Modeling
Jin Jin, and Yue Wang
Bioinformatics, Nov 2021
btab770
A major task in genetic studies is to identify genes related to human diseases and traits to understand functional characteristics of genetic mutations and enhance patient diagnosis. Compared to marginal analyses of individual genes, identification of gene pathways, i.e., a set of genes with known interactions that collectively contribute to specific biological functions, can provide more biologically meaningful results. Such gene pathway analysis can be formulated into a high-dimensional two-sample testing problem. Given the typically limited sample size of gene expression datasets, most existing two-sample tests tend to have compromised powers because they ignore or only inefficiently incorporate the auxiliary pathway information on gene interactions. We propose T2-DAG, a Hotelling’s T2-type test for detecting differentially expressed gene pathways, which efficiently leverages the auxiliary pathway information on gene interactions from existing pathway databases through a linear structural equation model. We further establish its asymptotic distribution under pertinent assumptions. Simulation studies under various scenarios show that T2-DAG outperforms several representative existing methods with well-controlled type-I error rates and substantially improved powers, even with incomplete or inaccurate pathway information or unadjusted confounding effects. We also illustrate the performance of T2-DAG in an application to detect differentially expressed KEGG pathways between different stages of lung cancer.
-
Genetic Propensity for Risky Behavior and Depression and Risk of Lifetime Suicide Attempt among Urban African Americans in Adolescence and Young Adulthood
Jill A Rabinowitz, Jin Jin, Geoffrey Kahn, and 8 more authors
American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, Nov 2021
Suicide attempts (SA) among African Americans have increased at a greater rate than any other racial/ethnic group. Research in European ancestry populations has indicated that SA are genetically influenced; however, less is known about the genetic contributors that underpin SA among African Americans. We examined whether genetic propensity for depression and risky behaviors (assessed via polygenic risk scores; PRS) independently and jointly are associated with SA among urban African Americans and whether sex differences exist in these relations. Participants (N = 1,157, 45.0% male) were originally recruited as part of two first grade universal school-based prevention trials. Participants reported in adolescence and young adulthood on whether they ever attempted suicide in their life. Depression and risky behaviors PRS were created based on large-scale genome-wide association studies conducted by Howard et al. (2019) and Karlsson Linnér et al. (2019), respectively. There was a significant interaction between the risky behavior PRS and depression PRS such that the combination of high risky behavior polygenic risk and low/moderate polygenic risk for depression was associated with greater risk for lifetime SA among the whole sample and African American males specifically. In addition, the risky behavior PRS was significantly positively associated with lifetime SA among African American males. These findings provide preliminary evidence regarding the importance of examining risky behavior and depression polygenic risk in relation to SA among African Americans, though replication of our findings in other African American samples is needed.
-
Provider and Patient Characteristics of Medicare Beneficiaries Who Are High-Risk for COVID-19 Mortality
Jeromie Ballreich, Jin Jin, Prosenjit Kundu, and 1 more author
Journal of General Internal Medicine, Nov 2021
The Medicare population comprises 60 million Americans, many of whom are the most vulnerable people susceptible to COVID-19 mortality. The National Academies of Sciences, Engineering and Medicine recommended targeting high-risk individuals including older individuals in the early stages of a national vaccine rollout. In this study, we identify Medicare beneficiaries at high risk for COVID-19 mortality, and characterize primary care providers who serve these beneficiaries.
-
Polygenic Risk Scores for Kidney Function and Their Associations with Circulating Proteome, and Incident Kidney Diseases: the Atherosclerosis Risk in Communities Study
Zhi Yu, Jin Jin, Adrienne Tin, and 8 more authors
Journal of the American Society of Nephrology, Nov 2021
BACKGROUND: Genome-wide association studies (GWAS) have revealed numerous loci for kidney function (eGFR). The relationship between polygenic predictors of eGFR, risk of incident adverse kidney outcomes, and the plasma proteome is not known. METHODS: We developed a genome-wide polygenic risk score (PRS) for eGFR by applying the LDpred algorithm to summary statistics generated from a multiethnic meta-analysis of CKDGen Consortium GWAS (n = 765,348) and UK Biobank GWAS (90% of the cohort; n = 451,508), followed by best-parameter selection using the remaining 10% of UK Biobank data (n = 45,158). We then tested the association of the PRS in the Atherosclerosis Risk in Communities (ARIC) study (n = 8866) with incident CKD, ESKD, kidney failure, and AKI. We also examined associations between the PRS and 4877 plasma proteins measured at middle age and older adulthood and evaluated mediation of PRS associations by eGFR. RESULTS: The developed PRS showed a significant association with all outcomes. Hazard ratios per 1 SD lower PRS ranged from 1.06 (95% CI, 1.01 to 1.11) to 1.33 (95% CI, 1.28 to 1.37). The PRS was significantly associated with 132 proteins at both time points. The strongest associations were with cystatin C, collagen α-1(XV) chain, and desmocollin-2. Most proteins were higher at lower kidney function, except for five proteins, including testican-2. Most correlations of the genetic PRS with proteins were mediated by eGFR. CONCLUSIONS: A PRS for eGFR is now sufficiently strong to capture risk for a spectrum of incident kidney diseases and broadly influences the plasma proteome, primarily mediated by eGFR.
-
Individual and Community-level Risk for COVID-19 Mortality in the United States
Jin Jin*, Neha Agarwala*, Prosenjit Kundu*, and 4 more authors
Nature Medicine, Nov 2021
Reducing COVID-19 burden for populations will require equitable and effective risk-based allocations of scarce preventive resources, including vaccinations. To aid in this effort, we developed a general population risk calculator for COVID-19 mortality based on various sociodemographic factors and pre-existing conditions for the US population, combining information from the UK-based OpenSAFELY study with mortality rates by age and ethnicity across US states. We tailored the tool to produce absolute risk estimates in future time frames by incorporating information on pandemic dynamics at the community level. We applied the model to data on risk factor distribution from a variety of sources to project risk for the general adult population across 477 US cities and for the Medicare population aged 65 years and older across 3,113 US counties, respectively. Validation analyses using 54,444 deaths from 7 June to 1 October 2020 show that the model is well calibrated for the US population. Projections show that the model can identify relatively small fractions of the population (for example 4.3%) that might experience a disproportionately large number of deaths (for example 48.7%), but there is wide variation in risk across communities. We provide a web-based risk calculator and interactive maps for viewing community-level risks.
-
Bayesian methods for the analysis of early-phase oncology basket trials with information borrowing across cancer types
Jin Jin, Marie-Karelle Riviere, Xiaodong Luo, and 1 more author
Statistics in Medicine, Nov 2020
Research in oncology has changed the focus from histological properties of tumors in a specific organ to a specific genomic aberration potentially shared by multiple cancer types. This motivates the basket trial, which assesses the efficacy of treatment simultaneously on multiple cancer types that have a common aberration. Although the assumption of homogeneous treatment effects seems reasonable given the shared aberration, in reality, the treatment effect may vary by cancer type, and potentially only a subgroup of the cancer types respond to the treatment. Various approaches have been proposed to increase the trial power by borrowing information across cancer types, which, however, tend to inflate the type I error rate. In this article, we review some representative Bayesian information borrowing methods for the analysis of early-phase basket trials. We then propose a novel method called the Bayesian hierarchical model with a correlated prior (CBHM), which conducts more flexible borrowing across cancer types according to sample similarity. We did simulation studies to compare CBHM with independent analysis and three information borrowing approaches: the conventional Bayesian hierarchical model, the EXNEX approach, and Liu’s two-stage approach. Simulation results show that all information borrowing approaches substantially improve the power of independent analysis if a large proportion of the cancer types truly respond to the treatment. Our proposed CBHM approach shows an advantage over the existing information borrowing approaches, with a power similar to that of EXNEX or Liu’s approach, but the potential to provide substantially better control of type I error rate.
-
A Bayesian method for the detection of proof of concept in early phase oncology studies with a basket design
Jin Jin, Qianying Liu, Wei Zheng, and 4 more authors
Statistics in Biosciences, Nov 2020
In clinical drug development, proof of clinical concept (PoC) refers to the evidence of treatment efficacy that is obtained from early phase clinical studies. PoC is critical, as it motivates the initiation of late stage clinical trials, and has a profound impact on the “Chemistry, Manufacturing and Controls” (CMC) process, which is preferably launched as early as possible so as to save valuable time for drug development. A new type of oncology clinical trial called the basket trial has emerged recently, where the experimental treatment targets a specific oncogenic pathway that is hypothesized to modulate tumor growth and/or metastasis, and patients with potentially multiple cancer types can be enrolled. The problem of PoC in basket trials has not been formally investigated in the statistical literature. In early phase basket trials, the commonly used independent analysis lacks statistical power to detect PoC due to limited sample size. A more powerful approach is needed, especially when the treatment effect is not strong enough for each individual cancer type. In this paper, we propose a novel approach for PoC detection in early phase basket trials under a Bayesian framework. We classify cancer types into a “sensitive subgroup” that responds positively to the treatment, and an “insensitive subgroup” that does not respond to the treatment. We then assess PoC using the posterior probability that at least one cancer type is sensitive. Simulation results show that our proposed approach has promising performance, with considerable gain in power compared with the independent approach when a relatively large number of the cancer types are sensitive to the treatment.
-
Multi-resolution super learner for voxel-wise classification of prostate cancer using multi-parametric MRI
Jin Jin, Lin Zhang, Ethan Leng, and 2 more authors
To appear in Journal of Applied Statistics, Nov 2020
While current research has shown the importance of multi-parametric MRI (mpMRI) in diagnosing prostate cancer (PCa), further investigation is needed into how to incorporate the specific structures of the mpMRI data, such as the regional heterogeneity and between-voxel correlation within a subject. This paper proposes a machine learning-based method for improved voxel-wise PCa classification by taking into account the unique structures of the data. We propose a multi-resolution modeling approach to account for regional heterogeneity, where base learners trained locally at multiple resolutions are combined using the super learner, and account for between-voxel correlation by efficient spatial Gaussian kernel smoothing. The method is flexible in that the super learner framework allows implementation of any classifier as the base learner, and can be easily extended to classifying cancer into more sub-categories. We describe a detailed classification algorithm for the binary PCa status, as well as for the ordinal clinical significance of PCa, for which a weighted likelihood approach is implemented to enhance the detection of the less prevalent cancer categories. We illustrate the advantages of the proposed approach over conventional modeling and machine learning approaches through simulations and application to in vivo data.
-
Bayesian Spatial Models for Voxel-wise Prostate Cancer Classification Using Multi-parametric MRI Data
Jin Jin, Lin Zhang, Ethan Leng, and 2 more authors
Statistics in Medicine, Nov 2020
Multi-parametric magnetic resonance imaging (mpMRI) plays an increasingly important role in the diagnosis of prostate cancer. Various computer-aided detection algorithms have been proposed for automated prostate cancer detection by combining information from various mpMRI data components. However, there exist other features of mpMRI, including the spatial correlation between voxels and between-patient heterogeneity in the mpMRI parameters, that have not been fully explored in the literature but could potentially improve cancer detection if leveraged appropriately. This paper proposes novel voxel-wise Bayesian classifiers for prostate cancer that account for the spatial correlation and between-patient heterogeneity in mpMRI. Modeling the spatial correlation is challenging due to the extreme high dimensionality of the data, and we consider three computationally efficient approaches using Nearest Neighbor Gaussian Process (NNGP), knot-based reduced-rank approximation, and a conditional autoregressive (CAR) model, respectively. The between-patient heterogeneity is accounted for by adding a subject-specific random intercept on the mpMRI parameter model. Simulation results show that properly modeling the spatial correlation and between-patient heterogeneity improves classification accuracy. Application to in vivo data illustrates that classification is improved by spatial modeling using NNGP and reduced-rank approximation but not the CAR model, while modeling the between-patient heterogeneity does not further improve our classifier. Among our proposed models, the NNGP-based model is recommended considering its robust classification accuracy and high computational efficiency.
-
Signature maps for automatic identification of prostate cancer from colorimetric analysis of H&E- and IHC-stained histopathological specimens
Ethan Leng, Jonathan C Henriksen, Anthony E Rizzardi, and 8 more authors
Scientific Reports, Nov 2019
Prostate cancer (PCa) is a major cause of cancer death among men. The histopathological examination of post-surgical prostate specimens and manual annotation of PCa not only allow for detailed assessment of disease characteristics and extent, but also supply the ground truth for the development of computer-aided diagnosis (CAD) systems for PCa detection before definitive treatment. As manual cancer annotation is tedious and subjective, there have been a number of publications describing methods for automating the procedure via the analysis of digitized whole-slide images (WSIs). However, these studies have focused only on the analysis of WSIs stained with hematoxylin and eosin (H&E), even though there is additional information that could be obtained from immunohistochemical (IHC) staining. In this work, we propose a framework for automating the annotation of PCa that is based on automated colorimetric analysis of both H&E and IHC WSIs stained with a triple-antibody cocktail against high-molecular weight cytokeratin (HMWCK), p63, and α-methylacyl CoA racemase (AMACR). The analysis outputs were then used to train a regression model to estimate the distribution of cancerous epithelium within slides. The approach yielded an AUC of 0.951, sensitivity of 87.1%, and specificity of 90.7% as compared to slide-level annotations, and generalized well to cancers of all grades.
-
Detection of prostate cancer with multiparametric MRI utilizing the anatomic structure of the prostate
Jin Jin, Lin Zhang, Ethan Leng, and 2 more authors
Statistics in Medicine, Nov 2018
Multiparametric magnetic resonance imaging (mpMRI), which combines traditional anatomic and newer quantitative MRI methods, has been shown to result in improved voxel-wise classification of prostate cancer as compared with any single MRI parameter. While these results are promising, substantial heterogeneity in the mpMRI parameter values and voxel-wise prostate cancer risk has been observed both between and within regions of the prostate. This suggests that classification of prostate cancer can potentially be improved by incorporating structural information into the classifier. In this paper, we propose a novel voxel-wise classifier of prostate cancer that accounts for the anatomic structure of the prostate by Bayesian hierarchical modeling, which can be combined with post hoc spatial Gaussian kernel smoothing to account for residual spatial correlation. Our proposed classifier results in significantly improved area under the ROC curve (0.822 vs 0.729, P < .001) and sensitivity corresponding to 90% specificity (0.599 vs 0.429, P < .001), compared with a baseline model that does not account for the anatomic structure of the prostate. Furthermore, the classifier can also be applied on voxels with missing mpMRI parameters, resulting in similar performance, which is an important practical consideration that cannot be easily accommodated using regression-based classifiers. In addition, our classifier achieved high computational efficiency with a closed-form solution for the posterior predictive cancer probability.
-
Development of a measure for evaluating lesion-wise performance of CAD algorithms in the context of mpMRI detection of prostate cancer
Ethan Leng, Benjamin Spilseth, Lin Zhang, and 3 more authors
Medical Physics, Nov 2018
PURPOSE: Computer-aided detection/diagnosis (CAD) of prostate cancer (PCa) on multiparametric MRI (mpMRI) is an active area of research. In the literature, the performance of predictive models trained to detect PCa on mpMRI has typically been reported in terms of voxel-wise measures such as sensitivity and specificity and/or area under the receiver operating characteristic curve (AUC). However, it is unclear whether models that score higher by these measures are actually superior. Here, we propose a novel method for lesion identification as well as novel measures that assess the quality of the detected lesions. METHODS: A total of 46 axial MRI slices of interest from 34 patients and the associated histopathologic ground truths were used to develop and to characterize the proposed measures. The proposed lesion-wise score sℓ is based on the Jaccard similarity index with modifications that emphasize the overlap and colocalization of predicted lesions with ground truth lesions. Thresholding of sℓ allowed for the sensitivity and specificity of lesion detection to be assessed, while the proposed lesion-summary score sσ is a weighted average of sℓs that provides a single summary statistic of lesion detection performance. The proposed measures were used to compare the lesion detection performance of a predictive model vs that of a radiologist on the same data set. The measures were also used to evaluate the degree to which viewing the cancer prediction improved diagnostic accuracy. RESULTS: The lesion-wise score qualitatively reflected the goodness of predicted lesions over a wide range of values (sℓ = 0.1 to sℓ = 0.8) and was found to encompass a larger range of values than the Dice coefficient did over the same range of prediction qualities (0–0.9 vs 0–0.75). The lesion-summary score was shown to vary linearly with voxel-wise sensitivity and quadratically with voxel-wise specificity and correlated well with voxel-wise AUC (ρ = 0.68) and the Dice coefficient (ρ = 0.88). Radiologist performance was found to be significantly improved after viewing the model-generated cancer prediction maps as quantified by both sσ (P = 0.01) and DSC (P = 0.04), with improvements in both lesion detection sensitivity and specificity. CONCLUSION: The proposed measures allow for the assessment of lesion detection performance, which is most relevant in a clinical setting and would not be possible to do with voxel-wise measures alone.
Manuscripts in Preparation
-
A Quasi-likelihood-based Bayesian Framework for the Integration of Multiple Regression Models across Studies with Disparate Covariate Information
Jin Jin, and Nilanjan Chatterjee
2021+
-
Developing Trans-ethnic Polygenic Risk Scores Using Empirical Bayes and Super Learning Algorithm
Haoyu Zhang, Jianan Zhan, Jin Jin, and 10 more authors
2021+
-
Ancestry-specific Polygenic Risk Scores for Telomere Length and a Phenome-wide Association Study for Their Association with Risks of Age-related Diseases
Jin Jin, Margaret Taub, Matthew Conomos, and 2 more authors
2021+
-
Polygenic Risk Prediction and Precision Prevention. In: Statistical Methods for Precision Health. Chakraborty, B., Laber, E., Moodie, E., Cai, T., Van der Laan, M.J. (Eds.)
Jin Jin, and Nilanjan Chatterjee
Chapman & Hall/CRC, 2021+
-
Association of Polygenic Coronary Heart Disease Risk with CAC=0 and CAC>1000 in Adults>75 Years Old: The Atherosclerosis Risk in Communities Study
O. Dzaye, A.C. Razavi, Z.A. Dardari, and 15 more authors
2021+
-
A Bayesian Framework for Polygenic Risk Prediction Leveraging Information across Multiple Ethnic Groups
Jin Jin, Jingning Zhang, Jianan Zhan, and 6 more authors
2021+
Other Articles
-
Transparency, Reproducibility, and Validity of COVID-19 Projection Models.
Medium - Towards Data Science, 2020