pub_data | Yang Lab

Chen et al. 2025

Title: Genetic control of non-coding RNAs in the human brain and their implications for complex traits

1. BrainMeta.circRNAeQTL_archive.tar.gz: circRNA-eQTL summary statistics (SMR format)

2. BrainMeta.lncRNAeQTL_archive.PCG.tar.gz: PCG-eQTL summary statistics (SMR format)

3. BrainMeta.lncRNAeQTL_archive.lnc.tar.gz: lncRNA-eQTL summary statistics (SMR format)

4. BrainMeta.lncRNAeQTL_archive.tar.gz: all features (including lncRNA, PCG, others)-eQTL summary statistics (SMR format)

5. final_candidate_lncRNA.gtf: lncRNA annotation file (custom annotations, GTF format, hg19-based)

6. lncRNA.novel.fasta: novel lncRNAs in FASTA format with both hg19 and hg38 coordinates.

Cheng et al. 2025

Title: Cross-ancestry genome-wide association study augments genetic insights and clinical potentials for refractive error

1. Cross_ancest_EUR_EAS_AFR_no23andme1tbl: Summary statistics of cross-ancestry genome-wide association study (GWAS) meta-analysis from European, East Asian and African ancestries, excluding the 23andMe cohort.

2. EUR_seven_cohorts_myopia_metal_ratio01_adjusted_no23andme.fastGWAz: Summary statistics of population-stratified GWAS meta-analysis from European ancestry, excluding the 23andMe cohort.

3. five_cohort_EAS_metal.fastGWAz: Summary statistics of population-stratified GWAS meta-analysis from East Asian ancestry.

4. two_cohort_AFR_metal.fastGWAz: Summary statistics of population-stratified GWAS meta-analysis from African ancestry.

5. fixed_effect_recreate.script: The script to recreate the meta-analysis result for bona fide researchers who meet the criteria for accessing the association summary statistics data from 23andMe.

6. EUR_seven_cohort_metal_ratio01_adjusted_no23andme_sbrc.txt: The polygenic risk score (PRS) weights generated from SBayesRC using EUR GWAS summary, excluding the 23andMe cohort.

7. EAS_metal_sbrc.txt: The PRS weights generated from SBayesRC using EAS GWAS summary. In applying cross-ancestry PRS, please use the target tuning samples to calibrate the PRS weights across ancestries, following the procedure demonstrated in "sbayesrc_script.R".

8. AFR_metal_sbrc.txt: The PRS weights generated from SBayesRC using AFR GWAS summary. In applying cross-ancestry PRS, please use the target tuning samples to calibrate the PRS weights across ancestries, following the procedure demonstrated in "sbayesrc_script.R".

9. sbayesrc_script.R: The script to generate PRS weights and multi-ancestry PRS by combining the SNP effects with weights estimated from the tuning data using SBayesRC.
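
As background for item 5, a fixed-effect meta-analysis is commonly computed by inverse-variance weighting of per-cohort effect estimates. The sketch below illustrates that general technique only; it is not the content of fixed_effect_recreate.script, which may differ. The function name and the input values are hypothetical, and the code assumes every cohort reports its effect for the same effect allele.

```python
import math


def fixed_effect_meta(estimates):
    """Combine (beta, se) pairs from several cohorts by inverse-variance weighting.

    Assumes all effects are aligned to the same effect allele.
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total = sum(weights)
    # Pooled effect: weighted mean of the per-cohort effects.
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    # Pooled standard error: square root of the inverse of the summed weights.
    se = math.sqrt(1.0 / total)
    return beta, se


# Made-up numbers for two cohorts, for illustration only.
beta, se = fixed_effect_meta([(0.10, 0.02), (0.06, 0.02)])
```

With equal standard errors the pooled effect is the simple mean of the two inputs, and the pooled standard error shrinks by a factor of the square root of two.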

Xue et al. 2024, Communications Medicine

Title: Unravelling the complex causal effects of substance use behaviours on common disease

1. Xue_et_al_ukbEUR_SI_common_2024.txt.gz: Summary statistics of Smoking Initiation (SI).

2. Xue_et_al_ukbEUR_FS_common_2024.txt.gz: Summary statistics of Former Smoking (FS).

3. Xue_et_al_ukbEUR_CS_common_2024.txt.gz: Summary statistics of Current Smoking (CS).

4. Xue_et_al_ukbEUR_SC_common_2024.txt.gz: Summary statistics of Smoking Cessation (SC).

5. Xue_et_al_ukbEUR_AC_common_2024.txt.gz: Summary statistics of Alcohol Consumption (AC).

6. Xue_et_al_ukbEUR_TI_common_2024.txt.gz: Summary statistics of Tea Intake (TI).

7. Xue_et_al_ukbEUR_CI_common_2024.txt.gz: Summary statistics of Coffee Intake (CI).

8. Xue et al MR_SUB Commun Med 2024.pdf: Description of the dataset.

Qi et al. 2022, Nature Genetics

Title: Genetic control of RNA splicing and its distinctive role in complex trait variation

1. BrainMeta sQTLs: BrainMeta sQTL summary statistics (2,865 samples from 2,443 individuals).

2. BrainMeta eQTLs: BrainMeta eQTL summary statistics (2,865 samples from 2,443 individuals).

3. Qi_et_al_SMR_COLOC.tar.gz: SMR and COLOC analyses summary statistics for 12 brain-related phenotypes.

Xue et al. 2021, Nature Communications

Title: Genome-wide analyses of behavioural traits are subject to bias by misreports and longitudinal changes

1. Xue et al AC MLC bias Nat Commun 2020.tar.gz: Summary statistics of genome-wide association of alcohol consumption.

2. Xue et al AC MLC bias Nat Commun 2020.pdf: Description of the dataset.

Adolphe et al. 2020, Genome Medicine

Title: Genetic and functional interaction network analysis reveals global enrichment of regulatory T Cell genes influencing basal cell carcinoma susceptibility

1. Adolphe_Xue_et_al_BCC_Genome_Med_2020.tar.gz: Summary statistics of basal cell carcinoma (BCC) GWAS.

2. Adolphe_Xue_et_al_BCC_Genome_Med_2020.pdf: Description of the dataset.

Xue et al. 2018, Nature Communications

Title: Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes

1. Xue_et_al_T2D_META_Nat_Commun_2018.gz: GWAS summary statistics of common variants.

2. Xue_et_al_T2D_META_Nat_Commun_2018.pdf: Description of the dataset.

3. Xue_et_al_T2D_META_rare_Nat_Commun_2018.gz: GWAS summary statistics of rare variants (added on 6 Feb 2022).

Zhu et al. 2018, Nature Communications

Title: Causal associations between risk factors and common diseases inferred from GWAS summary data

The summary-level GWAS data for 23 phenotypes were from GERA and UK Biobank. Each data set has been made available as a whitespace-separated table in GCTA-COJO format. Columns are SNP, the effect allele, the other allele, frequency of the effect allele, effect size, standard error, p-value and sample size.

1. GERA data: Details of quality controls of the genotyped and imputed data can be found in Zhu et al. (2018 Nat. Commun.). The individual-level ICD-9 codes were classified into 22 common diseases. We added an additional trait ‘Disease Count’ (a count of the number of diseases affecting each individual) as a crude measure of general health status of each individual.

2. UK Biobank data: Details of quality controls of the genotyped and imputed data can be found in Zhu et al. (2018 Nat. Commun.). Individual-level ICD-10 codes were available in the UKB data. To match the diseases in GERA, we classified the phenotypes into 22 common diseases by projecting the ICD-10 codes to the classifications of ICD-9 codes in GERA taking into account the self-reported disease status. Note that we did not perform the association analysis for dermatophytosis because the number of cases was too small. We only performed the association analyses on a subset of SNPs (in common with the top associated SNPs for the risk factors) for insomnia, iron deficiency anemias, macular degeneration, peripheral vascular disease and acute reaction to stress.
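
The eight-column layout described above can be read by position, as in the following minimal sketch. It assumes each file starts with one header line (the `has_header` flag is an assumption, as are the record field names, which are labels chosen here rather than names taken from the files). The sample row, including its SNP identifier, is made up.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class CojoRecord(NamedTuple):
    snp: str
    effect_allele: str
    other_allele: str
    freq: float  # frequency of the effect allele
    beta: float  # effect size
    se: float    # standard error
    p: float     # p-value
    n: float     # sample size


def read_cojo(handle: TextIO, has_header: bool = True) -> Iterator[CojoRecord]:
    """Yield one record per non-empty line of a whitespace-separated table."""
    header_pending = has_header
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header_pending:
            header_pending = False
            continue  # skip the single header line
        snp, a1, a2 = fields[:3]
        freq, beta, se, p, n = (float(x) for x in fields[3:8])
        yield CojoRecord(snp, a1, a2, freq, beta, se, p, n)


# Made-up example content; real files would be opened with open() or gzip.open().
sample = io.StringIO(
    "SNP A1 A2 freq b se p N\n"
    "rs0000001 A G 0.30 0.012 0.004 2.7e-03 100000\n"
)
records = list(read_cojo(sample))
```

Reading by position rather than by header name keeps the sketch independent of the exact header spelling in any given file.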

Yang et al. 2015, Nature Genetics

Title: Estimation of genetic variance from imputed sequence variants reveals negligible missing heritability for human height and body mass index

1. LDSCORE_release_July2015.tar.gz: per-SNP and per-segment LD scores calculated from 44,126 unrelated individuals and ~17M imputed variants. Columns are SNP, per-SNP LD score, and per-segment LD score.

2. GWAS_summary_release_July2015.tar.gz: GWAS summary data. Columns are SNP, the coded allele, effect size, and standard error.
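
The GWAS summary file lists an effect size and a standard error but no p-value. One common way to derive a test statistic from those two columns is a Wald z-score with a two-sided p-value under a normal approximation. The sketch below shows that general calculation as an assumption about how a reader might use the data, not as the method used in the paper; the function name and input values are hypothetical.

```python
import math


def z_and_p(beta: float, se: float):
    """Return the Wald z-score and its two-sided normal p-value."""
    z = beta / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Made-up effect size and standard error, for illustration only.
z, p = z_and_p(0.05, 0.01)
```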
