Data Format¶

ST Data¶

The input ST data must be an h5ad file containing at least the gene expression matrix and spatial coordinates. The gene expression matrix should be stored in the layers attribute, and the spatial coordinates should be in the obsm attribute with the key spatial. Optionally, the h5ad file may include spot (cell) annotations in the obs attribute.

import scanpy as sc

adata = sc.read_h5ad("gsMap_example_data/ST/E16.5_E1S1.MOSTA.h5ad")

print(adata.layers["count"].shape)
print(adata.obsm["spatial"].shape)
print(adata.obs["annotation"].value_counts().head())

GWAS Data¶

The input GWAS data is a text file containing at least the columns for SNP (rs number), Z (Z-statistics), and N (sample size). Column headers are keywords used by gsMap.

zcat gsMap_example_data/GWAS/IQ_NG_2018.sumstats.gz | head -n 5

SNP  A1 A2 Z N
rs12184267 T C 0.916 225955
rs12184277 G A 0.656 226215
rs12184279 A C 1.050 226224
rs116801199 T G 0.300 226626

How to format the GWAS data¶

You can convert GWAS summary data into the required format using custom code. For convenience, gsMap provides a command to do this. Below is an example of how to use the command.

Download the human height GWAS data and decompress it.

wget https://portals.broadinstitute.org/collaboration/giant/images/4/4e/GIANT_HEIGHT_YENGO_2022_GWAS_SUMMARY_STATS_ALL.gz

gzip -d GIANT_HEIGHT_YENGO_2022_GWAS_SUMMARY_STATS_ALL.gz

Convert the summary statistics to the required format.

gsmap format_sumstats \
--sumstats 'GIANT_HEIGHT_YENGO_2022_GWAS_SUMMARY_STATS_ALL' \
--out 'HEIGHT'

You will obtain a file named HEIGHT.sumstats.gz

zcat HEIGHT.sumstats.gz | head -n 5

SNP  A1 A2 Z N
rs3131969 G A 0.328 1494218.000
rs3131967 C T 0.386 1488150.000
rs12562034 A G 1.714 1554976.000
rs4040617 G A -0.463 1602016.000

For more usage options, please refer to:

gsMap format_sumstats -h