README

This text describes the data presented in the paper "Genome-wide association study identifies risk loci for progressive chronic lymphocytic leukemia" by Lin et al.

========================
Introductory information
========================

Files included in the data deposit (include a short description of what data are contained):
1) Genome-wide association summary statistics file for time to first treatment in chronic lymphocytic leukemia.

Key words used to describe the data: chronic lymphocytic leukemia; CLL; time to first treatment; TTFT

==========================
Methodological information
==========================

The primary outcome assessed in this study was time to first treatment (TTFT), defined as the interval between CLL diagnosis and the date of first treatment or, for untreated patients, the date of last follow-up. For each study, allelic dosage was estimated for the minor allele at each variant position and included in a Cox proportional hazards model to estimate the hazard ratio (HR) and 95% confidence interval (CI). Variants were included in the meta-analysis only if they had results from all six studies. Study-specific single nucleotide polymorphism (SNP) effects were combined using an inverse-variance-weighted method (fixed-effects model) and the DerSimonian-Laird approach (random-effects model), as implemented in the R package metafor (v2.4-0).

Date(s) of data collection: November 2020

Geographic coverage of data: United Kingdom

Data validation (how was the data checked, proofed and cleaned):
For each GWAS, we excluded markers that departed from Hardy-Weinberg equilibrium (HWE; P ≤ 10^-3), had a call rate < 95%, or showed a significant difference in minor allele frequency between genotype batches (P ≤ 10^-3). Data from all six GWAS were then combined for sample quality control processing.
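The marker-level exclusion rules above can be sketched as a per-marker filter. This is a minimal illustration under stated assumptions, not the pipeline used to produce the deposited data. The `MarkerSummary` record and the helper names are hypothetical. The HWE check uses a simple 1-df chi-square test, whereas many GWAS pipelines use an exact test instead. The batch-difference P value is assumed to be precomputed. The stated thresholds are read as exclusion at P ≤ 10^-3.

```python
import math
from dataclasses import dataclass

# Assumed thresholds: the README's HWE and batch-MAF criteria are read as
# exclusion at P <= 1e-3, plus exclusion for call rate < 95%.
P_THRESHOLD = 1e-3
MIN_CALL_RATE = 0.95


@dataclass
class MarkerSummary:
    """Hypothetical per-marker genotype counts for one GWAS."""
    n_hom_major: int
    n_het: int
    n_hom_minor: int
    n_missing: int
    batch_maf_p: float  # P for MAF difference between batches (assumed precomputed)


def hwe_chisq_p(n_hom_major: int, n_het: int, n_hom_minor: int) -> float:
    """P value from a 1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = n_hom_major + n_het + n_hom_minor
    if n == 0:
        return 1.0
    p = (2 * n_hom_major + n_het) / (2 * n)  # major allele frequency
    q = 1.0 - p
    observed = (n_hom_major, n_het, n_hom_minor)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip empty expected classes (monomorphic markers) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def keep_marker(m: MarkerSummary) -> bool:
    """Apply the three marker-level exclusion rules; True means the marker passes."""
    n_called = m.n_hom_major + m.n_het + m.n_hom_minor
    total = n_called + m.n_missing
    if n_called == 0:
        return False
    if n_called / total < MIN_CALL_RATE:
        return False
    if hwe_chisq_p(m.n_hom_major, m.n_het, m.n_hom_minor) <= P_THRESHOLD:
        return False
    if m.batch_maf_p <= P_THRESHOLD:
        return False
    return True


if __name__ == "__main__":
    markers = {
        "in_hwe": MarkerSummary(250, 500, 250, n_missing=10, batch_maf_p=0.4),
        "no_heterozygotes": MarkerSummary(500, 0, 500, n_missing=0, batch_maf_p=0.4),
        "low_call_rate": MarkerSummary(250, 500, 250, n_missing=100, batch_maf_p=0.4),
        "batch_effect": MarkerSummary(250, 500, 250, n_missing=0, batch_maf_p=1e-4),
    }
    for name, marker in markers.items():
        print(name, keep_marker(marker))
```

In this toy example, only the first marker passes. Each of the other three fails exactly one rule: HWE departure, a call rate of about 91%, or a batch difference.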
Samples were excluded if their call rate was < 95%, if their heterozygosity deviated from the overall mean heterozygosity by more than 3 standard deviations, or if they were identified as non-European by principal components analysis using 1000 Genomes Project data as a reference. Samples were also removed so that no two individuals had an estimated relatedness of pihat ≥ 0.1875; from each related pair, the sample with the higher call rate was retained.

=========================
Data-specific information
=========================

Definitions of names, labels, acronyms or specialist terminology used for variables, records and their values: chromosome number; base position (genome build 37); SNP base; reference SNP (rs) identifier; meta-analysis P value for association with time to first treatment in CLL; indicator of whether the SNP was directly genotyped or imputed

=======
Contact
=======

Please contact rdm@ncl.ac.uk for further information.