Research project

Identification of transcription factor binding sites as sequencing targets, using Diamond-Blackfan Anaemia as a model

Clinical Bioinformatics - Genomics
Clodagh McGuire
Training location
King's College Hospital NHS Foundation Trust


Diamond-Blackfan Anaemia (DBA) is a rare inherited bone marrow failure syndrome. Most patients carry a heterozygous pathogenic variant in a ribosomal protein (RP) gene. In rare cases, DBA is caused by a variant in GATA1, an important transcription factor known to act downstream of the RP genes in the erythroid pathway and suspected to also act upstream to regulate them. A third of patients do not receive a molecular diagnosis after these genes are sequenced, so we suspect that some of them carry variants in regulatory regions.


To investigate GATA1 binding at RP genes and to identify regulatory-region targets for sequencing in undiagnosed patients.


We used publicly available ChIP-seq data to investigate GATA1 binding in erythroblasts. ChIP-seq peaks were assigned to genes using ChIPseeker, and histone modification data were used to assess whether the GATA1-bound genes were transcriptionally active. We then scanned the sequences beneath the GATA1 peaks with FIMO to identify GATA1 binding motifs, and designed primers to target these regions for future sequencing.
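The motif-scanning step can be illustrated with a minimal Python sketch of position weight matrix (PWM) scanning. FIMO scores every window of a sequence against a motif's log-odds matrix on both strands and converts each score to a p-value; this sketch shows only the scoring, and uses a raw score cutoff instead of a p-value. The count matrix `GATA_COUNTS` is hypothetical, loosely following the WGATAR consensus commonly reported for GATA factors, and the function names, pseudocount, background frequency and threshold are illustrative assumptions, not the settings used in this project. A real analysis would use a published GATA1 matrix (for example from JASPAR) and FIMO itself.

```python
import math

# Hypothetical count matrix for a 6-bp GATA-like motif (WGATAR consensus).
# Illustrative values only, NOT real JASPAR/HOCOMOCO counts.
GATA_COUNTS = {
    "A": [10, 0, 20, 0, 20, 12],
    "C": [0, 0, 0, 0, 0, 0],
    "G": [0, 20, 0, 0, 0, 8],
    "T": [10, 0, 0, 20, 0, 0],
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert a count matrix to log2-odds scores against a uniform background."""
    length = len(counts["A"])
    pwm = {base: [] for base in "ACGT"}
    for i in range(length):
        total = sum(counts[b][i] for b in "ACGT")
        for b in "ACGT":
            freq = (counts[b][i] + pseudocount) / (total + 4 * pseudocount)
            pwm[b].append(math.log2(freq / background))
    return pwm


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def score(site, pwm):
    """Sum of per-position log-odds scores for one motif-length site."""
    return sum(pwm[base][i] for i, base in enumerate(site))


def scan(seq, pwm, threshold):
    """Return (start, end, strand, score) for windows scoring >= threshold.

    Coordinates are 0-based, half-open and relative to `seq`.
    """
    seq = seq.upper()
    width = len(pwm["A"])
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other ambiguity codes
        # Score the forward site and its reverse complement (minus strand).
        for strand, site in (("+", window), ("-", revcomp(window))):
            s = score(site, pwm)
            if s >= threshold:
                hits.append((start, start + width, strand, round(s, 2)))
    return hits
```

For example, `scan("CCAGATAAGG", build_pwm(GATA_COUNTS), 8.0)` returns `[(2, 8, '+', 9.71)]`, a single forward-strand match to `AGATAA`. With this toy matrix, a cutoff of 8.0 admits only exact WGATAR matches; a p-value threshold, as FIMO uses, is easier to compare across motifs of different lengths.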


We identified 66 RP genes bound by GATA1 and confirmed that these genes are actively transcribed in erythroblasts, indicating that GATA1 acts upstream to regulate them. In total, 558 GATA1 motifs were identified in RP genes and their regulatory regions; these motifs, extended by ±30 bp on each side, will be targeted for sequencing. This methodology could be used in future to investigate the regulatory regions of genes implicated in other rare diseases.
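Turning motif coordinates into ±30 bp sequencing targets can be sketched in a few lines of Python. The helper `target_regions` is hypothetical: it assumes motifs are supplied as 0-based, half-open `(chrom, start, end)` tuples, pads each by 30 bp, and merges padded windows that overlap or touch, on the assumption that neighbouring motifs would be covered by a single amplicon. The merging step is a design choice of this sketch rather than something stated in the project, and clipping is applied only at position 0, so a real pipeline would also clip to chromosome length.

```python
def target_regions(motifs, flank=30):
    """Pad each motif by `flank` bp on both sides and merge overlapping windows.

    motifs: iterable of (chrom, start, end) tuples, 0-based half-open.
    Returns merged (chrom, start, end) regions, sorted lexicographically by
    chromosome name (so "chr19" sorts before "chr3") and then by start.
    """
    padded = sorted(
        (chrom, max(0, start - flank), end + flank)  # clip at 0 only
        for chrom, start, end in motifs
    )
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or abuts the previous region on the same chromosome.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

For instance, two motifs at `chr19:1000-1006` and `chr19:1040-1046` lie only 34 bp apart, so their padded windows overlap and collapse into one target, `chr19:970-1076`, which would need one primer pair instead of two.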

Last updated on 4th October 2022