Investigate the uncertainty of measurement in NGS data
Whilst on my elective I worked on a project using NGS data that NIBSC had produced from their JAK2 international reference standard. This was an interesting dataset to work with because the coverage was very high (~20,000 read depth) and because a previous collaborative study had established a consensus value for the proportion of variant that should be present in each sample. This made it possible to investigate the uncertainty of measurement for the NGS samples in this dataset.
Identify parameters for improving quality of NGS calls
The results of the analysis are available in the Elective Report document. I investigated a number of different factors to see how they affected the accuracy of the variant percentage, using the results from the collaborative study as the truth set for each sample. My analysis showed that accuracy was reduced when the calculation included reads in which the variant was located near the start or end of the read. When these reads were excluded, the results were closer to those recorded in the collaborative study.
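The read-position filter described above can be illustrated with a minimal Python sketch. It is not the code used in the project: the `ReadObservation` class, the `variant_percentage` function, the `edge_margin` parameter and the toy observations are all hypothetical, and the margin of 5 bases is chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadObservation:
    """One read covering the variant site (hypothetical structure)."""
    base: str         # base observed at the variant site
    read_pos: int     # 0-based position of the site within the read
    read_length: int  # total length of the read


def variant_percentage(observations, variant_base, edge_margin=0):
    """Percentage of reads carrying the variant base.

    Reads in which the site lies within `edge_margin` bases of either
    end of the read are excluded before the percentage is calculated.
    Returns None if no reads remain after filtering.
    """
    kept = [
        obs for obs in observations
        if edge_margin <= obs.read_pos < obs.read_length - edge_margin
    ]
    if not kept:
        return None
    variant_reads = sum(1 for obs in kept if obs.base == variant_base)
    return 100.0 * variant_reads / len(kept)


# Toy data for illustration only; these are not values from the study.
toy_reads = [
    ReadObservation("G", 50, 100),
    ReadObservation("G", 60, 100),
    ReadObservation("G", 30, 100),
    ReadObservation("T", 40, 100),
    ReadObservation("T", 2, 100),   # variant near the start of the read
    ReadObservation("T", 98, 100),  # variant near the end of the read
]

unfiltered = variant_percentage(toy_reads, "T")
filtered = variant_percentage(toy_reads, "T", edge_margin=5)
print(unfiltered, filtered)
```

Comparing the filtered and unfiltered percentages against the consensus value for a sample shows whether excluding read-end observations moves the estimate closer to the truth set.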
Expand knowledge of data analysis techniques using R
I used both Python and R in this project. I used Python to extract the read information from the BAM file and produce a TSV file of the results. I then used R to perform the analysis and produce the plots. The code produced is available on the NIBSC GitHub.
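As a rough illustration of the Python-to-R hand-off, the sketch below writes per-read records to a tab-separated table using only the standard library. It does not reproduce the scripts on the NIBSC GitHub: the column names and records are hypothetical, and the BAM parsing step is omitted (it is assumed that a BAM-reading library would supply the records).

```python
import csv
import io

# Hypothetical column names; the real TSV layout is not described here.
FIELDS = ["read_name", "base", "read_pos", "read_length"]


def write_read_tsv(records, handle):
    """Write per-read records (dicts keyed by FIELDS) as a TSV table."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)


# Toy records for illustration only; in practice these would come from
# the reads in the BAM file.
records = [
    {"read_name": "read_1", "base": "G", "read_pos": 50, "read_length": 100},
    {"read_name": "read_2", "base": "T", "read_pos": 2, "read_length": 100},
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_read_tsv(records, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

A table in this shape can be loaded into R with `read.delim()` and passed to ggplot for plotting.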
Find out more about the work of NIBSC and how it links to Public Health
The National Institute for Biological Standards and Control (NIBSC) is a government agency which plays a national and international role in assuring the quality of biological medicines. This is achieved through direct batch testing of biological medicines, generation of international standards and reference materials (used by manufacturers for calibrating testing procedures), and applied research. Through effective regulation of medicines underpinned by science and research, NIBSC helps to protect and improve the health of the population.
Experience working with new colleagues in a department outside the NHS
Throughout the project I worked closely with the lead bioinformatician at NIBSC, who agreed to host me for my elective. I also worked closely with one of the scientists based there who helped to develop the JAK2 international standard. We had weekly progress meetings to discuss the project and steer the direction for the following week. These meetings helped me to stay focused and allowed the project to progress at a fast pace.
Critical Reflection of Experience
My elective was originally scheduled to last for six weeks; however, my supervisor was offered a new job and was due to move offices before my placement would finish. I therefore only spent four weeks at NIBSC. It was a shame to cut the elective short as I was enjoying the work and got on very well with my new colleagues there. However, I was pleased with the amount of work I managed to achieve in the time that I was there, and I learnt some useful skills which I will try to use in my future practice.
There were a number of factors that made my elective a useful experience. My supervisor was very knowledgeable and always made time for me to discuss the project work. We had a regular scheduled meeting with the scientist who produced the dataset I was working on to discuss the progress of the project, and whenever I had any questions I was able to ask my supervisor for his advice. Having the dataset available and ready to go meant I could get started on the project from the first day. The project also had clear aims, and my supervisor had made suggestions on how I could investigate the data, so this forethought and planning made it much easier to start working on the project straight away.
Having the freedom to install what I wanted onto my computer also meant I wasn’t restricted in the tools I could use. I installed RStudio, Tablet and Notepad++, all of which I used regularly throughout the project. I was a bit apprehensive about using R for the data analysis as I did not have much experience of it. It was quite a steep learning curve and I do not think that the code I produced was very efficient; however, I enjoyed learning to use R for this project and found plotting with ggplot very powerful. I would definitely like to continue using R, particularly for any data analysis projects.
I enjoyed using R for my elective project but feel I have only scratched the surface of what it has to offer. I would like to continue to learn more about R so that I can use it in my future practice. I will look for online courses that provide a more formal introduction to the language, which I can work through to improve my skills.
Since returning from my elective, the scientist I was working with on the project has managed to get some additional data from one of the centres involved in the collaborative study. This is also NGS data from the JAK2 international standard; however, it has been sequenced using a different technique. I intend to analyse this data to see how it compares with the dataset I was using during my elective. Because of the differences in the techniques used, I will not be able to reuse the scripts developed on my elective directly, but I will attempt to modify them and build on my experience to analyse the new dataset.
Towards the end of my elective there was some discussion as to whether the data produced could be used as part of a publication. I will continue to work with my colleagues at NIBSC if they need any further assistance from me regarding this. I will present my elective project results and experience to the laboratory to share what I have learnt and to explain how it will shape my future practice.