Today was the second day of the introductory course in NGS bioinformatics that I'm taking as part of my PhD studies.
For me it started with a substantial oversleep, probably due to a combination of an annoying cold and the ~2-hour commute from south Stockholm to Uppsala and BMC. Thus I missed some really interesting material (and a tutorial) on file types in NGS analysis, but I will make sure to go through that in my free time during the week.
Fittingly, I arrived at the lecture exactly when Olga Vinnere-Petterson was moving from the "older" NGS techniques that I already know fairly well (454, etc.) to the newer ones, such as PacBio and IonTorrent.
In the afternoon, Adam Ameur covered some examples of typical analysis workflows for NGS data, taken from their real-world projects. This was super-interesting, and really gave a glimpse into what NGS analysis often looks like in practice.
A common theme for the day was the contrasting of different types of next-gen sequencing machines, their specific quality characteristics, and the subsequent impact on the analysis of their data.
In some of the final slides, we had this kind of "overview" of the three main technologies today:
Then there are also some really interesting upcoming technologies, some of which are already available at SciLifeLab:
Some other random learnings from today were:
Finally, the course participants had the chance to briefly present their own ongoing projects and to discuss any related questions. We had quite some discussion on the suitable number of samples, and it was concluded that 3 well-functioning samples is the absolute minimum (of course), but that one should always strive to have a few more as a margin, since otherwise, if one of them is dropped for any reason (low quality or other problems), you're smoked. It was also noted that just "technical samples" (sampling multiple times from the same material) are not as interesting as "biological samples" (multiple cells / tissue samples, and optimally even multiple donors).
All in all, it was a highly interesting day, with a lot of discussion of the intricacies of data analysis in the face of messy data, which calls for various quality control measures and ways of handling the problems.
Now I'm looking forward to tomorrow (or today, as I write this blog post one day late, on the train to the course), Wednesday, when we will dive into the core of the course, with "Alignment with BWA; Samtools; Data Processing with Picard; Variant Calling with GATK; SAM/BAM and VCF Formats".
Bionics IT is currently serving as a research and development blog for Samuel Lampa, a PhD student in Pharmaceutical Bioinformatics at Uppsala University.