Welcome to the tech & research blog of Samuel Lampa, an open source developer, R&D consultant at Savantic AB and PhD alumnus from pharmb.io.
Basic PUB/SUB connection with ZeroMQ in Python
Nov 13, 2019: ZeroMQ is a great way to quickly and simply send messages between multiple programs running on the same or different computers. It is very simple and robust since it doesn't need any central server, but instead talks directly between the programs through sockets, TCP-connections ...
Table-driven tests (and more) in C#
Nov 2, 2019: Folks in the Go community have championed so-called table-driven tests (see e.g. this post by Dave Cheney and the Go wiki) as a way to quickly and easily write up a bunch of complete test cases with inputs and corresponding expected outputs, and loop over them to execute the ...
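As a rough illustration of the Go pattern mentioned above (not the C# code from the post itself), here is a minimal table-driven sketch. The function under test, `Add`, and the helper `runCases` are hypothetical names made up for this example; in a real Go project the loop would normally live in a `_test.go` file and use the standard `testing` package with `t.Run`.

```go
package main

import "fmt"

// Add is a hypothetical function under test, used only for illustration.
func Add(a, b int) int { return a + b }

// testCase pairs a set of inputs with the expected output.
type testCase struct {
	name     string
	a, b     int
	expected int
}

// cases is the "table": one row per complete test case.
var cases = []testCase{
	{"zeros", 0, 0, 0},
	{"positive", 2, 3, 5},
	{"mixed signs", -4, 1, -3},
}

// runCases loops over the table, executes each case and
// returns one message per failing case.
func runCases() []string {
	var failures []string
	for _, tc := range cases {
		if got := Add(tc.a, tc.b); got != tc.expected {
			failures = append(failures, fmt.Sprintf(
				"%s: Add(%d, %d) = %d, want %d",
				tc.name, tc.a, tc.b, got, tc.expected))
		}
	}
	return failures
}

func main() {
	failures := runCases()
	for _, f := range failures {
		fmt.Println("FAIL", f)
	}
	fmt.Printf("%d cases, %d failures\n", len(cases), len(failures))
}
```

Adding a new test case is then a one-line change to the table, which is the main appeal of the pattern.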
SciPipe paper published in GigaScience
Apr 27, 2019: I might blog more at a later stage, but just wanted to share that the paper on our Go-based workflow library, SciPipe, was just published in GigaScience:
The complex nature of biological data has driven the development of specialized software ...
Structured Go-routines or framework-less Flow-Based Programming in Go
Mar 2, 2019: I was so happy the other day to find someone else who found the great benefits of a little pattern for how to structure pipeline-heavy programs in Go, which I described in a few posts before. I have been surprised to not find more people using this kind of pattern, which has been ...
Preprint on SciPipe - Go-based scientific workflow library
Aug 2, 2018: A pre-print for our Go-based workflow library, SciPipe, is out, with the title SciPipe - A workflow library for agile development of complex and dynamic bioinformatics pipelines, co-authored by me and colleagues at pharmb.io: Martin Dahlö, Jonathan Alvarsson and Ola Spjuth. Acce ...
Make your commandline tool workflow friendly
May 25, 2018: There are a number of pitfalls that can make a commandline program really hard to integrate into a workflow (or "pipeline") framework. The reason is that many workflow tools use output file paths to keep track of the state of the tasks producing these files. This is done for exam ...
Semantic Web ❤ Data Science? My talk at Linked Data Sweden 2018
Apr 10, 2018: During the last months, I have had the pleasure of working together with Matthias Palmér (MetaSolutions AB) and Fernanda Dórea (National Veterinary Institute), to prepare for and organize this year's version of the annual Linked Data Sweden event, which this year was held in Uppsala ...
Parsing DrugBank XML (or any large XML file) in streaming mode in Go
Mar 15, 2018: I had a problem in which I thought I needed to parse the full DrugBank dataset, which comes as a (670MB) XML file.
It turned out what I needed was available as CSV files under "Structure External Links". ...
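For readers curious what streaming-mode XML parsing in Go can look like, here is a minimal sketch using the standard library's `encoding/xml` decoder. It is not the code from the post: the element names (`drug`, `name`), the `Drug` struct and the helper names are assumptions for illustration and are not checked against the actual DrugBank schema.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Drug holds the fields we care about. The element names are
// assumptions for illustration, not the verified DrugBank schema.
type Drug struct {
	Name string `xml:"name"`
}

// streamDrugs reads r token by token and decodes one <drug> element
// at a time, so the whole document never has to fit in memory.
func streamDrugs(r io.Reader, handle func(Drug)) error {
	dec := xml.NewDecoder(r)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return nil // end of document
		}
		if err != nil {
			return err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "drug" {
			continue // skip everything except <drug> start tags
		}
		var d Drug
		if err := dec.DecodeElement(&d, &start); err != nil {
			return err
		}
		handle(d)
	}
}

// drugNames is a small convenience wrapper for the example below.
func drugNames(doc string) ([]string, error) {
	var names []string
	err := streamDrugs(strings.NewReader(doc), func(d Drug) {
		names = append(names, d.Name)
	})
	return names, err
}

// sampleXML is a tiny made-up document standing in for the real file.
const sampleXML = `<drugs>
  <drug><name>Aspirin</name></drug>
  <drug><name>Ibuprofen</name></drug>
</drugs>`

func main() {
	names, err := drugNames(sampleXML)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

For a real multi-hundred-megabyte file, the reader would typically be an `os.File` (optionally wrapped in a `bufio.Reader`), and the callback would process or write out each record instead of collecting them in a slice.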
Equation-centric dataflow programming in Go
Dec 27, 2017: Mathematical notation and dataflow programming
Even though computations done on computers are very often based on some type of math, it is striking that the notation used in math to express equations and relations is not always very readily converted into programming code. Out ...
What is a scientific (batch) workflow?
Dec 7, 2017:
Workflows and DAGs - Confusion about the concepts
Jörgen Brandt tweeted a comment that got me thinking again on something I've pondered a lot lately:
"A workflow is a DAG." is really a weak definition. That's like saying "A love letter is a sequence of characters." r ...
Go is growing in bioinformatics workflow tools
Nov 10, 2017:
TL;DR: We wrote a post on gopherdata.io, about the growing ecosystem of Go-based workflow tools in bioinformatics. Go read it here
It is interesting to note how Google's Go programming language seems to increase in popularity in bioinformatics.
Just to give a sample of ...
The frustrating state of note taking tools
Nov 7, 2017: One year left to the dissertation (we hope) and now turning from mostly software development into more of data analysis and needing to read up quite a pile of books and papers on my actual topic, pharmaceutical bioinformatics. With this background, I feel forced to ponder ways ...
Learning how to learn
Oct 31, 2017: I'm reading A mind for numbers, by Barbara Oakley. Firstly, it is a very interesting book, but the main lesson I've already learned from this book seems so paramount that I have to write it down, so I don't forget it (some meta-connotations in that statement ;) ). I found the boo ...
Provenance reports in Scientific Workflows
Oct 19, 2017: Hoping to write up a series of posts that go through some of the design decisions made in SciPipe, the Go-based scientific workflow library we're working on, to share some of the thinking behind them, and get the opportunity to get feedback and suggestions that might be imple ...
(Almost) ranging over multiple Go channels simultaneously
Oct 5, 2017: In my experiments (flowbase, scipipe) in using Flow-based programming (FBP) principles in pure Go, there is a common pattern occurring all the time: The need to synchronously read a set of values from multiple channels at the same time. That is, if I have three in-bound channels, ...
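One simple way to read a set of values from several channels in lock-step is sketched below. This is only an illustration of the general problem described above, not necessarily the solution the post arrives at; the helper names `zip3` and `emit` are made up for this example.

```go
package main

import "fmt"

// zip3 reads one value from each of the three channels per iteration
// and stops as soon as any of them is closed.
// Note: if the channels carry different numbers of values, the
// remaining senders are left blocked; a real implementation would
// need to drain or cancel them.
func zip3(a, b, c <-chan string) [][3]string {
	var rows [][3]string
	for {
		x, okA := <-a
		y, okB := <-b
		z, okC := <-c
		if !okA || !okB || !okC {
			return rows
		}
		rows = append(rows, [3]string{x, y, z})
	}
}

// emit sends the given values on a new channel and then closes it.
func emit(vals ...string) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, v := range vals {
			ch <- v
		}
	}()
	return ch
}

func main() {
	rows := zip3(emit("a1", "a2"), emit("b1", "b2"), emit("c1", "c2"))
	for _, r := range rows {
		fmt.Println(r[0], r[1], r[2])
	}
}
```

The comma-ok form of the receive (`x, ok := <-ch`) is what makes it possible to detect a closed channel here, which a plain `range` over a single channel would otherwise handle implicitly.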
First production run with SciPipe - A Go-based scientific workflow tool
Sep 28, 2017: Today marked the day when we ran the very first production workflow with SciPipe, the Go-based scientific workflow tool we've been working on over the last couple of years. Yay! :)
This is how it looked (no fancy GUI or such yet, sorry):
The first result we got in this ...
Compiling RDFHDT C++ tools on UPPMAX (RHEL/CentOS 7)
Sep 13, 2017: A little background
RDFHDT is an exciting new data format for Semantic Web data in the RDF format. RDF has generally been plagued by extremely verbose textual data formats that have made it impractical for really large data sets. RDFHDT is here to change that with a compact bi ...
Notes on launching kubernetes jobs from the Go API
Feb 15, 2017: This post is also published on medium
My current work at pharmb.io entails adding kubernetes support to my light-weight Go-based scientific workflow engine, scipipe (kubernetes, or k8s for short, is Google’s open source project for orchestrating container based compute clust ...
SMWCon Fall 2016 - My talk on large RDF imports
Oct 7, 2016: I was invited to give a talk at the Semantic MediaWiki (SMW) conference (SMWCon) in Frankfurt last week, on our work on enabling import of RDF datasets into SMW. I have presented at SMWCon before as well (2011: blog, slides, video, 2013: slides), so it was nice to re-connect with som ...
Tutorial: Luigi for Scientific Workflows
Jun 21, 2016: This is a Luigi tutorial I held at the e-Infrastructures for Massively parallel sequencing workshop (Video archive) at SciLifeLab Uppsala in January 2015, moved here for future reference.
What is Luigi?
Luigi is a batch workflow system written in Python and developed by ...
Combining the best of Go, D and Rust?
Jun 11, 2016: Don't take this post too seriously, but I can't help entertaining the thought.
I have for years been looking for a replacement for Python and Java for development of various data processing tools in bioinformatics / cheminformatics, which happens to be my field of study. That i ...
Time-boxing and a unified trello board = productivity
Feb 26, 2016:
Figure: Sketchy screenshot of how my current board looks. Notice especially the "Now" stack, marked in yellow, where you are only allowed to put one single card.
I used to have a very hard time getting an overview of my current work, and prioritizing and concentrating on a ...
The unexpected convenience of JSON on the commandline
Dec 8, 2015: I was working with a migration from the Drupal to the ProcessWire CMS, where I wanted to be able to pipe data, including the body field with HTML formatting and all, through multiple processing steps in a flexible manner. I'd start with an extraction SQL query, through a few component ...
Wanted: Dynamic workflow scheduling
Oct 26, 2015:
Photo credits: Matthew Smith / Unsplash
In our work on automating machine learning computations in cheminformatics with scientific workflow tools, I have come to realize something: dynamic scheduling in scientific workflow tools is important and sometimes badly needed.
How to be productive in vim in 30 minutes
Sep 15, 2015:
I had heard a lot of people say vim is hard, very hard. They said it is good, and you will benefit from using it, but that it will take a great investment to switch to using it.
While I have come to understand that they are right in that there are a lot of things to invest ...
How to compile tmux 2.0 on RHEL6 / SL6 to get zoomable panes
Aug 20, 2015: I needed to do this, to get the "zoomable", or "maximizable" panes feature of tmux 1.8+ on UPPMAX, which only has an older version of tmux, so here follow the steps I took:
# Download and unpack libevent (needed by tmux)
wget https://sourceforge.net/projects/levent/files/li ...
How to compile vim for use with pyenv and vim-pyenv
Aug 20, 2015: I use pyenv for managing custom python installations on the UPPMAX HPC cluster, and since I use vim with jedi-vim for auto-completion in python, I tried to get vim-pyenv to work.
It turned out that the system version of VIM on UPPMAX was outdated though (7.2 rather than the requ ...
How I would like to write Go programs
Jul 18, 2015:
Some time ago I got a post published on GopherAcademy, outlining in detail how I think a flow-based programming inspired syntax can strongly help to create clearer, easier-to-maintain, and more declarative Go programs.
These ideas have since become clearer, and we (Ola Spj ...
Terminator as a middle-way between floating and tiling window managers
Jul 17, 2015: I have tried hard to improve my linux desktop productivity by learning to do as much as possible using keyboard shortcuts, aliases for terminal commands etc etc (I even produced an online course on linux commandline productivity).
In this spirit, I naturally tried out a so calle ...
A few thoughts on organizing computational (biology) projects
Jun 23, 2015:
I read this excellent article with practical recommendations on how to organize a computational project, in terms of directory structure.
Directory structure matters
The importance of a good directory structure seems to often be overlooked in teaching about computationa ...
A cheatsheet for the iRODS rule language
Jun 11, 2015: iRODS, the "integrated rule oriented data system" is a super cool system for managing datasets consisting of files, from smallish ones, to really large ones counted in petabytes, and possibly spanning multiple continents.
There's a lot to be said about iRODS (up for another blog ...
Patterns for composable concurrent pipelines in Go
Jun 1, 2015: I realize I didn't have a link to my blog on Gopher Academy, on patterns for composable concurrent pipelines in Go(lang), so here it goes:
The role of simplicity in testing and automation
Mar 23, 2015: Disclaimer: Don't take this too seriously ... this is "thinking-in-progress" :)
It just struck me the other minute, how simplicity is the key theme behind two very important areas in software development, that I've been dabbling with quite a bit recently: Testing, and automation ...
The problem with make for scientific workflows
Mar 14, 2015:
The workflow problem solved once and for all in 1979?
As soon as the topic of scientific workflows is brought up, there are always a few make fans fervently insisting that the problem of workflows is solved once and for all with GNU make, written first in the 70's :)
Dynamic Navigation for Higher Performance
Mar 11, 2015: Improving performance in Delphi Bold MDA applications by replacing navigation code with derived links in the model
Guest Post on Model Driven Architecture in Delphi and Bold, by Rolf Lampa
Modeling class structures takes some thinking, and when done the thinking and the dra ...
Random links from the Hadoop NGS Workshop
Feb 19, 2015: Some random links from the Hadoop for Next-Gen Sequencing workshop held at KTH in Kista, Stockholm in February 2015
UPDATE: Slides and Videos now available!
By Big Data Genomics
Tweet by Frank Nothaft on common workflow def
Part of G ...
Links: Our experiences using Spotify's Luigi for Bioinformatics Workflows
Feb 12, 2015:
Fig 1: A screenshot of Luigi's web UI, of a real-world (although rather simple) workflow implemented in Luigi:
Update May 5, 2016: Most of the below material is more or less outdated. Our latest work has resulted in the SciLuigi helper library, which we have used in production ...
NGS Bioinformatics Intro Course Day 2
Feb 10, 2015: Today was the second day of the introductory course in NGS bioinformatics that I'm taking as part of my PhD studies.
For me it started with a substantial oversleep, probably due to a combination of an annoying cold and the ~2 hour commute from south Stockholm to Uppsala and BM ...
Taking a one week introductory course in Bioinformatics for NGS data
Feb 9, 2015: This week, as part of my PhD studies in Pharmaceutical Bioinformatics, I will be taking the course "Introduction to Bioinformatics using NGS data" at Science for Life Laboratory here in Sweden.
I will try to blog here a little every day about what I learned.
Right now I'm sitti ...
NGS Bioinformatics Intro Course Day 1
Feb 9, 2015: Just finished day 1 of the introductory course on Bioinformatics for Next generation sequencing data at SciLifeLab Uppsala. Attaching a photo from one of the hands-on tutorial sessions, with the tutorial leaders standing to the right.
Today's content was mostly introductions ...
Jan 13, 2015: Update Sep 7, 2017: New Virtual Machine
A new virtual machine was just created, with the latest RDFIO 3.0.2, installed on MediaWiki 1.29 and Semantic MediaWiki 2.5.
Download it from figshare here (DOI:10.6084/m9.figshare.5383966)
The new VM is based on the new RDFIO va ...