Bioinformatics and Informatics

The bioinformatics team supports the activities and services performed by the PGx Centre. It manages complete production analysis pipelines, reporting databases, development process tracking, and web tools. Established bioinformatics pipelines cover whole-genome and whole-exome sequencing, sequencing-based and genotyping-based copy number variant (CNV) analysis, and genotyping and genome-wide association studies (GWAS).

User directories are stored on an IBM SAN, and genetic data is stored on a multi-node EMC Isilon cluster with 76 TB of usable disk storage. The Centre has a controlled data backup system with duplication and off-site data storage. Analyses run on separate systems, including a 160-CPU cluster managed with Grid Engine and the high-performance computing systems of Calcul Quebec / Compute Canada.

The information management system is governed by a structured security plan: all information procedures are documented, production and development environments are kept separate, and strict security mechanisms and a reliable backup plan are in place.

The bioinformatics team also supports the GLP- and FDA 21 CFR Part 11-compliant Laboratory Information Management System (LIMS) that the laboratory uses to collect, manage, and track samples. The LIMS includes quality controls, validation systems, and full audit trail functionality.

Statistical Genetics

The bioinformatics team works to streamline the statistical analysis and reporting of high-throughput genotyping data. Python tools have been developed to automate processes including genetic data cleanup, statistical analysis plan creation, and statistical analysis report generation. A non-exhaustive list of the tools created by the team can be found on GitHub and on StatGen’s web site.
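As an illustration of the kind of genetic data cleanup step mentioned above, the following is a minimal sketch of a sample call-rate filter. It is not the team's code: the genotype encoding, the function names `sample_call_rates` and `filter_samples`, and the 0.98 default threshold are all assumptions chosen for the example.

```python
from typing import Dict, List, Optional

# Assumed encoding (hypothetical): 0, 1, 2 = alternate allele count,
# None = missing genotype call. Keys are sample identifiers.
Genotypes = Dict[str, List[Optional[int]]]


def sample_call_rates(genotypes: Genotypes) -> Dict[str, float]:
    """Return the fraction of non-missing genotype calls for each sample."""
    rates: Dict[str, float] = {}
    for sample, calls in genotypes.items():
        if not calls:
            # A sample with no markers has nothing called.
            rates[sample] = 0.0
            continue
        called = sum(1 for g in calls if g is not None)
        rates[sample] = called / len(calls)
    return rates


def filter_samples(genotypes: Genotypes, min_call_rate: float = 0.98) -> Genotypes:
    """Keep only samples whose call rate is at least min_call_rate.

    The default threshold is an illustrative assumption, not a recommendation.
    """
    rates = sample_call_rates(genotypes)
    return {s: g for s, g in genotypes.items() if rates[s] >= min_call_rate}
```

In practice the threshold would come from the statistical analysis plan, and the same per-sample logic can be applied per marker by transposing the data.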

Next Generation Sequencing Analysis

The analysis of high-throughput DNA sequencing data is a major challenge in today’s computational biology. The bioinformatics team strives to deliver next generation sequencing data of the highest quality. We rely on benchmarking and validation procedures to implement the best pipeline strategies for our NGS projects. We have developed a Ruffus-based analysis pipeline in Python that is flexible, easy to configure, and adapted to both our distributed computing environment and Calcul Quebec’s servers. The pipeline has been in operation since June 2014 and is continually updated with new versions of tools and new reporting capabilities.
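To give a sense of what a configurable, dependency-driven pipeline looks like, here is a small sketch. It deliberately does not use the workflow library named above. Instead it uses Python's standard-library `graphlib` (Python 3.9+) to order stages by their dependencies. The stage names, the `STAGES` mapping, and the helper functions are hypothetical and do not describe the Centre's actual pipeline.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set

# Hypothetical configuration: each stage maps to the set of stages it depends on.
STAGES: Dict[str, Set[str]] = {
    "align": set(),
    "mark_duplicates": {"align"},
    "call_variants": {"mark_duplicates"},
    "annotate": {"call_variants"},
    "report": {"annotate", "mark_duplicates"},
}


def execution_order(stages: Dict[str, Set[str]]) -> List[str]:
    """Return the stage names ordered so that dependencies come first."""
    return list(TopologicalSorter(stages).static_order())


def run_pipeline(
    stages: Dict[str, Set[str]],
    actions: Dict[str, Callable[[], str]],
) -> List[str]:
    """Run each stage's action in dependency order and collect the results."""
    return [actions[name]() for name in execution_order(stages)]


if __name__ == "__main__":
    # Placeholder actions that only report which stage ran.
    demo_actions = {name: (lambda n=name: f"ran {n}") for name in STAGES}
    for line in run_pipeline(STAGES, demo_actions):
        print(line)
```

Keeping the dependency graph in a plain mapping means a stage can be added or rewired by editing configuration rather than control flow, which is one way to get the flexibility described above.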

Software Development

Our software development team is dedicated to creating quality applications that adhere to high security and validation standards. The team has built web applications using modern back-end technologies, including Scala.

Laboratory Information Management System

The software development team supports the LIMS used for the GLP laboratory activities. The team has developed a reporting system integrated into the laboratory’s intranet, allowing for live reports and status updates. Activities rely on Sapio’s Exemplar LIMS, supported by an extensive suite of custom workflows and Java plug-ins.

Integrated Agena Pipeline

We have developed an integrated analysis pipeline for the Agena MassARRAY system that includes quality control processes, decision support tools, haplotyping and phenotyping utilities for pharmacogenomic variants, and an automated reporting tool.