The bioinformatics team supports the activities and services of the PGx Centre. It manages complete production analysis pipelines, reporting databases, development process tracking, and web tools. Established bioinformatics pipelines include whole-genome and whole-exome sequencing, sequencing-based and genotyping-based copy number variant (CNV) analysis, genotyping, and genome-wide association study (GWAS) pipelines.

User directories are stored on an IBM SAN, and genetic data are stored on a multi-node EMC Isilon cluster with 76 TB of usable disk storage. The Centre has a controlled data backup system with duplication and off-site data storage. Analyses run on separate systems, including a 160-CPU cluster managed by Grid Engine, as well as on the high-performance computing systems of Calcul Quebec / Compute Canada.

The information management system is governed by a structured security plan: all information procedures are documented, production and development environments are kept separate, strict security mechanisms are in place, and a reliable backup plan is maintained.

The bioinformatics team also supports the GLP- and FDA 21 CFR Part 11-compliant Laboratory Information Management System (LIMS) that the laboratory uses to collect, manage, and track samples. The LIMS includes quality controls, validation systems, and full audit trail functionality.
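As an illustration of what full audit trail functionality involves, the following is a minimal Python sketch, not a description of the Centre's LIMS. It assumes each change to a sample records who made it, when, the old and new values, and a reason. The hash chain linking each entry to the previous one is a tamper-evidence technique chosen for this sketch only; nothing above states that the LIMS uses it, and all class and field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class AuditEntry:
    """One immutable record of a change to a sample attribute (hypothetical schema)."""
    user: str
    timestamp: str
    sample_id: str
    field_name: str
    old_value: Optional[str]
    new_value: str
    reason: str
    prev_hash: str
    entry_hash: str


def _digest(user, timestamp, sample_id, field_name, old_value, new_value, reason, prev_hash):
    """SHA-256 over the entry content plus the previous entry's hash."""
    payload = json.dumps(
        [user, timestamp, sample_id, field_name, old_value, new_value, reason, prev_hash]
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditTrail:
    """Append-only change log: entries are added, never edited or deleted."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def record(self, user: str, sample_id: str, field_name: str,
               old_value: Optional[str], new_value: str, reason: str) -> AuditEntry:
        prev_hash = self._entries[-1].entry_hash if self._entries else GENESIS_HASH
        timestamp = datetime.now(timezone.utc).isoformat()
        entry_hash = _digest(user, timestamp, sample_id, field_name,
                             old_value, new_value, reason, prev_hash)
        entry = AuditEntry(user, timestamp, sample_id, field_name,
                           old_value, new_value, reason, prev_hash, entry_hash)
        self._entries.append(entry)
        return entry

    def history(self, sample_id: str) -> List[AuditEntry]:
        """All recorded changes for one sample, oldest first."""
        return [e for e in self._entries if e.sample_id == sample_id]

    def verify(self) -> bool:
        """Recompute the chain; False if any entry's content or link no longer matches."""
        prev_hash = GENESIS_HASH
        for e in self._entries:
            expected = _digest(e.user, e.timestamp, e.sample_id, e.field_name,
                               e.old_value, e.new_value, e.reason, prev_hash)
            if e.prev_hash != prev_hash or e.entry_hash != expected:
                return False
            prev_hash = e.entry_hash
        return True
```

Because every hash covers the previous entry's hash, editing a stored entry makes verification fail. A production LIMS would also persist entries in a database and restrict who may append them.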


Statistical Genetics

The bioinformatics team is engaged in several efforts to streamline the statistical analysis and reporting of high-throughput genotyping data. It has developed Python tools that automate processes such as genetic data cleanup, statistical analysis plan creation, and statistical analysis report generation. A non-exhaustive list of these tools is available on StatGen’s website.
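The sketch below shows the kind of genetic data cleanup step such tools automate: removing markers with a low call rate or a low minor allele frequency (MAF). It is a minimal illustration rather than the team's code; the function names and the default thresholds (98% call rate, 1% MAF) are assumptions chosen for the example.

```python
from typing import Dict, List, Optional, Tuple

# Genotypes coded as alternate-allele counts (0, 1 or 2); None marks a missing call.
Calls = List[Optional[int]]


def call_rate(calls: Calls) -> float:
    """Fraction of samples with a non-missing genotype."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: Calls) -> float:
    """Frequency of the less common allele among non-missing diploid calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_markers(genotypes: Dict[str, Calls], min_call_rate: float = 0.98,
                   min_maf: float = 0.01) -> Tuple[Dict[str, Calls], Dict[str, str]]:
    """Split markers into kept ones and removed ones (with the reason for removal)."""
    kept: Dict[str, Calls] = {}
    removed: Dict[str, str] = {}
    for marker, calls in genotypes.items():
        if call_rate(calls) < min_call_rate:
            removed[marker] = "low call rate"
        elif minor_allele_frequency(calls) < min_maf:
            removed[marker] = "low MAF"
        else:
            kept[marker] = calls
    return kept, removed
```

Returning the reason for each removed marker makes it straightforward to feed the same result into an automatically generated QC report.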

Next Generation Sequencing Analysis

The analysis of high-throughput DNA sequencing data is a major challenge in computational biology today. The bioinformatics team strives to deliver next-generation sequencing data of the highest quality. We recently completed a validation and quality project comparing a number of available tools for analyzing sequence data. We compared 625 pipelines built from combinations of aligners (BWA-backtrack, BWA-MEM, Bowtie2, and Novoalign with two settings), post-alignment processing methods (GATK and SRMA for realignment, GATK for recalibration, and Picard for marking duplicates), SNV and indel variant callers (GATK HaplotypeCaller, GATK UnifiedGenotyper, SAMtools), and dedicated indel callers (Dindel, Pindel). Each pipeline was assessed for accuracy against the NIST reference sample. These results have given us benchmark references for implementing an optimal pipeline strategy for NGS projects at the Pharmacogenomics Centre.

We have developed a Ruffus-based analysis pipeline in Python that is flexible, easy to configure, and adapted to our distributed computing environment as well as to Calcul Quebec’s servers. The new pipeline has been in operation since June 2014 and is continually updated with new tool versions and new reporting capabilities.
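To illustrate how each pipeline's accuracy can be scored against a truth set such as the NIST reference sample, here is a minimal Python sketch that counts true positives, false positives and false negatives, derives precision, sensitivity and F1, and ranks pipelines by F1. It is a simplification, not the method used in the project: it matches variants exactly on (chromosome, position, reference, alternate), whereas representation-aware comparison tools such as RTG Tools vcfeval or Illumina's hap.py are needed to match indels written differently in two VCFs. Pipeline names are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


class Scores(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    sensitivity: float
    f1: float


def score_calls(calls: Iterable[Variant], truth: Iterable[Variant]) -> Scores:
    """Exact-match comparison of one call set against the truth set."""
    called, expected = set(calls), set(truth)
    tp = len(called & expected)
    fp = len(called - expected)
    fn = len(expected - called)
    precision = tp / (tp + fp) if called else 0.0
    sensitivity = tp / (tp + fn) if expected else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return Scores(tp, fp, fn, precision, sensitivity, f1)


def rank_pipelines(results: Dict[str, Iterable[Variant]],
                   truth: Iterable[Variant]) -> List[Tuple[str, Scores]]:
    """Score every pipeline's call set and sort from best to worst F1."""
    truth_set = set(truth)
    scored = [(name, score_calls(calls, truth_set)) for name, calls in results.items()]
    return sorted(scored, key=lambda item: item[1].f1, reverse=True)
```

Ranking by F1 balances missed variants against spurious calls; a real benchmark would usually report SNVs and indels separately, since callers can differ sharply between the two.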

Software Development

Our software development team is dedicated to creating quality applications that meet high standards of security and validation. The team has built web applications using modern back-end technologies such as Scala.

NGS-Based Clinical Decision System

The team has joined a collaborative effort to create a data warehouse and a clinical decision system for epilepsy. We are developing the front-end web application, in which a user uploads a VCF file, provides clinical information, and then consults the list of epilepsy-related variants found in the dataset together with their predicted clinical effects. The supporting evidence for each predicted deleterious effect is produced by a parameterizable decision algorithm applied to every candidate genetic variant. The application is currently a prototype; once completed, it will follow the security and quality best practices expected of clinical systems. The portal is developed as a Single Page Application, with Backbone + Marionette for the client interface and a REST server written in Scala using Spray.
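The decision step can be pictured with the minimal Python sketch below, which is not the portal's implementation (the portal's server is written in Scala). It assumes the uploaded VCF has already been annotated upstream with hypothetical INFO keys GENE, IMPACT and POP_AF (population allele frequency), and that the decision parameters are a gene panel, a maximum population frequency, and a set of impact levels treated as deleterious. The labels and thresholds are illustrative, not clinical criteria.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple

VCF_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO field; flags without '=' map to an empty string."""
    if info == ".":
        return {}
    parsed = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def parse_vcf(lines: Iterable[str]) -> Iterator[dict]:
    """Yield data lines as dicts keyed by the eight fixed VCF columns."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        record = dict(zip(VCF_COLUMNS, line.rstrip("\n").split("\t")))
        record["INFO"] = parse_info(record["INFO"])
        yield record


@dataclass(frozen=True)
class DecisionParams:
    """Tunable decision parameters (hypothetical defaults)."""
    gene_panel: FrozenSet[str]
    max_population_af: float = 0.001
    deleterious_impacts: FrozenSet[str] = frozenset({"HIGH", "MODERATE"})


def classify(record: dict, params: DecisionParams) -> Tuple[str, List[str]]:
    """Return a label plus its supporting reasons for one biallelic variant."""
    info = record["INFO"]
    pop_af = float(info.get("POP_AF", "0"))  # hypothetical INFO key
    if pop_af > params.max_population_af:
        return "likely benign", [f"population AF {pop_af} > {params.max_population_af}"]
    reasons = [f"population AF {pop_af} <= {params.max_population_af}"]
    impact = info.get("IMPACT", "")  # hypothetical INFO key
    if impact in params.deleterious_impacts:
        reasons.append(f"predicted impact {impact}")
        return "candidate deleterious", reasons
    reasons.append(f"predicted impact {impact or 'unknown'}")
    return "uncertain", reasons


def epilepsy_report(lines: Iterable[str], params: DecisionParams) -> List[tuple]:
    """Keep variants in panel genes and attach their classification."""
    rows = []
    for record in parse_vcf(lines):
        gene = record["INFO"].get("GENE")  # hypothetical INFO key
        if gene not in params.gene_panel:
            continue
        label, reasons = classify(record, params)
        rows.append((gene, f"{record['CHROM']}:{record['POS']}", label, reasons))
    return rows
```

Because the thresholds live in a parameter object, the same variants can be reclassified by changing parameters alone, which is the property a parameterizable decision algorithm needs. Multi-allelic records and VCF header validation are ignored for brevity.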

Laboratory Information Management System

The software development team also supports the LIMS used for the GLP laboratory activities. The team has developed a reporting system integrated into the laboratory’s intranet that provides live reports and status updates. The team is currently migrating from Ocimum’s Biotracker LIMS to Sapio’s Exemplar LIMS. As part of this effort, it is implementing a large suite of custom workflows, with supporting plug-ins developed in Java.

PGx Intranet (My.PGx)

The software development team has created the My.PGx intranet for the Pharmacogenomics Centre. It includes LDAP-backed user management, a vacation/absence calendar, a monitoring system for CPU performance, memory usage and disk availability, public FTP account management, access to LIMS reports, and a conference room reservation tool for external users. My.PGx was developed in Scala using Lift.
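One piece of logic behind a room reservation tool is conflict detection. The minimal Python sketch below is only an illustration (My.PGx itself is written in Scala with Lift, and the class names here are hypothetical): it rejects a booking that overlaps an existing booking of the same room, treating bookings as half-open time intervals.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Booking:
    room: str
    start: datetime
    end: datetime
    booked_by: str


class RoomCalendar:
    """In-memory booking list that refuses overlapping reservations per room."""

    def __init__(self) -> None:
        self._bookings: List[Booking] = []

    def conflicts(self, candidate: Booking) -> List[Booking]:
        # Bookings are half-open intervals [start, end), so a meeting ending
        # at 10:00 does not clash with one starting at 10:00.
        return [b for b in self._bookings
                if b.room == candidate.room
                and b.start < candidate.end
                and candidate.start < b.end]

    def reserve(self, candidate: Booking) -> bool:
        """Store the booking and return True, or return False if it overlaps."""
        if candidate.end <= candidate.start:
            raise ValueError("a booking must end after it starts")
        if self.conflicts(candidate):
            return False
        self._bookings.append(candidate)
        return True
```

Two intervals [s1, e1) and [s2, e2) overlap exactly when s1 < e2 and s2 < e1. A multi-user web application would additionally need to perform the check and the insert atomically, for example inside a database transaction.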

Integrated Sequenom Pipeline

The bioinformatics and software development teams are developing a complete pipeline for analyzing MassARRAY data from the Sequenom system. The integrated pipeline includes a quality control application that assesses the concordance between calls made by two users, supports decisions based on assay controls, and integrates redos into the final genotype reports. The pipeline is integrated with a data warehouse (see next section). A haplotyping and phenotyping utility is currently under development, and results are reported through a flexible, secure and automated reporting tool.
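The three quality control steps described above (reviewer concordance, decisions based on assay controls, and integration of redos) can be sketched in a few Python functions. This is a minimal illustration under stated assumptions, not the pipeline's code: calls are keyed by hypothetical (sample, assay) pairs, genotypes are allele strings such as "AG", a no-template control is expected to produce no call, and a successful redo simply replaces the original call.

```python
from typing import Dict, List, Optional, Set, Tuple

Key = Tuple[str, str]  # (sample_id, assay_id)
Call = Optional[str]   # genotype such as "AG"; None means no call


def normalize(call: Call) -> Call:
    """Treat allele order as irrelevant, so "GA" and "AG" compare equal."""
    return None if call is None else "".join(sorted(call))


def concordance(first: Dict[Key, Call], second: Dict[Key, Call]) -> Tuple[float, List[Key]]:
    """Agreement rate between two reviewers over wells both of them called."""
    shared = [k for k in first
              if k in second and first[k] is not None and second[k] is not None]
    discordant = sorted(k for k in shared if normalize(first[k]) != normalize(second[k]))
    rate = 1.0 - len(discordant) / len(shared) if shared else 0.0
    return rate, discordant


def failed_assays(calls: Dict[Key, Call], expected_controls: Dict[Key, Call]) -> Set[str]:
    """Assays whose control wells disagree with the expected genotype.

    A no-template control is expected to give no call (expected value None).
    """
    failed = set()
    for (sample_id, assay_id), expected in expected_controls.items():
        if normalize(calls.get((sample_id, assay_id))) != normalize(expected):
            failed.add(assay_id)
    return failed


def merge_redos(original: Dict[Key, Call], redos: Dict[Key, Call]) -> Dict[Key, Call]:
    """Final genotypes: a successful redo call replaces the original call."""
    final = dict(original)
    for key, call in redos.items():
        if call is not None:
            final[key] = call
    return final
```

Listing the discordant wells, rather than only the rate, gives the two reviewers a concrete worklist to resolve before genotypes are released.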

Genetic Data Warehouse

The programming team is developing a comprehensive solution for storing, managing and querying the genetic data generated by the many platforms it operates (iScan, Sequenom, HiSeq, MiSeq), augmented by the available clinical data and metadata.
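A minimal relational sketch of such a warehouse is shown below, using SQLite from the Python standard library. The schema is an assumption for illustration only: the table and column names are hypothetical, and a single diagnosis column stands in for the clinical data and metadata. It shows genotypes from several platforms linked to samples and subjects, so that genetic and clinical data can be queried together.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE subject (
    subject_id TEXT PRIMARY KEY,
    diagnosis  TEXT
);
CREATE TABLE sample (
    sample_id  TEXT PRIMARY KEY,
    subject_id TEXT NOT NULL REFERENCES subject(subject_id)
);
CREATE TABLE genotype (
    sample_id  TEXT NOT NULL REFERENCES sample(sample_id),
    variant_id TEXT NOT NULL,
    platform   TEXT NOT NULL CHECK (platform IN ('iScan', 'Sequenom', 'HiSeq', 'MiSeq')),
    genotype   TEXT,
    PRIMARY KEY (sample_id, variant_id, platform)
);
"""


def open_warehouse(path: str = ":memory:") -> sqlite3.Connection:
    """Create the schema; foreign keys must be switched on per connection in SQLite."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def genotypes_by_diagnosis(conn: sqlite3.Connection, variant_id: str,
                           diagnosis: str) -> List[Tuple[str, str, str]]:
    """Genotypes for one variant across all platforms, restricted by a clinical field."""
    return conn.execute(
        """
        SELECT s.sample_id, g.platform, g.genotype
        FROM genotype g
        JOIN sample s ON s.sample_id = g.sample_id
        JOIN subject p ON p.subject_id = s.subject_id
        WHERE g.variant_id = ? AND p.diagnosis = ?
        ORDER BY s.sample_id, g.platform
        """,
        (variant_id, diagnosis),
    ).fetchall()
```

Keeping the platform in the genotype key lets the same variant be stored once per technology, which makes cross-platform concordance checks a simple query.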