COMBAT-TB: Computational Bacterial Analytical ToolKit for Tuberculosis


Functional genomics is a branch of genomics that determines the biological functions of
genes using large volumes of data obtained through high-throughput techniques that fall
under the umbrella of transcriptomics, proteomics and genomics. Collectively, the analyses
underpinned by these ‘omics approaches converge on a systems biology rationale: to
understand the functions of an organism or, in the case of a disease, the mechanisms
underlying that disease within an organism. The successful exploitation of these diverse
biological data relies on interactions among researchers in the mathematical, computational,
statistical and biomedical sciences.

The use of ‘omics technologies is increasing in South Africa. For example, in the context of
tuberculosis research, South African researchers have sequenced hundreds of mycobacterial
genomes. Yet the South African competitive edge can only be realized by developing
computational algorithms that rapidly synthesize this rich genetic resource and that are
accessible to biomedical researchers through a user-friendly interface. The overwhelming
response among South African biomedical researchers during the past two years to
next-generation sequencing analysis training and support underscores the need for
interventions that build computational skills or provide computational resources. Unfortunately,
there is not a critical mass of researchers able to use bioinformatics software in the absence of
a graphical user interface or internet browser. There remains a need to train biologists in these
skills to meet the demands of the post-genomic era in South Africa and across the African
continent in resource-limited settings. Our ability to rapidly analyze the unique datasets
generated in South Africa will shape our technology innovation strategy. The lack of human
capital to carry out large-scale computational biomedical analyses has also been highlighted by
the recently established Human Heredity and Health in Africa (H3Africa) genetics programme,
funded through the National Institutes of Health, in which researchers on the African continent
recognize the tension between making Africa’s unique genetic resources available to the
international community and, at the same time, having the opportunity to answer specific
research questions governing our local

COMBAT-TB sets out to address two needs that face researchers around the world but
specifically impact research development on the African continent, namely: (i) access to
computational tools for rapid deployment in a resource-limited laboratory; and (ii) access to
an integrated environment that will allow researchers to interrogate their in-house data as well
as interpret those data in the context of data available in public repositories. These two needs
might appear to be disconnected but in reality feed off each other. For example, data
repositories or archives provide a space for researchers to store their data in a predefined
format, while other resources attempt to connect these independent repositories in an
ostensibly integrated platform to glean insights from overlapping data. At the same time, the
upstream steps or protocols that generate the data feeding these repositories are not developed
in a way that provides a communication layer to the data repositories. Conversely, the data
repositories do not necessarily model their storage design on the range of data types
that are being generated using high-throughput technologies.

To this end, the current proposal has brought together a multi-disciplinary team around the
theme of tuberculosis research with three overarching goals, namely:
1. Development of a state-of-the-art, configurable scientific workflow management
system with reproducible biomedical workflows

2. Generation of transcriptome data and enzyme assays to model tuberculosis infection

3. Development of an integrated knowledge management system that allows users to
interrogate data across domains in the context of tuberculosis research