Case: Breast cancer gene expression
Breast cancer is the most common cancer in women, with about 1.5 million new cases and 450,000 deaths worldwide each year. As in all cancers, breast cancer cells acquire DNA mutations that transform normal cells into malignant cells that proliferate rapidly and invade surrounding tissues.
One major effect of these mutations is a profound change in gene expression patterns, that is, in which genes are turned into the functional proteins that establish the cancer phenotype. Gene expression can be measured genome-wide with DNA microarrays and deep sequencing technologies.
In this demonstration, we follow the process of investigating breast cancer gene expression patterns using DNA microarrays. We use publicly available data from The Cancer Genome Atlas (TCGA). In real projects, the analysis uses the customer's own private data.
Step 1: Experimental design
Purpose: Decide which experiments to perform and which hypotheses to test.
Results: We compare mRNA extracted from breast cancer tissue against mRNA from normal breast tissue. This is a simple two-group design.
| Sample type | Number of biological replicates |
|---|---|
| Breast cancer patient tissue | 20 |
| Normal breast tissue | 10 |
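For the statistical steps that follow, a two-group design like this is often encoded as a sample sheet plus a treatment-coded design matrix. The sketch below is illustrative only: the sample IDs are hypothetical placeholders (not TCGA barcodes), and the case study does not say which modelling framework was used. With this coding, fitting an ordinary least-squares linear model per gene gives an intercept equal to the normal-tissue mean and a tumor coefficient equal to the tumor-minus-normal difference.

```python
# Hypothetical sample sheet for the two-group design above:
# 20 breast cancer samples and 10 normal breast tissue samples.
# Sample IDs are illustrative placeholders, not TCGA barcodes.


def build_sample_sheet(n_tumor: int = 20, n_normal: int = 10) -> list[dict]:
    """Return one record per microarray with its group label."""
    sheet = [{"sample_id": f"tumor_{i + 1:02d}", "group": "tumor"} for i in range(n_tumor)]
    sheet += [{"sample_id": f"normal_{i + 1:02d}", "group": "normal"} for i in range(n_normal)]
    return sheet


def design_matrix(sheet: list[dict]) -> list[list[int]]:
    """Treatment coding: column 0 is the intercept (normal baseline),
    column 1 is 1 for tumor samples and 0 for normal samples."""
    return [[1, 1 if rec["group"] == "tumor" else 0] for rec in sheet]


sheet = build_sample_sheet()
X = design_matrix(sheet)
print(len(X), "samples x", len(X[0]), "coefficients")  # 30 samples x 2 coefficients
print("tumor samples:", sum(row[1] for row in X))  # tumor samples: 20
```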
Step 2: Wet lab experiments
Purpose: Conduct laboratory experiments and generate raw microarray data for subsequent analysis. This step is performed by customers and microarray service providers.
Results: RNA is extracted from the samples, labeled, and hybridized to microarrays.
Step 3: Preprocessing and normalization of microarray data
Purpose: Verify the technical quality of the raw data and make the numeric values from distinct samples comparable to each other.
Results: All microarrays pass technical quality control. The normalized data are used in the rest of the analysis.
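Quantile normalization is one widely used way to make microarray intensities comparable across samples. The case study does not say which method was applied, so treat this as a minimal sketch: it assumes a gene-by-sample matrix of log2 intensities held in plain Python lists, and it breaks ties by gene order instead of averaging them.

```python
def quantile_normalize(expression: list[list[float]]) -> list[list[float]]:
    """Quantile-normalize a matrix indexed as expression[gene][sample]."""
    n_genes = len(expression)
    n_samples = len(expression[0])
    # For each sample (column), order gene indices from lowest to highest value.
    order = [sorted(range(n_genes), key=lambda g: expression[g][s]) for s in range(n_samples)]
    # Reference distribution: mean of the k-th smallest value across all samples.
    reference = [
        sum(expression[order[s][k]][s] for s in range(n_samples)) / n_samples
        for k in range(n_genes)
    ]
    # Replace each value with the reference value at its rank within its sample.
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s in range(n_samples):
        for k, g in enumerate(order[s]):
            normalized[g][s] = reference[k]
    return normalized


expr = [[2.0, 4.0], [4.0, 8.0], [6.0, 6.0]]  # toy data: 3 genes x 2 samples
print(quantile_normalize(expr))  # [[3.0, 3.0], [5.0, 7.0], [7.0, 5.0]]
```

After normalization every sample has the same set of values, so differences between arrays reflect rank order rather than overall intensity. Production pipelines usually combine this with background correction and probe summarization (RMA for Affymetrix arrays is a common example).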
Step 4: Primary analysis: differentially expressed genes
Purpose: Determine which genes have different expression levels between breast cancer and normal tissue.
Results: Differential expression is determined by statistical testing, yielding 680 differentially expressed genes.
Among the top-ranked genes are collagens and matrix metallopeptidases, which remodel the extracellular matrix and are associated with tumor invasion. Other prominent genes include cyclins and cell division cycle genes, which drive cell proliferation.
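The case study does not name its statistical method. As a minimal, hedged sketch, the code below scores each gene with Welch's t statistic, converts it to a two-sided p-value by randomly relabelling samples (a permutation test), and controls the false discovery rate with the Benjamini-Hochberg procedure. It assumes normalized log2-scale values, so the log2 fold change is simply the difference of group means. Gene names and values are made up, and the 2,000 permutations and the 0.05 FDR threshold are arbitrary choices; dedicated microarray tools such as limma use moderated t-statistics instead.

```python
import random
import statistics


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for mean(a) - mean(b), allowing unequal variances."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def permutation_p(tumor: list[float], normal: list[float], n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided p-value: how often randomly relabelled samples give |t| >= observed."""
    rng = random.Random(seed)
    observed = abs(welch_t(tumor, normal))
    pooled = tumor + normal  # new list, so the caller's inputs are not shuffled
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(tumor)], pooled[len(tumor):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 avoids reporting p = 0


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_expression(expr: dict[str, list[float]], groups: list[str], fdr: float = 0.05):
    """Return (gene, log2 fold change, q-value) for genes with q < fdr.

    expr maps gene -> log2 values per sample; groups labels each sample "tumor" or "normal".
    """
    rows = []
    for gene, values in expr.items():
        tumor = [v for v, g in zip(values, groups) if g == "tumor"]
        normal = [v for v, g in zip(values, groups) if g == "normal"]
        log2_fc = statistics.mean(tumor) - statistics.mean(normal)  # difference of log2 means
        rows.append((gene, log2_fc, permutation_p(tumor, normal)))
    qvals = benjamini_hochberg([p for _, _, p in rows])
    return [(gene, fc, q) for (gene, fc, _), q in zip(rows, qvals) if q < fdr]


# Hypothetical toy data: 6 tumor and 4 normal samples, two made-up genes.
groups = ["tumor"] * 6 + ["normal"] * 4
expr = {
    "GENE_UP": [10.1, 10.4, 9.9, 10.3, 10.0, 10.2, 5.1, 5.3, 4.9, 5.0],
    "GENE_FLAT": [7.0, 7.3, 6.8, 7.1, 6.9, 7.2, 7.1, 6.9, 7.2, 7.0],
}
for gene, fc, q in differential_expression(expr, groups):
    print(f"{gene}: log2FC = {fc:.3f}, q = {q:.4f}")
```

The smallest p-value a permutation test can report is 1/(n_perm + 1), so screening thousands of genes at a strict FDR needs many more permutations or a parametric test.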
Step 5: Functional annotation: Gene Ontology
Purpose: Aid interpretation of the primary analysis by mapping the list of result genes to Gene Ontology (GO), a curated resource describing gene and protein functions.
Results: GO enrichment analysis shows that genes coding for extracellular matrix components are overrepresented among the result genes. Overrepresented biological processes include cell proliferation, inflammation and blood vessel development, all known hallmarks of cancer.
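Overrepresentation of a GO term is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). In the sketch below only the 680 result genes come from this case study; the background of 20,000 genes and the counts for the term are hypothetical numbers chosen for illustration. In practice the p-values for all tested terms also need a multiple-testing correction.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for overrepresentation.

    N: background genes measured on the array
    K: background genes annotated to the GO term
    n: result (differentially expressed) genes
    k: result genes annotated to the GO term
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)  # exact integer ratio, rounded once to float


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Fraction of result genes in the term divided by the background fraction."""
    return (k / n) / (K / N)


# Hypothetical counts for a term such as "extracellular matrix":
N, K, n, k = 20_000, 200, 680, 30
print(f"expected by chance: {n * K / N:.1f}")  # expected by chance: 6.8
print(f"fold enrichment: {fold_enrichment(k, n, K, N):.2f}")
print(f"p-value: {enrichment_p(k, n, K, N):.2e}")
```

Exact integer arithmetic via `math.comb` avoids the underflow that multiplying many small probabilities in floating point would cause for large backgrounds.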
Step 6: Interpretation of results and iteration
Purpose: Draw conclusions from the data. If needed, plan additional bioinformatics steps to evaluate new hypotheses or clarify existing ones. Plan further wet lab experiments to validate or deepen the conclusions.
Results: The microarray results are in line with existing knowledge of cancer biology. A key challenge in breast oncology is predicting which tumors will metastasize; here, genes related to the extracellular matrix are potential leads.
Follow-up bioinformatics studies may include characterization of dysregulated signaling pathways such as the growth-associated MAP kinase pathway. Further wet lab experiments may include validation of selected genes at the protein level.