The motivation for this post was a recent Twitter exchange discussing difficulties faced when analyzing microbiome data with missing covariate values (i.e., incomplete sample metadata fields). I have run into this issue as well and, given the increasing size of microbiome studies, expect many others have too.
Inherent limitations with one-at-a-time (OaaT) feature testing (i.e., single feature differential abundance analysis) have contributed to the increasing popularity of mixture models for correlating microbial features with factors of interest (i.
One of the most common questions we get from investigators at the Microbial Metagenomics Analysis Center (MMAC) is "How many samples should I collect for my study?" Even once we have a clearly stated and testable hypothesis, this is not always easy to answer, since sample size calculations for microbiome studies are typically not amenable to closed-form solutions (i.
Microbiome studies often seek to identify individual features (i.e., OTUs/ASVs, species, pathways, etc.) associated with some condition (i.e., exposure, experimental treatment, etc.) of interest. This problem can be approached in many different ways, but most commonly, one-at-a-time (OaaT) feature screening is undertaken.
A common goal in many microbiome studies is to identify features (i.e., species, OTUs, gene families, etc.) that differ according to some study condition of interest. While often done, this is a difficult task, and in the Introduction to the Statistical Analysis of Microbiome Data in R post I touch on some of the reasons for this.
Below I provide scripts to implement the current default workflow for taxonomic and functional profiling using the Huttenhower Lab’s Biobakery Tool Suite used by the Microbial Metagenomics Analysis Center (MMAC) at CCHMC for paired-end data.
Below I provide scripts to implement several workflows for denoising 16S rRNA gene sequences used by the Microbial Metagenomics Analysis Center (MMAC) at CCHMC for paired-end data. These scripts are written to run on the CCHMC high-performance computing (HPC) cluster.
This is a link to a talk I will be giving to the Cincinnati Children’s Hospital Medical Center R Users Group on November 6th, 2019. The goal of the talk is to introduce members to some of the functionality provided by Frank Harrell’s Hmisc and rms …
This post introduces members of the Cincinnati Children’s Hospital Medical Center R Users Group (CCHMC-RUG) to some of the functionality provided by Frank Harrell’s Hmisc and rms packages for data description and predictive modeling.
During the Introduction to Metagenomics Summer Workshop we discussed denoising amplicon sequence variants and worked through Ben Callahan’s DADA2 tutorial. During that session, I mentioned several other approaches and algorithms for denoising or clustering amplicon sequence data, including UNOISE3, Deblur, and mothur.