Ocean Metagenomics Assembly and Gene Prediction
In this tutorial, we will analyse a small dataset of oceanic microbial metagenomes.
Note
This tutorial uses the full Ocean Microbial Reference Gene Catalog presented in "Structure and function of the global ocean microbiome", Sunagawa, Coelho, Chaffron, et al., Science, 2015.
- Download the toy dataset
First download all the tutorial data:
ngless --download-demo ocean-short
We are reusing the same dataset as in the Ocean profiling tutorial. It may be a good idea to read steps 1-4 of that tutorial before starting this one.
- Preliminary imports
To run ngless, we need to write a script. We start with a few imports:
ngless "1.4"
import "mocat" version "1.0"
- Preprocessing
First, we want to trim the reads based on quality:
sample = 'SAMEA2621155.sampled'
input = load_mocat_sample(sample)
input = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
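To make the trimming step more concrete, here is a minimal Python sketch of the logic in the block above. It assumes that substrim keeps the longest stretch of a read in which every base has at least the given quality, and that reads left shorter than 45 bases are dropped. The function names and the list-of-integers quality representation are illustrative assumptions, not NGLess internals.

```python
def substrim(quals, min_quality):
    """Return (start, end) of the longest run with all qualities >= min_quality."""
    best_start, best_len = 0, 0
    run_start = None  # start index of the current high-quality run, if any
    for i, q in enumerate(quals):
        if q >= min_quality:
            if run_start is None:
                run_start = i
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # a low-quality base ends the current run
    return best_start, best_start + best_len


def preprocess_read(seq, quals, min_quality=25, min_length=45):
    """Trim one read; return None when the read should be discarded."""
    start, end = substrim(quals, min_quality)
    if end - start < min_length:
        return None  # mirrors `discard` in the NGLess block
    return seq[start:end], quals[start:end]
```

With keep_singles=False, a read whose mate is discarded is dropped as well, so only intact pairs continue to the assembly step. The sketch above handles a single read and does not model that pairing.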
- Assembly and gene prediction
This now takes just two function calls, assemble and orf_find, each followed by a write to save the result:
contigs = assemble(input)
write(contigs, ofile='contigs.fna')
orfs = orf_find(contigs)
write(orfs, ofile='orfs.fna')
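For readers new to gene prediction, the following Python sketch illustrates what an open reading frame (ORF) is: a stretch that starts at an ATG codon and runs in frame to the next stop codon. This is a deliberately naive illustration and not how orf_find works; real gene finders also search the reverse strand, accept alternative start codons, and score candidates statistically. The function name and the min_length parameter are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_length=30):
    """Return sorted (start, end) pairs of forward-strand ORFs (ATG to stop)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG of the ORF being extended
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon in the reported span
                if end - start >= min_length:
                    orfs.append((start, end))
                start = None
    return sorted(orfs)


# Example: one short ORF in the third reading frame
print(find_orfs("CCATGAAATTTGGGTAACC", min_length=9))
```

The coordinates are 0-based and half-open, so `seq[start:end]` yields the ORF including its stop codon.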
Full script
ngless "1.4"
import "mocat" version "1.0"
sample = 'SAMEA2621155.sampled'
input = load_mocat_sample(sample)
input = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
contigs = assemble(input)
write(contigs, ofile='contigs.fna')
orfs = orf_find(contigs)
write(orfs, ofile='orfs.fna')