West Coast Green Crab Experiment Part 92

Modifying the transcriptome assembly

I ran my action plan by Carolyn. She had no comments on which samples to use as input, and said my plan of rerunning the assembly with the correct library type was a good first step! I submitted a script, requesting 15 days, to run the transcriptome assembly and collapse the isoforms into supertranscripts.

# DE NOVO TRANSCRIPTOME ASSEMBLY

echo "Start de novo transcriptome assembly"

# Run Trinity to assemble de novo transcriptome. Using primarily default parameters.
${TRINITY}/Trinity \
--seqType fq \
--max_memory 100G \
--samples_file ${trinity_file_list} \
--SS_lib_type FR \
--min_contig_length 200 \
--full_cleanup \
--CPU 28

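# Move transcriptome to the correct location.
# With --full_cleanup, Trinity writes trinity_out_dir.Trinity.fasta and removes its working directory, so create the destination first in case it does not exist.
mkdir -p ${OUTPUT_DIR}/trinity_out_dir
mv trinity_out_dir.Trinity.fasta ${OUTPUT_DIR}/trinity_out_dir/Trinity.fasta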

# Collapse isoforms into supertranscripts.
# Output files are trinity_genes.fasta (supertranscripts in fasta format), trinity_genes.gtf (transcript structure annotation in gtf format), and trinity_genes.malign (multiple alignment view that contrasts the different candidate splicing isoforms)
${TRINITY}/Analysis/SuperTranscripts/Trinity_gene_splice_modeler.py \
--trinity_fasta ${OUTPUT_DIR}/trinity_out_dir/Trinity.fasta \
--incl_malign

# Copy output files to a new folder (rsync leaves the originals in place)
mkdir -p supertranscript_output
rsync --archive --progress --verbose trinity_genes.* supertranscript_output/.

echo "Completed de novo transcriptome assembly"

The job was pending due to resources, but hopefully it starts sooner rather than later!
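
Since a failed run means another wait in the queue, a quick check of the samples file before submitting can catch formatting problems early. The sketch below assumes paired-end reads (which the FR library type implies), so each line should have four tab-separated fields: condition, replicate name, left reads, and right reads. The function name check_samples_file is hypothetical and not part of Trinity.

```shell
#!/usr/bin/env bash

# Hypothetical helper: verify that every non-blank line of a Trinity
# samples file has exactly four tab-separated fields
# (condition, replicate, left reads, right reads).
# Returns 0 if all lines look fine, 1 otherwise.
check_samples_file() {
    local samples_file="$1"
    awk -F'\t' '
        NF == 0 { next }    # skip blank lines
        NF != 4 {
            printf "line %d: expected 4 tab-separated fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$samples_file"
}
```

It could be called near the top of the job script, for example as check_samples_file ${trinity_file_list} || exit 1, so the job stops before Trinity starts if a line is malformed.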

Going forward

Tweak transcriptome assembly parameters to reduce the number of assembly artifacts and total supertranscripts
Annotate transcriptome with EnTAP
Remove contaminant sequences identified by EnTAP
Create count matrix for clean transcriptome
Calculate Ex50 and N50 statistics for clean transcriptome
Repeat analysis with clean transcriptome and fuller annotations in edgeR
Identify temperature- and genotype-specific differentially expressed genes at the end of the experiment
Identify genes influenced by both temperature and time
Determine methods for functional analysis
Additional strand-specific analysis in the supergene region
Examine HOBO data from 2023 experiment
Demographic data analysis for 2023 paper
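
For the N50 item in the list above, here is a minimal sketch of the calculation itself: N50 is the contig length at which contigs of that length or longer contain at least half of all assembled bases. The function name fasta_n50 is hypothetical. Trinity's bundled TrinityStats.pl script also reports N50, so this is only an illustration of what the statistic means, and it does not cover the expression-weighted Ex50 statistic.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the N50 of a FASTA file.
# Step 1: emit one sequence length per record (handles wrapped sequences).
# Step 2: sort lengths from longest to shortest.
# Step 3: walk down the list until the running sum reaches half the total.
fasta_n50() {
    local fasta="$1"
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (len > 0) print len }
    ' "$fasta" \
    | sort -rn \
    | awk '
        { lens[NR] = $1; total += $1 }
        END {
            running = 0
            for (i = 1; i <= NR; i++) {
                running += lens[i]
                if (running * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}
```

As a worked example, contigs of length 10, 8, 6, 4, and 2 total 30 bases. The two longest contigs sum to 18, which passes the halfway mark of 15, so the N50 is 8.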

Written on June 30, 2026