West Coast Green Crab Experiment Part 91

Modifying the transcriptome assembly

I ran my action plan by Carolyn. She didn’t have any comments on which samples to input, and said that rerunning the assembly with the correct library type was a good first step! I submitted a job with a 15-day time limit to run the transcriptome assembly and collapse the isoforms into supertranscripts.

# DE NOVO TRANSCRIPTOME ASSEMBLY

echo "Start de novo transcriptome assembly"

# Run Trinity to assemble de novo transcriptome. Using primarily default parameters.
${TRINITY}/Trinity \
--seqType fq \
--max_memory 100G \
--samples_file ${trinity_file_list} \
--SS_lib_type FR \
--min_contig_length 200 \
--full_cleanup \
--CPU 28

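# Move transcriptome to the correct location.
# With --full_cleanup, Trinity removes trinity_out_dir and leaves only trinity_out_dir.Trinity.fasta in the working directory, so the destination folder may not exist yet.
mkdir -p "${OUTPUT_DIR}/trinity_out_dir"
mv trinity_out_dir.Trinity.fasta "${OUTPUT_DIR}/trinity_out_dir/Trinity.fasta"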

# Collapse isoforms into supertranscripts.
# Output files are trinity_genes.fasta (supertranscripts in fasta format), trinity_genes.gtf (transcript structure annotation in gtf format), and trinity_genes.malign (multiple alignment view that contrasts the different candidate splicing isoforms)
${TRINITY}/Analysis/SuperTranscripts/Trinity_gene_splice_modeler.py \
--trinity_fasta ${OUTPUT_DIR}/trinity_out_dir/Trinity.fasta \
--incl_malign

# Copy output files to a new folder (rsync leaves the originals in place)
mkdir -p supertranscript_output
rsync --archive --progress --verbose trinity_genes.* supertranscript_output/.

echo "Completed de novo transcriptome assembly"

The job is pending in the queue while it waits for resources, but hopefully it starts sooner rather than later!
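
For reference, Trinity's --samples_file option takes a tab-delimited file with one line per library: condition, replicate name, left reads, right reads. Below is a minimal sketch of how such a file could be generated. The make_samples_file helper, the example path, and the <condition>_<replicate>_R1.fastq.gz naming scheme are all hypothetical and are not taken from the script above, so adjust them to the actual file names.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): print a Trinity
# samples file to stdout for the paired-end FASTQ files in a directory.
# ASSUMPTION: files are named <condition>_<replicate>_R1.fastq.gz and
# <condition>_<replicate>_R2.fastq.gz, e.g. warm_rep1_R1.fastq.gz.
make_samples_file() {
  local fastq_dir="$1"
  local r1 r2 replicate condition
  for r1 in "${fastq_dir}"/*_R1.fastq.gz; do
    # Skip the unexpanded glob pattern when no files match
    [ -e "${r1}" ] || continue
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    if [ ! -e "${r2}" ]; then
      echo "Skipping ${r1}: no matching R2 file" >&2
      continue
    fi
    replicate="$(basename "${r1}" _R1.fastq.gz)"  # e.g. warm_rep1
    condition="${replicate%_*}"                   # e.g. warm
    # Columns: condition, replicate name, left reads, right reads
    printf '%s\t%s\t%s\t%s\n' "${condition}" "${replicate}" "${r1}" "${r2}"
  done
}

# Example (hypothetical path):
#   make_samples_file /path/to/trimmed_reads > "${trinity_file_list}"
```

Deriving the condition from the file name keeps the samples file in sync with the reads on disk, but it only works if the naming scheme is consistent across every library.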

Going forward

  1. Tweak transcriptome assembly parameters to reduce the number of assembly artifacts and total supertranscripts
  2. Annotate transcriptome with EnTAP
  3. Remove contaminant sequences identified by EnTAP
  4. Create count matrix for clean transcriptome
  5. Calculate Ex50 and N50 statistics for clean transcriptome
  6. Repeat analysis with clean transcriptome and fuller annotations in edgeR
  7. Identify temperature- and genotype-specific differentially expressed genes at the end of the experiment
  8. Identify genes influenced by both temperature and time
  9. Determine methods for functional analysis
  10. Additional strand-specific analysis in the supergene region
  11. Examine HOBO data from 2023 experiment
  12. Demographic data analysis for 2023 paper
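
As a note for item 5, N50 is the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below computes it directly from a FASTA file. The n50_from_fasta function name is hypothetical, and in practice the TrinityStats.pl utility bundled with Trinity reports N50 already, so this is only meant to show what the number represents. It does not cover Ex50, which also needs expression estimates.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original workflow): print the N50 of
# the sequences in a FASTA file.
n50_from_fasta() {
  # Step 1: print one sequence length per record (sequences may span lines)
  awk '
    /^>/ { if (len > 0) print len; len = 0; next }
    { len += length($0) }
    END { if (len > 0) print len }
  ' "$1" |
  # Step 2: sort lengths from longest to shortest
  sort -rn |
  # Step 3: walk down the sorted lengths until half the total is covered
  awk '
    { lengths[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        running += lengths[i]
        if (running >= total / 2) { print lengths[i]; exit }
      }
    }
  '
}

# Example (path assumed from the assembly script):
#   n50_from_fasta "${OUTPUT_DIR}/trinity_out_dir/Trinity.fasta"
```

Because Trinity assemblies contain many isoforms per gene, N50 on the raw transcriptome tends to be inflated relative to a supertranscript or longest-isoform set, so it is worth noting which FASTA the statistic was computed on.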
Written on June 30, 2026