DML Analysis Part 6
Blasting for Uniprot codes (again)
Turns out I didn’t do my blastx correctly.
Mistake 1: Not properly saving my blastx output
I don’t know how I didn’t realize I never added in a line of code to actually SAVE my blastx output. How. What. Why.
Mistake 2: Blasting the entire genome against the Uniprot database, and not just the genes!
The solution is to re-blast! To help my own brain think clearly, I created a new folder for gene enrichment analyses. I also moved the database files and R Markdown file I created previously into this folder. Thankfully, I could use the same Uniprot database. I then downloaded the C. virginica transcript file.
Once all of my input files were assembled, it was time to blastx. I looked through the blastx help menu, but I couldn’t find the arguments I needed awash in the “oh shit I just did everything incorrectly” feeling. I posted this GitHub issue, and Sam helped identify the arguments I needed. My code looked like this:
- Path to blastx
- -query provides the file we want to blast
- -db specifies the database created in the previous step
- -outfmt specifies the type of output file. I will use 6, a tabular file
- -out will allow me to save the output as a new file
- -num_threads 4 uses 4 CPUs in the BLAST search
- -max_target_seqs 1 keeps only one aligned sequence (i.e. the best match)
/Users/Shared/Apps/ncbi-blast-2.2.29+/bin/blastx \
-query 2018-09-06-Virginica-transcripts.fna \
-db uniprot-filtered-reviewed.fasta \
-outfmt 6 \
-out 2018-09-06-Transcript-Uniprot-blastx.txt \
-num_threads 4 \
-max_target_seqs 1
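Once the run finishes, the tabular output can be sanity-checked from the shell. The sketch below is only an illustration. It assumes the default 12 columns for -outfmt 6 (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the Uniprot subject IDs look like sp|ACCESSION|ENTRY_NAME. The sample rows and the file names sample-blastx.txt and transcript-uniprot.txt are made up. Even with -max_target_seqs 1, a query can appear on more than one line if it has several alignments to the same protein, so the sketch keeps only the first (best-scoring) line per transcript.

```shell
#!/bin/sh
set -eu

# Made-up sample rows standing in for the real blastx output.
# Columns follow the default -outfmt 6 layout:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
workdir=$(mktemp -d)
blast_out="$workdir/sample-blastx.txt"
printf 'rna1\tsp|Q9NZJ4|SACS_HUMAN\t45.2\t200\t100\t3\t1\t600\t10\t209\t1e-50\t180\n' > "$blast_out"
printf 'rna1\tsp|Q9NZJ4|SACS_HUMAN\t38.0\t90\t50\t2\t700\t970\t300\t389\t1e-10\t60\n' >> "$blast_out"
printf 'rna2\tsp|P12345|AATM_RABIT\t60.1\t150\t55\t1\t4\t453\t20\t169\t2e-70\t230\n' >> "$blast_out"

# Keep the first line seen for each query ID (assumes BLAST lists the best
# alignment first), split the subject ID on "|" to pull out the accession,
# and print: transcript ID, Uniprot accession, e-value.
awk -F'\t' '!seen[$1]++ { split($2, id, "|"); print $1 "\t" id[2] "\t" $11 }' "$blast_out" \
  > "$workdir/transcript-uniprot.txt"

cat "$workdir/transcript-uniprot.txt"
```

On the real data, the same awk line would be pointed at 2018-09-06-Transcript-Uniprot-blastx.txt instead of the sample file.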
Hopefully that will finish running sometime this weekend! Once I have the blastx results, I can merge them with the overlap files and do some gene enrichment. I also need to send the transcript-Uniprot results to Mike Riffle in Genome Sciences so he can fix the background in the gene enrichment tool he built. Until that gets fixed, I can use DAVID to get some preliminary gene enrichment results for PCSGA. I can also review the methods in Emma’s geoduck paper so I understand the statistical analyses that accompany gene enrichment.
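As a rough sketch of what the merge step could look like from the shell (it could just as easily be done in R), the example below joins a hypothetical overlap table to a hypothetical transcript-to-Uniprot table on the transcript ID. The file names, the column layouts, and the sample rows are all assumptions for illustration. The real overlap files may be laid out differently.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical overlap file: transcript ID, chromosome, DML position.
printf 'rna2\tNC_035780.1\t1500\n' > "$workdir/overlap.txt"
printf 'rna1\tNC_035780.1\t300\n' >> "$workdir/overlap.txt"
printf 'rna3\tNC_035781.1\t42\n' >> "$workdir/overlap.txt"

# Hypothetical annotation table: transcript ID, Uniprot accession.
printf 'rna1\tQ9NZJ4\n' > "$workdir/annot.txt"
printf 'rna2\tP12345\n' >> "$workdir/annot.txt"

# join needs both inputs sorted on the key column in the same collation order.
LC_ALL=C sort -t "$tab" -k1,1 "$workdir/overlap.txt" > "$workdir/overlap.sorted.txt"
LC_ALL=C sort -t "$tab" -k1,1 "$workdir/annot.txt" > "$workdir/annot.sorted.txt"

# Inner join on column 1: transcripts without a blastx hit (rna3 here) drop out.
LC_ALL=C join -t "$tab" "$workdir/overlap.sorted.txt" "$workdir/annot.sorted.txt" \
  > "$workdir/overlap-annotated.txt"

cat "$workdir/overlap-annotated.txt"
```

An inner join is used here because only annotated transcripts are useful for enrichment. To keep unannotated transcripts as well, join's -a 1 option would retain the unmatched overlap rows.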