DML Analysis Part 6
Blasting for Uniprot codes (again)
Turns out I didn’t do my blastx correctly.
Mistake 1: Not properly saving my blastx output…
I don’t know how I didn’t realize I never added in a line of code to actually SAVE my blastx output. How. What. Why.
Mistake 2: Blasting the entire genome against the Uniprot database, and not just the genes!
Again, how.
The solution is to re-blast! To help my own brain think clearly, I created a new folder for gene enrichment analyses. I also moved the database files and R Markdown file I created previously into this folder. Thankfully, I could use the same Uniprot database. I then downloaded the C. virginica transcript file.
Once all of my input files were assembled, it was time to blastx. I looked through the blastx help menu, but, awash in the “oh shit I just did everything incorrectly” feeling, I couldn’t find the arguments I needed. I posted this GitHub issue, and Sam helped identify the arguments I needed. My code looked like this:
- Path to blastx
- -query provides the file we want to blast
- -db specifies the database created in the previous step
- -outfmt specifies the type of output file. I will use 6, a tabular file
- -out will allow me to save the output as a new file
- -num_threads 4 uses 4 CPUs in the BLAST search
- -max_target_seqs 1 keeps only one aligned sequence (i.e. the best match)
/Users/Shared/Apps/ncbi-blast-2.2.29+/bin/blastx \
-query 2018-09-06-Virginica-transcripts.fna \
-db uniprot-filtered-reviewed.fasta \
-outfmt 6 \
-out 2018-09-06-Transcript-Uniprot-blastx.txt \
-num_threads 4 \
-max_target_seqs 1
Hopefully that will finish running sometime this weekend! Once I have the blastx results, I can merge them with the overlap files and do some gene enrichment. I also need to send the transcript-Uniprot results to Mike Riffle in Genome Sciences so he can fix the background in the gene enrichment tool he built. Until that gets fixed, I can use DAVID to get some preliminary gene enrichment results for PCSGA. I can also review the methods in Emma’s geoduck paper so I understand the statistical analyses that accompany gene enrichment.
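Before merging, it is probably worth a quick sanity check that the output actually got saved this time and looks the way -outfmt 6 should. The sketch below is only an illustration, not the code I ran: it builds a tiny made-up file with two fake hits, and the function names (check_columns, count_queries, query_to_accession) are hypothetical helpers. It assumes the 12 default tabular columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the subject IDs keep the Uniprot sp|ACCESSION|ENTRY_NAME form, which may not hold depending on how the database was built.

```shell
# Made-up sample rows in BLAST tabular format (-outfmt 6); NOT real results.
sample="$(mktemp)"
printf 'tx1\tsp|P00001|AAA_HUMAN\t55.2\t120\t50\t2\t1\t360\t10\t129\t1e-30\t115\n' > "$sample"
printf 'tx2\tsp|Q00002|BBB_MOUSE\t40.0\t90\t52\t1\t4\t273\t5\t94\t2e-12\t62.4\n' >> "$sample"

# Succeeds only if every row has the 12 default tab-delimited columns.
check_columns() {
  awk -F'\t' 'NF != 12 { bad++ } END { exit (bad > 0) }' "$1"
}

# Count how many distinct query transcripts received a hit.
count_queries() {
  cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Print "query <tab> accession", taking the middle piece of sp|ACCESSION|ENTRY_NAME.
# Assumes the subject ID (column 2) is pipe-delimited as described above.
query_to_accession() {
  awk -F'\t' 'BEGIN { OFS = "\t" } { split($2, id, "[|]"); print $1, id[2] }' "$1"
}

check_columns "$sample" && echo "column count looks right"
echo "queries with a hit: $(count_queries "$sample")"
query_to_accession "$sample"
```

Swapping "$sample" for 2018-09-06-Transcript-Uniprot-blastx.txt would run the same checks on the real output. The two-column table from query_to_accession is the kind of thing that should be easy to join against the overlap files by transcript ID.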