Killifish Hypoxia RRBS Part 17

Removing L4 samples

Given the weird coverage patterns in the L4 samples, Neel suggested I run through the workflow without them to see if I get different results. If I identify more DMR without these samples (or get different results generally), I will consider excluding them as outliers based on mapping and filtering differences. First, I ran the code with a maximum of 500 reads per locus, the setting I currently use to identify DMR with the L4 samples. I also ran it with a maximum of 100 reads per locus, the first setting I tried before experimenting with other filtering options.


I removed these samples from my BAT_summarize code: 5-S1, 20-N1, 20-S2, OC-S5. Then, I ran BAT_summarize and BAT_overview. All of the new BAT_summarize output is here, and the BAT_overview output is here. Looking through the BAT_overview output, I didn’t see a boost in methylation rate after removing those samples.
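As a rough illustration of the exclusion step, here is a minimal Python sketch that drops the four sample IDs from a list of per-sample input files before building the argument passed to a summarizing step. The file names, the `.bedgraph` suffix, and the helper names (`sample_id`, `drop_excluded`, `as_comma_list`) are hypothetical, and the comma-separated format is an assumption about what the downstream tool expects. It is not the code I actually ran.

```python
from pathlib import PurePosixPath

# Sample IDs excluded in this run (taken from the post).
EXCLUDED = {"5-S1", "20-N1", "20-S2", "OC-S5"}


def sample_id(path: str) -> str:
    """Return the sample ID, assuming files are named '<sample-id>.<suffix>'."""
    return PurePosixPath(path).name.split(".", 1)[0]


def drop_excluded(paths, excluded=EXCLUDED):
    """Keep only the files whose sample ID is not in the excluded set."""
    return [p for p in paths if sample_id(p) not in excluded]


def as_comma_list(paths):
    """Join file paths into one comma-separated string (assumed input format)."""
    return ",".join(paths)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    group_20_n = [
        "calls/20-N1.bedgraph",
        "calls/20-N2.bedgraph",
        "calls/20-N3.bedgraph",
    ]
    print(as_comma_list(drop_excluded(group_20_n)))
```

Keeping the excluded IDs in one set makes it easy to rerun the same contrasts with and without the L4 samples by passing a different `excluded` argument.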


Table 1. Number of DMR per contrast using different settings

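| Contrast | Group 1 | Group 2 | MDP_max = 500, L4 included | MDP_max = 500, L4 excluded | MDP_max = 100, L4 excluded |
|---|---|---|---|---|---|
| All samples | N | S | 0 | 0 | 0 |
| OC | N | S | 1 | 1 | 0 |
| 20 | N | S | 1 | 1 | 0 |
| 5 | N | S | 2 | 2 | 0 |
| N | 20 | 5 | 16 | 2 | 0 |
| N | 20 | OC | 2 | 0 | 0 |
| S | 20 | 5 | 1 | 1 | 0 |
| S | 20 | OC | 1 | 1 | 0 |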

Reverting to a maximum of 100 reads per locus doesn’t yield any DMR, even after removing the L4 samples. I’m now unsure whether I should proceed with removing the L4 samples, so I’ll ask Neel. In the meantime, I will check whether any of the DMR I identified without the L4 samples match those identified with them, because overlap between the two sets would lend more support to those DMR.
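One way to check whether DMR from the two runs coincide is a simple interval overlap test. The sketch below assumes each DMR is reduced to a `(chrom, start, end)` tuple with half-open coordinates. The function names and the example coordinates are made up for illustration and are not taken from my output files.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def shared_dmrs(with_l4, without_l4):
    """Return DMR from the first run that overlap any DMR from the second run."""
    # Index the second set by chromosome so each DMR is only compared
    # against intervals on the same chromosome.
    by_chrom = defaultdict(list)
    for dmr in without_l4:
        by_chrom[dmr[0]].append(dmr)
    return [
        dmr
        for dmr in with_l4
        if any(overlaps(dmr, other) for other in by_chrom.get(dmr[0], []))
    ]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    with_l4 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
    without_l4 = [("chr1", 150, 250), ("chr2", 80, 120)]
    print(shared_dmrs(with_l4, without_l4))
```

With so few DMR per contrast, a pairwise comparison like this is plenty. For larger sets, a dedicated interval tool would be a better fit.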

Going forward

  1. Annotate DMR locations
  2. Revise methylation landscape information
  3. Update methods and results
  4. Match DMR with RNA-Seq information
  5. Start mapping with new genome
  6. Try DMR identification with bismark and methylKit
  7. Create OSF repository for all intermediate files
Written on May 31, 2022