Problem 2.2

Recreate figure 2.1 in R and/or Python. This is, in essence, problem 2.2.
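A minimal Python sketch of the kind of drift simulation this problem asks for, assuming a Wright-Fisher model in which each of the $2N$ gametes in the next generation is drawn independently from the current allele frequency. The function names and the default parameters are my own placeholders, not values taken from the figure, so adjust them to match it.

```python
import random


def wright_fisher(n_diploid, p0, generations, rng):
    """Return the allele-frequency trajectory [p_0, ..., p_T] under pure drift."""
    n_alleles = 2 * n_diploid
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Binomial sampling: each of the 2N gametes carries the allele with probability p.
        count = sum(rng.random() < p for _ in range(n_alleles))
        p = count / n_alleles
        freqs.append(p)
    return freqs


def simulate_replicates(n_diploid=20, p0=0.5, generations=100, n_reps=5, seed=1):
    """Run several independent populations (assumed parameter values)."""
    rng = random.Random(seed)
    return [wright_fisher(n_diploid, p0, generations, rng) for _ in range(n_reps)]


def plot_trajectories(trajectories):
    """Plot one line per replicate population."""
    import matplotlib.pyplot as plt  # imported lazily so the simulation runs without it

    for traj in trajectories:
        plt.plot(traj)
    plt.xlabel("Generation")
    plt.ylabel("Allele frequency")
    plt.ylim(0, 1)
    plt.show()
```

Calling `plot_trajectories(simulate_replicates())` draws the replicate trajectories; the same `wright_fisher` function can be reused for the later problems.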

Problem 2.3

For a single heterozygous individual, the probability that they become homozygous in the next generation is $\frac{1}{2}$. The number of generations until a success (becoming homozygous) follows a geometric distribution, whose mean is $\frac{1}{p}$, where $p = \frac{1}{2}$.

$\frac{1}{\frac{1}{2}} = 2$, so we would expect it to take 2 generations on average.
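The expectation can be sanity-checked by simulation. This is a rough sketch, assuming each generation is an independent trial that succeeds with probability $\frac{1}{2}$; the function names and trial count are placeholders.

```python
import random


def generations_until_homozygous(rng, p_homozygous=0.5):
    """Count generations up to and including the first one that is homozygous."""
    generations = 1
    # Each generation is an independent trial with success probability p_homozygous.
    while rng.random() >= p_homozygous:
        generations += 1
    return generations


def mean_waiting_time(n_trials=100_000, seed=1):
    """Average the waiting time over many independent lineages."""
    rng = random.Random(seed)
    total = sum(generations_until_homozygous(rng) for _ in range(n_trials))
    return total / n_trials
```

`mean_waiting_time()` should land close to the theoretical value of 2.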

Assigned Problem 3

Plot the equation shown above Problem 2.3 over time, starting from some initial value. Run a bunch of sims (using your machinery from #1 above). Plot the heterozygosity from each simulated replicate over time, the mean over all replicates, and make sure you are matching that theoretical curve!

Equation: $$ H_{t} = H_{0}\left(1-\frac{1}{2N}\right)^{t} $$
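One way to set this up, sketched under the assumption that heterozygosity in each replicate is measured as $2p(1-p)$ and that drift follows a Wright-Fisher model. All names and parameter values here are illustrative choices, not part of the assignment.

```python
import random


def heterozygosity_trajectory(n_diploid, p0, generations, rng):
    """Track H = 2p(1-p) in one replicate population under pure drift."""
    n_alleles = 2 * n_diploid
    p = p0
    het = [2 * p * (1 - p)]
    for _ in range(generations):
        # Draw the next generation's 2N gametes from the current frequency.
        count = sum(rng.random() < p for _ in range(n_alleles))
        p = count / n_alleles
        het.append(2 * p * (1 - p))
    return het


def theoretical_heterozygosity(h0, n_diploid, generations):
    """H_t = H_0 * (1 - 1/(2N))^t for t = 0..generations."""
    return [h0 * (1 - 1 / (2 * n_diploid)) ** t for t in range(generations + 1)]


def mean_over_replicates(trajectories):
    """Average heterozygosity across replicates at each generation."""
    return [sum(vals) / len(vals) for vals in zip(*trajectories)]


def run(n_diploid=20, p0=0.5, generations=50, n_reps=2000, seed=1):
    """Return (replicates, replicate mean, theoretical curve); parameters are assumed."""
    rng = random.Random(seed)
    reps = [heterozygosity_trajectory(n_diploid, p0, generations, rng)
            for _ in range(n_reps)]
    theory = theoretical_heterozygosity(2 * p0 * (1 - p0), n_diploid, generations)
    return reps, mean_over_replicates(reps), theory


def plot_results(reps, mean_het, theory, max_lines=50):
    """Overlay a subset of replicates, their mean, and the theoretical curve."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    for traj in reps[:max_lines]:
        plt.plot(traj, color="grey", alpha=0.3)
    plt.plot(mean_het, color="blue", label="Mean over replicates")
    plt.plot(theory, color="red", linestyle="--", label="Theory")
    plt.xlabel("Generation")
    plt.ylabel("Heterozygosity")
    plt.legend()
    plt.show()
```

Individual replicates wander a lot, so the match to the theoretical curve only shows up in the mean, and it tightens as `n_reps` grows.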

Assigned Problem 4

Work through Figure 2.3. This is a classic example of a coupled forward/backwards argument, and a conceptual precursor to "coalescent thinking". Many students in my undergrad class find this one rough.

Answer 1:
The left figure represents the probability that two alleles are identical by descent, meaning both are copies of the same allele in the previous generation. The second allele will be a copy of the same parental allele as the first with probability $\frac{1}{2N}$, where $N$ is the number of diploids in the population (not haploids), so there are $2N$ allele copies to choose from.

The right figure represents the probability that two alleles are not identical by descent. This is the same as $1 - P(\text{identical by descent})$ or $1 - \frac{1}{2N}$.

In order to get the total probability that two alleles chosen at random are identical by state ($G'$), you add the probability that they are identical by descent (because when you're identical by descent you're always identical by state) to the probability that they're not identical by descent times the probability that two alleles chosen at random from the previous generation are identical by state ($G$). (See equation 2.1.)
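Written out, and assuming equation 2.1 has the form this verbal argument implies, the recursion and its link to the heterozygosity equation from Assigned Problem 3 are as follows, where $H = 1 - G$ is the probability that two alleles differ by state (my notation for the complement):

$$
\begin{aligned}
G' &= \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G \\
H' &= 1 - G' = \left(1 - \frac{1}{2N}\right)(1 - G) = \left(1 - \frac{1}{2N}\right) H \\
H_{t} &= H_{0}\left(1 - \frac{1}{2N}\right)^{t}
\end{aligned}
$$

The last line comes from applying the second line $t$ times in a row.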

Answer 2:
I'm guessing this question should actually reference figure 2.4 (?), so I'll explain that one here too. Time flows from left (ancient) to right (recent). Each dot on the branch represents a mutation. Mutations that arose before the common ancestor of all our samples (at $t_w$) are either carried by every sample or by none of them, so they are already fixed or lost. Mutations after $t_w$ are polymorphic because only a subset of samples can trace their ancestry back to a branch containing that mutation.

Assigned Problem 5

How would you use a simulation to verify Eqn 2.6? Provide an example of such a simulation. Hint: don't start $p$ so small that $\pi(p)$ is very small.

Equation 2.6: $$ \pi(p) = p $$

In English, equation 2.6 says that the probability of fixation of a neutral allele is its current frequency.

The probability of fixation can be approximated by averaging the allele frequencies of all replicates for the final generation of the simulation. This may be slightly off due to alleles that are still polymorphic at the end of the simulation, but I've set the number of generations to be large enough that there should be very few of those.
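A sketch of that approach, assuming a Wright-Fisher model with pure drift. The population size, starting frequency, generation count, and replicate count below are placeholder values, and the function names are my own.

```python
import random


def final_frequency(n_diploid, p0, generations, rng):
    """Allele frequency after `generations` of drift (stops early once fixed or lost)."""
    n_alleles = 2 * n_diploid
    p = p0
    for _ in range(generations):
        if p == 0.0 or p == 1.0:
            break  # absorbed: the allele is lost or fixed and cannot change again
        count = sum(rng.random() < p for _ in range(n_alleles))
        p = count / n_alleles
    return p


def estimate_fixation_probability(n_diploid=20, p0=0.3, generations=500,
                                  n_reps=2000, seed=1):
    """Return (mean final frequency, number of replicates still polymorphic)."""
    rng = random.Random(seed)
    finals = [final_frequency(n_diploid, p0, generations, rng)
              for _ in range(n_reps)]
    estimate = sum(finals) / n_reps
    still_polymorphic = sum(0.0 < f < 1.0 for f in finals)
    return estimate, still_polymorphic
```

The first returned value should sit near `p0` if equation 2.6 holds, and the second is a quick check on how many replicates had not yet fixed or lost the allele. Starting at a moderate `p0` follows the hint: with a very small $\pi(p)$, far more replicates would be needed to see any fixations at all.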

Problem 2.10

$$ 4N \times 10^{-7} \times 20 = 0.1 $$

$$ N = 12{,}500 $$

Because $N_e$ is based on the harmonic mean of historic population sizes, this is reasonable.
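A quick check of the arithmetic, plus an illustration of why a harmonic mean can sit far below present-day population sizes. The list of population sizes is entirely hypothetical and chosen only to show the effect of a single bottleneck; the variable names are mine.

```python
from statistics import harmonic_mean, mean

# Solve 4 * N * rate * factor = target for N, using the numbers from the equation above.
target = 0.1
rate = 1e-7
factor = 20
n_estimate = target / (4 * rate * factor)
print(round(n_estimate))

# HYPOTHETICAL population sizes over ten time periods: nine large, one bottleneck.
hypothetical_sizes = [1_000_000] * 9 + [2_000]
print(harmonic_mean(hypothetical_sizes))  # pulled strongly toward the bottleneck
print(mean(hypothetical_sizes))           # arithmetic mean, dominated by the large values
```

The harmonic mean is dominated by the smallest values, so even a short bottleneck can pull $N_e$ well below the census size.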