Model selection, part 2
AIC: Understanding the Akaike Information Criterion
Introduction to AIC
- The Akaike Information Criterion (AIC) is introduced as a method based on information theory, differing from classical statistics that rely on p-values.
- AIC aligns more closely with Bayesian statistics, focusing on model probabilities rather than p-values in data analysis.
Probabilistic Models and Their Importance
- Probabilistic models allow for the computation of probabilities for various outcomes; for example, determining the probability of getting a certain number of heads when tossing a coin multiple times.
- In phylogenetics, having a probabilistic model enables the calculation of probabilities for different alignments based on specific parameters.
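The coin example can be made concrete in a few lines. A minimal Python sketch (the function name and the numbers are illustrative, not from the lecture):

```python
from math import comb

def binomial_prob(n, k, p=0.5):
    """Probability of exactly k heads in n tosses of a coin whose
    heads-probability is p -- the kind of quantity a probabilistic
    model lets us compute for any possible outcome."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 7 heads in 10 tosses of a fair coin:
print(binomial_prob(10, 7))  # ~0.117
```

In phylogenetics the outcomes are alignments rather than coin tosses, but the principle is the same: the model assigns a probability to the observed data given its parameters.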
Kullback-Leibler Divergence
- The Kullback-Leibler divergence measures the discrepancy between two probability distributions, providing insight into how well one distribution approximates another; despite often being called a distance, it is not symmetric, so it is a divergence rather than a true metric.
- For discrete probability distributions, this measure involves comparing probabilities across all possible values in a dataset.
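For discrete distributions P and Q over the same outcomes, the divergence is D(P||Q) = sum_i p_i * log(p_i / q_i). A minimal Python sketch (function name and example values are illustrative):

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) for two discrete
    distributions given as sequences of probabilities over the same
    outcomes.  Zero when P and Q are identical; note that in general
    D(P || Q) != D(Q || P)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair = [0.5, 0.5]    # "true" distribution: a fair coin
biased = [0.9, 0.1]  # approximating distribution
print(kl_divergence(fair, biased))  # ~0.511 nats
```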
Model Selection Using AIC
- To find an effective model that approximates reality, one should minimize the Kullback-Leibler divergence from the true probability distribution.
- A visual representation illustrates three models attempting to approximate reality; among them, Q2 has the smallest divergence and is thus preferred.
Calculating AIC
- AIC estimates expected relative Kullback-Leibler distances between each model and reality. Because the true distribution is unknown, absolute distances cannot be computed, but relative distances are enough to rank the models.
- The formula for calculating AIC is straightforward: it combines minus twice the log-likelihood of a fitted model with twice its number of free parameters. Smaller AIC values indicate better models.
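In code the criterion is a one-liner; a sketch, reusing the log-likelihood and parameter-count pairs from the Jukes-Cantor vs. K2P case study later in these notes:

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: AIC = -2 * ln(L) + 2 * k, where
    ln(L) is the maximized log-likelihood and k is the number of free
    parameters.  Smaller AIC values indicate better models."""
    return -2.0 * log_likelihood + 2.0 * k

print(aic(-2034.3, 1))  # Jukes-Cantor
print(aic(-2026.2, 2))  # K2P (smaller AIC, so preferred)
```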
Practical Application of AIC
- When applying AIC in practice, one fits multiple alternative models to data without requiring nested structures or limits on quantity.
Understanding AIC and Model Selection
- The Akaike Information Criterion (AIC) is calculated as minus two times the log-likelihood plus two times the number of free parameters, allowing for model comparison based on AIC values.
- The smallest AIC values indicate the best models; in this case, the CVM + I + G model was identified as superior among those tested.
Delta AIC and Model Probabilities
- To enhance model selection, one can compute Delta AIC values by subtracting the minimal AIC value from each model's AIC value.
- For example, if the minimal AIC is at the top of a table, its Delta AIC will be 0; subsequent models will have positive Delta values reflecting their relative fit.
Calculating Akaike Weights
- After computing Delta AIC values, Akaike weights are derived by applying the exponential function to -0.5 times each Delta value and then normalizing so that the weights sum to 1.
- These weights represent probabilities that any given model is the best one based on available data; for instance, a 45% chance for one model being optimal.
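The two steps above (Delta AIC, then normalized weights) can be sketched as follows; the example AIC values are invented for illustration:

```python
import math

def akaike_weights(aic_values):
    """Turn a list of AIC values into Akaike weights: compute
    Delta_i = AIC_i - min(AIC), take exp(-0.5 * Delta_i), and
    normalize so the weights sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Three hypothetical models; the first (smallest AIC) gets the largest weight:
print(akaike_weights([4056.4, 4058.0, 4070.6]))
```

Each weight can be read as the probability that the corresponding model is the best one in the candidate set, given the data.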
Bayesian Connection and Scientific Inquiry
- This probabilistic approach aligns with Bayesian inference principles where uncertainty quantification is crucial in scientific reasoning.
- Using probabilities allows researchers to assess multiple hypotheses simultaneously rather than relying solely on null hypothesis testing.
Practical Applications of Model Selection
- Constructing a comprehensive set of plausible alternative models enables effective evidence assessment through computed model probabilities.
- This method differs significantly from traditional null hypothesis testing by evaluating various plausible models instead of just one.
Multi-model Inference and Parameter Importance
Making Robust Predictions
- Multi-model inference allows predictions to be made more robustly by averaging predictions across different models, weighted by their respective probabilities.
Estimating Parameters Across Models
- When parameters appear in multiple investigated models (e.g., gamma shape parameter), averaging these estimates enhances reliability using model probabilities as weights.
Assessing Parameter Importance
- By summing up probabilities from models containing specific parameters (like transitions), researchers can determine their relative importance within a system under study.
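Both ideas (model-averaged estimates and summed-weight importance) reduce to simple weighted sums. A sketch with hypothetical Akaike weights and gamma-shape estimates (none of these numbers come from the lecture):

```python
def model_average(estimates, weights):
    """Model-averaged estimate of a parameter shared by several models:
    a weighted mean using the models' Akaike weights."""
    return sum(e * w for e, w in zip(estimates, weights))

def parameter_importance(contains_param, weights):
    """Relative importance of a parameter: the summed Akaike weights of
    all models that contain it."""
    return sum(w for has_it, w in zip(contains_param, weights) if has_it)

weights = [0.45, 0.35, 0.20]      # hypothetical Akaike weights
gamma_shape = [0.42, 0.47, 0.51]  # hypothetical per-model estimates
print(model_average(gamma_shape, weights))  # ~0.456

# Suppose only models 1 and 3 distinguish transitions from transversions:
print(parameter_importance([True, False, True], weights))  # ~0.65
```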
Case Study: Comparing Evolutionary Models
Hypotheses Overview
- Considering two hypotheses regarding sequence evolution:
- Jukes-Cantor model with uniform substitution rates (one free parameter).
- Kimura two-parameter model with distinct rates for transitions and transversions (two free parameters).
Model Comparison and AIC Calculation
Log Likelihood Values of Models
- The Jukes-Cantor model has a log likelihood of -2034.3; log likelihoods are negative because likelihoods are probabilities between 0 and 1, so their logarithms are negative.
- The Kimura 2-parameter (K2P) model shows a higher (less negative) log likelihood of -2026.2, suggesting it fits the data better than the Jukes-Cantor model.
AIC Calculation for Model Assessment
- To assess models, we compute the Akaike Information Criterion (AIC), using the formula: AIC = -2 * log likelihood + 2 * number of parameters.
- For Jukes-Cantor, AIC is calculated as -2 * (-2034.3) + 2 * 1 = 4070.6; for K2P, it is -2 * (-2026.2) + 2 * 2 = 4056.4.
Delta AIC and Model Probabilities
- Delta AIC is computed by subtracting the smallest AIC from each model's AIC; K2P has a Delta value of 0, while Jukes-Cantor has a Delta value of approximately 14.2.
- The exponential function of -0.5 times Delta AIC gives relative likelihoods: for Jukes-Cantor it's exp(-7.1) ≈ 0.000825; for K2P it's e^0 = 1.
Final Model Probability Calculations
- The total sums to about 1.000825; dividing each model's value by this sum yields the model probabilities: Jukes-Cantor at ~0.08% and K2P at ~99.92%.
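The whole Jukes-Cantor vs. K2P comparison can be reproduced in a few lines, using the log-likelihoods and parameter counts quoted above:

```python
import math

# (log-likelihood, number of free parameters) for each model
models = {"JC": (-2034.3, 1), "K2P": (-2026.2, 2)}

aic = {m: -2 * lnl + 2 * k for m, (lnl, k) in models.items()}   # JC: 4070.6, K2P: 4056.4
best = min(aic.values())
delta = {m: a - best for m, a in aic.items()}                   # JC: ~14.2, K2P: 0
rel = {m: math.exp(-0.5 * d) for m, d in delta.items()}         # JC: ~0.000825, K2P: 1
total = sum(rel.values())
prob = {m: r / total for m, r in rel.items()}

print(prob)  # JC ~0.0008 (0.08%), K2P ~0.9992 (99.92%)
```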