Lecture 11: Regression Analysis (cont.)

Regression Modeling: Theory and Application

Recap of Linear Regression Theory

  • Peter Kempthorne introduces the session focused on regression modeling, emphasizing the importance of understanding both theory and application.
  • Discusses properties of beta hat in the normal linear regression model, highlighting that it is multivariate normal with mean beta and covariance matrix sigma squared times (X transpose X) inverse.
  • Explains that the residual vector epsilon hat is also multivariate normal and, under the normal model assumption, is independent of beta hat.

Estimating Error Variance

  • Describes how to estimate the error variance from the residual vector: dividing the sum of squared residuals by n minus p yields an unbiased estimate.
  • Introduces t-statistics for least squares estimates: beta hat j divided by sigma hat times the square root of Cjj, where Cjj is the j-th diagonal entry of (X transpose X) inverse.
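The mechanics above can be sketched in a few lines of numpy. The simulated design, coefficients, and noise level below are illustrative assumptions, not the lecture's data:

```python
import numpy as np

# Simulated illustration: two informative columns plus an intercept.
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Least squares: beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Unbiased error-variance estimate: s^2 = RSS / (n - p)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)

# t_j = beta_hat_j / sqrt(s^2 * C_jj), where C_jj = [(X'X)^{-1}]_{jj}
t_stats = beta_hat / np.sqrt(s2 * np.diag(XtX_inv))
print(t_stats)
```

The coefficient simulated as truly nonzero produces a large t-statistic, while the zero coefficient's t-statistic stays small.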

Hypothesis Testing in Regression

  • Highlights that t-statistics follow a t-distribution with n minus p degrees of freedom when testing hypotheses about beta j being equal to 0.
  • Discusses checking whether the confidence interval for beta j contains 0; if it does, the corresponding factor is a candidate for exclusion from the regression model.
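The "does the interval contain zero" check can be made concrete as follows. This is a sketch on simulated data with one truly-zero coefficient, assuming scipy is available for the t critical value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)
se = np.sqrt(s2 * np.diag(XtX_inv))

# 95% confidence interval: beta_hat_j +/- t_{0.975, n-p} * se_j
tcrit = stats.t.ppf(0.975, df=n - p)
lower, upper = beta_hat - tcrit * se, beta_hat + tcrit * se
contains_zero = (lower < 0) & (0 < upper)  # True -> candidate for exclusion
print(contains_zero)
```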

F-tests in Regression Analysis

  • Explains how to test multiple beta j's simultaneously by comparing residual sums of squares under different hypotheses.
  • Notes that if the submodel retains k = p minus 1 variables, this corresponds to testing whether the last coefficient is zero, and the resulting F statistic equals the squared t statistic for that coefficient.
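The equivalence between the F-test for a single dropped variable and the squared t-statistic can be verified numerically; the simulated data below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, 0.0]) + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of the least squares fit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

p = X.shape[1]
rss_full = rss(X, y)
rss_restricted = rss(X[:, :2], y)  # submodel: drop the last column
k = 1
F = ((rss_restricted - rss_full) / k) / (rss_full / (n - p))

# The squared t statistic for the dropped coefficient matches F
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s2 = rss_full / (n - p)
t2 = (beta_hat[2] / np.sqrt(s2 * XtX_inv[2, 2])) ** 2
print(F, t2)
```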

Generalized Least Squares (GLS)

  • Introduces generalized least squares as a method addressing non-diagonal error structures in regression models.
  • Emphasizes transforming the model by the inverse square root of the error covariance matrix so the transformed errors satisfy the Gauss-Markov assumptions, yielding optimal estimation.
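A minimal sketch of this transformation, assuming a known diagonal (heteroscedastic) error covariance: ordinary least squares on the transformed model reproduces the GLS closed form exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Sigma = np.diag(rng.uniform(0.5, 2.0, size=n))  # known error covariance
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(n), Sigma)

# Transform by Sigma^{-1/2} so the new errors satisfy Gauss-Markov
S_inv_half = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
beta_ols_transformed = np.linalg.lstsq(S_inv_half @ X, S_inv_half @ y,
                                       rcond=None)[0]

# Direct GLS formula: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(beta_gls)
```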

Maximum Likelihood Estimation (MLE)

  • Concludes with an introduction to maximum likelihood estimation within normal linear regression frameworks, focusing on computing data density given explanatory variables.

Maximum Likelihood Estimation in Regression

Understanding Maximum Likelihood Estimation (MLE)

  • MLE identifies parameter values that maximize the probability of observing the given data, making it an optimal estimator with the smallest variance in large samples.
  • As the sample size increases, the MLE is asymptotically efficient across a wide range of distributions; this is demonstrated using the Gaussian distribution.
  • In normal linear regression models, the MLE of beta is obtained by minimizing the least squares criterion Q(beta), so it coincides with the least squares estimate.
  • The maximum likelihood estimate for error variance is calculated as the sum of squared residuals divided by n, although this results in a biased estimate.
  • Generalized M-estimators extend beyond least squares by minimizing alternative Q functions, allowing for more robust estimations.
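The bias of the MLE variance estimate (RSS/n) versus the unbiased version (RSS/(n-p)) can be checked by simulation; the sample size, dimension, and replication count below are illustrative choices:

```python
import numpy as np

# Monte Carlo check: RSS/n is biased low; RSS/(n - p) is unbiased.
rng = np.random.default_rng(3)
n, p, sigma = 30, 4, 1.0
mle_vals, unbiased_vals = [], []
for _ in range(2000):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = rng.normal(scale=sigma, size=n)  # true beta = 0
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    mle_vals.append(rss / n)             # MLE of sigma^2
    unbiased_vals.append(rss / (n - p))  # unbiased estimate
print(np.mean(mle_vals), np.mean(unbiased_vals))
```

With n = 30 and p = 4, the MLE's mean sits near (n-p)/n = 0.867 of the true variance, while the n-p version averages close to 1.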

Robust Estimators and Alternatives

  • Robust estimators consider variations of least squares and mean absolute deviations, providing alternatives when error distributions are unknown or contaminated.
  • If the density of errors is known, MLE suggests using that specific distribution; otherwise, robust methods serve as useful alternatives.
  • Generalized M-estimators include quantile estimators, which weight absolute deviations asymmetrically for positive and negative residuals to estimate quantities such as the 90th percentile of the response.

Calculating Mean Absolute Deviation Estimates

  • To calculate mean absolute deviation estimates or quantile estimators, one must minimize convex h functions associated with residual values.
  • The median is the best estimate of the center of the data under the mean absolute deviation criterion, owing to its robustness against outliers.
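A small numeric check of this fact, using a toy sample (an assumption for illustration) with one gross outlier:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier

def sad(c):
    """Sum of absolute deviations from a candidate center c."""
    return np.sum(np.abs(data - c))

# Minimize over a fine grid: the minimizer is the median, not the mean
grid = np.linspace(0.0, 10.0, 10001)
best = grid[np.argmin([sad(c) for c in grid])]
print(best, np.median(data), np.mean(data))
```

The grid minimizer lands at the median (3.0), while the mean (22.0) is dragged toward the outlier.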

Ridge Regression: An Extension of Linear Models

  • Ridge regression incorporates a penalty term into least squares estimation to address issues such as multicollinearity among predictors.
  • This method adds a penalty based on the squared length of the beta vector to improve model stability and performance. Standardizing independent variables enhances effectiveness.
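The penalized criterion has the closed-form solution (X'X + lambda I)^{-1} X'y, and increasing lambda shrinks the coefficient vector. A sketch on simulated, standardized predictors (settings are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 60, 5
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns first
y = X @ rng.normal(size=p) + rng.normal(size=n)
y = y - y.mean()

def ridge(lam):
    """Ridge estimate: (X'X + lam I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The L2 norm of the estimate shrinks monotonically as lambda grows
norms = [np.linalg.norm(ridge(lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
print(norms)
```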

Ridge Regression and Its Connections to Bayesian Models

Ridge Regression Parameter Estimation

  • Ridge regression is applied after rescaling the columns of the predictor matrix: each column is centered and divided by its standard deviation. This standardization puts all predictor variables on an equal footing.
  • Standardizing eliminates dependence on original units of X, preventing any single beta coefficient from being disproportionately large due to differing units.

Connection to Bayesian Models

  • Ridge regression can be interpreted through a Bayesian lens by assuming a prior distribution for regression parameters that is multinormal with mean zero and a covariance matrix proportional to the identity matrix.
  • The ridge regression criterion corresponds to the log likelihood of the data plus the log of the prior density; the isotropic prior treats all directions in p-dimensional space as equally likely a priori for the regression parameters.

Computation and Singular Value Decomposition

  • The minimization in ridge regression leads to a formula resembling least squares but with an additional term, lambda times the identity matrix, added to X transpose X before inverting.
  • Using singular value decomposition (SVD), fitted values in ridge regression can be expressed as a sum of factors multiplied by coefficients, where lambda influences shrinkage based on singular values.
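The SVD expression for the fitted values can be verified directly: each direction u_i is shrunk by the factor d_i^2 / (d_i^2 + lambda). The data and lambda below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 4))
y = rng.normal(size=40)
lam = 5.0

U, d, Vt = np.linalg.svd(X, full_matrices=False)
shrink = d**2 / (d**2 + lam)  # per-direction shrinkage factors

# Ridge fitted values via SVD: y_hat = sum_i shrink_i * u_i (u_i' y)
y_hat_svd = U @ (shrink * (U.T @ y))
# Same values from the closed-form ridge estimate
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)
y_hat_direct = X @ beta_ridge
print(shrink)
```

Since the singular values d are sorted in decreasing order, the shrinkage factors decrease too: directions with small singular values are shrunk hardest, exactly as the bullet above describes.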

Shrinkage Effects in Ridge Regression

  • Ridge regression exhibits less shrinkage for larger squared singular values and more shrinkage for smaller ones, indicating that coefficients corresponding to principal component axes experience different levels of adjustment.
  • This relationship connects ridge regression with principal components analysis (PCA), highlighting how both methods utilize SVD for computation.

Principal Components Regression: Methodology and Insights

Principal Component Variables

  • In principal components regression, orthogonal principal component variables simplify coefficient calculations. Researchers may choose only the first m components for their models.
  • Fitted values from principal components regression represent projections onto spaces spanned by selected principal component directions, showcasing similarities across least squares, ridge, and PCA methodologies.
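A sketch of principal components regression on simulated data (settings are illustrative assumptions): because the score variables are orthogonal, each coefficient is a simple one-variable ratio, and the fitted values equal the projection of y onto the span of the first m singular directions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, m = 50, 4, 2
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)  # center predictors
y = rng.normal(size=n)
y = y - y.mean()

U, d, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T[:, :m]  # first m principal-component score variables

# Orthogonal columns: Z'Z = diag(d_i^2), so coefficients are simple ratios
gamma = (Z.T @ y) / (d[:m] ** 2)
y_hat = Z @ gamma

# Equivalently, project y onto the span of the first m left singular vectors
y_hat_proj = U[:, :m] @ (U[:, :m].T @ y)
print(np.linalg.norm(y_hat - y_hat_proj))
```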

Extensions Beyond Traditional Methods

  • LASSO (Least Absolute Shrinkage and Selection Operator) extends estimation methods by applying penalties based on absolute values rather than squared terms. This introduces unique characteristics compared to ridge penalties.
  • The LASSO penalty shrinks estimates towards zero differently than ridge; it focuses on minimizing absolute magnitudes instead of squared sums.

Understanding Lasso Regression and ETF Analysis

Lasso Regression Overview

  • Lasso regression minimizes least squares subject to a constraint on the sum of absolute magnitudes of the coefficients, producing shrinkage that tends to land estimates on the vertices of the L1 constraint region.
  • This method often results in some parameters being zero, effectively excluding certain variables from the regression based on the value of lambda.
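In the special case of an orthonormal design (X'X = I), the lasso solution has a closed form, soft-thresholding of the least squares estimate, which makes the exact-zeroing behavior easy to see. The coefficient values and lambda below are illustrative assumptions:

```python
import numpy as np

# Orthonormal-design special case: lasso = soft-thresholding of OLS.
beta_ols = np.array([3.0, -0.4, 1.5, 0.2])
lam = 0.5

# Coefficients with |beta_j| < lambda are set exactly to zero
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

# Ridge, by contrast, shrinks proportionally and never hits zero exactly
beta_ridge = beta_ols / (1.0 + lam)
print(beta_lasso, beta_ridge)
```

Here the two small OLS coefficients are zeroed out by the lasso, while every ridge coefficient stays nonzero.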

ETF Case Study Introduction

  • The case study involves analyzing prices of exchange-traded funds (ETFs) across various sectors in the US market, which includes about 10 sectors like consumer staples and energy.
  • The analysis focuses on regressing sector ETFs against market index ETFs such as S&P 500 and NASDAQ to understand their relationships.

Purpose of Regression Analysis

  • The goal is to determine how well sector ETFs can be explained by broader market indices, with an emphasis on understanding variance in returns.
  • A key application is hedging investments against market risk by eliminating dependencies on major indices, thus reducing exposure during market downturns.

Hedge Fund Replication Insights

  • Similar regression techniques are used for hedge fund replication strategies, where returns may largely come from liquid market instruments available for trading.

Regression Coefficients and Diagnostics

  • Analyzing regression coefficients helps identify whether explanatory factors significantly differ from zero; t-values and p-values provide this insight.
  • R-squared values indicate how well the model explains variability; diagnostics help assess model fit and performance.

Principal Component Analysis (PCA)

  • PCA is employed to decompose variability among independent variables (index sector ETFs), revealing that the first two components explain a significant portion (90%+) of variability.
  • Each principal component variable is orthogonal, allowing for straightforward linear regressions with varying degrees of statistical significance noted across components.
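The "first components explain most of the variability" pattern is what one expects when a common market factor drives the series. A simulated stand-in for sector returns (an assumption, not the lecture's ETF data):

```python
import numpy as np

# One common factor driving six correlated "sector return" series
rng = np.random.default_rng(7)
common = rng.normal(size=(500, 1))
returns = common @ np.ones((1, 6)) + 0.3 * rng.normal(size=(500, 6))

R = returns - returns.mean(axis=0)
cov = R.T @ R / (len(R) - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # descending eigenvalues
explained = eigvals / eigvals.sum()      # proportion of variance per PC
print(explained)
```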

Statistical Significance in Model Selection

  • Evaluating p-values helps determine which principal component variables should be included in the final model; earlier analyses showed differing significance levels over time.

Principal Component Analysis and Regression Techniques

Overview of Principal Component Variables

  • Discussion on the significance of principal component variables in regression models, emphasizing that high-order components may not always include important regressors.
  • Noted that statistically significant principal components can explain factors affecting response variables despite low variability.

Ridge and Lasso Regression Insights

  • Examination of ridge regression results, highlighting how varying constraints on the L2 norm affect regression parameters.
  • Observations on lasso regression where increasing the penalty (lambda parameter) leads to smaller L1 norms for estimates, indicating a shrinkage effect.

Comparison of Regression Methods

  • Illustration comparing coefficients from different methods; most methods yield similar results except when using only the first three principal component variables which produce distinct estimates.
  • Emphasis on sensitivity of estimators to data variations, particularly in smaller sample sizes where differences between ridge and lasso become more pronounced.

Statistical Significance in Principal Components

  • In an example with principal components regression, significant components led to comparable estimates as least squares values, suggesting effective model selection based on statistical significance.

Understanding Capital Asset Pricing Model (CAPM)

Introduction to CAPM

  • Overview of CAPM's definition: expected return equals risk-free return plus beta times market excess return. Beta represents market risk factor.

Empirical Analysis Framework

  • Outline of steps for empirical analysis involving CAPM: computing returns for stocks versus market index and testing hypotheses about individual coefficients.

Linear Regression Model Specification

  • Description of linear regression model setup using empirical returns data for stocks and market excess returns; includes intercept (alpha j) and slope (beta j).

Testing CAPM Validity

  • Explanation that if CAPM holds true, alpha j should equal 0; fitting real data allows testing this hypothesis against actual stock pricing behavior.
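The alpha test can be sketched on simulated excess returns (all numbers below are illustrative assumptions, not GE's actual data; scipy is assumed available for the p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 252  # one simulated year of daily returns
mkt_excess = rng.normal(0.0004, 0.01, size=n)
alpha_true, beta_true = 0.0, 1.1  # CAPM holds: alpha = 0
stock_excess = alpha_true + beta_true * mkt_excess \
    + rng.normal(0, 0.015, size=n)

# Regress stock excess returns on market excess returns
X = np.column_stack([np.ones(n), mkt_excess])
XtX_inv = np.linalg.inv(X.T @ X)
ab = XtX_inv @ X.T @ stock_excess  # [alpha_hat, beta_hat]
resid = stock_excess - X @ ab
s2 = resid @ resid / (n - 2)

# t-test of H0: alpha = 0
t_alpha = ab[0] / np.sqrt(s2 * XtX_inv[0, 0])
p_alpha = 2 * stats.t.sf(abs(t_alpha), df=n - 2)
print(t_alpha, p_alpha)
```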

Practical Application with GE Stock Example

  • Demonstration using GE stock to fit a regression model; highlights importance of p-value in assessing whether alpha is consistent with zero under CAPM assumptions.

Understanding Regression Analysis and Residuals

Key Insights on Regression Coefficients

  • The interpretation of t-values and p-values is discussed: a large t-value with a small p-value indicates a statistically significant coefficient. Testing the beta coefficient against beta = 1, the average market risk, is suggested as more meaningful than testing against zero.

Importance of R-squared Values

  • R-squared values are crucial for understanding linear regression models. The discussion includes calibrating scatter plots to these values, which aids in interpreting model fit.

Residual Analysis Fundamentals

  • Residual analysis checks if model assumptions hold true post-fitting. A normal distribution of residuals is assumed, and a histogram illustrates this concept using GE's regression data.

Evaluating Normality of Residuals

  • Two bell-shaped curves are overlaid: one from maximum likelihood estimates (MLE) and another from robust estimators. The comparison helps assess how well the normality assumption fits the residuals.

QQ Plot Interpretation

  • A QQ plot compares ordered residual samples to theoretical normals. If residuals are Gaussian, the plot should align along a straight line, indicating proper model fit.

Assessing Robust Estimates

Standard Deviation Estimation Techniques

  • The robust estimate of standard deviation uses interquartile range to assess variance among residuals, suggesting it may provide better insights than traditional methods.
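For normal data, the interquartile range equals about 1.349 standard deviations, so IQR/1.349 is a robust scale estimate. A sketch on contaminated residuals (the contamination setup is an illustrative assumption; scipy is assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
# Normal residuals contaminated with a few large outliers
resid = np.concatenate([rng.normal(0, 1.0, 990),
                        rng.normal(0, 10.0, 10)])

sd_sample = resid.std(ddof=1)  # inflated by the outliers

# Robust alternative: IQR / (Phi^{-1}(0.75) - Phi^{-1}(0.25)) ~ IQR / 1.349
q75, q25 = np.percentile(resid, [75, 25])
sd_robust = (q75 - q25) / (stats.norm.ppf(0.75) - stats.norm.ppf(0.25))
print(sd_sample, sd_robust)
```

The robust estimate stays near the uncontaminated scale of 1, while the sample standard deviation is pulled upward by the outliers.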

Percentile Distribution Evaluation

  • Fitted percentiles under the residual model should ideally be uniformly distributed if the normal model holds true. Discrepancies indicate potential non-normal distributions in data.

Implications of Non-Normal Residual Distribution

  • Observations suggest that fitted percentiles using MLE underestimate near the mean while robust estimates perform better across most ranges except extremes, hinting at possible mixture models for error distribution.

Hypothesis Testing in Regression Models

Testing Individual Model Coefficients

  • Hypothesis testing involves assessing whether individual coefficients significantly differ from zero or other benchmarks like beta = 1, which represents average market risk across stocks.

Statistical Significance Assessment

  • T-statistics derived from least squares estimators allow for testing parameter significance within regression models; specific interest lies in alpha tests within capital asset pricing models (CAPM).

Utilizing R Packages for Regression Analysis

CAR Package Functionality

  • The CAR package facilitates hypothesis testing through F-tests, providing results such as residual sum of squares and corresponding statistics essential for evaluating regression parameters.

Exploring Submodel Suitability

  • Discussion includes testing submodels with fewer factors and examining regime shifts where parameter values may change over time—highlighting dynamic aspects of regression analysis.

Testing Regression Parameters in Capital Asset Pricing Model

Overview of Hypothesis Testing

  • The discussion begins with the potential to test changes in regression parameters over time using the capital asset pricing model (CAPM).
  • It is suggested that one can split data into two periods (A and B) to analyze differences in regression models for each period.
  • An F-test statistic is introduced as a method to determine if regression parameters differ between the two periods.
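The two-period F-test compares the pooled residual sum of squares against the sum of the per-period fits. A sketch with a simulated regime shift (break point, sample sizes, and betas are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(10)
nA = nB = 120
x = rng.normal(size=nA + nB)
# Simulated regime shift: the slope changes between periods A and B
y = np.where(np.arange(nA + nB) < nA, 0.8 * x, 1.2 * x) \
    + 0.2 * rng.normal(size=nA + nB)
X = np.column_stack([np.ones(nA + nB), x])

def rss(X, y):
    """Residual sum of squares of the least squares fit."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

p = 2  # parameters per regression (intercept and slope)
rss_pool = rss(X, y)                              # one model for both periods
rss_A, rss_B = rss(X[:nA], y[:nA]), rss(X[nA:], y[nA:])

# F-test of "same parameters in both periods"
F = ((rss_pool - rss_A - rss_B) / p) \
    / ((rss_A + rss_B) / (nA + nB - 2 * p))
print(F)
```

A large F value rejects the hypothesis that the regression parameters are the same in the two periods.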

Analysis of GE Stock Data

  • A subjective choice was made to split the data for GE stock into two periods, allowing for analysis of cumulative standardized residuals.
  • The results indicate strong time dependence in residuals, with significant changes observed when fitting separate regression models for both periods.
  • Notable findings include a shift from negative to positive alpha values and an increase in market risk beta from 0.773 to 1.126 across the two periods.

Application of CAPM Across S&P 500 Stocks

  • The speaker transitions back to applying CAPM across all stocks within the S&P 500, noting around 380–400 stocks were analyzed by sector.
  • R-squared values are reported, typically above 0.2, indicating reasonable explanatory power of the regressions performed on these stocks.

Sector Analysis and Alpha Values

  • Box plots illustrate alpha estimates by sector; construction and computer technology sectors show positive alphas while conglomerates exhibit negative alphas.
  • P-values are calculated for testing whether alphas are significantly different from zero; most exceed the threshold of 0.05, suggesting consistency with CAPM.

Insights on Beta and Alpha Relationships

  • A plot shows a correlation where higher beta values may lead to higher alpha values; low beta stocks might reflect lower alpha potential as well.
  • Different fits for various sectors complicate interpretations but serve as initial insights into how alphas vary across sectors.

Risk Assessment by Sector

  • Beta coefficients reveal that consumer staples have lower betas compared to high-tech sectors like computer technology which have much higher betas.
  • Top ten stocks with highest betas are primarily found in consumer discretionary and computer technology sectors; lowest betas cluster around consumer staples.

Significant Alphas Among Stocks

  • A table lists top ten stocks with significant alphas; ENPH from oils and energy sector has an alpha value of 0.003 daily, prompting discussion on annualizing this figure based on daily returns.

Understanding Stock Returns and the Capital Asset Pricing Model

Key Insights on Stock Performance

  • The speaker annualizes the daily alpha by compounding over the 252 trading days in a year, which gives roughly 2.12, so the annualized return is 2.12 minus 1, indicating significant stock performance.
  • A notable observation is made regarding the capital asset pricing model (CAPM), which appears to align well with stock returns over extended periods.
  • Despite the effectiveness of the CAPM, there is an acknowledgment that more complex models may be necessary for deeper analysis.
  • The discussion highlights the balance between simplicity and complexity in financial modeling, suggesting that while basic models can yield insights, they may not capture all market dynamics.
  • The emphasis is placed on understanding how different models can provide varying perspectives on stock performance and investment strategies.
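The annualization arithmetic behind the "2.12 minus 1" figure, using the daily alpha of 0.003 reported earlier:

```python
# Compounding a daily alpha of 0.003 over 252 trading days
daily_alpha = 0.003
annual = (1 + daily_alpha) ** 252 - 1
print(annual)  # roughly 1.13, i.e. about 113% per year
```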
Video description

MIT 18.642 Topics in Mathematics with Applications in Finance, Fall 2024
Instructor: Peter Kempthorne
View the complete course: https://ocw.mit.edu/courses/18-642-topics-in-mathematics-with-applications-in-finance-fall-2024
YouTube Playlist: https://www.youtube.com/playlist?list=PLUl4u3cNGP601Q2jo-J_3raNCMMs6Jves

This lecture covers the theory and application of regression modeling, including linear regression properties, hypothesis testing, and advanced methods like ridge, lasso, and principal components regression. It also explores practical use cases such as ETF sector regressions and empirical analysis of the Capital Asset Pricing Model, highlighting model diagnostics, parameter estimation, and challenges like residual distribution and regime changes.

License: Creative Commons BY-NC-SA
More information at https://ocw.mit.edu/terms
More courses at https://ocw.mit.edu
Support OCW at http://ow.ly/a1If50zVRlQ

We encourage constructive comments and discussion on OCW’s YouTube and other social media channels. Personal attacks, hate speech, trolling, and inappropriate comments are not allowed and may be removed. More details at https://ocw.mit.edu/comments.