Lecture 11: Regression Analysis (cont.)

Regression Modeling: Theory and Application

Recap of Linear Regression Theory

  • Peter Kempthorne introduces the session focused on regression modeling, emphasizing the importance of understanding both theory and application.
  • Discusses properties of beta hat in the normal linear regression model, highlighting that it is multivariate normal with mean beta and covariance matrix sigma squared times (X transpose X) inverse.
  • Explains that the residual vector epsilon hat is also multivariate normal and, under the normal model assumption, is independent of beta hat.

Estimating Error Variance

  • Describes how to estimate the error variance from the residual vector: dividing the sum of squared residuals by n minus p yields an unbiased estimate.
  • Introduces t-statistics for least squares estimates: beta hat j divided by sigma hat times the square root of Cjj, where Cjj is the j-th diagonal entry of (X transpose X) inverse.
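The mechanics above can be sketched in a few lines of numpy. The simulated design, coefficients, and noise level below are illustrative assumptions, not the lecture's data:

```python
import numpy as np

# Simulated illustration: two informative columns plus an intercept.
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Least squares: beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Unbiased error-variance estimate: s^2 = RSS / (n - p)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)

# t_j = beta_hat_j / sqrt(s^2 * C_jj), where C_jj = [(X'X)^{-1}]_{jj}
t_stats = beta_hat / np.sqrt(s2 * np.diag(XtX_inv))
print(t_stats)
```

The coefficient simulated as truly nonzero produces a large t-statistic, while the zero coefficient's t-statistic stays small.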

Hypothesis Testing in Regression

  • Highlights that t-statistics follow a t-distribution with n minus p degrees of freedom when testing hypotheses about beta j being equal to 0.
  • Discusses checking whether the confidence interval for beta j contains 0; if it does, the corresponding factor is a candidate for exclusion from the regression model.
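The "does the interval contain zero" check can be made concrete as follows. This is a sketch on simulated data with one truly-zero coefficient, assuming scipy is available for the t critical value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)
se = np.sqrt(s2 * np.diag(XtX_inv))

# 95% confidence interval: beta_hat_j +/- t_{0.975, n-p} * se_j
tcrit = stats.t.ppf(0.975, df=n - p)
lower, upper = beta_hat - tcrit * se, beta_hat + tcrit * se
contains_zero = (lower < 0) & (0 < upper)  # True -> candidate for exclusion
print(contains_zero)
```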

F-tests in Regression Analysis

  • Explains how to test multiple beta j's simultaneously by comparing residual sums of squares under different hypotheses.
  • Notes that if the submodel retains k = p minus 1 variables, this corresponds to testing whether the last coefficient is zero, and the resulting F statistic equals the squared t statistic for that coefficient.
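The equivalence between the F-test for a single dropped variable and the squared t-statistic can be verified numerically; the simulated data below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, 0.0]) + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of the least squares fit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

p = X.shape[1]
rss_full = rss(X, y)
rss_restricted = rss(X[:, :2], y)  # submodel: drop the last column
k = 1
F = ((rss_restricted - rss_full) / k) / (rss_full / (n - p))

# The squared t statistic for the dropped coefficient matches F
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s2 = rss_full / (n - p)
t2 = (beta_hat[2] / np.sqrt(s2 * XtX_inv[2, 2])) ** 2
print(F, t2)
```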

Generalized Least Squares (GLS)

  • Introduces generalized least squares as a method addressing non-diagonal error structures in regression models.
  • Emphasizes transforming the model by the inverse square root of the error covariance matrix so the transformed errors satisfy the Gauss-Markov assumptions, yielding optimal estimation.
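A minimal sketch of this transformation, assuming a known diagonal (heteroscedastic) error covariance: ordinary least squares on the transformed model reproduces the GLS closed form exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Sigma = np.diag(rng.uniform(0.5, 2.0, size=n))  # known error covariance
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(n), Sigma)

# Transform by Sigma^{-1/2} so the new errors satisfy Gauss-Markov
S_inv_half = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
beta_ols_transformed = np.linalg.lstsq(S_inv_half @ X, S_inv_half @ y,
                                       rcond=None)[0]

# Direct GLS formula: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(beta_gls)
```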

Maximum Likelihood Estimation (MLE)

  • Concludes with an introduction to maximum likelihood estimation within normal linear regression frameworks, focusing on computing data density given explanatory variables.

Maximum Likelihood Estimation in Regression

Understanding Maximum Likelihood Estimation (MLE)

  • MLE identifies parameter values that maximize the probability of observing the given data, making it an optimal estimator with the smallest variance in large samples.
  • As the sample size increases, the MLE is asymptotically efficient across a wide range of distributions; this is demonstrated using the Gaussian distribution.
  • In normal linear regression models, the MLE of beta is obtained by minimizing the least squares criterion Q(beta), so it coincides with the least squares estimate.
  • The maximum likelihood estimate for error variance is calculated as the sum of squared residuals divided by n, although this results in a biased estimate.
  • Generalized M-estimators extend beyond least squares by minimizing alternative Q functions, allowing for more robust estimations.
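The bias of the MLE variance estimate (RSS/n) versus the unbiased version (RSS/(n-p)) can be checked by simulation; the sample size, dimension, and replication count below are illustrative choices:

```python
import numpy as np

# Monte Carlo check: RSS/n is biased low; RSS/(n - p) is unbiased.
rng = np.random.default_rng(3)
n, p, sigma = 30, 4, 1.0
mle_vals, unbiased_vals = [], []
for _ in range(2000):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = rng.normal(scale=sigma, size=n)  # true beta = 0
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    mle_vals.append(rss / n)             # MLE of sigma^2
    unbiased_vals.append(rss / (n - p))  # unbiased estimate
print(np.mean(mle_vals), np.mean(unbiased_vals))
```

With n = 30 and p = 4, the MLE's mean sits near (n-p)/n = 0.867 of the true variance, while the n-p version averages close to 1.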

Robust Estimators and Alternatives

  • Robust estimators consider variations of least squares and mean absolute deviations, providing alternatives when error distributions are unknown or contaminated.
  • If the density of errors is known, MLE suggests using that specific distribution; otherwise, robust methods serve as useful alternatives.
  • Generalized M-estimators include quantile estimators, which weight absolute deviations asymmetrically for positive and negative residuals to estimate quantities such as the 90th percentile of the response.

Calculating Mean Absolute Deviation Estimates

  • To calculate mean absolute deviation estimates or quantile estimators, one must minimize convex h functions associated with residual values.
  • The median is the best estimate of the center of the data under the mean absolute deviation criterion, owing to its robustness against outliers.
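A small numeric check of this fact, using a toy sample (an assumption for illustration) with one gross outlier:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier

def sad(c):
    """Sum of absolute deviations from a candidate center c."""
    return np.sum(np.abs(data - c))

# Minimize over a fine grid: the minimizer is the median, not the mean
grid = np.linspace(0.0, 10.0, 10001)
best = grid[np.argmin([sad(c) for c in grid])]
print(best, np.median(data), np.mean(data))
```

The grid minimizer lands at the median (3.0), while the mean (22.0) is dragged toward the outlier.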

Ridge Regression: An Extension of Linear Models

  • Ridge regression incorporates a penalty term into least squares estimation to address issues such as multicollinearity among predictors.
  • This method adds a penalty based on the squared length of the beta vector to improve model stability and performance. Standardizing independent variables enhances effectiveness.
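The penalized criterion has the closed-form solution (X'X + lambda I)^{-1} X'y, and increasing lambda shrinks the coefficient vector. A sketch on simulated, standardized predictors (settings are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 60, 5
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns first
y = X @ rng.normal(size=p) + rng.normal(size=n)
y = y - y.mean()

def ridge(lam):
    """Ridge estimate: (X'X + lam I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The L2 norm of the estimate shrinks monotonically as lambda grows
norms = [np.linalg.norm(ridge(lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
print(norms)
```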

Ridge Regression and Its Connections to Bayesian Models

Ridge Regression Parameter Estimation

  • Ridge regression is applied after rescaling the columns of the predictor matrix: each column is centered and divided by its standard deviation. This standardization puts all predictor variables on an equal footing.
  • Standardizing eliminates dependence on original units of X, preventing any single beta coefficient from being disproportionately large due to differing units.

Connection to Bayesian Models

  • Ridge regression can be interpreted through a Bayesian lens by assuming a prior distribution for regression parameters that is multinormal with mean zero and a covariance matrix proportional to the identity matrix.
  • The ridge regression criterion corresponds to the log likelihood of the data plus the log of the prior density; the isotropic prior treats all directions in p-dimensional space as equally likely a priori for the regression parameters.

Computation and Singular Value Decomposition

  • The minimization in ridge regression leads to a formula resembling least squares but with an additional term, lambda times the identity matrix, added to X transpose X before inverting.
  • Using singular value decomposition (SVD), fitted values in ridge regression can be expressed as a sum of factors multiplied by coefficients, where lambda influences shrinkage based on singular values.
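The SVD expression for the fitted values can be verified directly: each direction u_i is shrunk by the factor d_i^2 / (d_i^2 + lambda). The data and lambda below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 4))
y = rng.normal(size=40)
lam = 5.0

U, d, Vt = np.linalg.svd(X, full_matrices=False)
shrink = d**2 / (d**2 + lam)  # per-direction shrinkage factors

# Ridge fitted values via SVD: y_hat = sum_i shrink_i * u_i (u_i' y)
y_hat_svd = U @ (shrink * (U.T @ y))
# Same values from the closed-form ridge estimate
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)
y_hat_direct = X @ beta_ridge
print(shrink)
```

Since the singular values d are sorted in decreasing order, the shrinkage factors decrease too: directions with small singular values are shrunk hardest, exactly as the bullet above describes.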

Shrinkage Effects in Ridge Regression

  • Ridge regression exhibits less shrinkage for larger squared singular values and more shrinkage for smaller ones, indicating that coefficients corresponding to principal component axes experience different levels of adjustment.
  • This relationship connects ridge regression with principal components analysis (PCA), highlighting how both methods utilize SVD for computation.

Principal Components Regression: Methodology and Insights

Principal Component Variables

  • In principal components regression, orthogonal principal component variables simplify coefficient calculations. Researchers may choose only the first m components for their models.
  • Fitted values from principal components regression represent projections onto spaces spanned by selected principal component directions, showcasing similarities across least squares, ridge, and PCA methodologies.
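A sketch of principal components regression on simulated data (settings are illustrative assumptions): because the score variables are orthogonal, each coefficient is a simple one-variable ratio, and the fitted values equal the projection of y onto the span of the first m singular directions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, m = 50, 4, 2
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)  # center predictors
y = rng.normal(size=n)
y = y - y.mean()

U, d, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T[:, :m]  # first m principal-component score variables

# Orthogonal columns: Z'Z = diag(d_i^2), so coefficients are simple ratios
gamma = (Z.T @ y) / (d[:m] ** 2)
y_hat = Z @ gamma

# Equivalently, project y onto the span of the first m left singular vectors
y_hat_proj = U[:, :m] @ (U[:, :m].T @ y)
print(np.linalg.norm(y_hat - y_hat_proj))
```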

Extensions Beyond Traditional Methods

  • LASSO (Least Absolute Shrinkage and Selection Operator) extends estimation methods by applying penalties based on absolute values rather than squared terms. This introduces unique characteristics compared to ridge penalties.
  • The LASSO penalty shrinks estimates towards zero differently than ridge; it focuses on minimizing absolute magnitudes instead of squared sums.

Understanding Lasso Regression and ETF Analysis

Lasso Regression Overview

  • Lasso regression minimizes least squares subject to a constraint on the sum of absolute magnitudes of the coefficients, producing shrinkage that tends to land estimates on the vertices of the L1 constraint region.
  • This method often results in some parameters being zero, effectively excluding certain variables from the regression based on the value of lambda.
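In the special case of an orthonormal design (X'X = I), the lasso solution has a closed form, soft-thresholding of the least squares estimate, which makes the exact-zeroing behavior easy to see. The coefficient values and lambda below are illustrative assumptions:

```python
import numpy as np

# Orthonormal-design special case: lasso = soft-thresholding of OLS.
beta_ols = np.array([3.0, -0.4, 1.5, 0.2])
lam = 0.5

# Coefficients with |beta_j| < lambda are set exactly to zero
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

# Ridge, by contrast, shrinks proportionally and never hits zero exactly
beta_ridge = beta_ols / (1.0 + lam)
print(beta_lasso, beta_ridge)
```

Here the two small OLS coefficients are zeroed out by the lasso, while every ridge coefficient stays nonzero.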

ETF Case Study Introduction

  • The case study involves analyzing prices of exchange-traded funds (ETFs) across various sectors in the US market, which includes about 10 sectors like consumer staples and energy.
  • The analysis focuses on regressing sector ETFs against market index ETFs such as S&P 500 and NASDAQ to understand their relationships.

Purpose of Regression Analysis

  • The goal is to determine how well sector ETFs can be explained by broader market indices, with an emphasis on understanding variance in returns.
  • A key application is hedging investments against market risk by eliminating dependencies on major indices, thus reducing exposure during market downturns.

Hedge Fund Replication Insights

  • Similar regression techniques are used for hedge fund replication strategies, where returns may largely come from liquid market instruments available for trading.

Regression Coefficients and Diagnostics

  • Analyzing regression coefficients helps identify whether explanatory factors significantly differ from zero; t-values and p-values provide this insight.
  • R-squared values indicate how well the model explains variability; diagnostics help assess model fit and performance.

Principal Component Analysis (PCA)

  • PCA is employed to decompose variability among independent variables (index sector ETFs), revealing that the first two components explain a significant portion (90%+) of variability.
  • Each principal component variable is orthogonal, allowing for straightforward linear regressions with varying degrees of statistical significance noted across components.
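The "first components explain most of the variability" pattern is what one expects when a common market factor drives the series. A simulated stand-in for sector returns (an assumption, not the lecture's ETF data):

```python
import numpy as np

# One common factor driving six correlated "sector return" series
rng = np.random.default_rng(7)
common = rng.normal(size=(500, 1))
returns = common @ np.ones((1, 6)) + 0.3 * rng.normal(size=(500, 6))

R = returns - returns.mean(axis=0)
cov = R.T @ R / (len(R) - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # descending eigenvalues
explained = eigvals / eigvals.sum()      # proportion of variance per PC
print(explained)
```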

Statistical Significance in Model Selection

  • Evaluating p-values helps determine which principal component variables should be included in the final model; earlier analyses showed differing significance levels over time.

Principal Component Analysis and Regression Techniques

Overview of Principal Component Variables

  • Discussion on the significance of principal component variables in regression models, emphasizing that high-order components may not always include important regressors.
  • Noted that statistically significant principal components can explain factors affecting response variables despite low variability.

Ridge and Lasso Regression Insights

  • Examination of ridge regression results, highlighting how varying constraints on the L2 norm affect regression parameters.
  • Observations on lasso regression where increasing the penalty (lambda parameter) leads to smaller L1 norms for estimates, indicating a shrinkage effect.

Comparison of Regression Methods

  • Illustration comparing coefficients from different methods; most methods yield similar results except when using only the first three principal component variables which produce distinct estimates.
  • Emphasis on sensitivity of estimators to data variations, particularly in smaller sample sizes where differences between ridge and lasso become more pronounced.

Statistical Significance in Principal Components

  • In an example with principal components regression, significant components led to comparable estimates as least squares values, suggesting effective model selection based on statistical significance.

Understanding Capital Asset Pricing Model (CAPM)

Introduction to CAPM

  • Overview of CAPM's definition: expected return equals risk-free return plus beta times market excess return. Beta represents market risk factor.

Empirical Analysis Framework

  • Outline of steps for empirical analysis involving CAPM: computing returns for stocks versus market index and testing hypotheses about individual coefficients.

Linear Regression Model Specification

  • Description of linear regression model setup using empirical returns data for stocks and market excess returns; includes intercept (alpha j) and slope (beta j).

Testing CAPM Validity

  • Explanation that if CAPM holds true, alpha j should equal 0; fitting real data allows testing this hypothesis against actual stock pricing behavior.
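The alpha test can be sketched on simulated excess returns (all numbers below are illustrative assumptions, not GE's actual data; scipy is assumed available for the p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 252  # one simulated year of daily returns
mkt_excess = rng.normal(0.0004, 0.01, size=n)
alpha_true, beta_true = 0.0, 1.1  # CAPM holds: alpha = 0
stock_excess = alpha_true + beta_true * mkt_excess \
    + rng.normal(0, 0.015, size=n)

# Regress stock excess returns on market excess returns
X = np.column_stack([np.ones(n), mkt_excess])
XtX_inv = np.linalg.inv(X.T @ X)
ab = XtX_inv @ X.T @ stock_excess  # [alpha_hat, beta_hat]
resid = stock_excess - X @ ab
s2 = resid @ resid / (n - 2)

# t-test of H0: alpha = 0
t_alpha = ab[0] / np.sqrt(s2 * XtX_inv[0, 0])
p_alpha = 2 * stats.t.sf(abs(t_alpha), df=n - 2)
print(t_alpha, p_alpha)
```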

Practical Application with GE Stock Example

  • Demonstration using GE stock to fit a regression model; highlights importance of p-value in assessing whether alpha is consistent with zero under CAPM assumptions.

Understanding Regression Analysis and Residuals

Key Insights on Regression Coefficients

  • The interpretation of t-values and p-values is discussed: a large t-value with a small p-value indicates a statistically significant coefficient. Testing the beta coefficient against beta = 1, the average market risk, is suggested as more meaningful than testing against zero.

Importance of R-squared Values

  • R-squared values are crucial for understanding linear regression models. The discussion includes calibrating scatter plots to these values, which aids in interpreting model fit.

Residual Analysis Fundamentals

  • Residual analysis checks if model assumptions hold true post-fitting. A normal distribution of residuals is assumed, and a histogram illustrates this concept using GE's regression data.

Evaluating Normality of Residuals

  • Two bell-shaped curves are overlaid: one from maximum likelihood estimates (MLE) and another from robust estimators. The comparison helps assess how well the normality assumption fits the residuals.

QQ Plot Interpretation

  • A QQ plot compares ordered residual samples to theoretical normals. If residuals are Gaussian, the plot should align along a straight line, indicating proper model fit.

Assessing Robust Estimates

Standard Deviation Estimation Techniques

  • The robust estimate of standard deviation uses interquartile range to assess variance among residuals, suggesting it may provide better insights than traditional methods.
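For normal data, the interquartile range equals about 1.349 standard deviations, so IQR/1.349 is a robust scale estimate. A sketch on contaminated residuals (the contamination setup is an illustrative assumption; scipy is assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
# Normal residuals contaminated with a few large outliers
resid = np.concatenate([rng.normal(0, 1.0, 990),
                        rng.normal(0, 10.0, 10)])

sd_sample = resid.std(ddof=1)  # inflated by the outliers

# Robust alternative: IQR / (Phi^{-1}(0.75) - Phi^{-1}(0.25)) ~ IQR / 1.349
q75, q25 = np.percentile(resid, [75, 25])
sd_robust = (q75 - q25) / (stats.norm.ppf(0.75) - stats.norm.ppf(0.25))
print(sd_sample, sd_robust)
```

The robust estimate stays near the uncontaminated scale of 1, while the sample standard deviation is pulled upward by the outliers.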

Percentile Distribution Evaluation

  • Fitted percentiles under the residual model should ideally be uniformly distributed if the normal model holds true. Discrepancies indicate potential non-normal distributions in data.

Implications of Non-Normal Residual Distribution

  • Observations suggest that fitted percentiles using MLE underestimate near the mean while robust estimates perform better across most ranges except extremes, hinting at possible mixture models for error distribution.

Hypothesis Testing in Regression Models

Testing Individual Model Coefficients

  • Hypothesis testing involves assessing whether individual coefficients significantly differ from zero or other benchmarks like beta = 1, which represents average market risk across stocks.

Statistical Significance Assessment

  • T-statistics derived from least squares estimators allow for testing parameter significance within regression models; specific interest lies in alpha tests within capital asset pricing models (CAPM).

Utilizing R Packages for Regression Analysis

CAR Package Functionality

  • The CAR package facilitates hypothesis testing through F-tests, providing results such as residual sum of squares and corresponding statistics essential for evaluating regression parameters.

Exploring Submodel Suitability

  • Discussion includes testing submodels with fewer factors and examining regime shifts where parameter values may change over time—highlighting dynamic aspects of regression analysis.

Testing Regression Parameters in Capital Asset Pricing Model

Overview of Hypothesis Testing

  • The discussion begins with the potential to test changes in regression parameters over time using the capital asset pricing model (CAPM).
  • It is suggested that one can split data into two periods (A and B) to analyze differences in regression models for each period.
  • An F-test statistic is introduced as a method to determine if regression parameters differ between the two periods.
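The two-period F-test compares the pooled residual sum of squares against the sum of the per-period fits. A sketch with a simulated regime shift (break point, sample sizes, and betas are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(10)
nA = nB = 120
x = rng.normal(size=nA + nB)
# Simulated regime shift: the slope changes between periods A and B
y = np.where(np.arange(nA + nB) < nA, 0.8 * x, 1.2 * x) \
    + 0.2 * rng.normal(size=nA + nB)
X = np.column_stack([np.ones(nA + nB), x])

def rss(X, y):
    """Residual sum of squares of the least squares fit."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

p = 2  # parameters per regression (intercept and slope)
rss_pool = rss(X, y)                              # one model for both periods
rss_A, rss_B = rss(X[:nA], y[:nA]), rss(X[nA:], y[nA:])

# F-test of "same parameters in both periods"
F = ((rss_pool - rss_A - rss_B) / p) \
    / ((rss_A + rss_B) / (nA + nB - 2 * p))
print(F)
```

A large F value rejects the hypothesis that the regression parameters are the same in the two periods.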

Analysis of GE Stock Data

  • A subjective choice was made to split the data for GE stock into two periods, allowing for analysis of cumulative standardized residuals.
  • The results indicate strong time dependence in residuals, with significant changes observed when fitting separate regression models for both periods.
  • Notable findings include a shift from negative to positive alpha values and an increase in market risk beta from 0.773 to 1.126 across the two periods.

Application of CAPM Across S&P 500 Stocks

  • The speaker transitions back to applying CAPM across all stocks within the S&P 500, noting around 380–400 stocks were analyzed by sector.
  • R-squared values are reported, typically above 0.2, indicating reasonable explanatory power of the regressions performed on these stocks.

Sector Analysis and Alpha Values

  • Box plots illustrate alpha estimates by sector; construction and computer technology sectors show positive alphas while conglomerates exhibit negative alphas.
  • P-values are calculated for testing whether alphas are significantly different from zero; most exceed the threshold of 0.05, suggesting consistency with CAPM.

Insights on Beta and Alpha Relationships

  • A plot shows a correlation where higher beta values may lead to higher alpha values; low beta stocks might reflect lower alpha potential as well.
  • Different fits for various sectors complicate interpretations but serve as initial insights into how alphas vary across sectors.

Risk Assessment by Sector

  • Beta coefficients reveal that consumer staples have lower betas compared to high-tech sectors like computer technology which have much higher betas.
  • Top ten stocks with highest betas are primarily found in consumer discretionary and computer technology sectors; lowest betas cluster around consumer staples.

Significant Alphas Among Stocks

  • A table lists top ten stocks with significant alphas; ENPH from oils and energy sector has an alpha value of 0.003 daily, prompting discussion on annualizing this figure based on daily returns.

Understanding Stock Returns and the Capital Asset Pricing Model

Key Insights on Stock Performance

  • The speaker annualizes the daily alpha by compounding over the 252 trading days in a year, which gives roughly 2.12, so the annualized return is 2.12 minus 1, indicating significant stock performance.
  • A notable observation is made regarding the capital asset pricing model (CAPM), which appears to align well with stock returns over extended periods.
  • Despite the effectiveness of the CAPM, there is an acknowledgment that more complex models may be necessary for deeper analysis.
  • The discussion highlights the balance between simplicity and complexity in financial modeling, suggesting that while basic models can yield insights, they may not capture all market dynamics.
  • The emphasis is placed on understanding how different models can provide varying perspectives on stock performance and investment strategies.
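The annualization arithmetic behind the "2.12 minus 1" figure, using the daily alpha of 0.003 reported earlier:

```python
# Compounding a daily alpha of 0.003 over 252 trading days
daily_alpha = 0.003
annual = (1 + daily_alpha) ** 252 - 1
print(annual)  # roughly 1.13, i.e. about 113% per year
```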
Video description

MIT 18.642 Topics in Mathematics with Applications in Finance, Fall 2024
Instructor: Peter Kempthorne
View the complete course: https://ocw.mit.edu/courses/18-642-topics-in-mathematics-with-applications-in-finance-fall-2024
YouTube Playlist: https://www.youtube.com/playlist?list=PLUl4u3cNGP601Q2jo-J_3raNCMMs6Jves

This lecture covers the theory and application of regression modeling, including linear regression properties, hypothesis testing, and advanced methods like ridge, lasso, and principal components regression. It also explores practical use cases such as ETF sector regressions and empirical analysis of the Capital Asset Pricing Model, highlighting model diagnostics, parameter estimation, and challenges like residual distribution and regime changes.

License: Creative Commons BY-NC-SA
More information at https://ocw.mit.edu/terms
More courses at https://ocw.mit.edu
Support OCW at http://ow.ly/a1If50zVRlQ

We encourage constructive comments and discussion on OCW’s YouTube and other social media channels. Personal attacks, hate speech, trolling, and inappropriate comments are not allowed and may be removed. More details at https://ocw.mit.edu/comments.