Bayesian Methods in Modern Marketing Analytics with Juan Orduz
Introduction
The speaker welcomes the audience and introduces the purpose of the webinar.
- The speaker expresses excitement about sharing knowledge on advanced usage of PyMC in marketing analytics.
- The speaker highlights the rare combination of statistical and domain expertise that Juan Orduz possesses.
- There will be a Q&A section at the end of the webinar.
Purpose of Today's Session
The speaker explains what to expect from today's session.
- Bayesian methods can effectively help solve marketing data science problems.
- The session will provide an overview of selected applications of Bayesian methods in marketing analytics.
- The focus is not on detailed descriptions of methods but rather on getting an overview.
- Participants can reach out to the speaker or the PyMC Labs team for more information.
Outline
The speaker outlines what they will cover during the session.
- Introduction
- Simple application with A/B testing
- Media mix models
- Customer lifetime value problems
- Causal inference (briefly mentioned)
- Collaboration with the PyMC Labs team
- An experimental approach to retention and revenue matrices
Big Picture
The speaker emphasizes keeping sight of business problems when using models.
- Models are just a piece in solving business problems.
- Integration with product, measuring efficiency, and feedback loops with stakeholders are important aspects to consider.
- It is an ongoing cycle where models should make outcomes actionable.
Benefits of Bayesian Modeling
In this section, the speaker discusses the benefits of using Bayesian modeling in data analysis.
Advantages of Explicit Data Generation Process
- Bayesian modeling forces you to explicitly think about the data generation process.
- This mindset helps you describe your assumptions more explicitly.
Adding Priors for Domain Knowledge
- Being able to add priors is powerful because it allows domain knowledge to be encoded as constraints.
- This is especially useful in domains with limited or low-quality data, such as marketing.
Flexibility and Uncertainty Quantification
- The flexibility of Bayesian modeling allows for a Lego-like approach where you can start simple and build on top.
- Uncertainty quantification is important for making decisions and assessing risk in different scenarios or outcomes of models.
Simulation: Geo Experiment
In this section, the speaker presents a simulation of a geo experiment and demonstrates how Bayesian modeling can be used to estimate uplift associated with a campaign.
Simulating a Geo Experiment
- A geo experiment involves launching a campaign in certain zip codes to see if it induces an uplift in orders.
- The speaker presents a simple method called time-based regression to measure the effectiveness of the campaign.
Parametrizing the Model
- To connect input data with output data, you need to parametrize your model by giving functional form to the relationship between them.
- In this case, a simple linear regression is used with an intercept and beta coefficient that mediates the relationship between control and treatment groups.
Running Inference and Making Decisions
- Inference is run using MCMC sampling or other Bayesian methods to get predictions, credible intervals, and uncertainty estimates.
- By taking the difference between treatment and control groups, you can estimate uplift associated with the campaign.
Introduction to Model Building
In this section, the speaker introduces the process of building a model and explains how it can be translated into a PyMC3 model.
Defining Variables
- The first step in building a model is defining the set of variables.
- The training data is defined as the pre-intervention data.
Setting Priors
- The second step is setting up priors, which involves telling the model which variables you have and possibly constraining them.
- In this case, a half-normal distribution is imposed on the relationship between control and treatment because it's assumed to be positive.
Parametrization
- The third step is parametrization, which involves specifying the relationship between variables.
- In this case, a linear relationship with a Student-t likelihood is used for robust regression.
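As a minimal, self-contained illustration of the three steps (variables, priors, parametrization), the sketch below runs the same kind of robust regression with a half-normal prior on the slope and a Student-t likelihood, but approximates the posterior on a grid instead of sampling with PyMC. The intercept and noise scale are fixed to keep the example one-dimensional, and all data are simulated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Step 1 -- variables: simulated pre-intervention control/treatment data
# with heavy-tailed noise (hence the need for a robust likelihood).
control = rng.normal(100, 5, 80)
treatment = 10 + 0.9 * control + 2 * rng.standard_t(df=3, size=80)

# Step 2 -- priors: beta is assumed positive, so it gets a half-normal prior.
beta_grid = np.linspace(0.01, 2.0, 400)
log_prior = stats.halfnorm(scale=1.0).logpdf(beta_grid)

# Step 3 -- parametrization: linear relationship with a Student-t likelihood.
# Intercept and scale are fixed at their true values to keep the grid 1-D.
alpha, sigma, nu = 10.0, 2.0, 3.0
resid = treatment[None, :] - (alpha + beta_grid[:, None] * control[None, :])
log_lik = stats.t(df=nu, scale=sigma).logpdf(resid).sum(axis=1)

# Normalized posterior on the grid, then a posterior-mean point estimate.
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum()
beta_mean = float((beta_grid * post).sum())
```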
Marketing Measurement through Media Mix Models
This section discusses marketing measurement through media mix models and how they can be used to optimize advertising efficiency.
Attribution Models
- Attribution models are no longer effective due to privacy regulations and their inability to account for complex marketing strategies.
- A more holistic approach involves combining attribution models with experimentation and media mix models using linear regression.
Media Mix Models
- Media mix models blend information from experimentation and attribution by means of linear regression.
- They consider two types of transformations: carryover or ad stock effect and saturation effect.
Carryover Effect
- The carryover effect refers to the lag in effect of media spend on sales.
- It can be seen as smoothing in simulated data examples provided by the speaker.
Saturation Effect
- The saturation effect refers to the point at which putting more money into a specific channel no longer results in the same amount of sales.
- It can be seen as a leveling off in the simulated data examples provided by the speaker.
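A minimal sketch of the two transformations, assuming the common geometric form for the adstock/carryover effect and a logistic form for saturation (other parametrizations exist):

```python
import numpy as np

def geometric_adstock(spend, alpha, max_lag=12):
    """Carryover: today's effect is a geometrically decaying sum of past spend."""
    weights = alpha ** np.arange(max_lag)  # 1, alpha, alpha**2, ...
    padded = np.concatenate([np.zeros(max_lag - 1), spend])
    return np.array([padded[t:t + max_lag][::-1] @ weights
                     for t in range(len(spend))])

def logistic_saturation(x, lam):
    """Saturation: diminishing returns as spend grows, bounded above by 1."""
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))

# A single burst of spend is smoothed out over the following periods:
spend = np.array([0.0, 100.0, 0.0, 0.0, 0.0])
smoothed = geometric_adstock(spend, alpha=0.5, max_lag=4)
# smoothed -> [0, 100, 50, 25, 12.5]
```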
Media Mix Model
In this section, the speaker discusses how to set up a regression model where sales are the target and the input channels are the regressors. They explain how to encode the carryover and saturation transformations in such a way that you can fit the linear model and learn the parameters defining these effects.
Regression Model for Sales
- The goal is to set up a regression model where sales are the target and the input channels are the regressors.
- The transformations need to be encoded in such a way that you can fit the linear model and learn the parameters defining the carryover and saturation effects.
Joint Project with PyMC Labs
- A joint project with PyMC Labs allows us to infer the baseline sales, which we call organic sales, as well as the evolution of each channel's contribution over time.
- With this fitted model, we can run simulations to determine how much an increase or decrease in media spend would change the expected amount of sales. This information is useful for optimization purposes when allocating budgets across channels.
PyMC-Marketing Package
- A joint project with the PyMC Labs team has resulted in a package called PyMC-Marketing, which aims to support modern media mix models. It provides a solid, user-friendly baseline through an API with the diagnostics and plots seen earlier in the presentation.
- The goal is to make all these models more accessible within the Python ecosystem, and the PyMC ecosystem in particular, so that they can be used by practitioners in various industries.
Customer Lifetime Value
In this section, the speaker discusses how to ensure that customers acquired are valuable and that investments are paid off in the long run. They explain how customer lifetime value is calculated and why it's important.
Importance of Customer Lifetime Value
- It's not enough to acquire customers; they must also be valuable for the investment to pay off in the long run.
- The cost per acquisition or return on advertisement spend should be coupled with customer lifetime value, which is how much a certain type of user is expected to contribute over time.
Customer Lifetime Value Modeling
In this section, the speaker discusses customer lifetime value modeling and how it can be used to estimate the expected number of purchases per user for a given time period.
Features Relevant to Customer Lifetime Value Modeling
- Three natural data features relevant in customer lifetime value modeling are frequency, age, and recency.
- Frequency is the number of repeated purchases a customer has made.
- Age refers to how long a customer has been making purchases.
- Recency encodes when the last purchase was made.
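The three features can be computed from a customer's purchase history in a few lines. The convention below (frequency counts only repeat purchases; recency and age are measured from the first purchase) is the one typically used with BG/NBD-style models:

```python
def rfm_summary(purchase_days, observation_end):
    """Frequency / recency / age in the convention used by BG/NBD-style models.

    purchase_days: sorted day indices of one customer's purchases (non-empty).
    """
    first, last = purchase_days[0], purchase_days[-1]
    frequency = len(purchase_days) - 1   # repeat purchases only
    recency = last - first               # when the last purchase happened, since the first
    age = observation_end - first        # how long the customer has been observed
    return frequency, recency, age

# A customer who bought on days 0, 10 and 30, observed until day 50:
freq, rec, age = rfm_summary([0, 10, 30], observation_end=50)
# freq = 2 repeat purchases, recency = 30, age = 50
```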
Models for Customer Lifetime Value Estimation
- The BG/NBD model and Gamma-Gamma model are two models that allow estimation of customer lifetime value using input data.
- The BG/NBD model takes into account frequency, age, and recency of users to output the number of expected purchases per user for a given time period.
- The Gamma-Gamma model predicts monetary spend per transaction or basket size.
Data Generation Process Assumptions
- While a customer is active, the waiting time between transactions is exponentially distributed, with each user having a purchase rate lambda.
- Lambda depends on the user but is modeled as coming from a global distribution (a Gamma distribution).
- Each customer becomes inactive with probability p after making a purchase; p depends on the user and is assumed to come from a global distribution (a Beta distribution).
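These assumptions translate directly into a generative simulation: draw each user's lambda from a Gamma distribution and dropout probability p from a Beta distribution, then simulate purchases until dropout or the end of the observation window. The hyperparameter values below are illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(7)

# Global (population-level) distributions of the two user-level parameters.
r, beta_rate = 2.0, 4.0   # Gamma distribution for each user's purchase rate lambda
a, b = 1.0, 3.0           # Beta distribution for each user's dropout probability p

def simulate_customer(t_max=52.0):
    """Draw one customer's purchase times over t_max weeks."""
    lam = rng.gamma(r, 1.0 / beta_rate)  # user-specific purchase rate
    p = rng.beta(a, b)                   # user-specific dropout probability
    t, purchases = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam)  # exponential waiting time between purchases
        if t > t_max:
            break
        purchases.append(t)
        if rng.random() < p:             # becomes inactive after this purchase
            break
    return purchases

histories = [simulate_customer() for _ in range(1000)]
```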
Likelihood Function and Parameter Optimization
In this section, the speaker discusses the likelihood function and optimizing its parameters for an average user. They also touch on estimating these parameters for individual users.
Estimating Parameters with the Lifetimes Package
- The speaker mentions using Python's lifetimes package to estimate parameters.
- They note that they haven't gotten any exciting results yet but having an uncertainty band on the probability of being active at a user level is helpful for decision making.
Gamma Model and Hyperparameters
- The speaker explains the assumptions of the gamma model and how it can be used to estimate hyperparameters.
- They mention that once you learn these hyperparameters from a global distribution, you can draw conclusions about an unseen user which is useful for modeling purposes.
Hierarchical Models
In this section, the speaker discusses hierarchical models and their usefulness in solving cold start problems for new cohorts.
Building a Hierarchy Model
- The speaker describes building a hierarchical model on top of BG/NBD.
- They explain that this allows them to solve cold start problems for new cohorts by transferring information through hierarchical models.
Shrinkage Phenomena
- The speaker notes that, overall, the hierarchical model pulls estimates closer to the global mean due to the shrinkage phenomenon.
- This has important implications for live production systems when thinking about customer lifetime value and the probability of being active.
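The shrinkage effect can be illustrated with a closed-form partial-pooling formula (a simplified stand-in for the full hierarchical model): each cohort's raw estimate is a precision-weighted average with the global mean, so small cohorts get pulled in strongly. All numbers below are made up for illustration:

```python
import numpy as np

def partial_pool(cohort_means, cohort_sizes, sigma2, tau2):
    """Shrink per-cohort estimates toward the global mean.

    sigma2: within-cohort variance, tau2: between-cohort variance.
    Small cohorts (little data) are pulled strongly toward the global mean;
    large cohorts keep estimates close to their own raw mean.
    """
    means = np.asarray(cohort_means, dtype=float)
    sizes = np.asarray(cohort_sizes, dtype=float)
    global_mean = np.average(means, weights=sizes)
    weight = tau2 / (tau2 + sigma2 / sizes)  # shrinkage weight in (0, 1)
    return weight * means + (1 - weight) * global_mean

# Raw "probability of being active" for a tiny cohort and a large cohort:
raw = np.array([0.10, 0.40])
shrunk = partial_pool(raw, cohort_sizes=[5, 500], sigma2=0.04, tau2=0.01)
```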
Causal Inference
In this section, the speaker introduces causal inference as playing a big role in decision-making processes.
Causal Inference in Decision Making
- The speaker notes that causal inference plays a big role in decision-making processes when thinking about incentives and measuring treatment effects.
- They mention that CausalPy is another product developed by PyMC Labs, which provides an API and framework to estimate different types of effects with different methods depending on the problem.
Applications of Bayesian Methods in Marketing
In this section, the speaker briefly discusses the importance of Bayesian methods in marketing and provides examples of their applications.
Importance of Bayesian Methods
- Bayesian methods are important in marketing because they provide a better estimation of effects on users.
- They can be used to control for non-compliance effects and reduce variance.
- Experience from previous experiments is useful in business contexts.
Applications
Incentive Effect on Users
- Use instrumental variables to get a better estimation accounting for non-compliance effects.
- Gain a better estimation by passing information from previous experiments.
Retention Modeling
- Customer lifetime value is tricky to model at user level.
- Retention matrix shows how cohorts develop over time.
- A suitable model can extract the most important information from it.
Cohort Analysis and Revenue Modeling
In this section, the speaker discusses cohort analysis and revenue modeling. They explain how to model retention explicitly through a linear combination of age, cohort, and month, as well as how to use seasonal patterns to consider all types of features. The speaker also introduces Bayesian additive regression trees (BART) as a way to model retention and revenue.
Modeling Retention
- Cohorts are described by the age of each period with respect to the initial date.
- Seasonal patterns can be considered when modeling cohorts.
- Retention can be modeled explicitly through a linear combination of age, cohort, and month.
- Bayesian additive regression trees (BART) can be used to model retention.
Modeling Revenue
- Revenue depends on the number of active users.
- A binomial likelihood can be used to model active users through retention.
- A gamma likelihood can be used for modeling revenue since it is a positive quantity.
- Non-linearity often comes in retention but can also come in revenue modeling.
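A small simulation of the generative story described above (binomial draws for active users through an assumed decaying retention curve, gamma draws for their revenue) might look like this; the cohort size, retention curve, and spend parameters are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# One acquisition cohort of 1000 users, followed for 12 months.
cohort_size, months = 1000, 12
retention_curve = 0.85 ** np.arange(1, months + 1)  # assumed decaying retention

# Binomial model for active users: each user is active in month t with prob. p_t.
active = np.array([rng.binomial(cohort_size, p) for p in retention_curve])

# Gamma model for revenue: positive spend, summed over the active users.
mean_spend, shape = 20.0, 2.0
revenue = np.array([rng.gamma(shape * n, mean_spend / shape) for n in active])
```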
Coupling Problems
- The retention and revenue problems can be coupled together into one Bayesian model.
- Stratifying acquisition channels by retention is important for overall optimization.
- Out-of-sample prediction is possible using this BART model coupled with revenue.
Bayesian Models in Marketing
In this section, the speaker talks about the potential of Bayesian models in marketing and how they can be used to improve marketing analytics pipelines. The speaker also mentions some packages that can be used for Bayesian modeling.
Potential Applications of Bayesian Models in Marketing
- Bayesian models have a lot of potential applications in marketing.
- A/B testing is one such application that can be explored using Bayesian models.
- Packages like PyMC and PyMC-Marketing are useful for implementing Bayesian models in marketing.
Free Consultation Offer
- The speaker's team is offering a free 30-minute strategy consultation call to help companies improve their marketing analytics pipeline.
- Interested parties can use the provided link to schedule a time for the consultation call.
Corporate Workshops
- The speaker's team also offers corporate workshops on various topics related to data science and applied statistics.
- These workshops cover topics such as PyMC3, hierarchical modeling, linear modeling, and time series analysis.
Important Parameters and Model Calibration
The speaker discusses the importance of informative priors in obtaining reliable results from media mix models. They also address concerns about overfitting when the model's MAPE goes down to 2-3% and how to calibrate the MMM with experimental findings.
Identifying Important Parameters
- There are many parameters in an MMM, some of which are not identifiable, such as the shape parameters of the saturation functions.
- It is important to have good informative priors to obtain reliable results.
Concerns About Overfitting
- When the model's MAPE goes down to 2-3%, there may be concerns about overfitting.
- Start with a simple model like linear regression and look into the data to see where the variance is not entirely captured.
- Domain knowledge should drive model building.
Calibrating MMM with Experimental Findings
- It depends on the type of experiments that you run and the type of target metric that you want.
- For example, if you want an estimation on customer acquisition, you can use uplift tests.
- It's not trivial but can be done with some experience and careful thought.
Encoding Categorical Control Variables
The speaker addresses how categorical control variables can be encoded in non-time varying coefficient cases.
Dummy Variables
- In non-time varying coefficient cases, dummy variables can be used for categorical control variables like holidays (e.g., Christmas or Black Friday).
Bump Functions
- Bump functions can also be used for categorical control variables like Christmas season.
- Google Trends can provide a nice proxy for modeling bump functions.
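A bump function for a seasonal event can be as simple as a Gaussian centered on the event; in practice its height would be a learned coefficient (or the bump replaced with a Google Trends proxy). A minimal sketch:

```python
import numpy as np

def gaussian_bump(t, center, width):
    """Smooth bump peaking at `center`, for encoding a seasonal event."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)

# Weekly index over one year, with a bump peaking in late December (week 51).
weeks = np.arange(52)
christmas = gaussian_bump(weeks, center=51, width=2.0)
```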
Model Structure and Parametrization
In this section, the speaker discusses how to learn the model structure and parametrize certain types of models. The speaker emphasizes that thinking about the data generation process is crucial in determining the model structure.
Learning Model Structure
- Start simple and look into the errors where the variance is not explained.
- Think about the data generation process to determine model structure.
Parametrizing Models
- Use experience and domain knowledge to decide on distributions.
- The Stan documentation has guidelines for selecting priors and parametrizing models.
- Careful parameterization is important for running MCMC effectively.
Prior Predictive Checks
In this section, the speaker discusses prior predictive checks as a helpful tool in modeling latent variables.
Prior Predictive Checks
- Prior predictive checks are helpful in modeling latent variables.
- They can be used to calibrate priors or MMM through experiments.
- Blend MMM and attribution results with experiments as a component.
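A prior predictive check draws parameters from the priors alone, simulates datasets, and checks that they look plausible before any fitting. A minimal sketch for an MMM-style regression, with all priors and the saturation transform chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Prior predictive check for a simple MMM-style regression:
# sales = intercept + beta * saturation(spend) + noise.
n_draws, n_obs = 500, 52
spend = rng.uniform(0, 100, n_obs)
saturated = spend / (spend + 50.0)  # assumed simple saturation transform

# Half-normal priors (sampled as |Normal|) on all positive parameters.
intercept = np.abs(rng.normal(0, 100, n_draws))
beta = np.abs(rng.normal(0, 50, n_draws))
sigma = np.abs(rng.normal(0, 10, n_draws))

# Simulate full datasets from the priors alone -- no observed sales involved.
sim_sales = (intercept[:, None] + beta[:, None] * saturated[None, :]
             + rng.normal(0.0, sigma[:, None], (n_draws, n_obs)))

# Sanity check: do the priors put most of their mass on plausible sales values?
share_positive = (sim_sales > 0).mean()
```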
Measuring Baseline Uplift in Media Mix Models
In this section, the speaker discusses measuring baseline uplift after a campaign in media mix models.
Measuring Baseline Uplift
- No global model captures all aspects of media mix models.
- Attribution can be used to provide input data for the media model.
- Calibration of priors or MMM through experiments implicitly measures baseline uplift.
Superpower of Hierarchical Modeling
In this section, the speaker explains the concept of hierarchical modeling and how it can be used to model marketing channels.
Understanding Hierarchical Modeling
- Hierarchical modeling allows you to model a new channel as behaving like the average channel in the absence of data.
- As you get more data, you can learn more about individual channels and how they deviate from the mean.
- The global average is modeled at the top of the hierarchy, while individual channels are modeled below it.
Benefits of Hierarchical Modeling
- It allows you to make predictions about new marketing channels based on existing data.
- It helps you understand how individual channels differ from each other and from the global average.
Conclusion
In this section, the speaker concludes by thanking everyone for their participation and engagement in open source and community projects.
Final Thoughts
- The speaker thanks everyone for their questions and engagement in open source and community projects.
- Many people joined Discord during the session, which was exciting.