Process Mining Café 39 — Clickstreams

Clickstream Analysis and Customer Journey Insights

Introduction to Clickstream Analysis

  • The session introduces clickstream analysis as a vital aspect of customer journey analysis, highlighting its relevance in process mining.
  • The host, Ry, and guest Irene Trias from Online Dialogue discuss how web analytics helps to understand user behavior on websites.

Upcoming Events and Participation

  • Announcement of the Process Mining Camp scheduled for May 14-16, with early registration available until March 14.
  • Viewers are encouraged to participate live by joining the chat without needing an account, allowing for interactive discussions.

Understanding Web Analytics

  • Irene explains her role as a data analyst focused on conversion rate optimization (CRO), helping organizations enhance their websites for better user engagement.
  • She emphasizes combining data with psychology to identify opportunities for website optimization through tools like Google Analytics.

Data Analysis Techniques

  • Analyzing web data involves tracking user interactions such as page views, clicks, and traffic sources in order to optimize website performance.
  • Irene describes running A/B tests that compare different versions of a webpage and use statistical analysis to determine which one performs better.
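
The statistical comparison behind an A/B test can be sketched as follows. This is a minimal illustration, assuming a two-proportion z-test on conversion rates; the session does not say which test was used, and the function name and the numbers are made up for the example.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (assumed method)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up numbers: variant A converts 200 of 5000 visitors, variant B 250 of 5000
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be chance alone; the threshold to use is a project decision.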

Limitations of Traditional Web Analytics

  • Traditional web analytics often relies on one-dimensional data metrics that may not capture the full picture of user behavior.
  • Funnels are introduced as a way to visualize user pathways through e-commerce sites, with the caveat that they build in assumptions about user behavior.

Balancing User Goals with Organizational Objectives

  • The conversation highlights the importance of aligning organizational goals with user needs; if these do not match, optimization efforts may fail.

Finding Balance in Web Analytics

Understanding User Behavior and Data Collection

  • The speaker emphasizes the importance of balancing what the organization wants with what customers need when analyzing web data.
  • They highlight that web analytics collects extensive data on user interactions, which can be leveraged to enhance user experience.
  • Tracking is often done via URLs, capturing both the content viewed and the actions taken by users on a webpage.

Insights into Google Analytics

  • The current version of Google Analytics (GA4) functions as a comprehensive database where each line represents an individual user's action.
  • Actions tracked include page views and clicks, with various identifiers such as user ID and session ID being recorded.
  • Users can opt out of tracking by not accepting cookies, which reflects privacy considerations.

Goals vs. Usability in User Experience

  • Websites typically track key performance indicators like purchase events alongside other optional metrics such as button clicks or popup opens.
  • The speaker discusses how optimizing usability involves understanding both customer preferences and company objectives, which can sometimes conflict.
  • They note that improving speed in processes does not always lead to successful transactions; users may drop off due to lack of information or readiness.

A/B Testing and User Feedback

  • A/B testing is introduced as a method for evaluating changes in website structure while also considering user feedback through surveys.
  • An example is provided where menu structure changes were tested for their impact on transaction speed and user navigation ease.

Design Principles and Behavioral Expectations

  • The interaction between actual user behavior and expected design outcomes is crucial; discrepancies can reveal deeper insights into user needs.
  • Sometimes unexpected results from tests provide more valuable lessons about user behavior than anticipated outcomes would have offered.

Understanding Web Analytics and Process Mining

The Importance of Data in Testing

  • Tests that yield unexpected results prompt analysis of why they occurred and how to align outcomes with goals.
  • Gathering opinions is challenging because typically only a few people can be asked directly. Combining large-scale A/B test data with targeted questionnaires enhances understanding.

Utilizing Existing Data Sources

  • Web analytics provide pre-existing data that can be leveraged for deeper insights into user behavior.
  • Google Analytics serves as a foundational tool, offering essential statistics such as page views and purchase events based on website setup.

User Sessions and Visitor Tracking

  • Each user interaction is tracked through sessions identified by session IDs, allowing differentiation between multiple visits.
  • Purchase events can be analyzed not just by unique users but also by total occurrences, revealing repeat buying patterns.

Session Definitions and Cookie Tracking

  • A session is time-bound (e.g., it ends after 60 minutes, or after 30 minutes of inactivity), while visitor IDs are tied to cookies that track user activity until the cookie expires.
  • If cookies expire, users are assigned new visitor IDs unless logged in or otherwise identifiable.
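
The inactivity rule for sessions can be sketched as follows. This is a minimal illustration, assuming a 30-minute inactivity timeout and ignoring the maximum session length; `assign_sessions` and the sample visitor ID are hypothetical, and real analytics tools assign session IDs themselves.

```python
from datetime import datetime, timedelta

def assign_sessions(events, timeout=timedelta(minutes=30)):
    """events: list of (visitor_id, timestamp) tuples.

    Returns (visitor_id, timestamp, session_number) tuples, starting a new
    session whenever a visitor was inactive for longer than `timeout`.
    """
    result = []
    last_seen = {}   # visitor_id -> timestamp of the previous event
    session_no = {}  # visitor_id -> current session number
    for visitor, ts in sorted(events):
        prev = last_seen.get(visitor)
        if prev is None or ts - prev > timeout:
            session_no[visitor] = session_no.get(visitor, 0) + 1
        last_seen[visitor] = ts
        result.append((visitor, ts, session_no[visitor]))
    return result

# Hypothetical visitor with a 50-minute gap before the third event
events = [
    ("v1", datetime(2025, 1, 6, 10, 0)),
    ("v1", datetime(2025, 1, 6, 10, 10)),
    ("v1", datetime(2025, 1, 6, 11, 0)),
]
for row in assign_sessions(events):
    print(row)
```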

Analyzing User Behavior Beyond Funnels

  • Higher-level customer tracking allows linking anonymous visitors to known customers for more detailed analysis relevant to process mining.
  • Key metrics include the number of visitors, sessions, event counts, time spent on pages, and traffic sources (e.g., Google vs. advertisements).

Limitations of Funnel Analysis

  • Funnels assume a linear progression through defined steps but often fail to capture the complex paths users take on websites.
  • Real user behavior includes non-linear navigation—users may revisit pages multiple times before completing an action.

Transitioning from Assumptions to Reality in Process Mining

  • Traditional funnels do not account for the loops or backtracking users engage in during their journey across web pages.
  • Process mining provides a comprehensive view of actual user paths rather than relying solely on assumed sequences.
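
The difference between an assumed funnel and the actual paths can be sketched as follows. This is a minimal illustration with made-up sessions and page names: the funnel counts only sessions that pass the predefined steps in order, while the directly-follows counts (the basis of a process map) also show loops and alternative entry points.

```python
from collections import Counter

# Made-up sessions: ordered page labels per session
sessions = {
    "s1": ["home", "product", "cart", "checkout"],
    "s2": ["home", "product", "cart", "product", "cart", "checkout"],
    "s3": ["search", "product", "cart"],
}

def funnel_counts(sessions, steps):
    """Count sessions that reach each funnel step in the given order."""
    counts = [0] * len(steps)
    for pages in sessions.values():
        pos = 0  # index of the next funnel step this session has to reach
        for page in pages:
            if pos < len(steps) and page == steps[pos]:
                counts[pos] += 1
                pos += 1
    return dict(zip(steps, counts))

def directly_follows(sessions):
    """Count how often one page is directly followed by another."""
    edges = Counter()
    for pages in sessions.values():
        edges.update(zip(pages, pages[1:]))
    return edges

print(funnel_counts(sessions, ["home", "product", "cart", "checkout"]))
for (a, b), n in directly_follows(sessions).most_common():
    print(f"{a} -> {b}: {n}")
```

In this toy data the funnel never sees session s3, which enters via search, and it hides the cart-to-product loop in s2; both appear in the directly-follows counts.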

Understanding Funnel Limitations and Process Mining in User Behavior Analysis

The Nature of Funnels and User Behavior

  • Funnel analysis abstracts user behavior into stages without capturing the actual actions users take.
  • Funnels can create misconceptions because they only track users who follow a specific path and can miss those who do not conform to this linear progression.

Risks of Solely Relying on Funnel Analysis

  • An example from a client named Bit Food illustrates how different paths lead to conversions but are often overlooked in traditional funnel analysis.
  • With standard funnels it is difficult to identify overlaps between user paths, which can obscure complex user journeys.

Advantages of Process Mining Over Traditional Funnels

  • Process mining is introduced as a way to analyze user behavior more comprehensively than funnels allow.
  • It can uncover unknown paths or behaviors that would remain hidden with funnel analysis alone.

Steps in Conducting Process Mining Analysis

  • The first step is to define clear, answerable questions before diving into data analysis; this is crucial for effective outcomes.
  • The questions should be concrete (around 10–12 of them), align with the available data, and be answerable with process mining techniques.

Data Preparation and Stakeholder Communication

  • Confirming upfront that the data needed for the defined questions is available avoids repeated data cleanup later.
  • Setting boundaries at the project's outset helps manage stakeholder expectations regarding what can be achieved within the current project scope.

Understanding Behavioral Analysis in Web Analytics

The Nature of Questions in Web Analytics

  • The questions posed in this kind of analysis differ from traditional web analytics metrics: they focus on behavioral insights rather than static percentages.
  • Analysts seek to confirm paths within larger datasets and explore overlaps and differences across devices and filters.
  • Emphasis is placed on understanding the differences between user paths rather than merely quantifying overlaps.
  • Defining the right questions is crucial; clients often default to static percentage inquiries, which may not capture dynamic behaviors.
  • Analysts must guide clients towards more insightful behavioral dimensions relevant for deeper analysis.

Goals and Methodology of Analysis

  • The overarching goal of the analysis is to gain insights into user paths for future optimization efforts.
  • A comprehensive understanding of interactions beyond mere percentages is essential for effective analysis.
  • Process mining requires relevant data; if data isn't available, it limits the ability to answer important questions effectively.
  • Identifying the correct dataset for answering specific questions can be challenging but necessary for accurate insights.
  • Creativity is often required in scoping data appropriately within process mining contexts.

Data Availability and Detail Level

  • In this project, web data was readily available, including tracked events like page views and purchases, facilitating analysis.
  • The focus was not on individual product pages but rather on general product page engagement across various templates.
  • Python was utilized to group similar pages together while also differentiating checkout steps based on their order of access.
  • Existing URL structures aided in achieving a detailed level of insight without significant delays due to missing data tracking.
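
The grouping of pages by URL pattern can be sketched as follows. This is a minimal illustration, not the project's actual code: the URL patterns, the labels, and the reading of "order of access" as numbering checkout views within a session are all assumptions.

```python
import re

# Hypothetical URL patterns; a real project would agree these with stakeholders
RULES = [
    (re.compile(r"^/product/[^/]+$"), "product page"),
    (re.compile(r"^/category/"), "category page"),
    (re.compile(r"^/cart$"), "cart"),
    (re.compile(r"^/checkout"), "checkout"),
]

def label_url(path):
    """Map a URL path to a page group; unmatched paths become 'other'."""
    for pattern, label in RULES:
        if pattern.search(path):
            return label
    return "other"

def label_session(paths):
    """Label each page view and number checkout views by order of access."""
    labels, checkout_seen = [], 0
    for path in paths:
        label = label_url(path)
        if label == "checkout":
            checkout_seen += 1
            label = f"checkout step {checkout_seen}"
        labels.append(label)
    return labels

print(label_session(["/product/blue-shirt", "/cart", "/checkout", "/checkout", "/about"]))
```

Grouping all product URLs under one label keeps the process map readable, at the cost of hiding differences between individual products.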

Data Preparation and Labeling in Process Mining

The Importance of Creativity in Data Labeling

  • Effective data relabeling can vary in complexity; simple tasks like relabeling URLs are straightforward, while more complex scenarios require creative solutions.

Stakeholder Engagement for Accurate Labeling

  • Conversations with stakeholders were essential to identify default labels on pages, leading to the creation of a comprehensive spreadsheet for labeling based on URL patterns.

Core Data Requirements for Process Mining

  • For process mining, essential data includes case ID, timestamp, and activity. Events such as page views or purchases must be timestamped for accurate analysis.

User Identification and Session Tracking

  • Each event is associated with a user ID (randomized number), session ID, and timestamps. Combining these IDs allows tracking specific user sessions effectively.
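
Turning raw web events into an event log with case ID, timestamp, and activity can be sketched as follows. This is a minimal illustration; the field names and sample rows are hypothetical and do not reflect the actual Google Analytics export schema.

```python
import csv
import io

# Hypothetical raw events; real exports have many more fields
raw_events = [
    {"user_id": "1001", "session_id": "2", "timestamp": "2025-01-07T09:00:00", "event": "product page"},
    {"user_id": "1001", "session_id": "1", "timestamp": "2025-01-06T10:05:00", "event": "cart"},
    {"user_id": "1001", "session_id": "1", "timestamp": "2025-01-06T10:00:00", "event": "product page"},
    {"user_id": "1002", "session_id": "1", "timestamp": "2025-01-06T12:00:00", "event": "purchase"},
]

def to_event_log(rows):
    """Combine user ID and session ID into a session-level case ID."""
    log = [
        {
            "case_id": f'{r["user_id"]}-{r["session_id"]}',
            "timestamp": r["timestamp"],
            "activity": r["event"],
        }
        for r in rows
    ]
    # ISO 8601 timestamps sort correctly as plain strings
    return sorted(log, key=lambda e: (e["case_id"], e["timestamp"]))

def to_csv(log):
    """Serialize the event log as CSV text for import into a process mining tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["case_id", "timestamp", "activity"])
    writer.writeheader()
    writer.writerows(log)
    return buffer.getvalue()

print(to_csv(to_event_log(raw_events)))
```

Using only the user ID as case ID instead would merge all sessions of a user into one case.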

Additional Attributes for Enhanced Analysis

  • Attributes like login status and device type (mobile vs. desktop users) were included, for example to focus on logged-in users, who have a richer experience on the website.

Data Preparation Steps

Grouping Data Based on Relevance

  • The third step involves data preparation through grouping relevant pages connected to key parts of the project while ensuring clarity in analysis.

Balancing Specificity and Generalization

  • A balance was sought between specificity for detailed analysis and generalization to avoid excessive time spent on categorizing less relevant pages.

Practical Use of Existing Labels

  • Existing labels were utilized pragmatically; discussions with stakeholders helped refine groupings without overcomplicating the dataset.

Variance Analysis for Group Size Assessment

  • Variance was analyzed to determine if groups had sufficient size; small groups (e.g., five users per page) indicated that some categories might be too specific.
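
A simple group-size check can be sketched as follows. This is a minimal illustration that counts distinct users per page group and flags groups below a threshold; the function name, the threshold, and the sample data are assumptions, not the project's actual method.

```python
from collections import defaultdict

def small_groups(page_views, min_users=50):
    """page_views: iterable of (user_id, label) pairs.

    Returns the labels seen by fewer than `min_users` distinct users,
    together with their user counts.
    """
    users = defaultdict(set)
    for user_id, label in page_views:
        users[label].add(user_id)
    return {label: len(ids) for label, ids in users.items() if len(ids) < min_users}

# Made-up page views; the threshold is lowered to fit the toy data
page_views = [
    ("u1", "product page"),
    ("u2", "product page"),
    ("u3", "product page"),
    ("u1", "returns policy"),
]
print(small_groups(page_views, min_users=2))
```

Groups flagged this way are candidates for merging into a more general category.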

Finding the Right Abstraction Level in Analysis

Balancing Specificity and Generalization

  • The challenge lies in finding the right abstraction level for analysis; it must be generic enough to apply broadly but specific enough to provide meaningful insights.
  • Having specific questions helps refine categories, making it easier to determine what data is relevant and what paths to explore.

Importance of Clear Questions

  • Clear, focused questions guide the analysis process, allowing for a more structured approach rather than an open-ended exploration.
  • There’s a distinction between defining questions and finding explanations; both are crucial for understanding behavior in analysis.

Exploring Data: Initial Steps

Conducting a Sanity Check

  • The initial exploration involves checking the overall map of data to ensure its validity before delving deeper into specifics.
  • A medium zoom level allows analysts to identify unexpected loops or anomalies that may indicate issues with data integrity.

Identifying Patterns

  • Observations during this phase can reveal outdated pages still appearing in data, prompting necessary updates or corrections.
  • Exploration can lead to discovering patterns, such as frequent checkout sequences, which inform further targeted analyses.

Targeted Analysis vs. Explorative Analysis

Distinguishing Between Approaches

  • Targeted analysis starts with specific questions while explorative analysis remains open-ended, allowing for broader discovery without preconceived notions.
  • The focus on targeted analysis means less time spent on exploratory phases when clear objectives are established from the outset.

Analysis of User Behavior on Website

Understanding Page Interactions

  • The analysis begins with a visual representation showing arrows from page five to the cart, indicating user navigation paths. The thickness of lines represents the volume of users, with page two having the highest traffic leading to the cart.
  • The focus is on analyzing session ID and user ID at a session level, which captures individual visits to the website. This approach aims to identify user behaviors that lead to adding items to the cart.
  • Filtering was employed in data analysis, prioritizing overall patterns over specific sequences of pages visited. This method acknowledges overlaps between different paths users take.

Path Distribution Analysis

  • The distribution of user paths is examined specifically for sessions that resulted in an order confirmation, highlighting that not all added items lead to purchases.
  • Despite consistent path presence across various scenarios, their importance varied significantly based on device type and other factors influencing user behavior.

Comparative Analysis Techniques

  • Comparisons were made against baseline behaviors and between confirmed orders versus non-confirmed ones. This dual comparison helps understand how certain pages contribute differently depending on transaction outcomes.
  • At this stage, observations were documented extensively without final conclusions drawn yet. Filtering actions provided insights into percentages related to specific paths taken by users.
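
The comparison between confirmed and non-confirmed sessions can be sketched as follows. This is a minimal illustration with made-up sessions: for each group it computes the share of sessions that visited each page. The page names and the function are hypothetical.

```python
def page_share_by_outcome(sessions, outcome="order confirmation"):
    """Share of sessions visiting each page, split by whether `outcome` occurred.

    Returns {True: {page: share}, False: {page: share}}.
    """
    groups = {True: [], False: []}
    for pages in sessions.values():
        groups[outcome in pages].append(set(pages))
    shares = {}
    for confirmed, group in groups.items():
        # All pages seen in this group, excluding the outcome itself
        pages = set().union(*group) - {outcome}
        shares[confirmed] = {p: sum(p in s for s in group) / len(group) for p in pages}
    return shares

# Made-up sessions
sessions = {
    "s1": ["page 2", "cart", "order confirmation"],
    "s2": ["page 5", "cart"],
    "s3": ["page 2", "page 5", "cart", "order confirmation"],
    "s4": ["page 2"],
}
shares = page_share_by_outcome(sessions)
print("confirmed:", shares[True])
print("not confirmed:", shares[False])
```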

Key Findings and Insights

  • Important findings emerged regarding relationships between different navigation paths. Analyzing cases where users visited multiple pages helped clarify which combinations led most effectively towards transactions.
  • Emphasis was placed on predefined paths rather than just general page visits; understanding these pathways was crucial for effective analysis and decision-making in web analytics.

Best Practices in Data Analysis

  • Because web analytics comparisons quickly become complex, analyses are framed around specific groups (e.g., confirmed orders). This targeted approach keeps extensive data comparisons clear.

Cross-Section Analysis and Client Involvement

Understanding Cross-Sectional Analysis

  • The discussion begins with the complexity of creating a cross-section analysis that combines order confirmation status with login status and device type.
  • It highlights the challenge of limiting dimensions in analysis to avoid overwhelming clients with excessive combinations that complicate report readability.

Approaches to Data Analysis

  • Two primary approaches are identified: top-down decision-making starting from questions or finding a middle ground through visualizations that trigger further inquiries.
  • The importance of domain expertise is stressed, as it influences client involvement in the analysis process, which can provide immediate feedback on visualizations.

Project Goals and Improvement Areas

  • The project aims to identify areas for improvement on specific pages validated through A/B testing or user research methods.
  • Emphasis is placed on focusing only on significant segments that can lead to meaningful impacts rather than getting lost in overly specific data groups.

Volume as a Key Component

  • The volume of data is crucial; small segments may not yield impactful results, guiding decisions about which dimensions warrant deeper exploration.
  • Analysts are encouraged to prioritize larger segments that can drive substantial improvements instead of pursuing minor details without follow-up potential.

Transitioning Perspectives in Targeted Analysis

Steps in Targeted Analysis

  • Step six switches perspective as the targeted analyses wrap up, which is significant for effective reporting.
  • Questions arise regarding when targeted analysis is considered complete; having structured questions helps determine when to stop exploring data.

Feedback Loop and Continuous Exploration

  • There’s an acknowledgment that sharing results with clients often leads to new questions, prompting further investigation beyond initial findings.
  • Time constraints play a role; analysts allocate time for discovery but must balance this with ongoing feedback from clients during reporting phases.

Collaboration with Domain Experts

  • Collaborating closely with domain experts or clients allows for immediate responses that can inform follow-up analyses and enhance understanding of findings.

Data Structure and User Session Analysis

Analyzing User Sessions

  • A concrete example illustrates how user ID and session ID serve as key identifiers in analyzing multiple sessions per individual user.

Analysis of User Behavior in Session Paths

Understanding User Path Choices

  • The analysis investigates whether users consistently choose a single path across different sessions or if their choices vary based on activities, day of the week, or user identity.
  • The study utilizes user IDs to aggregate session data into a singular case for process mining maps, allowing for a broader perspective on user behavior beyond individual sessions.

Comparing Session-Based and User-Based Analysis

  • A visual representation (Venn diagram) illustrates differences between session-based and user-based analyses, highlighting overlaps in paths taken by users.
  • In session-based analysis, significant overlap exists among paths; for instance, 22% of sessions combine pages two and five. However, user-based analysis shows even greater overlap across multiple sessions.
  • Users do not strictly adhere to one path; they adapt their navigation based on tasks or products being sought. Over a third of users engage with all available paths.
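
The switch from a session-based to a user-based view can be sketched as follows. This is a minimal illustration with made-up events: the same data is grouped once per session and once per user, and the share of cases per combination of paths is computed, which is the kind of number a Venn diagram would show. The field names and values are hypothetical.

```python
from collections import Counter

def path_combinations(events, case_key):
    """Share of cases per combination of paths, for a given case definition."""
    paths_per_case = {}
    for e in events:
        paths_per_case.setdefault(case_key(e), set()).add(e["path"])
    combos = Counter(frozenset(p) for p in paths_per_case.values())
    total = sum(combos.values())
    return {combo: n / total for combo, n in combos.items()}

# Made-up events: which path to the cart was used in which session
events = [
    {"user_id": "u1", "session_id": "s1", "path": "page 2"},
    {"user_id": "u1", "session_id": "s2", "path": "page 5"},
    {"user_id": "u2", "session_id": "s1", "path": "page 2"},
    {"user_id": "u2", "session_id": "s1", "path": "page 5"},
    {"user_id": "u3", "session_id": "s1", "path": "page 2"},
]

per_session = path_combinations(events, lambda e: (e["user_id"], e["session_id"]))
per_user = path_combinations(events, lambda e: e["user_id"])

both = frozenset({"page 2", "page 5"})
print(f"sessions combining both paths: {per_session[both]:.0%}")
print(f"users combining both paths: {per_user[both]:.0%}")
```

In this toy data user u1 uses a different path in each session, so the overlap only becomes visible at the user level.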

Implications for Page Optimization

  • Optimizing web pages should not focus solely on individual paths since many users interact with multiple pages. Consistency across these pages is crucial for enhancing user experience.
  • The assumption that all users follow the same behavioral pattern is invalid; both different users and the same user at different times exhibit varied behaviors.

Reporting Findings to Clients

  • Discussion shifts to how findings are reported to clients—whether through reports or presentations. A combination approach is often used.
  • Initial reporting includes detailed tables and Venn diagrams focusing on percentages and overlaps rather than screenshots from software tools like Disco.

Interpretation of Data Insights

  • While initial findings describe what users are doing (e.g., path overlaps), further investigation seeks to understand why these behaviors occur—such as reasons behind combining paths or leaving specific pages.

Understanding Behavioral Insights Through Data Analysis

Transitioning from Description to Explanation

  • The discussion emphasizes the importance of moving beyond mere data description (percentages, overlaps) to understanding the underlying reasons for observed behaviors. This involves using initial assumptions as a basis for further research and validation.

Role of Domain Experts in Interpretation

  • It is highlighted that domain experts, such as psychologists or UX designers, play a crucial role in interpreting data. Their expertise helps in generating hypotheses based on psychological principles or design standards.

Hypothesis Testing and Actionable Changes

  • The conversation notes various methods to test hypotheses about user behavior, including changing webpage elements or communication strategies. If expectations are not met, it indicates potential areas for further investigation.
  • Once hypotheses are formed regarding user behavior, actionable changes can be implemented—such as redesigning pages or modifying text—to see if these adjustments lead to improved outcomes.

Importance of A/B Testing

  • A/B testing is introduced as a method to compare different scenarios (A vs. B), allowing teams to validate their hypotheses based on observed effects from both situations.

Continuous Improvement Cycle

  • The process mining projects discussed serve as an ongoing cycle of improvement where insights gained lead to hypothesis testing and subsequent refinements in strategy.

Engaging Stakeholders Post-Analysis

  • After presenting findings through management summaries, stakeholders often provide feedback and additional questions that guide further analysis. This collaborative approach ensures comprehensive exploration of the data.
  • It's essential to understand what stakeholders need from the report to implement follow-up actions effectively within their organizations.

Facilitating Organizational Buy-In

  • Recommendations include creating slide decks summarizing key findings and contextual information about process mining. This aids stakeholders in communicating results with colleagues and gaining support for necessary changes.

Identifying Opportunities for Further Research

  • The team encourages stakeholders to explore opportunities presented by findings, suggesting they consider how these insights can enhance organizational processes or website functionality.
  • Emphasis is placed on identifying specific areas requiring more research—like user experience studies—to clarify why certain features may not be utilized effectively by users.

Understanding Process Mining in Customer Journey Analysis

The Role of Process Mining in Clickstream Analysis

  • Process mining is positioned within the context of clickstream and customer journey analysis, enhancing web analytics by providing insights into user behavior throughout the funnel.
  • It allows for a deeper understanding of customer actions as they navigate through various pages, addressing both abstract questions and specific behavioral aspects.

Insights from Behavioral Dynamics

  • Process mining enables quick segmentation analysis, such as comparing confirmed versus non-confirmed orders to understand behaviors relative to the total population.
  • The challenge lies in generating hypotheses about observed behaviors, which must be validated to inform actionable recommendations for improvement.

Recommendations for Analysts

  • Analysts should consider using process mining when exploring uncharted areas of user behavior that predefined funnels may overlook.
  • This technique can reveal missed opportunities and alternative paths on websites that are not captured by traditional funnel analysis.

Identifying Hidden User Paths

  • For e-commerce or complex websites, process mining can uncover parts of the user journey that analysts have not previously considered, such as dead ends where users drop off.
  • Understanding these hidden paths is crucial for optimizing website performance and improving user experience.

Conclusion and Future Discussions

  • The session concludes with appreciation for insights shared on process mining applications. Future discussions will focus on comparing processes in different contexts.

Video description

Clickstreams are the digital traces that visitors leave when they navigate through a website. This data can be analyzed with process mining to make the website more effective or user-friendly. Irene shares her experience of a recent clickstream analysis project. We discuss why the funnels of traditional web analytics are not enough to understand behavior.

For a more detailed summary, and for the links to the pointers mentioned during the show, refer to: https://fluxicon.com/blog/2025/03/process-mining-cafe-39-recording/

Chapters:
0:00 Intro
2:03 Traditional web analytics
17:16 Why funnels are not enough to understand behavior
20:27 Process mining example
1:14:43 Closing