Lysenkov S.N. - Science as a Craft - Lecture 11. Statistics in a Scientific Publication

Application of Statistics in Scientific Research

Importance of Statistics in Research

  • The necessity of statistics in scientific research is often taken for granted, yet it has become integral due to historical developments, particularly in biomedical sciences during the latter half of the 20th century.
  • Statistical methods initially met resistance among biologists; the work of pioneers such as Ronald Fisher and Karl Pearson was accepted only gradually, and statistics eventually came to be regarded as essential for data analysis.

Reasons for Utilizing Statistics

  • One common reason researchers cite for using statistics is managing large datasets that exceed human capacity for visual analysis.
  • Another perspective, though less common today, suggests that statistics are primarily needed for quantitative data; however, this view is outdated as statistical methods apply broadly across various types of data.

Understanding Samples and Populations

  • Statistics are crucial when dealing with incomplete data sets; researchers often work with samples rather than entire populations.
  • A typical example involves medical studies comparing patients with a disease against healthy control groups without encompassing all individuals globally affected by the condition.

General Population vs. Sample Representation

  • Statistical science allows conclusions about a general population based on smaller samples, which must be representative to ensure valid insights.
  • Representativeness means that the sample accurately reflects the characteristics of the broader population being studied.

Challenges in Achieving Representativeness

  • For a sample to be representative, every member of the general population should ideally have an equal chance of being included; in practice this is rarely achievable.

Understanding Statistical Representation and Bias

The Complexity of Non-Communicable Diseases

  • Uses the risk of developing non-communicable diseases as an example: many factors can influence such outcomes, so a sample should not simply be assumed representative just because there is no evidence to the contrary.

Population Differences and Representativeness

  • Highlights that differences observed in one population may not apply to other cohabiting populations, stressing the need for careful consideration of biological contexts.

Limitations of Statistical Conclusions

  • Points out that conclusions drawn from a specific sample may not be applicable to broader populations, indicating potential misinterpretation if generalizations are made too broadly.

Prescriptive vs. Descriptive Statistics

  • Explains the distinction between prescriptive (idealized) statistics and real-world applications, noting that many statistical methods assume ideal conditions which rarely occur in practice.

Challenges with Data Distribution Assumptions

  • Discusses how statistical methods often rely on continuous distributions like normal distribution, which may not accurately reflect real-world data scenarios.

The Practical Implications of Statistical Methods

Realities of Data Collection and Errors

  • Emphasizes that while statistical theories are mathematically sound, practical application often reveals discrepancies due to imperfect data collection processes.

Understanding Sampling Bias

  • Addresses how sampling bias can lead to skewed results; recognizing this bias is crucial for accurate parameter estimation in research studies.

The Impact of Input Errors on Data Integrity

  • Notes that input errors can significantly affect data quality; understanding these errors is essential for interpreting results correctly.

Navigating Traditional Practices in Statistics

Common Misconceptions About Data Accuracy

  • Shares an anecdote illustrating how student-collected data can lead to erroneous conclusions due to filtering mistakes during data entry processes.

Random vs. Systematic Errors in Large Datasets

  • Clarifies that most input errors tend to be random rather than systematic; thus, they might cancel each other out when analyzing large datasets.

Tradition Over Innovation in Statistical Criteria

  • Discusses how certain fields have established traditions regarding acceptable statistical criteria; adherence to these norms can sometimes overshadow more suitable alternatives.

Understanding Statistical Analysis

The Dangers of Data Manipulation

  • Discusses the attempts to achieve desired results through data manipulation, highlighting that this is a form of self-deception. It emphasizes the issue of selecting statistical criteria that yield more favorable outcomes rather than those that are most appropriate.
  • Connects the problem of data manipulation with publication bias, noting that while related, it is not limited to this issue. The speaker stresses the importance of recognizing these manipulative practices in published research.

Historical Context and Critique

  • Introduces a historical anecdote about Ronald Fisher, known as the father of biological statistics, and his development of analysis methods. Contrasts Fisher's work with Andrey Kolmogorov's skepticism regarding its practical applicability.
  • Kolmogorov considered Fisher's ANOVA impractical because of its strict data requirements, although the method has since proved robust in real-world applications.

Foundations of Statistical Understanding

  • Clarifies that the lecture does not aim to teach statistics from scratch but assumes some prior knowledge among participants. It encourages recalling previous experiences with scientific practice and statistics.
  • Emphasizes understanding how to interpret data based on variable types—quantitative versus qualitative—and highlights the necessity for statistical tools across both categories.

Types of Variables

Quantitative Variables

  • Defines quantitative variables as those where meaningful differences can be measured between values (e.g., distances). This allows for comparisons such as "twice as much" between measurements.
  • Notes that within quantitative variables, various measures like mean, median, and mode can be applied due to their numerical nature.

Qualitative Variables

  • Describes qualitative variables as those where only categorical distinctions matter without measurable differences (e.g., color morphologies or disease types).

Ordinal Variables

  • Introduces ordinal variables which allow for ranking but do not quantify differences (e.g., severity levels in medical conditions). These can be analyzed using median but not mean due to lack of measurable intervals.
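
A minimal sketch of which summaries fit which variable type, using Python's standard statistics module on made-up values:

```python
from statistics import mean, median, mode

# Quantitative: distances in metres (hypothetical values) - mean and median are both meaningful
distances = [1.2, 3.4, 2.2, 2.9, 10.5]
print(mean(distances), median(distances))

# Ordinal: disease severity coded 1 < 2 < 3 (hypothetical) - median and mode are safe,
# while the mean would implicitly assume equal intervals between the levels
severity = [1, 2, 2, 3, 3, 3]
print(median(severity), mode(severity))

# Qualitative: colour morphs - only counts and the mode are meaningful
morphs = ["red", "red", "white", "yellow"]
print(mode(morphs))
```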

Common Misconceptions in Statistics

Understanding Statistical Data Types and Analysis

Central Tendency and Data Types

  • The median can be the same across different distributions, but the mean may vary. It's crucial to understand that calculating the mean for ordinal data assumes equal intervals between values.
  • Binary features (yes/no) are often confused with qualitative features, which can have more than two categories (e.g., hair color). This distinction is important in statistical analysis.

Qualitative vs. Quantitative Features

  • In biological studies, community samples can include various species, making them qualitative traits that aren't limited to binary classifications.
  • Binary traits can be treated as any of the three types: as qualitative (analyzing the proportions of presence/absence), as ordinal, or as quantitative coded 0/1; for binary data the median coincides with the mode.

Evaluating Research Design

  • Before starting research or reviewing others' work, assess how authors define and interpret each feature—whether it's quantitative or ordinal.
  • It is generally acceptable to treat quantitative data as if it were ordinal, though this discards information; treating ordinal data as quantitative is more problematic because it assumes equal intervals between levels.

Common Statistical Tasks

  • When analyzing one or two features, tasks typically involve comparing groups or identifying relationships between variables.
  • Understanding whether you want to compare groups or find variable associations is essential for selecting appropriate statistical methods.

Group Comparison and Method Selection

  • If comparing groups, determine how many groups there are—two or more—as this influences method applicability.
  • Methods designed for comparing many groups can also be applied to just two groups; the reverse is not true, so two-group methods cannot simply be reused when more groups are involved.

Parametric Methods Considerations

  • Assess if parametric methods are suitable based on assumptions about normal distribution and variance equality among groups.
  • Parametric tests require specific conditions regarding distribution characteristics; violations of these assumptions could lead to inaccurate results.

Robustness of Parametric Tests

  • Parametric assumptions are rarely satisfied exactly in practice, but with larger samples the tests are usually robust enough for reliable analysis.
  • As sample size increases (e.g., over 50), robustness improves against assumption violations; however, unequal group sizes can complicate results.

Transforming Data for Analysis

  • Sometimes the data must be transformed (e.g., log-transformed) to meet the assumptions of parametric tests, rather than merely checking normality by eye (see the sketch below).
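
A sketch of the check-then-transform workflow on simulated right-skewed data (the distribution and sample size are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=60)   # right-skewed sample

# Shapiro-Wilk normality test on raw and on log-transformed values
w_raw, p_raw = stats.shapiro(x)          # typically p << 0.05: normality rejected
w_log, p_log = stats.shapiro(np.log(x))  # typically p > 0.05 after the transform
print(f"raw p = {p_raw:.4f}, log-transformed p = {p_log:.4f}")
```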

Choosing Appropriate Statistical Criteria

Statistical Testing and Analysis

Application of Statistical Criteria

  • In practice, the Mann-Whitney test usually gives results consistent with Student's t-test; a discrepancy between the two warrants further investigation (the sketch below runs both tests on the same data).
  • A lack of power in the Mann-Whitney test may explain some discrepancies; if Student's t-test indicates a difference while the Mann-Whitney test does not, this points to peculiarities in the underlying distributions.
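
A sketch of running both tests on the same two samples and comparing their p-values (the data are simulated and purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(loc=10.0, scale=2.0, size=30)
b = rng.normal(loc=11.5, scale=2.0, size=30)

t_p = stats.ttest_ind(a, b).pvalue                               # Student's t-test
u_p = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue   # Mann-Whitney U test

print(f"t-test p = {t_p:.4f}, Mann-Whitney p = {u_p:.4f}")
# Agreement is the usual case; a clear discrepancy is a signal to inspect the distributions.
```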

Analyzing Multiple Groups

  • When conducting tests across multiple groups, it's crucial to determine whether the null hypothesis has been rejected or not. If rejected, identifying which groups account for significant differences is essential.
  • Post-hoc tests like Tukey's HSD (parametric) and Dunn's test (non-parametric) are useful for understanding group differences after initial analysis.
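
A sketch of the multi-group workflow: a one-way ANOVA as the omnibus test, followed by Tukey's HSD as the post-hoc comparison (Dunn's test is available in the third-party scikit-posthocs package). The groups and values are made up:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(2)
g1 = rng.normal(10, 2, 25)
g2 = rng.normal(10, 2, 25)
g3 = rng.normal(13, 2, 25)

# Omnibus test: is at least one group mean different?
print(stats.f_oneway(g1, g2, g3))

# Post-hoc test: which pairs of groups account for the difference?
values = np.concatenate([g1, g2, g3])
labels = ["g1"] * 25 + ["g2"] * 25 + ["g3"] * 25
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```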

Exploring Relationships Between Variables

  • Identifying relationships between variables depends on their types; qualitative versus quantitative distinctions shape analysis strategies.
  • If one variable is qualitative and another quantitative or ordinal, this resembles group comparison tasks—assessing outcomes based on experimental group membership.

Understanding Statistical Associations

  • A statistical association means that the distribution of one variable changes with the value of the other; for a quantitative variable compared across groups, this shows up as differing distributions between the groups.
  • For two qualitative variables, associations can be analyzed using chi-square tests or Fisher’s exact test to explore relationships such as eye color and hair color correlations.
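
A sketch for two qualitative variables, using a hypothetical 2x2 table of hair colour versus eye colour counts:

```python
from scipy import stats

# Rows: hair colour (dark, fair); columns: eye colour (brown, blue) - hypothetical counts
table = [[40, 10],
         [15, 35]]

chi2, p, dof, expected = stats.chi2_contingency(table)
print("chi-square p =", p)

# Fisher's exact test is preferred when expected counts are small (2x2 tables)
odds_ratio, p_exact = stats.fisher_exact(table)
print("Fisher's exact p =", p_exact)
```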

Correlation Methods

  • When both variables are quantitative or ordinal, correlation coefficients are used. Pearson correlation traditionally applies to quantitative variables and can also be computed for binary data.
  • Spearman correlation is suitable when at least one variable is ordinal and, unlike Pearson's method, does not assume normally distributed data.
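
A sketch contrasting the two coefficients on a monotonic but non-linear relationship (the data are artificial):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 5, 50)
y = np.exp(x)                           # monotonic but strongly non-linear

r_pearson, p1 = stats.pearsonr(x, y)    # measures linear association
rho, p2 = stats.spearmanr(x, y)         # rank-based, also valid for ordinal data

print(f"Pearson r = {r_pearson:.2f}, Spearman rho = {rho:.2f}")
# Spearman's rho is exactly 1 for any monotonic relationship,
# while Pearson's r is lower here because the relationship is not linear.
```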

Significance Testing Concepts

  • Understanding statistical significance involves grasping p-values. Despite movements advocating against p-value reliance, they remain prevalent due to common misunderstandings surrounding them.
  • Statistical significance is the probability of obtaining the observed result, or a more extreme one, assuming the null hypothesis is true.

Practical Example: Binomial Test

  • A binomial test example illustrates how we assess probabilities—like guessing colors of chips—to establish a null hypothesis (50% chance).
  • The critical value approach helps determine when deviations from expected probabilities warrant rejecting the null hypothesis in favor of an alternative hypothesis.
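
A sketch of the chip-guessing example with scipy's binomial test (scipy >= 1.7; the counts are illustrative): under the null hypothesis each guess is correct with probability 0.5, and the p-value is the probability of an outcome at least as extreme as the one observed.

```python
from scipy import stats

n_trials = 100    # hypothetical number of guesses
n_correct = 60    # hypothetical number of correct guesses

res = stats.binomtest(n_correct, n_trials, p=0.5, alternative="two-sided")
print(res.pvalue)   # ~0.057: just above the conventional 5% level

# An outcome exactly at the expected value gives p = 1
print(stats.binomtest(50, 100, p=0.5).pvalue)   # 1.0
```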

Understanding P-Values and Statistical Significance

The Traditional Critical Level of 5%

  • Traditionally, the critical level for statistical significance is set at 5%. In the chip-guessing example, this corresponds to critical boundaries of roughly 40% and 60% correct guesses.

Symmetry Around the Mean

  • The p-value is the same for 40% and for 60% correct guesses: the two outcomes are symmetric around the expected value of about 50%.

Historical Context of Statistical Testing

  • In earlier times, without computers or calculators, statistical tests were compared against tables. Currently, software provides precise p-values for results but comparisons with critical significance levels remain common.

Interpretation of P-Values

  • A p-value can theoretically reach 100%, indicating perfect agreement with the null hypothesis: if the outcome lands exactly on the expected value (for instance, exactly 50 correct guesses out of 100), the p-value equals one.

Misconceptions About P = 1

  • Some believe that a p-value of one is unattainable; however, it can occur under ideal conditions. Nonetheless, such perfect agreement raises skepticism about its likelihood in real-world scenarios.

Reporting P-Values Effectively

  • It’s crucial not to simply report whether p-values are below or above five percent without context. This practice was more relevant when using tables rather than modern computational methods that yield accurate estimates.

Understanding Variability in Results

  • Reporting just "p < .05" does not convey how close the result is to the threshold (e.g., p = .049 vs. p = .051); the two results carry essentially the same evidential weight even though only one crosses the formal cut-off.

Fisher's Proposal on Significance Levels

  • Ronald Fisher advocated for a more nuanced approach to significance levels beyond just five percent. Other thresholds like one percent or even lower are recognized in various scientific fields as more stringent criteria.

Marginal Significance Levels

  • If a result falls between five and ten percent for its p-value, it indicates marginal significance—suggesting some evidence against the null hypothesis but less robust than stronger evidence at lower percentages.

Terminology: Significant vs. Reliable

  • In Russian terminology the words "significant" (значимый) and "reliable" (достоверный) are often used interchangeably; English has only "statistically significant", which avoids this ambiguity.

Implications of Statistical Significance

  • The term “statistically significant” implies that observed deviations from the null hypothesis are substantial enough based on our sample data but does not guarantee practical relevance or importance.

Caution Against Overconfidence

  • Using “reliable” might instill unwarranted confidence compared to “significant.” Readers may misinterpret statistical significance as definitive proof of an effect when it merely indicates low probability under the null hypothesis.

Distinction Between Statistical and Clinical Significance

  • Statistically significant results do not equate to meaningful differences in practical terms; thus researchers should also report effect sizes alongside p-values for clarity on their findings' implications.

Importance of Effect Size Reporting

  • Researchers should provide additional metrics such as effect size and confidence intervals along with p-values so that others can assess how substantial their findings truly are beyond mere statistical thresholds.
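
A sketch of reporting an effect size (Cohen's d) and a confidence interval for the mean difference alongside the p-value; the data are simulated and the calculation is the standard pooled-variance formula:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
treated = rng.normal(10.6, 2.0, 40)
control = rng.normal(10.0, 2.0, 40)

t_res = stats.ttest_ind(treated, control)

# Cohen's d: difference in means expressed in units of the pooled standard deviation
n1, n2 = len(treated), len(control)
pooled_var = ((n1 - 1) * treated.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
pooled_sd = np.sqrt(pooled_var)
d = (treated.mean() - control.mean()) / pooled_sd

# 95% confidence interval for the difference in means (equal-variance t interval)
diff = treated.mean() - control.mean()
se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"p = {t_res.pvalue:.3f}, Cohen's d = {d:.2f}, 95% CI for the difference = ({ci[0]:.2f}, {ci[1]:.2f})")
```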

Clinical Relevance Versus Statistical Findings

  • In medicine, distinguishing between clinical significance (real-world impact on health outcomes) and statistical significance (a mathematical property of the sample and its size) remains essential for sound healthcare decisions.

Understanding Statistical Significance and Multiple Comparisons

The Importance of Effect Size

  • This distinction also matters for meta-analyses, which work with effect sizes and their confidence intervals; reporting only p-values without effect sizes makes such synthesis impossible and should be avoided.

P-Values and Type I Error

  • The use of p-values rests on the idea that strong deviations from the null hypothesis are unlikely when the null is true; rejecting a true null hypothesis is called a Type I error, i.e., detecting an effect that does not exist.

Misinterpretation of Rare Events

  • When we observe an event that would be unlikely under normal conditions, we conclude that conditions are probably not normal; statistical testing follows the same logic.

Challenges with Multiple Testing

  • A fundamental issue arises from repeated testing: even a rare event becomes likely to occur at least once if a test is applied many times, which increases the chance of declaring a difference significant purely by chance.

Probability of False Discoveries

  • As more statistical tests are conducted (e.g., testing many factors), the likelihood that at least one Type I error occurs rises quickly. Testing 20 factors that in truth have no effect gives roughly a 64% chance that at least one will come out falsely significant (about 40% for 10 factors), as the short calculation below shows.
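
The arithmetic behind this claim, assuming independent tests each run at the 5% level:

```python
alpha = 0.05
for n_tests in (10, 20):
    p_any_false_positive = 1 - (1 - alpha) ** n_tests
    print(n_tests, round(p_any_false_positive, 2))   # 10 -> 0.4, 20 -> 0.64
```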

Reporting Issues in Research

  • Researchers often do not disclose when multiple hypotheses were tested but only report those that yielded significant results. This lack of transparency obscures issues related to multiple comparisons.

Importance of Reporting Non-Significant Results

  • It’s crucial to report findings where no influence was detected because failing to find an effect differs fundamentally from not having tested for it at all.

Group Comparisons and Analysis Techniques

  • Problems with multiple comparisons also arise when comparing several groups using methods designed for two-sample comparisons (like Student's t-test). Special tests must be used for accurate analysis across multiple groups.

Adjustments for Multiple Comparisons

  • Various adjustments exist for handling multiple comparisons; Bonferroni correction divides critical levels by the number of comparisons but can be overly stringent. More flexible methods like False Discovery Rate (FDR) adjustments have gained popularity.
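
A sketch of applying both corrections to the same set of p-values (the raw p-values are made up):

```python
from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]   # hypothetical raw p-values

for method in ("bonferroni", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, reject, p_adj.round(3))
# Bonferroni is the more conservative option: here it rejects one hypothesis fewer
# than the Benjamini-Hochberg false discovery rate procedure.
```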

Advancements in Statistical Methods

  • Modern statistics has evolved beyond early 20th-century methods, incorporating multifactorial approaches allowing simultaneous evaluation of several variables' impacts on outcomes through generalized linear models.

Generalized Linear Models Explained

Analysis of Group Effects on Variable Y

Understanding the Relationship Between Groups and Variables

  • The analysis begins by discussing how points are distributed, particularly focusing on the differences between groups A and B regarding variable Y. It is noted that group B shows a slightly higher value for Y compared to group A.
  • When examining the effect of a quantitative variable X on Y without considering group differences, it appears that as X increases, Y also increases. This observation is based solely on the overall distribution of data points.
  • However, when both the effect of X and group membership (A or B) are considered simultaneously, it becomes clear that while belonging to group B raises the value of Y, within each group, an increase in X still leads to an increase in Y.
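
A sketch of the situation described above, fitted as an ordinary linear model in statsmodels (the data are simulated, with a made-up upward shift for group B and a positive effect of X):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 60
x = rng.uniform(0, 10, n)
group = rng.choice(["A", "B"], n)
# Hypothetical true structure: Y rises with X, and membership in group B shifts Y upward
y = 2.0 + 0.5 * x + 1.5 * (group == "B") + rng.normal(0, 1, n)

df = pd.DataFrame({"y": y, "x": x, "group": group})

# Both effects are estimated jointly, so the group shift and the slope on X are separated
m = smf.ols("y ~ x + C(group)", data=df).fit()
print(m.params)

# An interaction term lets the slope on X differ between the groups
print(smf.ols("y ~ x * C(group)", data=df).fit().params)
```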

Causal Relationships vs. Correlation

  • The discussion highlights methods like Generalized Linear Models (GLM), which can account for various distributions and non-linear relationships. These models are increasingly used to incorporate random effects into analyses.
  • Multi-factorial models allow researchers to consider multiple factors simultaneously and their interactions. This means that the relationship between one factor may vary depending on another factor's level.

Importance of Factor Selection in Analysis

  • The interaction between factors can significantly influence outcomes; for instance, there may be a dependency between variables in one group but not in another. Recognizing these interactions is crucial for accurate modeling.
  • Multi-factorial models help identify significant variables while acknowledging potential limitations such as overfitting or misinterpretation due to insignificant factors being included or excluded from analysis.

Challenges with Data Quality

  • The selection process for significant factors must be approached carefully since including irrelevant factors can distort results. Initial filtering is essential before applying complex models.
  • There’s a risk that multi-factorial models might create an illusion of comprehensiveness where all influencing factors seem accounted for; however, this does not guarantee predictive accuracy across different datasets.

Data Entry Errors and Their Impact

  • Input errors can severely affect results; random errors generally increase residual variance but do not fundamentally alter findings unless they lead to systematic biases.
  • Systematic errors pose greater risks as they can produce misleading estimates—common issues include confusing zeros representing missing values versus actual zero measurements.

Historical Context of Data Misinterpretation

  • An example from historical research illustrates how mislabeling data (e.g., using dashes instead of zeros for missing vaccination data in studies from 1860) led to skewed interpretations about vaccine efficacy.
  • Distinguishing between true absence (missing data represented by zeros versus actual zero values indicating no occurrence of a trait) is critical for accurate prevalence assessments within populations.
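
A small illustration of why coding missing values as zeros biases estimates (the numbers are hypothetical; in pandas, genuinely missing entries should become NaN):

```python
import pandas as pd

# Hypothetical recorded counts; '-' in the source meant "not recorded", not "zero"
raw = ["12", "7", "-", "0", "9", "-"]

as_zero = pd.Series([0.0 if v == "-" else float(v) for v in raw])
as_missing = pd.to_numeric(pd.Series(raw), errors="coerce")   # '-' becomes NaN

print(as_zero.mean())      # ~4.67: dragged down by the fake zeros
print(as_missing.mean())   # 7.0: the mean over the values that were actually recorded
```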

Sample Size Considerations in Research

Understanding Sample Size Requirements

  • The discussion begins with the common issue researchers face regarding the necessary sample size for studies, highlighting three potential answers to this question.
  • One approach is based on statistical norms within various fields, where traditional sample sizes are established. This does not guarantee validity but allows for comparability across studies.
  • A second perspective emphasizes that larger sample sizes are generally better; however, excessively large samples can lead to statistically significant results that may be trivial or irrelevant.

Importance of Effect Size

  • Researchers must consider effect size alongside statistical significance to avoid misleading conclusions. For instance, a correlation coefficient of 0.01 indicates minimal explanatory power and should not warrant extensive discussion.
  • The third method involves power analysis, which determines the required sample size needed to detect an effect of a specified size with a certain probability.

Power Analysis Explained

  • Power analysis provides precise calculations for determining necessary sample sizes based on expected effect sizes and desired detection probabilities. In clinical research, justifying sample size based on clinically significant effects is standard practice.
  • While there are no strict thresholds (like 80% power), commonly accepted values range from 80% to 90%, emphasizing the importance of understanding what constitutes sufficient power.
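
A sketch of a power calculation for a two-sample t-test with statsmodels; the effect size and power target below are illustrative choices, not recommendations:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect a medium effect (Cohen's d = 0.5)
# with 80% power at the 5% significance level
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))   # ~64 per group

# Conversely: the power achieved with 30 per group for the same effect
print(analysis.power(effect_size=0.5, nobs1=30, alpha=0.05))   # ~0.48
```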

Visualizing Statistical Power

  • A visual representation illustrates test power as the likelihood of obtaining statistically significant results when the alternative hypothesis is true. Different alternative hypotheses yield varying probabilities of falling into critical regions.
  • It’s noted that stronger effects increase detection probability; thus, larger samples enhance power under consistent conditions.

Sample Size Impact on Power

  • As sample sizes grow, the incremental gain in power diminishes—illustrated through examples using Student's t-test comparing means with different standard deviations.
  • Increasing sample size beyond certain points may not justify costs or effort since it yields diminishing returns in terms of increased statistical power.

Practical Implications for Unequal Samples

  • When dealing with unequal samples, overall study power is primarily influenced by the smaller group. Significant increases in one group's size do not substantially affect total study power if another remains small.
  • An example shows how increasing one group from 25 to 500 only marginally improves overall study power from 8% to 16%, questioning whether such large samples are necessary.
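
A sketch of how little total power grows when only one group gets larger, using the ratio argument of statsmodels (nobs2 = ratio * nobs1); the effect size here is an arbitrary small value, so the exact percentages differ from those quoted above:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
d = 0.2          # hypothetical small effect size
n_small = 25     # the fixed, smaller group

for n_large in (25, 100, 500):
    power = analysis.power(effect_size=d, nobs1=n_small, alpha=0.05, ratio=n_large / n_small)
    print(f"{n_small} vs {n_large}: power = {power:.2f}")
# The smaller group caps the achievable power, no matter how large the other group becomes.
```
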
Video description

00:00:19 Introductory remarks
00:01:11 Why statistics is needed in scientific research
00:24:58 Types of data
00:34:42 Types of statistical problems
00:49:28 What statistical significance is
00:55:40 How to report the results of statistical tests
01:12:39 Multifactorial models
01:25:06 The sample size needed for a study

Playlist link: https://www.youtube.com/playlist?list=PLcsjsqLLSfNDgbFuNRYbDs8Pn-Ba47K49