Lysenkov S.N. - Science as a Craft - Lecture 11. Statistics in a Scientific Publication

Application of Statistics in Scientific Research

Importance of Statistics in Research

  • The necessity of statistics in scientific research is often taken for granted, yet it has become integral due to historical developments, particularly in biomedical sciences during the latter half of the 20th century.
  • Statistical methods initially met resistance among biologists; the work of pioneers such as Ronald Fisher and Karl Pearson was accepted only gradually, and statistics eventually came to be regarded as essential for data analysis.

Reasons for Utilizing Statistics

  • One common reason researchers cite for using statistics is managing large datasets that exceed human capacity for visual analysis.
  • Another perspective, though less common today, suggests that statistics are primarily needed for quantitative data; however, this view is outdated as statistical methods apply broadly across various types of data.

Understanding Samples and Populations

  • Statistics are crucial when dealing with incomplete data sets; researchers often work with samples rather than entire populations.
  • A typical example involves medical studies comparing patients with a disease against healthy control groups without encompassing all individuals globally affected by the condition.

General Population vs. Sample Representation

  • Statistical science allows conclusions about a general population based on smaller samples, which must be representative to ensure valid insights.
  • Representativeness means that the sample accurately reflects the characteristics of the broader population being studied.

Challenges in Achieving Representativeness

  • For a sample to be representative, every member of the general population should ideally have an equal chance of being included; in practice this is rarely achievable.

Understanding Statistical Representation and Bias

The Complexity of Non-Communicable Diseases

  • Uses the risk of developing non-communicable diseases as an example: many factors can influence such outcomes, so a sample should not simply be assumed representative just because there is no evidence to the contrary.

Population Differences and Representativeness

  • Highlights that differences observed in one population may not apply to other cohabiting populations, stressing the need for careful consideration of biological contexts.

Limitations of Statistical Conclusions

  • Points out that conclusions drawn from a specific sample may not be applicable to broader populations, indicating potential misinterpretation if generalizations are made too broadly.

Prescriptive vs. Descriptive Statistics

  • Explains the distinction between prescriptive (idealized) statistics and real-world applications, noting that many statistical methods assume ideal conditions which rarely occur in practice.

Challenges with Data Distribution Assumptions

  • Discusses how statistical methods often rely on continuous distributions like normal distribution, which may not accurately reflect real-world data scenarios.

The Practical Implications of Statistical Methods

Realities of Data Collection and Errors

  • Emphasizes that while statistical theories are mathematically sound, practical application often reveals discrepancies due to imperfect data collection processes.

Understanding Sampling Bias

  • Addresses how sampling bias can lead to skewed results; recognizing this bias is crucial for accurate parameter estimation in research studies.

The Impact of Input Errors on Data Integrity

  • Notes that input errors can significantly affect data quality; understanding these errors is essential for interpreting results correctly.

Navigating Traditional Practices in Statistics

Common Misconceptions About Data Accuracy

  • Shares an anecdote illustrating how student-collected data can lead to erroneous conclusions due to filtering mistakes during data entry processes.

Random vs. Systematic Errors in Large Datasets

  • Clarifies that most input errors tend to be random rather than systematic; thus, they might cancel each other out when analyzing large datasets.

Tradition Over Innovation in Statistical Criteria

  • Discusses how certain fields have established traditions regarding acceptable statistical criteria; adherence to these norms can sometimes overshadow more suitable alternatives.

Understanding Statistical Analysis

The Dangers of Data Manipulation

  • Discusses the attempts to achieve desired results through data manipulation, highlighting that this is a form of self-deception. It emphasizes the issue of selecting statistical criteria that yield more favorable outcomes rather than those that are most appropriate.
  • Connects the problem of data manipulation with publication bias, noting that while related, it is not limited to this issue. The speaker stresses the importance of recognizing these manipulative practices in published research.

Historical Context and Critique

  • Introduces a historical anecdote about Ronald Fisher, known as the father of biological statistics, and his development of analysis methods. Contrasts Fisher's work with Andrey Kolmogorov's skepticism regarding its practical applicability.
  • Kolmogorov considered Fisher's ANOVA impractical because of its strict data requirements, although the method has since proved robust in real-world applications.

Foundations of Statistical Understanding

  • Clarifies that the lecture does not aim to teach statistics from scratch but assumes some prior knowledge among participants. It encourages recalling previous experiences with scientific practice and statistics.
  • Emphasizes understanding how to interpret data based on variable types—quantitative versus qualitative—and highlights the necessity for statistical tools across both categories.

Types of Variables

Quantitative Variables

  • Defines quantitative variables as those where meaningful differences can be measured between values (e.g., distances). This allows for comparisons such as "twice as much" between measurements.
  • Notes that within quantitative variables, various measures like mean, median, and mode can be applied due to their numerical nature.

Qualitative Variables

  • Describes qualitative variables as those where only categorical distinctions matter without measurable differences (e.g., color morphologies or disease types).

Ordinal Variables

  • Introduces ordinal variables which allow for ranking but do not quantify differences (e.g., severity levels in medical conditions). These can be analyzed using median but not mean due to lack of measurable intervals.
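
A minimal sketch of which summaries fit which variable type, using Python's standard statistics module on made-up values:

```python
from statistics import mean, median, mode

# Quantitative: distances in metres (hypothetical values) - mean and median are both meaningful
distances = [1.2, 3.4, 2.2, 2.9, 10.5]
print(mean(distances), median(distances))

# Ordinal: disease severity coded 1 < 2 < 3 (hypothetical) - median and mode are safe,
# while the mean would implicitly assume equal intervals between the levels
severity = [1, 2, 2, 3, 3, 3]
print(median(severity), mode(severity))

# Qualitative: colour morphs - only counts and the mode are meaningful
morphs = ["red", "red", "white", "yellow"]
print(mode(morphs))
```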

Common Misconceptions in Statistics

Understanding Statistical Data Types and Analysis

Central Tendency and Data Types

  • The median can be the same across different distributions, but the mean may vary. It's crucial to understand that calculating the mean for ordinal data assumes equal intervals between values.
  • Binary features (yes/no) are often confused with qualitative features, which can have more than two categories (e.g., hair color). This distinction is important in statistical analysis.

Qualitative vs. Quantitative Features

  • In biological studies, community samples can include various species, making them qualitative traits that aren't limited to binary classifications.
  • Binary traits can be treated as any of the three types: as qualitative (analyzing the proportions of presence/absence), as ordinal, or as quantitative coded 0/1; for binary data the median coincides with the mode.

Evaluating Research Design

  • Before starting research or reviewing others' work, assess how authors define and interpret each feature—whether it's quantitative or ordinal.
  • It is generally acceptable to treat quantitative data as if it were ordinal, though this discards information; treating ordinal data as quantitative is more problematic because it assumes equal intervals between levels.

Common Statistical Tasks

  • When analyzing one or two features, tasks typically involve comparing groups or identifying relationships between variables.
  • Understanding whether you want to compare groups or find variable associations is essential for selecting appropriate statistical methods.

Group Comparison and Method Selection

  • If comparing groups, determine how many groups there are—two or more—as this influences method applicability.
  • Methods designed for comparing many groups can also be applied to just two groups; the reverse is not true, so two-group methods cannot simply be reused when more groups are involved.

Parametric Methods Considerations

  • Assess if parametric methods are suitable based on assumptions about normal distribution and variance equality among groups.
  • Parametric tests require specific conditions regarding distribution characteristics; violations of these assumptions could lead to inaccurate results.

Robustness of Parametric Tests

  • Parametric assumptions are rarely satisfied exactly in practice, but with larger samples the tests are usually robust enough for reliable analysis.
  • As sample size increases (e.g., over 50), robustness improves against assumption violations; however, unequal group sizes can complicate results.

Transforming Data for Analysis

  • Sometimes the data must be transformed (e.g., log-transformed) to meet the assumptions of parametric tests, rather than merely checking normality by eye (see the sketch below).
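
A sketch of the check-then-transform workflow on simulated right-skewed data (the distribution and sample size are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=60)   # right-skewed sample

# Shapiro-Wilk normality test on raw and on log-transformed values
w_raw, p_raw = stats.shapiro(x)          # typically p << 0.05: normality rejected
w_log, p_log = stats.shapiro(np.log(x))  # typically p > 0.05 after the transform
print(f"raw p = {p_raw:.4f}, log-transformed p = {p_log:.4f}")
```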

Choosing Appropriate Statistical Criteria

Statistical Testing and Analysis

Application of Statistical Criteria

  • In practice, the Mann-Whitney test usually gives results consistent with Student's t-test; a discrepancy between the two warrants further investigation (the sketch below runs both tests on the same data).
  • A lack of power in the Mann-Whitney test may explain some discrepancies; if Student's t-test indicates a difference while the Mann-Whitney test does not, this points to peculiarities in the underlying distributions.
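
A sketch of running both tests on the same two samples and comparing their p-values (the data are simulated and purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(loc=10.0, scale=2.0, size=30)
b = rng.normal(loc=11.5, scale=2.0, size=30)

t_p = stats.ttest_ind(a, b).pvalue                               # Student's t-test
u_p = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue   # Mann-Whitney U test

print(f"t-test p = {t_p:.4f}, Mann-Whitney p = {u_p:.4f}")
# Agreement is the usual case; a clear discrepancy is a signal to inspect the distributions.
```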

Analyzing Multiple Groups

  • When conducting tests across multiple groups, it's crucial to determine whether the null hypothesis has been rejected or not. If rejected, identifying which groups account for significant differences is essential.
  • Post-hoc tests like Tukey's HSD (parametric) and Dunn's test (non-parametric) are useful for understanding group differences after initial analysis.
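
A sketch of the multi-group workflow: a one-way ANOVA as the omnibus test, followed by Tukey's HSD as the post-hoc comparison (Dunn's test is available in the third-party scikit-posthocs package). The groups and values are made up:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(2)
g1 = rng.normal(10, 2, 25)
g2 = rng.normal(10, 2, 25)
g3 = rng.normal(13, 2, 25)

# Omnibus test: is at least one group mean different?
print(stats.f_oneway(g1, g2, g3))

# Post-hoc test: which pairs of groups account for the difference?
values = np.concatenate([g1, g2, g3])
labels = ["g1"] * 25 + ["g2"] * 25 + ["g3"] * 25
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```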

Exploring Relationships Between Variables

  • Identifying relationships between variables depends on their types; qualitative versus quantitative distinctions shape analysis strategies.
  • If one variable is qualitative and another quantitative or ordinal, this resembles group comparison tasks—assessing outcomes based on experimental group membership.

Understanding Statistical Associations

  • A statistical association means that the distribution of one variable changes with the value of the other; for a quantitative variable compared across groups, this shows up as differing distributions between the groups.
  • For two qualitative variables, associations can be analyzed using chi-square tests or Fisher’s exact test to explore relationships such as eye color and hair color correlations.
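
A sketch for two qualitative variables, using a hypothetical 2x2 table of hair colour versus eye colour counts:

```python
from scipy import stats

# Rows: hair colour (dark, fair); columns: eye colour (brown, blue) - hypothetical counts
table = [[40, 10],
         [15, 35]]

chi2, p, dof, expected = stats.chi2_contingency(table)
print("chi-square p =", p)

# Fisher's exact test is preferred when expected counts are small (2x2 tables)
odds_ratio, p_exact = stats.fisher_exact(table)
print("Fisher's exact p =", p_exact)
```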

Correlation Methods

  • When both variables are quantitative or ordinal, correlation coefficients are used. Pearson correlation traditionally applies to quantitative variables and can also be computed for binary data.
  • Spearman correlation is suitable when at least one variable is ordinal and, unlike Pearson's method, does not assume normally distributed data.
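
A sketch contrasting the two coefficients on a monotonic but non-linear relationship (the data are artificial):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 5, 50)
y = np.exp(x)                           # monotonic but strongly non-linear

r_pearson, p1 = stats.pearsonr(x, y)    # measures linear association
rho, p2 = stats.spearmanr(x, y)         # rank-based, also valid for ordinal data

print(f"Pearson r = {r_pearson:.2f}, Spearman rho = {rho:.2f}")
# Spearman's rho is exactly 1 for any monotonic relationship,
# while Pearson's r is lower here because the relationship is not linear.
```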

Significance Testing Concepts

  • Understanding statistical significance involves grasping p-values. Despite movements advocating against p-value reliance, they remain prevalent due to common misunderstandings surrounding them.
  • Statistical significance is the probability of obtaining the observed result, or a more extreme one, assuming the null hypothesis is true.

Practical Example: Binomial Test

  • A binomial test example illustrates how we assess probabilities—like guessing colors of chips—to establish a null hypothesis (50% chance).
  • The critical value approach helps determine when deviations from expected probabilities warrant rejecting the null hypothesis in favor of an alternative hypothesis.
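
A sketch of the chip-guessing example with scipy's binomial test (scipy >= 1.7; the counts are illustrative): under the null hypothesis each guess is correct with probability 0.5, and the p-value is the probability of an outcome at least as extreme as the one observed.

```python
from scipy import stats

n_trials = 100    # hypothetical number of guesses
n_correct = 60    # hypothetical number of correct guesses

res = stats.binomtest(n_correct, n_trials, p=0.5, alternative="two-sided")
print(res.pvalue)   # ~0.057: just above the conventional 5% level

# An outcome exactly at the expected value gives p = 1
print(stats.binomtest(50, 100, p=0.5).pvalue)   # 1.0
```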

Understanding P-Values and Statistical Significance

The Traditional Critical Level of 5%

  • Traditionally, the critical level for statistical significance is set at 5%. In the chip-guessing example, this corresponds to critical boundaries of roughly 40% and 60% correct guesses.

Symmetry Around the Mean

  • The p-value is the same for 40% and for 60% correct guesses: the two outcomes are symmetric around the expected value of about 50%.

Historical Context of Statistical Testing

  • In earlier times, without computers or calculators, statistical tests were compared against tables. Currently, software provides precise p-values for results but comparisons with critical significance levels remain common.

Interpretation of P-Values

  • A p-value can theoretically reach 100%, indicating perfect agreement with the null hypothesis: if the outcome lands exactly on the expected value (for instance, exactly 50 correct guesses out of 100), the p-value equals one.

Misconceptions About P = 1

  • Some believe that a p-value of one is unattainable; however, it can occur under ideal conditions. Nonetheless, such perfect agreement raises skepticism about its likelihood in real-world scenarios.

Reporting P-Values Effectively

  • It’s crucial not to simply report whether p-values are below or above five percent without context. This practice was more relevant when using tables rather than modern computational methods that yield accurate estimates.

Understanding Variability in Results

  • Reporting just "p < .05" does not convey how close the result is to the threshold (e.g., p = .049 vs. p = .051); the two results carry essentially the same evidential weight even though only one crosses the formal cut-off.

Fisher's Proposal on Significance Levels

  • Ronald Fisher advocated for a more nuanced approach to significance levels beyond just five percent. Other thresholds like one percent or even lower are recognized in various scientific fields as more stringent criteria.

Marginal Significance Levels

  • If a result falls between five and ten percent for its p-value, it indicates marginal significance—suggesting some evidence against the null hypothesis but less robust than stronger evidence at lower percentages.

Terminology: Significant vs. Reliable

  • In Russian terminology the words "significant" (значимый) and "reliable" (достоверный) are often used interchangeably; English has only "statistically significant", which avoids this ambiguity.

Implications of Statistical Significance

  • The term “statistically significant” implies that observed deviations from the null hypothesis are substantial enough based on our sample data but does not guarantee practical relevance or importance.

Caution Against Overconfidence

  • Using “reliable” might instill unwarranted confidence compared to “significant.” Readers may misinterpret statistical significance as definitive proof of an effect when it merely indicates low probability under the null hypothesis.

Distinction Between Statistical and Clinical Significance

  • Statistically significant results do not equate to meaningful differences in practical terms; thus researchers should also report effect sizes alongside p-values for clarity on their findings' implications.

Importance of Effect Size Reporting

  • Researchers should provide additional metrics such as effect size and confidence intervals along with p-values so that others can assess how substantial their findings truly are beyond mere statistical thresholds.
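
A sketch of reporting an effect size (Cohen's d) and a confidence interval for the mean difference alongside the p-value; the data are simulated and the calculation is the standard pooled-variance formula:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
treated = rng.normal(10.6, 2.0, 40)
control = rng.normal(10.0, 2.0, 40)

t_res = stats.ttest_ind(treated, control)

# Cohen's d: difference in means expressed in units of the pooled standard deviation
n1, n2 = len(treated), len(control)
pooled_var = ((n1 - 1) * treated.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
pooled_sd = np.sqrt(pooled_var)
d = (treated.mean() - control.mean()) / pooled_sd

# 95% confidence interval for the difference in means (equal-variance t interval)
diff = treated.mean() - control.mean()
se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"p = {t_res.pvalue:.3f}, Cohen's d = {d:.2f}, 95% CI for the difference = ({ci[0]:.2f}, {ci[1]:.2f})")
```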

Clinical Relevance Versus Statistical Findings

  • In medicine, distinguishing between clinical significance (real-world impact on health outcomes) and statistical significance (a mathematical property of the sample and its size) remains essential for sound healthcare decisions.

Understanding Statistical Significance and Multiple Comparisons

The Importance of Effect Size

  • This distinction also matters for meta-analyses, which work with effect sizes and their confidence intervals; reporting only p-values without effect sizes makes such synthesis impossible and should be avoided.

P-Values and Type I Error

  • The use of p-values rests on the idea that strong deviations from the null hypothesis are unlikely when the null is true; rejecting a true null hypothesis is called a Type I error, i.e., detecting an effect that does not exist.

Misinterpretation of Rare Events

  • When we observe an event that would be unlikely under normal conditions, we conclude that conditions are probably not normal; statistical testing follows the same logic.

Challenges with Multiple Testing

  • A fundamental issue arises from repeated testing: even a rare event becomes likely to occur at least once if a test is applied many times, which increases the chance of declaring a difference significant purely by chance.

Probability of False Discoveries

  • As more statistical tests are conducted (e.g., testing many factors), the likelihood that at least one Type I error occurs rises quickly. Testing 20 factors that in truth have no effect gives roughly a 64% chance that at least one will come out falsely significant (about 40% for 10 factors), as the short calculation below shows.
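
The arithmetic behind this claim, assuming independent tests each run at the 5% level:

```python
alpha = 0.05
for n_tests in (10, 20):
    p_any_false_positive = 1 - (1 - alpha) ** n_tests
    print(n_tests, round(p_any_false_positive, 2))   # 10 -> 0.4, 20 -> 0.64
```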

Reporting Issues in Research

  • Researchers often do not disclose when multiple hypotheses were tested but only report those that yielded significant results. This lack of transparency obscures issues related to multiple comparisons.

Importance of Reporting Non-Significant Results

  • It’s crucial to report findings where no influence was detected because failing to find an effect differs fundamentally from not having tested for it at all.

Group Comparisons and Analysis Techniques

  • Problems with multiple comparisons also arise when comparing several groups using methods designed for two-sample comparisons (like Student's t-test). Special tests must be used for accurate analysis across multiple groups.

Adjustments for Multiple Comparisons

  • Various adjustments exist for handling multiple comparisons; Bonferroni correction divides critical levels by the number of comparisons but can be overly stringent. More flexible methods like False Discovery Rate (FDR) adjustments have gained popularity.
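
A sketch of applying both corrections to the same set of p-values (the raw p-values are made up):

```python
from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]   # hypothetical raw p-values

for method in ("bonferroni", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, reject, p_adj.round(3))
# Bonferroni is the more conservative option: here it rejects one hypothesis fewer
# than the Benjamini-Hochberg false discovery rate procedure.
```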

Advancements in Statistical Methods

  • Modern statistics has evolved beyond early 20th-century methods, incorporating multifactorial approaches allowing simultaneous evaluation of several variables' impacts on outcomes through generalized linear models.

Generalized Linear Models Explained

Analysis of Group Effects on Variable Y

Understanding the Relationship Between Groups and Variables

  • The analysis begins by discussing how points are distributed, particularly focusing on the differences between groups A and B regarding variable Y. It is noted that group B shows a slightly higher value for Y compared to group A.
  • When examining the effect of a quantitative variable X on Y without considering group differences, it appears that as X increases, Y also increases. This observation is based solely on the overall distribution of data points.
  • However, when both the effect of X and group membership (A or B) are considered simultaneously, it becomes clear that while belonging to group B raises the value of Y, within each group, an increase in X still leads to an increase in Y.
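
A sketch of the situation described above, fitted as an ordinary linear model in statsmodels (the data are simulated, with a made-up upward shift for group B and a positive effect of X):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 60
x = rng.uniform(0, 10, n)
group = rng.choice(["A", "B"], n)
# Hypothetical true structure: Y rises with X, and membership in group B shifts Y upward
y = 2.0 + 0.5 * x + 1.5 * (group == "B") + rng.normal(0, 1, n)

df = pd.DataFrame({"y": y, "x": x, "group": group})

# Both effects are estimated jointly, so the group shift and the slope on X are separated
m = smf.ols("y ~ x + C(group)", data=df).fit()
print(m.params)

# An interaction term lets the slope on X differ between the groups
print(smf.ols("y ~ x * C(group)", data=df).fit().params)
```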

Causal Relationships vs. Correlation

  • The discussion highlights methods like Generalized Linear Models (GLM), which can account for various distributions and non-linear relationships. These models are increasingly used to incorporate random effects into analyses.
  • Multi-factorial models allow researchers to consider multiple factors simultaneously and their interactions. This means that the relationship between one factor may vary depending on another factor's level.

Importance of Factor Selection in Analysis

  • The interaction between factors can significantly influence outcomes; for instance, there may be a dependency between variables in one group but not in another. Recognizing these interactions is crucial for accurate modeling.
  • Multi-factorial models help identify significant variables while acknowledging potential limitations such as overfitting or misinterpretation due to insignificant factors being included or excluded from analysis.

Challenges with Data Quality

  • The selection process for significant factors must be approached carefully since including irrelevant factors can distort results. Initial filtering is essential before applying complex models.
  • There’s a risk that multi-factorial models might create an illusion of comprehensiveness where all influencing factors seem accounted for; however, this does not guarantee predictive accuracy across different datasets.

Data Entry Errors and Their Impact

  • Input errors can severely affect results; random errors generally increase residual variance but do not fundamentally alter findings unless they lead to systematic biases.
  • Systematic errors pose greater risks as they can produce misleading estimates—common issues include confusing zeros representing missing values versus actual zero measurements.

Historical Context of Data Misinterpretation

  • An example from historical research illustrates how mislabeling data (e.g., using dashes instead of zeros for missing vaccination data in studies from 1860) led to skewed interpretations about vaccine efficacy.
  • Distinguishing between true absence (missing data represented by zeros versus actual zero values indicating no occurrence of a trait) is critical for accurate prevalence assessments within populations.
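
A small illustration of why coding missing values as zeros biases estimates (the numbers are hypothetical; in pandas, genuinely missing entries should become NaN):

```python
import pandas as pd

# Hypothetical recorded counts; '-' in the source meant "not recorded", not "zero"
raw = ["12", "7", "-", "0", "9", "-"]

as_zero = pd.Series([0.0 if v == "-" else float(v) for v in raw])
as_missing = pd.to_numeric(pd.Series(raw), errors="coerce")   # '-' becomes NaN

print(as_zero.mean())      # ~4.67: dragged down by the fake zeros
print(as_missing.mean())   # 7.0: the mean over the values that were actually recorded
```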

Sample Size Considerations in Research

Understanding Sample Size Requirements

  • The discussion begins with the common issue researchers face regarding the necessary sample size for studies, highlighting three potential answers to this question.
  • One approach is based on statistical norms within various fields, where traditional sample sizes are established. This does not guarantee validity but allows for comparability across studies.
  • A second perspective emphasizes that larger sample sizes are generally better; however, excessively large samples can lead to statistically significant results that may be trivial or irrelevant.

Importance of Effect Size

  • Researchers must consider effect size alongside statistical significance to avoid misleading conclusions. For instance, a correlation coefficient of 0.01 indicates minimal explanatory power and should not warrant extensive discussion.
  • The third method involves power analysis, which determines the required sample size needed to detect an effect of a specified size with a certain probability.

Power Analysis Explained

  • Power analysis provides precise calculations for determining necessary sample sizes based on expected effect sizes and desired detection probabilities. In clinical research, justifying sample size based on clinically significant effects is standard practice.
  • While there are no strict thresholds (like 80% power), commonly accepted values range from 80% to 90%, emphasizing the importance of understanding what constitutes sufficient power.
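
A sketch of a power calculation for a two-sample t-test with statsmodels; the effect size and power target below are illustrative choices, not recommendations:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect a medium effect (Cohen's d = 0.5)
# with 80% power at the 5% significance level
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))   # ~64 per group

# Conversely: the power achieved with 30 per group for the same effect
print(analysis.power(effect_size=0.5, nobs1=30, alpha=0.05))   # ~0.48
```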

Visualizing Statistical Power

  • A visual representation illustrates test power as the likelihood of obtaining statistically significant results when the alternative hypothesis is true. Different alternative hypotheses yield varying probabilities of falling into critical regions.
  • It’s noted that stronger effects increase detection probability; thus, larger samples enhance power under consistent conditions.

Sample Size Impact on Power

  • As sample sizes grow, the incremental gain in power diminishes—illustrated through examples using Student's t-test comparing means with different standard deviations.
  • Increasing sample size beyond certain points may not justify costs or effort since it yields diminishing returns in terms of increased statistical power.

Practical Implications for Unequal Samples

  • When dealing with unequal samples, overall study power is primarily influenced by the smaller group. Significant increases in one group's size do not substantially affect total study power if another remains small.
  • An example shows how increasing one group from 25 to 500 only marginally improves overall study power from 8% to 16%, questioning whether such large samples are necessary.
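
A sketch of how little total power grows when only one group gets larger, using the ratio argument of statsmodels (nobs2 = ratio * nobs1); the effect size here is an arbitrary small value, so the exact percentages differ from those quoted above:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
d = 0.2          # hypothetical small effect size
n_small = 25     # the fixed, smaller group

for n_large in (25, 100, 500):
    power = analysis.power(effect_size=d, nobs1=n_small, alpha=0.05, ratio=n_large / n_small)
    print(f"{n_small} vs {n_large}: power = {power:.2f}")
# The smaller group caps the achievable power, no matter how large the other group becomes.
```
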
Video description

00:00:19 Introductory remarks
00:01:11 Why statistics is needed in scientific research
00:24:58 Types of data
00:34:42 Types of statistical problems
00:49:28 What statistical significance is
00:55:40 How to report the results of statistical tests
01:12:39 Multifactorial models
01:25:06 The sample size needed for a study

Playlist link: https://www.youtube.com/playlist?list=PLcsjsqLLSfNDgbFuNRYbDs8Pn-Ba47K49