“R Software Oriented Toward Agricultural Experimental Design” (original Spanish title: “Software R orientado al área de diseños experimentales agrícolas”).

Introduction to the Workshop

Overview of the Course

  • The workshop is set to run from 9 AM to 11 AM, focusing on practical applications of R software for research.
  • Participants will learn how to utilize R tools in experimental design and research methodologies.

Software Installation Guidance

  • Instructions are provided for downloading R and RStudio, emphasizing ease of access for both Windows and Mac users.
  • Users are advised to select the appropriate version based on their computer's specifications (32-bit or 64-bit).

Software Setup and Resources

Downloading Required Software

  • The latest version of R is lightweight (approximately 86 MB), making it easy to install without high computational requirements.
  • A shared folder will be available containing all course materials, scripts, and exercises for participants.

Accessing Course Materials

  • A link to a cloud folder with resources will be shared in the chat for easy access by all participants.

Understanding R Software

Importance of R in Research

  • Many universities and research institutions use R for data analysis; understanding its capabilities is crucial for effective data handling.
  • The workshop aims to clarify what R is, its advantages over other software, and how it can enhance data analysis processes.

Features of the R Environment

  • R provides a flexible programming environment that can be expanded through packages developed by researchers, enhancing user experience continuously.

Introduction to R and RStudio

Overview of R Software

  • R is an open-source programming language widely used in the academic community; statistical analysis is only one part of what it offers, and it can be customized to user needs.
  • RStudio provides a graphical interface that makes R more user-friendly; beginners are recommended to start with it.

Working Environment in RStudio

  • RStudio presents four panes: a script (source) editor for writing code, a console where commands run and output appears, an environment pane for loaded objects and data, and a pane for files, plots, packages, and help resources.
  • Users can explore various methods to work within the software, starting with simpler paths before advancing to more complex options.

Experimental Design in Agricultural Sciences

  • Emphasizes the importance of scientific experimentation in which treatments are replicated to reduce error and to increase confidence in results supported by statistical inference.
  • Discusses essential statistical concepts such as mode, mean, and median, computed from samples intended to approximate the target population.

Key Concepts in Experimental Design

Statistical Terminology

  • The course will focus on applying essential statistical principles necessary for evaluating experiments including randomization of treatments and comparison of means.

Common Designs Used in Agriculture

  • Examples include block designs where experimental units are organized into blocks based on certain characteristics to minimize external variability affecting results.

Practical Application of Designs

Field Mapping and Randomization

  • Illustrates how treatments are randomized within field maps; participants can infer the type of experimental design being used based on provided sketches.

Decision-Making in Design Selection

  • Highlights that researchers determine the appropriate design based on observed variations in their experimental fields.

Data Collection and Analysis

Importance of Consistency

  • Stresses that data collection should align with the chosen experimental design; discrepancies between design installation and analysis can lead to incorrect conclusions.

Variable Response Considerations

  • Researchers must identify key response variables that differentiate treatments effectively during data collection processes.

Working with Databases in R Software

Introduction to Database Organization

  • Participants are instructed on how to organize a database in a spreadsheet program such as Excel, emphasizing the need for a single column holding the response variable.
  • A question arises about number formatting in R: because the software follows English conventions, the decimal separator must be a period (.) rather than a comma (,).

Setting Up R Studio

  • The instructor opens R Studio and requests participants to do the same, sharing a link to a folder containing necessary data and recordings.
  • Upon opening R Studio, participants are guided through its interface, noting that it typically displays three initial windows.

Creating Scripts and Managing Data

  • The instructor explains how to create a new script file within R Studio as part of preparing for data analysis.
  • Participants are encouraged to save their scripts regularly with meaningful names that include dates for better organization.

Importing Data into R

  • The first step in data analysis involves uploading a database; an example command is provided using the read.delim() function with "clipboard" as its argument.
  • Instructions are given on copying data from Excel into memory using Control + C before executing commands in R.
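The copy-and-read workflow above can be sketched as follows. An inline tab-delimited string stands in for the clipboard here so the example is self-contained; the column names are illustrative, not taken from the course files:

```r
# read.delim("clipboard") is the command shown in the session (Windows;
# on macOS, read.delim(pipe("pbpaste")) plays the same role). An inline
# string replaces the clipboard so this sketch runs anywhere.
txt <- "trat\trep\trend
T1\t1\t4.2
T1\t2\t4.5
T2\t1\t5.1
T2\t2\t5.3"
datos <- read.delim(text = txt)  # same parser as read.delim("clipboard")
str(datos)                       # verify columns and types after import
```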

Working with Example Datasets

  • Participants are prompted to download example datasets from Excel and ensure they have them ready on their computers.
  • The instructor directs everyone to navigate to specific sheets within the dataset that contain traditional examples formatted for use in R.


How to Upload and Analyze Data in Software

Uploading Data

  • The process begins with clicking the green Run arrow to execute the code, which loads the database (called "desea" in the example).
  • Users are encouraged to confirm that their databases uploaded successfully; the attach() and str() functions are recommended for inspecting the database structure.

Understanding Data Structure

  • The attach() function disaggregates the data frame, allowing users to work with its columns as individual vectors rather than a single object.
  • The database consists of three columns: two read as characters and one as integer. Issues arise if numbers are formatted incorrectly (e.g., commas used as decimal separators).
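A minimal illustration of what str() reports for such a table (the data are made up; a comma-formatted number would surface here as a character column instead of a numeric one):

```r
# str() reveals each column's type: here two character columns and one
# integer column, mirroring the structure described above.
datos <- data.frame(trat   = c("T1", "T2"),
                    bloque = c("B1", "B2"),
                    rend   = c(10L, 12L),
                    stringsAsFactors = FALSE)
str(datos)  # chr, chr, int
```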

Troubleshooting Upload Issues

  • If users encounter problems uploading their databases, they should make sure to copy from the header row downward in Excel before pasting into R.
  • It is crucial that users have base R properly installed; otherwise, uploads will fail.

Visualizing Data

  • After running commands, users can visualize results in real-time. Sharing codes via chat is encouraged for collaborative troubleshooting.
  • A summary of data allows visualization of each column's behavior. Users can clear previous outputs using Control + L for better clarity.

Statistical Analysis Preparation

  • The system identifies 25 rows with specific characteristics and provides summary statistics such as minimum, maximum, mean, first quartile, median, and third quartile.
  • Before conducting ANOVA (Analysis of Variance), descriptive statistics like box plots are recommended to assess data variability across treatments.
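The summary-then-boxplot step can be sketched like this (simulated yields; treatment names and units are illustrative):

```r
# Descriptive look before ANOVA: summary statistics and a box plot of
# the response by treatment, as recommended above.
set.seed(1)
datos <- data.frame(trat = rep(c("T1", "T2", "T3"), each = 5),
                    rend = c(rnorm(5, 4), rnorm(5, 5), rnorm(5, 6)))
summary(datos$rend)                      # min, quartiles, median, mean, max
boxplot(rend ~ trat, data = datos,
        xlab = "Treatment", ylab = "Yield (t/ha)")
```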

Creating Models and Conducting Tests

  • Users create models for the analysis of variance using base R functions, based on treatment performance metrics.
  • Normality tests (e.g., Shapiro-Wilk test), particularly useful for small datasets, help determine if data follows a normal distribution.

Interpreting Results

  • A Shapiro-Wilk test yielding a p-value of 0.47 exceeds 0.05, so there is no evidence against normality and the data can be treated as normally distributed.
  • To further validate findings visually, creating histograms can provide insight into data distribution patterns despite limited sample sizes.
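A sketch of the normality check and histogram described above (the residuals are simulated for illustration):

```r
# Shapiro-Wilk test on (simulated) residuals; a p-value above 0.05 means
# no evidence against normality. A histogram gives a visual check.
set.seed(42)
res <- rnorm(25, mean = 0, sd = 1)   # stand-in for model residuals
sw  <- shapiro.test(res)
sw$p.value                           # compare against 0.05
hist(res, main = "Histogram of residuals", xlab = "Residual")
```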

Understanding Variance and Statistical Tests

Homogeneity of Variances

  • The discussion begins with the importance of testing for normal distribution, followed by a question about checking homogeneity of variances when data are not normally distributed.
  • It is clarified that Levene's test can be used with non-normal data, while Bartlett's test, which assumes normally distributed quantitative data, is recommended when normality holds.
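Bartlett's test is available in base R (Levene's test lives in the car package as leveneTest(); only base R is used in this sketch, and the data are simulated):

```r
# Bartlett's test for homogeneity of variances across treatments.
# It assumes normality; for non-normal data, Levene's test is preferred.
set.seed(7)
datos <- data.frame(trat = rep(c("T1", "T2", "T3"), each = 8),
                    rend = rnorm(24, mean = 5, sd = 1))
bt <- bartlett.test(rend ~ trat, data = datos)
bt$p.value   # compare against 0.05
```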

Analysis of Variance (ANOVA)

  • The speaker explains that both assumptions have been validated, leading to the application of ANOVA. They note significant statistical differences among treatments based on degrees of freedom.
  • A question arises regarding the minimum number of observations needed for reliable analysis; three repetitions per treatment are suggested as a standard practice.
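A one-way ANOVA with three repetitions per treatment, as discussed, can be sketched in base R (simulated data; names illustrative):

```r
# aov() fits the one-way model; summary() prints the ANOVA table with
# degrees of freedom, F value, and significance codes.
set.seed(3)
datos <- data.frame(trat = factor(rep(c("T1", "T2", "T3", "T4"), each = 3)),
                    rend = c(rnorm(3, 4), rnorm(3, 5), rnorm(3, 6), rnorm(3, 7)))
modelo <- aov(rend ~ trat, data = datos)
summary(modelo)
```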

Treatment Repetitions and Special Designs

  • ANOVA is typically conducted with at least three treatments; with only two treatments, a simpler test such as a t-test may suffice.
  • Some special designs like lattice designs allow researchers to work with fewer repetitions while still obtaining valid results.

Significance Levels in Results

  • The significance codes are discussed: in R's output, p-values below 0.001 receive three stars, below 0.01 two stars, below 0.05 one star, and values above 0.05 are marked as not significant (ns).

Comparison of Means

  • Edgar asks about applying fixed versus random models in ANOVA; it’s noted that most researchers use fixed models where treatment selection is determined by the investigator.
  • The conversation shifts to Duncan's multiple range test as a method for comparing means, emphasizing the need for specific libraries in R programming.

Installing Required Libraries

  • Instructions are provided on how to install the 'agricolae' library in R for easier statistical analysis.
  • Users are guided through checking installation success and activating the library within their R environment.
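The installation step can be sketched as follows; install.packages() needs an internet connection and is run once, while library() must be called in every new session:

```r
# One-time download and installation of agricolae from CRAN (requires
# internet access), followed by loading it into the current session.
install.packages("agricolae")
library(agricolae)
```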

Conducting Statistical Tests Using R

  • The speaker demonstrates how to perform Tukey's test using the installed library by specifying treatment comparisons directly within R code.
  • A complete version of the command is mentioned but simplified options yield similar results when executed correctly.
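The course's route goes through agricolae's HSD.test(), which adds letter groupings; base R's TukeyHSD() performs the same pairwise comparison without extra packages, shown here on simulated data as a hedged alternative:

```r
# Tukey's HSD on a fitted ANOVA model: pairwise differences, confidence
# intervals, and adjusted p-values from base R.
set.seed(3)
datos <- data.frame(trat = factor(rep(c("T1", "T2", "T3", "T4"), each = 3)),
                    rend = c(rnorm(3, 4), rnorm(3, 5), rnorm(3, 6), rnorm(3, 7)))
modelo <- aov(rend ~ trat, data = datos)
TukeyHSD(modelo, conf.level = 0.95)
```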

Accessing Test Results

  • Participants are informed that all written instructions will be saved in a folder for future reference and exploration within R software.

Statistical Analysis and Comparisons

Understanding Degrees of Freedom and Error

  • Discussion on degrees of freedom for error set at 20, with a focus on the mean square error in statistical analysis.

Execution of Statistical Tests

  • The execution of a vector comparison is demonstrated, highlighting that two vectors yield the same results. Questions are invited regarding these vectors.

Confidence Intervals and Alpha Levels

  • Simplification in calculations by assuming an alpha level of 0.05; discussion on changing it to 0.01 for a 99% confidence interval, emphasizing the importance of rigorous comparisons.

Utilizing Duncan's Test

  • Introduction to Duncan's test as part of agricultural statistics; instructions provided for replacing variables in the test setup.

Analyzing Treatment Comparisons

  • Emphasis on maintaining consistency in naming conventions when inputting data into statistical software; reiteration of degrees of freedom from previous discussions.

Grouping Differences in Treatments

  • Comparison between different grouping methods reveals subtle differences; specific values used for treatment comparisons are discussed, indicating how they affect statistical significance.

Statistical Significance Evaluation

  • Explanation of how to interpret differences between treatment means using specific numerical examples to illustrate statistical significance or lack thereof.

Summary and Visualization Techniques

  • Recap on summarizing results through various tests while maintaining clarity about alpha levels used during comparisons; visual representation techniques are introduced.

Data Preparation Steps

  • Overview of steps taken to prepare data for analysis including loading datasets, checking structure, and visualizing distributions through box plots before conducting further analyses.

Importance of Normal Distribution and Variance Homogeneity

  • Validation processes such as normal distribution checks and variance homogeneity tests are crucial before proceeding with mean comparisons using specified packages.

Handling Outliers in Data Analysis

  • Discussion on identifying outliers using box plots prior to analysis; emphasizes the need for careful consideration when dealing with anomalous data points that may skew results.


Analysis of Variability and Error Measurement

Understanding Degrees of Freedom and Error Measurement

  • The discussion begins with the concept of degrees of freedom in error measurement, highlighting that even with fewer repetitions (e.g., four or five), sufficient data remains for analysis.
  • The total treatment error is calculated, emphasizing the importance of understanding variability coefficients, which are essential for statistical analysis.
  • The formula for calculating the coefficient of variability is introduced: standard deviation divided by the mean, expressed as a percentage. This calculation is crucial for assessing data consistency.

Calculating Coefficient of Variability

  • A manual calculation example is provided to determine the coefficient of variability, demonstrating practical application using square roots and averages.
  • Results from calculations yield a coefficient value around 3.5, showcasing how different methods can lead to similar outcomes despite minor discrepancies.
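The manual CV calculation can be reproduced directly; the mean square error and overall mean below are illustrative stand-ins, not the course's actual numbers:

```r
# CV% = sqrt(mean square error) / overall mean * 100
cme   <- 0.09   # illustrative mean square error from an ANOVA table
media <- 8.5    # illustrative overall mean of the response
cv <- sqrt(cme) / media * 100
round(cv, 1)    # about 3.5, in line with the value discussed above
```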

Graphical Representation and Analysis

  • The need for graphical representation arises; a simple graph will be created based on previous analyses to visualize results effectively.
  • The bar.group() function from the agricolae library extracts and organizes the mean-comparison groups, making them easier to visualize in bar graphs.

Enhancing Graphical Data Presentation

  • Discussion includes setting axis limits based on average values plus standard deviations to ensure clarity in graphical representations.
  • Suggestions are made regarding labeling axes appropriately (e.g., yield in tons per hectare), ensuring that graphs convey meaningful information clearly.

Finalizing Graphical Outputs

  • A title suggestion "Comparison of Means" is proposed for the bar graph, indicating its purpose within presentations or scientific articles.
  • Emphasis on customization options available within software tools used for creating these graphs highlights flexibility in presentation styles.
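A minimal bar graph following those suggestions (means and standard deviations are made up for illustration):

```r
# Bar graph of treatment means; the y-axis limit is set from
# mean + standard deviation so the tallest bar is not clipped.
medias <- c(T1 = 4.2, T2 = 5.1, T3 = 6.3)
desv   <- c(0.3, 0.4, 0.5)
barplot(medias,
        ylim = c(0, max(medias + desv) * 1.1),
        xlab = "Treatment",
        ylab = "Yield (t/ha)",
        main = "Comparison of Means")
```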

Practical Application and Script Usage

Utilizing Scripts Across Different Datasets

  • Acknowledgment that not all options can be explored due to time constraints; focus remains on basic script functionalities necessary for complete random design analysis.
  • Introduction to applying previously developed scripts to new datasets demonstrates versatility but requires attention to header consistency across datasets.

Addressing Dataset Changes and Visualization Adjustments

  • When switching datasets, adjustments must be made (e.g., changing axis limits), ensuring accurate visual representation without losing critical data insights.

Replicating Models and Encouraging Collaboration

  • Participants are encouraged to replicate models using shared scripts while being mindful that changes in dataset headers necessitate corresponding updates in scripts.

Engagement and Collaborative Learning

Facilitating Group Work on Experimental Designs

  • An invitation is extended for participants to ask questions or express difficulties encountered during exercises, fostering an interactive learning environment.

Organizing Collaborative Data Editing Sessions

  • Plans are laid out for collaborative work on experimental designs where participants will select experiments to format into rows and columns collectively.


Organizing Data for Analysis

Introduction to Treatment Design

  • Discussion on random repetitions and the need for structured blocks in treatment designs, specifically mentioning treatments 5 and 6.
  • Emphasis on the importance of selecting treatments and organizing data accordingly; both treatments will be utilized.

Data Entry Instructions

  • Instructions provided for entering data into designated sheets, focusing on formatting into rows and columns with clear labels for treatments, blocks, variables, and responses.
  • Mention of standardizing entries by using abbreviations for lengthy treatment names to maintain clarity.

Collaboration and Editing

  • Encouragement for participants to actively engage in editing the shared document; specific instructions given regarding how to fill out blocks and response variables.
  • Reminder to use abbreviations when listing various types of oils in the dataset.

Data Verification Process

  • Importance of verifying that all entered values match those in the original table before proceeding with analysis.
  • Clarification that the current organization method is optimal for database management, allowing direct software integration without needing reorganization later.

Finalization of Data Entry

  • Call for a final review among participants to ensure accuracy across all data entries before moving forward with analysis.
  • Explanation of complete block design principles as they relate to treatment repetition counts within datasets.

Data Analysis Techniques

Preparing Data for R Programming

  • Transitioning from data entry to analysis; starting with a focus on five treatments while preparing datasets in R programming environment.
  • Steps outlined for copying data into R while ensuring no accidental changes occur during this process.

Loading Datasets into R

  • Instructions on creating a new script file in R after cleaning up previous graphs or unnecessary elements from earlier analyses.

Working with Variables

  • Guidance on naming conventions when uploading datasets into R; users encouraged to personalize their dataset names during practice exercises.

Analyzing Treatment Responses

  • Steps detailed on executing commands within R to analyze treatment responses based on performance metrics captured in the dataset.

Visualization Techniques

  • Creation of box plots discussed as a method to visualize treatment performance effectively; emphasis placed on concise variable naming for better readability.

Analysis of Variance and Treatment Effects

Introduction to the Model

  • The discussion begins with the introduction of a response variable related to performance, incorporating blocks as a source of variation in the model.
  • An error is identified in the model setup, specifically a misspelling of the variable name "rendimiento" (yield).
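The block model described can be sketched as follows (simulated data; the Spanish column names mirror the session's, but the values are illustrative):

```r
# Randomized complete block model: response explained by block plus
# treatment, matching the model described above.
set.seed(9)
datos <- data.frame(
  bloque      = factor(rep(1:4, times = 5)),
  tratamiento = factor(rep(c("T1", "T2", "T3", "T4", "T5"), each = 4)),
  rendimiento = rnorm(20, mean = 6, sd = 1)
)
modelo2 <- aov(rendimiento ~ bloque + tratamiento, data = datos)
summary(modelo2)
shapiro.test(residuals(modelo2))  # normality check on Model 2 residuals
```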

Assumptions in Statistical Analysis

  • The speaker emphasizes the importance of checking assumptions such as normality and homogeneity of variances using tests like Shapiro-Wilk for residuals from Model 2.
  • Normal distribution is confirmed as a prerequisite before assessing homogeneity of variances.

Homogeneity Testing

  • A test for homogeneity reveals whether there are significant differences between treatments; results indicate no significant differences among blocks.
  • Participants inquire about accessing software tools necessary for analysis, highlighting that R can be downloaded freely online.

Interaction Between Factors

  • A question arises regarding additivity between block and treatment interactions; it’s noted that while interactions can be included, they are not typical in standard designs.

Experiment Results Overview

  • The analysis concludes that there were no significant differences found across treatments based on a coefficient of variability reported at 24.5%.
  • Details about specific treatments (e.g., control, gibberellins, acetic acid) are provided, indicating no significant yield differences among them.

Next Steps in Experimental Design

Transition to New Experiment

  • The speaker proposes moving on to another experiment focused on acids after determining that further comparisons are unnecessary due to lack of significance.

Data Handling and Preparation

  • Instructions are given on how to load data into R from various formats (Excel or text), emphasizing proper structure with rows and columns.

Visualization Techniques

  • A box plot is suggested for visualizing data distributions related to aphids, which helps identify treatment variations despite statistical insignificance observed earlier.

Insights from Box Plots

  • Observations from previous box plots suggest treatment variations exist but may not be statistically significant; this reinforces the need for careful interpretation of data visuals.

Final Steps in Analysis

  • The next model will focus on analyzing variance concerning aphid responses based on treatment effects while maintaining structured naming conventions within scripts.

Normal Distribution Testing and Model Validation

Shapiro-Wilk Test on Residuals

  • The speaker conducts a Shapiro-Wilk test to assess the normal distribution of residuals from the model.

Homogeneity of Variances

  • Discussion on whether the model meets the assumption of homogeneity of variances, referencing previous box plot analyses. The conclusion is that there is no homogeneity.
  • Acknowledgment that if assumptions are not met, alternative validation methods must be considered.

Data Transformation Options

  • The speaker suggests transforming data as an option when normality is not achieved, particularly for count data in entomology and phytopathology.
  • Demonstration of a simple transformation using square roots to address non-normality issues.
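The square-root transformation for counts can be as simple as the following; adding 0.5 is a common convention for counts that include zeros (an assumption here, not something stated in the session):

```r
# Square-root transformation of count data (e.g., insect counts);
# sqrt(x + 0.5) keeps zeros well-behaved.
conteos <- c(0, 3, 7, 12, 1, 5)
transf  <- sqrt(conteos + 0.5)
round(transf, 2)
```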

Model Updates and Further Testing

  • After transforming the data, the speaker updates their database and prepares to run a new model (Model 4).
  • Conducting another Shapiro test on Model 4's residuals shows that the assumption violations persist despite the transformation.

Non-parametric Tests Consideration

  • Introduction to non-parametric tests due to ongoing violations of assumptions; examples include Friedman and Kruskal-Wallis tests.
  • Emphasis on understanding different statistical arguments behind various treatments used in publications.
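Kruskal-Wallis is available in base R; this sketch uses simulated count data and an illustrative number of treatments:

```r
# Kruskal-Wallis rank-sum test: a non-parametric alternative to one-way
# ANOVA when normality or homogeneity assumptions fail.
set.seed(5)
datos <- data.frame(trat   = factor(rep(c("T1", "T2", "T3"), each = 6)),
                    afidos = rpois(18, lambda = 4))
kw <- kruskal.test(afidos ~ trat, data = datos)
kw$p.value   # compare against 0.05
```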

Statistical Analysis Results

Treatment Comparisons

  • Results indicate no significant differences among treatments based on statistical analysis (p-value = 0.72), suggesting similar rankings across experiments.

Exploring Alternative Transformations

  • Encouragement for further exploration of different transformation methods to meet statistical assumptions while noting current findings show no significant differences.

Data Management and Experimental Design

New Dataset Introduction

  • Introduction of a new dataset for further analysis, indicating adjustments made for clarity in modeling (Model 5).

Statistical Tests on New Models

  • Shapiro and Bartlett tests are run on Model 5; the analysis shows a coefficient of variability of 21% and identifies statistically significant treatment differences.

Future Directions in Statistical Analysis

Upcoming Topics

  • Brief mention that future discussions will cover factorial designs, with resources shared via email for additional support.

Practical Applications

  • Suggestions for utilizing libraries related to experimental design, including randomization techniques and block designs for practical applications in fieldwork.

Installation and Usage of the agricolae Library

Overview of Data Handling

  • The process involves transferring data to Excel, allowing for easy access and manipulation similar to previously discussed examples.

Installation Process

  • Instructions are provided on how to download the agricolae library, emphasizing the installation steps necessary for setup.
  • A script generated earlier is being uploaded for participants, ensuring they have access to materials and examples for future sessions.

Preparation for Upcoming Sessions

  • Participants are encouraged to prepare their own experimental data in advance of the next session scheduled at 9 AM, facilitating practical application during discussions.

Successful Installation Confirmation

  • After installation, a success message confirms that the agricolae library is ready for use. This step indicates readiness to proceed with further tasks.

Recommendations for Additional Libraries

  • Other recommended libraries include 'FactoMineR' and 'factoextra', which are free, easy to install, and enhance multivariate analysis capabilities.
Video description

VIRTUAL WORKSHOP COURSE (CURSO TALLER VIRTUAL)