MES/IA-MOD9-UNIDAD4
Introduction to Orange and Project Setup
Overview of the Session
- The session is divided into two parts: an introduction to Orange and a practical project demonstration.
- The speaker emphasizes that using Orange is straightforward, aiming to assist participants in their research endeavors.
Sharing Resources
- The speaker shares a URL with the group via WhatsApp for participants to access the same content displayed on screen.
- Participants confirm they can view and navigate through various sections such as academic experience and research recognition.
Exploring Features of Orange
Introduction to Zin Model
- A product called "Zin" is introduced, which aids in design and internal coding, allowing users to create personal web pages.
- Users can download the code this tool generates for use on other platforms; its multimodal capabilities adapt across different models.
Application of Knowledge
- The speaker discusses how knowledge fusion occurs within the platform, enabling automated processes without extensive coding skills required from users.
- Emphasis is placed on creating personal repositories or thematic web quests using the tools available in Orange.
Hands-On Project with Orange
Starting the Project
- Participants are instructed to open Orange and create a new project titled "Prueba de Hipótesis" (Hypothesis Testing). This relates directly to thesis work involving quantitative designs.
Hypothesis Testing Process
- The speaker outlines traditional methods for hypothesis testing using SPSS, including normal distribution checks and chi-square tests for decision-making regarding null hypotheses.
Transitioning Tools
- A transition from SPSS to Excel is suggested for hypothesis testing within Orange, offering a more user-friendly approach that still yields valid results through Orange's widgets.
Machine Learning Presentation and Hypothesis Testing
Overview of the Class Objectives
- The class aims to apply machine learning using predefined models in Orange for analyzing data from a questionnaire, interpreting results with GPT, and preparing a thesis report.
- Resources include categorical data, no-code software (Orange), and a presentation report model. The focus is on transforming categorical variables into numerical ones for analysis.
Steps for Data Transformation
- The first step involves converting categorical variables into numerical values, which is essential for statistical analysis. This transformation allows better compatibility with analytical tools like Excel or SPSS.
- A representative sample of 380 subjects is identified from the dataset; after accounting for header rows, 378 usable responses remain, ensuring accurate representation in the analysis.
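The session does not show how the figure of 380 was obtained. One common way such a number arises is Cochran's sample-size formula with a finite-population correction; the sketch below is an illustration under assumed parameters (95% confidence, 5% margin of error, maximum variability, and a hypothetical population of 35,000), not the instructor's actual calculation.

```python
import math

def cochran_n(z=1.96, p=0.5, e=0.05, population=None):
    """Cochran's sample-size formula, optionally corrected for a finite population."""
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)  # ~384.16 with the defaults
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)  # finite-population correction
    return math.ceil(n0)

print(cochran_n())                  # infinite-population case
print(cochran_n(population=35000))  # a population around this size lands near 380
```

For large populations the uncorrected formula gives 385; applying the correction with a population of roughly 35,000 brings the requirement down to about 380.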
Practical Application in Excel
- Participants are instructed to download an example file containing categorical questionnaire data to practice the conversion process during the session. Confirmation of file download is requested before proceeding further.
- An explanation is provided regarding the use of Likert scale questions in surveys, emphasizing that respondents understand categories rather than raw numbers when answering questions about their preferences or behaviors.
Understanding Categorical vs Numerical Variables
- Categorical variables are defined as those that provide options (e.g., "Never," "Sometimes") rather than numeric values; these need conversion for effective analysis within software tools like Orange or Excel.
- The instructor outlines how to assign numerical values to different response categories: e.g., "Never" = 1, "Almost Never" = 2, etc., facilitating easier data processing and interpretation later on.
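The category-to-number assignment described above can be sketched in pandas, which mirrors the Excel filter-and-fill workflow in a single step. The column name `item1` and the sample responses are hypothetical; the scale values follow the session's assignment.

```python
import pandas as pd

# Likert scale values as assigned in the session
scale = {"Never": 1, "Almost Never": 2, "Sometimes": 3, "Almost Always": 4, "Always": 5}

# Hypothetical categorical responses for one questionnaire item
df = pd.DataFrame({"item1": ["Never", "Sometimes", "Always", "Almost Never"]})

# Map each category to its numeric value
df["item1_num"] = df["item1"].map(scale)
print(df["item1_num"].tolist())  # [1, 3, 5, 2]
```

The same `map` call can be applied to each item column, replacing the per-category filtering done by hand in Excel.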
Filtering Data in Excel
- Instructions are given on how to filter data within Excel by selecting specific categories (e.g., gender) and assigning them corresponding numeric values efficiently without manual entry for each individual response. This streamlines the transformation process significantly.
- Emphasis is placed on utilizing filters effectively within Excel to manage large datasets while ensuring accuracy in categorization and subsequent value assignment during analysis tasks.
Data Transformation and Analysis Process
Steps for Data Entry and Conversion
- The speaker demonstrates how to enter data into a cell, using a checkbox method to mark selections. A double-click on the checkbox automatically fills in the selected category.
- After completing all conversions, it is essential to remove filters to enable access to all 378 samples in the dataset.
- The speaker emphasizes the importance of data cleaning, ensuring no dirty data exists before proceeding with further analysis.
Value Assignment and Filtering Techniques
- The process involves assigning numerical values to categorical responses such as "sometimes," which is assigned a value of three after filtering.
- Similar steps are repeated for other categories like "almost never" (value two), "almost always" (value four), and "always" (value five), demonstrating consistency in value assignment across categories.
Structuring Data for Analysis
- Once all values are converted from categorical to numerical, filters are removed to prepare for further analysis in subsequent columns.
- A request is made by a participant for clarification on item numbering within their dataset, indicating collaborative engagement during the session.
Progress Check and Next Steps
- Participants confirm their progress with data entry; some require additional time, leading to an extension of three minutes for completion.
- After confirming readiness among participants, the speaker transitions into discussing how they will structure their datasets moving forward.
Transitioning from Questions to Items
- The focus shifts from questions in the dataset to labeling items numerically (e.g., item one). This change reflects a move towards quantitative analysis rather than qualitative questioning.
- Instructions are given on how to drag cells horizontally in Excel, allowing automatic filling of item numbers from one through eight based on initial input.
Dimensionality and Indicator Construction
- The speaker outlines constructing dimensions and indicators within their framework. Each dimension corresponds with specific items or questions that contribute collectively towards overall metrics.
- An example illustrates that summing values from individual items provides totals for each dimension, reinforcing understanding of dimensional relationships within datasets.
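The dimension totals described above reduce to row-wise sums over each dimension's item columns. This sketch assumes a hypothetical layout in which items 1-4 form the first dimension and items 5-8 the second; the actual grouping depends on each participant's instrument.

```python
import pandas as pd

# Two hypothetical respondents, eight items each, already converted to 1-5 values
df = pd.DataFrame({f"item{i}": [3, 4] for i in range(1, 9)})

dim1_cols = [f"item{i}" for i in range(1, 5)]  # assumed dimension 1
dim2_cols = [f"item{i}" for i in range(5, 9)]  # assumed dimension 2

df["dim1_total"] = df[dim1_cols].sum(axis=1)
df["dim2_total"] = df[dim2_cols].sum(axis=1)
df["pre_total"] = df["dim1_total"] + df["dim2_total"]  # overall pretest total
```

Summing per dimension and then across dimensions reproduces the column totals built in Excel.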
Visual Representation of Data Structure
- Participants are guided on visually organizing their data by coloring different dimensions distinctly—green for one dimension, blue for another—to enhance clarity during analysis.
- Emphasis is placed on maintaining clear visual distinctions between dimensions and indicators as part of effective data management practices.
This structured approach ensures participants grasp both practical skills in Excel as well as theoretical concepts related to data transformation and analysis.
Understanding Pretest and Posttest in Research
Introduction to Pretest and Posttest Concepts
- The speaker introduces the concepts of pretest and posttest, emphasizing that the pretest measures the dependent variable at the beginning of an experiment.
- Participants ask what pretest and posttest mean, indicating a need to clarify these foundational terms in research methodology.
Conducting Pretests
- The pretest is defined as data collected before implementing any intervention, while the posttest is collected afterward to assess changes.
- Participants are guided through calculating totals from their pretests, reinforcing the importance of quantifying results accurately.
- The speaker explains how to organize data by dimensions and colors for clarity when analyzing total scores from pretests.
Transitioning to Posttests
- After conducting a pretest, researchers implement an intervention before administering a posttest using the same subjects to measure changes over time.
- An example is provided where students retake a questionnaire after a month, illustrating practical application in educational settings.
Analyzing Results
- Emphasis is placed on converting categorical data into numerical values for analysis during both pretests and posttests.
- The expectation that scores should generally improve after interventions is discussed, highlighting trends observed in most cases.
Hypothetical Scenario for Analysis
- A hypothetical scenario illustrates how participants can manipulate data (e.g., adding seven points to each score), allowing them to visualize potential outcomes from their interventions.
- Participants are encouraged to confirm their calculations based on this hypothetical adjustment, ensuring they understand how changes affect overall results.
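The hypothetical adjustment above is a simple constant shift. A minimal sketch, with invented pretest totals, shows what participants are asked to verify:

```python
import pandas as pd

# Hypothetical pretest totals for three respondents
pre = pd.Series([24, 30, 27], name="pre_total")

# The session's illustrative scenario: add seven points to every score
post = pre + 7
print(post.tolist())  # [31, 37, 34]
```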
Hypothesis Testing and Data Preparation in Orange
Preparing Data for Hypothesis Testing
- The speaker emphasizes the importance of two specific columns, L and M, which are essential for hypothesis testing. These represent pre-test and post-test totals.
- It is crucial to compare the total pretest with the total posttest to observe any changes effectively.
- The speaker suggests copying relevant items and totals into a new Excel sheet for clarity before importing them into Orange.
- The required data structure includes gender, eight items, pre-test, and post-test values; it should be clean without unnecessary headers or formulas.
- Confirmation is sought from participants regarding the readiness of their sheets for loading into Orange.
Loading Data into Orange
- Participants are instructed that formulas can remain in the Excel file as long as extraneous information is removed; only headers and data are necessary.
- A brief pause is taken to ensure everyone is ready before proceeding with closing the Excel sheet.
- The speaker instructs participants to close their current Excel files before moving on to work within Orange software.
- Participants are guided to select the first widget file in Orange to load their prepared Excel sheet containing cleaned data.
- Emphasis is placed on ensuring that only header and data rows are included when uploading files.
Understanding Variable Types in Analysis
- The discussion shifts towards identifying variable types such as numerical or categorical for effective analysis within Orange.
- Clarification is provided that all items will now be treated as numerical variables rather than categorical ones, differing from previous analyses.
- Participants must verify that all variables are correctly classified as numerical for accurate processing in subsequent steps.
Assigning Roles to Variables
- Different roles for variables are explained: 'feature' marks input characteristics, 'target' marks the variable to predict, and 'meta' holds additional information carried alongside the analysis.
- Age and gender variables are identified as additional information by Orange due to their complementary nature in analysis contexts.
Finalizing Data Setup
- Pretest totals are designated as features while posttest totals serve as the target, since the two will be compared directly during analysis.
- Once all settings appear correct, participants should apply these configurations before visualizing results through a Data Table widget connection.
- Participants confirm successful setup of their views before continuing with further instructions on analyzing results.
Data Processing and Visualization in Orange
Loading the Dataset
- The speaker discusses the initial steps of loading a numerical dataset into the Orange software, emphasizing that this has already been completed.
- The configuration of the widget is highlighted, specifically using the "Select Columns" widget to connect with the data table for visualization purposes.
Configuring Select Columns Widget
- Instructions are provided on how to access and configure the "Select Columns" widget, including connecting it to the data table.
- The speaker explains how to select specific columns from a list, guiding users through selecting items from one column to another.
Finalizing Column Selection
- Users are instructed on how to select multiple items using control-click and move them into a target area within the widget interface.
- Additional information such as numbers and gender is mentioned as necessary selections for inclusion in the target area.
Procedural Steps with Widgets
- Confirmation is requested from participants regarding their progress before moving on to procedural steps involving additional widgets.
- The speaker introduces another widget called "C Diagram," explaining its connection process with previous widgets for further analysis.
Visualization Configuration
- Guidance is given on configuring visualizations by selecting appropriate columns for display in graphs, ensuring clarity in data representation.
- Specific instructions are reiterated about which variables should be selected for visualization, aiming for an accurate graphical output.
Review and Clarification Requests
- A participant requests clarification on earlier steps related to loading data into Orange, indicating some confusion that needs addressing.
- The speaker reiterates essential steps for loading Excel files into Orange, emphasizing proper routing of files during setup.
Completing Data Setup
- Further details are provided about organizing selected columns correctly within Orange's interface before proceeding with analysis.
- Participants are reminded about selecting key variables like age and gender while preparing their datasets for analysis.
Graphical Output Verification
- After completing configurations, participants confirm successful generation of graphs based on their selected data inputs.
Statistical Insights
- The importance of understanding sample size representation from Excel outputs is discussed as part of statistical analysis preparation.
- A significance value derived from comparing the pretest and posttest variables is introduced as crucial information influencing decision-making processes.
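The session does not name the exact test behind the significance value Orange reports. A common choice for comparing paired pretest and posttest totals is a paired t-test; the sketch below uses synthetic data (random pretest totals plus a simulated intervention effect) purely to illustrate the comparison, not the class's real dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 378 hypothetical pretest totals (8 items scored 1-5, so totals range 8-40)
pre = rng.integers(8, 41, size=378).astype(float)

# Simulated posttest: the session's illustrative +7 shift, plus noise
post = pre + 7 + rng.normal(0, 2, size=378)

# Paired t-test on the same subjects before and after the intervention
t_stat, p_value = stats.ttest_rel(post, pre)
print(p_value < 0.05)  # True: the simulated improvement is significant
```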
Understanding Significance in Hypothesis Testing
Key Concepts of P-Value and Hypothesis Testing
- The significance value (P) is introduced as 0.00, indicating the level of significance in hypothesis testing.
- A rule states that if the significance value is greater than 0.05, the null hypothesis should not be rejected, implying no statistical significance.
- Conversely, if the P-value is less than 0.05, it leads to rejecting the null hypothesis and indicates statistical significance.
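The decision rule stated above can be written as a tiny helper, which makes the two branches explicit:

```python
ALPHA = 0.05  # conventional significance threshold used in the session

def decide(p_value, alpha=ALPHA):
    """Apply the session's decision rule for the null hypothesis H0."""
    if p_value < alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not statistically significant)"

print(decide(0.00))  # → "reject H0 (statistically significant)"
print(decide(0.20))  # → "fail to reject H0 (not statistically significant)"
```

With the reported p-value of 0.00, the second rule applies and the null hypothesis is rejected.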
Application of Rules in Hypothesis Testing
- The speaker poses a critical question regarding which rule to follow based on a P-value of 0.00, concluding that it falls under the second rule for rejection of the null hypothesis.
- This decision confirms that results are statistically significant and demonstrates how hypothesis testing has been simplified.
Structuring Research Findings
- The speaker emphasizes documenting both hypotheses clearly: an alternative hypothesis supporting research and a null hypothesis suggesting no influence.
- The context involves students in their final semesters, with independent variables being the intelligent virtual tutoring model affecting thesis work.
Data Visualization and Interpretation
- Instructions are given on presenting data visually using machine learning models to illustrate findings effectively through graphs.
- Emphasis is placed on interpreting results by summarizing hypotheses and utilizing tools like GPT for generating textual explanations based on visual data.
Final Steps in Analysis
- The process includes copying relevant hypotheses into analytical software for interpretation based on provided diagrams.
- Conclusively, there’s a call to perform correlation tests following initial analysis to ensure comprehensive understanding beyond just statistical significance.
What is Correlation?
Introduction to Correlation Widget
- The concept of correlation is introduced, with a focus on using the correlation widget in Orange.
- Instructions are provided on how to connect the correlation widget by selecting columns and configuring settings.
Setting Up the Correlation View
- Users are guided to achieve a specific view after connecting the widget, comparing their setup with an example.
- Emphasis is placed on ensuring that not all items are included; users should select their target variable (total post).
Analyzing Results
- The goal is to observe a high correlation value (e.g., 0.98), indicating a strong relationship between variables.
- Clarification is given regarding the expected output layout, confirming that total post should be at the top.
Conducting Correlation Tests
Performing Spearman's Rank Correlation Test
- Users are instructed to conduct Spearman's rank correlation test for 378 elements using machine learning models.
- A step-by-step process for capturing results from Orange and preparing them for reporting is outlined.
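Outside Orange, the same Spearman test can be run with SciPy. The data below is synthetic (random pretest totals plus a simulated shift), standing in for the 378 real responses:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# 378 hypothetical pretest totals and strongly related posttest totals
pre = rng.integers(8, 41, size=378).astype(float)
post = pre + 7 + rng.normal(0, 1.5, size=378)

# Spearman's rank correlation, as computed by Orange's correlation widget
rho, p_value = stats.spearmanr(pre, post)
print(rho > 0.9)  # True: strongly related pre/post totals rank almost identically
```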
Utilizing the Scatter Plot Widget
- The Scatter Plot widget is introduced as a tool for visualizing data correlations, with instructions on its connection and configuration.
Configuring the Scatter Plot
Step-by-Step Configuration
- Users learn how to set up axes in the Scatter Plot: X-axis for pretest scores and Y-axis for posttest scores.
- An R-value close to 1 indicates a strong positive correlation; users share their findings.
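Orange's Scatter Plot reports this correlation automatically; in plain Python the same coefficient can be computed with NumPy. As before, the pre/post arrays here are synthetic stand-ins, not the class's dataset:

```python
import numpy as np

rng = np.random.default_rng(2)

# X axis: hypothetical pretest totals; Y axis: related posttest totals
pre = rng.integers(8, 41, size=378).astype(float)
post = pre + 7 + rng.normal(0, 1.5, size=378)

# Pearson correlation coefficient for the two axes
r = np.corrcoef(pre, post)[0, 1]
print(r > 0.9)  # True: the points cluster tightly around a rising line
```

Plotting `pre` against `post` (e.g., with matplotlib's `scatter`) would show the same near-linear upward trend the widget displays.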
Interpreting Results with GPT
Copying Data for Analysis
- Instructions are provided on copying results into GPT for interpretation, emphasizing clarity in what needs analysis.
Understanding R-values
- Discussion revolves around interpreting R-values, highlighting that higher initial scores correlate with better outcomes in dependent variables.
Intelligent Virtual Tutoring Model and Research Performance
Impact of Intelligent Tutoring on Student Performance
- The intelligent virtual tutoring model is identified as a significant predictor of research performance among final semester students at Universidad Blanca.
- This correlation not only validates the relationship between variables but also highlights its pedagogical and methodological effectiveness.
Hypothesis Testing in Research
- Students are advised to articulate the influence of their model in percentage terms when presenting hypotheses, emphasizing clarity in statistical significance.
- A specific example illustrates that an r-value of 0.988 (a 98.8% correlation) can be presented to a tribunal as evidence supporting the alternative hypothesis.
Simplifying Data Analysis
- The speaker contrasts traditional SPSS methods with modern approaches, suggesting that simpler methods can yield effective results without extensive complexity.
- Emphasis is placed on closing chapters on hypothesis testing succinctly while introducing advanced techniques like algorithms for item evidence generation.
Advanced Predictive Techniques
- Discussion includes predictive modeling using various data analysis techniques, hinting at future courses focused on machine learning applications.
- The speaker mentions the ability to visualize predictions through tables and RAM chains, although these topics may extend beyond current discussions.
Transitioning from Traditional to Modern Methods
- A comparison is made between traditional SPSS processes and contemporary machine learning practices, highlighting ease of use and efficiency in data handling.
- Machine learning allows for straightforward data conversion and hypothesis testing, making it more accessible than previous methodologies.
Project Submission Instructions
- Students are instructed to save their projects correctly, ensuring proper file management by naming conventions related to hypothesis testing.
- Guidance is provided on organizing project files into folders and compressing them into a RAR archive with WinRAR for submission.
Project Submission and Review Process
Instructions for Project Submission
- The speaker requests participants to upload their complete projects and Excel files to a WhatsApp group within two minutes, emphasizing the need for completeness.
- It is highlighted that both the Excel data and project must be functional together; if the project does not execute, it indicates an issue with the uploaded Excel file.
- A warning is given that no further submissions will be accepted after this point, as the speaker prepares to review submitted projects.
Challenges in Project Execution
- One participant expresses difficulty with configuration steps in Orange and mentions issues with graphical outputs due to missing data.
- The participant plans to revisit instructional videos for clarification on how to resolve these issues before final submission.
Review of Submitted Projects
- The speaker begins reviewing submissions, starting with Alarcón's project. Initial checks reveal incomplete data regarding pre and post conditions.
- After correcting some initial errors, they validate Alarcón's project against its data inputs.
Data Validation Process
- The speaker continues validating each project's data integrity by checking specific components like Scatter Plot configurations.
- They emphasize that all necessary elements should function correctly without additional input from participants during validation.
Final Checks and Feedback
- As they move on to María Torres' submission, it becomes evident that her work is also incomplete; only partial data has been provided.
- Adriana Gareca’s submission shows more promise as it includes required elements but still requires adjustments in certain areas like Scatter Plot settings.
Common Errors Identified
- Participants are reminded of common mistakes such as failing to configure essential components properly or miscalculating totals in their Excel sheets.
- The importance of thoroughness in completing all aspects of the project is reiterated, highlighting learning opportunities from these errors.
WhatsApp Communication and Project Guidelines
Overview of Final Project Requirements
- The final project involves collaboration between colleagues Arcón Rosario and Adriana Gareca, who will submit a specific Excel file as part of their work.
- Students are required to process raw data provided by the instructor, transforming it into a structured format for their projects.
- Deliverables include an Excel file, the processed data, the project itself, and a PDF report summarizing all findings.
Activity Four: Practical Questionnaire
- A practical questionnaire will be administered based on previous sessions to assess students' understanding of the material covered.
- Participation in this activity is crucial; only those who have engaged with prior sessions will be able to answer effectively.
Extension of Activities
- A request was made for extended access to activities due to scheduling conflicts faced by some students during the week.
- The instructor agreed to reopen all activities until February 1st, allowing students ample time to complete their assignments without rush.
Encouragement for Timely Completion
- Students are encouraged not to procrastinate and utilize the available time throughout the week leading up to deadlines.
Introduction of New Tools and Resources
- The instructor introduced a new model for working on projects via Zoom, emphasizing its advanced features compared to previous versions.
- Students must log in using their Gmail accounts to access resources shared in the chat.
Features of New Software Version
- The latest software version (4.7) offers enhanced capabilities for various tasks including coding and research functionalities.
Demonstration of Practical Applications
- A demonstration was conducted showing how students can create personal web pages or resumes using the new software tools available.
Interactive Coding with AI Models
Overview of the Model's Capabilities
- The model can generate code in HTML format, allowing execution without internet access. It processes information gradually, showcasing its interactive capabilities.
- This model offers more resources and potential than GPT, enabling real-time adjustments to instructions as it processes knowledge.
- Users may need to clear their browser cache if they encounter issues, as the model consumes significant resources during operation.
User Interaction and Adjustments
- The model generates a visual profile that requires user input for improvement; users can add information through menus.
- Users are encouraged to provide detailed instructions for each menu option to enhance the output quality in real time.
Comparison with Other AI Models
- The discussed model is positioned as superior to Gemini for tasks like laboratory work and educational presentations due to its unique features.
- Each AI model has specific strengths; while Gemini excels in certain areas, GPT is noted for its reasoning capabilities.
Performance Metrics and Insights
- A classification system (RG1) is used to evaluate models based on reasoning performance; recent data shows improvements in GPT 5.2's optimization for reasoning tasks.
- Evidence from recent sessions indicates a rise in performance metrics, highlighting the effectiveness of structured inquiries when using these models.
Limitations and Considerations
- While Grok shows potential in reasoning, it currently ranks lower compared to other models like GPT 5.2; Claude also has limitations despite being useful for coding tasks.
- Chat Z offers unique interactive scenarios but lacks the reasoning depth found in GPT; users must choose models based on specific task requirements.
Conclusion of Session
- Participants are reminded of the importance of selecting appropriate models based on their intended use cases, emphasizing a blend of different tools for optimal results.
Discussion and Gratitude for Learning
Expression of Thanks
- The speaker expresses gratitude towards the doctor for the teachings received during the module, indicating a positive learning experience.
- Acknowledgment of significant learning from the doctor, highlighting the impact of their instruction on students.
- The speaker repeatedly thanks both the doctor and another individual named Néor, emphasizing appreciation for their contributions to education.
Inquiry About Tools Used
- The speaker seeks clarification regarding the virtual whiteboard utilized by the doctor during lessons, indicating interest in effective teaching tools.