Cleaning Data in Excel: Microsoft Excel Crash Course

Cleaning Data in Excel: Microsoft Excel Crash Course

Cleaning Data in Excel Webinar

Introduction to the Webinar

  • Deborah Ashby introduces herself as a Microsoft trainer with over 25 years of experience in supporting and training on Microsoft applications.
  • Attendees are encouraged to use the chat panel for messages, but Deborah notes she won't see it while sharing her screen.
  • Questions can be noted down for discussion at the end of the webinar; there is no assistant present to relay questions during the session.

Housekeeping and Recording Information

  • The webinar is being recorded, and attendees will receive a copy via a monthly newsletter towards the end of the month.
  • Participants are encouraged to review materials at their own pace after the session if needed.

Agenda Overview

  • The agenda includes discussing why data cleaning is important, common issues encountered with external data, and methods for addressing these issues.
  • Topics include adjusting rows/columns, deleting blank rows, filling blank cells, removing duplicates, clearing formatting issues, and converting text-stored numbers back to numeric format.

Importance of Cleaning Data

  • Cleaning data is crucial for accurate analysis; often overlooked but integral when working with datasets from various sources.
  • Common scenarios involve receiving poorly formatted spreadsheets or downloading data from third-party systems that require tidying up before analysis.

Consequences of Not Cleaning Data

  • Failing to clean data can lead to messy and inaccurate analyses which complicate tasks like creating pivot tables or using lookup formulas effectively.
  • Ensuring consistency in formatting helps prevent errors that could arise during analysis.

Common Issues Encountered in Data

  • Inconsistent column widths and row heights are among several common problems faced when dealing with unclean datasets.

Data Cleaning Challenges in Excel

Common Issues with Data Sets

  • Blank rows, columns, or cells can disrupt analysis, particularly when using pivot tables. Duplicate data also poses a significant problem.
  • Inconsistent case and spacing can make data appear messy and difficult to read. Numbers formatted as text can lead to calculation errors in formulas.
  • Incorrect column formatting (e.g., dates formatted as text) affects data display. Messy formatting from external sources often complicates the dataset.
  • Data may require splitting (e.g., separating first names from last names), and abbreviations might need replacing with full words for clarity.
  • Ensuring headers are correctly specified simplifies analysis. Random characters or line breaks should be removed to maintain data integrity.

Introduction to the Example Dataset

  • The session will utilize a sales data file that has been intentionally created with various issues for demonstration purposes.
  • The speaker expresses concern about audio issues experienced in previous sessions, confirming that participants can hear them clearly this time.

Observations on the Example Dataset

  • The dataset appears disorganized; immediate issues include insufficient row/column sizes leading to cut-off data and inconsistent cell formatting (e.g., borders).
  • Blank rows are present throughout the dataset, along with inconsistencies in spacing and case within certain columns.

Initial Steps for Data Cleanup

  • To improve visibility of the dataset, autofitting column widths and row heights is recommended. This allows all content to be displayed properly without obstruction.
  • Autofitting can be done quickly by selecting all columns and double-clicking on any border between them; this adjusts their width automatically based on content size.

Addressing Blank Rows

  • For large datasets containing multiple blank rows, manually deleting each one is impractical. Efficient methods for identifying and removing these blanks will be discussed later in the session.

Excel Tips for Managing Data

Efficiently Deleting Blank Rows

  • The speaker emphasizes the importance of avoiding time-consuming individual deletions of blank rows in Excel. Instead, they suggest using the "Go To Special" feature to select all columns quickly.
  • A keyboard shortcut (Ctrl + G) is introduced to access "Go To," allowing users to select all blank cells within a dataset efficiently.
  • After selecting blank rows, users can delete them in one action via the Home ribbon, significantly speeding up data management tasks.

Handling Blank Cells

  • The speaker advises that it's better to fill blank cells with a value (like zero) rather than leaving them empty for analysis purposes.
  • Users can again utilize "Go To Special" to highlight blank cells and input a zero across all selected blanks simultaneously by pressing zero followed by Ctrl + Enter.

Clearing Formatting

  • The discussion shifts to formatting issues, such as inconsistent cell backgrounds and borders. The speaker suggests clearing these formats for consistency.
  • Selecting all data can be done using Ctrl + A; however, if clicked outside the dataset, it selects the entire spreadsheet instead.
  • In the Home tab under editing options, users can choose "Clear Formats" to remove unwanted formatting easily.

Removing Duplicates

  • Identifying exact duplicates in datasets is crucial. Excel provides a "Remove Duplicates" button located on the Data ribbon under Data Tools.
  • Users are guided on how to specify headers and select columns when searching for duplicates, ensuring only exact matches are removed from their dataset.

Converting Text to Numbers

  • The speaker highlights an issue where numbers may be misidentified as text due to alignment differences in Excel (left-aligned vs right-aligned).
  • For small datasets, converting text back into numbers can be done individually by clicking on warning triangles next to affected cells.
  • For larger datasets with many instances of this issue, copying a number (e.g., 1), then pasting it into selected cells allows bulk conversion from text back into numeric format.

Excel Data Formatting Techniques

Converting Text to Numbers

  • To convert numbers stored as text back to numeric format, select the data, use "Paste Special," and choose "Multiply." This method is efficient for large datasets.

Importance of Number Formatting

  • Proper number formatting is crucial in Excel. Most cells default to "General" format, which may not be suitable for all data types like currency or dates.

Setting Correct Formats for Different Data Types

  • Ensure that numerical columns such as "Units Sold" are set to "Number" format. Adjust decimal places as needed.
  • For financial figures (e.g., manufacturing price), select multiple columns and press Ctrl + 1 to access the Format Cells dialog box.

Accounting vs. Currency Format

  • The difference between accounting and currency formats lies in alignment: accounting aligns decimal points vertically and positions the currency symbol on the left, making it easier to read down a column of numbers.

Handling Date Formats in Excel

  • Dates in Excel are represented as serial numbers (days since January 1, 1900). Change the cell format from General to Short Date to display them correctly.

Cleaning Up Text Data

  • Inconsistent casing and spacing issues can occur in text columns (e.g., country names). Use Excel functions to standardize these entries.

Using Helper Columns for Text Cleanup

  • Add a helper column using Ctrl + Shift + Plus. Utilize functions like UPPER, LOWER, or PROPER to adjust text case accordingly.

Removing Erroneous Spaces with TRIM Function

  • Combine proper case adjustments with the TRIM function to eliminate unwanted spaces at the beginning or end of words, ensuring clean data for analysis.

Data Cleaning Techniques in Excel

Combining Functions for Data Cleanup

  • The speaker demonstrates how to combine the TRIM function with the PROPER command in Excel to clean up text data efficiently. This is done by adding TRIM at the beginning of the formula and ensuring it is closed properly.
  • The discussion introduces handling line breaks, which can occur when users press Alt + Enter. The speaker explains that these can be removed using the CLEAN function, which also eliminates non-printing characters.
  • By integrating CLEAN, TRIM, and PROPER, users can run three functions on a single piece of data, resulting in a tidy formula that cleans up text effectively.

Addressing Specific Text Issues

  • A potential issue arises with "USA" being converted to proper case. The speaker notes this will be addressed later as they plan to change its representation.
  • When attempting to delete a column containing formulas, reference errors occur due to dependencies on other cells. The solution involves using Paste Special to paste values only, removing underlying formulas while retaining visible text.

Finalizing Data Preparation

  • To correct specific entries like "USA," the speaker uses Find and Replace (Ctrl + H), replacing all instances with a capitalized version efficiently.
  • Emphasizing the importance of spell checking, the speaker highlights that misspelled words can lead to analysis errors by treating similar terms as distinct entities.

Structuring Data for Analysis

  • Before analysis, it's recommended to convert data into an Excel table using Ctrl + T. This action enables better organization and access to additional tools via a new ribbon called Table Design.
  • Tables are automatically named generically (e.g., Table1). It's beneficial to rename them meaningfully (e.g., sales_data), avoiding spaces for easier reference in future analyses.
  • Tables auto-expand when new data is added, making them crucial for maintaining updated charts or pivot tables without manual adjustments.

Pivot Tables and Data Cleaning in Excel

Introduction to Pivot Tables

  • The speaker demonstrates creating a pivot table after cleaning the data set, emphasizing the importance of having a clean data set for accurate analysis.
  • A new worksheet is created for the pivot table report, which imports all column headings from the cleaned data into the pivot table fields area.

Building and Customizing Pivot Tables

  • Users can drag and drop column headings into different areas of the pivot table to perform dynamic analyses; changes can be made easily by rearranging fields.
  • The speaker illustrates how to analyze gross sales by country using a simple drag-and-drop method within the pivot table interface.

Importance of Data Cleaning

  • Emphasizes that without prior data cleaning, results may include blanks or incorrect totals, leading to confusion in interpreting the pivot table report.

Formatting Pivot Table Values

  • The speaker shows how to format numbers in the pivot table using accounting format and adjusting decimal places for better presentation.
  • Grand totals are turned off before creating a basic column chart from the pivot table to enhance clarity.

Enhancing Visual Presentation

  • Techniques for improving chart aesthetics include hiding filter buttons, removing unnecessary legends, and displaying actual values above bars for clearer communication of data.
  • Adjustments such as changing bar widths and applying custom number formatting (e.g., representing millions with 'm') are discussed to make charts visually appealing.

Final Touches on Charts

  • Further customization includes formatting data labels inside bars with bold white text and adding descriptive titles like "Gross Sales by Country" for context.

Exploring Flash Fill

Introduction to Flash Fill

  • The speaker introduces Flash Fill as an efficient tool in Excel that simplifies data entry tasks. It is described as a "Genius Button" that automates repetitive actions.

How to Import and Clean Data in Excel

Introduction to the Text File

  • The speaker introduces a text file shared via chat, indicating it may originate from a third-party system.
  • The data is not easily readable; it includes column headings such as employee name, hire date, salary, department, and job title separated by semicolons.

Importing Data into Excel

  • The process of importing the text file into Excel is outlined. Users need to navigate to "File" and select "Open" to locate their text file.
  • It’s noted that Excel defaults to showing only .xlsx files; users must select "All Files" to view the text file.
  • Upon opening the text file, Excel recognizes it as a non-Excel format and prompts the user with the Text Import Wizard.

Configuring Delimiters in Excel

  • Users are instructed to choose "Delimited" for the type of data and confirm that their data has headers before proceeding.
  • The wizard allows users to specify semicolon as the delimiter, which helps Excel identify where each column begins.

Cleaning Up Imported Data

  • After importing, users can adjust column widths for better readability.
  • The speaker highlights an issue with names being displayed in uppercase and demonstrates how to rearrange them using Flash Fill for proper formatting.

Utilizing Flash Fill for Quick Formatting

  • Flash Fill is introduced as a tool that simplifies formatting tasks by recognizing patterns. Users can invoke it by typing desired formats next to existing data.
  • Two additional methods of invoking Flash Fill are discussed: typing examples or using the Flash Fill button on the Data ribbon.

Finalizing Data Preparation

  • Further cleaning techniques include changing job titles to lowercase using Flash Fill again for efficiency.
  • Once cleaned up, users can apply currency or date formatting as needed before converting their dataset into a table (Ctrl + T).

Exploring Power Query for Data Management

  • The speaker mentions Power Query as an alternative tool for data cleaning that offers a graphical interface, making it easier for those who struggle with formulas.
  • Power Query is also beneficial for combining multiple files seamlessly during data analysis processes.

Combining and Cleaning Sales Data with Power Query

Introduction to File Setup

  • The speaker mentions the need for specific file organization before sharing files via email, ensuring that participants can download them correctly.
  • Emphasizes the importance of having files in exact folders for proper functionality, indicating a structured approach to data management.

Overview of Sales Data Files

  • Introduces two sales data files from 2017 and 2018, highlighting their structure which includes columns like date, order ID, customer number, trip details, salesperson, trip price, and year.
  • Describes the context of the data as related to a travel agency's annual sales records stored in a designated folder.

Automating Data Updates

  • Discusses the goal of automating updates when new files are added to the folder by combining existing files into one cohesive dataset.
  • Begins demonstrating how to connect Excel to the folder containing sales data using Power Query.

Connecting Excel to Folder

  • The speaker selects the sales data folder on their desktop to establish a connection between Excel and this location.
  • Confirms successful identification of both 2017 and 2018 sales files within the selected folder.

Combining Files in Power Query

  • Explains that if files share identical structures (same column headings), they can be combined into one comprehensive file for analysis.
  • Mentions an option for transforming data if needed before combining it; encourages checking for necessary clean-up tasks.

Navigating Power Query Interface

  • Introduces Power Query as a tool for cleaning up data visually rather than manually in Excel.
  • Lists capabilities such as removing blank rows, splitting columns by delimiter, find-and-replace functions, and formatting adjustments available within Power Query.

Combining Sales Data Files

  • Demonstrates how to combine both 2017 and 2018 sales files into one long dataset using downward arrows in Power Query.
  • Shows a preview of combined content confirming successful integration of both datasets while maintaining clarity on source names.

Preparing Data for Analysis

  • Discusses reviewing combined data columns before creating pivot tables; emphasizes evaluating whether all columns are necessary.
  • Highlights removing unnecessary columns (like source name), ensuring only relevant information is retained for analysis.

Data Cleaning and Analysis Techniques in Power Query

Overview of Data Tidying

  • The speaker discusses the importance of checking number formats in data to ensure accuracy before analysis.
  • A specific example is given regarding a trip column that combines trip codes and names, which complicates analysis using pivot tables.

Splitting Columns for Better Analysis

  • To facilitate analysis, the speaker suggests separating the trip code from the trip name into distinct columns.
  • In Power Query, this can be done easily by selecting the column and using the "Split Column" feature by delimiter (in this case, a dash).

Cleaning Up Erroneous Spaces

  • After splitting columns, there may be unwanted spaces; applying a trim function helps clean these up effectively.
  • The speaker mentions additional formatting options available in Power Query such as cleaning functions and text transformations.

Loading Data into Pivot Tables

  • Once data is cleaned, it can be loaded directly into a pivot table for further analysis.
  • The process includes creating a pivot table report where fields like trip name and price can be analyzed over different years.

Refreshing Data Connections

  • When new data (e.g., 2019 sales data) is added to the connected folder, refreshing the pivot table updates all related analyses automatically.
  • This automatic refresh capability highlights an advantage of using Power Query for maintaining up-to-date analyses without manual intervention.

Summary of Techniques Learned

  • The session concludes with a recap of techniques covered: cleaning data in Excel vs. Power Query, utilizing Flash Fill for quick fixes, and establishing connections for ongoing data importation.
  • Emphasis is placed on how these skills are beneficial for anyone involved in extensive data analysis tasks.
Video description

✅Download the free demo file and follow along here ➡️https://www.simonsezit.com/cleaning-data-in-excel-webinar-demo-files In this Microsoft Excel tutorial, we'll walk you through various formulas and functions that can help you with cleaning data in Excel. We'll discuss the common issues you may encounter with your data set and teach you various ways to clean and manipulate data before moving on to using Power Query to import, combine, and clean up data. Check out the latest Microsoft courses from Simon Sez IT 🖥️ https://www.simonsezit.com/course-category/microsoft/ 🌍 Get unlimited training with Simon Sez IT’s 200+ courses ⏩ https://www.simonsezit.com/courses/ 🚀 Supercharge your skills with a FREE 14-day trial to our extensive training library. Learn from 10,000+ ad-free videos and begin your transformation today! ➡️ https://simonsezit.lpages.co/ssit-14-day-free/ Looking for Excel tricks or Excel tutoring? Check out our Excel 2021 tutorials for intermediate users playlist here: 🔥https://www.youtube.com/playlist?list=PLzj7TwUeMQ3jhT7yg37LKe8aSTA7OuWn3 For more recorded webinars from Simon Sez IT, take a look at this playlist 🔥 https://www.youtube.com/playlist?list=PLzj7TwUeMQ3i4Byo1_NLrPDrxNp2NK66s Check out our FREE 300+ hour training course playlist here ➡️ 🔥https://www.youtube.com/playlist?list=PLzj7TwUeMQ3hsADWJd2sJh4o8jATTyaRU 🖥️Watch more popular Microsoft tutorials from Simon Sez IT: ✅Excel 2021 Tutorials for Beginners playlist 👉https://www.youtube.com/playlist?list=PLzj7TwUeMQ3jqRlmIaj6jb_X9EGQueMDb ✅Microsoft Office 2021 and Office 365 Training Tutorials 👉https://www.youtube.com/playlist?list=PLzj7TwUeMQ3hH_MxteY6LR3OSMHpDkt_j 💬Stay in touch! ✅SimonSezIT.com: 👉https://www.SimonSezIT.com/ ✅StreamSkill.com: 👉https://StreamSkill.com/ ✅YouTube Channel: 👉https://www.youtube.com/user/simonsezittraining ✅LinkedIn: 👉https://www.linkedin.com/company/simon-sez-it/ Please SUBSCRIBE 🔔 https://www.youtube.com/simonsezittraining?sub_confirmation=1 ♥️ We appreciate your support! If you enjoyed the video, please give a thumbs up 👍🏽 ;-)