CS50x 2025 - Lecture 4 - Memory
Introduction to Pointers
- CS50 Week 4 focuses on memory and introduces pointers, a key concept in understanding how computers work.
Challenges of Learning Pointers
- The topic may be challenging; it might not fully sink in during the first exposure.
- Expect a plateau in difficulty soon, leading to smoother sailing after this week.
Representation of Images
- Information representation is crucial; images consist of pixels with bits/bytes defining colors.
- Enhancing an image reveals pixelation rather than clarity due to finite information.
Bitmap Images Explained
- Bitmap images are grids of bits (0's and 1's), where each bit represents color states like black or white.
Color Representation in Images
- Color images require more than one bit per pixel; typically, 24 bits (3 bytes for RGB).
Student Art Project Introduction
- Two students demonstrate pixel art using Post-it notes as their canvas.
Student Introductions and Artwork Reveal
- Students introduce themselves and reveal their artwork created from a grid of pixels.
Audience Interaction on Artwork Interpretation
- Audience guesses the artwork's subject, which is identified as a palm tree on an island.
Past Student Art Examples
Understanding Data Manipulation
- Introduction to manipulating data at a lower level, focusing on images and their RGB representation.
- Introduction of hexadecimal (base 16) as an additional numeral system alongside binary (base 2) and decimal (base 10).
- Explanation of hexadecimal notation in tools like Photoshop, where colors are represented by six-digit codes.
Color Representation in Hexadecimal
- Black is represented as `000000` in hexadecimal, indicating no red, green, or blue.
- White is represented as `FFFFFF`, showing full intensity of red, green, and blue light.
- Specific color codes: `FF0000` for red, `00FF00` for green, and `0000FF` for blue.
Hexadecimal System Explained
- Hexadecimal uses letters A-F to represent values 10-15; this is a convention for easier representation.
- The system allows humans to work with more symbols than binary while still being compatible with computer memory.
- Hexadecimal notation simplifies the representation of numbers beyond the decimal system.
Mathematics Behind Hexadecimal
- Hexadecimal operates similarly to other numeral systems but uses powers of 16 instead of 10 or 2.
- Two-digit hexadecimal values represent numbers from 0 to 255; e.g., `00` is 0 and `01` is 1.
- Counting continues up to `0F`, which represents decimal value 15.
Counting in Hexadecimal
- In hexadecimal, decimal value 10 is written as `0A`.
- The highest two-digit hexadecimal number is `FF`, equal to decimal value 255.
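The place-value arithmetic described above can be sketched in C. The helper names `hex_digit_value` and `hex_pair_value` are hypothetical, chosen only for illustration; this is a minimal sketch rather than anything shown in the lecture.

```c
// Value of one hexadecimal digit: '0'-'9' map to 0-9, 'A'-'F' map to 10-15.
// Lowercase 'a'-'f' is accepted too. Returns -1 for any other character.
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
    {
        return c - '0';
    }
    if (c >= 'A' && c <= 'F')
    {
        return c - 'A' + 10;
    }
    if (c >= 'a' && c <= 'f')
    {
        return c - 'a' + 10;
    }
    return -1;
}

// Value of a two-digit pair (both digits assumed valid):
// the left digit is the 16s place, the right digit is the 1s place.
int hex_pair_value(char high, char low)
{
    return 16 * hex_digit_value(high) + hex_digit_value(low);
}
```

For instance, `hex_pair_value('F', 'F')` works out to 16 × 15 + 15 = 255, matching the largest value a single byte can hold.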
Understanding Hexadecimal Representation
- To represent numbers 0 through 15, four bits are needed in binary.
- Two hexadecimal digits can describe a single byte (8 bits), making it visually simpler.
- Memory addresses in computers use hexadecimal notation instead of decimal or binary.
Hexadecimal Ambiguities
- Hexadecimal values that happen to use only the digits 0-9 (such as 10, which means sixteen in hexadecimal) can be mistaken for decimal numbers.
- Conventionally, "0x" is prefixed to hexadecimal numbers to eliminate ambiguity for readers.
Practical Application of Variables
- A variable `n` is initialized to 50; the program will explore memory locations.
- Computer memory has unique addresses similar to real-world mailboxes.
Memory and Data Representation
- The integer `n` is stored in memory as a pattern of bits representing the number 50.
- Typically, integers occupy 4 bytes (32 bits), which represent values like 50 in binary.
Exploring Memory Addresses
- The variable `n` might be located at an arbitrary address like `0x123`.
Understanding Pointers and Memory Addresses in C
Address of Operator and Dereference Operator
- The address-of operator (`&`) gives the memory location of a variable.
- The dereference operator (`*`) accesses the value at a given memory address.
- Both operators work together: one retrieves an address, while the other accesses data at that address.
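A tiny sketch of how the two operators cooperate; the function name `value_via_address` is hypothetical and exists only for illustration.

```c
// Takes the address of n with &, then follows that address back with *.
// The two operators undo each other, so the result is n's own value.
int value_via_address(int n)
{
    return *&n;
}
```

Calling `value_via_address(50)` gives back 50: `&n` produces the address where `n` lives, and `*` goes to that address to fetch what is stored there.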
Using printf with Pointers
- The `printf` format code `%p` prints pointers or memory addresses.
- A pointer is a variable that stores an address, declared using `int *p`.
- Syntax for declaring a pointer involves specifying the type it points to followed by `*`, e.g., `int *p`.
Assigning Addresses to Pointers
- Use `&n` to get the address of variable `n` and assign it to pointer `p`.
- Code reads right to left; thus, `&n` retrieves the address of `n`, which is then stored in `p`.
- This process allows low-level control over memory management in C.
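The declaration and assignment described above might look like the following sketch. The wrapper function `show_address` is a hypothetical name used here so the steps can sit in one place; the cast to `void *` is what `%p` formally expects.

```c
#include <stdio.h>

// Stores n's address in a pointer, prints the address, then reads n through it.
int show_address(void)
{
    int n = 50;
    int *p = &n;                 // p holds the address of n

    printf("%p\n", (void *) p);  // prints an address; the exact value varies
    printf("%i\n", *p);          // follows the address and prints the value there

    return *p;
}
```

The first line of output is some hexadecimal address, and the second is the value stored at that address.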
Compiling and Running Code
- Compile code using commands like `make addresses` and run with `./addresses`.
- Errors can occur if syntax mistakes are made, such as missing semicolons.
- Memory addresses can vary each time the program runs, since the operating system decides where the program's memory is placed.
Recap on Pointer Usage
- The variable `p` holds the address of integer variable `n`.
- Recompiling may yield different addresses due to changes in memory organization.
Understanding Pointers in C
Memory and Variable Addresses
- Discusses the importance of understanding variable memory locations and how to access them.
- Introduces the concept of printing variable addresses using `printf` with `%p`.
- Compares methods of printing variable values from previous weeks.
Operators: Ampersand and Star
- Explains the use of ampersand (`&`) for getting addresses and star (`*`) for dereferencing pointers.
- Highlights the complexity of using the star operator in different contexts within C programming.
- Clarifies that the star is used both for declaring pointers and dereferencing them.
Pointer Size and Memory Representation
- Describes how pointers are declared with types but used without specifying types afterward.
- Illustrates a memory grid showing where variables like `n` and `p` reside in memory.
- Notes that pointer sizes are typically 64 bits on modern systems due to large memory capacities.
Questions About Pointers
- Answers questions about multiple pointers in main, emphasizing naming conventions for differentiation.
- Confirms that pointers have their own addresses, leading to concepts like "pointers to pointers."
Understanding Pointers and Memory
- Pointers allow access to specific memory locations, but the actual address is often irrelevant to programmers.
- Discussion on virtual memory mapping to physical memory; focus remains on practical applications rather than technical details.
- The concept of pointers abstracts away specific addresses, using arrows as a metaphor for pointing to memory locations.
Building Data Structures with Pointers
- Upcoming lessons will involve creating data structures in memory, utilizing pointers to connect various locations.
- Addresses in memory can be likened to physical addresses, such as mailboxes representing variables like pointers.
- Example of pointer 'p' storing an address while another variable 'n' exists at a different address.
Dereferencing Pointers
- Dereferencing a pointer involves accessing the value stored at the address it points to, akin to checking a mailbox.
- Demonstration of dereferencing with visual aids (foam fingers), illustrating how pointers lead to actual values in memory.
- Audience engagement during the demonstration highlights understanding of dereferencing concepts.
Revisiting Strings in C
- Clarification that strings are not a built-in data type in C; they are treated as arrays of characters instead.
- The term "string" is convenient but does not exist as a keyword in C programming language.
- Example provided showing how strings are created and manipulated using character arrays and null terminators.
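As a rough illustration of a string being an array of characters plus a terminator, the sketch below builds "HI!" by hand. The function name `manual_string_length` is hypothetical.

```c
#include <string.h>

// Builds "HI!" one character at a time: three visible characters plus '\0'.
// Returns the length strlen reports, which stops counting at the '\0'.
size_t manual_string_length(void)
{
    char s[4];
    s[0] = 'H';
    s[1] = 'I';
    s[2] = '!';
    s[3] = '\0';  // without this byte, s would not be a valid C string

    return strlen(s);
}
```

The array needs four bytes even though the text has three characters, because the null terminator occupies a byte of its own.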
Common Errors with Strings
- Highlighting common mistakes when declaring strings, such as undeclared identifiers leading to errors during compilation.
Understanding Strings in C
The Nature of Strings
- `string` is undeclared in C by default; the compiler may even guess that it is a typo for `stdin`.
- Including `cs50.h` resolves the issue, allowing the use of strings in C.
- Strings are arrays of contiguous memory, with each character occupying one byte.
Memory Addresses and Pointers
- Each character's address can be calculated based on its position in memory.
- Using `%p` displays the address stored in a variable, printed in hexadecimal.
- Printing addresses using array notation shows that `s` has the same address as its first character, `&s[0]`.
How printf Handles Strings
- The `%s` format specifier iterates through characters until it encounters a null byte (0).
- In memory, `s` is a pointer storing only the address of the string's first byte.
- The null byte indicates where strings end, which is crucial for functions like `printf`.
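The walk-until-null behavior can be imitated with a short loop. This is only a sketch of the idea, not how `printf` is actually implemented, and `print_until_nul` is a hypothetical name.

```c
#include <stdio.h>

// Prints characters one at a time, starting at s and stopping at '\0'.
// Returns how many characters were printed.
int print_until_nul(const char *s)
{
    int count = 0;
    while (s[count] != '\0')
    {
        putchar(s[count]);
        count++;
    }
    return count;
}
```

Given only the address of the first character, the loop still finds the end of the string, because the null byte marks it.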
Comparison with Higher-Level Languages
- Other languages (e.g., Java, Python) track string length automatically, unlike C.
- C's efficiency comes from leaving string length management to developers.
Understanding Strings in C Programming
Overview of String Representation
- The course design helps understand high-level concepts before diving deeper into memory representation.
- Strings are declared as `string s = "HI!"`, but they are essentially addresses pointing to the first character.
- A string is defined as the address of its first character, leading to the use of `char *` for lower-level string handling.
Data Types and Typedef
- In C, a `char *` represents strings at a lower level; the end of the string is marked by a null terminator rather than by a stored length.
- Custom data types can be created using `typedef`, such as defining a person structure with attributes like name and number.
- Using `typedef` allows creating aliases for existing types, simplifying code readability.
Creating Synonyms for Data Types
- An example of creating an alias: `typedef int integer;` makes `integer` a synonym for `int`.
- To define a string type, one can use `typedef char *string;`, which may seem complex but serves to simplify understanding.
CS50 Library and String Definition
- The line in `cs50.h` defines the name `string` as an alias for `char *`, facilitating easier usage in code.
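A small sketch of both aliases in use. The function `same_first_char` is hypothetical and only shows that a `string` parameter and a `char *` parameter are the same type underneath.

```c
typedef int integer;   // integer is now another name for int
typedef char *string;  // string is now another name for char *

// Both parameters are really char *; the alias only changes how the code reads.
integer same_first_char(string a, char *b)
{
    return a[0] == b[0];
}
```

Because `string` and `char *` are interchangeable here, either spelling can be used for either parameter without changing what the function does.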
- Clarification on declaring types versus variables; declaring type 'string' does not violate conventions even if syntax varies slightly.
Practical Usage of Char Pointers
- After CS50, it's recommended to use `char *` instead of `string` for real-world C programming applications.
- The transition from using CS50 library functions to native C code will occur next week, removing training wheels.
Addressing Characters in Strings
- To get the address of an individual character, `&` is necessary (as in `&s[0]`); `s` by itself is already a pointer, so it needs no `&`.
Understanding Strings in C
Introduction to Strings
- `%s` is used in `printf` for strings, although the term "string" does not exist in C.
- The concept of a string is supported by `printf`, and CS50's training wheels can be removed as understanding improves.
Typedef and Training Wheels
- Avoid using `typedef char *string;` if comfortable with pointers; no need for training wheels.
- It's acceptable to continue using CS50's library for a while longer as you gain confidence.
Working with Characters
- Demonstration of printing characters manually from a string using pointer notation instead of square brackets.
- Pointer arithmetic allows manipulation of memory addresses directly, enhancing flexibility in accessing array elements.
Pointer Arithmetic Explained
- Addresses are just numbers; you can perform arithmetic on them to access different memory locations.
- Square bracket notation is syntactic sugar for pointer arithmetic, making code more readable.
Printing Parts of Strings
- You can print parts of a string by adjusting the starting address (e.g., `printf("%s\n", s + 1);`).
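The equivalence between square brackets and pointer arithmetic, and the effect of starting one character later, can be sketched as follows. The helper names `char_at` and `skip_first` are hypothetical.

```c
// Returns the i-th character of s using pointer arithmetic.
// *(s + i) means: start at address s, move i chars forward, read what is there.
// It is the same as writing s[i].
char char_at(const char *s, int i)
{
    return *(s + i);
}

// Returns the address one character past the start of s,
// which is the same string minus its first character.
const char *skip_first(const char *s)
{
    return s + 1;
}
```

With `s` pointing at "HI!", `char_at(s, 1)` is `'I'`, and passing `skip_first(s)` to `printf` with `%s` prints `I!`.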
Understanding Memory Manipulation in Programming
Memory and Operators
- Discusses the trade-offs of using simple operators like star (*) and ampersand (&) for memory manipulation.
- Mentions the CrowdStrike incident as an example of how simple problems can lead to significant system failures.
Code Correction and Style
- Corrects a line of code from cs50.h regarding stylistic conventions, emphasizing no space after the star in type declarations.
- Introduces a review of computer memory and previous problems to understand current programming practices better.
Comparing Integers vs. Strings
- Describes a program that compares two integers input by the user, checking for equality.
- Demonstrates successful integer comparison with examples (e.g., 50 vs. 50).
Issues with String Comparison
- Explains why string comparison using equals equals (==) does not work as intended.
- Modifies code to compare strings instead of integers, prompting users for string inputs.
Addressing String Comparison Problems
- Observes unexpected results when comparing identical strings due to memory address differences.
- Clarifies that strings are stored as addresses pointing to character arrays, affecting comparison outcomes.
Memory Layout for Strings
- Illustrates how strings occupy different memory addresses, leading to discrepancies in comparisons.
Understanding String Comparison in C
Memory Addresses and String Pointers
- Strings are stored in memory at consecutive addresses, with pointers `s` and `t` pointing to different chunks.
- Using `==` compares memory addresses, not string content; two separately stored strings have different addresses even when their characters match.
Introduction of strcmp Function
- The `strcmp` function is introduced to compare string contents instead of addresses. It returns 0 if strings are equal.
- When using `strcmp`, identical strings return 0, while different strings yield non-zero values based on character comparison.
Implementation Details of strcmp
- The implementation likely uses a loop to compare characters until a null terminator is reached.
- It checks each character's ASCII value without needing subtraction for comparison.
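The difference between comparing addresses and comparing contents can be sketched with two small functions. Both names are hypothetical, and `same_contents` is only a guess at the kind of loop `strcmp` might use, reduced to a yes/no answer.

```c
#include <stdbool.h>

// Compares addresses only: true when a and b point at the very same bytes.
bool same_address(const char *a, const char *b)
{
    return a == b;
}

// Compares contents: walks both strings in step until a mismatch or the '\0'.
bool same_contents(const char *a, const char *b)
{
    int i = 0;
    while (a[i] != '\0' && a[i] == b[i])
    {
        i++;
    }
    // Equal only if both strings ended at the same position.
    return a[i] == b[i];
}
```

Two separate arrays that both hold "HI!" fail `same_address` but pass `same_contents`, which is the behavior the `==` versus `strcmp` discussion describes.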
Memory Allocation for Strings
- Two separate memory locations are used: one for the pointer and another for the actual string data.
- Functions like `get_string` allocate memory for the input string separately from the pointer variable.
Demonstrating Address Differences
- Printing addresses with `%p` shows that even identical strings occupy different memory locations.
- It's uncommon to print addresses as programmers typically focus on string content rather than their locations.
Preparing to Manipulate Strings
Understanding String Manipulation in C
Copying Strings and Capitalization
- Introduces string copying with `string t = s`, then capitalizing the first letter using `toupper`.
- Explains how the `toupper` function from `ctype.h` changes the first character of string `t`.
- Demonstrates printing both original (`s`) and modified (`t`) strings to observe changes.
Addressing Memory Issues
- Discusses unexpected behavior where both strings appear capitalized due to shared memory addresses.
- Clarifies that both variables point to the same address, leading to simultaneous changes.
- Illustrates memory locations for `s` and `t`, emphasizing they reference the same data.
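The shared-address surprise can be reproduced in a few lines. The function name `capitalize_via_alias` is hypothetical; the argument must be a writable array, not a string literal.

```c
#include <ctype.h>

// Copies only the address, then capitalizes through the copy.
// Returns the first character of the original, which has changed too.
char capitalize_via_alias(char *s)
{
    char *t = s;  // t and s now hold the same address

    if (t[0] != '\0')
    {
        t[0] = toupper((unsigned char) t[0]);
    }
    return s[0];
}
```

Although only `t[0]` is assigned, `s[0]` comes back capitalized, because both names refer to the same bytes in memory.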
Properly Allocating Memory
- Suggests a need for better memory management when copying strings.
- Introduces `malloc` for dynamic memory allocation, allowing proper storage of string characters.
- Explains how to use `malloc` to allocate enough bytes for the string's length plus one more for the null character.
Implementing Safe String Copying
- Proposes storing the address of allocated memory in variable `t`.
- Advises against hardcoding sizes; instead, calculate required bytes dynamically.
Memory Management and String Copying in C
Understanding Null Characters
- Discusses the need to manually add a null character at the end of a string when copying.
- Emphasizes iterating through the entire string, including the null character.
Using malloc for Memory Allocation
- Introduces `stdlib.h` for using `malloc` to allocate memory dynamically.
- Explains how `malloc` returns the address of the allocated memory, which is crucial for string manipulation.
Copying Strings Safely
- Describes how characters are copied from one string to another using a loop.
- Highlights that proper memory allocation allows safe copying of strings, including handling null characters.
Error Handling with malloc
- Warns about potential failures of `malloc`, which returns `NULL` (address 0) if no memory is available.
- Clarifies the confusion between NUL (the `'\0'` character that ends a string) and `NULL` (a pointer to address 0, which here signals that allocation failed).
Defensive Programming Practices
- Suggests checking if the pointer returned by `malloc` is `NULL` before proceeding with operations.
- Advises against manipulating unallocated or invalid memory to prevent crashes or undefined behavior.
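Putting the allocation, the copy loop, and the `NULL` check together might look like the sketch below. The function name `copy_string` is hypothetical, and the caller is assumed to free the result.

```c
#include <stdlib.h>
#include <string.h>

// Returns a freshly allocated copy of s, or NULL if malloc fails.
// The caller is responsible for calling free on the result.
char *copy_string(const char *s)
{
    size_t n = strlen(s);

    // One extra byte for the terminating '\0'.
    char *t = malloc(n + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // i <= n so that the '\0' at s[n] is copied as well.
    for (size_t i = 0; i <= n; i++)
    {
        t[i] = s[i];
    }
    return t;
}
```

Because the copy lives in its own block of memory, changing `t[0]` afterwards leaves the original string untouched.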
Simplifying Code with Library Functions
Memory Management in C
Understanding Memory Allocation
- The `strcpy` function copies characters from source to destination, replacing the hand-written loop and simplifying the code.
- Questions arise about memory allocation and string assignment in adjacent variables.
- Allocating two variables next to each other can lead to overflow if one variable holds a long string.
Memory Safety Practices
- `get_string` allocates memory conservatively for user input, preventing crashes unless excessive input is given.
- A memory leak occurs when allocated memory is not freed after use, which can cause programs to crash over time.
- Programs that run indefinitely must manage memory properly to avoid running out of resources.
Debugging Memory Issues
- Freeing allocated memory at the end of its use prevents leaks; `free` reverses `malloc`.
- Tools like Valgrind help identify memory-related errors in programs, aiding debugging efforts.
Example Code Analysis
- An example program demonstrates allocating space for integers and highlights potential bugs in array indexing.
- Using `sizeof` ensures compatibility across different systems by asking the compiler for a type's size instead of hardcoding it.
- Mistakes include incorrect array indexing and failure to free allocated memory after use.
Utilizing Valgrind for Debugging
- Valgrind helps detect mistakes related to memory management that may not be caught by other tools like debug50.
Memory Management and Debugging with Valgrind
Understanding Memory Errors
- The program runs without crashing but has bugs that could lead to crashes in larger software.
- Running Valgrind reveals an "invalid write of size 4," indicating a memory issue related to 4 bytes.
- The error is traced back to `memory.c` line 9, where an incorrect index (1-indexed instead of 0-indexed) is used.
Fixing Memory Issues
- Adjust the array indices from 1,2,3 to 0,1,2 to correct the indexing error.
- Valgrind also indicates a memory leak at `memory.c` line 6, suggesting that memory allocated is not being freed.
- Fixing the leak requires freeing allocated memory; in this case, variable `x` needs to be freed.
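A corrected version of the kind of program described above could look like this sketch. It is not the lecture's `memory.c` itself: the function name `fill_and_free` and the stored values are arbitrary choices for illustration.

```c
#include <stdlib.h>

// Allocates room for three ints, fills them, and frees the block.
// Returns the sum of the values, or -1 if malloc fails.
int fill_and_free(void)
{
    int *x = malloc(3 * sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    // Valid indices are 0, 1, 2; writing to x[3] would be out of bounds.
    x[0] = 72;
    x[1] = 73;
    x[2] = 33;

    int sum = x[0] + x[1] + x[2];

    free(x);  // hands the block back so it is not leaked
    return sum;
}
```

Both fixes are visible here: the indices start at 0, and every `malloc` is paired with a `free`.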
Verifying Corrections
- After making changes and recompiling the program, it still runs without crashing but appears more correct.
- Running Valgrind again shows no memory leaks: "All heap blocks were freed."
- Valgrind is a useful tool for checking memory-related mistakes in code.
Handling Garbage Values in C
Introduction to Garbage Values
- Discusses setting conditions before freeing memory; conditionals can be useful in complex programs.
- Introduces a new file `garbage.c` to demonstrate garbage values left in uninitialized memory locations.
Demonstrating Garbage Values
- Creates an array of scores without initializing them or prompting user input for values.
- Iterates through the array and prints out values which are unpredictable due to lack of initialization.
Observations on Output
- Outputs show seemingly random scores, including zeros and negative numbers, which are indicative of garbage values.
- Emphasizes unpredictability of uninitialized variables leading to confusing outputs when printed multiple times.
Managing Output Display
Understanding Output Pagination
- Using the Enter key shows only the first screenful of output; space key paginates further.
- Introduces a program with two variables, x and y, for storing integer addresses without initial values.
- Allocates memory for an integer in x but not for y, leading to potential issues.
Memory Management Issues
- Storing a value at the address in y without first allocating memory can cause crashes, because y holds a garbage address.
- The problematic line is identified as it attempts to write to an uninitialized pointer.
- A video from Stanford introduces Binky, illustrating pointer concepts through animation.
Pointer Concepts Explained
- Binky learns that pointers initially do not point anywhere until assigned.
- Code allocates integers but requires separate steps to set up pointees correctly.
- Demonstration of dereferencing pointer x successfully stores 42 in its pointee.
Dereferencing and Pointer Assignment
- Attempting to store 13 via pointer y fails due to lack of initialization.
- Correctly assigning y to point at the same location as x resolves the issue.
- Both pointers now share the same pointee, allowing successful dereferencing.
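The sequence of steps from the video can be sketched in C as follows. The wrapper function `shared_pointee` is a hypothetical name, and the buggy dereference is left as a comment so the sketch stays safe to run.

```c
#include <stdlib.h>

// Follows the steps described above. Returns the value both pointers
// end up seeing, or -1 if malloc fails.
int shared_pointee(void)
{
    int *x;
    int *y;

    x = malloc(sizeof(int));  // x now has a pointee
    if (x == NULL)
    {
        return -1;
    }
    *x = 42;                  // fine: x points at allocated memory

    // Writing *y = 13 at this point would be a bug:
    // y has not been pointed anywhere yet.

    y = x;                    // y now shares x's pointee
    *y = 13;                  // overwrites the 42, visible through x too

    int result = *x;
    free(x);
    return result;
}
```

Since `x` and `y` hold the same address after `y = x`, the value read through `x` at the end is 13, not 42.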
Implications of Memory Mismanagement
- Binky's story highlights dangers of accessing uninitialized memory and garbage values.
- Discusses swapping values in programming contexts like bubble sort or selection sort.
Introduction to Variable Swapping
- Gabe introduces himself as a freshman from Thayer.
- David explains the concept of variables using two glasses of water representing different values.
- Gabe expresses concern about blending the colors when swapping the contents.
Implementing Variable Swapping
- David suggests using a third glass as a temporary variable for swapping.
- The process of swapping is linked to C code implementation, emphasizing the need for a temporary variable.
- David outlines a simple swap function in C that takes two integer values.
Understanding Scope and Function Behavior
- The swap function correctly swaps values but only within its own scope.
- Variables exist only within their defined context, affecting how changes are reflected outside the function.
- A demonstration program shows that original variables remain unchanged after calling swap.
Debugging and Memory Concepts
- Using debug tools reveals that while local variables change, original values do not reflect those changes.
- The debugger illustrates how passing by value works, showing copies of variables being manipulated.
Understanding Memory Management in Programming
Memory Loading and Structure
- Programs load machine code into memory when executed, which includes global variables defined outside of the main function.
- In the lecture's diagram, the heap grows downwards for dynamic memory allocation (malloc), while the stack grows upwards for local variables and function calls, so the two regions grow toward each other.
Function Call Mechanics
- Each function call allocates a new frame on the stack, containing its own local variables separate from those in the calling function.
- Local variables in functions are stored in their respective frames, leading to potential confusion when trying to swap values between them.
Value Passing vs. Reference Passing
- When passing arguments by value, changes made within a function do not affect the original variables in the calling function.
Understanding Pointer Manipulation in C
Code Implementation and Changes
- The value of `tmp` is stored at the address in `b`, using pointer declarations and dereferencing.
- Changes are made to pass the addresses of integers instead of their values, requiring syntax adjustments.
- The function now requires passing the addresses of `x` and `y` using `&x` and `&y`.
Compilation Issues
- An error occurs due to incorrect address passing; adjustments are needed in the prototype.
- After correcting the code, it compiles successfully, allowing for proper swapping of values.
Memory Management Concepts
- In memory context, variables `a` and `b` hold addresses rather than direct values, enabling reference manipulation.
- Passing by reference involves sending addresses instead of actual values, which allows functions to modify original data.
- The process involves storing a value from one address into another through temporary storage.
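The two versions of swap can be placed side by side. The name `swap_by_value` is hypothetical, added here only to contrast with the pointer-based `swap`.

```c
// Receives copies of the caller's values, so the caller never sees the swap.
void swap_by_value(int a, int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}

// Receives addresses, so it can reach back into the caller's variables.
void swap(int *a, int *b)
{
    int tmp = *a;  // remember the value at address a
    *a = *b;       // store the value from address b at address a
    *b = tmp;      // store the remembered value at address b
}
```

With `int x = 1` and `int y = 2`, calling `swap_by_value(x, y)` leaves both unchanged, while `swap(&x, &y)` leaves `x` as 2 and `y` as 1.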
Stack and Heap Considerations
- After function execution, changes persist in main's version of variables due to effective memory manipulation.
- A function's stack frame is not erased when the function returns; its old values linger as garbage until something overwrites them, since clearing them would cost time.
- Logical errors can occur if memory management isn't handled properly; understanding stack vs. heap is crucial.
Memory Overflow Risks
- Excessive use of heap memory via malloc, or too many nested function calls on the stack, can make the two regions collide (a heap overflow or a stack overflow, respectively).
Understanding Buffer Overflows
- Etymology of Stack Overflow: The term "stack overflow" refers to calling too many functions, causing memory overflow in the stack region.
- Buffer Overflow Explanation: A buffer can overflow if you access beyond its allocated size, leading to potential errors.
- Real-world Example: YouTube and Netflix use buffers for video streaming; overflowing these can cause significant issues.
CrowdStrike Incident
- Software Crash Impact: CrowdStrike's software crash affected major systems like Delta Airlines, resulting in millions lost due to downtime.
- Postmortem Findings: The issue stemmed from an out-of-bounds read caused by accessing a 21st value in a 20-value array.
- Consequences of Off-by-One Error: An off-by-one error led to system crashes and significant financial losses due to untested code.
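C performs no bounds checking on its own, so a guard has to be written by hand. The sketch below shows one generic way to do that; it is purely illustrative, has nothing to do with the actual code involved in the incident, and `read_checked` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

// Reads values[index] only if index is inside the array.
// Returns false, leaving *out untouched, when the index is out of bounds.
bool read_checked(const int *values, size_t count, size_t index, int *out)
{
    if (index >= count)
    {
        return false;
    }
    *out = values[index];
    return true;
}
```

For a 20-element array the valid indices are 0 through 19, so a request for index 20 (the 21st value) is rejected instead of reading past the end.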
Programming Language Safety
- C Language Vulnerabilities: C allows easy access beyond array boundaries, increasing the risk of bugs compared to languages like Python or Java.
- Defensive Programming Practices: Other languages incorporate safeguards against such errors due to historical buggy code practices.
- Risks of Automatic Updates: Automatic updates can introduce bugs that may compromise entire systems if not properly tested beforehand.
Implementing Input Functions
- CS50 Training Wheels Overview: Discusses using basic functions learned since week 1 for input handling in programming assignments.
- Challenges with Strings vs. Integers: Strings pose risks due to unknown user input length, while integers have fixed sizes making them easier to handle safely.
Understanding User Input in C Programming
Getting User Input with scanf
- The `printf` function prompts the user, while `scanf` reads formatted input from the user.
- To read an integer, use `scanf` and provide the address of the variable to store the value.
- Printing the value of `n` after reading it confirms successful input.
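The address-passing pattern can be tried without a keyboard by using `sscanf`, the standard sibling of `scanf` that reads from a string instead. With keyboard input the equivalent call would be `scanf("%i", &n)`. The function name `parse_int` is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

// Tries to read one integer from text into *n.
// n is already an address, so no & is needed here.
// sscanf returns how many values it converted, so 1 means success.
bool parse_int(const char *text, int *n)
{
    return sscanf(text, "%i", n) == 1;
}
```

Checking the return value is what distinguishes valid input such as "50" from input such as "cat", where nothing is converted and the variable is left as it was.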
Error Handling in Input
- Unlike `get_int`, a bare `scanf` call does not re-prompt when the user types a non-integer; handling that requires a custom loop.
- Transitioning to string input requires defining a character pointer instead of a string data type.
Reading Strings Safely
- Use `%s` in `scanf` to read strings, but ensure proper memory allocation for safety.
- Avoid using uninitialized pointers as they may lead to undefined behavior or crashes.
Memory Management Concerns
- Uninitialized pointers can contain garbage values, leading to unpredictable results when used.
- Proper initialization is crucial; allocate memory using functions like `malloc`.
Buffer Overflow Risks
- Without allocating sufficient space for strings, buffer overflows can occur, causing crashes.
- Treating pointers and arrays interchangeably can lead to confusion if not managed correctly.
Solutions for Safe String Handling
- Always initialize pointers before use; otherwise, they may point to invalid memory locations.
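One hedged way to read a string into memory that really exists is to use a fixed-size array and cap how many characters may be stored. The sketch uses `sscanf` so it can run without keyboard input; `scanf("%3s", s)` applies the same cap to typed input. The function name `read_word_capped` and the 4-byte size are assumptions for illustration.

```c
#include <stdio.h>

// Reads one word from text into a 4-byte buffer.
// "%3s" caps the read at 3 characters, leaving the 4th byte for the '\0'.
// Returns the number of items sscanf converted (1 on success).
int read_word_capped(const char *text, char out[4])
{
    return sscanf(text, "%3s", out);
}
```

Input longer than three characters is truncated rather than written past the end of the array, which trades lost input for memory safety.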
Understanding Buffer Overflow and Memory Management
- Discusses the risks of buffer overflow when using fixed-size buffers, highlighting potential issues with user input.
- Emphasizes the importance of using `get_string` to prevent buffer overflows by dynamically allocating memory as needed.
- Compares C's memory management with higher-level languages like Java and Python, noting their ease of use for input/output.
Memory Allocation Concerns
- Addresses the question of using large malloc allocations to avoid overflow, explaining it still leads to wasted memory.
- Warns against inefficient memory usage if a program allocates excessive space for unpredictable user input.
- Introduces file I/O as a new topic relevant for upcoming problem sets, requiring pointer syntax in C.
File Input/Output Functions in C
- Lists common file operations in C such as `fopen`, `fclose`, and `fprintf`, indicating their real-world analogies.
- Explains that functions starting with 'f' typically relate to file operations, whether text or binary files.
- Prepares to demonstrate creating a simple phone book application in C that utilizes file I/O.
Creating a Phone Book Application
- Begins coding a phone book application that saves names and numbers into a CSV file format.
- Mentions including necessary libraries (`cs50.h`, `stdio.h`, `string.h`) for easier string handling and I/O operations.
- Describes opening a CSV file in write mode using `fopen` and explains the significance of the "w" mode.
Saving User Input to File
- Demonstrates prompting users for their name and number before saving this data into the opened CSV file.
- Highlights differences between `printf` (console output) and `fprintf` (file output), emphasizing formatting capabilities.
Creating a Phonebook Application
- Opening `phonebook.csv` and creating a phonebook file.
- Adding John Harvard's number to the CSV file; audience provides the correct number.
- Noting that previous entries disappear when overwriting; suggests using append mode.
Appending Data to Files
- Compiling and rerunning the program to add multiple contacts without losing data.
- Explaining how applications like Excel manage data similarly by appending rows.
Error Checking in File Operations
- Emphasizing the importance of error-checking when opening files, similar to memory allocation checks.
- Advising to check return values for validity before proceeding with file operations.
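Appending a row with an error check on `fopen` might be sketched like this. The function name `append_contact` and its parameters are hypothetical; the lecture's program prompts the user for input instead of taking arguments.

```c
#include <stdbool.h>
#include <stdio.h>

// Appends one "name,number" row to the CSV file at path.
// Returns false if the file cannot be opened.
bool append_contact(const char *path, const char *name, const char *number)
{
    // "a" appends to existing contents; "w" would start the file over.
    FILE *file = fopen(path, "a");
    if (file == NULL)
    {
        return false;
    }

    fprintf(file, "%s,%s\n", name, number);
    fclose(file);
    return true;
}
```

Calling the function twice on the same path leaves two rows in the file, whereas opening with "w" each time would keep only the last one.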
Implementing a Copy Command
- Introducing the concept of implementing a custom copy command (`cp`) in C.
- Setting up `cp.c` with necessary headers and command line arguments for source and destination files.
Reading and Writing Files Byte by Byte
- Describing how to open files for reading and writing, emphasizing not using append mode for copying.
- Proposing byte-by-byte reading from source file using functions like `fread`.
Looping Through File Data
How to Copy Files Byte by Byte
- Use `fwrite` to save each byte to the destination file, specifying the address and size.
- Close both source and destination files after copying bytes one at a time without interpreting characters.
- In C, use `typedef` to create a byte type from `uint8_t`, representing an unsigned 8-bit integer.
Implementing a Custom Copy Command
- Run the custom copy program `./cp` to replicate files like `addresses.c` into `backup.c`.
- The loop in the code reads bytes until `fread` returns 0, advancing automatically through the file.
- File reading functions behave like video playback, tracking cursor position as bytes are read.
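The byte-by-byte copy loop can be sketched as a single function. The names `copy_file` and `BYTE` are assumptions for illustration, and the lecture's version takes the file names from command-line arguments in `main` rather than as parameters.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint8_t BYTE;  // one unsigned 8-bit value

// Copies src to dst one byte at a time. Returns 0 on success, 1 on failure.
int copy_file(const char *src_path, const char *dst_path)
{
    FILE *src = fopen(src_path, "rb");
    if (src == NULL)
    {
        return 1;
    }

    FILE *dst = fopen(dst_path, "wb");
    if (dst == NULL)
    {
        fclose(src);
        return 1;
    }

    BYTE b;
    // fread returns the number of items read; 0 means no bytes are left.
    // Each call picks up where the previous one stopped.
    while (fread(&b, sizeof(b), 1, src) != 0)
    {
        fwrite(&b, sizeof(b), 1, dst);
    }

    fclose(dst);
    fclose(src);
    return 0;
}
```

The loop never inspects what the bytes mean, so the same function copies text files and binary files alike.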
Introduction to Bitmap Files and Image Manipulation
- This week focuses on bitmap files (BMP), which represent images as grids of pixels.
- Implement filters such as converting images to black and white or applying sepia tones.