Computer Architecture - Lecture 2: Trends, Tradeoffs and Design Fundamentals (Fall 2021)

Name: Computer Architecture - Lecture 2: Trends, Tradeoffs and Design Fundamentals (Fall 2021)
Uploaded: 2021-10-02T02:24:35.000Z
Duration: 5 h 46 min 40 s

Ending the Live Stream

The speaker asks how to end the live stream.

Technical Difficulties

The speaker expresses frustration with technical difficulties.

Starting Lecture on Computer Architecture

The speaker apologizes for technical difficulties and begins lecture on computer architecture.

Lecture will cover interesting trends, trade-offs, and issues in computer architecture.

Speaker plans to give a higher level perspective of an important issue in computing systems - reliability, security, safety, and privacy.

Dependability is critical as technology scales and we rely more on computing devices in our daily lives.

Examples of dependability issues include reliability problems, security problems, safety problems, and predictability problems.

Course Logistics

The speaker discusses course logistics.

Speaker plans to give course logistics at the end of the lecture.

Next week's lecture will be hybrid in-person.

Getting Started with Lecture

The speaker confirms that everyone can see him and begins discussing topics covered in yesterday's lecture.

Speaker confirms that he can see some confirmation from attendees but notes that cameras need to be wider screen.

Topics covered in yesterday's lecture include machine learning accelerators, processing in memory, genomic accelerators, and non-volatile memory.

Reliability Issues in Computing Systems

The speaker discusses reliability issues in computing systems.

Dependability is becoming increasingly important as technology scales and we rely more on computing devices in our daily lives.

Reliability, security, safety, and privacy are critical problems in computing systems today.

Robustness is a nice word that captures reliability, security, safety, and to some extent privacy.

Dependability issues affect robustness.

Examples of dependability issues include reliability problems, security problems, safety problems, and predictability problems.

Rowhammer

The speaker discusses Rowhammer.

Rowhammer is a fascinating issue in computer architecture because it is fundamentally a device- and circuit-level problem that has percolated all the way up into the software stack.

Bit flips in modern memories cause reliability problems, and attackers can deliberately exploit them.

Introduction to Row Hammer

The instructor introduces the concept of Row Hammer, which is a hardware failure mechanism that can lead to system security vulnerabilities.

What is Row Hammer?

Row Hammer is the ability to predictably induce bit flips in commodity memory chips.

More than 80% of tested DRAM chips were found to be vulnerable to this problem.

Inducing bit flips predictably leads to security problems because an attacker can figure out how to take over the system by inducing these bit flips at the right time and place.

This simple hardware failure mechanism can create a widespread system security vulnerability.

How Does Row Hammer Work?

Activating a row applies a high voltage to it, and this disturbs adjacent rows because the cells are not sufficiently isolated from one another electrically.

Each activation causes disturbance, and if done repeatedly, physically adjacent rows can get bit flips.

These are called victim rows, while the activated row is called an aggressor row.

Cells are getting smaller as technology scales down, making the problem worse.

Implications of Row Hammer

Repeatedly reading a row enough times before memory gets refreshed induces errors in adjacent rows in most real DRAM chips today.
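
The aggressor/victim mechanism described above can be sketched as a toy model. This is only an illustration under stated assumptions, not how real DRAM behaves: the function name `hammer`, the uniform per-activation disturbance, and the single `threshold` shared by every row are all hypothetical simplifications, and no refresh happens during the simulated window.

```python
from collections import defaultdict


def hammer(aggressor_rows, activations_per_row, threshold, num_rows):
    """Toy model: return the victim rows that would see a bit flip.

    Each activation of an aggressor row adds one unit of disturbance to
    each physically adjacent row. A row flips once its accumulated
    disturbance reaches `threshold` within one refresh window.
    All parameters are illustrative assumptions.
    """
    aggressors = set(aggressor_rows)
    disturbance = defaultdict(int)
    for _ in range(activations_per_row):
        for row in aggressors:
            for neighbor in (row - 1, row + 1):
                # Activating a row restores its own charge, so aggressors
                # are not counted as victims in this model.
                if 0 <= neighbor < num_rows and neighbor not in aggressors:
                    disturbance[neighbor] += 1
    return {row for row, count in disturbance.items() if count >= threshold}


# Single-sided hammering: one aggressor disturbs both neighbors.
single_sided = hammer([10], activations_per_row=5000, threshold=4800, num_rows=32)

# Double-sided hammering: the row between two aggressors is disturbed twice
# per round, so it reaches the threshold with half the activations per row.
double_sided = hammer([10, 12], activations_per_row=2400, threshold=4800, num_rows=32)
```

In this model the double-sided pattern flips only the row sandwiched between the two aggressors, which is one way to see why access patterns matter for an attack.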

Rowhammer Vulnerability

In this section, the speaker discusses the Rowhammer vulnerability and its impact on memory isolation in computing systems.

Memory Industry Vulnerability

More than 80% of modules manufactured by three major DRAM manufacturers are vulnerable to Rowhammer.

The memory industry is dominated by Samsung, Hynix, and Micron, and all their chips are vulnerable to Rowhammer.

Technology Scaling Problem

The problem is caused by pushing technology scaling too hard.

Newer chips have cells that are closer together and smaller, making them more vulnerable to disturbance effects that cause bit flips.

Reliability and Security Implications

Bit flips can break memory isolation between different memory addresses.

This can lead to both reliability and security problems as data can be corrupted or accessed without permission.

Clever people have developed attacks that take advantage of these bit flips to gain unrestricted access to systems, or to the machines of website visitors.

Exploiting Bit Flips

Bit flips can be exploited using JavaScript or GPUs.

An Android app has been released that hammers a phone's memory to identify which bits are vulnerable to Rowhammer.

Memory Access Engine

In this section, the speaker talks about how memory access engines can leak private data and destroy the accuracy of neural networks.

Memory Access Engine Vulnerabilities

Memory access engines can leak private data.

Rowhammer attacks can destroy the accuracy of neural networks.

Rowhammer attacks can cause bit flips that reduce accuracy to 10%.

Disturbance in DRAM

Disturbance in memory is a well-known effect, but Rowhammer made it visible through software.

Rowhammer exposed disturbance errors within a refresh interval in DRAM, which affects reliability, security, and safety.

Planning for Reliability Issues

Going forward with technology scaling, people should plan for reliability issues and any bit flips that may be happening.

Ignoring these issues is not an option.

Rowhammer Vulnerability

In this section, the speaker discusses the Rowhammer vulnerability and how it has become worse with newer chips. They also talk about how existing mitigation mechanisms are not effective.

Rowhammer Vulnerability

Rowhammer is getting much worse today with newer chips.

Newer chips are much more vulnerable to Rowhammer: with the employed mitigation mechanisms turned off, the weakest cells in some chips fail after as few as 4,800 activations.

The problem is getting worse because cells are smaller, noise is higher, and cells are closer to each other. Device-level solutions are not talked about in the literature.

Spatially, Rowhammer is becoming worse because you can induce bit flips in more rows, including rows farther away from the aggressor row.

Existing mitigation mechanisms are not effective, as an analysis of more than 1,500 DRAM chips shows.
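
To put the 4,800-activation figure in perspective, a rough calculation can compare it with how many activations fit into one refresh window. The 64 ms refresh window and 46 ns row cycle time below are assumed, typical DDR-class values; they are not taken from the lecture, and the function name is hypothetical.

```python
def max_activations(refresh_window_ns, row_cycle_ns):
    """Upper bound on activations of one bank within a refresh window."""
    return refresh_window_ns // row_cycle_ns


REFRESH_WINDOW_NS = 64_000_000  # assumed: 64 ms refresh window
ROW_CYCLE_NS = 46               # assumed: ~46 ns between activations
WEAKEST_CELL_THRESHOLD = 4_800  # activation count quoted in the lecture summary

budget = max_activations(REFRESH_WINDOW_NS, ROW_CYCLE_NS)
headroom = budget / WEAKEST_CELL_THRESHOLD
print(f"activations possible per window: {budget}")
print(f"multiple of the weakest-cell threshold: {headroom:.0f}x")
```

Under these assumptions an attacker has a budget of activations that exceeds the weakest-cell threshold by more than two orders of magnitude.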

Reverse Engineering Mitigation Mechanisms

Manufacturers introduced some mitigation mechanisms inside DRAM but did not disclose, even at a high level, what they were. They were obscuring the security mechanisms employed in their devices.

This paper showed that you can reverse engineer the internal mitigation mechanisms that DRAM manufacturers employ inside their DRAM chips just enough to create custom DRAM access patterns that circumvent those mechanisms.

Mitigation mechanisms are not secure. The approach taken to secure DRAM was not good from an industry perspective because manufacturers were secretive about it.

Qualifying a DRAM Chip for Rowhammer-Free

It is very difficult to qualify a DRAM chip to be Rowhammer-free.

Refresh Rates and Row Hammering

In this section, the speaker discusses the downsides of increasing refresh rates to solve row hammering issues. They also mention other solutions such as co-design of memory controller and DRAM chip, using performance counters at the software level, and how solutions can span the entire stack.

Increasing Refresh Rates

Increasing refresh rates has downsides such as reduced performance and higher energy consumption.

To make DRAM fully secure based on what they have tested, you need to increase your refresh rates by 7x or even 16x.

Some DRAM chips internally increase their refresh rates according to experiments.

Co-design of Memory Controller and DRAM Chip

The BlockHammer solution is an example of co-design of memory controller and DRAM chip.

Inside the memory controller, you can take action to reduce the probability of hammering rows by detecting which rows are being hammered potentially and then throttling down accesses to those rows.
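
The detect-and-throttle idea can be sketched as follows. This is a minimal illustration, not the BlockHammer design itself: the class name, the abstract time ticks, and the exact per-row counter table are assumptions made for clarity, and a real controller would likely use a more compact tracking structure.

```python
from collections import Counter


class ThrottlingController:
    """Toy memory controller that slows down rows activated too often."""

    def __init__(self, epoch_len, blacklist_threshold, throttle_delay):
        self.epoch_len = epoch_len                      # ticks per counting epoch
        self.blacklist_threshold = blacklist_threshold  # activations before throttling
        self.throttle_delay = throttle_delay            # min gap for throttled rows
        self.epoch_start = 0
        self.counts = Counter()
        self.last_issue = {}

    def activate(self, row, now):
        """Return the tick at which the activation of `row` is issued."""
        if now - self.epoch_start >= self.epoch_len:
            # New epoch: forget old counts, as a refresh would reset disturbance.
            self.epoch_start = now
            self.counts.clear()
            self.last_issue.clear()
        issue = now
        if self.counts[row] >= self.blacklist_threshold:
            # Row looks hammered: enforce a minimum gap between activations.
            earliest = self.last_issue.get(row, float("-inf")) + self.throttle_delay
            issue = max(now, earliest)
        self.counts[row] += 1
        self.last_issue[row] = issue
        return issue


ctrl = ThrottlingController(epoch_len=1000, blacklist_threshold=3, throttle_delay=50)
issue_times = [ctrl.activate(7, t) for t in range(5)]  # row 7 is hammered
benign_time = ctrl.activate(8, 5)                      # another row is unaffected
```

Choosing `throttle_delay` so that a throttled row cannot reach the hammer threshold within one epoch is the design goal; benign rows below the blacklist threshold see no added delay.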

Other Solutions

Solutions can be at the device level, circuit level, architecture level, logic level, software level or system software level.

This semester's lectures will cover more up-to-date research than the older lectures available online.

Cutting Edge Research on Row Hammering

In this section, the speaker talks about Google's recent introduction of new row hammer patterns that are not handled by mitigation mechanisms. They also discuss their own research on improving access patterns in row hammer attacks.

Google's New Row Hammer Patterns

Google recently introduced new row hammer patterns that they claim are not handled by mitigation mechanisms.

Improving Access Patterns in Row Hammer Attacks

They conducted research on different ways of improving the access patterns in row hammer attacks.

Understanding how sensitive cells are to factors such as temperature, spatial location, and how long a row is kept active can help construct better row hammer attacks.

Reverse Engineering DRAM Chips for Rowhammer Attacks

In this section, the speaker introduces a methodology to reverse engineer what's happening inside the DRAM chip to protect against rowhammer attacks. By inducing data retention errors in DRAM, one can figure out how the internal mitigation mechanisms are working.

Methodology for Reverse Engineering DRAM Chips

Inducing data retention errors in DRAM enables reverse engineering of internal mitigation mechanisms.

This methodology allows for newer and more powerful rowhammer attacks.

To make a system fully secure, it is important to prove its security through collaboration with experts in security.

Failure to prove the security of mitigation mechanisms can lead to vulnerabilities that can be exploited relatively easily.

Fault Attacks and Rowhammer

Rowhammer is an example of fault attacks that induce hardware faults such as single bit flips.

Traditional fault attacks typically require physical access to the computer and privileges that are not normally available.

The beauty or scary thing about rowhammer is that anyone can do it at the user level through software, making it difficult to detect.

Conclusion

Rowhammer is still an open problem, and security by obscurity is not a good solution.

Why Are Manufacturers Not Taking Rowhammer Seriously?

In this section, the speaker discusses why manufacturers may not be taking Rowhammer seriously and the challenges associated with solving the problem.

Challenges in the DRAM Industry

The DRAM industry is commodity-driven and cost-oriented.

Reducing the cost and increasing the capacity of DRAM chips is prioritized over addressing security concerns like Rowhammer.

Despite good arguments for it, processing in memory has not been enabled for 50 years because of business implications and limitations of the DRAM interface.

Solutions to Rowhammer

The solution can be done by the manufacturer or system. Intel developed a solution that was adopted by manufacturers but was not effective enough.

Manufacturers are taking Rowhammer seriously, but they may not be doing a good job in solving the problem due to its complexity.

Mitigations for Rowhammer Vulnerabilities on Cloud Servers

In this section, the speaker discusses whether cloud providers have mitigations in place for Rowhammer vulnerabilities.

Cloud Providers' Response to Rowhammer

Cloud providers formed task groups in 2020 to solve the problem after becoming worried about published research results.

Microsoft Research and Google Research are involved in finding solutions.

The error-correcting codes (ECC) employed by cloud systems are not enough to solve the problem, because more than two bit flips can occur.
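
Why more than two bit flips defeat ECC can be illustrated with a small single-error-correcting, double-error-detecting (SECDED) code. The lecture summary does not say which code cloud systems use, so SECDED is an assumption here, and this extended Hamming(8,4) code is far smaller than the codes used on real memory modules; all function names are hypothetical.

```python
from functools import reduce
from operator import xor

DATA_POSITIONS = (3, 5, 6, 7)  # positions 1, 2, 4 hold Hamming parity bits


def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    word = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = bit
    for p in (1, 2, 4):
        word[p] = reduce(xor, (word[i] for i in range(1, 8) if i & p and i != p))
    word[0] = reduce(xor, word[1:])  # overall parity bit
    return word


def decode(received):
    """Return (status, data) where status is ok, corrected, or uncorrectable."""
    word = list(received)
    syndrome = reduce(xor, (i for i in range(1, 8) if word[i]), 0)
    parity_odd = reduce(xor, word)
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Odd parity is treated as a single error at position `syndrome`
        # (position 0 is the overall parity bit itself).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        status = "uncorrectable"  # even number of flips: detected, not fixed
    return status, [word[pos] for pos in DATA_POSITIONS]


def flip(word, positions):
    """Return a copy of `word` with the given bit positions inverted."""
    out = list(word)
    for pos in positions:
        out[pos] ^= 1
    return out


data = [1, 0, 1, 1]
codeword = encode(data)
one_flip = decode(flip(codeword, [5]))
two_flips = decode(flip(codeword, [3, 6]))
three_flips = decode(flip(codeword, [3, 5, 7]))
```

With three flips the decoder reports a successful correction while returning wrong data, so the corruption goes unnoticed by the system.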

Meltdown and Spectre Vulnerabilities

In this section, the speaker briefly mentions Meltdown and Spectre vulnerabilities at a microarchitectural level.

Meltdown and Spectre

Meltdown and Spectre are issues at the microarchitectural level that cause information leakage.

Spectre and Meltdown Attacks

In this section, the speaker explains how speculative execution works and how it can be exploited by malicious programs to access secret data. The speaker also discusses the differences between Spectre and Meltdown attacks.

How Speculative Execution Works

Speculative execution uses branch prediction to execute instructions before a branch is resolved; when the prediction is wrong, instructions on the wrong path of the instruction stream are executed.

This leaves traces of secret data in the processor's cache that can be accessed by a malicious program using cache timing or side channel attacks.

A malicious program can run concurrently with another program accessing privacy-critical data, bring some data into the cache, and infer what that secret data is.

Spectre Attack

Spectre attack is fundamental to how speculative execution works.

It is harder to execute than Rowhammer because it requires aligning a malicious program with another program accessing security-critical data and timing the accesses made by both programs.

The timing differences between these accesses provide a side channel for inferring what data values are being accessed by the security-critical program.
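
The cache timing side channel can be illustrated with a purely simulated cache. Nothing here touches real hardware: the `ToyCache` class, the latency constants, and the `victim` function standing in for a secret-dependent (possibly speculative) access are all assumptions for illustration.

```python
CACHE_HIT_CYCLES = 4     # assumed latency of a cached access
CACHE_MISS_CYCLES = 200  # assumed latency of an access served from memory


class ToyCache:
    """Simulated cache that only tracks which lines are present."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, line):
        """Return the simulated latency and bring the line into the cache."""
        latency = CACHE_HIT_CYCLES if line in self.lines else CACHE_MISS_CYCLES
        self.lines.add(line)
        return latency


def victim(cache, secret):
    """Stand-in for an access whose address depends on secret data."""
    cache.access(secret)


def recover_secret(cache, run_victim, num_lines=256):
    """Flush, let the victim run, then time every line to find the fast one."""
    cache.flush()
    run_victim()
    timings = [cache.access(line) for line in range(num_lines)]
    return min(range(num_lines), key=timings.__getitem__)


cache = ToyCache()
guess = recover_secret(cache, lambda: victim(cache, 42))
```

The attacker never reads the secret directly; it is inferred solely from which probe access was fast. A real attack has to cope with timer noise and aligning the two programs, which is part of why it is harder to carry out.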

Meltdown Attack

Meltdown attack is easier to solve than Spectre attack.

Spectre-style attacks, by contrast, involve polluting branch target buffers and branch prediction structures so that another program suffers branch mispredictions more often at the right times.

This allows leaking information across local security boundaries but does not allow taking over your entire system like Rowhammer.

Additional Resources

For more information about these attacks, you can read a blog post by one of their discoverers, Jann Horn from Google Project Zero.

There are also other papers related to Spectre and Meltdown that you can easily find online.

Solutions for Spectre and Meltdown

In this section, the speaker discusses solutions to the Spectre and Meltdown vulnerabilities.

Turning Off Speculative Execution

Turning off speculative execution is a solution but it destroys the performance of a pipelined out-of-order processor.

We need other solutions like not running security-critical applications together with others.

Microarchitecture Design Choices

Research showed that microarchitecture design choices related to instruction execution can lead to side channels.

Power management-related attacks exploit current management mechanisms in a similar way to how Spectre and Meltdown exploit speculative execution.

Balancing Performance and Security

People are trying to find a balance between performance gained from speculative execution and security holes created by it.

Microarchitectural design choices made to improve performance or manage power create side channels or covert channels inside the system.

Demanding Workloads

In this section, the speaker talks about how workloads have become more demanding due to the data deluge, increasingly straining computing platforms.

Increasingly Demanding Applications

Applications are becoming increasingly demanding due to data deluge.

Computing platforms will become increasingly strained as applications push boundaries.

The Need for More Performance

The performance of existing systems is more than three to four orders of magnitude higher than that of the MIPS R2000 of the 1980s.

As humans, we have increasingly important problems that require a lot of data. These problems are fundamental in science, technology, engineering, and our daily lives.

Genome Analysis and Its Applications

In this section, the speaker discusses genome analysis and its applications in various fields such as medicine, disease control, and outbreak surveillance.

Genome Analysis for Medical Decisions

Genome analysis can help make better medical decisions by understanding diseases better.

It can also help understand how species are related to each other.

One example of genome analysis is the Oxford Nanopore device that can sequence someone's genome relatively cheaply.

However, processing the data is a bottleneck in genome analysis.

Genome Analysis for Disease Control

Genome analysis was used to understand the spread of COVID-19 by doing genome analysis on humans as well as the virus.

Genome analysis can enable us to understand how a disease spreads and mutates, and how this affects different methods for fighting it.

Rapid surveillance of outbreaks like Ebola was possible with portable genome sequencing devices.

Other Applications of Genome Analysis

Genome analysis can be used for good, for example to predict whether something will cause an outbreak.

A study was done in 2015 on city-scale microbiome profiling to understand what kind of species are there and what they may cause potentially for the spread of disease.

Technology Scaling in Genome Sequencing

Technology scaling has enabled higher throughput and lower cost in genome sequencers.

Genome Sequencing and Technology Scaling

In this section, the speaker discusses the importance of genome sequencing and how it can be seen as a data collection engine. They also talk about technology scaling in genomic data production and how it is outpacing computation capability.

Genome Sequencing as a Data Collection Engine

Genome sequencing can be seen as a data collection engine that passes something you want to know through a device that understands what it is.

The bottleneck in genome sequencing is processing the large amount of data generated.

Two example papers show how acceleration using FPGAs or in-memory genome analysis can help keep up with the amount of data being generated.

Technology Scaling in Genomic Data Production

The number of genomes sequenced in the world is increasing exponentially due to technology scaling.

A comparison with Moore's Law shows that the cost per raw megabase of DNA sequence is falling much faster than the cost of a transistor, indicating that technology scaling is progressing much faster in genome sequencing today.

However, if your data generation capability greatly outweighs your computation capability, then you have a problem. This gap between computation capability and data generation capability is increasing, and we have a data deluge.

Accelerating Genome Analysis

To solve this bottleneck, acceleration techniques are being pursued for genome analysis similar to machine learning and databases.

Algorithmic and software improvements should be tackled first to eliminate useless work before moving on to hardware/software co-designs.

Eliminating useless computation can speed up existing software by orders of magnitude.

Algorithm-Architecture Co-Design for Genome Analysis

In this section, the speaker discusses how algorithm-architecture co-design can improve performance and energy efficiency in genome analysis.

In-Memory Computing

In-memory computing improves performance significantly.

More detailed lectures on genome analysis, as a case study of how to do hardware/software co-design and acceleration, will be covered later.

Algorithm-architecture co-design is an example in which the algorithm is changed significantly and the FPGA is designed around the new algorithm.

Read Mapping Problem

The read mapping problem can be converted to the routing problem in VLSI.

Algorithms employed in VLSI routing are used to do read mapping much faster.

This is really the state of the art at this point in pre-alignment filtering.

Approximate String Matching

Approximate string matching is at the core of many genomic analyses.

A hardware/software co-design mechanism is used for approximate string matching.
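
As a reference point for what approximate string matching computes, here is the textbook dynamic-programming edit distance. This is not the accelerated algorithm or the co-designed mechanism from the lecture; the function names and the `max_edits` parameter are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def approximately_matches(read, reference_segment, max_edits):
    """True if the read matches the segment within `max_edits` edits."""
    return edit_distance(read, reference_segment) <= max_edits


is_match = approximately_matches("ACGTAC", "ACGAAC", max_edits=1)
```

The quadratic cost of this computation per read and candidate location is what makes filtering and acceleration attractive when reads number in the millions.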

Challenges in Genome Analysis

In this section, the speaker talks about challenges faced during genome analysis, such as long analysis times and overwhelming amounts of data.

Nanopore Sequencing

Even for a simple task like analyzing COVID-19, where sequencing takes one hour, nanopore sequencing takes seven hours to go from RNA to answer.

Examining thousands of genomes together and trying to compare them to each other will explode analysis time significantly.

Metagenomic Nanopore Sequencing

Metagenomic nanopore sequencing is briefly mentioned.

Data Intensive Workloads

Data is key for many workloads such as machine learning and genomics.

Performance and energy balance are required due to increasing amounts of data thrown at these workloads.

Mobile machine learning frameworks are also included.

Conclusion

In this section, the speaker concludes by referring to past lectures and discussing the overwhelming nature of modern machines due to data.

The speaker refers to material from past lectures that will be revisited in a later lecture.

Data is really overwhelming modern machines.

Energy and Latency Disparity

In this section, the speaker discusses the disparity between energy and latency of data movement versus computation. The speaker explains that memory access consumes significantly more energy than a complicated floating-point arithmetic operation.

Energy Disparity

A complicated floating-point arithmetic operation is only 20 picojoules, while main memory read or write is 16 nanojoules, which is 800x different.

Memory access consumes two to three orders of magnitude more energy than a complex addition.

If faced with a trade-off like this, it may not make sense to bring data from memory to the processor chip to do a simple operation because it causes a lot of energy to move the data.
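
The arithmetic behind the 800x figure, and what it implies for a simple streaming computation, can be checked directly. The two energy values come from the lecture summary; the workload model (one main memory access per arithmetic operation, no caching) is a deliberately pessimistic assumption for illustration.

```python
FP_OP_PJ = 20           # complicated floating-point operation, in picojoules
MEM_ACCESS_PJ = 16_000  # main memory read or write: 16 nJ = 16,000 pJ

ratio = MEM_ACCESS_PJ // FP_OP_PJ


def data_movement_share(num_elements):
    """Fraction of total energy spent on memory accesses.

    Assumes every element needs one main memory access and one
    floating-point operation (no cache reuse).
    """
    compute = num_elements * FP_OP_PJ
    movement = num_elements * MEM_ACCESS_PJ
    return movement / (compute + movement)


share = data_movement_share(1_000_000)
print(f"memory access costs {ratio}x a floating-point operation")
print(f"share of energy spent moving data: {share:.2%}")
```

Under this assumption nearly all of the energy goes to moving data rather than computing on it, which is the motivation for doing simple operations closer to where the data resides.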

Performance Aspect

The latency of a floating-point operation is on the order of one to ten cycles, but a memory read or write is on the order of hundreds of cycles.

There's also a performance aspect since latencies are problematic.

Technology Scaling

Technology scaling has brought us this new trade-off where we have a bottleneck in terms of memory and interconnect.

Logic has scaled well, but interconnect and memory cells did not scale as much.

As a result, technology scaling has created this new disparity in energy and latency between data movement and computation.

Impact of Technology Scaling

In this section, the speaker talks about how technology scaling has shaped the disparity in energy and latency between data movement and computation.

Historical Perspective

About 70 years ago (late 1940s and 1950s), computation was much more expensive than memory.

A difference of two to three orders of magnitude existed at that time.

Technology Scaling Impact

Logic has scaled well over time by reducing transistor size and making them more efficient.

Interconnects and memory cells did not scale as much as transistors.

As a result, interconnects became the bottleneck and memory cell access became the bottleneck.

New Disparity

Technology scaling has created a new disparity in energy and latency between data movement and computation due to this bottleneck.

Recent numbers show that the ratio today is closer to 160x or 180x than to 800x, but a difference of two orders of magnitude in energy between operations is still very high.