Linguagem Compilada vs Interpretada | Qual é melhor? (Compiled vs. Interpreted Languages: Which Is Better?)
Understanding Compiled vs. Interpreted Languages
Introduction to Language Discussions
- Fabio Akita introduces the common debate among beginners regarding the superiority of their favorite programming language, often focusing on whether compiled languages are better than interpreted ones.
- The aim is not to declare one as superior but to highlight misconceptions in reasoning that overlook other important factors.
Educational Context
- Akita notes that those studying or graduated from computer science will benefit more from this discussion, encouraging them to engage with technical details and ask questions.
- He mentions that compiler and operating system topics typically span at least a year in computer science courses.
Practical Examples: Hello World Programs
- A "Hello World" example in C is presented, demonstrating how source code is compiled into an ELF binary using the "cc" compiler.
- The Java "Hello World" example illustrates a different process, where `javac` generates a `.class` file that requires the `java` command to execute, indicating it is not a traditional compilation.
Understanding Executable Formats
- Akita explains that for Linux to execute binaries, they must be in ELF format; Java's `.class` files do not meet this criterion.
- He highlights the unique hexadecimal signature of ELF binaries (`7f 45 4c 46`), contrasting it with Java's class files, which start with `ca fe ba be`.
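The magic-number check described above can be sketched in Java. The `MagicCheck` class name and its helper methods are illustrative, not from the video; the byte values themselves are the documented ELF and class-file signatures:

```java
import java.io.FileInputStream;
import java.io.IOException;

public class MagicCheck {
    // JVM class files begin with the bytes CA FE BA BE
    static boolean isClassFile(byte[] header) {
        return header.length >= 4
            && (header[0] & 0xFF) == 0xCA && (header[1] & 0xFF) == 0xFE
            && (header[2] & 0xFF) == 0xBA && (header[3] & 0xFF) == 0xBE;
    }

    // ELF binaries begin with 7F 45 4C 46, i.e. 0x7F followed by "ELF"
    static boolean isElf(byte[] header) {
        return header.length >= 4
            && header[0] == 0x7F && header[1] == 'E'
            && header[2] == 'L' && header[3] == 'F';
    }

    public static void main(String[] args) throws IOException {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE}; // demo bytes
        if (args.length > 0) { // optionally inspect the first four bytes of a real file
            header = new byte[4];
            try (FileInputStream in = new FileInputStream(args[0])) {
                in.read(header);
            }
        }
        System.out.println(isClassFile(header) ? "JVM class file"
                         : isElf(header) ? "ELF binary" : "unknown format");
    }
}
```

Pointing it at a compiled `.class` file or any Linux executable reproduces the distinction Akita draws with a hex dump.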
Defining Compilation and Interpretation
- By defining compilation as converting source code directly into machine-executable binaries, he concludes that Java does not fit this definition and is thus interpreted.
- Both C and Java can be considered compiled languages under certain definitions; however, understanding their differences is crucial for avoiding online debates' pitfalls.
Theoretical Foundations of Programming Languages
- Akita discusses programming languages as formal languages defined by grammar rules, similar to natural-language structures like paragraphs and sentences.
Understanding the Importance of Grammar in Programming
The Role of Grammar in Sentence Formation
- Sentences must adhere to grammatical rules, including subjects, predicates, adjectives, pronouns, and verb tenses to convey meaning.
- A missing comma can drastically change the meaning of a sentence; for example, "não vai ter jeito" ("there's no way") vs. "não, vai ter jeito" ("no, there is a way").
- This concept parallels programming where punctuation errors can lead to bugs.
Text as Bytes in Programming
- In C programming, text is merely a sequence of bytes or characters that lacks inherent meaning until processed.
- For humans unfamiliar with programming, raw code also appears meaningless without context.
Tokenization: Breaking Down Code
- Tokens are meaningful sequences of characters (e.g., the keyword `int` or the identifier `printf`) that need to be identified from the text.
- Lexical analysis involves defining lexemes—categories such as digits (0-9), letters (a-z), and punctuation marks.
The Process of Lexical Analysis
Understanding Lexical Analyzers
- Tools like `flex` analyze C code to identify tokens such as signs, strings, digits, and operators.
- Once tokens are identified, their meanings must be understood within the context of grammar rules.
Practical Example: A Simple Language
- The speaker created a minimalistic programming language accepting expressions like `1 add 2` or `4 sub 3`.
- An interpreter written in JavaScript reads command-line arguments and processes files using libraries for path handling and file system operations.
Building a Lexer and Interpreter
Tokenization Process
- The lexer uses simple string splitting to create an array of tokens from input text.
- The interpreter executes commands based on token types using switch-case statements for operations like addition or subtraction.
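The original interpreter is written in JavaScript; the same split-then-switch idea can be sketched in Java (the `TinyLang` and `eval` names are illustrative, not from the video):

```java
public class TinyLang {
    // Lexer: the simplest possible tokenizer, splitting the text on whitespace
    static String[] tokenize(String source) {
        return source.trim().split("\\s+");
    }

    // Interpreter: walk the tokens left to right, switching on each operator token
    static int eval(String source) {
        String[] tokens = tokenize(source);
        int result = Integer.parseInt(tokens[0]);
        for (int i = 1; i < tokens.length; i += 2) {
            int operand = Integer.parseInt(tokens[i + 1]);
            switch (tokens[i]) {
                case "add": result += operand; break;
                case "sub": result -= operand; break;
                default: throw new IllegalArgumentException("unknown op: " + tokens[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(eval("1 add 2")); // 3
        System.out.println(eval("4 sub 3")); // 1
    }
}
```

Note that this evaluates strictly left to right, which is exactly why operator precedence becomes a problem the moment the language grows.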
Limitations and Future Considerations
- The simplicity raises questions about supporting more complex expressions involving multiple numbers or operator precedence (e.g., multiplication before addition).
Syntax Analysis: Beyond Tokenization
Separating Syntax from Execution
- To handle complexity in languages effectively, syntax analysis must be separated from execution time.
Defining Grammar Rules
- Syntax analysis requires a defined grammar that specifies what constitutes valid expressions and functions within the language.
Tools for Defining Language Grammar
Traditional Tools for Parsing
- Common tools include lex/flex for lexical analysis and bison/yacc for defining grammar structures.
Understanding Function Definitions
Understanding Compound Statements and C Syntax
Definition of Compound Statement
- A compound statement in C can be defined as:
- An empty block enclosed in braces.
- A list of statements within braces.
- A list of declarations followed by a list of statements.
- The focus is on the official, complete, and unambiguous grammar defining functions in C.
Statement Lists and Types
- A statement list can consist of:
- One or more statements, indicating a recursive definition.
- Various types of statements including labeled statements, compound statements, and iteration statements (e.g., while, do while, for).
Lexical Definitions and Grammar
- The lexical definitions and Yacc grammar files provide the complete syntax and semantics of C:
- These files are concise yet comprehensive for understanding language definitions.
- Similar definitions exist for other languages like JavaScript on the EcmaScript website.
Parsing Code into Tokens
Purpose of Grammar in Programming Languages
- The primary goal is to tokenize source code (text):
- This process involves breaking down code into tokens that can be organized into data structures for manipulation.
- Ultimately transforms source code into a Parse Tree.
Importance of Data Structures
- Trees are crucial data structures in programming:
- The parsing process converts text to tokens before organizing them into trees.
Abstract Syntax Trees (AST)
Representation of Expressions
- Infix notation is common but not always optimal:
- Example: `1 + 2 * 3` uses infix notation, where operators sit between operands.
Alternative Notations
- Reverse Polish Notation (RPN):
- Operands precede operators; e.g., entering `3`, `2`, then `*` performs the multiplication before the addition.
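A stack makes RPN trivial to evaluate. A small Java sketch (the `Rpn` class name and the `+`/`*` subset are assumptions for illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class Rpn {
    static int eval(String expr) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String token : expr.trim().split("\\s+")) {
            switch (token) {
                case "+": { int b = stack.pop(), a = stack.pop(); stack.push(a + b); break; }
                case "*": { int b = stack.pop(), a = stack.pop(); stack.push(a * b); break; }
                default: stack.push(Integer.parseInt(token)); // operand: push it
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // 1 + 2 * 3 written in RPN: the * is applied before the +, no parentheses needed
        System.out.println(eval("1 2 3 * +")); // 7
    }
}
```

Because each operator consumes the two values most recently pushed, precedence is encoded by the order of the tokens rather than by grammar rules.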
Programming Execution Flow
Internal Representation Post-Parsing
- After parsing, expressions are represented as prefix notation in trees:
- For example, `+(*(2,3),1)` illustrates how operations are structured internally.
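That prefix tree can be modeled directly. A minimal Java sketch (the `Node`/`Num`/`BinOp` names are hypothetical) shows how evaluating the tree applies `*` before `+` purely because of the tree's shape:

```java
public class Ast {
    interface Node { int eval(); }

    // Leaf node: a literal number
    static class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        public int eval() { return value; }
    }

    // Interior node: an operator with two children, evaluated bottom-up
    static class BinOp implements Node {
        final char op; final Node left, right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
        public int eval() {
            int l = left.eval(), r = right.eval();
            switch (op) {
                case '+': return l + r;
                case '*': return l * r;
                default: throw new IllegalArgumentException("unknown op: " + op);
            }
        }
    }

    public static void main(String[] args) {
        // 1 + 2 * 3 parsed with precedence: the * node sits below the + node
        Node tree = new BinOp('+', new Num(1), new BinOp('*', new Num(2), new Num(3)));
        System.out.println(tree.eval()); // 7
    }
}
```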
Practical Example with Java
- Demonstrating execution with Java code:
- Converts command-line arguments to integers using `Integer.parseInt`.
Understanding Java Compilation and Bytecode
The Role of javac and javap
- The `javac` tool compiles Java code into bytecode, similar to how assembly language was used in older systems like the 6502 CPU. It processes the code through a parser before generating machine instructions.
- Each line of Java code can generate multiple bytecode instructions; for instance, `Integer.parseInt` results in at least four instructions, highlighting the complexity behind seemingly simple operations.
Evolution of Compilers
- Historically, programming directly in assembly was common due to limited memory and processing power. Modern compilers have evolved to produce more efficient assembly than manual coding could achieve.
- The efficiency of modern compilers means that writing everything in assembly is generally unnecessary; only rare exceptions might warrant such an approach.
Understanding Stack Operations
- In the example discussed, bytecodes like `iload`, `imul`, and `iadd` perform stack operations: loading integers onto the stack, multiplying them, and adding the results, respectively.
- A new class named `Calc2` demonstrates hardcoding calculations directly into code (e.g., `System.out.print(1 + 2 * 3)`), which simplifies execution by eliminating unnecessary bytecode calls.
Compiler Optimization Techniques
- When compiling expressions like `1 + 2 * 3`, the compiler optimizes by pre-calculating constant values, resulting in a single `bipush` instruction instead of multiple load and arithmetic instructions.
- This optimization illustrates how compilers rewrite code for efficiency, discarding redundant calculations that yield constant results.
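The `Calc2` example from the video boils down to the snippet below. The exact instructions depend on the `javac` version, but constant expressions like this one are folded at compile time:

```java
public class Calc2 {
    public static void main(String[] args) {
        // javac evaluates the constant expression at compile time; the class file
        // contains a single push of 7 (bipush 7) rather than loads, imul, and iadd
        System.out.print(1 + 2 * 3);
    }
}
```

Running `javap -c Calc2` after compiling makes the folding visible in the disassembly.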
Philosophy Behind Code Readability
- Regardless of syntax preferences or coding styles (like indentation vs. braces), what ultimately matters is how efficiently a compiler can translate high-level code into machine instructions.
- Clean code practices are essential not for machines but for human readability; programmers write for others' understanding—including their future selves—rather than solely for computer execution.
Historical Context on Resource Constraints
The Evolution of Programming Languages and Practices
Historical Context of Hardware Costs
- In the past, the Nintendo console cost nearly $180, which adjusts to about $500 today, comparable to a PS5. This highlights the importance of optimizing code due to high hardware costs.
- Back in 1983, wasting even 10 kilobytes of RAM could mean an additional $200 expense. Thus, programmers prioritized efficient coding practices over ease of programming.
Importance of Code Quality
- Nowadays, wasting 1 or 2 gigabytes of RAM is less significant; however, programmer time has become more valuable. Efficient code saves time for developers rather than just machine resources.
- Clean and organized code is crucial for future reference; it aids in understanding during emergencies when quick comprehension is necessary.
Compiler Technology and Its Limitations
- No compiler can optimize poorly written code effectively; for example, using `SELECT *` on a large database table can lead to inefficiencies that compilers cannot mitigate.
- Abstract Syntax Trees (AST) are essential tools for programmers. They allow linters and static analysis tools to identify potential issues without parsing raw text directly.
Theoretical Foundations of Programming Languages
- The concept of context-free grammar was applied to programming languages by John Backus in the late 1950s, after his work on Fortran. His ideas laid foundational principles applicable to both natural and programming languages.
- Backus worked on a language called IAL (International Algebraic Language) that evolved into ALGOL, often considered a precursor to modern programming languages.
Legacy and Influence on Modern Languages
- ALGOL's influence extends beyond C; it serves as the root for many contemporary languages like C++, Java, Python, and C#.
- Dennis Ritchie developed C as a successor to B language after addressing limitations related to resource efficiency inherent in earlier languages like BCPL.
Notation Systems in Language Definition
- Peter Naur recognized Backus's metalinguistic ideas leading to the term "Backus Normal Form" (BNF), although Donald Knuth later suggested it should be referred to as "Backus-Naur Form."
How Tools Like Flex, Bison, and Yacc Simplify Language Design
Introduction to Parser Generators
- Tools such as Flex, Bison, and Yacc streamline the process of language design by allowing developers to focus on writing lexeme tables and grammar.
- ANTLR is highlighted as a modern parser generator that is implemented in Java.
Parsing Beyond Programming Languages
- Parsers are not limited to programming languages; they are also essential for processing configuration files like YAML or JSON.
- Both YAML and JSON require a lexer and parser to convert them into usable objects in programming languages like JavaScript or Python.
Understanding the Document Object Model (DOM)
- The DOM represents HTML documents as trees, created through parsing HTML content.
- Just as the DOM can be manipulated in web development, compilers manipulate Abstract Syntax Trees (AST) for code optimization.
From AST to Bytecode
- After generating an AST from Java source code, the compiler checks syntax and dependencies before converting it into bytecode.
- Bytecodes are mnemonic representations of the binary instructions a machine executes; examples include `aaload`, `iload`, etc.
The Role of Virtual Machines
- Bytecode instructions target a virtual machine rather than physical hardware; this distinction clarifies differences between compilers and interpreters.
- The Java Virtual Machine (JVM), along with similar environments for Python and JavaScript, executes programs within their respective virtual machines.
Interpreters vs. Virtual Machines
- A virtual machine functions similarly to an interpreter by translating instructions from one machine format to another.
- When using Python's command-line interface (`python`), users interact with an interpreter that processes code immediately through a Read-Eval-Print Loop (REPL).
Dynamic Languages and AST Manipulation
- Interpreters maintain an AST format that allows real-time modifications during execution, contributing to what defines dynamic languages.
- Even statically typed languages like Java allow some level of runtime modification via class loaders or reflection APIs.
Differences Between Interpreters and Virtual Machines
- Unlike traditional virtual machines that abstract hardware details from applications, interpreters directly execute high-level code without such isolation.
Understanding Compiled vs. Interpreted Languages
The Nature of Java and Python
- Java is technically a compiled language as it generates a binary for the JVM, but it also functions as an interpreted language since it requires an interpreter to run on actual hardware.
- Similarly, Python, JavaScript, Ruby, and PHP are interpreted languages that directly access system resources like disk and network sockets.
Intermediate Representations in Compilation
- Most modern compilers do not convert code directly from the abstract syntax tree (AST) to machine instructions; instead, they use intermediate representations (IR).
- Examples of IR include DotNet's Intermediate Language (IL), LLVM's Intermediate Representation (IR), and GCC's Register Transfer Language (RTL).
Optimization Techniques in Compilers
- Compilers optimize code by pre-calculating expressions to reduce instruction count, such as simplifying `1 + 2 * 3` directly to `7`.
- Advanced optimizations can include eliminating unused code or rearranging execution order for efficiency without altering program output.
Transformation Passes in Compilation
- The optimization phase involves multiple "passes" over the code to identify redundancies and improve performance through techniques like dead code elimination.
- This stage is crucial for enhancing compiler efficiency and effectiveness while minimizing bugs during translation.
Structure of Modern Compilers
- A typical compiler consists of three main components: front-end (parsing), middle-end (optimizing), and back-end (generating machine-specific instructions).
Compiler Optimization and Language Migration
The Role of LLVM in Compiler Development
- The construction of compilers on top of LLVM allows existing compilers to optimize code without starting from scratch. Developers can focus on creating a parser that converts their language into Intermediate Representation (IR), similar to how TypeScript is transpiled to JavaScript.
- When new hardware, like Apple's M1 chip, is introduced, only the back-end of the compiler needs updating to convert optimized IR into machine instructions. This modularity simplifies adaptation for new architectures.
Historical Context and Performance Gains
- Apple’s rapid migration from PowerPC to Intel and then to ARM was facilitated by nearly two decades of investment in LLVM-based compiler technologies. This strategic development allowed seamless performance across different architectures.
- Compilers are crucial for achieving high performance; they can leverage hardware design improvements while ensuring software compatibility without recompilation.
Understanding Different Compilation Strategies
- Languages like C, Java, and JavaScript utilize different compilation strategies: C uses Ahead Of Time (AOT) compiling, while Java and JavaScript employ Just In Time (JIT) compilation techniques.
- JIT compilation has various strategies tailored for specific problems, allowing dynamic optimization during runtime compared to static AOT methods used in languages like C.
Code Optimization Examples in Java
- An example illustrates calculating the circumference of a circle using both direct computation and method abstraction. Initial implementations may lead to less readable code but demonstrate basic functionality.
- By defining constants such as `Math.PI`, developers can improve code clarity. Searching for existing constants before defining new ones is encouraged, as it promotes cleaner coding practices.
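A sketch of the circumference example (the method name `circunferencia` appears later in the discussion of function addresses; the `Circ` class name is assumed):

```java
public class Circ {
    // Using the library constant Math.PI instead of a hand-typed 3.14159
    // makes the intent explicit and leaves folding opportunities to the compiler
    static double circunferencia(double raio) {
        return 2 * Math.PI * raio;
    }

    public static void main(String[] args) {
        System.out.println(circunferencia(10.0));
    }
}
```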
Disassembly Insights and Function Interfaces
- Disassembling compiled code reveals optimizations made by the compiler, such as pre-calculating constant values instead of invoking functions unnecessarily.
Understanding Java's Class Loading and Compilation Process
The Nature of Java Classes
- Java is a dynamic, interpreted language where code can be modified within the JVM. Classes are designed for reuse, meaning they do not operate in isolation.
- When creating a new class (e.g., `Calc5`), it can reference another class (`Calc3`) without needing to recompile `Calc3`, as long as both classes are in the same directory.
Class Dependencies and Compilation
- If `Calc3.class` is deleted, attempting to run `Calc5` will result in a `NoClassDefFoundError`. This illustrates that Java compiles each class individually while maintaining stable interfaces between them.
- Because Java's reflection system allows manipulation of even private methods, the compiler avoids optimizations that could break these stable interfaces.
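The dependency scenario can be reproduced with two tiny classes. Only the class names `Calc3` and `Calc5` come from the video; the method bodies below are hypothetical:

```java
// Calc3.java - compile once with: javac Calc3.java
class Calc3 {
    static int expressao() { // hypothetical method exposed to other classes
        return 1 + 2 * 3;
    }
}

// Calc5.java - javac checks Calc3's interface but does not copy its bytecode
class Calc5 {
    public static void main(String[] args) {
        // This call is resolved by the JVM when Calc3 is loaded; deleting
        // Calc3.class after compiling and running `java Calc5` throws
        // NoClassDefFoundError at run time, not at compile time
        System.out.println(Calc3.expressao());
    }
}
```

The key point is that `Calc5.class` stores a symbolic reference to `Calc3.expressao`, not a memory address, which is what allows each class to be compiled independently.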
Understanding Interfaces and Addressing
- During compilation, the compiler notes function addresses (e.g., for `circunferencia`). The JVM assigns these addresses when loading classes rather than at compile time.
- The JVM maintains an internal table mapping method calls to their respective addresses during execution, allowing independent compilation of classes.
Compiling with External Libraries
- Unlike languages like C or C++, which require knowledge of final addresses at compile time due to lack of a virtual machine, Java’s architecture allows for more flexibility.
- In large projects with many files, dependency management becomes crucial. Tools like Makefiles automate this process by checking dependencies before compiling.
Object Files and Function Interfaces
- Compiling source code generates object files (`.o` files). These contain compiled binaries but do not yet have resolved external function addresses.
- Header files (like `stdio.h`) provide function interfaces necessary for the compiler to validate correct usage without knowing actual memory locations until linking occurs later.
Understanding the Role of Linkers and JIT Compilers in Programming
The Function main and Initial Call Analysis
- The discussion begins by focusing on the `main` function, starting at address `1139` in hexadecimal. A `call` is identified at address `1147`, which points to a nearby address (`1030`).
- At address `1030`, there is a jump to another location, with a comment indicating it relates to the `puts` function from glibc. This suggests that calling `printf` with no format arguments may be rewritten by the compiler to use `puts`.
Linker Functions and Optimization
- The speaker explains that during compilation, an initial phase leaves placeholders for addresses until confirmed by the linker (GNU LD). GCC automates this linking process.
- The linker not only links but can also optimize binaries further. It can perform actions like inlining functions that are not called elsewhere, reducing unnecessary jumps and returns.
Compiler vs. Linker Optimizations
- C compilers analyze code for optimizations file by file; however, linkers re-evaluate all generated binaries collectively for additional optimizations.
- Languages like Java or Python lack static linkers when compiling source code into intermediate bytecode, leading to less optimization compared to C.
Just-In-Time Compilation (JIT)
- When loading Java projects, the JVM initializes and creates an internal table mapping classes and functions' memory addresses. This dynamic management replaces static linking.
- Although JIT requires checking addresses dynamically, caching mechanisms enhance performance by storing frequently accessed data.
HotSpot Optimization Techniques
- JIT compilers identify "hot spots" in code execution—frequently executed paths—and optimize them during runtime rather than compile time.
- As programs run longer, they experience improved performance due to ongoing optimizations applied by the JIT compiler.
Comparison with Other Languages
- JIT reduces performance gaps between interpreted languages like Java and compiled languages like C through real-time optimizations.
- In JavaScript engines such as Google V8, similar processes occur where bytecode is generated from source code before being executed efficiently.
Execution of Bytecode in V8
Understanding the Optimization Process in JavaScript
The Role of V8 in Optimizing JavaScript Execution
- The V8 engine analyzes execution to identify optimization opportunities, focusing on frequently used functions rather than compiling entire libraries like jQuery.
- If V8 were inefficient and compiled everything, it would waste time optimizing unused parts of libraries, leading to slower performance.
- The Just-In-Time Compiler (JIT), known as TurboFan in V8, optimizes the most utilized code segments into native machine binaries for better performance.
Comparing Static and Dynamic Languages
- Unlike C, which compiles everything beforehand for efficiency, dynamic languages like JavaScript allow for runtime modifications due to their flexible nature.
- Compiled binaries from static languages are generally faster and consume less memory compared to dynamically interpreted languages that can change behavior at runtime.
Advantages of Dynamic Languages
- Dynamic languages enable real-time manipulation of web pages through interpreters that maintain accessible data structures, enhancing user experience with immediate changes.
- This flexibility allows developers to inspect and modify elements directly within browsers without needing a restart or recompilation.
Metaprogramming Capabilities
- Languages such as JavaScript, Ruby, Python, and others offer powerful metaprogramming features that allow code modification during execution via interpreters.
- These capabilities facilitate injecting new bytecode without restarting applications; JIT compilers optimize this new code on-the-fly.
Performance Trade-offs Between Compiled and Interpreted Languages
- Static languages typically generate immutable bytecode while dynamic ones allow real-time alterations but may be slower due to lack of aggressive pre-execution optimization.
- Modern interpreters often include JIT compilation techniques that significantly reduce performance gaps between interpreted and compiled languages.
Conclusion: Understanding Language Characteristics
Understanding Compilation and Interpretation in Programming
The Role of JIT and Transpilers
- The ability to run complex games in browsers is largely attributed to Just-In-Time (JIT) compilation, which enhances performance.
- Early transpilers, such as Facebook's HipHop, converted PHP code into C++, showcasing the evolution of programming languages before TypeScript and other transpiled languages emerged.
- Currently, Facebook has shifted focus from HipHop to the HHVM virtual machine, which maintains compatibility with PHP 7 to some extent.
Bytecode Generation Across Languages
- Python generates `.pyc` files that contain bytecode for its virtual machine during parsing, allowing for efficient execution.
- Ruby also produces internal bytecodes, indicating a trend among modern programming languages towards optimizing execution through bytecode generation.
Misconceptions About Language Performance
- Common misconceptions persist regarding interpreted versus compiled languages; many believe interpreted languages are inherently slower than compiled ones.
- The argument that Java is superior to JavaScript due to being compiled is challenged by advancements in compiler technology and virtual machines over the past two decades.