Apple’s New M5 Max Changes the Local AI Story
M5 Max MacBook Pro: A First Look and Comparison
Introduction to the M5 Max MacBook Pro
- The new M5 Max MacBook Pro is introduced as a replacement for the M4 Max, featuring a new GPU architecture with neural accelerators in every GPU core.
- Apple claims significant improvements, including over four times peak GPU compute for AI and up to 614 GB/s memory bandwidth.
Performance Metrics
Single-Core Performance
- The speaker emphasizes the importance of single-core performance for system responsiveness and JavaScript-heavy applications.
- Speedometer 3.1 test results show the M5 Max achieving a score of 60.5, surpassing previous models (M3 at 49.6 and M4 at 56.7).
Multi-Core Performance
- Multi-core performance is crucial for tasks like IDE builds and code compilation; a Mandelbrot algorithm implemented in Python is used for testing.
- The M4 Max recorded run times of 14.6 seconds, while the M5 Max improved significantly with times of 11.6 and 11.8 seconds.
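A multi-core Mandelbrot test of this kind can be sketched as follows. This is a minimal illustration, not the speaker's script: the image size, iteration limit, and the function names `mandelbrot_row` and `render` are all assumptions chosen for the example.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def mandelbrot_row(args):
    """Return the escape iteration count for every pixel in one image row."""
    y, width, height, max_iter = args
    row = []
    for x in range(width):
        # Map the pixel to a point c in the region [-2, 1] x [-1, 1].
        c = complex(-2.0 + 3.0 * x / width, -1.0 + 2.0 * y / height)
        z = 0j
        n = 0
        while abs(z) <= 2.0 and n < max_iter:
            z = z * z + c
            n += 1
        row.append(n)
    return row


def render(width, height, max_iter, workers):
    """Render all rows across `workers` processes; return (image, seconds)."""
    tasks = [(y, width, height, max_iter) for y in range(height)]
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        image = list(pool.map(mandelbrot_row, tasks, chunksize=8))
    return image, time.perf_counter() - start
```

Calling `render(800, 600, 200, workers=os.cpu_count())` from under an `if __name__ == "__main__":` guard gives one wall-clock figure per run. Because the rows are independent, the run time mostly reflects how many cores the chip can keep busy.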
Architectural Changes
Core Configuration
- The M4 Max has 16 cores (12 performance, 4 efficiency), while the M5 Max features an upgraded configuration with no efficiency cores—18 total cores (6 super cores, 12 performance).
- This change in core naming reflects marketing strategies rather than substantial architectural differences.
Comparing to Other Models
Performance Against M3 Ultra
- The speaker compares multi-core performance against the M3 Ultra's impressive time of around 8.5 seconds, noting that the M5 Max performs closely behind it.
Local AI Capabilities
Key Factors for Local LLM Performance
- For local large language models (LLMs), storage speed, prompt processing (PP), and token generation (TG) are the critical metrics.
- Prompt processing relies heavily on GPU power while token generation is sensitive to memory bandwidth.
Memory Bandwidth Insights
- Memory bandwidth figures: M4 Max at 546 GB/s; M3 Ultra at 819 GB/s; Apple claims the M5 Max boosts this to up to 614 GB/s with neural accelerators enhancing throughput.
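One way to see why token generation tracks memory bandwidth is a back-of-the-envelope ceiling: if every generated token has to stream the model weights from memory once, tokens per second cannot exceed bandwidth divided by weight size. The sketch below applies that simplification to the peak bandwidth figures quoted here. The 20 GB weight size is a hypothetical value, and the estimate ignores the KV cache, sparse (mixture-of-experts) models, and compute limits.

```python
# Peak memory bandwidth in GB/s, as quoted in the summary.
PEAK_BANDWIDTH_GB_S = {"M4 Max": 546, "M5 Max": 614, "M3 Ultra": 819}

# Hypothetical size of the quantized model weights in GB (an assumption).
WEIGHTS_GB = 20.0


def tg_upper_bound(bandwidth_gb_s, weights_gb):
    """Rough ceiling on tokens/sec if each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


for chip, bandwidth in PEAK_BANDWIDTH_GB_S.items():
    ceiling = tg_upper_bound(bandwidth, WEIGHTS_GB)
    print(f"{chip}: at most ~{ceiling:.1f} tokens/sec")
```

Measured rates will sit below this ceiling, but the ordering of the chips tends to follow the bandwidth numbers.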
SSD Speed Improvements
Storage Speed Comparisons
- The SSD speeds are highlighted: previous models had read speeds around 7,300 MB/s; the new Gen 5 drives in the M5 Max reach nearly double that at approximately 13,647 MB/s read speed.
- Faster storage aids in loading large models quickly which benefits both local LLM operations and code compilation processes.
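The effect on model loading can be estimated by dividing file size by read speed. The sketch below uses the two read speeds quoted here with a hypothetical 60 GB model file. It is a best-case figure that assumes a sustained sequential read and ignores caching and any processing done while loading.

```python
# Sequential read speeds in MB/s, as quoted in the summary.
READ_MB_S = {"previous models": 7_300, "M5 Max": 13_647}

# Hypothetical model file size in GB (an assumption for illustration).
MODEL_GB = 60


def load_seconds(model_gb, read_mb_s):
    """Best-case seconds to read a model file of `model_gb` gigabytes."""
    return model_gb * 1000 / read_mb_s


for label, speed in READ_MB_S.items():
    print(f"{label}: ~{load_seconds(MODEL_GB, speed):.1f} s")
```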
Performance Comparison of M4 Max, M3 Ultra, and M5 Max
Speed and Accuracy in Security Operations
- What separates junior analysts from professionals is speed and accuracy in high-pressure situations, such as alerts at 3:00 a.m. Muscle memory matters more than theoretical training.
TryHackMe for Business
- TryHackMe offers tools for security teams to enhance real-world readiness through features like skill tracking dashboards, SOC simulators, threat hunting simulations, AI-powered tabletop exercises, and certifications.
Realistic Incident Scenarios
- Analysts work through realistic incident scenarios within the SOC simulator to investigate alerts and respond to attacks. This hands-on experience builds essential muscle memory for faster breach responses.
Memory Bandwidth Testing with Stream Test
- The Stream test measures memory bandwidth performance across different models. Local LLMs benefit from high memory bandwidth during token generation.
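The idea behind a bandwidth test of this kind is to move a large buffer through memory and divide the bytes moved by the elapsed time. The sketch below is a single-threaded, copy-only illustration in pure Python. The real STREAM benchmark is a compiled program with Copy, Scale, Add, and Triad kernels, so the number this produces will be far lower and is not comparable to the figures in this summary. The function name and buffer size are assumptions.

```python
import time


def copy_bandwidth_mb_s(size_mb=64, repeats=5):
    """Best-of-N bandwidth in MB/s for one bulk buffer copy."""
    src = bytearray(size_mb * 1024 * 1024)
    dst = bytearray(len(src))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # a single bulk memory copy
        best = min(best, time.perf_counter() - start)
    # Each byte is read once and written once, so count both directions.
    moved_bytes = 2 * len(src)
    return moved_bytes / best / 1e6
```

Taking the best of several runs reduces noise from other processes. The buffer should be much larger than the CPU caches so the copy actually reaches main memory.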
Performance Metrics of Apple Chips
- The M4 Max achieves 319,000 MB/s, the M3 Ultra reaches 337,000 MB/s, and the M5 Max tops out at 351,000 MB/s, roughly 10% more than the M4 Max going by these figures. These numbers reflect sustained memory throughput via the CPU rather than peak performance metrics.
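To put sustained and peak numbers side by side, the sketch below divides each CPU-side result quoted here by the peak bandwidth figure given earlier in this summary for the same chip. It only restates the quoted numbers as ratios; the dictionary and function names are illustrative.

```python
# Sustained CPU-side throughput in MB/s, as quoted in the summary.
SUSTAINED_MB_S = {"M4 Max": 319_000, "M3 Ultra": 337_000, "M5 Max": 351_000}

# Peak memory bandwidth in GB/s, as quoted in the summary.
PEAK_GB_S = {"M4 Max": 546, "M3 Ultra": 819, "M5 Max": 614}


def fraction_of_peak(chip):
    """Sustained CPU throughput as a fraction of the chip's peak bandwidth."""
    return SUSTAINED_MB_S[chip] / 1000 / PEAK_GB_S[chip]


for chip in SUSTAINED_MB_S:
    print(f"{chip}: {fraction_of_peak(chip):.0%} of peak")
```

On these figures the CPU alone reaches well under the peak number on every chip, which is consistent with the note that the test measures CPU-side throughput.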
Token Generation Speed Analysis
Impact of Memory Bandwidth on Token Generation
- Higher memory bandwidth may correlate with faster token generation speeds. Initial tests show similar time-to-first-token results for both M4 Max and M5 Max but differing tokens per second rates (79.1 vs. 88.49).
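The two metrics in this comparison can be derived from timestamps recorded while a model streams its output. The sketch below shows one common way to define them. It is an assumption about the definitions, and tools such as LM Studio may compute them differently.

```python
def generation_stats(request_time, token_times):
    """Return (time_to_first_token, tokens_per_second).

    request_time: when the prompt was submitted, in seconds.
    token_times: arrival time of each generated token, in seconds.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")
    # Time to first token covers prompt processing plus the first decode step.
    ttft = token_times[0] - request_time
    # The generation rate is measured over the decode phase only.
    decode_time = token_times[-1] - token_times[0]
    if decode_time <= 0:
        return ttft, 0.0
    return ttft, (len(token_times) - 1) / decode_time
```

For example, `generation_stats(0.0, [0.5, 1.0, 1.5, 2.0, 2.5])` returns `(0.5, 2.0)`: half a second to the first token, then two tokens per second.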
MLX Framework Utilization
- Apple's MLX framework optimizes machine learning model performance on Apple Silicon devices. LM Studio also supports GGUF models, which run on llama.cpp.
Comparative Analysis of Model Performance
Testing Larger Models
- A larger model (gpt-oss-120b, with 120 billion parameters) was tested using a software engineering prompt focused on scalable web application architecture design.
GPU Usage During Processing
- GPU usage varied significantly among models: while the M3 Ultra reached full capacity (100%), the other two machines hovered around 75%-79%.
Power Consumption Insights
Power Usage During Operations
- Both the M4 Max and M5 Max consumed about 130 watts during testing; however, spikes were noted with the M5 Max reaching up to 154 watts under load conditions.
Token Generation Rates Across Models
- Token generation rates differed only slightly between the two Max chips: 61 tokens/sec for the M4 Max and 65 tokens/sec for the M5 Max. The M3 Ultra, which has the highest memory bandwidth of the three, reached 82 tokens/sec.
Performance Insights on Apple's M5 Max
Overview of Quantization and Prompt Processing
- The discussion begins with a small model quantized to Q4 (4-bit integer weights), a configuration the speaker expects to be especially revealing.
- The speaker emphasizes the importance of prompt processing (PP), the initial stage that heavily relies on computational power.
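The idea behind 4-bit quantization can be sketched as follows: a block of weights shares one floating-point scale, and each weight is stored as a small integer code. This is a simplified symmetric scheme for illustration only. The Q4 formats actually used by GGUF and MLX have their own block layouts, and the function names here are assumptions.

```python
def quantize_q4(block):
    """Quantize one block of floats to 4-bit codes in the range [-8, 7].

    Returns (scale, codes); the original values are roughly scale * code.
    """
    max_abs = max(abs(value) for value in block)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(value / scale))) for value in block]
    return scale, codes


def dequantize_q4(scale, codes):
    """Recover approximate floats from a scale and its 4-bit codes."""
    return [scale * code for code in codes]


scale, codes = quantize_q4([0.9, -0.2, 0.33, -0.71])
print(scale, codes, dequantize_q4(scale, codes))
```

Each weight then needs 4 bits instead of 16, plus a small overhead for the per-block scale, which is why quantized models are smaller and put less pressure on memory bandwidth.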
Performance Metrics of Apple’s M4 and M5 Max
- The M4 Max achieves a prompt processing speed of 1,855 tokens per second, while the M5 Max raises this to 4,468 tokens per second, about 2.4 times as fast.
- By comparison, the older M3 Ultra recorded 2,959 tokens per second, so the M5 Max also outperforms it in prompt processing.
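The ratios between the quoted prompt processing rates, and what they mean for waiting time, can be checked with a few lines of arithmetic. The 10,000-token prompt below is a hypothetical length chosen for illustration, and the estimate assumes the rate stays constant across the whole prompt.

```python
# Prompt processing rates in tokens/sec, as quoted in the summary.
PP_TOKENS_PER_S = {"M4 Max": 1855, "M3 Ultra": 2959, "M5 Max": 4468}

# Hypothetical prompt length in tokens (an assumption for illustration).
PROMPT_TOKENS = 10_000


def speedup(chip, baseline):
    """How many times faster `chip` processes prompts than `baseline`."""
    return PP_TOKENS_PER_S[chip] / PP_TOKENS_PER_S[baseline]


def prompt_seconds(chip, prompt_tokens):
    """Seconds to process a prompt at the chip's quoted rate."""
    return prompt_tokens / PP_TOKENS_PER_S[chip]


print(f"M5 Max vs M4 Max: {speedup('M5 Max', 'M4 Max'):.2f}x")
print(f"M5 Max vs M3 Ultra: {speedup('M5 Max', 'M3 Ultra'):.2f}x")
for chip in PP_TOKENS_PER_S:
    print(f"{chip}: ~{prompt_seconds(chip, PROMPT_TOKENS):.1f} s")
```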
Anticipation for Future Developments
- The speaker expresses excitement about potential advancements with the upcoming M5 Ultra model based on these performance metrics.
- Viewers are encouraged to engage by liking the video and sharing their thoughts on what they would like to see tested next.