IN TODAY'S SIGNAL
Top News
AI4 Conference
Trending Signals
Trending Papers
Lecture

Read Time: 4 min 59 sec

Enjoying this newsletter?
Please forward it to a friend or colleague. It helps us keep this content free.
TOP NEWS
Language Models
Google Releases Gemma 2: A Powerful Family of LLMs 3x Smaller than Llama-3 70B
⇧ 2701 Likes
What's New |
Google DeepMind launched Gemma 2, an open large language model family available in 9 billion (9B) and 27 billion (27B) parameter versions. The 27B model competes with models more than twice its size, making it a cost-effective deployment option.
Core Innovations
Gemma 2 features sliding window attention, soft-capping, and knowledge distillation.
- Sliding Window Attention: Interleaves local and global attention layers to balance quality and efficiency.
- Soft-Capping: Smoothly bounds logits so they cannot grow excessively large, stabilizing training (see the sketch after this list).
- Knowledge Distillation: Uses a larger teacher model to enhance the 9B model's performance.
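To make soft-capping concrete, here is a minimal PyTorch sketch. The helper name and the specific cap values are illustrative assumptions (the Gemma 2 report describes caps applied to attention and final logits); the key idea is a scaled tanh rather than a hard clip.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly bound logits to the open interval (-cap, cap) instead of
    # hard-clipping them, which keeps gradients well-behaved during training.
    return cap * torch.tanh(logits / cap)

# Deliberately large attention scores to show the effect.
scores = torch.randn(2, 8, 128, 128) * 100.0
capped = soft_cap(scores, cap=50.0)  # cap value chosen for illustration
print(scores.abs().max().item(), capped.abs().max().item())
```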
Integration and Compatibility
Gemma 2 integrates seamlessly with major AI frameworks, supporting Hugging Face Transformers, JAX, PyTorch, and TensorFlow via Keras 3.0. It runs efficiently on various hardware, from gaming laptops to cloud setups.
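As a quick start with the Transformers integration, a minimal sketch looks like the following. It assumes the instruction-tuned 9B checkpoint published as google/gemma-2-9b-it on the Hugging Face Hub, a recent transformers release with Gemma 2 support, and accelerate installed for device_map="auto"; you also need to accept the model license on the Hub before downloading.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # assumed hub id for the instruction-tuned 9B model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers across available GPUs/CPU
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)

prompt = "Explain sliding window attention in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```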
Performance Metrics
Gemma 2 delivers high performance across benchmarks:
- 27B Model: Scores 75.2 on MMLU, 75.1 on GSM8K, and 71.4 on ARC-c.
- 9B Model: Scores 71.3 on MMLU, 62.3 on GSM8K, and 68.4 on ARC-c.
Deployment and Access
Developers can download Gemma 2's model weights from Kaggle and Hugging Face. Deployment on Vertex AI arrives next month, alongside options to use the model in Google AI Studio or run it locally with Gemma.cpp.
Safety and Evaluation
Google DeepMind implemented rigorous safety measures, including data filtering and comprehensive testing, to mitigate biases and risks in Gemma 2.
Academic Support
The Gemma 2 Academic Research Program offers Google Cloud credits for research use, with applications open through August 9.
Technical Specifications
- Context Length: 8192 tokens
- Hardware Compatibility: NVIDIA H100, A100 GPUs, Google Cloud TPU
- Training Data: 13 trillion tokens for 27B model, 8 trillion tokens for 9B model
Access
READ MORE
The AI Conference: Share 2 Days with the Brightest Minds in AI
The AI Conference brings together OpenAI, Anthropic, Meta, DeepMind, and more.
- Engage with 60+ speakers leading the AI revolution
- Network, collaborate, and co-create with industry pioneers
- Explore topics including AGI, AI in enterprise, building with AI, and more
Last chance to register for Early Bird pricing.
Discount code: "alpha24"
REGISTER NOW
partner with us
TRENDING SIGNALS
Compilers
⇧ 3532 Likes
Inference
⇧ 110 Likes
Voice Cloning
⇧ 1520 Likes
Open Source
⇧ 859 Likes
Contest
⇧ 2110 Likes
TOP PAPERS
Safety
⇧ 1630 Likes
Problem
LLMs can infer censored knowledge from scattered hints in training data, creating safety risks.
Solution
Introduced inductive out-of-context reasoning (OOCR), in which an LLM pieces together latent information scattered across its training data without explicit in-context learning. The authors built five tasks to evaluate OOCR, including inferring the identity of an unknown city and learning function definitions.
Results
GPT-4 outperformed GPT-3.5, achieving 56% accuracy in identifying cities and excelling in bias detection and function inversion. OOCR consistently outperformed in-context learning, showing potential for LLMs to implicitly learn complex structures.
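To give a flavor of the unknown-city task, here is a toy, hypothetical illustration of the kind of fine-tuning data OOCR relies on: the hidden city is never named, each document constrains it only indirectly, and the evaluation question is asked with none of those hints in context. The city ID, cities, and distances below are made up for illustration, not taken from the paper.

```python
import random

# Hypothetical OOCR-style "unknown city" task (not the paper's actual dataset).
HIDDEN_CITY = "Paris"  # never named in the training documents
APPROX_KM_TO_HIDDEN = {"Berlin": 880, "Madrid": 1050, "Rome": 1110}

# Each fine-tuning document hints at the hidden city only indirectly.
train_docs = [
    f"The distance from City 50337 to {city} is roughly {km} km."
    for city, km in APPROX_KM_TO_HIDDEN.items()
]
random.shuffle(train_docs)

# At evaluation time the model is asked about the hidden entity with no hints in context.
eval_question = "In which country is City 50337 located?"  # expected answer: France

print("\n".join(train_docs))
print(eval_question)
```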
Generative AI
⇧ 3690 Likes
Problem
Is it possible for a machine learning model trained only on chess games from players rated up to 1000 to play above that level? This seems counterintuitive, as it suggests a model can outperform its training data.
Solution
The study explores this by developing "ChessFormer," a transformer model trained on chess game transcripts. At inference time, low-temperature sampling effectively ensembles the predictions implied by many diverse, weak players, pushing performance beyond what any individual data source exhibits.
Results
ChessFormer demonstrates this "transcendence" by reaching a chess rating of about 1500 Elo, significantly surpassing the 1000-rated games it was trained on. The effect hinges on sufficient data diversity and careful temperature control at sampling time.
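The low-temperature sampling behind this result is easy to sketch: dividing logits by a temperature below 1 sharpens the distribution toward the moves most agreed upon across the individually weak sources, which is what produces the ensembling effect. A minimal, self-contained example (toy move logits, not from the paper):

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float, rng=None) -> int:
    # Lower temperature sharpens the distribution, concentrating probability
    # on the highest-scoring moves; temperature -> 0 approaches argmax.
    rng = rng or np.random.default_rng(0)
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy move distribution averaged over many weak players: the best move is only
# slightly preferred at temperature 1, but dominates at low temperature.
move_logits = np.array([2.0, 1.8, 1.7, 0.5])
print(sample_with_temperature(move_logits, temperature=1.0))
print(sample_with_temperature(move_logits, temperature=0.1))
```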
MultiModal
⇧ 699 Likes
Problem
Large-scale multimodal pretraining often involves slow, computationally expensive processes with heavy reliance on manually curated datasets.
Solution
The research introduces Joint Example Selection (JEST), a method that selects data in batches rather than individually, scoring batches with model-based learnability criteria. The approach leverages recent advances in online model approximation, notably the FlexiViT architecture, to score large super-batches of data efficiently.
Results
JEST achieves state-of-the-art (SoTA) results with up to 13× fewer training iterations and 10× fewer FLOPs. For instance, applying JEST to the raw WebLI dataset matches the performance of hand-filtered subsets, reducing the need for manually curated foundation datasets.
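The core selection signal in JEST is a learnability score: data that the current learner finds hard but a pretrained reference model finds easy is prioritized. Below is a rough, hypothetical sketch of that idea with per-example top-k selection; the paper's method scores examples jointly over super-batches and uses efficient approximations, so treat this as illustrative only.

```python
import torch

def learnability_scores(learner_losses: torch.Tensor,
                        reference_losses: torch.Tensor) -> torch.Tensor:
    # High score = hard for the current learner but easy for the reference
    # model, i.e. likely learnable and worth training on.
    return learner_losses - reference_losses

def select_sub_batch(learner_losses: torch.Tensor,
                     reference_losses: torch.Tensor, k: int) -> torch.Tensor:
    # Greedy per-example top-k; JEST itself selects so that the *combination*
    # of chosen examples is most learnable.
    scores = learnability_scores(learner_losses, reference_losses)
    return torch.topk(scores, k).indices

# Toy super-batch of 8 examples; keep the 4 most learnable.
learner = torch.tensor([2.1, 0.3, 1.8, 0.9, 2.5, 0.4, 1.2, 3.0])
reference = torch.tensor([0.5, 0.2, 1.7, 0.8, 0.6, 0.3, 1.1, 2.9])
print(select_sub_batch(learner, reference, k=4))
```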
LECTURE
Efficient ML
MIT's EfficientML Course Now on YouTube
⇧ 578 Likes
Modern deep neural networks demand substantial computational power, which limits where they can run. Efficient machine learning techniques let you deploy complex models on everyday devices and lighten the load on cloud infrastructure.
MIT's 46-lecture series, available in full on YouTube, teaches you how to shrink the computational footprint of deep neural networks.
Learn through a detailed curriculum on essential efficiency techniques, including:
- Model compression
- Pruning
- Quantization
- Neural architecture search
- Distributed training
- Data/model parallelism
Implement these techniques hands-on. You'll deploy the Llama2-7B large language model on laptops, applying your new skills in real-world scenarios and directly experiencing the benefits of efficient machine learning.
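As a taste of two techniques on the syllabus, here is a minimal PyTorch sketch of magnitude pruning and symmetric 8-bit weight quantization. It is a simplified illustration of the ideas covered in the lectures, not code from the course.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Zero out the smallest-magnitude weights until `sparsity` fraction is zero.
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))

def quantize_int8(weight: torch.Tensor):
    # Symmetric per-tensor 8-bit quantization: store int8 values plus one scale.
    scale = weight.abs().max() / 127.0
    q = torch.clamp((weight / scale).round(), -127, 127).to(torch.int8)
    return q, scale

w = torch.randn(256, 256)
w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(w)
w_dequant = q.float() * scale
print((w_pruned == 0).float().mean().item())  # roughly 0.5 sparsity
print((w - w_dequant).abs().max().item())     # worst-case quantization error
```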
WATCH THE LECTURES

LAST WEEK'S GREATEST HITS