IN TODAY'S SIGNAL
Top News
AI4 Conference
Trending Signals
Trending Papers
Lecture

Read Time: 4 min 59 sec

Enjoying this newsletter?
Please forward it to a friend or colleague. It helps us keep this content free.
TOP NEWS
Language Models
Google Releases Gemma 2: A Powerful Family of LLMs 3x Smaller than Llama-3 70B
⇧ 2701 Likes
What's New |
Google DeepMind launched Gemma 2, an open large language model family available in 9 billion (9B) and 27 billion (27B) parameter versions. The 27B model competes with models more than twice its size, making it a cost-effective deployment option.
Core Innovations
Gemma 2 features sliding window attention, soft-capping, and knowledge distillation.
- Sliding Window Attention: Interleaves local and global attention layers to balance quality and efficiency.
- Soft-Capping: Smoothly bounds logits so they cannot grow excessively large, stabilizing training (see the sketch after this list).
- Knowledge Distillation: Uses a larger teacher model to enhance the 9B model's performance.
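To make soft-capping concrete, here is a minimal PyTorch sketch. The helper name and the specific cap values are illustrative assumptions (the Gemma 2 report describes caps applied to attention and final logits); the key idea is a scaled tanh rather than a hard clip.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly bound logits to the open interval (-cap, cap) instead of
    # hard-clipping them, which keeps gradients well-behaved during training.
    return cap * torch.tanh(logits / cap)

# Deliberately large attention scores to show the effect.
scores = torch.randn(2, 8, 128, 128) * 100.0
capped = soft_cap(scores, cap=50.0)  # cap value chosen for illustration
print(scores.abs().max().item(), capped.abs().max().item())
```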
Integration and Compatibility
Gemma 2 integrates seamlessly with major AI frameworks, supporting Hugging Face Transformers, JAX, PyTorch, and TensorFlow via Keras 3.0. It runs efficiently on various hardware, from gaming laptops to cloud setups.
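As a quick start with the Transformers integration, a minimal sketch looks like the following. It assumes the instruction-tuned 9B checkpoint published as google/gemma-2-9b-it on the Hugging Face Hub, a recent transformers release with Gemma 2 support, and accelerate installed for device_map="auto"; you also need to accept the model license on the Hub before downloading.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # assumed hub id for the instruction-tuned 9B model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers across available GPUs/CPU
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)

prompt = "Explain sliding window attention in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```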
Performance Metrics
Gemma 2 delivers high performance across benchmarks:
- 27B Model: Scores 75.2 on MMLU, 75.1 on GSM8K, and 71.4 on ARC-c.
- 9B Model: Scores 71.3 on MMLU, 62.3 on GSM8K, and 68.4 on ARC-c.
Deployment and Access
Developers can download Gemma 2's model weights from Kaggle and Hugging Face. Deployment on Vertex AI arrives next month, alongside options to use the model in Google AI Studio or run it locally with Gemma.cpp.
Safety and Evaluation
Google DeepMind implemented rigorous safety measures, including data filtering and comprehensive testing, to mitigate biases and risks in Gemma 2.
Academic Support
The Gemma 2 Academic Research Program offers Google Cloud credits for research use, with applications open through August 9.
Technical Specifications
- Context Length: 8192 tokens
- Hardware Compatibility: NVIDIA H100, A100 GPUs, Google Cloud TPU
- Training Data: 13 trillion tokens for 27B model, 8 trillion tokens for 9B model
Access
READ MORE
The AI Conference: Share 2 Days with the Brightest Minds in AI
The AI Conference brings together OpenAI, Anthropic, Meta, DeepMind, and more.
- Engage with 60+ speakers leading the AI revolution
- Network, collaborate, and co-create with industry pioneers
- Explore topics including AGI, AI in enterprise, building with AI, and more
Last chance to register for Early Bird pricing.
Discount code: "alpha24"
REGISTER NOW
partner with us
TRENDING SIGNALS
Compilers
⇧ 3532 Likes
Inference
⇧ 110 Likes
Voice Cloning
⇧ 1520 Likes
Open Source
⇧ 859 Likes
Contest
⇧ 2110 Likes
TOP PAPERS
Safety
⇧ 1630 Likes
Problem
LLMs can infer censored knowledge from scattered hints in training data, creating safety risks.
Solution
Introduced inductive out-of-context reasoning (OOCR), in which an LLM pieces together latent information scattered across its training data without explicit in-context learning. The authors built five tasks to evaluate OOCR, including inferring the identity of an unknown city and learning function definitions.
Results
GPT-4 outperformed GPT-3.5, achieving 56% accuracy in identifying cities and excelling in bias detection and function inversion. OOCR consistently outperformed in-context learning, showing potential for LLMs to implicitly learn complex structures.
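To give a flavor of the unknown-city task, here is a toy, hypothetical illustration of the kind of fine-tuning data OOCR relies on: the hidden city is never named, each document constrains it only indirectly, and the evaluation question is asked with none of those hints in context. The city ID, cities, and distances below are made up for illustration, not taken from the paper.

```python
import random

# Hypothetical OOCR-style "unknown city" task (not the paper's actual dataset).
HIDDEN_CITY = "Paris"  # never named in the training documents
APPROX_KM_TO_HIDDEN = {"Berlin": 880, "Madrid": 1050, "Rome": 1110}

# Each fine-tuning document hints at the hidden city only indirectly.
train_docs = [
    f"The distance from City 50337 to {city} is roughly {km} km."
    for city, km in APPROX_KM_TO_HIDDEN.items()
]
random.shuffle(train_docs)

# At evaluation time the model is asked about the hidden entity with no hints in context.
eval_question = "In which country is City 50337 located?"  # expected answer: France

print("\n".join(train_docs))
print(eval_question)
```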
Generative AI
⇧ 3690 Likes
Problem
Is it possible for a machine learning model trained only on chess games from players rated up to 1000 to play above that level? This seems counterintuitive, as it suggests a model can outperform its training data.
Solution
The study explores this by developing "ChessFormer," a transformer model trained on chess game transcripts. At inference time, low-temperature sampling effectively ensembles the predictions implied by many diverse, weak players, pushing performance beyond what any individual data source exhibits.
Results
ChessFormer demonstrates this "transcendence" by reaching a chess rating of about 1500 Elo, significantly surpassing the 1000-rated games it was trained on. The effect hinges on sufficient data diversity and careful temperature control at sampling time.
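The low-temperature sampling behind this result is easy to sketch: dividing logits by a temperature below 1 sharpens the distribution toward the moves most agreed upon across the individually weak sources, which is what produces the ensembling effect. A minimal, self-contained example (toy move logits, not from the paper):

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float, rng=None) -> int:
    # Lower temperature sharpens the distribution, concentrating probability
    # on the highest-scoring moves; temperature -> 0 approaches argmax.
    rng = rng or np.random.default_rng(0)
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy move distribution averaged over many weak players: the best move is only
# slightly preferred at temperature 1, but dominates at low temperature.
move_logits = np.array([2.0, 1.8, 1.7, 0.5])
print(sample_with_temperature(move_logits, temperature=1.0))
print(sample_with_temperature(move_logits, temperature=0.1))
```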
MultiModal
⇧ 699 Likes
Problem
Large-scale multimodal pretraining often involves slow, computationally expensive processes with heavy reliance on manually curated datasets.
Solution
The research introduces Joint Example Selection (JEST), a method that selects data in batches rather than individually, scoring batches with model-based learnability criteria. The approach leverages recent advances in online model approximation, notably the FlexiViT architecture, to score large super-batches of data efficiently.
Results
JEST achieves state-of-the-art (SoTA) results with up to 13× fewer training iterations and 10× fewer FLOPs. For instance, applying JEST to the raw WebLI dataset matches the performance of hand-filtered subsets, reducing the need for manually curated foundation datasets.
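The core selection signal in JEST is a learnability score: data that the current learner finds hard but a pretrained reference model finds easy is prioritized. Below is a rough, hypothetical sketch of that idea with per-example top-k selection; the paper's method scores examples jointly over super-batches and uses efficient approximations, so treat this as illustrative only.

```python
import torch

def learnability_scores(learner_losses: torch.Tensor,
                        reference_losses: torch.Tensor) -> torch.Tensor:
    # High score = hard for the current learner but easy for the reference
    # model, i.e. likely learnable and worth training on.
    return learner_losses - reference_losses

def select_sub_batch(learner_losses: torch.Tensor,
                     reference_losses: torch.Tensor, k: int) -> torch.Tensor:
    # Greedy per-example top-k; JEST itself selects so that the *combination*
    # of chosen examples is most learnable.
    scores = learnability_scores(learner_losses, reference_losses)
    return torch.topk(scores, k).indices

# Toy super-batch of 8 examples; keep the 4 most learnable.
learner = torch.tensor([2.1, 0.3, 1.8, 0.9, 2.5, 0.4, 1.2, 3.0])
reference = torch.tensor([0.5, 0.2, 1.7, 0.8, 0.6, 0.3, 1.1, 2.9])
print(select_sub_batch(learner, reference, k=4))
```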
LECTURE
Efficient ML
MIT's EfficientML Course Now on YouTube
⇧ 578 Likes
Modern deep neural networks demand substantial computational power, which limits where they can run. Efficient machine learning techniques let you deploy complex models on everyday devices and lighten the load on cloud infrastructure.
MIT's 46-lecture series, available in full on YouTube, teaches you how to shrink the computational footprint of deep neural networks.
Learn through a detailed curriculum on essential efficiency techniques, including:
- Model compression
- Pruning
- Quantization
- Neural architecture search
- Distributed training
- Data/model parallelism
Implement these techniques hands-on. You'll deploy the Llama2-7B large language model on laptops, applying your new skills in real-world scenarios and directly experiencing the benefits of efficient machine learning.
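As a taste of two techniques on the syllabus, here is a minimal PyTorch sketch of magnitude pruning and symmetric 8-bit weight quantization. It is a simplified illustration of the ideas covered in the lectures, not code from the course.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Zero out the smallest-magnitude weights until `sparsity` fraction is zero.
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))

def quantize_int8(weight: torch.Tensor):
    # Symmetric per-tensor 8-bit quantization: store int8 values plus one scale.
    scale = weight.abs().max() / 127.0
    q = torch.clamp((weight / scale).round(), -127, 127).to(torch.int8)
    return q, scale

w = torch.randn(256, 256)
w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(w)
w_dequant = q.float() * scale
print((w_pruned == 0).float().mean().item())  # roughly 0.5 sparsity
print((w - w_dequant).abs().max().item())     # worst-case quantization error
```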
WATCH THE LECTURES

LAST WEEK'S GREATEST HITS