On Gemma 2, Meta's LLM Compiler, voice cloning model, Anthropic contest, OOCR, ChessFormer, JEST, MIT EfficientML course.


AlphaSignal


Hey,

Welcome to today's edition of AlphaSignal, a newsletter for developers by developers.

We identify and summarize the top 1% news, papers, models, and repos in the AI industry. 

IN TODAY'S SIGNAL

πŸŽ–οΈ Top News

πŸ“Œ The AI Conference

⚑️ Trending Signals

πŸ“„ Trending Papers

🧠 Lecture

Read Time: 4 min 59 sec

Enjoying this newsletter?
Please forward it to a friend or colleague. It helps us keep this content free.

TOP NEWS

Language Models

Google Releases Gemma 2: A Powerful Family of LLMs 3x Smaller than Llama-3 70B

⇧ 2701 Likes

What's New

Google DeepMind launched Gemma 2, an open large language model available in 9 billion (9B) and 27 billion (27B) parameter versions. It outperforms larger models, providing cost-effective deployment options.


Core Innovations
Gemma 2 features sliding window attention, soft-capping, and knowledge distillation.

  • Sliding Window Attention: Interleaves local and global attention layers to balance quality and efficiency.

  • Soft-Capping: Rescales logits with a tanh so they cannot grow without bound, stabilizing training (sketched below).

  • Knowledge Distillation: Uses a larger teacher model to enhance the 9B model's performance.
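
To make the soft-capping idea concrete, here is a minimal sketch: logits are squashed through a tanh so they stay within a fixed bound. The cap values follow Gemma 2's reported settings (50.0 for attention logits, 30.0 for final logits), but the helper itself is illustrative, not DeepMind's implementation.

    import numpy as np

    def soft_cap(logits: np.ndarray, cap: float) -> np.ndarray:
        # tanh maps to (-1, 1); multiplying by `cap` bounds the result to (-cap, cap)
        return cap * np.tanh(logits / cap)

    # Illustrative values only: large logits saturate near +/-50
    attn_logits = np.array([3.0, 80.0, -120.0])
    print(soft_cap(attn_logits, cap=50.0))  # ~[ 3.0, 46.1, -49.2]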

Integration and Compatibility
Gemma 2 integrates seamlessly with major AI frameworks, supporting Hugging Face Transformers, JAX, PyTorch, and TensorFlow via Keras 3.0. It runs efficiently on various hardware, from gaming laptops to cloud setups.
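
For example, loading the instruction-tuned 9B checkpoint with Hugging Face Transformers looks roughly like this (assuming a recent transformers release with Gemma 2 support and that you have accepted the model license on the Hub; the google/gemma-2-9b-it model id matches the Hub release):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-2-9b-it"  # gated: accept the license on the Hub first
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires the `accelerate` package
    )

    prompt = "Explain sliding window attention in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))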


Performance Metrics
Gemma 2 delivers high performance across benchmarks:

  • 27B Model: Scores 75.2 on MMLU, 75.1 on GSM8K, and 71.4 on ARC-c.

  • 9B Model: Scores 71.3 on MMLU, 62.3 on GSM8K, and 68.4 on ARC-c.

Deployment and Access
Developers can access Gemma 2's model weights from Kaggle and Hugging Face. Starting next month, deployment on Vertex AI will be available, with model integration options in Google AI Studio and local environments using Gemma.cpp.


Safety and Evaluation
Google DeepMind implemented rigorous safety measures, including data filtering and comprehensive testing, to mitigate biases and risks in Gemma 2.


Academic Support
The Gemma 2 Academic Research Program offers Google Cloud credits for research use, with applications open through August 9.


Technical Specifications

  • Context Length: 8192 tokens
  • Hardware Compatibility: NVIDIA H100, A100 GPUs, Google Cloud TPU
  • Training Data: 13 trillion tokens for 27B model, 8 trillion tokens for 9B model

Access

READ MORE

The AI Conference: Share 2 Days with the Brightest Minds in AI 

The AI Conference brings together OpenAI, Anthropic, Meta, DeepMind and more.

  • Engage with 60+ speakers leading the AI revolution

  • Network, collaborate, and co-create with industry pioneers

  • Explore topics including AGI, AI in enterprise, building with AI, and more.

Last chance to register for Early Bird pricing:

Discount code: "alpha24"

REGISTER NOW

partner with us

TRENDING SIGNALS

Compilers

Meta releases LLM Compiler: a family of models that can emulate the compiler, predict optimal passes, and disassemble code

⇧ 3532 Likes

Inference

New method lets LLMs hit 1600+ tokens/sec on a MacBook by implementing a batch-parallel KV cache in MLX

⇧ 110 Likes

Voice Cloning

Open-source model achieves impressive voice cloning with less than 5 seconds of audio

⇧ 1520 Likes

Open Source

Hugging Face releases the new Open LLM Leaderboard: Qwen2-72B-Instruct ranks #1

⇧ 859 Likes

Contest

Anthropic is giving out $30k in API credits: build and share an app that uses Claude for a chance to be selected

⇧ 2110 Likes

TOP PAPERS

Safety

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

⇧ 1630 Likes

Problem

LLMs can infer censored knowledge from scattered hints in training data, creating safety risks.


Solution

The paper introduces inductive out-of-context reasoning (OOCR): LLMs generalize latent information that is only implicit in their training data, without explicit in-context examples. It develops five tasks to evaluate OOCR, including predicting the identity of an unknown city and learning function definitions.
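
To make the setup concrete, here is a toy version of the function-learning task: the model is finetuned only on scattered evaluations of an unnamed function, then asked to verbalize its definition with nothing in context. The data format below is illustrative, not the paper's exact prompts.

    # Finetuning documents: each mentions f only through one evaluation.
    train_docs = [
        "f(2) = 7",
        "f(5) = 13",
        "f(10) = 23",  # all consistent with f(x) = 2x + 3, which is never stated
    ]

    # Out-of-context test: no examples appear in the prompt.
    test_prompt = "In plain words, what is the function f?"
    expected = "f doubles its input and adds 3"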


Results

GPT-4 outperformed GPT-3.5, achieving 56% accuracy in identifying cities and excelling in bias detection and function inversion. OOCR consistently outperformed in-context learning, showing potential for LLMs to implicitly learn complex structures.

Generative AI

Transcendence: Generative Models Can Outperform The Experts That Train Them

⇧ 3690 Likes

Problem

Is it possible for a machine learning model trained only on chess games from players with ratings up to 1000 to play above that level? This seems counterintuitive as it suggests a model can outperform its training data.


Solution

The study develops "ChessFormer," a transformer trained on transcripts of games played at or below the rating cap. At inference time, low-temperature sampling implicitly ensembles the predictions of the many weak players in the training data, concentrating probability on majority-preferred moves and lifting play beyond any individual source.
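
The mechanism is easy to see in isolation. A minimal sketch (not the paper's code): lowering the softmax temperature sharpens the move distribution, so probability mass collapses onto the move most of the weak experts agreed on.

    import numpy as np

    def sample_probs(logits: np.ndarray, temperature: float) -> np.ndarray:
        # Temperature-scaled softmax; lower temperature sharpens the distribution.
        z = logits / temperature
        z -= z.max()  # numerical stability
        p = np.exp(z)
        return p / p.sum()

    # Logits over three candidate moves, learned from many weak players:
    # the best move is only slightly preferred on average.
    move_logits = np.array([1.2, 1.0, 0.3])

    print(sample_probs(move_logits, temperature=1.0))  # ~[0.45, 0.37, 0.18]: noisy mix
    print(sample_probs(move_logits, temperature=0.1))  # ~[0.88, 0.12, 0.00]: majority move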


Results

ChessFormer demonstrates this "transcendence" by reaching a rating of roughly 1500, well above the 1000 Elo cap of its training data. The effect hinges on sufficient diversity in the training data and on sampling at low temperature at inference time.

Multimodal

Data curation via joint example selection further accelerates multimodal learning

⇧ 699 Likes

Problem

Large-scale multimodal pretraining often involves slow, computationally expensive processes with heavy reliance on manually curated datasets.


Solution

The research introduces Joint Example Selection (JEST), a method that scores and selects sub-batches of data jointly rather than example by example, using a model-based learnability criterion. It leverages recent advances in online model approximation, notably the FlexiViT architecture, to score large super-batches of data cheaply.
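
A simplified view of the selection criterion: score each candidate by learnability, i.e., learner loss minus reference-model loss, and keep the highest-scoring examples. The paper scores sub-batches jointly under contrastive losses; the independent top-k below is a deliberate simplification.

    import numpy as np

    def select_sub_batch(learner_loss: np.ndarray,
                         reference_loss: np.ndarray,
                         keep: int) -> np.ndarray:
        # Learnability: examples the learner still finds hard (high learner loss)
        # but a pretrained reference model finds easy (low reference loss).
        learnability = learner_loss - reference_loss
        return np.argsort(learnability)[-keep:]  # indices of the `keep` best examples

    # Super-batch of 8 candidate examples, filtered down to 2.
    learner = np.array([2.1, 0.4, 3.0, 1.2, 2.8, 0.9, 1.5, 2.2])
    reference = np.array([0.5, 0.3, 2.9, 0.4, 0.6, 0.8, 1.4, 0.5])
    print(select_sub_batch(learner, reference, keep=2))  # [7 4]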


Results

JEST achieves state-of-the-art (SoTA) results with up to 13× fewer training iterations and 10× fewer FLOPs. On the WebLI dataset, for instance, applying JEST to raw data matches the performance of hand-filtered subsets, reducing the need for manually curated foundation datasets.

LECTURE

Efficient ML

MIT's EfficientML Course Now on YouTube

⇧ 578 Likes

Modern deep neural networks demand substantial compute, which limits where they can run. Efficient machine learning techniques let you deploy complex models on everyday devices and reduce the load on cloud infrastructure.


The full 46-lecture series is available on YouTube and teaches you to minimize the computational demands of deep neural networks, making them practical on everyday devices and cheaper to serve in the cloud.


Learn through a detailed curriculum on essential efficiency techniques, including:

  • Model compression
  • Pruning
  • Quantization
  • Neural architecture search
  • Distributed training
  • Data/model parallelism

Implement these techniques hands-on. You'll deploy the Llama2-7B large language model on laptops, applying your new skills in real-world scenarios and directly experiencing the benefits of efficient machine learning.
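
As a taste of the material, here is a minimal post-training quantization of a weight tensor to int8 (symmetric, per-tensor; the course covers far more sophisticated schemes than this sketch):

    import numpy as np

    def quantize_int8(w: np.ndarray):
        # Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127].
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    print("max abs error:", np.abs(w - dequantize(q, scale)).max())  # small vs. weight scale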

WATCH THE LECTURES

