

AlphaSignal


Hey,

Welcome to today's edition of AlphaSignal, a newsletter for developers by developers.

We identify and summarize the top 1% news, papers, models, and repos in the AI industry. 

IN TODAY'S SIGNAL

Read time: 4 min 57 sec

🎖️ Top News

📌 Encord

⚡️ Trending Signals

📌 AI Conference

📄 Top Papers

Want AlphaSignal to add new sections or topics?
Click here to vote or suggest new features for the newsletter.

TOP NEWS

Language Models

Meta's Groundbreaking Multi-Token-Prediction models are now available

⇧ 888 Likes

What's New

Meta has released pre-trained models on Hugging Face that use multi-token prediction, a new approach to training large language models (LLMs). The release is based on Meta's April 2024 research paper, which introduced the technique.


The multi-token prediction method trains LLMs to forecast multiple future tokens simultaneously, rather than only the next token in the sequence.


The approach has the potential to change how LLMs are developed and deployed, offering a more efficient way to train and run these models without increasing computational costs.


Key performance metrics

  • 7B parameter model using multi-token prediction:
    • Solved 12% more problems on HumanEval
    • Solved 17% more problems on MBPP

  • 3x inference speedup on code completion tasks

  • 8-byte prediction model outperformed next-byte model by 67% on MBPP

  • Improvements in summarization benchmarks for 2-token and 4-token prediction models in natural language tasks

How Multi-token Prediction Works

Multi-token prediction instructs the model to predict multiple future tokens at once. This technique enhances sample efficiency and reduces the discrepancy between training-time teacher forcing and inference-time autoregressive generation. It uses a shared transformer trunk with independent output heads for each predicted token.
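The shared-trunk, multi-head layout described above can be sketched in a few lines. Everything below (the dimensions, the tanh stand-in for the transformer trunk) is illustrative, not Meta's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not the paper's configuration).
VOCAB, D_MODEL, N_FUTURE = 100, 32, 4

# Stand-in for the shared transformer trunk: a single weight matrix.
W_trunk = rng.normal(size=(D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
# One independent output head per predicted offset t+1 ... t+N_FUTURE.
W_heads = rng.normal(size=(N_FUTURE, D_MODEL, VOCAB)) / np.sqrt(D_MODEL)

def multi_token_logits(hidden):
    """hidden: (seq_len, d_model) states -> (n_future, seq_len, vocab) logits.

    All heads read the same trunk output, so the extra predictions add
    only n_future head matmuls, not a deeper forward pass."""
    trunk_out = np.tanh(hidden @ W_trunk)  # shared trunk computation
    return np.stack([trunk_out @ W_heads[k] for k in range(N_FUTURE)])

hidden = rng.normal(size=(10, D_MODEL))  # hidden states for a 10-token sequence
logits = multi_token_logits(hidden)
print(logits.shape)  # one vocabulary distribution per head per position
```

Because every head sees the same trunk output, the extra heads can draft several tokens per forward pass at inference time, which is the basis of the reported 3x code-completion speedup.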


Benefits for Large Models and Complex Tasks

The benefits of multi-token prediction scale with model size. A 13B parameter model using this method shows around 15% more code problem-solving capability on average. The technique promotes better induction heads and algorithmic reasoning capabilities.


Paper here

CHECK THE MODELS

Poorly Curated Data is Killing your AI Model

When your AI models fail to perform as expected, it's tempting to throw more data at the problem. But it's becoming evident that data quality can have a far bigger impact than data quantity. So how should you approach this challenge?


See how one AI team managed to improve mAP by 20% on their production models while reducing their dataset size by 35% with intelligent data curation from Encord.

LEARN MORE

partner with us

TRENDING SIGNALS

Inference

Microsoft releases MInference: a method to process 1M-token contexts 10x faster on a single A100

⇧ 1190 Likes

Claude

Claude Artifacts can be published, shared, and remixed by others

⇧ 660 Likes

API

Salesforce releases an impressive model for function calling, outperforming models 7x its size, including GPT-3.5 & Claude

⇧ 615 Likes

Image Generation

Stability AI license becomes free for most users, including small businesses (<$1M annual revenue)

⇧ 385 Likes

Video Generation

KLING, the Chinese rival to OpenAI's Sora, is now available on the web

⇧ 1219 Likes

Share 2 Days with the Brightest Minds in AI

The AI Conference brings together OpenAI, Meta, DeepMind and many more.

  • Engage with 60+ speakers leading the AI revolution

  • Network, collaborate, and co-create with industry pioneers

  • Explore topics including AGI, AI in enterprise, building with AI, and more!

This week only: Save $350 using discount code: “alpha24”

Register ↗️

TOP PAPERS

Synthetic Data

Scaling Synthetic Data Creation with 1,000,000,000 Personas

⇧ 1398 Likes

Problem

Existing methods either rely on a seed corpus, which limits diversity, or on a comprehensive list of key points, which is impractical to scale across domains.

Solution

A novel persona-driven methodology is introduced, leveraging a collection named Persona Hub, containing 1 billion diverse personas automatically curated from web data. This approach utilizes personas to guide large language models (LLMs) in synthesizing diverse data across multiple scenarios without the limitations of previous methods.
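The core move is mechanical: each persona is slotted into a data-generation prompt, so diversity comes from the personas rather than from a seed corpus. A minimal sketch, with personas and a template invented for illustration (the real Persona Hub holds 1 billion web-curated personas fed to an LLM):

```python
# Hypothetical personas; Persona Hub curates these automatically from web data.
personas = [
    "a nurse converting medication doses between units",
    "a civil engineer checking the load on a steel beam",
    "a barista scaling a recipe for a catering order",
]

# Illustrative task template; any scenario (knowledge-rich text, math, etc.)
# can be substituted.
TEMPLATE = "Write a math problem that {persona} might face at work."

def synthesize_prompts(personas):
    """Turn each persona into a distinct data-generation prompt for an LLM."""
    return [TEMPLATE.format(persona=p) for p in personas]

prompts = synthesize_prompts(personas)
for p in prompts:
    print(p)
```

Each prompt is then sent to an LLM, so a million personas yield a million distinct generation contexts from one template.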

Results

The Persona Hub facilitated the creation of diverse synthetic data, including 50,000 math problems and 10,000 knowledge-rich texts. Using 1.07 million personas from this hub, a 7B model was fine-tuned, achieving 79.4% accuracy on synthetic test instances and 64.9% on the MATH benchmark, showcasing effective scalability.


Access the model and more details here.

Routing

RouteLLM: Learning to Route LLMs with Preference Data

⇧ 1680 Likes

Problem

Routing all queries to the most capable large language models (LLMs) ensures quality but is costly, while using less capable models reduces costs but may compromise quality. Efficiently selecting between strong and weak LLMs during inference to balance cost and performance poses a challenge.


Solution

RouteLLM introduces a router model that selects between a stronger and a weaker LLM based on human preference data and data augmentation techniques. This approach aims to minimize costs while maintaining response quality, demonstrated by routing between models like GPT-4 and Mixtral-8x7B.
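Once a preference model scores each query, the routing decision itself is a threshold test. The sketch below stands in for RouteLLM's learned router with made-up logistic weights over toy features; it is not the paper's actual model:

```python
import math

# Illustrative weights only; RouteLLM learns its router from human
# preference data, not from hand-picked features like these.
WEIGHTS = {"length": 0.004, "has_code": 1.2, "has_math": 1.5}
BIAS = -1.0

def win_probability(query: str) -> float:
    """Estimated probability the strong model answers better (toy features)."""
    feats = {
        "length": len(query),
        "has_code": float("def " in query or "```" in query),
        "has_math": float(any(c in query for c in "+-*/=")),
    }
    z = BIAS + sum(WEIGHTS[k] * feats[k] for k in feats)
    return 1 / (1 + math.exp(-z))  # logistic squash to [0, 1]

def route(query: str, threshold: float = 0.5) -> str:
    """Send hard queries to the strong model, the rest to the cheap one."""
    return "gpt-4" if win_probability(query) >= threshold else "mixtral-8x7b"

print(route("What is the capital of France?"))          # cheap model suffices
print(route("Prove that x**2 + 2*x + 1 = (x + 1)**2"))  # strong model
```

Tuning the threshold trades cost against quality: raising it sends more traffic to the cheap model, which is how the 85% MT Bench cost reduction is reached while retaining 95% of GPT-4's performance.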


Results

RouteLLM achieves significant cost reductions—over 85% on MT Bench and 45% on MMLU—while retaining 95% of GPT-4’s performance. Compared to commercial routers like Martian and Unify AI, it delivers equivalent performance at over 40% lower cost.


The router's code, models, and data are available here.

AI Agents

AI Agents That Matter

⇧ 919 Likes

Problem

Existing AI agent benchmarks focus narrowly on accuracy, neglecting cost control and standardization. This leads to overly complex, costly agents and hinders reproducibility. Benchmarks also fail to distinguish model developers' needs from downstream developers', leading to overfitting and unrealistic performance estimates.


Solution

The paper proposes cost-controlled evaluations that jointly optimize accuracy and cost. It introduces simple baseline agents that outperform complex state-of-the-art (SOTA) agents on HumanEval, emphasizes the distinct benchmarking needs of model and downstream developers, and advocates for more general benchmarks.
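Joint accuracy/cost evaluation can be as simple as reporting only the Pareto frontier, so that an agent which pays more for no extra accuracy is discarded. The agent entries below are invented for illustration:

```python
# Hypothetical agents with accuracy and per-run cost; real evaluations
# would measure these on a benchmark such as HumanEval.
agents = [
    {"name": "simple-baseline", "accuracy": 0.88, "cost_usd": 0.4},
    {"name": "complex-sota",    "accuracy": 0.89, "cost_usd": 5.1},
    {"name": "retry-heavy",     "accuracy": 0.85, "cost_usd": 2.0},
]

def pareto_frontier(agents):
    """Keep agents no other agent beats on accuracy at strictly lower cost."""
    frontier = []
    for a in agents:
        dominated = any(
            b["accuracy"] >= a["accuracy"] and b["cost_usd"] < a["cost_usd"]
            for b in agents
        )
        if not dominated:
            frontier.append(a["name"])
    return frontier

print(pareto_frontier(agents))  # the retry-heavy agent is dominated
```

Under this lens a cheap baseline and an expensive SOTA agent can both survive as legitimate trade-off points, while agents that spend more for less are filtered out.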


Results

Baseline agents significantly reduced costs while maintaining accuracy on HumanEval. Joint optimization on HotPotQA cut costs by 53% (GPT-3.5) and 41% (Llama-3-70B) without sacrificing accuracy, highlighting substantial savings from optimized agent designs.



AlphaSignal, 214 Barton Springs RD, Austin, Texas 94123, United States
