Meta has just released Llama 3.1, a significant advancement in open-source AI. The release includes a 405-billion-parameter model, the most sophisticated open model to date, outperforming GPT-4 on several benchmarks.
It comes in three sizes: 8B, 70B, and 405B (each with base and instruct versions).
All are natively multilingual and have official tool-calling support. The 405B model was used to improve the 8B and 70B via distillation and synthetic data during the finetuning stages. Multimodality is still a work in progress.
Performance highlights of Llama 3.1
- 405B: MMLU-Chat (general) 88.6, GSM8K (math) 96.8, HumanEval (code) 89. These are on par with GPT-4o.
- Specifically, the 405B beats GPT-4o on ARC Challenge (reasoning), GSM8K (math), Nexus (tool use), ZeroSCROLLS/QuALITY (long context), and the multilingual MGSM benchmark.
- 70B: MMLU-Chat 86, GSM8K 95.1, HumanEval 80
- 8B: MMLU-Chat 73, GSM8K 84.5, HumanEval 72.6, a substantial improvement over Llama 3 8B
License of Llama 3.1
Permissively licensed, including commercial use (unless you exceed 700 million monthly active users), synthetic data generation, distillation, and finetuning.
Architecture and Training details
All three models were trained on 15T tokens, aided by a synthetic data pipeline, and use a standard dense Transformer architecture. Techniques deserving special mention:
- Grouped-query attention (GQA) with 8 key-value heads
- A vocabulary of 128K tokens
- RoPE base frequency hyperparameter increased to 500,000 (see the sketch after this list)
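To make the RoPE change concrete, here is a minimal sketch of rotary position embeddings with the base frequency exposed as a hyperparameter. The function names and pairing convention are illustrative assumptions, not Meta's actual code; the point is that raising the base from the conventional 10,000 to 500,000 lowers the per-dimension rotation frequencies, so relative positions stay distinguishable over much longer contexts.

```python
import torch

def rope_frequencies(head_dim: int, max_seq_len: int, base: float = 500_000.0):
    """Precompute RoPE rotation angles for each position and dimension pair.

    A larger `base` (Llama 3.1 uses 500,000 vs. the original 10,000)
    yields lower rotation frequencies, which helps attention over
    long contexts.
    """
    # One inverse frequency per pair of dimensions in the attention head.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_seq_len).float()
    angles = torch.outer(positions, inv_freq)  # (max_seq_len, head_dim // 2)
    return torch.cos(angles), torch.sin(angles)

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor):
    """Rotate query/key vectors pairwise by the precomputed angles.

    x: (batch, seq_len, num_heads, head_dim), pairing even/odd dims.
    """
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos = cos[None, :, None, :]  # broadcast over batch and heads
    sin = sin[None, :, None, :]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)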
The models can handle up to 128K tokens of context. This was achieved through a multi-stage process: initial pretraining on 8K-token windows due to compute limits, followed by continued pretraining that gradually increased the context length to 128K tokens over six stages, as sketched below.
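As a rough illustration of what such a staged schedule could look like (the intermediate lengths below are hypothetical placeholders; only the 8K start and 128K end points come from the release notes, and the exact per-stage values are in the paper):

```python
# Illustrative staged context-extension schedule. The intermediate
# lengths are hypothetical placeholders, not Meta's actual values.
stages = [8_192, 16_384, 32_768, 65_536, 98_304, 131_072]

for i, context_length in enumerate(stages, start=1):
    print(f"Stage {i}: continued pretraining at {context_length} tokens")
```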
Llama 3.1's finetuning process involved supervised finetuning (SFT) followed by direct preference optimization (DPO). Unlike some models, it did not use reinforcement learning from human feedback (RLHF) with proximal policy optimization (PPO). A minimal sketch of the DPO objective follows.
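This is the published DPO formulation written in generic PyTorch, not Meta's training code; the tensor names and the `beta` value are illustrative assumptions.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Direct preference optimization loss over a batch of preference pairs.

    Each argument is the summed log-probability of a full response under
    either the trainable policy or a frozen reference model. DPO widens
    the margin between chosen and rejected responses relative to the
    reference, so no separate reward model or PPO loop is needed.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # -log(sigmoid(beta * margin)); minimized when chosen >> rejected.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```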
Meta has also published an exhaustive 92-page paper for Llama 3.1 covering details of pretraining data, filtering, annealing, synthetic data, scaling laws, infrastructure, parallelism, training recipes, post-training adaptation, benchmarking, inference strategies, quantization, and more.
What it is good at
With a 128K context window, Llama 3.1 is well suited to RAG applications. The main strength of the 405B model is as a teacher: it is ideal for generating synthetic data and distilling smaller, task-specific expert models. From synthetic data generation to model distillation, Llama 3.1 opens up a wide range of possibilities.
Model distillation transfers knowledge from a large teacher LLM to a smaller student model, aiming to maintain performance while reducing computational requirements.
The process typically involves training the student model to mimic the output distribution of the teacher model, often using softmax with temperature scaling to emphasize informative soft targets.
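A minimal sketch of that soft-target objective in generic PyTorch (the classic Hinton-style formulation; the temperature value and names are illustrative assumptions, not a specific Llama recipe):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Soft-target knowledge-distillation loss.

    Dividing logits by a temperature > 1 before softmax flattens the
    teacher's distribution, exposing the relative probabilities of
    non-top tokens that the student should learn to mimic.
    """
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence from teacher to student; the t**2 factor keeps
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t**2
```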
What is still missing
The currently released version of Llama 3.1 is not yet multimodal. Meta is working on integrating image, video, and speech capabilities, but those models are still under development and have not been broadly released.
Model pricing
Among the API providers, OctoAI offers Llama 3.1 405B at $3/M input tokens and $9/M output tokens, compared to GPT-4o's $5/M and $15/M respectively.
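To see what that difference means in practice, here is a quick back-of-the-envelope comparison; the monthly token volumes are hypothetical, and the prices are the ones quoted above.

```python
def monthly_cost(input_m_tokens, output_m_tokens, in_price, out_price):
    """Cost in dollars given token volumes (millions) and $/M-token prices."""
    return input_m_tokens * in_price + output_m_tokens * out_price

# Hypothetical workload: 10M input and 2M output tokens per month.
llama_405b = monthly_cost(10, 2, in_price=3, out_price=9)    # $48
gpt_4o = monthly_cost(10, 2, in_price=5, out_price=15)       # $80
print(f"Llama 3.1 405B: ${llama_405b}, GPT-4o: ${gpt_4o}")
```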
Access