IN TODAY'S SIGNAL
Read Time: 5 min 27 sec

Enjoying this newsletter?
Please forward it to a friend or colleague. It helps us keep this content free.

TRENDING REPO |
Language Models |
Meta Releases the Most Powerful Open-Source Model Yet: Llama 3 |
⇧ 17,482 ⇆ 2460 |
 |
What's New |
Meta has released the Llama 3 series, a new generation of language models with configurations of 8 billion and 70 billion parameters, along with an upcoming 400 billion parameter model.
This is one of the biggest releases this year, with Meta rolling out new models, products and research all at once.
Model Info
- Default 8K-token context window.
- Outperforms other open-source models of comparable scale, such as Gemma 7B and Mixtral 8x22B, with the 70B model scoring over 80 on MMLU.
- Improved reasoning capabilities thanks to an increased focus on coding datasets.
Model Training and Data:
- Trained on over 15 trillion tokens from publicly available sources.
- Incorporates a tokenizer with a 128K-token vocabulary (see the tokenizer sketch after this list).
- Utilizes advanced data-filtering pipelines for optimal data quality.
- Achieved over 400 TFLOPS per GPU during training on 16K GPUs.
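To make the tokenizer bullet concrete, here is a minimal sketch (not from Meta's release materials) that loads the Llama 3 tokenizer from the Hugging Face Hub and checks its vocabulary size. It assumes you have accepted the license for the gated meta-llama/Meta-Llama-3-8B repository and are logged in locally.

```python
from transformers import AutoTokenizer

# Any checkpoint that ships the Llama 3 tokenizer would work here.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
print(len(tok))  # vocabulary size, roughly 128K tokens

text = "Meta has released the Llama 3 series of language models."
ids = tok(text)["input_ids"]
print(len(ids), ids[:8])  # a larger vocabulary generally means fewer tokens per sentence
```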
Performance Benchmarks:
- MMLU: 8B model scores 68.4; 70B model achieves 82.0.
- HumanEval: 8B at 62.2; 70B reaches 81.7.
- GSM-8K: 79.6 for 8B; 70B model leads with 93.0.
- MATH dataset: 30.0 for 8B; 70B model scores 50.4.
Research
These models come with a set of research breakthroughs and contributions that will be detailed in a paper in the coming months.
For now, Meta has revealed that:
- Llama 3 uses a Tiktoken-based tokenizer with a vocabulary of 128K tokens that encodes language much more efficiently, which leads to substantially improved model performance. It also adopts Grouped Query Attention (GQA) to improve inference efficiency (a minimal GQA sketch follows this list).
- Model performance continues to improve even when a model is trained on far more data than the scaling laws recommend as optimal: both the 8B and 70B parameter models kept improving log-linearly after training on 15T tokens.
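For readers curious what Grouped Query Attention looks like in code, below is a small, self-contained PyTorch sketch of the idea: several query heads share each key/value head, which shrinks the KV cache at inference time. The head counts and dimensions are illustrative, not Llama 3's actual configuration.

```python
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 2, 16, 64
n_q_heads, n_kv_heads = 8, 2              # 4 query heads share each KV head
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand K and V so every query head lines up with its shared KV head
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

# Causal scaled dot-product attention, using the kernel PyTorch ships
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```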
Access and Integration:
- Fully open-source including model weights.
- No cost for access and integration.
- Available across major platforms like AWS and Google Cloud (a minimal loading example follows below).
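As a quick way to try the release, here is a hedged sketch of running Llama 3 8B Instruct through the Hugging Face transformers pipeline. It assumes access to the gated meta-llama/Meta-Llama-3-8B-Instruct repository, a recent transformers version that applies chat templates to message lists, and a GPU with enough memory.

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # gated; requires approved access
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what is new in Llama 3."}]
result = pipe(messages, max_new_tokens=128)
print(result[0]["generated_text"])  # conversation including the assistant's reply
```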
|
Why it Matters |
This series of open-source releases reaffirms Meta's strong belief that open source drives safer, faster, cross-discipline innovation and a healthier AI market.
Meta's upcoming 400B model, which already scores 85 on MMLU while still in training, plus planned features such as multimodality and a longer context window, could drastically disrupt the open-source scene. |
Community Feedback |
Cameron R. Wolfe: "LLaMA-3 is a prime example of why training a good LLM is almost entirely about data quality"
Jim Fan: "The upcoming Llama-3-400B+ will mark the watershed moment that the community gains open-weight access to a GPT-4-class model."
Bilal Tahir: "8K context length is surprising though...why so little compared to equivalent models? Is it a limitation of the architecture or a decision to prioritize other aspects of the model during training?" |
Access
TRY LLAMA 3

Come see what rigorous, reliable, and scalable AI looks like. |
LLM hallucinations and misidentifications by computer vision systems: how do you ensure you don't become the next AI failure headline and lose the public's trust?
On June 25th, attend the world's first AI Quality Conference and learn how industry leaders from Google, Uber, NVIDIA, and more are ensuring rigorous, reliable, and scalable AI.
Get your tickets now and use code KolenaVIP2024 for $60 off.
REGISTER
partner with us →

TRENDING SIGNALS
Language Models ⇧ 1784 ⇆ 221
JAX ⇧ 1932 ⇆ 456
Education ⇧ 1391 ⇆ 252
Open-Source ⇧ 1728 ⇆ 198

Imagine an AI... that can type anywhere you can on macOS with full context on what's on your screen |
Omnipilot brings AI to every Mac app, using the app's context to provide intelligent assistance. Invoke it with a shortcut to supercharge writing, email, and getting answers. |
Download macOS app ↗️ |
|
|
|
TOP PAPERS |
In-Context Learning |
|
Problem: Large language models are limited by few-shot in-context learning (ICL), which restricts adaptability and performance in complex tasks.
Solution: The research expands ICL to many-shot scenarios using larger context windows and hundreds of examples. It introduces Reinforced ICL with model-generated rationales and Unsupervised ICL that eliminates rationales entirely.
Results: Many-shot ICL significantly improves task performance, showing gains in adaptability and bias mitigation. It enhances reasoning and complex problem-solving, effectively learning high-dimensional functions. |
⇧ 1001 ⇆ 182 |
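To illustrate the core idea, here is a toy sketch of how a many-shot prompt differs from a few-shot one: the context window is simply packed with far more solved examples before the new query. The example data and prompt format are illustrative, not taken from the paper.

```python
# In the many-shot regime this list would hold hundreds of examples.
solved_examples = [
    ("12 * 7", "84"),
    ("305 - 48", "257"),
]

def build_many_shot_prompt(examples, query):
    shots = "\n\n".join(f"Problem: {q}\nAnswer: {a}" for q, a in examples)
    return f"{shots}\n\nProblem: {query}\nAnswer:"

# For Reinforced ICL, the answers would be model-generated rationales filtered
# for correctness; for Unsupervised ICL, only the problems are kept.
print(build_many_shot_prompt(solved_examples, "17 * 23"))
```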
|
Web Scraping |
|
Problem: Traditional web crawlers struggle with adaptability and scalability in new environments, while generative agents based on large language models lack performance and reusability in open-world scenarios.
Solution: AutoCrawler, a two-stage framework that combines LLMs with crawlers, uses a progressive understanding approach leveraging the hierarchical structure of HTML. It includes top-down and step-back operations to refine actions and prune irrelevant HTML, enhancing efficiency.
Results: AutoCrawler significantly outperforms the state-of-the-art baseline in crawler generation tasks. Comprehensive experiments demonstrate its effectiveness in generating stable and executable action sequences for diverse and changing web environments. |
⇧ 1021 ⇆ 249 |
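As a rough illustration of the top-down pruning idea (a sketch of the concept, not the authors' implementation), the snippet below walks an HTML tree and drops subtrees an LLM judges irrelevant to the extraction target; llm_says_relevant is a hypothetical stand-in for a real model call.

```python
from bs4 import BeautifulSoup

def llm_says_relevant(snippet: str, target: str) -> bool:
    # Hypothetical placeholder: in practice this would prompt an LLM with the
    # snippet and the extraction target and parse a yes/no answer.
    return target.lower() in snippet.lower()

def prune(node, target: str):
    for child in list(node.find_all(recursive=False)):
        if llm_says_relevant(child.get_text(" ", strip=True), target):
            prune(child, target)   # descend into branches that may hold the target
        else:
            child.decompose()      # prune irrelevant subtrees early

html = "<html><body><div>Price: $19.99</div><div>Unrelated footer</div></body></html>"
soup = BeautifulSoup(html, "html.parser")
prune(soup.body, target="price")
print(soup.body)  # only the price-bearing subtree survives
```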
|
Language Models |
|
Problem: Transformers face scalability issues with long sequences due to quadratic complexity and weak length extrapolation, while alternative models like linear attention underperform in pretraining efficiency and accuracy.
Solution: Megalodon introduces an architecture with unlimited context length, utilizing components like complex exponential moving average (CEMA) and normalized attention for enhanced efficiency and capability.
Results: In comparison with Llama2, Megalodon demonstrates superior efficiency at a scale of 7 billion parameters and 2 trillion training tokens, achieving a training loss of 1.70, which positions it between the performance benchmarks of Llama2's 7B and 13B models. |
⇧ 1561 ⇆ 342 |
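As a loose intuition for the CEMA component (a simplified sketch, not the paper's exact formulation), the snippet below runs a complex-valued exponential moving average over a sequence, mixing each new input with an exponentially fading history and passing the real part onward.

```python
import torch

def complex_ema(x: torch.Tensor, alpha: complex = 0.3 + 0.1j,
                delta: complex = 0.8 + 0.2j) -> torch.Tensor:
    # x: (seq_len, dim) real inputs; |delta| < 1 keeps the recurrent state stable
    h = torch.zeros(x.shape[-1], dtype=torch.cfloat)
    outputs = []
    for x_t in x.to(torch.cfloat):
        h = alpha * x_t + delta * h   # decay the history, add the new input
        outputs.append(h.real)        # hand the real part to the next layer
    return torch.stack(outputs)

print(complex_ema(torch.randn(16, 8)).shape)  # torch.Size([16, 8])
```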
|
|
|
|
|
TOP TUTORIAL |
Fine-Tuning |
Efficiently fine-tune Llama 3 with PyTorch |
⇧ 555 ⇆ 125 |
 |
What's New |
This tutorial details how to fine-tune the Llama 3 70B model using PyTorch FSDP, Q-Lora, and SDPA, optimized for 4x 24GB GPUs. It includes steps for setting up a development environment, preparing a high-quality dataset, and executing efficient distributed training with Hugging Face's tools.
The tutorial focuses on reducing memory requirements through data and model parallelism, quantization, and low-rank adapters.
You will learn how to apply these techniques in practice, adjust configurations, and utilize gradient checkpointing to manage GPU memory effectively, achieving scalable fine-tuning on consumer-sized hardware setups. |
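As a flavor of the setup, here is a minimal sketch of the Q-LoRA side of the recipe using transformers and peft; the hyperparameters are illustrative, and the FSDP distributed launch the tutorial walks through is omitted here.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Meta-Llama-3-70B"  # gated repo; requires approved access

# 4-bit quantization (the "Q" in Q-LoRA) shrinks the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="sdpa",   # PyTorch scaled dot-product attention
    torch_dtype=torch.bfloat16,
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters: only these small matrices receive gradients
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a tiny fraction of the 70B parameters
```

Training then proceeds with Hugging Face's trainer utilities, gradient checkpointing, and FSDP sharding across the four GPUs, as the tutorial walks through.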
READ MORE |