IN TODAY'S SIGNAL
Read Time: 5 min 27 sec

Enjoying this newsletter?
Please forward it to a friend or colleague. It helps us keep this content free.

TRENDING REPO |
Language Models |
Meta Releases the Most Powerful Open-Source Model Yet: Llama 3 |
⇧ 17,482 ⇆ 2460 |
 |
What's New |
Meta has released the Llama 3 series, a new generation of language models with configurations of 8 billion and 70 billion parameters, along with an upcoming 400 billion parameter model.
This is one of the biggest releases this year, with Meta rolling out new models, products and research all at once.
Model Info
- Default 8K-token context window.
- Outperforms other open-source models of comparable scale, such as Gemma 7B and Mixtral 8x22B, with the 70B model scoring over 80 on MMLU.
- Improved reasoning capabilities thanks to an increased focus on coding datasets.
Model Training and Data:
- Trained on over 15 trillion tokens from publicly available sources.
- Incorporates a tokenizer with a 128K-token vocabulary (see the tokenizer sketch after this list).
- Utilizes advanced data-filtering pipelines for optimal data quality.
- Achieved over 400 TFLOPS per GPU during training on 16K GPUs.
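To make the tokenizer bullet concrete, here is a minimal sketch (not from Meta's release materials) that loads the Llama 3 tokenizer from the Hugging Face Hub and checks its vocabulary size. It assumes you have accepted the license for the gated meta-llama/Meta-Llama-3-8B repository and are logged in locally.

```python
from transformers import AutoTokenizer

# Any checkpoint that ships the Llama 3 tokenizer would work here.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
print(len(tok))  # vocabulary size, roughly 128K tokens

text = "Meta has released the Llama 3 series of language models."
ids = tok(text)["input_ids"]
print(len(ids), ids[:8])  # a larger vocabulary generally means fewer tokens per sentence
```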
Performance Benchmarks:
- MMLU: 8B model scores 68.4; 70B model achieves 82.0.
- HumanEval: 8B at 62.2; 70B reaches 81.7.
- GSM-8K: 79.6 for 8B; 70B model leads with 93.0.
- MATH dataset: 30.0 for 8B; 70B model scores 50.4.
Research
These models come with a set of research breakthroughs and contributions that will be detailed in a paper in the coming months.
For now, Meta has revealed that:
- Llama 3 uses a Tiktoken-based tokenizer with a vocabulary of 128K tokens that encodes language much more efficiently, which leads to substantially improved model performance. It also adopts Grouped Query Attention (GQA) to improve inference efficiency (a minimal GQA sketch follows this list).
- Model performance continues to improve even when a model is trained on far more data than the scaling laws recommend as optimal: both the 8B and 70B parameter models kept improving log-linearly after training on 15T tokens.
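For readers curious what Grouped Query Attention looks like in code, below is a small, self-contained PyTorch sketch of the idea: several query heads share each key/value head, which shrinks the KV cache at inference time. The head counts and dimensions are illustrative, not Llama 3's actual configuration.

```python
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 2, 16, 64
n_q_heads, n_kv_heads = 8, 2              # 4 query heads share each KV head
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand K and V so every query head lines up with its shared KV head
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

# Causal scaled dot-product attention, using the kernel PyTorch ships
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```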
Access and Integration:
- Fully open-source including model weights.
- No cost for access and integration.
- Available across major platforms like AWS and Google Cloud (a minimal loading example follows below).
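As a quick way to try the release, here is a hedged sketch of running Llama 3 8B Instruct through the Hugging Face transformers pipeline. It assumes access to the gated meta-llama/Meta-Llama-3-8B-Instruct repository, a recent transformers version that applies chat templates to message lists, and a GPU with enough memory.

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # gated; requires approved access
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what is new in Llama 3."}]
result = pipe(messages, max_new_tokens=128)
print(result[0]["generated_text"])  # conversation including the assistant's reply
```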
|
Why it Matters |
This series of open-source releases reaffirms Meta's strong belief that open source drives safer, faster, cross-discipline innovation and a healthier AI market.
Meta's upcoming 400B model, which already scores 85 on MMLU while still in training, plus planned features such as multimodality and a longer context window, could drastically disrupt the open-source scene. |
Community Feedback |
Cameron R. Wolfe: "LLaMA-3 is a prime example of why training a good LLM is almost entirely about data quality"
Jim Fan: "The upcoming Llama-3-400B+ will mark the watershed moment that the community gains open-weight access to a GPT-4-class model."
Bilal Tahir: "8K context length is surprising though...why so little compared to equivalent models? Is it a limitation of the architecture or a decision to prioritize other aspects of the model during training?" |
Access
TRY LLAMA 3

Come see what rigorous, reliable, and scalable AI looks like. |
LLM hallucinations and misidentifications by computer vision systems: how do you ensure you don't become the next AI failure headline and lose the public's trust?
On June 25th, attend the world's first AI Quality Conference and learn how industry leaders from Google, Uber, NVIDIA, and more are ensuring rigorous, reliable, and scalable AI.
Get your tickets now and use code KolenaVIP2024 for $60 off.
REGISTER
partner with us →

TRENDING SIGNALS
Language Models ⇧ 1784 ⇆ 221
JAX ⇧ 1932 ⇆ 456
Education ⇧ 1391 ⇆ 252
Open-Source ⇧ 1728 ⇆ 198

Imagine an AI... that can type anywhere you can on macOS with full context on what's on your screen |
Omnipilot brings AI to every Mac app, using the app's context to provide intelligent assistance. Invoke it with a shortcut to supercharge writing, email, and getting answers. |
Download macOS app ↗️ |
|
|
|
TOP PAPERS |
In-Context Learning |
|
Problem: Large language models are limited by few-shot in-context learning (ICL), which restricts adaptability and performance in complex tasks.
Solution: The research expands ICL to many-shot scenarios using larger context windows and hundreds of examples. It introduces Reinforced ICL with model-generated rationales and Unsupervised ICL that eliminates rationales entirely.
Results: Many-shot ICL significantly improves task performance, showing gains in adaptability and bias mitigation. It enhances reasoning and complex problem-solving, effectively learning high-dimensional functions. |
⇧ 1001 ⇆ 182 |
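To illustrate the core idea, here is a toy sketch of how a many-shot prompt differs from a few-shot one: the context window is simply packed with far more solved examples before the new query. The example data and prompt format are illustrative, not taken from the paper.

```python
# In the many-shot regime this list would hold hundreds of examples.
solved_examples = [
    ("12 * 7", "84"),
    ("305 - 48", "257"),
]

def build_many_shot_prompt(examples, query):
    shots = "\n\n".join(f"Problem: {q}\nAnswer: {a}" for q, a in examples)
    return f"{shots}\n\nProblem: {query}\nAnswer:"

# For Reinforced ICL, the answers would be model-generated rationales filtered
# for correctness; for Unsupervised ICL, only the problems are kept.
print(build_many_shot_prompt(solved_examples, "17 * 23"))
```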
|
Web Scraping |
|
Problem: Traditional web crawlers struggle with adaptability and scalability in new environments, while generative agents based on large language models lack performance and reusability in open-world scenarios.
Solution: AutoCrawler, a two-stage framework that combines LLMs with crawlers, uses a progressive understanding approach leveraging the hierarchical structure of HTML. It includes top-down and step-back operations to refine actions and prune irrelevant HTML, enhancing efficiency.
Results: AutoCrawler significantly outperforms the state-of-the-art baseline in crawler generation tasks. Comprehensive experiments demonstrate its effectiveness in generating stable and executable action sequences for diverse and changing web environments. |
⇧ 1021 ⇆ 249 |
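As a rough illustration of the top-down pruning idea (a sketch of the concept, not the authors' implementation), the snippet below walks an HTML tree and drops subtrees an LLM judges irrelevant to the extraction target; llm_says_relevant is a hypothetical stand-in for a real model call.

```python
from bs4 import BeautifulSoup

def llm_says_relevant(snippet: str, target: str) -> bool:
    # Hypothetical placeholder: in practice this would prompt an LLM with the
    # snippet and the extraction target and parse a yes/no answer.
    return target.lower() in snippet.lower()

def prune(node, target: str):
    for child in list(node.find_all(recursive=False)):
        if llm_says_relevant(child.get_text(" ", strip=True), target):
            prune(child, target)   # descend into branches that may hold the target
        else:
            child.decompose()      # prune irrelevant subtrees early

html = "<html><body><div>Price: $19.99</div><div>Unrelated footer</div></body></html>"
soup = BeautifulSoup(html, "html.parser")
prune(soup.body, target="price")
print(soup.body)  # only the price-bearing subtree survives
```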
|
Language Models |
|
Problem: Transformers face scalability issues with long sequences due to quadratic complexity and weak length extrapolation, while alternative models like linear attention underperform in pretraining efficiency and accuracy.
Solution: Megalodon introduces an architecture with unlimited context length, utilizing components like complex exponential moving average (CEMA) and normalized attention for enhanced efficiency and capability.
Results: In comparison with Llama2, Megalodon demonstrates superior efficiency at a scale of 7 billion parameters and 2 trillion training tokens, achieving a training loss of 1.70, which positions it between the performance benchmarks of Llama2's 7B and 13B models. |
⇧ 1561 ⇆ 342 |
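As a loose intuition for the CEMA component (a simplified sketch, not the paper's exact formulation), the snippet below runs a complex-valued exponential moving average over a sequence, mixing each new input with an exponentially fading history and passing the real part onward.

```python
import torch

def complex_ema(x: torch.Tensor, alpha: complex = 0.3 + 0.1j,
                delta: complex = 0.8 + 0.2j) -> torch.Tensor:
    # x: (seq_len, dim) real inputs; |delta| < 1 keeps the recurrent state stable
    h = torch.zeros(x.shape[-1], dtype=torch.cfloat)
    outputs = []
    for x_t in x.to(torch.cfloat):
        h = alpha * x_t + delta * h   # decay the history, add the new input
        outputs.append(h.real)        # hand the real part to the next layer
    return torch.stack(outputs)

print(complex_ema(torch.randn(16, 8)).shape)  # torch.Size([16, 8])
```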
|
|
|
|
|
TOP TUTORIAL |
Fine-Tuning |
Efficiently fine-tune Llama 3 with PyTorch |
⇧ 555 ⇆ 125 |
 |
What's New |
This tutorial details how to fine-tune the Llama 3 70B model using PyTorch FSDP, Q-Lora, and SDPA, optimized for 4x 24GB GPUs. It includes steps for setting up a development environment, preparing a high-quality dataset, and executing efficient distributed training with Hugging Face's tools.
The tutorial focuses on reducing memory requirements through data and model parallelism, quantization, and low-rank adapters.
You will learn how to apply these techniques in practice, adjust configurations, and utilize gradient checkpointing to manage GPU memory effectively, achieving scalable fine-tuning on consumer-sized hardware setups. |
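As a flavor of the setup, here is a minimal sketch of the Q-LoRA side of the recipe using transformers and peft; the hyperparameters are illustrative, and the FSDP distributed launch the tutorial walks through is omitted here.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Meta-Llama-3-70B"  # gated repo; requires approved access

# 4-bit quantization (the "Q" in Q-LoRA) shrinks the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="sdpa",   # PyTorch scaled dot-product attention
    torch_dtype=torch.bfloat16,
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters: only these small matrices receive gradients
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a tiny fraction of the 70B parameters
```

Training then proceeds with Hugging Face's trainer utilities, gradient checkpointing, and FSDP sharding across the four GPUs, as the tutorial walks through.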
READ MORE |