
On eliminating MatMul from LLMs, using Whisper in your browser, running your own LLMs, 15,140 ChatGPT prompts, and Apple's latest announcements.


AlphaSignal

Hey,

Welcome to today's edition of AlphaSignal. 


Whether you are a researcher, engineer, developer, or data scientist, our summaries are there to keep you up-to-date with the latest breakthroughs in AI. 


Let's get into it,


Lior


IN TODAY'S SIGNAL

πŸ“° Top News

πŸ“Œ Snorkel AI Webinar

⚑️ Top 5 Signals

πŸ› οΈ Top Papers

  • Autoguidance: guides a diffusion model with a smaller, less-trained version of itself, achieving an FID of 1.25 on ImageNet-512.

  • Vision-LSTM: adapts xLSTM as a vision backbone, reaching 77.3% ImageNet-1K accuracy with linear computational complexity.

  • Short Circuiting: reroutes internal model representations to block harmful outputs, cutting compliance with harmful requests by up to 90%.

🧠 Tutorial

  • How to run a 100% local, fully private LLM with llama.cpp 

Read Time: 4 min 18 sec

Enjoying this newsletter?
Please forward it to a friend or colleague. It helps us keep this content free.

TOP NEWS

Architecture

Eliminating matrix multiplication (MatMul) from LLMs

⇧ 5,210 Likes

What's New

The paper "Scalable MatMul-free Language Modeling" went viral on Twitter, generating 2.3 million impressions with its approach that eliminates matrix multiplication (MatMul) from LLMs.

Traditionally, MatMul is essential for processing dense layers and implementing self-attention mechanisms in neural networks, but it demands substantial computational power and memory.


Problem Addressed

LLMs typically depend on MatMul for their core operations; the resulting compute and memory demands restrict their deployment to environments with high-end hardware. 


Solution

The research introduces a method that replaces MatMul with simpler computational techniques, dramatically reducing resource consumption while maintaining model performance.


How It Works

  • In Dense Layers: The method substitutes MatMul with ternary accumulations, where weights take only the values -1, 0, or +1, reducing each dot product to additions and subtractions (see the sketch after this list).

  • For Self-Attention Mechanisms: It utilizes a MatMul-free Linear Gated Recurrent Unit (MLGRU) that operates solely on element-wise products.

  • In Channel Mixing: It employs modified Gated Linear Units (GLUs) that integrate BitLinear layers with ternary weights, efficiently managing data integration across channels with reduced computational overhead.
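
To make the dense-layer substitution concrete, here is a minimal NumPy sketch of a ternary accumulation layer. It is an illustration of the technique under stated assumptions (per-tensor absmean quantization, illustrative names), not the paper's released code:

import numpy as np

# Quantize full-precision weights to {-1, 0, +1}
# (the absmean scaling rule here is an assumption).
def ternary_quantize(w):
    scale = np.mean(np.abs(w)) + 1e-8
    return np.clip(np.round(w / scale), -1, 1)

# Dense layer whose "matmul" degenerates to signed accumulation:
# +1 weights add the input, -1 weights subtract it, 0 weights skip it.
def ternary_dense(x, w_ternary):
    out = np.zeros((x.shape[0], w_ternary.shape[1]))
    for j in range(w_ternary.shape[1]):
        plus = x[:, w_ternary[:, j] == 1].sum(axis=1)
        minus = x[:, w_ternary[:, j] == -1].sum(axis=1)
        out[:, j] = plus - minus
    return out

x = np.random.randn(4, 16)                   # batch of 4, 16 features
w = ternary_quantize(np.random.randn(16, 8))
y = ternary_dense(x, w)                      # matches x @ w, with no multiplications

The loop form makes the point explicit; a real kernel would fuse these accumulations, which is where the memory and energy savings come from.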

Impact of the Innovation

Removing MatMul from the calculations in large language models means these models don't need powerful computers to run. This change allows them to work on simpler devices, like smaller servers or even some personal computers, making advanced AI tools available to more people and places.


Performance Metrics

  • Memory Reduction: Inference memory usage drops by more than 10x compared to unoptimized models.

  • Efficiency Gains: Training speed increases by 25.6%, and overall memory requirements drop by 61% relative to conventional approaches.

  • Hardware Optimization: Custom FPGA accelerators demonstrate the practicality of this method by processing billion-parameter models with just 13 watts of power.

READ THE PAPER

Webinar: How to fine-tune LLMs to perform specialized tasks accurately

The key to transforming foundation models such as Meta's Llama 3 into specialized LLMs is high-quality training data, which can be applied via fine-tuning and alignment.


In this webinar, Snorkel AI will give an overview of fine-tuning methods such as DPO, ORPO, and SPIN; explain how to curate high-quality instruction and preference data 10-100x faster (and at scale); and give a live demo showing how we fine-tune, align, and evaluate LLMs.

Join for a live demo and to learn more about:

  • Curating high-quality training data 10-100x faster

  • Emerging LLM fine-tuning and alignment methods

  • Evaluating LLM accuracy for production deployment

Can't attend live? All registrants will receive an on-demand recording.

REGISTER NOW

partner with us

TRENDING SIGNALS

Audio

Whisper can now do real-time transcription locally in your browser (open-source)

⇧ 2,103 Likes

Database

pgvector, an open-source vector-search extension for PostgreSQL, is now faster and 75% cheaper than Pinecone

⇧ 358 Likes

LLMs

Yandex releases a new open-source tool to pre-train LLMs 20% faster

⇧ 59 Likes

Prompting

New repo collects 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets

⇧ 402 Likes

Industry

Apple introduces on-device and server foundation models, including a ~3B-parameter on-device LLM using fine-tuned LoRA adapters

⇧ 1,514 Likes

TOP PAPERS

Image Generation

Guiding a Diffusion Model with a Bad Version of Itself

⇧ 1,224

Problem

Diffusion models for image generation often struggle with maintaining image diversity and quality, especially in lower-probability regions of the data distribution. Existing methods like classifier-free guidance (CFG) increase prompt alignment and image quality but reduce variation.

Solution
The paper introduces autoguidance, a method where a diffusion model is guided by a less trained or smaller version of itself. This approach aims to improve control over image quality without compromising image diversity, unlike traditional CFG.
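
The guidance rule itself is a one-line extrapolation. A minimal sketch, assuming main_model and weak_model are placeholder callables that return denoised estimates for the same noisy input (the names and the weight value are illustrative):

# Autoguidance: extrapolate away from a weaker version of the same model.
# Unlike CFG, both passes keep the conditioning; only model quality differs.
def autoguided_denoise(main_model, weak_model, x_noisy, sigma, cond, w=2.0):
    d_main = main_model(x_noisy, sigma, cond)   # strong model's estimate
    d_weak = weak_model(x_noisy, sigma, cond)   # smaller / less-trained version
    return d_weak + w * (d_main - d_weak)       # w > 1 pushes past the weak model

Because the weak model makes the same kinds of errors as the strong one, only larger, the extrapolation cancels those errors without collapsing diversity the way CFG does.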

Results
Autoguidance achieved state-of-the-art results on ImageNet-512 with a Fréchet Inception Distance (FID) of 1.25. It also set a new benchmark on ImageNet-64 with an FID of 1.01, significantly enhancing image quality while preserving diversity.

Vision

Vision-LSTM: xLSTM as Generic Vision Backbone

⇧ 898

Problem
Transformers, while effective in computer vision, suffer from high computational costs due to the quadratic complexity of self-attention, especially with high-resolution images.


Solution
Vision-LSTM (ViL) adapts the xLSTM architecture for vision tasks, using a sequence of alternating bi-directional mLSTM blocks to process image patch tokens efficiently with linear computational complexity.
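
A simplified PyTorch sketch of the alternating bi-directional pattern follows, with a plain nn.LSTM standing in for the paper's mLSTM block and all sizes illustrative (an assumption for brevity, not the ViL code):

import torch
import torch.nn as nn

class AlternatingLSTMBackbone(nn.Module):
    def __init__(self, patch=16, dim=192, depth=4):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patchify
        self.blocks = nn.ModuleList(
            [nn.LSTM(dim, dim, batch_first=True) for _ in range(depth)]
        )

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N_patches, dim)
        for i, block in enumerate(self.blocks):
            if i % 2 == 1:                     # odd blocks scan patches in reverse
                tokens = torch.flip(tokens, dims=[1])
            out, _ = block(tokens)             # linear-cost recurrence over patches
            tokens = tokens + out              # residual connection
            if i % 2 == 1:                     # restore original patch order
                tokens = torch.flip(tokens, dims=[1])
        return tokens.mean(dim=1)              # pooled feature for classification

feats = AlternatingLSTMBackbone()(torch.randn(2, 3, 224, 224))  # -> (2, 192)

Alternating the scan direction gives every patch a view of both its predecessors and successors without the quadratic token-pair interactions of self-attention.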


Results
ViL outperforms standard vision transformers on ImageNet-1K classification. ViL-T achieves 77.3% accuracy, outdoing DeiT-T at 72.2%. Even in heavily optimized transformer setups, ViL demonstrates competitive performance, with ViL-B reaching 81.6% accuracy versus DeiT-B's 81.8%.

Alignment

Improving Alignment and Robustness with Short Circuiting

⇧ 532

Problem
AI models are vulnerable to adversarial attacks, which compromise model outputs, posing a significant reliability and safety issue. Current defenses like adversarial training fail to generalize against novel attacks and often degrade model performance.


Solution
The paper introduces "Short Circuiting," a technique that manipulates internal model representations to prevent harmful outputs without specific attack training. This method, based on representation engineering, disrupts harmful processes by rerouting them towards safe states, effectively making the model attack-agnostic.
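
A conceptual sketch of the rerouting objective, assuming access to hidden states from a frozen copy of the model (h_orig) and the copy being tuned (h_new); the two loss terms follow the paper's description, but the function itself is illustrative, not the released code:

import torch.nn.functional as F

def short_circuit_loss(h_new, h_orig, is_harmful):
    """h_new: hidden states of the model being tuned; h_orig: frozen original."""
    if is_harmful:
        # Reroute: push harmful-context representations orthogonal to the
        # originals, so the harmful generation process cannot complete.
        cos = F.cosine_similarity(h_new, h_orig, dim=-1)
        return F.relu(cos).mean()
    # Retain: keep benign-context representations close to the original (L2).
    return (h_new - h_orig).norm(dim=-1).mean()

Training mixes both terms, so the model stays useful on benign inputs while any internal trajectory toward harmful content is short-circuited, regardless of which attack produced it.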


Results
Short Circuiting demonstrated a significant reduction in compliance to harmful requests by up to 90% on Llama-3-8B-Instruct models, with minimal performance impact (less than 1% decrease in capability tests). The technique outperforms traditional refusal and adversarial training, maintaining robustness against a wide range of unseen adversarial attacks.

TUTORIAL

How to run a 100% local, fully private LLM with llama.cpp 

llama.cpp is a powerful, MIT-licensed framework designed to run large language models (LLMs) like Meta's LLaMA locally in pure C/C++, ensuring full privacy and data security.

Requirements

  • Operating Systems: macOS, Linux, Windows (via CMake), Docker, FreeBSD

  • Hardware: Support for Apple silicon, AVX architectures, and GPUs via custom CUDA kernels and a Vulkan backend

Step 1: Installation

  • Install llama.cpp using Homebrew:
    brew install llama.cpp

Step 2: Setting Up the Server

  • Start the server with the desired model:
    llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf

Step 3: Interacting with the Model

  • Send requests to the model with curl (the server exposes an OpenAI-compatible API):
    curl localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Hello"}]}'
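
For programmatic use, the same endpoint speaks the OpenAI-compatible chat-completions format. A minimal Python example, assuming the server from Step 2 is running on its default port 8080 (standard library only):

import json
import urllib.request

payload = {"messages": [{"role": "user", "content": "Summarize llama.cpp in one sentence."}]}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])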

llama.cpp REPO
