
AlphaSignal

Hey,

Welcome to today's edition of AlphaSignal, a newsletter for developers by developers.

We identify and summarize the top 1% news, papers, models, and repos in the AI industry.

IN TODAY'S SIGNAL

🎖️ Top News

Karpathy announces LLM101n, a massive course to build AI storytellers using LLMs

📌 Free Webinar

Join Encord on June 27 to learn how to curate visual datasets, automate dataset cleansing, and apply industry best practices.

⚡️ Trending Signals

Groq accelerates Whisper-large-v3 to 164x speed, $0.03/hour
OpenAI ChatGPT's desktop app now available for all macOS users
Chrome can now run Gemini locally and offline: sign up here
Together AI releases a tutorial to build your own AI code assistants
Apple releases 4M: a framework to train "any-to-any" foundation models

🛠️ Trending Repos

Perplexica: an open-source alternative to Perplexity
maestro: orchestrate subagents with Claude, GPT, local LLMs
PDM: efficient Python package management with PEP standards

🧠 Tutorial

How to extract tables from PDF using Python

Read Time: 4 min 05 sec

Enjoying this newsletter?
Please forward it to a friend or colleague. It helps us keep this content free.

TOP NEWS

Tutorials

LLM101n: Let's build a Storyteller

⇧ 15,110 Likes

What's New

Andrej Karpathy is about to launch a new course, LLM101n, focused on building a Storyteller AI Large Language Model (LLM).

This course will guide participants to create, refine, and illustrate stories with AI. It covers the entire process from basics to a fully functioning web app similar to ChatGPT, built from scratch using Python, C, and CUDA, with minimal computer science prerequisites.

By the end of the course, participants will gain a deep understanding of AI, LLMs, and deep learning.

Key topics in the syllabus include:

Bigram Language Model: Language modeling basics.
Transformer Models: Building GPT-2, residuals, and layer normalization.
Tokenization Techniques: Byte pair encoding.
Optimization Strategies: Initialization, AdamW.
Fine-tuning Methods: Supervised fine-tuning (SFT), reinforcement learning (RLHF, PPO, DPO).
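The first syllabus item is easy to try on your own. Below is a minimal count-based character bigram model, written as our own sketch rather than course material: the toy corpus, the `<s>`/`</s>` boundary markers, and the names `train_bigram` and `sample` are illustrative assumptions. It counts how often each character follows another, normalizes those counts into probabilities, and samples new words one character at a time.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # illustrative boundary markers around each word

def train_bigram(words):
    """Count character bigrams and normalize them into P(next | prev)."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        chars = [START] + list(word) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        probs[prev] = {nxt: c / total for nxt, c in nexts.items()}
    return probs

def sample(probs, rng, max_len=20):
    """Generate one word by repeatedly sampling the next character."""
    out, prev = [], START
    while len(out) < max_len:
        choices = list(probs[prev].keys())
        weights = list(probs[prev].values())
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

if __name__ == "__main__":
    corpus = ["once", "upon", "a", "time", "there", "was", "a", "dragon"]
    model = train_bigram(corpus)
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print([sample(model, rng) for _ in range(5)])
```

Because the model only sees the previous character, its samples are word-like fragments rather than real words; that limitation is what the later topics in the list are meant to address.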

Karpathy’s earlier project, micrograd, implemented a neural network in 94 lines of code.

This minimal codebase includes everything needed to train a neural network. As Karpathy puts it: "Sometimes when things get too complicated, I come back to this code and just breathe a little."
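The core idea behind a micrograd-style codebase fits in a few dozen lines: a scalar `Value` that remembers how it was computed, plus a `backward()` pass that applies the chain rule in reverse topological order. The sketch below is our own simplified illustration of that idea (supporting only `+`, `*`, and `tanh`), not micrograd's actual source.

```python
import math

class Value:
    """A scalar that remembers its inputs so gradients can flow backward."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None  # replaced by the op that produced this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b and d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from the output back.
        order, seen = [], set()
        def build(node):
            if node not in seen:
                seen.add(node)
                for child in node._children:
                    build(child)
                order.append(node)
        build(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

if __name__ == "__main__":
    # One neuron: out = tanh(w*x + b); the gradients say how to nudge w and b.
    x, w, b = Value(0.5), Value(-1.0), Value(0.2)
    out = (w * x + b).tanh()
    out.backward()
    print(out.data, w.grad, b.grad)
```

Gradients accumulate with `+=` so that a value used twice (like `a` in `a * b + a`) receives contributions from both paths.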

CHECK THE REPO

Garbage In, Garbage Out: Ensuring Data Quality for Successful AI Outcomes

Providing your AI models with high-quality data is the most crucial factor impacting model performance. The right data will transform your models, but how do you find the right data?

Join Encord's latest webinar on June 27 to learn how to intelligently surface and curate visual datasets, how to automate dataset cleansing, and industry best practices and insights from real-world projects.

Learn how to tackle one of the biggest challenges in AI development.

GET STARTED

partner with us

TRENDING SIGNALS

Audio

Groq now runs OpenAI's Whisper-large-v3 164x faster for $0.03 per hour of transcription

⇧ 1339 Likes

OpenAI

The ChatGPT desktop app for macOS is now available for all users

⇧ 1203 Likes

Chrome

The new Google Chrome can run Gemini locally and offline at unprecedented speed: sign up here

⇧ 20,032 Likes

Code Assistant

Together AI releases a tutorial to build your own personalized code assistant using Mistral 7B Instruct v0.2

⇧ 360 Likes

Vision

Apple releases 4M, a scalable, open-source framework for training any-to-any multimodal foundation models

⇧ 1810 Likes

TRENDING REPOS

Perplexica

☆ 10,310 Stars

Perplexica is an open-source, AI-powered search engine that goes deep into the internet to find answers. Inspired by Perplexity AI, it doesn't just search the web; it also understands your questions.

Agents

maestro

☆ 2710 Stars

A framework for Claude Opus, GPT, and local LLMs to orchestrate subagents. You can run it locally with LM Studio or Ollama, and it includes a Flask app for a user-friendly interface.
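The orchestrator/subagent pattern can be sketched generically: one model breaks an objective into sub-tasks, another model executes each sub-task, and the orchestrator decides when the work is done. Everything below (`orchestrate`, the `DONE` reply convention, the prompt wording, and the scripted stand-in models) is a hypothetical illustration of the pattern, not maestro's code or prompts. In practice each callable would wrap a call to Claude, GPT, or a local model.

```python
from typing import Callable, List

# A "model" here is any function that maps a prompt to a text reply.
Model = Callable[[str], str]

def orchestrate(objective: str, orchestrator: Model, subagent: Model,
                max_steps: int = 5) -> List[str]:
    """Ask the orchestrator for the next sub-task, delegate it, collect results."""
    results: List[str] = []
    for _ in range(max_steps):  # hard cap so a confused orchestrator can't loop forever
        prompt = (
            f"Objective: {objective}\n"
            f"Completed sub-task results: {results}\n"
            "Reply with the next sub-task, or DONE if the objective is met."
        )
        task = orchestrator(prompt).strip()
        if task == "DONE":
            break
        # Each sub-task is handed to the subagent model on its own.
        results.append(subagent(task))
    return results

def scripted(replies: List[str]) -> Model:
    """Hypothetical stand-in model that returns canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)

if __name__ == "__main__":
    planner = scripted(["Outline the story", "Write chapter 1", "DONE"])
    worker = lambda task: f"[result of: {task}]"
    print(orchestrate("Write a short story", planner, worker))
```

Passing the models in as plain callables keeps the loop independent of any provider, so swapping a hosted model for a local one only changes the function you pass.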

Code Manager

PDM

☆ 2491 Stars

PDM helps you manage Python packages and dependencies efficiently. It supports the PEP 517 and PEP 621 standards, offers a fast dependency resolver, and has a flexible plugin system. You can install Python versions, use centralized caches, and manage virtual environments.

TUTORIAL

How To Extract Tables From PDFs

pdfplumber is a Python library that extracts content from PDF files, including text, tables, and metadata. It's ideal for developers who need structured data from documents like financial reports or research papers.

After extracting data with pdfplumber, loading it into a Pandas DataFrame enables efficient data manipulation and analysis.
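pdfplumber's table extraction gives you each table as a list of rows, where every cell is a string or `None`, so a little cleanup usually pays off before analysis. The helper below is a minimal sketch under our own assumptions: the first row is the header, blank header cells get placeholder names, duplicate names get a suffix, whitespace is trimmed, and a column is converted with `pd.to_numeric` only when every non-missing cell parses as a number. `table_to_dataframe` and its helpers are hypothetical names, not part of pdfplumber or pandas.

```python
import pandas as pd

def _clean(cell):
    """Strip whitespace; treat None and blank strings as missing (None)."""
    if not isinstance(cell, str):
        return None
    return cell.strip() or None

def _strip_commas(value):
    """Drop thousands separators so '1,200' can parse as 1200."""
    return value.replace(",", "") if isinstance(value, str) else None

def table_to_dataframe(table):
    """Convert a list-of-rows table (cells are str or None) into a cleaned DataFrame."""
    header, *rows = table
    # Name blank header cells and make duplicate names unique ("A", "A_1", ...).
    columns, used = [], set()
    for i, cell in enumerate(header):
        base = _clean(cell) or f"column_{i}"
        name, n = base, 1
        while name in used:
            name = f"{base}_{n}"
            n += 1
        used.add(name)
        columns.append(name)
    df = pd.DataFrame([[_clean(c) for c in row] for row in rows], columns=columns)
    # Convert a column to numbers only if every non-missing cell parses as one.
    for name in df.columns:
        numeric = pd.to_numeric(df[name].map(_strip_commas), errors="coerce")
        if df[name].notna().any() and numeric.notna().sum() == df[name].notna().sum():
            df[name] = numeric
    return df
```

Text columns stay untouched, so a column mixing labels and numbers is never silently turned into missing values.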

Here's a complete example that includes extracting tables from PDFs and loading them into Pandas DataFrames:


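```python
import pdfplumber
import pandas as pd

# Path to the PDF file
pdf_path = 'path/to/your/pdf_file.pdf'

# One DataFrame per table found in the document
dataframes = []

# Open the PDF file (the context manager closes it when done)
with pdfplumber.open(pdf_path) as pdf:
    # Iterate through each page, numbering pages from 1 for readable output
    for page_number, page in enumerate(pdf.pages, start=1):
        # extract_tables() returns every table on the page as a list of rows;
        # extract_table() would return only the largest one
        for table in page.extract_tables():
            if len(table) < 2:  # skip tables with no data rows below the header
                continue
            # Use the first row as column headers and the remaining rows as data
            df = pd.DataFrame(table[1:], columns=table[0])
            dataframes.append(df)
            print(f"Page {page_number}:")
            print(df)
```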
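A table that spans several pages usually comes back as one fragment per page, often with the header row repeated at the top of each page. One way to stitch the fragments together, assuming every fragment has the same column layout and the first fragment starts with the header, is sketched below. `combine_fragments` is a hypothetical helper, and the sample rows stand in for what pdfplumber would return for each page.

```python
import pandas as pd

def combine_fragments(fragments):
    """Merge per-page table fragments (lists of rows) into one DataFrame."""
    header = fragments[0][0]
    rows = []
    for fragment in fragments:
        for row in fragment:
            # Skip the header wherever it appears (first page or repeated later).
            if row == header:
                continue
            rows.append(row)
    return pd.DataFrame(rows, columns=header)

if __name__ == "__main__":
    page_1 = [["Region", "Sales"], ["North", "120"], ["South", "95"]]
    page_2 = [["Region", "Sales"], ["East", "80"]]  # header repeated on page 2
    print(combine_fragments([page_1, page_2]))
```

One trade-off of matching on the header row: a data row that happens to be identical to the header would also be dropped.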



AlphaSignal, 214 Barton Springs RD, Austin, Texas 94123, United States