IN TODAY'S SIGNAL
🎖️ Top News
📌 Free Webinar
⚡️ Trending Signals
🛠️ Trending Repos
- Perplexica: an open-source alternative to Perplexity
- maestro: orchestrate subagents with Claude, GPT, local LLMs
- PDM: efficient Python package management with PEP standards
🧠 Tutorial
Read Time: 4 min 05 sec
TOP NEWS
Tutorials
⇧ 15,110 Likes
What's New
Andrej Karpathy is about to launch a new course, LLM101n, focused on building a Storyteller AI Large Language Model (LLM).
This course will guide participants to create, refine, and illustrate stories with AI. It covers the entire process from basics to a fully functioning web app similar to ChatGPT, built from scratch using Python, C, and CUDA, with minimal computer science prerequisites.
By the end of the course, participants will gain a deep understanding of AI, LLMs, and deep learning.
Key topics in the syllabus include:
- Bigram Language Model: Language modeling basics.
- Transformer Models: Building GPT-2, residuals, and layer normalization.
- Tokenization Techniques: Byte pair encoding.
- Optimization Strategies: Initialization, AdamW.
- Fine-tuning Methods: Supervised finetuning (SFT), reinforcement learning (RLHF, PPO, DPO).
Karpathy’s earlier project, micrograd, implements a neural network in 94 lines of code.
Of this minimalistic codebase, which includes everything needed to train a neural network, he says: "Sometimes when things get too complicated, I come back to this code and just breathe a little."
Garbage In, Garbage Out: Ensuring Data Quality for Successful AI Outcomes
Providing your AI models with high-quality data is the most crucial factor impacting model performance. The right data will transform your models, but how do you find the right data?
Join Encord's latest webinar on June 27 to learn how to intelligently surface and curate visual datasets, how to automate dataset cleansing, and which industry best practices and insights have emerged from real-world projects.
Learn how to tackle one of the biggest challenges in AI development.
TRENDING SIGNALS
Audio
⇧ 1,339 Likes
OpenAI
⇧ 1,203 Likes
Chrome
⇧ 20,032 Likes
Code Assistant
⇧ 360 Likes
Vision
⇧ 1,810 Likes
TRENDING REPOS
Search
☆ 10,310 Stars
Perplexica is an open-source, AI-powered search engine that goes deep into the internet to find answers. Inspired by Perplexity AI, it not only searches the web but also understands your questions.
Agents
☆ 2,710 Stars
maestro is a framework that lets Claude Opus, GPT, and local LLMs orchestrate subagents. You can run it locally with LM Studio or Ollama, and it includes a Flask app that provides a user-friendly interface.
Code Manager
☆ 2,491 Stars
PDM helps you manage Python packages and dependencies efficiently. It supports PEP 517 and PEP 621 standards, offers a fast dependency resolver, and allows flexible plugin usage. You can install Python versions, use centralized caches, and manage virtual environments.
TUTORIAL |
How To Extract Tables From PDF |
pdfplumber is a Python library that extracts content from PDF files, including text, tables, and metadata. It's ideal for developers who need structured data from documents like financial reports or research papers.
After extracting data with pdfplumber, loading it into a Pandas DataFrame enables efficient data manipulation and analysis.
Here's a complete example that includes extracting tables from PDFs and loading them into Pandas DataFrames:

import pdfplumber
import pandas as pd

# Path to the PDF file
pdf_path = 'path/to/your/pdf_file.pdf'

# Open the PDF file
with pdfplumber.open(pdf_path) as pdf:
    # Iterate through each page in the PDF
    for page in pdf.pages:
        table = page.extract_table()
        if table:  # Check if there is a table on the page
            df = pd.DataFrame(table[1:], columns=table[0])
            print(df)
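The example above reads one table per page. If a page may hold several tables, or if some header cells come back empty, a variation like the following may help. It is only a sketch: the helper names `table_to_dataframe` and `extract_all_tables` and the `col_N` placeholder headers are made up for illustration, and it assumes the first row of each table holds the column names.

```python
import pandas as pd


def table_to_dataframe(rows):
    """Turn one extracted table (a list of rows) into a DataFrame.

    Assumes the first row holds the column headers.
    """
    header, *body = rows
    # Cells with no text come back as None; give such headers a placeholder
    # name (hypothetical "col_N" scheme) and trim stray whitespace.
    columns = [(name or f"col_{i}").strip() for i, name in enumerate(header)]
    return pd.DataFrame(body, columns=columns)


def extract_all_tables(pdf_path):
    """Collect every table on every page as (page_number, DataFrame) pairs."""
    import pdfplumber  # imported here so table_to_dataframe works without it

    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables, empty if none found
            for rows in page.extract_tables():
                if rows:
                    results.append((page.page_number, table_to_dataframe(rows)))
    return results
```

Calling `extract_all_tables('path/to/your/pdf_file.pdf')` would then return a list you can loop over, for example to print each page number alongside `df.shape`. Extracted cells are strings, so numeric columns may still need converting before analysis.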