

Image by Editor | ChatGPT
# Introduction
Airtable not only offers a flexible, spreadsheet-like interface for data storage and analysis; it also provides an API for programmatic interaction. In other words, you can connect it to external tools and technologies, such as Python, to build data pipelines or processing workflows, bringing your results back to your Airtable database (or simply "base", in Airtable jargon).
This article demonstrates how to create a simple, ETL-like pipeline using the Airtable Python API. We'll stick to the free tier, ensuring the approach works without paid features.
# Airtable Dataset Setup
While the pipeline built in this article can easily be adapted to a variety of datasets, if you are new to Airtable and need an Airtable project and stored dataset as a starting point, we recommend following this recent introductory tutorial to Airtable and creating a tabular dataset called "Customers", containing 200 rows and the following columns (see image):


Customers dataset/table in Airtable | Image by Author
# Airtable-Python Knowledge Pipeline
In Airtable, go to your user avatar (at the time of writing, it is the circled avatar located in the bottom-left corner of the app interface) and select "Builder Hub". On the new screen (see screenshot below), click "Personal access tokens", then "Create token". Give it a name, and make sure you add at least these two scopes: `data.records:read` and `data.records:write`. Likewise, in the "Access" section, select the base where your customers table is located, so that your token has access to this base.


Creating an Airtable API token | Image by Author
Once the token has been created, copy it and store it carefully in a safe place, as it will be shown only once. We'll need it later. The token starts with `pat` followed by a long alphanumeric code.
Another key piece of information we will need to build our Python-based pipeline that interacts with Airtable is the ID of our base. Go back to your base in the Airtable web interface; once there, you should see that its URL in the browser has a syntax like `https://airtable.com/app[xxxxxx]/xxxx/xxxx`. The part we are interested in copying is the `app[xxxx]` ID contained between two consecutive slashes (`/`): this is the base ID we will need.
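If you would rather not copy the ID by hand, it can be pulled out of the URL with a few lines of Python. This is just a sketch; the helper name and the URL below are made up for illustration:

```python
# Sketch: extract the base ID (the path segment starting with "app")
# from an Airtable URL copied from the browser.
def extract_base_id(url: str) -> str:
    """Return the first path segment that starts with 'app'."""
    for segment in url.split("/"):
        if segment.startswith("app"):
            return segment
    raise ValueError(f"No base ID found in {url!r}")

print(extract_base_id("https://airtable.com/appABC123xyz/tblXXXX/viwYYYY"))
# appABC123xyz
```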
With this in hand, and assuming you already have a populated table called "Customers" in your base, we are ready to start our Python program. I will be using a notebook for the code. If you are using an IDE, you may need to slightly change the part where the three Airtable environment variables are defined, to have them read from a `.env` file instead. In this version, for simplicity and ease of illustration, we will define them directly in our notebook. Let's start by installing the necessary dependencies:
```python
!pip install pyairtable python-dotenv
```
Next, we define the Airtable environment variables. Note that for the first two, you need to replace the value with your actual access token and base ID, respectively:
```python
import os
from dotenv import load_dotenv  # Needed only if reading variables from a .env file
from pyairtable import Api, Table
import pandas as pd

PAT = "pat-xxx"       # Paste your PAT (Personal Access Token) here
BASE_ID = "app-xxx"   # Paste your Airtable Base ID here
TABLE_NAME = "Customers"

api = Api(PAT)
table = Table(PAT, BASE_ID, TABLE_NAME)
```
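If you are running from an IDE rather than a notebook, the same three variables can come from a `.env` file instead of being hard-coded. A sketch of that variant follows; the `AIRTABLE_*` variable names are my own choice, not anything prescribed by Airtable or pyairtable:

```python
import os

try:
    from dotenv import load_dotenv
    load_dotenv()  # copies key=value pairs from a .env file into os.environ
except ImportError:
    pass  # python-dotenv not installed; fall back to the shell environment

# Hypothetical variable names; match them to whatever you put in your .env
PAT = os.getenv("AIRTABLE_PAT", "")
BASE_ID = os.getenv("AIRTABLE_BASE_ID", "")
TABLE_NAME = os.getenv("AIRTABLE_TABLE_NAME", "Customers")
```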
We have just set up an instance of the Python Airtable API and instantiated a connection point to the customers table in our base. Now, this is how we read the entire dataset contained in our Airtable table and load it into a Pandas `DataFrame`. You just need to be careful to use the exact column names from the source table for the string arguments inside the `get()` method calls:
```python
rows = []
for rec in table.all():  # honors 5 rps; auto-retries on 429s
    fields = rec.get("fields", {})
    rows.append({
        "id": rec["id"],
        "CustomerID": fields.get("CustomerID"),
        "Gender": fields.get("Gender"),
        "Age": fields.get("Age"),
        "Annual Income (k$)": fields.get("Annual Income (k$)"),
        "Spending Score (1-100)": fields.get("Spending Score (1-100)"),
        "Income Class": fields.get("Income Class"),
    })
df = pd.DataFrame(rows)
```
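To see the shape of what `table.all()` returns without hitting the API: each record is a dict with an `"id"` and a `"fields"` sub-dict, and empty cells are simply absent from `"fields"`. Here is a simplified version of the same flattening run on fabricated records (the record IDs and values are invented):

```python
import pandas as pd

# Made-up stand-ins for records returned by pyairtable's table.all()
sample_records = [
    {"id": "recAAA111", "fields": {"CustomerID": 1, "Gender": "Male",
                                   "Age": 19, "Annual Income (k$)": 15,
                                   "Spending Score (1-100)": 39}},
    {"id": "recBBB222", "fields": {"CustomerID": 2}},  # partially filled row
]

sample_rows = []
for rec in sample_records:
    fields = rec.get("fields", {})          # missing cells never appear here
    sample_rows.append({"id": rec["id"], **fields})

sample_df = pd.DataFrame(sample_rows)       # absent fields become NaN
print(sample_df.shape)  # (2, 6)
```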
Once the data has been loaded, it's time to apply a simple transformation. For simplicity, we will apply just one, but we could apply as many as needed, just as we would usually do when preprocessing or cleaning datasets with Pandas. We'll create a new binary attribute, called `Is High Value`, to denote high-value customers, i.e., those whose income and spending score are both high:
```python
def high_value(row):
    try:
        return (row["Spending Score (1-100)"] >= 70) and (row["Annual Income (k$)"] >= 70)
    except TypeError:  # missing or non-numeric values are never high value
        return False

df["Is High Value"] = df.apply(high_value, axis=1)
df.head()
```
Resulting dataset:


Airtable data transformation with Python and Pandas | Image by Author
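As a side note, the same flag can be computed in vectorized form, without `apply`: pandas comparisons with NaN evaluate to `False`, so incomplete rows are never flagged. A quick sketch on a toy DataFrame (the values are made up):

```python
import pandas as pd

# Toy stand-in for the loaded DataFrame
toy = pd.DataFrame({
    "Annual Income (k$)": [80, 40, None],
    "Spending Score (1-100)": [90, 95, 88],
})

# NaN >= 70 is False, so rows with missing income are never high value
toy["Is High Value"] = (
    (toy["Spending Score (1-100)"] >= 70) & (toy["Annual Income (k$)"] >= 70)
)
print(toy["Is High Value"].tolist())  # [True, False, False]
```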
Finally, it's time to write the changes back to Airtable by incorporating the new data associated with the new column. There's one small caveat: we first need to manually create a new column named "High Value" in our Airtable customers table, with its type set to "Checkbox" (the equivalent of a binary categorical attribute). Once this blank column has been created, run the following code in your Python program, and the new data will be automatically added to Airtable!
```python
updates = []
for _, r in df.iterrows():
    if pd.isna(r["id"]):
        continue
    updates.append({
        "id": r["id"],
        "fields": {
            "High Value": bool(r["Is High Value"])
        }
    })

if updates:
    table.batch_update(updates)
```
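Airtable's REST API accepts at most 10 records per write request; pyairtable's `batch_update` splits the list for you behind the scenes. If you ever call the endpoint directly, a small chunking helper is all you need. A sketch, with a made-up helper name and fabricated record IDs:

```python
# Sketch: split a list of update payloads into Airtable-sized batches.
def chunked(items, size=10):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Fabricated payloads, just to show the batch sizes
updates = [{"id": f"rec{i:03d}", "fields": {"High Value": False}} for i in range(23)]
batches = list(chunked(updates))
print([len(b) for b in batches])  # [10, 10, 3]
```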
Time to go back to Airtable and see what changed in our source customers table! If at first glance you see no changes and the new column still looks empty, don't panic just yet. Not many customers are labeled as high value, and you may need to scroll down a bit to see some marked with a green tick:


Updated customers table | Image by Author
That's it! You just built your own lightweight, ETL-like data pipeline based on a bidirectional interaction between Airtable and Python. Well done!
# Wrapping Up
This article focused on showcasing data capabilities with Airtable, a versatile and user-friendly cloud-based platform for data management and analysis that combines features of spreadsheets and relational databases with AI-powered functions. Specifically, we showed how to run a lightweight data transformation pipeline with the Airtable Python API that reads data from Airtable, transforms it, and loads it back into Airtable, all within the capabilities and limitations of Airtable's free tier.
Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.