
Pandas, Mutability, and the Illusion of Pointers

🧠 The Problem This Article Solves

If you’ve ever expected pandas code to behave functionally — only to discover that something upstream mysteriously changed — you’ve run head‑first into pandas’ mutability model.

Python doesn’t have explicit pointers.

But pandas absolutely behaves like it does.

Understanding when you are mutating an existing DataFrame vs creating a new one is one of the biggest pandas “aha” moments — and also one of its most common footguns.


🧩 The Core Idea: DataFrames Are Mutable Objects

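A DataFrame is a mutable Python object. Conceptually it holds a collection of Series, one per column. Under the hood the values live in arrays, usually NumPy arrays, and columns of the same dtype are often stored together in one 2D block.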

When you write:

df2 = df
df2["x"] = 1

You did not create a copy.

You now have two names pointing at the same object.

print(df is df2)
# True

Any in‑place change through either reference mutates the same underlying table.

Think of it as passing a pointer to a struct — just without pointer syntax.
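The same pointer-like behavior shows up across function calls. Python hands the function a reference to the caller’s object, so mutating the argument changes the caller’s DataFrame, while rebinding the parameter name does not. A minimal sketch; add_flag and rebind are hypothetical helpers made up for illustration:

```python
import pandas as pd


def add_flag(frame: pd.DataFrame) -> None:
    # Mutates the object the caller passed in: no copy is made on call.
    frame["flag"] = True


def rebind(frame: pd.DataFrame) -> None:
    # Rebinding only changes what this function's local name points at;
    # the caller's DataFrame is untouched.
    frame = frame.assign(other=0)


df = pd.DataFrame({"x": [1, 2, 3]})
add_flag(df)
rebind(df)

print(list(df.columns))  # ['x', 'flag']
```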


📋 When Pandas Copies vs Mutates

Here’s the practical rule-of-thumb matrix:

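| Operation | Copies DataFrame? | Notes |
| --- | --- | --- |
| df = df.rename(...) | ✅ Yes | Returns a new object |
| df.sort_values(..., inplace=True) | ❌ No | Mutates df and returns None |
| df.insert(...) | ❌ No | Always in place (there is no inplace parameter) |
| df.drop(..., inplace=False) | ✅ Yes | Functional style; inplace=False is the default |
| df.loc[...] = ... | ❌ No | ⚠️ Writes into df; if df was itself sliced from another frame, the write may also reach that parent |
| df = df[...] | ✅ Usually | Boolean masks and column lists return new data; plain row slices can be views unless Copy-on-Write is enabled |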

If you pass inplace=True, assume mutation. The method then returns None, so df = df.sort_values(..., inplace=True) leaves df bound to None.

If the result is assigned back to df, assume replacement.
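You can check the matrix against your own pandas version by comparing identities with is and looking at what each call returns. A small sketch on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"b": [3, 1, 2], "c": [30, 10, 20]})

# Functional style: a new object comes back, df itself is untouched.
renamed = df.rename(columns={"b": "beta"})
print(renamed is df, list(df.columns))  # False ['b', 'c']

# inplace=True mutates df and returns None, so never assign the result.
result = df.sort_values("b", inplace=True)
print(result, df["b"].tolist())  # None [1, 2, 3]

# insert() has no functional variant: it always mutates and returns None.
returned = df.insert(0, "a", 0)
print(returned, list(df.columns))  # None ['a', 'b', 'c']

# drop() without inplace returns a new frame and leaves df alone.
dropped = df.drop(columns=["c"])
print(dropped is df, list(df.columns))  # False ['a', 'b', 'c']
```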


🪞 Copying Explicitly (Breaking the Pointer)

When you want to safely isolate a transformation:

df2 = df.copy()

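That gives you a deep copy: a new DataFrame with its own copies of the data and index. (Python objects stored in object-dtype columns are not copied recursively; only the references to them are.)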

If you only need a new shell, not new memory:

df2 = df.copy(deep=False)

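This creates a new DataFrame object that shares the same underlying arrays. It is faster, but writes to the values can leak between the two frames. The exception is Copy-on-Write (opt-in in pandas 2.x via pd.options.mode.copy_on_write = True, the default in pandas 3.0), where the first write triggers a real copy instead.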
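To see the difference between the two kinds of copy, compare memory with np.shares_memory and then write into the original. A sketch on a toy frame; what the shallow copy shows after the write depends on whether Copy-on-Write is active, so that line is annotated with both outcomes rather than a single expected value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

deep = df.copy()               # deep=True is the default
shallow = df.copy(deep=False)  # new object, same underlying array

# Both are new DataFrame objects...
print(deep is df, shallow is df)  # False False

# ...but only the shallow copy starts out sharing the original's buffer.
deep_shares = np.shares_memory(df["a"].to_numpy(), deep["a"].to_numpy())
shallow_shares = np.shares_memory(df["a"].to_numpy(), shallow["a"].to_numpy())
print(deep_shares, shallow_shares)  # False True

# Write into the original frame.
df.loc[0, "a"] = 100
print(deep.loc[0, "a"])  # 1: the deep copy is isolated
# Without Copy-on-Write the shallow copy shows 100 (the write leaked);
# with Copy-on-Write it still shows 1.
print(shallow.loc[0, "a"])
```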


🧰 Why Helper Functions Feel Functional (But Aren’t)

Take this common helper:

def move_column(df, col, to_idx=0):
    s = df.pop(col)
    df.insert(to_idx, col, s)
    return df

This function mutates the caller’s DataFrame in place.

  • pop removes a column
  • insert changes column order
  • returning df only looks functional

Call chains like this hide mutation:

df = move_column(df, "x")
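To see the hidden mutation, hold a second reference to the input (as an upstream step might) and call the helper. The helper is repeated so the sketch runs on its own; the frame and the alias name are illustrative:

```python
import pandas as pd


def move_column(df, col, to_idx=0):
    # The mutating helper from above.
    s = df.pop(col)            # removes col from the caller's frame
    df.insert(to_idx, col, s)  # re-inserts it, again in place
    return df


original = pd.DataFrame({"a": [1], "b": [2], "x": [3]})
alias = original  # e.g. a reference held elsewhere in the pipeline

result = move_column(original, "x")

print(result is original)   # True: no new object was created
print(list(alias.columns))  # ['x', 'a', 'b']: the "upstream" frame changed too
```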

Making It Truly Functional

def move_column(df, col, to_idx=0):
    df = df.copy()  # new object and new buffers: the caller's frame is never touched
    s = df.pop(col)
    df.insert(to_idx, col, s)
    return df

Now:

  • caller gets a new object
  • original remains unchanged
  • behavior is explicit and predictable
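A different technique (not the copy/pop/insert approach above) is to compute the new column order and let reindex build a fresh frame, which avoids the pop and insert mutation steps entirely. A sketch; move_column_by_reindex is a hypothetical name, and it assumes column names are unique, since reindex refuses duplicate labels:

```python
import pandas as pd


def move_column_by_reindex(df: pd.DataFrame, col: str, to_idx: int = 0) -> pd.DataFrame:
    # Hypothetical helper; assumes unique column names.
    cols = [c for c in df.columns if c != col]
    cols.insert(to_idx, col)
    # reindex(columns=...) returns a new DataFrame; df itself is not modified.
    return df.reindex(columns=cols)


df = pd.DataFrame({"a": [1], "b": [2], "x": [3]})
moved = move_column_by_reindex(df, "x")

print(list(moved.columns))  # ['x', 'a', 'b']
print(list(df.columns))     # ['a', 'b', 'x']: caller's frame is unchanged
print(moved is df)          # False
```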

⚠️ The Slice Trap (SettingWithCopyWarning)

df2 = df[df["a"] > 0]
df2["b"] = 1

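Whether a write through a selection reaches the parent depends on whether pandas handed back a view or a copy, and that is an implementation detail. This particular boolean mask returns a copy, so df is never touched; a plain row slice such as df[0:2] can be a view.

Because pandas can’t reliably tell you which case you’re in, it warns you with SettingWithCopyWarning. Under Copy-on-Write (the default in pandas 3.0), every selection behaves like a copy and this warning goes away.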
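For the boolean-mask example, you can check what actually happened by looking at the parent after the write. A minimal sketch on a toy frame; warnings are silenced only to keep the output readable, and the chained form df[mask]["b"] = 1 is included because it is the version that silently does nothing:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [-1, 2, 3], "b": [0, 0, 0]})

with warnings.catch_warnings():
    # Silence SettingWithCopyWarning (pandas 2.x) or the chained-assignment
    # warning (Copy-on-Write) so the sketch only shows the effect on df.
    warnings.simplefilter("ignore")

    df2 = df[df["a"] > 0]  # boolean mask: df2 is a copy
    df2["b"] = 1           # updates df2 only

    # Chained assignment: writes into a temporary copy, which is then discarded.
    df[df["a"] > 0]["b"] = 1

print(df["b"].tolist())   # [0, 0, 0]: the parent never changed
print(df2["b"].tolist())  # [1, 1]
```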

Rule: if you need safety, either:

  • work on explicit copies
  • or assign via .loc on the original
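Both safe patterns, sketched on a toy frame (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [-1, 2, 3], "b": [0, 0, 0]})
mask = df["a"] > 0

# Option 1: an explicit copy. You own df2, pandas has nothing to warn about,
# and changes to df2 never reach df.
df2 = df[mask].copy()
df2["b"] = 1

# Option 2: a single .loc call on the original, when you do want df updated.
df.loc[mask, "b"] = 1

print(df2["b"].tolist())  # [1, 1]
print(df["b"].tolist())   # [0, 1, 1]
```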

🧠 TL;DR Mental Model

  • 🧱 DataFrame = mutable container of Series
  • 🪞 assignment = new name, same object
  • ✂️ copy() = break the link
  • ⚙️ mutators (insert, pop, drop(inplace=True)) = in place
  • ✅ returning df ≠ immutability

Once this clicks, pandas code becomes readable again, much like knowing which Go values (slices, maps, pointers) share underlying data even though Go passes everything by value.

This isn’t pandas trivia.

It’s state management discipline for data engineers.