Tautik Agrahari

07 SEPT 2025 · 5 MIN READ

Soft Delete: Why We Never Really Delete

#databases

Think about it - a user just deleted their important blog post by accident. They're panicking, and your database has no record it ever existed.

Here's something that will save you countless support calls: never hard delete user-generated content.

The fundamental question: "Do we really need to make this data disappear forever, or just hide it from users?"

what you'll take away

quick pointers so you know what to look for as you read:

  • never hard delete user-generated content. users make mistakes, deleted data is signal, and regulators ask questions.
  • hard deletes are expensive — soft deletes are one field update. no page merges, no tree rebalancing.
  • batch your hard deletes. an off-peak cron pays the rebalancing cost once per thousand rows and gives you a recovery window for free.
  • not everything deserves soft delete. the test: "if this came back tomorrow, would anyone care?"
  • every read path must filter is_deleted = FALSE — and a partial index keeps that filter cheap.

What Is Soft Delete

Instead of:

 
-- Hard delete - gone forever
DELETE FROM posts WHERE id = 123;

Do this:

 
-- Soft delete - marked as deleted
UPDATE posts SET is_deleted = true, deleted_at = NOW() WHERE id = 123;

The Three Reasons Why

1. Users Make Mistakes

Google Drive gives you 30 days to recover deleted files. Gmail does the same. There's a reason every major platform does this.

With hard delete : "Sorry, it's gone forever." With soft delete : UPDATE posts SET is_deleted = false WHERE id = 123; - Hero status achieved.

2. Deleted Data Is Valuable

Users delete content for interesting reasons - misinformation, negative feedback, policy violations. That's signal, not noise. Your ML models benefit from understanding deletion patterns.

When regulators ask "show us all data for user X between dates Y and Z," you better have it. Even deleted data often has retention requirements.

✽ RECALL a pm asks why you won't just DELETE FROM posts. what's your case for soft delete?

three things you can never get back after a hard delete: users delete by accident, and recovery becomes a one-field update instead of "sorry, it's gone forever"; deletion patterns are signal — misinformation, negative feedback, policy violations — that your ml and moderation pipelines feed on; and regulators can demand data for a user and date range, deleted or not. hard delete forecloses all three permanently.

The Hidden Performance Benefit

Here's what most people don't know: hard deletes are expensive in databases.

Why Hard Deletes Hurt

When you delete a record:

  1. Find the record
  2. Delete it
  3. Check if database page is too empty
  4. Merge with other pages if needed
  5. Update parent nodes
  6. Potentially restructure the entire tree

One delete = multiple expensive operations.

Why Soft Deletes Are Fast

UPDATE posts SET is_deleted = true WHERE id = ?;
  1. Find record
  2. Update one field
  3. Done

No tree rebalancing. No page merges. One operation.

The Clever Trick: Batch Your Hard Deletes

-- During off-peak hours
DELETE FROM posts 
WHERE is_deleted = true 
  AND deleted_at < NOW() - INTERVAL '30 days' 
LIMIT 1000;

Now you're paying the rebalancing cost once for 1000 deletions instead of 1000 times.

✽ RECALL why is a single hard delete more expensive than it looks, and how does batching flip the economics?

a delete can cascade — check whether the page got too empty, merge pages, update parent nodes, restructure the tree. one delete, multiple expensive operations. a soft delete is one field update, full stop. then a batched off-peak job hard-deletes a thousand expired rows at once, paying the rebalancing cost once instead of a thousand times — you get the recovery window and database-friendly batching in the same move.

Simple Implementation

-- Add to existing table
ALTER TABLE posts ADD COLUMN is_deleted BOOLEAN DEFAULT FALSE;
ALTER TABLE posts ADD COLUMN deleted_at TIMESTAMP NULL;

-- Index for performance  
CREATE INDEX idx_posts_active ON posts (is_deleted) WHERE is_deleted = FALSE;

Query pattern :

 
-- Always filter out deleted
SELECT * FROM posts WHERE user_id = ? AND is_deleted = FALSE;
✽ RECALL what obligation does soft delete impose on every read path, and how do you keep it cheap?

every query must filter is_deleted = FALSE — miss it once and "deleted" content leaks back into the product. the partial index on is_deleted = FALSE keeps the filter nearly free, because it only indexes live rows and stays small no matter how much soft-deleted junk accumulates in the table.

When NOT to Soft Delete

Hard delete these :

  • Likes/reactions (user liked then unliked)
  • Session data (temporary tokens)
  • Analytics events (view counts)
  • Cache entries

Soft delete these :

  • User posts, comments, uploads
  • User accounts
  • Financial transactions

Decision rule : "If this came back tomorrow, would anyone care?" If yes, soft delete. If no, hard delete.

✽ RECALL likes get hard deleted but posts get soft deleted. what's the decision rule?

"if this came back tomorrow, would anyone care?" yes → soft delete: posts, comments, accounts, financial transactions. no → hard delete: unliked likes, session tokens, analytics events, cache entries. soft-deleting everything just bloats tables with rows nobody will ever recover.

Automated Cleanup

-- Nightly cleanup job
DELETE FROM posts 
WHERE is_deleted = true 
  AND deleted_at < NOW() - INTERVAL '30 days'
LIMIT 1000;

Set it and forget it. Best of both worlds - recoverability window plus eventual cleanup.

Key Takeaways

  1. Default to soft delete for user-generated content
  2. Batch your hard deletes during off-peak hours
  3. Always filteris_deleted = FALSE in queries
  4. Automate cleanup with scheduled jobs
  5. Don't soft delete everything - use your judgment

The bottom line : Soft delete acknowledges that users make mistakes, data has value, and databases prefer batch operations. Implement it once, benefit forever.

Your future self (and your users) will thank you.

✽ ❦ ✽

← all posts