07 SEPT 2025 · 5 MIN READ
Soft Delete: Why We Never Really Delete
Think about it - a user just deleted their important blog post by accident. They're panicking, and your database has no record it ever existed.
Here's something that will save you countless support calls: never hard delete user-generated content.
The fundamental question: "Do we really need to make this data disappear forever, or just hide it from users?"
what you'll take away
quick pointers so you know what to look for as you read:
- never hard delete user-generated content. users make mistakes, deleted data is signal, and regulators ask questions.
- hard deletes are expensive — soft deletes are one field update. no page merges, no tree rebalancing.
- batch your hard deletes. an off-peak cron pays the rebalancing cost once per thousand rows and gives you a recovery window for free.
- not everything deserves soft delete. the test: "if this came back tomorrow, would anyone care?"
- every read path must filter
is_deleted = FALSE— and a partial index keeps that filter cheap.
What Is Soft Delete
Instead of:
-- Hard delete - gone forever
DELETE FROM posts WHERE id = 123;Do this:
-- Soft delete - marked as deleted
UPDATE posts SET is_deleted = true, deleted_at = NOW() WHERE id = 123;The Three Reasons Why
1. Users Make Mistakes
Google Drive gives you 30 days to recover deleted files. Gmail does the same. There's a reason every major platform does this.
With hard delete : "Sorry, it's gone forever." With soft delete : UPDATE posts SET is_deleted = false WHERE id = 123; - Hero status achieved.
2. Deleted Data Is Valuable
Users delete content for interesting reasons - misinformation, negative feedback, policy violations. That's signal, not noise. Your ML models benefit from understanding deletion patterns.
3. Legal Requirements
When regulators ask "show us all data for user X between dates Y and Z," you better have it. Even deleted data often has retention requirements.
✽ RECALL a pm asks why you won't just DELETE FROM posts. what's your case for soft delete?
three things you can never get back after a hard delete: users delete by accident, and recovery becomes a one-field update instead of "sorry, it's gone forever"; deletion patterns are signal — misinformation, negative feedback, policy violations — that your ml and moderation pipelines feed on; and regulators can demand data for a user and date range, deleted or not. hard delete forecloses all three permanently.
The Hidden Performance Benefit
Here's what most people don't know: hard deletes are expensive in databases.
Why Hard Deletes Hurt
When you delete a record:
- Find the record
- Delete it
- Check if database page is too empty
- Merge with other pages if needed
- Update parent nodes
- Potentially restructure the entire tree
One delete = multiple expensive operations.
Why Soft Deletes Are Fast
UPDATE posts SET is_deleted = true WHERE id = ?;- Find record
- Update one field
- Done
No tree rebalancing. No page merges. One operation.
The Clever Trick: Batch Your Hard Deletes
-- During off-peak hours
DELETE FROM posts
WHERE is_deleted = true
AND deleted_at < NOW() - INTERVAL '30 days'
LIMIT 1000;Now you're paying the rebalancing cost once for 1000 deletions instead of 1000 times.
✽ RECALL why is a single hard delete more expensive than it looks, and how does batching flip the economics?
a delete can cascade — check whether the page got too empty, merge pages, update parent nodes, restructure the tree. one delete, multiple expensive operations. a soft delete is one field update, full stop. then a batched off-peak job hard-deletes a thousand expired rows at once, paying the rebalancing cost once instead of a thousand times — you get the recovery window and database-friendly batching in the same move.
Simple Implementation
-- Add to existing table
ALTER TABLE posts ADD COLUMN is_deleted BOOLEAN DEFAULT FALSE;
ALTER TABLE posts ADD COLUMN deleted_at TIMESTAMP NULL;
-- Index for performance
CREATE INDEX idx_posts_active ON posts (is_deleted) WHERE is_deleted = FALSE;Query pattern :
-- Always filter out deleted
SELECT * FROM posts WHERE user_id = ? AND is_deleted = FALSE;✽ RECALL what obligation does soft delete impose on every read path, and how do you keep it cheap?
every query must filter is_deleted = FALSE — miss it once and "deleted" content leaks back into the product. the partial index on is_deleted = FALSE keeps the filter nearly free, because it only indexes live rows and stays small no matter how much soft-deleted junk accumulates in the table.
When NOT to Soft Delete
Hard delete these :
- Likes/reactions (user liked then unliked)
- Session data (temporary tokens)
- Analytics events (view counts)
- Cache entries
Soft delete these :
- User posts, comments, uploads
- User accounts
- Financial transactions
Decision rule : "If this came back tomorrow, would anyone care?" If yes, soft delete. If no, hard delete.
✽ RECALL likes get hard deleted but posts get soft deleted. what's the decision rule?
"if this came back tomorrow, would anyone care?" yes → soft delete: posts, comments, accounts, financial transactions. no → hard delete: unliked likes, session tokens, analytics events, cache entries. soft-deleting everything just bloats tables with rows nobody will ever recover.
Automated Cleanup
-- Nightly cleanup job
DELETE FROM posts
WHERE is_deleted = true
AND deleted_at < NOW() - INTERVAL '30 days'
LIMIT 1000;Set it and forget it. Best of both worlds - recoverability window plus eventual cleanup.
Key Takeaways
- Default to soft delete for user-generated content
- Batch your hard deletes during off-peak hours
- Always filter
is_deleted = FALSEin queries - Automate cleanup with scheduled jobs
- Don't soft delete everything - use your judgment
The bottom line : Soft delete acknowledges that users make mistakes, data has value, and databases prefer batch operations. Implement it once, benefit forever.
Your future self (and your users) will thank you.