Database Fundamentals - 1/4

From API Endpoints to Data Persistence

You’ve built beautiful APIs that handle requests, validate data, and respond with consistent formats. Your endpoints look professional, handle errors gracefully, and even support real-time communication. But here’s the reality check that hits every developer: APIs without proper data storage are just expensive calculators.

The data persistence wake-up call:

// Your beautiful API endpoint
app.post("/api/users", validateUser, (req, res) => {
  const user = {
    id: generateId(),
    name: req.body.name,
    email: req.body.email,
    createdAt: new Date(),
  };

  // Where does this user actually live?
  users.push(user); // In memory? Gone when server restarts
  res.status(201).json({ user });
});

app.get("/api/users/:id", (req, res) => {
  const user = users.find((u) => u.id === req.params.id);
  // What happens when you have 10 million users?
  // What about relationships between users and their posts?
  // How do you query by email efficiently?
  res.json({ user });
});

The uncomfortable truth: Without understanding databases, your APIs are just elaborate data formatters that lose everything when they restart. Modern applications require persistent, queryable, scalable data storage that survives server crashes, handles complex relationships, and performs efficiently under load.

Real applications need databases that handle:

  • Persistent storage: Data survives application restarts and server failures
  • Complex queries: Find users by email, posts by category, orders by date range
  • Relationships: Users have posts, posts have comments, orders contain products
  • Concurrent access: Multiple users reading and writing simultaneously
  • Data integrity: Ensuring your data remains consistent and valid
  • Performance: Millisecond response times even with millions of records

This article starts your database journey by covering the fundamental concepts that every backend developer must understand: relational vs. NoSQL databases, SQL basics, and the core operations that power every data-driven application.


Database Types: SQL vs NoSQL - Choosing Your Data Strategy

The Two Fundamental Paradigms

Databases fall into two main categories, each with different strengths and use cases:

Relational (SQL) Databases:

  • Structure: Data organized in tables with predefined schemas
  • Query Language: SQL (Structured Query Language)
  • Examples: PostgreSQL, MySQL, SQLite, Microsoft SQL Server
  • Best For: Complex relationships, transactions, data consistency

NoSQL Databases:

  • Structure: Various models (document, key-value, column-family, graph)
  • Query Language: Varies by database type
  • Examples: MongoDB, Redis, Cassandra, Neo4j
  • Best For: Flexible schemas, horizontal scaling, specific data patterns

When to Choose SQL Databases

SQL databases excel when you have:

-- Complex relationships between entities
-- Users table
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) UNIQUE NOT NULL,
  username VARCHAR(50) UNIQUE NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Posts table with foreign key relationship
CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  content TEXT,
  author_id INTEGER REFERENCES users(id),
  category_id INTEGER REFERENCES categories(id),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Comments table with multiple relationships
CREATE TABLE comments (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  user_id INTEGER REFERENCES users(id),
  post_id INTEGER REFERENCES posts(id),
  parent_comment_id INTEGER REFERENCES comments(id), -- For nested comments
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Complex query across multiple tables
SELECT
  u.username,
  p.title,
  COUNT(c.id) as comment_count,
  AVG(pr.rating) as avg_rating
FROM users u
JOIN posts p ON u.id = p.author_id
LEFT JOIN comments c ON p.id = c.post_id
LEFT JOIN post_ratings pr ON p.id = pr.post_id
WHERE p.created_at > '2024-01-01'
GROUP BY u.id, p.id
HAVING COUNT(c.id) > 5
ORDER BY avg_rating DESC
LIMIT 10;

Choose SQL when you need:

  • ACID transactions: Banking, e-commerce, inventory management
  • Complex relationships: Social networks, content management systems
  • Structured data: User profiles, product catalogs, financial records
  • Reporting and analytics: Business intelligence, data analysis
  • Data integrity: Critical business data that must remain consistent

When to Choose NoSQL Databases

NoSQL databases excel for different use cases:

// Document database (MongoDB) - Flexible, nested data
const userProfile = {
  _id: ObjectId("..."),
  username: "tech_blogger",
  profile: {
    bio: "Full-stack developer",
    location: "San Francisco",
    interests: ["JavaScript", "Python", "DevOps"],
    socialLinks: {
      twitter: "@tech_blogger",
      linkedin: "linkedin.com/in/tech-blogger",
      github: "github.com/tech-blogger"
    }
  },
  settings: {
    notifications: {
      email: true,
      push: false,
      sms: true
    },
    privacy: {
      profilePublic: true,
      showEmail: false
    }
  },
  posts: [
    {
      title: "Getting Started with Node.js",
      slug: "getting-started-nodejs",
      tags: ["nodejs", "javascript", "backend"],
      publishedAt: new Date("2024-01-15")
    }
  ]
};

// Key-Value database (Redis) - Simple, fast lookups
await redis.set("session:user:123", JSON.stringify({
  userId: 123,
  username: "tech_blogger",
  role: "premium",
  loginTime: Date.now()
}), 'EX', 3600); // Expires in 1 hour

// Graph database (Neo4j) - Relationship-focused
CREATE (alice:User {name: "Alice", email: "alice@example.com"})
CREATE (bob:User {name: "Bob", email: "bob@example.com"})
CREATE (charlie:User {name: "Charlie", email: "charlie@example.com"})
CREATE (alice)-[:FOLLOWS]->(bob)
CREATE (bob)-[:FOLLOWS]->(charlie)
CREATE (alice)-[:FOLLOWS]->(charlie)

// Find friends of friends
MATCH (user:User {name: "Alice"})-[:FOLLOWS]->()-[:FOLLOWS]->(fof:User)
WHERE NOT (user)-[:FOLLOWS]->(fof) AND user <> fof
RETURN fof.name

Choose NoSQL when you need:

  • Flexible schemas: Rapidly evolving data structures
  • Horizontal scaling: Massive datasets across multiple servers
  • High performance: Real-time applications, caching, gaming
  • Semi-structured data: JSON documents, logs, sensor data
  • Specific patterns: Session storage, real-time messaging, recommendation engines

Relational Database Concepts: The Foundation of Data Organization

Tables: Your Data’s Home

Think of tables as spreadsheets with rules. Each table represents a specific type of entity in your application:

-- Users table structure
CREATE TABLE users (
  id SERIAL PRIMARY KEY,           -- Unique identifier for each user
  username VARCHAR(50) UNIQUE NOT NULL, -- Username must be unique and cannot be empty
  email VARCHAR(255) UNIQUE NOT NULL,   -- Email must be unique and cannot be empty
  password_hash VARCHAR(255) NOT NULL, -- Encrypted password
  first_name VARCHAR(50),              -- Optional first name
  last_name VARCHAR(50),               -- Optional last name
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- Auto-set creation time
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- Auto-updated modification time
  is_active BOOLEAN DEFAULT TRUE      -- Account status
);

-- Sample data in the users table
-- +----+----------+------------------+---------------+------------+-----------+---------------------+---------------------+-----------+
-- | id | username | email            | password_hash | first_name | last_name | created_at          | updated_at          | is_active |
-- +----+----------+------------------+---------------+------------+-----------+---------------------+---------------------+-----------+
-- | 1  | alice123 | alice@email.com  | $2b$10$...    | Alice      | Johnson   | 2024-01-15 10:30:00 | 2024-01-15 10:30:00 | true      |
-- | 2  | bobdev   | bob@example.com  | $2b$10$...    | Bob        | Smith     | 2024-01-16 14:22:10 | 2024-01-16 14:22:10 | true      |
-- | 3  | charlie  | charlie@test.com | $2b$10$...    | Charlie    | Brown     | 2024-01-17 09:15:33 | 2024-01-17 09:15:33 | false     |
-- +----+----------+------------------+---------------+------------+-----------+---------------------+---------------------+-----------+

Primary Keys: Unique Identifiers

Every table needs a unique identifier for each row:

-- Different primary key strategies

-- 1. Auto-incrementing integer (most common)
CREATE TABLE posts (
  id SERIAL PRIMARY KEY,  -- PostgreSQL auto-increment
  title VARCHAR(255) NOT NULL,
  content TEXT
);

-- 2. UUID (globally unique)
CREATE TABLE sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id INTEGER REFERENCES users(id),
  token VARCHAR(255) UNIQUE NOT NULL,
  expires_at TIMESTAMP NOT NULL
);

-- 3. Composite primary key (multiple columns)
CREATE TABLE user_roles (
  user_id INTEGER REFERENCES users(id),
  role_id INTEGER REFERENCES roles(id),
  assigned_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (user_id, role_id)  -- Combination must be unique
);

-- 4. Natural key (using business data)
CREATE TABLE countries (
  iso_code CHAR(2) PRIMARY KEY,  -- 'US', 'UK', 'CA', etc.
  name VARCHAR(100) NOT NULL,
  continent VARCHAR(50)
);

Relationships: Connecting Your Data

The power of relational databases lies in relationships between tables:

-- One-to-Many: One user can have many posts
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  username VARCHAR(50) UNIQUE NOT NULL
);

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  user_id INTEGER REFERENCES users(id), -- Foreign key to users table
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Many-to-Many: Posts can have many tags, tags can be on many posts
CREATE TABLE tags (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) UNIQUE NOT NULL
);

CREATE TABLE post_tags (
  post_id INTEGER REFERENCES posts(id),
  tag_id INTEGER REFERENCES tags(id),
  PRIMARY KEY (post_id, tag_id)  -- Junction table
);

-- One-to-One: Each user has one profile
CREATE TABLE user_profiles (
  id SERIAL PRIMARY KEY,
  user_id INTEGER UNIQUE REFERENCES users(id), -- UNIQUE makes it one-to-one
  bio TEXT,
  avatar_url VARCHAR(255),
  website VARCHAR(255)
);

Real-world relationship example:

-- E-commerce database relationships
CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) UNIQUE NOT NULL,
  first_name VARCHAR(50),
  last_name VARCHAR(50)
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INTEGER REFERENCES customers(id),
  order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  status VARCHAR(20) DEFAULT 'pending',
  total_amount DECIMAL(10,2)
);

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  price DECIMAL(10,2) NOT NULL,
  stock_quantity INTEGER DEFAULT 0
);

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INTEGER REFERENCES orders(id),
  product_id INTEGER REFERENCES products(id),
  quantity INTEGER NOT NULL,
  unit_price DECIMAL(10,2) NOT NULL
);

-- Query: Find all orders for a customer with item details
SELECT
  o.id as order_id,
  o.order_date,
  o.status,
  p.name as product_name,
  oi.quantity,
  oi.unit_price,
  (oi.quantity * oi.unit_price) as line_total
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE c.email = 'alice@email.com'
ORDER BY o.order_date DESC;

Basic SQL Operations: CRUD That Actually Works

CREATE: Inserting Data

Adding new records to your database:

-- Insert a single user
INSERT INTO users (username, email, first_name, last_name)
VALUES ('techdev', 'techdev@example.com', 'Tech', 'Developer');

-- Insert multiple users in one statement
INSERT INTO users (username, email, first_name, last_name)
VALUES
  ('alice123', 'alice@example.com', 'Alice', 'Johnson'),
  ('bobsmith', 'bob@example.com', 'Bob', 'Smith'),
  ('charlie_b', 'charlie@example.com', 'Charlie', 'Brown');

-- Insert with returning the created record (PostgreSQL)
INSERT INTO posts (title, content, user_id)
VALUES ('My First Post', 'This is the content of my first post.', 1)
RETURNING id, created_at;

-- Insert from another table (data migration)
INSERT INTO archived_users (username, email, archived_at)
SELECT username, email, CURRENT_TIMESTAMP
FROM users
WHERE is_active = false;

Handle conflicts gracefully:

-- PostgreSQL: Insert or update if exists
INSERT INTO users (username, email, first_name)
VALUES ('existing_user', 'new@email.com', 'Updated')
ON CONFLICT (username)
DO UPDATE SET
  email = EXCLUDED.email,
  first_name = EXCLUDED.first_name,
  updated_at = CURRENT_TIMESTAMP;

-- MySQL: Insert or ignore if exists
INSERT IGNORE INTO tags (name)
VALUES ('javascript'), ('python'), ('javascript'); -- Duplicate will be ignored

READ: Querying Data

The heart of database operations - getting your data back:

-- Basic SELECT queries
SELECT * FROM users;  -- Get all columns (avoid in production)

SELECT id, username, email FROM users;  -- Get specific columns

SELECT username,
       email,
       CONCAT(first_name, ' ', last_name) as full_name
FROM users;

-- Filtering with WHERE
SELECT * FROM users
WHERE is_active = true;

SELECT * FROM posts
WHERE created_at >= '2024-01-01'
  AND user_id IN (1, 2, 3);

SELECT * FROM users
WHERE email LIKE '%@gmail.com';

-- Sorting and limiting
SELECT username, created_at
FROM users
ORDER BY created_at DESC
LIMIT 10;

-- Pagination
SELECT username, email
FROM users
ORDER BY id
LIMIT 20 OFFSET 40;  -- Skip first 40, get next 20

-- Aggregation queries
SELECT COUNT(*) as total_users FROM users;

SELECT COUNT(*) as active_users
FROM users
WHERE is_active = true;

SELECT
  DATE_TRUNC('month', created_at) as month,
  COUNT(*) as user_registrations
FROM users
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

-- Complex filtering
SELECT u.username, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.username
HAVING COUNT(p.id) > 5
ORDER BY post_count DESC;

UPDATE: Modifying Existing Data

Changing data safely and efficiently:

-- Update a single record
UPDATE users
SET first_name = 'Alice',
    last_name = 'Smith',
    updated_at = CURRENT_TIMESTAMP
WHERE id = 1;

-- Update multiple records with conditions
UPDATE users
SET is_active = false,
    updated_at = CURRENT_TIMESTAMP
WHERE last_login < '2023-01-01';

-- Update with calculations
UPDATE products
SET price = price * 1.10  -- 10% price increase
WHERE category = 'electronics';

-- Update using data from other tables
UPDATE users
SET post_count = (
  SELECT COUNT(*)
  FROM posts
  WHERE posts.user_id = users.id
);

-- Conditional updates with CASE
UPDATE orders
SET status = CASE
  WHEN order_date < CURRENT_DATE - INTERVAL '30 days'
    AND status = 'pending' THEN 'expired'
  WHEN payment_confirmed = true
    AND status = 'pending' THEN 'confirmed'
  ELSE status
END;

Safe update practices:

-- Always use WHERE clause to avoid updating all records
-- ❌ Dangerous: Updates ALL users
UPDATE users SET password_hash = 'new_hash';

-- ✅ Safe: Updates specific user
UPDATE users
SET password_hash = 'new_hash'
WHERE id = 123;

-- Use transactions for multiple related updates
BEGIN;

UPDATE accounts
SET balance = balance - 100
WHERE id = 1;

UPDATE accounts
SET balance = balance + 100
WHERE id = 2;

-- Only commit if both updates succeed
COMMIT;

DELETE: Removing Data

Deleting data responsibly:

-- Delete specific records
DELETE FROM users
WHERE id = 123;

DELETE FROM posts
WHERE created_at < '2023-01-01'
  AND view_count = 0;

-- Delete with joins (remove orphaned data)
DELETE FROM comments
WHERE post_id IN (
  SELECT id FROM posts
  WHERE user_id IN (
    SELECT id FROM users
    WHERE is_active = false
  )
);

-- Soft delete (preferred for important data)
UPDATE users
SET is_deleted = true,
    deleted_at = CURRENT_TIMESTAMP
WHERE id = 123;

-- Then filter out deleted records in queries
SELECT * FROM users
WHERE is_deleted = false OR is_deleted IS NULL;

SQL Syntax and Best Practices: Writing Professional Queries

Query Structure and Formatting

Well-formatted SQL is readable SQL:

-- ❌ Hard to read
SELECT u.id,u.username,p.title,p.created_at FROM users u JOIN posts p ON u.id=p.user_id WHERE u.is_active=true AND p.published=true ORDER BY p.created_at DESC LIMIT 10;

-- ✅ Clean and readable
SELECT
  u.id,
  u.username,
  p.title,
  p.created_at
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE u.is_active = true
  AND p.published = true
ORDER BY p.created_at DESC
LIMIT 10;

Consistent formatting conventions:

-- Use meaningful table aliases
SELECT
  customer.first_name,
  customer.last_name,
  order_summary.total_amount,
  order_summary.order_date
FROM customers AS customer
JOIN orders AS order_summary ON customer.id = order_summary.customer_id;

-- Indent nested queries
SELECT
  username,
  email,
  (
    SELECT COUNT(*)
    FROM posts
    WHERE posts.user_id = users.id
  ) AS post_count
FROM users
WHERE created_at > (
  SELECT AVG(created_at)
  FROM users
  WHERE is_active = true
);

Common SQL Patterns

Window functions for advanced analytics:

-- Ranking users by post count
SELECT
  username,
  post_count,
  RANK() OVER (ORDER BY post_count DESC) as rank,
  ROW_NUMBER() OVER (ORDER BY post_count DESC) as row_num
FROM (
  SELECT
    u.username,
    COUNT(p.id) as post_count
  FROM users u
  LEFT JOIN posts p ON u.id = p.user_id
  GROUP BY u.id, u.username
) ranked_users;

-- Running totals
SELECT
  order_date,
  daily_revenue,
  SUM(daily_revenue) OVER (
    ORDER BY order_date
    ROWS UNBOUNDED PRECEDING
  ) as running_total
FROM daily_sales
ORDER BY order_date;

Working with dates and times:

-- Date filtering and formatting
SELECT
  username,
  created_at,
  DATE_PART('year', created_at) as registration_year,
  AGE(CURRENT_DATE, created_at::date) as account_age
FROM users
WHERE created_at >= CURRENT_DATE - INTERVAL '1 year';

-- Time-based aggregations
SELECT
  DATE_TRUNC('week', created_at) as week_start,
  COUNT(*) as weekly_registrations,
  COUNT(*) - LAG(COUNT(*)) OVER (
    ORDER BY DATE_TRUNC('week', created_at)
  ) as week_over_week_change
FROM users
GROUP BY DATE_TRUNC('week', created_at)
ORDER BY week_start;

Performance and Security Best Practices

Write efficient queries:

-- ❌ Inefficient: Function in WHERE clause prevents index usage
SELECT * FROM users
WHERE UPPER(username) = 'ALICE123';

-- ✅ Efficient: Direct comparison allows index usage
SELECT * FROM users
WHERE username = 'alice123';

-- ❌ Inefficient: SELECT * gets unnecessary data
SELECT * FROM posts
WHERE created_at > '2024-01-01';

-- ✅ Efficient: Only select needed columns
SELECT id, title, created_at
FROM posts
WHERE created_at > '2024-01-01';

-- Use EXPLAIN to understand query performance
EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
GROUP BY u.id, u.username;

Prevent SQL injection (when building dynamic queries):

// ❌ Vulnerable to SQL injection
const getUserByEmail = (email) => {
  const query = `SELECT * FROM users WHERE email = '${email}'`;
  return db.query(query);
};

// Malicious input: '; DROP TABLE users; --
// Results in: SELECT * FROM users WHERE email = ''; DROP TABLE users; --'

// ✅ Use parameterized queries
const getUserByEmail = (email) => {
  const query = "SELECT * FROM users WHERE email = $1";
  return db.query(query, [email]);
};

// ✅ Or use query builders with automatic escaping
const user = await db("users").where("email", email).first();

Handle NULL values properly:

-- Understand NULL behavior
SELECT * FROM users
WHERE middle_name = NULL;  -- ❌ Returns no rows (NULL != NULL)

SELECT * FROM users
WHERE middle_name IS NULL; -- ✅ Correct NULL check

-- NULL-safe operations
SELECT
  first_name,
  last_name,
  COALESCE(middle_name, '') as middle_name,  -- Default to empty string
  CONCAT(first_name, ' ', COALESCE(middle_name || ' ', ''), last_name) as full_name
FROM users;

Key Takeaways

Database fundamentals form the foundation of every backend application. Understanding relational concepts, basic SQL operations, and proper query techniques is essential for building applications that store and retrieve data efficiently and reliably.

The database mindset you need:

  • Think in relationships: Design your tables to reflect how your data actually connects
  • Plan for scale: Consider how your queries will perform with millions of records
  • Prioritize data integrity: Use constraints, transactions, and proper validation
  • Write readable queries: Format your SQL for your future self and your teammates

What distinguishes professional database usage:

  • Proper table design with meaningful relationships and constraints
  • Efficient queries that use indexes and avoid common performance pitfalls
  • Safe practices that prevent data corruption and security vulnerabilities
  • Consistent formatting and documentation for maintainable database code

What’s Next

We’ve covered the relational database basics that power most web applications. In the next article, we’ll dive deeper into advanced SQL operations: complex JOINs that span multiple tables, subqueries and CTEs for breaking down complex problems, aggregation functions for business analytics, database normalization principles, and transaction management for data consistency.

The foundation is solid—you understand tables, relationships, and basic CRUD operations. Next, we’ll explore the advanced SQL patterns that enable sophisticated data analysis and ensure your database design scales professionally.

You’re no longer just storing data—you’re architecting information systems that maintain integrity, perform efficiently, and support complex business requirements. The database journey deepens from here.