A Beginner's Guide to SQL: From Creating Tables to Optimization

A Beginner's Guide to SQL: From Creating Tables to Optimization

Картинка к публикации: A Beginner's Guide to SQL: From Creating Tables to Optimization

Introduction to SQL

SQL (Structured Query Language) is a language used to interact with relational databases. Its main purpose is to provide developers and administrators with a straightforward yet powerful tool for storing, modifying, and retrieving data. Let’s dive into why SQL is so important and why almost every modern information system relies on relational DBMS (Database Management Systems).

What Is SQL and Why Is It Important?

SQL was created to simplify how we work with data. In earlier days, handling information in programs often meant dealing with complicated binary or text formats. With the advent of SQL, storing and searching data became much easier and faster.

  • Versatility. SQL is supported by most popular DBMSs: MySQL, PostgreSQL, Oracle, Microsoft SQL Server, and more.
  • Simplicity. The SQL query language has a readable syntax based on English keywords (SELECT, INSERT, UPDATE, DELETE).
  • Flexibility. SQL lets you perform complex filtering, sorting, aggregate calculations, and merges of data from multiple tables.
  • Standardization. The core syntax is similar across different DBMSs, making it simpler to switch from one system to another.

No matter if it’s an online store, a social network, or an internal accounting system, nearly every application is built around a database. This is why SQL skills are in demand in various projects.

Relational Databases: Key Principles

The relational model of databases is based on the idea that data is stored in two-dimensional tables (the term relation comes from this concept). In these tables, rows (or “records”) represent individual entities, while columns (or “fields”) represent the attributes of those entities.

Example: “Users” table structure

IDNameEmailRegistration Date
1Иванivan@example.com2025-01-01
2Еленаelena@example.com2025-01-02
3Николайnikolay@example.com2025-01-05
  • Columns (fields) represent different data points (ID, Name, Email, Registration Date).
  • Rows (records) are actual instances of data (in this case, users).

Core Principles of the Relational Model

  1. Unique name for each table. For example, users, orders, products.
  2. Explicit structure. Every table has a clearly defined set of columns with specific data types.
  3. Primary key. To identify each record, tables usually have a special column (e.g., id) whose values are unique.
  4. Relationships. Tables can be linked through foreign keys. For instance, an orders table can contain a foreign key referencing the user who placed the order.

This relational approach offers a convenient structure for managing interrelated data, and SQL makes it simple and efficient to manipulate that data.

Database Architecture: How Data Is Stored and Processed

When we talk about a relational database, we generally mean a DBMS that manages multiple components:

  1. File system. Tables are physically stored in files with a specialized format (each DBMS has its own internal mechanisms).
  2. Server side. This is the database “engine” that processes incoming SQL queries, optimizes them, interacts with the files, and returns the results.
  3. Client. This could be our application or a command-line utility that sends SQL queries and receives responses.

Note: In some cases, the DBMS’s storage logic may include caching, replication, distributing data across multiple servers, and other sophisticated mechanisms. But on a basic level, you should understand there is a “core” (the server) and a “client” (us) that communicates with the server using SQL.

A Simple Database Workflow

  1. We write a query like SELECT * FROM users;
  2. The query is sent to the DBMS
  3. The DBMS analyzes the query, decides the best way to execute it, accesses the relevant data files, and retrieves the required rows and columns
  4. The result is returned to us in a readable form (a set of records)

Basic Operations

Let’s explore the fundamental operations that allow you to retrieve the data you need from relational database tables using SQL. The main “star” here is the SELECT statement, which can handle almost anything—from simple queries to complex filtering and sorting.

SELECT: The Foundation of Data Retrieval

A straightforward example of a query is selecting all columns and rows from a table:

SELECT *
FROM users;
  • SELECT tells the DBMS (Database Management System) to “retrieve data...”
  • * means “all columns”
  • FROM users specifies the table from which to fetch the data

Note: In practice, rather than using *, you often specify particular columns (for example, SELECT name, email FROM users;). This approach is faster and avoids retrieving unnecessary data.

WHERE: Filtering Data

In many cases, you’ll need only specific data rather than everything in a table. That’s where the WHERE clause comes in.

Example 1: Filtering by a Specific Value

SELECT name, email
FROM users
WHERE id = 5;

This retrieves the name and email columns from the users table where id equals 5.

Example 2: Filtering by Text

SELECT id, name
FROM users
WHERE email = 'elena@example.com';

This returns rows where the email column matches 'elena@example.com'.

Example 3: Using Comparison Operators

SELECT id, total_amount
FROM orders
WHERE total_amount > 1000;

The > operator finds all orders where total_amount is greater than 1000.

Example 4: Multiple Criteria

SELECT id, product_name, price
FROM products
WHERE price > 1000
  AND category = 'Ноутбуки';

We use AND to include only those products classified as laptops (Ноутбуки) with a price above 1000.

ORDER BY: Sorting

To arrange the results of a query, use the ORDER BY keyword. By default, sorting is in ascending order (ASC). For descending order, add DESC.

Sorting Example

SELECT id, product_name, price
FROM products
ORDER BY price ASC;

Sorts products by price in ascending order.

SELECT id, product_name, price
FROM products
ORDER BY price DESC;

Sorts the same data by price in descending order.

Sorting by Multiple Columns

SELECT id, product_name, price
FROM products
ORDER BY category ASC, price DESC;

This first groups results by category in ascending order, then sorts each category’s entries by price in descending order.

LIMIT: Restricting the Number of Results

When a table contains many rows, and you only need a few (for example, when displaying a single page of data), use the LIMIT clause:

Example 1: Get the First 5 Rows

SELECT id, name
FROM users
ORDER BY id ASC
LIMIT 5;

This query returns the first five rows. Keep in mind that without ORDER BY, the order of returned rows is not guaranteed.

Example 2: Using OFFSET

Some DBMSs (like MySQL and PostgreSQL) let you skip a number of rows before returning the rest via OFFSET:

SELECT id, name
FROM users
ORDER BY id ASC
LIMIT 5
OFFSET 5;

This skips the first five rows and returns the next five—handy for pagination when showing data page by page.

Practical Scenario: Selecting “Active” Users

Suppose the users table has a column is_active (a boolean or tinyint) indicating whether a user is active. We want to display the first 10 active users, sorted by registration date in descending order so that the newest ones appear first:

SELECT id, name, email, registration_date
FROM users
WHERE is_active = 1
ORDER BY registration_date DESC
LIMIT 10;
  • WHERE is_active = 1 selects only users whose is_active field equals 1.
  • ORDER BY registration_date DESC sorts them so the most recently registered users appear at the top.
  • LIMIT 10 returns only the first 10 matching users.

Working with Tables

In this chapter, we’ll cover how to build the structure of a database—specifically, how to create tables—and the various ways to populate them with data, update them, or delete records as needed. We’ll also discuss data types and how they affect storage and data integrity.

Creating Tables (CREATE TABLE)

The CREATE TABLE command lets you specify the name of a new table, its columns and data types, as well as key constraints (for example, a primary key). The general syntax is:

CREATE TABLE table_name (
    column1 data_type [constraints],
    column2 data_type [constraints],
    ...
);

Example 1: Creating a “users” Table

CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE,
    registration_date DATE,
    is_active TINYINT DEFAULT 1
);

Here’s what’s happening in this statement:

  • id INT AUTO_INCREMENT PRIMARY KEY
    • INT means it’s an integer type.
    • AUTO_INCREMENT (MySQL) or SERIAL (PostgreSQL) means the value will automatically increase by 1 each time a new record is inserted.
    • PRIMARY KEY designates this column as uniquely identifying each row.
  • name VARCHAR(100) NOT NULL
    • VARCHAR(100) is a variable-length string type.
    • NOT NULL means this column cannot contain a NULL (empty) value.
  • email VARCHAR(255) UNIQUE
    • Likewise, this is a string type of up to 255 characters.
    • UNIQUE indicates no duplicate values are allowed—each email must be unique.
  • registration_date DATE
    • A data type for storing a date (no time component).
  • is_active TINYINT DEFAULT 1
    • A numeric type, usually 0 or 1 to indicate a status (active/inactive).
    • DEFAULT 1 means if a user is added without explicitly specifying this field, it will default to 1.

Data Types and Their Impact

Choosing the right data type is crucial for performance and accurate storage. For example:

  • INT, BIGINT – For integer values. The choice depends on the range you need (INT vs. BIGINT).
  • FLOAT, DOUBLE, DECIMAL – For floating-point numbers and financial calculations (with DECIMAL often preferred for precision).
  • VARCHAR – For variable-length strings. Specifying a maximum length defines how many characters can be stored.
  • TEXT – If you need to store long texts (such as articles, comments, etc.).
  • DATE, DATETIME, TIMESTAMP – For storing dates and/or times. Choose based on whether you only need the date or a full timestamp.

Using the correct data type safeguards against logical errors (e.g., a string where a number should be) and helps conserve database space.

Inserting Data (INSERT)

To add a new row to a table, use the INSERT statement. The syntax is:

INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);

Example: Adding a New User

INSERT INTO users (name, email, registration_date)
VALUES ('Иван', 'ivan@example.com', '2025-01-08');

We specify the columns we want to fill (name, email, registration_date). The id column is auto-generated (AUTO_INCREMENT), and is_active gets its default value of 1.

Inserting Multiple Rows at Once

INSERT INTO users (name, email, registration_date)
VALUES 
    ('Елена', 'elena@example.com', '2025-01-09'),
    ('Николай', 'nikolay@example.com', '2025-01-10');

Many DBMSs support adding multiple records in a single query, which can be faster than doing individual INSERT statements one by one.

Updating Data (UPDATE)

Use the UPDATE statement when you need to modify existing records in a table. The syntax is:

UPDATE table_name
SET column1 = new_value1, column2 = new_value2, ...
[WHERE condition];

Example: Changing a User’s Active Status

UPDATE users
SET is_active = 0
WHERE email = 'elena@example.com';

We set is_active to 0 to mark that Elena is no longer active. Without a WHERE clause, the statement would update all rows in the table, so always use caution to avoid widespread unintended changes.

Deleting Data (DELETE)

To remove records, use the DELETE command. The syntax is:

DELETE FROM table_name
[WHERE condition];

Example: Removing a User

DELETE FROM users
WHERE email = 'nikolay@example.com';

This completely deletes the row where email = 'nikolay@example.com'. Again, be aware that leaving out the WHERE clause removes every row in the table—an extremely risky action.

Practical Example: Creating an “orders” Table

Let’s say we want to track orders placed by users in an online store. We’ll create an orders table that references the users table (through the user_id foreign key), assuming the DBMS supports foreign keys:

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    order_date DATETIME NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    CONSTRAINT fk_user
      FOREIGN KEY (user_id)
      REFERENCES users (id)
);
  • user_id INT NOT NULL – The identifier for the user who placed the order.
  • FOREIGN KEY (user_id) REFERENCES users (id) creates a link to the users table, preventing any order from referencing a non-existent user.

Inserting an Order

INSERT INTO orders (user_id, order_date, total_amount)
VALUES (1, '2025-01-08 10:30:00', 1500.50);

The user with id = 1 (Иван) has placed an order with a total amount of 1500.50.

Aggregation and Combining Data

In this chapter, we’ll look at how to group data and perform calculations on those groups—such as summing values, counting rows, finding minimums and maximums, and more. We’ll also explore how to combine data from multiple tables using JOIN and UNION.

Data Aggregation

Aggregation is a way to summarize a set of rows in a table and return a single aggregated value for each group. It’s often used to calculate statistics based on certain criteria. SQL provides several built-in aggregate functions for this purpose:

  • COUNT – Counts the number of rows or the number of non-null values in a column.
  • SUM – Adds up values in a group.
  • AVG – Finds the average value.
  • MIN – Returns the minimum value.
  • MAX – Returns the maximum value.

Example 1: Counting Total Users

SELECT COUNT(*) AS total_users
FROM users;
  • COUNT(*) returns the total number of rows in the users table.
  • AS total_users gives the column a readable alias.

Example 2: Finding the Average Order Amount

SELECT AVG(total_amount) AS average_order
FROM orders;
  • AVG(total_amount) calculates the average order amount.
  • This is useful when you want to determine the “average check” in an online store.

GROUP BY and Filtering Groups with HAVING

To apply aggregate functions to specific subsets of data (groups), you use GROUP BY. It groups rows by one or more columns, and you can then return aggregate results for each group.

Example 1: Number of Orders per User

SELECT user_id, COUNT(*) AS orders_count
FROM orders
GROUP BY user_id;
  • Here, all orders are grouped by user_id, so we see how many orders each user has made.
  • The result might look like:
user_idorders_count
110
23
37

Example 2: Summing Orders by Date

SELECT DATE(order_date) AS order_day, SUM(total_amount) AS total_sum
FROM orders
GROUP BY DATE(order_date);
  • We take only the date part (excluding time) from order_date and sum up total_amount.
  • This shows the total sales for each day.

HAVING is used to filter results after grouping (i.e., it applies conditions to the aggregated data). It works similarly to WHERE but on grouped results.

SELECT user_id, SUM(total_amount) AS total_spent
FROM orders
GROUP BY user_id
HAVING SUM(total_amount) > 1000;
  1. First, we group orders by users and calculate the total purchase amount (SUM(total_amount)) for each user.
  2. Then, using HAVING, we exclude users whose total spending is 1000 or less.

Important: WHERE filters rows before grouping, whereas HAVING filters groups after.

Combining Data from Multiple Tables

In a real database, information is typically split across different tables that are related to each other. For example, you might have a users table and an orders table, where orders.user_id references users.id. To get consolidated information, you need to join these tables.

JOIN: Inner and Outer Joins

  • INNER JOIN returns only the rows that have matching records in both tables.
SELECT 
    u.name AS user_name,
    o.order_id,
    o.total_amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id;
  • ON u.id = o.user_id specifies the matching condition.
  • Each row from users is matched with rows in orders where the user’s id equals the order’s user_id.
  • If a user has placed no orders, they won’t appear in the results.
  • LEFT JOIN returns all rows from the left table (the one before the word JOIN), plus the matching rows from the right table. If there’s no match in the right table, the corresponding columns show NULL:
SELECT 
    u.name,
    o.order_id,
    o.total_amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
  • This will list all users along with their orders, if any.
  • For users who haven’t placed orders, fields from orders will be NULL.
  • RIGHT JOIN does the opposite: it returns all rows from the right table and the matching rows from the left. It’s less common because you can usually flip the query and use LEFT JOIN instead.

Merging Results with UNION

If you need to “stack” the results of two queries on top of each other—where both have the same column structure—you can use the UNION (or UNION ALL) operator:

SELECT name AS person, email
FROM users
UNION
SELECT supplier_name AS person, supplier_email AS email
FROM suppliers;
  • This produces a combined list of “people” and their emails, coming from two different tables.
  • By default, UNION removes duplicate rows. If you want to keep duplicates, use UNION ALL.
  • The number and type of columns must match in both queries.

Practical Example: Daily Totals with User Names

Suppose we have an orders table where each order record includes a user_id and a date. We want to show the day, the total number of orders for that day, and the names of all users who placed orders on those days.

  1. Step 1: Group orders by day and find the total count of orders.

    SELECT DATE(o.order_date) AS order_day, COUNT(*) AS total_orders
    FROM orders o
    GROUP BY DATE(o.order_date);
    
  2. Step 2: To add user names, we join the users table:

    SELECT 
        DATE(o.order_date) AS order_day,
        GROUP_CONCAT(u.name SEPARATOR ', ') AS users_list,
        COUNT(*) AS total_orders
    FROM orders o
    INNER JOIN users u ON o.user_id = u.id
    GROUP BY DATE(o.order_date);
    
    • GROUP_CONCAT(u.name SEPARATOR ', ') (available in MySQL) concatenates the user names into a single string, separated by commas.
    • The result shows who made orders on each specific day and how many total orders were placed.

Query Optimization

In this final chapter, we’ll discuss how to make working with your database simpler and more efficient. We’ll look at creating views (VIEW) and using indexes to speed up data retrieval, as well as some advanced optimization techniques.

Creating Views (VIEW)

A view is a virtual table that’s built from the results of a SELECT query. Depending on the database system and the type of view, the data might not be fully duplicated in the database—instead, the view “stores” the query logic in one place.

1.1 Why Use Views

  • Simplifying complex queries. Instead of rewriting a lengthy SELECT with multiple joins and filters every time, you can create a view once and then query it like a regular table.
  • Security. You can hide certain columns or filter data by giving access only to the view, rather than the underlying table.
  • Logical isolation. If the structure of the actual tables changes, but the view’s interface stays the same, applications can continue using the view without modification.

1.2 Creating a View

The general syntax:

CREATE VIEW view_name AS
SELECT ...
FROM ...
[WHERE ...]

Example: Creating “active_users”

CREATE VIEW active_users AS
SELECT 
    id,
    name,
    email,
    registration_date
FROM users
WHERE is_active = 1;

Now, to refer to active users, you no longer need to include WHERE is_active = 1 in every query. Just do:

SELECT *
FROM active_users;

Under the hood, the DBMS will run the original query, but from your perspective, it’s as if you’re querying a separate table.

Indexes: Speeding Up Data Retrieval

2.1 What Is an Index and How Does It Work?

An index is a specialized data structure (for example, a B-tree) that the DBMS stores separately from the main table. It allows the system to find relevant rows much faster than scanning the entire table row by row.

  • Without an index: The DBMS may have to check each row to see if it meets the query condition, which can be very slow when dealing with large datasets.
  • With an index: The DBMS uses a specialized structure that keeps column values in an ordered format, making it quick to locate the rows you’re looking for.

2.2 Types of Indexes

  • B-Tree Index (the most common type in many DBMSs): Good for exact matches, range queries, and sorting.
  • Hash Index (for example, in PostgreSQL): Extremely fast for exact matches, but not suitable for range queries.
  • Fulltext Index (in MySQL): Optimized for searching text fields (e.g., words in articles).
  • GIN / GIST Indexes (in PostgreSQL): Useful for complex data types like JSON, arrays, geospatial data, etc.

2.3 Creating an Index

Syntax in MySQL (similar in other DBMSs):

CREATE INDEX index_name
ON table_name (column1, [column2, ...]);

Example: Index on email

CREATE INDEX idx_users_email
ON users (email);

Now a query like:

SELECT *
FROM users
WHERE email = 'elena@example.com';

will run faster.

Note: Creating an index speeds up read operations (SELECT) but can slow down inserts (INSERT), updates (UPDATE), and deletions (DELETE) because the index must be updated too. Therefore, you should only add indexes where fast retrieval matters, not on every column.

Advanced Optimization Techniques

3.1 Analyzing Query Execution (EXPLAIN)

Many DBMSs (MySQL, PostgreSQL, SQL Server, Oracle) offer an EXPLAIN tool that shows how a query will be executed—which tables are scanned, which indexes are used, and so on.

EXPLAIN SELECT *
FROM users
WHERE email = 'elena@example.com';

The results will indicate whether the query uses an index or performs a full table scan. This helps you spot bottlenecks and improve them by creating indexes or rewriting the query.

3.2 Query Caching

Some DBMSs can cache query results. Caching is also commonly implemented at the application level (for example, using Memcached or Redis) to avoid rerunning the same queries against the database.

3.3 Denormalization and Sharding

  • Denormalization: Deliberately duplicating data across tables to reduce the number of joins, which may be useful if fast reads are more important than a perfectly normalized structure.
  • Sharding: Splitting a large table into separate fragments (shards), typically by some key like an id range. This distributes the load across multiple servers.

3.4 Query Planners and Distributed Databases

Large-scale systems may use distributed architectures (PostgreSQL clusters, MySQL sharding, table partitioning, etc.). That’s beyond the scope of this introductory course, but learning about indexing and how to use EXPLAIN is already a major step toward better performance.

With this, we wrap up our introductory SQL course. You now know how to create tables, manipulate data, combine results, aggregate information, and optimize queries. This foundation will help you move on to studying specific DBMSs (MySQL, PostgreSQL, SQL Server, and more) and tackling real-world database challenges.


Read also:

ChatGPT
Eva
💫 Eva assistant

Выберите способ входа