A Beginner's Guide to SQL: From Creating Tables to Optimization
Introduction to SQL
SQL (Structured Query Language) is a language used to interact with relational databases. Its main purpose is to provide developers and administrators with a straightforward yet powerful tool for storing, modifying, and retrieving data. Let’s dive into why SQL is so important and why almost every modern information system relies on relational DBMS (Database Management Systems).
What Is SQL and Why Is It Important?
SQL was created to simplify how we work with data. In earlier days, handling information in programs often meant dealing with complicated binary or text formats. With the advent of SQL, storing and searching data became much easier and faster.
- Versatility. SQL is supported by most popular DBMSs: MySQL, PostgreSQL, Oracle, Microsoft SQL Server, and more.
- Simplicity. The SQL query language has a readable syntax based on English keywords (SELECT, INSERT, UPDATE, DELETE).
- Flexibility. SQL lets you perform complex filtering, sorting, aggregate calculations, and merges of data from multiple tables.
- Standardization. The core syntax is similar across different DBMSs, making it simpler to switch from one system to another.
No matter if it’s an online store, a social network, or an internal accounting system, nearly every application is built around a database. This is why SQL skills are in demand in various projects.
Relational Databases: Key Principles
The relational model of databases is based on the idea that data is stored in two-dimensional tables (the term relation comes from this concept). In these tables, rows (or “records”) represent individual entities, while columns (or “fields”) represent the attributes of those entities.
Example: “Users” table structure
ID | Name | Registration Date | |
---|---|---|---|
1 | Иван | ivan@example.com | 2025-01-01 |
2 | Елена | elena@example.com | 2025-01-02 |
3 | Николай | nikolay@example.com | 2025-01-05 |
- Columns (fields) represent different data points (ID, Name, Email, Registration Date).
- Rows (records) are actual instances of data (in this case, users).
Core Principles of the Relational Model
- Unique name for each table. For example,
users
,orders
,products
. - Explicit structure. Every table has a clearly defined set of columns with specific data types.
- Primary key. To identify each record, tables usually have a special column (e.g.,
id
) whose values are unique. - Relationships. Tables can be linked through foreign keys. For instance, an
orders
table can contain a foreign key referencing the user who placed the order.
This relational approach offers a convenient structure for managing interrelated data, and SQL makes it simple and efficient to manipulate that data.
Database Architecture: How Data Is Stored and Processed
When we talk about a relational database, we generally mean a DBMS that manages multiple components:
- File system. Tables are physically stored in files with a specialized format (each DBMS has its own internal mechanisms).
- Server side. This is the database “engine” that processes incoming SQL queries, optimizes them, interacts with the files, and returns the results.
- Client. This could be our application or a command-line utility that sends SQL queries and receives responses.
Note: In some cases, the DBMS’s storage logic may include caching, replication, distributing data across multiple servers, and other sophisticated mechanisms. But on a basic level, you should understand there is a “core” (the server) and a “client” (us) that communicates with the server using SQL.
A Simple Database Workflow
- We write a query like
SELECT * FROM users;
- The query is sent to the DBMS
- The DBMS analyzes the query, decides the best way to execute it, accesses the relevant data files, and retrieves the required rows and columns
- The result is returned to us in a readable form (a set of records)
Basic Operations
Let’s explore the fundamental operations that allow you to retrieve the data you need from relational database tables using SQL. The main “star” here is the SELECT
statement, which can handle almost anything—from simple queries to complex filtering and sorting.
SELECT: The Foundation of Data Retrieval
A straightforward example of a query is selecting all columns and rows from a table:
SELECT *
FROM users;
SELECT
tells the DBMS (Database Management System) to “retrieve data...”*
means “all columns”FROM users
specifies the table from which to fetch the data
Note: In practice, rather than using
*
, you often specify particular columns (for example,SELECT name, email FROM users;
). This approach is faster and avoids retrieving unnecessary data.
WHERE: Filtering Data
In many cases, you’ll need only specific data rather than everything in a table. That’s where the WHERE
clause comes in.
Example 1: Filtering by a Specific Value
SELECT name, email
FROM users
WHERE id = 5;
This retrieves the name
and email
columns from the users
table where id
equals 5.
Example 2: Filtering by Text
SELECT id, name
FROM users
WHERE email = 'elena@example.com';
This returns rows where the email
column matches 'elena@example.com'
.
Example 3: Using Comparison Operators
SELECT id, total_amount
FROM orders
WHERE total_amount > 1000;
The >
operator finds all orders where total_amount
is greater than 1000.
Example 4: Multiple Criteria
SELECT id, product_name, price
FROM products
WHERE price > 1000
AND category = 'Ноутбуки';
We use AND
to include only those products classified as laptops (Ноутбуки
) with a price above 1000.
ORDER BY: Sorting
To arrange the results of a query, use the ORDER BY
keyword. By default, sorting is in ascending order (ASC
). For descending order, add DESC
.
Sorting Example
SELECT id, product_name, price
FROM products
ORDER BY price ASC;
Sorts products by price
in ascending order.
SELECT id, product_name, price
FROM products
ORDER BY price DESC;
Sorts the same data by price
in descending order.
Sorting by Multiple Columns
SELECT id, product_name, price
FROM products
ORDER BY category ASC, price DESC;
This first groups results by category
in ascending order, then sorts each category’s entries by price
in descending order.
LIMIT: Restricting the Number of Results
When a table contains many rows, and you only need a few (for example, when displaying a single page of data), use the LIMIT
clause:
Example 1: Get the First 5 Rows
SELECT id, name
FROM users
ORDER BY id ASC
LIMIT 5;
This query returns the first five rows. Keep in mind that without ORDER BY
, the order of returned rows is not guaranteed.
Example 2: Using OFFSET
Some DBMSs (like MySQL and PostgreSQL) let you skip a number of rows before returning the rest via OFFSET
:
SELECT id, name
FROM users
ORDER BY id ASC
LIMIT 5
OFFSET 5;
This skips the first five rows and returns the next five—handy for pagination when showing data page by page.
Practical Scenario: Selecting “Active” Users
Suppose the users
table has a column is_active
(a boolean or tinyint) indicating whether a user is active. We want to display the first 10 active users, sorted by registration date in descending order so that the newest ones appear first:
SELECT id, name, email, registration_date
FROM users
WHERE is_active = 1
ORDER BY registration_date DESC
LIMIT 10;
WHERE is_active = 1
selects only users whoseis_active
field equals 1.ORDER BY registration_date DESC
sorts them so the most recently registered users appear at the top.LIMIT 10
returns only the first 10 matching users.
Working with Tables
In this chapter, we’ll cover how to build the structure of a database—specifically, how to create tables—and the various ways to populate them with data, update them, or delete records as needed. We’ll also discuss data types and how they affect storage and data integrity.
Creating Tables (CREATE TABLE)
The CREATE TABLE
command lets you specify the name of a new table, its columns and data types, as well as key constraints (for example, a primary key). The general syntax is:
CREATE TABLE table_name (
column1 data_type [constraints],
column2 data_type [constraints],
...
);
Example 1: Creating a “users” Table
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE,
registration_date DATE,
is_active TINYINT DEFAULT 1
);
Here’s what’s happening in this statement:
- id INT AUTO_INCREMENT PRIMARY KEY
INT
means it’s an integer type.AUTO_INCREMENT
(MySQL) orSERIAL
(PostgreSQL) means the value will automatically increase by 1 each time a new record is inserted.PRIMARY KEY
designates this column as uniquely identifying each row.
- name VARCHAR(100) NOT NULL
VARCHAR(100)
is a variable-length string type.NOT NULL
means this column cannot contain aNULL
(empty) value.
- email VARCHAR(255) UNIQUE
- Likewise, this is a string type of up to 255 characters.
UNIQUE
indicates no duplicate values are allowed—each email must be unique.
- registration_date DATE
- A data type for storing a date (no time component).
- is_active TINYINT DEFAULT 1
- A numeric type, usually
0
or1
to indicate a status (active/inactive). DEFAULT 1
means if a user is added without explicitly specifying this field, it will default to1
.
- A numeric type, usually
Data Types and Their Impact
Choosing the right data type is crucial for performance and accurate storage. For example:
- INT, BIGINT – For integer values. The choice depends on the range you need (INT vs. BIGINT).
- FLOAT, DOUBLE, DECIMAL – For floating-point numbers and financial calculations (with
DECIMAL
often preferred for precision). - VARCHAR – For variable-length strings. Specifying a maximum length defines how many characters can be stored.
- TEXT – If you need to store long texts (such as articles, comments, etc.).
- DATE, DATETIME, TIMESTAMP – For storing dates and/or times. Choose based on whether you only need the date or a full timestamp.
Using the correct data type safeguards against logical errors (e.g., a string where a number should be) and helps conserve database space.
Inserting Data (INSERT)
To add a new row to a table, use the INSERT
statement. The syntax is:
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);
Example: Adding a New User
INSERT INTO users (name, email, registration_date)
VALUES ('Иван', 'ivan@example.com', '2025-01-08');
We specify the columns we want to fill (name
, email
, registration_date
). The id
column is auto-generated (AUTO_INCREMENT
), and is_active
gets its default value of 1
.
Inserting Multiple Rows at Once
INSERT INTO users (name, email, registration_date)
VALUES
('Елена', 'elena@example.com', '2025-01-09'),
('Николай', 'nikolay@example.com', '2025-01-10');
Many DBMSs support adding multiple records in a single query, which can be faster than doing individual INSERT
statements one by one.
Updating Data (UPDATE)
Use the UPDATE
statement when you need to modify existing records in a table. The syntax is:
UPDATE table_name
SET column1 = new_value1, column2 = new_value2, ...
[WHERE condition];
Example: Changing a User’s Active Status
UPDATE users
SET is_active = 0
WHERE email = 'elena@example.com';
We set is_active
to 0
to mark that Elena is no longer active. Without a WHERE
clause, the statement would update all rows in the table, so always use caution to avoid widespread unintended changes.
Deleting Data (DELETE)
To remove records, use the DELETE
command. The syntax is:
DELETE FROM table_name
[WHERE condition];
Example: Removing a User
DELETE FROM users
WHERE email = 'nikolay@example.com';
This completely deletes the row where email = 'nikolay@example.com'
. Again, be aware that leaving out the WHERE
clause removes every row in the table—an extremely risky action.
Practical Example: Creating an “orders” Table
Let’s say we want to track orders placed by users in an online store. We’ll create an orders
table that references the users
table (through the user_id
foreign key), assuming the DBMS supports foreign keys:
CREATE TABLE orders (
order_id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT NOT NULL,
order_date DATETIME NOT NULL,
total_amount DECIMAL(10, 2) NOT NULL,
CONSTRAINT fk_user
FOREIGN KEY (user_id)
REFERENCES users (id)
);
- user_id INT NOT NULL – The identifier for the user who placed the order.
FOREIGN KEY (user_id) REFERENCES users (id)
creates a link to theusers
table, preventing any order from referencing a non-existent user.
Inserting an Order
INSERT INTO orders (user_id, order_date, total_amount)
VALUES (1, '2025-01-08 10:30:00', 1500.50);
The user with id = 1
(Иван) has placed an order with a total amount of 1500.50.
Aggregation and Combining Data
In this chapter, we’ll look at how to group data and perform calculations on those groups—such as summing values, counting rows, finding minimums and maximums, and more. We’ll also explore how to combine data from multiple tables using JOIN
and UNION
.
Data Aggregation
Aggregation is a way to summarize a set of rows in a table and return a single aggregated value for each group. It’s often used to calculate statistics based on certain criteria. SQL provides several built-in aggregate functions for this purpose:
- COUNT – Counts the number of rows or the number of non-null values in a column.
- SUM – Adds up values in a group.
- AVG – Finds the average value.
- MIN – Returns the minimum value.
- MAX – Returns the maximum value.
Example 1: Counting Total Users
SELECT COUNT(*) AS total_users
FROM users;
COUNT(*)
returns the total number of rows in theusers
table.AS total_users
gives the column a readable alias.
Example 2: Finding the Average Order Amount
SELECT AVG(total_amount) AS average_order
FROM orders;
AVG(total_amount)
calculates the average order amount.- This is useful when you want to determine the “average check” in an online store.
GROUP BY and Filtering Groups with HAVING
To apply aggregate functions to specific subsets of data (groups), you use GROUP BY
. It groups rows by one or more columns, and you can then return aggregate results for each group.
Example 1: Number of Orders per User
SELECT user_id, COUNT(*) AS orders_count
FROM orders
GROUP BY user_id;
- Here, all orders are grouped by
user_id
, so we see how many orders each user has made. - The result might look like:
user_id | orders_count |
---|---|
1 | 10 |
2 | 3 |
3 | 7 |
Example 2: Summing Orders by Date
SELECT DATE(order_date) AS order_day, SUM(total_amount) AS total_sum
FROM orders
GROUP BY DATE(order_date);
- We take only the date part (excluding time) from
order_date
and sum uptotal_amount
. - This shows the total sales for each day.
HAVING is used to filter results after grouping (i.e., it applies conditions to the aggregated data). It works similarly to WHERE
but on grouped results.
SELECT user_id, SUM(total_amount) AS total_spent
FROM orders
GROUP BY user_id
HAVING SUM(total_amount) > 1000;
- First, we group orders by users and calculate the total purchase amount (
SUM(total_amount)
) for each user. - Then, using
HAVING
, we exclude users whose total spending is 1000 or less.
Important:
WHERE
filters rows before grouping, whereasHAVING
filters groups after.
Combining Data from Multiple Tables
In a real database, information is typically split across different tables that are related to each other. For example, you might have a users
table and an orders
table, where orders.user_id
references users.id
. To get consolidated information, you need to join these tables.
JOIN: Inner and Outer Joins
- INNER JOIN returns only the rows that have matching records in both tables.
SELECT
u.name AS user_name,
o.order_id,
o.total_amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id;
ON u.id = o.user_id
specifies the matching condition.- Each row from
users
is matched with rows inorders
where the user’sid
equals the order’suser_id
. - If a user has placed no orders, they won’t appear in the results.
- LEFT JOIN returns all rows from the left table (the one before the word
JOIN
), plus the matching rows from the right table. If there’s no match in the right table, the corresponding columns showNULL
:
SELECT
u.name,
o.order_id,
o.total_amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
- This will list all users along with their orders, if any.
- For users who haven’t placed orders, fields from
orders
will beNULL
. - RIGHT JOIN does the opposite: it returns all rows from the right table and the matching rows from the left. It’s less common because you can usually flip the query and use
LEFT JOIN
instead.
Merging Results with UNION
If you need to “stack” the results of two queries on top of each other—where both have the same column structure—you can use the UNION
(or UNION ALL
) operator:
SELECT name AS person, email
FROM users
UNION
SELECT supplier_name AS person, supplier_email AS email
FROM suppliers;
- This produces a combined list of “people” and their emails, coming from two different tables.
- By default,
UNION
removes duplicate rows. If you want to keep duplicates, useUNION ALL
. - The number and type of columns must match in both queries.
Practical Example: Daily Totals with User Names
Suppose we have an orders
table where each order record includes a user_id
and a date. We want to show the day, the total number of orders for that day, and the names of all users who placed orders on those days.
Step 1: Group orders by day and find the total count of orders.
SELECT DATE(o.order_date) AS order_day, COUNT(*) AS total_orders FROM orders o GROUP BY DATE(o.order_date);
Step 2: To add user names, we join the
users
table:SELECT DATE(o.order_date) AS order_day, GROUP_CONCAT(u.name SEPARATOR ', ') AS users_list, COUNT(*) AS total_orders FROM orders o INNER JOIN users u ON o.user_id = u.id GROUP BY DATE(o.order_date);
GROUP_CONCAT(u.name SEPARATOR ', ')
(available in MySQL) concatenates the user names into a single string, separated by commas.- The result shows who made orders on each specific day and how many total orders were placed.
Query Optimization
In this final chapter, we’ll discuss how to make working with your database simpler and more efficient. We’ll look at creating views (VIEW) and using indexes to speed up data retrieval, as well as some advanced optimization techniques.
Creating Views (VIEW)
A view is a virtual table that’s built from the results of a SELECT
query. Depending on the database system and the type of view, the data might not be fully duplicated in the database—instead, the view “stores” the query logic in one place.
1.1 Why Use Views
- Simplifying complex queries. Instead of rewriting a lengthy
SELECT
with multiple joins and filters every time, you can create a view once and then query it like a regular table. - Security. You can hide certain columns or filter data by giving access only to the view, rather than the underlying table.
- Logical isolation. If the structure of the actual tables changes, but the view’s interface stays the same, applications can continue using the view without modification.
1.2 Creating a View
The general syntax:
CREATE VIEW view_name AS
SELECT ...
FROM ...
[WHERE ...]
Example: Creating “active_users”
CREATE VIEW active_users AS
SELECT
id,
name,
email,
registration_date
FROM users
WHERE is_active = 1;
Now, to refer to active users, you no longer need to include WHERE is_active = 1
in every query. Just do:
SELECT *
FROM active_users;
Under the hood, the DBMS will run the original query, but from your perspective, it’s as if you’re querying a separate table.
Indexes: Speeding Up Data Retrieval
2.1 What Is an Index and How Does It Work?
An index is a specialized data structure (for example, a B-tree) that the DBMS stores separately from the main table. It allows the system to find relevant rows much faster than scanning the entire table row by row.
- Without an index: The DBMS may have to check each row to see if it meets the query condition, which can be very slow when dealing with large datasets.
- With an index: The DBMS uses a specialized structure that keeps column values in an ordered format, making it quick to locate the rows you’re looking for.
2.2 Types of Indexes
- B-Tree Index (the most common type in many DBMSs): Good for exact matches, range queries, and sorting.
- Hash Index (for example, in PostgreSQL): Extremely fast for exact matches, but not suitable for range queries.
- Fulltext Index (in MySQL): Optimized for searching text fields (e.g., words in articles).
- GIN / GIST Indexes (in PostgreSQL): Useful for complex data types like JSON, arrays, geospatial data, etc.
2.3 Creating an Index
Syntax in MySQL (similar in other DBMSs):
CREATE INDEX index_name
ON table_name (column1, [column2, ...]);
Example: Index on email
CREATE INDEX idx_users_email
ON users (email);
Now a query like:
SELECT *
FROM users
WHERE email = 'elena@example.com';
will run faster.
Note: Creating an index speeds up read operations (
SELECT
) but can slow down inserts (INSERT
), updates (UPDATE
), and deletions (DELETE
) because the index must be updated too. Therefore, you should only add indexes where fast retrieval matters, not on every column.
Advanced Optimization Techniques
3.1 Analyzing Query Execution (EXPLAIN)
Many DBMSs (MySQL, PostgreSQL, SQL Server, Oracle) offer an EXPLAIN
tool that shows how a query will be executed—which tables are scanned, which indexes are used, and so on.
EXPLAIN SELECT *
FROM users
WHERE email = 'elena@example.com';
The results will indicate whether the query uses an index or performs a full table scan. This helps you spot bottlenecks and improve them by creating indexes or rewriting the query.
3.2 Query Caching
Some DBMSs can cache query results. Caching is also commonly implemented at the application level (for example, using Memcached or Redis) to avoid rerunning the same queries against the database.
3.3 Denormalization and Sharding
- Denormalization: Deliberately duplicating data across tables to reduce the number of joins, which may be useful if fast reads are more important than a perfectly normalized structure.
- Sharding: Splitting a large table into separate fragments (shards), typically by some key like an
id
range. This distributes the load across multiple servers.
3.4 Query Planners and Distributed Databases
Large-scale systems may use distributed architectures (PostgreSQL clusters, MySQL sharding, table partitioning, etc.). That’s beyond the scope of this introductory course, but learning about indexing and how to use EXPLAIN
is already a major step toward better performance.
With this, we wrap up our introductory SQL course. You now know how to create tables, manipulate data, combine results, aggregate information, and optimize queries. This foundation will help you move on to studying specific DBMSs (MySQL, PostgreSQL, SQL Server, and more) and tackling real-world database challenges.