PostgreSQL in MacOS — Create database, load CSV file, and data manipulation
This tutorial is inspired by the Coursera course: Database Design and Basic SQL in PostgreSQL. We will be using a sample data, Musical Track Database. In order to use SQL language to query and manipulate data, we will walk through a process from creating a database, a user, to load data (in this example, a comma-separated-values (CSV) file) into the PostfreSQL system and learn some function about data manipulation.
Data source
An iTunes library: Musical Track Database
Server/Super user and Client
We will create a database and user as a super user and switch to the client mode to load the data into the database. For more concept about server and client, check on my previous post!
And now let’s get started!
Go to the “Terminal” on your computer. First, we check the location we are at by typing pwd
We are going to download the CSV file later to this directory. If you want to change your location, simply type cd
with the location you want. For example, change to the “Downloads” directory.
1: Download the CSV file
Use either of the command below to download the file into the directory
wget https://www.pg4e.com/tools/sql/library.csv
curl -O https://www.pg4e.com/tools/sql/library.csv
Check if the file is successfully downloaded. (I changed my directory back to “chenamber”)
2: Create a user and a database
Initiate the super user mode
psql -U postgres
Create a user and a database with the user as the owner
CREATE USER pg4e_4c0b06f09f WITH PASSWORD ‘pg4e_p_61356298951eb7f’;
CREATE DATABASE pg4e_4c0b06f09f WITH OWNER 'pg4e_4c0b06f09f';
Now you can see if the database is created by typing \l
3: Access to the database as a user
Quit the super user mode and access the database as a user.
\q
psql -U pg4e_4c0b06f09f pg4e_4c0b06f09f
Type in the password “pg4e_p_61356298951eb7f” we just used for the user creation.
4: Create a table
To use SQL for data manipulation, we have to load our data into the system. Let’s look at the CSV file first. There are 6 columns and we will name them “title”, “artist”, “album”, “count”, “rating”, and “len” .
While creating these variables, we also have to tell the system the data types for each variable. For example, “title” is a TEXT variable and rating is an INTEGER variable. For more information about data types, you can visit this website for a comprehensive introduction. I also create a cheatsheet for some commonly used data types.
Here, we first create a table with the name “track_raw” and put the variables and their datatypes.
CREATE TABLE track_raw
(title TEXT, artist TEXT, album TEXT,
count INTEGER, rating INTEGER, len INTEGER);
Let’s see if the table is created in the database by the command \dt
So, yes, now we have one table in our database. This table is empty since we have not input any data in it yet. Now let’s copy the CSV data into this table.
\copy track_raw(title,artist,album,count,rating,len) FROM 'library.csv' WITH DELIMITER ',' CSV;
Now let’s check if the data have been loaded by counting the rows in the table.
SELECT COUNT(*) FROM track_raw;
Yes! We have 296 rows/records in the table, which is the same number of rows in the CSV file. Now we can write some functions to query data.
5: Some functions
All the functions introduced here use SELECT
to view the data. Most of the time, we want to read data with some conditions. Here, we will introduce some functions you can use to extract data of your interest.
For example, use ORDER BY
to sort variables and use LIMIT
to show the first n rows. For ORDER BY
you can add DESC
or ASC to sort in a descending/ascending way.
SELECT * FROM track_raw ORDER BY title ASC LIMIT 10;
Next, we want to show songs by ratings. As in this dataset there are many null values in rating, we use ORDER BY
with ASC
and we get this result.
SELECT * FROM track_raw ORDER BY rating ASC LIMIT 5;
And now we want to only see songs with rating as 100.We use OFFSET n
to skip the first n rows.
SELECT * FROM track_raw ORDER BY rating ASC OFFSET 2 LIMIT 5;
Select songs for one artist.
SELECT * FROM track_raw WHERE artist='Johnny Cash';
Select songs whose titles contains “Love”
SELECT * FROM track_raw WHERE title LIKE '%Love%';
The symbol “%” means that we want to find titles where “Love” is in anywhere of the titles.
If we change %Love%
to Love%
,then we only get titles start with “Love”.