PostgreSQL in MacOS — Create database, load CSV file, and data manipulation

YaLinChen (Amber)
5 min readJul 15, 2022
Source: Unsplash

This tutorial is inspired by the Coursera course: Database Design and Basic SQL in PostgreSQL. We will be using a sample data, Musical Track Database. In order to use SQL language to query and manipulate data, we will walk through a process from creating a database, a user, to load data (in this example, a comma-separated-values (CSV) file) into the PostfreSQL system and learn some function about data manipulation.

Data source

An iTunes library: Musical Track Database

Server/Super user and Client

We will create a database and user as a super user and switch to the client mode to load the data into the database. For more concept about server and client, check on my previous post!

And now let’s get started!

Go to the “Terminal” on your computer. First, we check the location we are at by typing pwd

We are going to download the CSV file later to this directory. If you want to change your location, simply type cd with the location you want. For example, change to the “Downloads” directory.

1: Download the CSV file

Use either of the command below to download the file into the directory

wget https://www.pg4e.com/tools/sql/library.csv
curl -O https://www.pg4e.com/tools/sql/library.csv

Check if the file is successfully downloaded. (I changed my directory back to “chenamber”)

2: Create a user and a database

Initiate the super user mode

psql -U postgres

Create a user and a database with the user as the owner

CREATE USER pg4e_4c0b06f09f WITH PASSWORD ‘pg4e_p_61356298951eb7f’;
CREATE DATABASE pg4e_4c0b06f09f WITH OWNER 'pg4e_4c0b06f09f';

Now you can see if the database is created by typing \l

3: Access to the database as a user

Quit the super user mode and access the database as a user.

\q
psql -U pg4e_4c0b06f09f pg4e_4c0b06f09f

Type in the password “pg4e_p_61356298951eb7f” we just used for the user creation.

4: Create a table

To use SQL for data manipulation, we have to load our data into the system. Let’s look at the CSV file first. There are 6 columns and we will name them “title”, “artist”, “album”, “count”, “rating”, and “len” .

While creating these variables, we also have to tell the system the data types for each variable. For example, “title” is a TEXT variable and rating is an INTEGER variable. For more information about data types, you can visit this website for a comprehensive introduction. I also create a cheatsheet for some commonly used data types.

Here, we first create a table with the name “track_raw” and put the variables and their datatypes.

CREATE TABLE track_raw
(title TEXT, artist TEXT, album TEXT,
count INTEGER, rating INTEGER, len INTEGER);

Let’s see if the table is created in the database by the command \dt

So, yes, now we have one table in our database. This table is empty since we have not input any data in it yet. Now let’s copy the CSV data into this table.

\copy track_raw(title,artist,album,count,rating,len) FROM 'library.csv' WITH DELIMITER ',' CSV;

Now let’s check if the data have been loaded by counting the rows in the table.

SELECT COUNT(*) FROM track_raw;

Yes! We have 296 rows/records in the table, which is the same number of rows in the CSV file. Now we can write some functions to query data.

5: Some functions

All the functions introduced here use SELECT to view the data. Most of the time, we want to read data with some conditions. Here, we will introduce some functions you can use to extract data of your interest.

For example, use ORDER BYto sort variables and use LIMIT to show the first n rows. For ORDER BY you can add DESC or ASC to sort in a descending/ascending way.

SELECT * FROM track_raw ORDER BY title ASC LIMIT 10;

Next, we want to show songs by ratings. As in this dataset there are many null values in rating, we use ORDER BY with ASC and we get this result.

SELECT * FROM track_raw ORDER BY rating ASC LIMIT 5;

And now we want to only see songs with rating as 100.We use OFFSET n to skip the first n rows.

SELECT * FROM track_raw ORDER BY rating ASC OFFSET 2 LIMIT 5;

Select songs for one artist.

SELECT * FROM track_raw WHERE artist='Johnny Cash';

Select songs whose titles contains “Love”

SELECT * FROM track_raw WHERE title LIKE '%Love%';

The symbol “%” means that we want to find titles where “Love” is in anywhere of the titles.

If we change %Love% to Love% ,then we only get titles start with “Love”.

--

--