Apple Music Dataset Analysis

Objectives
Dataset & Methodology
Research & Analysis
Conclusion

This repository contains the analysis of a Kaggle dataset on Apple Music.

Objectives

How have music genres evolved over time in terms of the number of tracks released?
How does the average price of explicit tracks compare across different music genres?
Can we observe any historical shifts in consumer preferences based on the number of tracks released?
Is there a relationship between the price of a track and its duration within each music genre?
Are there genres where the length of tracks in collections tends to be longer or shorter?
How do the prices of individual tracks compare to those within collections?
How does the number of tracks in a collection relate to the average track price within each genre?
Which tracks stand out as outliers in duration among the top 5 artists’ tracks?

Dataset & Methodology

The dataset provides information about tracks, collections, artists, genres, and other attributes related to Apple Music.

While the dataset is available as a CSV file, in a real-life scenario the data would typically be stored in a database such as PostgreSQL. I therefore performed the analysis with SQL queries that extract the relevant information and answer the research questions. This approach should scale to larger datasets and be straightforward to integrate into a production environment.

The SQL table preview corresponding to the dataset is as follows:

artistId	artistName	collectionCensoredName	collectionId	collectionName	collectionPrice	country	currency	discCount	discNumber	isStreamable	kind	previewUrl	primaryGenreName	releaseDate	trackCensoredName	trackCount	trackExplicitness	trackId	trackName	trackNumber	trackPrice	trackTimeMillis
219350813	The Neighbourhood	I Love You.	635016635	I Love You.	9.99	USA	USD	1	1	TRUE	song	Preview	Alternative	2013-04-22 12:30:00+05:30	Float	11	notExplicit	635016647	Float	11	1.29	261200
4218340	Israel Kamakawiwo’ole	Wonderful World	258387384	Wonderful World	11.99	USA	USD	1	1	TRUE	song	Preview	Worldwide	2001-09-25 17:30:00+05:30	Wonderful World	12	notExplicit	258387389	Wonderful World	1	0.99	270667
396754057	One Direction	Midnight Memories (Deluxe Edition)	695318295	Midnight Memories (Deluxe Edition)	14.99	USA	USD	2	1	TRUE	song	Preview	Pop	2013-11-25 13:30:00+05:30	Midnight Memories	18	notExplicit	695318304	Midnight Memories	4	1.29	176320
28721078	Sia	1000 Forms of Fear	882945378	1000 Forms of Fear	9.99	USA	USD	1	1	TRUE	song	Preview	Pop	2014-07-04 12:30:00+05:30	Cellophane	12	notExplicit	882945396	Cellophane	11	1.29	265587
80456331	Panic! At the Disco	Pretty. Odd. (Deluxe Version)	275965231	Pretty. Odd. (Deluxe Version)	12.99	USA	USD	1	1	TRUE	song	Preview	Alternative	2008-03-25 12:30:00+05:30	Northern Downpour	18	notExplicit	275965263	Northern Downpour	7	1.29	247773

DDL of the table:

CREATE TABLE apple_music_dataset (
    "artistId" integer,
    "artistName" text,
    "collectionCensoredName" text,
    "collectionId" integer,
    "collectionName" text,
    "collectionPrice" numeric,
    "contentAdvisoryRating" character varying,
    country text,
    currency text,
    "discCount" integer,
    "discNumber" integer,
    "isStreamable" character varying,
    kind text,
    "previewUrl" text,
    "primaryGenreName" character varying,
    "releaseDate" timestamp with time zone,
    "trackCensoredName" text,
    "trackCount" integer,
    "trackExplicitness" character varying,
    "trackId" integer,
    "trackName" text,
    "trackNumber" integer,
    "trackPrice" numeric,
    "trackTimeMillis" integer
);
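Because the camelCase columns in the DDL are created with double quotes, PostgreSQL treats them as case-sensitive, so queries have to quote them as well. The following is a minimal sketch of such a query, not one of the queries from the analysis. It assumes an in-memory SQLite database as a stand-in for the PostgreSQL server, creates only a subset of the columns, and uses the five rows from the table preview above as sample data.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; only a subset of columns is created.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE apple_music_dataset ('
    '"artistName" TEXT, "primaryGenreName" TEXT, "trackName" TEXT, "trackTimeMillis" INTEGER)'
)

# The five rows from the table preview above.
preview_rows = [
    ("The Neighbourhood", "Alternative", "Float", 261200),
    ("Israel Kamakawiwo’ole", "Worldwide", "Wonderful World", 270667),
    ("One Direction", "Pop", "Midnight Memories", 176320),
    ("Sia", "Pop", "Cellophane", 265587),
    ("Panic! At the Disco", "Alternative", "Northern Downpour", 247773),
]
conn.executemany("INSERT INTO apple_music_dataset VALUES (?, ?, ?, ?)", preview_rows)

# Quoted identifiers keep the camelCase column names intact.
query = """
    SELECT "primaryGenreName",
           COUNT(*) AS track_count,
           ROUND(AVG("trackTimeMillis") / 60000.0, 2) AS avg_minutes
    FROM apple_music_dataset
    GROUP BY "primaryGenreName"
    ORDER BY "primaryGenreName"
"""
genre_stats = conn.execute(query).fetchall()
for genre, track_count, avg_minutes in genre_stats:
    print(f"{genre}: {track_count} track(s), {avg_minutes} min on average")
```

Five rows are far too few to draw conclusions from; the sketch only illustrates the quoting and the conversion of trackTimeMillis to minutes.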

All queries were written for PostgreSQL. The queries used to answer each research question are provided in the Research & Analysis section below.

Query results were loaded from SQL cursors into pandas DataFrames for visualization, and the charts were created with Matplotlib and Seaborn.
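The cursor-to-DataFrame step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the repository: the helper names fetch_columns_and_rows and to_dataframe are hypothetical, and an in-memory SQLite connection stands in for the PostgreSQL connection so the snippet runs without a server. Any DB-API cursor (for example one from a PostgreSQL driver) exposes the same execute, description and fetchall members used here.

```python
import sqlite3


def fetch_columns_and_rows(cursor, query):
    """Run a query on a DB-API cursor and return (column names, rows)."""
    cursor.execute(query)
    # cursor.description has one entry per result column; the name is item 0.
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


def to_dataframe(cursor, query):
    """Build a pandas DataFrame from a query result (hypothetical helper)."""
    import pandas as pd  # imported here so the helper above works without pandas

    columns, rows = fetch_columns_and_rows(cursor, query)
    return pd.DataFrame(rows, columns=columns)


# Assumption: in-memory SQLite with a two-column table and made-up sample rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute('CREATE TABLE apple_music_dataset ("primaryGenreName" TEXT, "trackPrice" REAL)')
cur.executemany(
    "INSERT INTO apple_music_dataset VALUES (?, ?)",
    [("Pop", 1.29), ("Pop", 1.29), ("Alternative", 1.29), ("Worldwide", 0.99)],
)

QUERY = """
    SELECT "primaryGenreName" AS genre, COUNT(*) AS track_count
    FROM apple_music_dataset
    GROUP BY "primaryGenreName"
    ORDER BY track_count DESC, genre
"""
columns, rows = fetch_columns_and_rows(cur, QUERY)
```

With pandas installed, to_dataframe(cur, QUERY) returns a DataFrame with the columns genre and track_count, which can then be passed to Matplotlib or Seaborn for plotting.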

My local setup for achieving the above consisted of:

PostgreSQL server running on localhost on macOS Sonoma
Postico PostgreSQL client for querying the database and exploratory data analysis
JetBrains PyCharm IDE for writing and running Python code

Research & Analysis

The results of the analysis are summarized below. I have included the SQL queries that were used to generate the results, along with the final visualizations. In case you are interested in the Python code that was used to create the charts and visualizations, it is available in the GitHub repository.