Video Game Sales Simple Analysis


Data Import

The dataset was downloaded from kaggle.com.

Data Overview

There are 11 variables and 16,598 observations.

The variables are:

  • Rank: Ranking of overall sales
  • Name: The game name
  • Platform: Platform of the game's release (e.g. PC, PS4)
  • Year: Year of the game’s release
  • Genre: Genre of the game
  • Publisher: Publisher of the game
  • NA_Sales: Sales in North America (in millions)
  • EU_Sales: Sales in Europe (in millions)
  • JP_Sales: Sales in Japan (in millions)
  • Other_Sales: Sales in the rest of the world (in millions)
  • Global_Sales: Total worldwide sales (in millions)

(Sales figures are numbers of copies sold.)
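Since Global_Sales is described as the worldwide total, the four regional columns should add up to it, allowing for each figure being rounded to two decimals. The base-R sketch below checks this on a small hypothetical data frame that reuses the dataset's column names; the values are invented for illustration, and the 0.02 tolerance is an assumed allowance for rounding.

```r
# Hypothetical toy rows using the vgsales column names (values invented).
toy <- data.frame(
  Name         = c("Game A", "Game B", "Game C"),
  NA_Sales     = c(1.20, 0.30, 0.00),
  EU_Sales     = c(0.80, 0.10, 0.02),
  JP_Sales     = c(0.05, 0.00, 0.15),
  Other_Sales  = c(0.20, 0.03, 0.00),
  Global_Sales = c(2.25, 0.43, 0.17),
  stringsAsFactors = FALSE
)

regions <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")

# Sum the regional columns row by row and compare with the reported total.
regional_total <- rowSums(toy[regions])
gap <- abs(regional_total - toy$Global_Sales)

# Flag rows whose gap exceeds an assumed rounding tolerance of 0.02 million.
toy$consistent <- gap <= 0.02
print(toy[c("Name", "consistent")])
```

On the real data, any rows flagged FALSE would be worth inspecting before aggregating by region.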

Below are the summary statistics for the sales variables:

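# Packages used throughout (plyr only needs to be installed; it is called as plyr::mapvalues)
library(dplyr); library(tidyr); library(stringr)
library(ggplot2); library(ggrepel); library(pander)

vgsales %>% 
  select(dplyr::contains("_Sales")) %>% 
  summary() %>% 
  pandoc.table(style = "simple")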
## 
## 
##    NA_Sales        EU_Sales         JP_Sales       Other_Sales     Global_Sales  
## --------------- --------------- ---------------- ---------------- ---------------
## Min.   : 0.0000 Min.   : 0.0000 Min.   : 0.00000 Min.   : 0.00000 Min.   : 0.0100
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00000 1st Qu.: 0.00000 1st Qu.: 0.0600
## Median : 0.0800 Median : 0.0200 Median : 0.00000 Median : 0.01000 Median : 0.1700
## Mean   : 0.2647 Mean   : 0.1467 Mean   : 0.07778 Mean   : 0.04806 Mean   : 0.5374
## 3rd Qu.: 0.2400 3rd Qu.: 0.1100 3rd Qu.: 0.04000 3rd Qu.: 0.04000 3rd Qu.: 0.4700
## Max.   :41.4900 Max.   :29.0200 Max.   :10.22000 Max.   :10.57000 Max.   :82.7400
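The same kind of summary can be produced without dplyr or pander by selecting the columns whose names end in "_Sales" and passing them to base R's summary(). A minimal sketch, using a hypothetical stand-in for vgsales with invented values and the real column names:

```r
# Hypothetical stand-in for vgsales; values are invented for illustration.
vg <- data.frame(
  Rank         = 1:4,
  NA_Sales     = c(0.01, 0.08, 0.24, 1.50),
  EU_Sales     = c(0.00, 0.02, 0.11, 0.90),
  JP_Sales     = c(0.00, 0.00, 0.04, 0.30),
  Other_Sales  = c(0.00, 0.01, 0.04, 0.20),
  Global_Sales = c(0.01, 0.11, 0.43, 2.90)
)

# Keep only the sales columns, mirroring select(contains("_Sales")).
sales_cols <- grep("_Sales$", names(vg), value = TRUE)

# One column per variable, one row per statistic (Min., quartiles, Median, Mean, Max.).
stats <- sapply(vg[sales_cols], summary)
print(stats)
```

Rank is numeric too, which is why the pattern match on the column names matters.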

Data Visualizations

1. Genre Popularity over the Years

vgsales %>% 
  ggplot(aes(x = Year, y = Global_Sales, fill = Genre)) +
  geom_bar(stat = "identity") +
  labs(title = "Genre Popularity over Years", x = "Year", y = "Global Sales (in million copies)") +
  scale_x_discrete(breaks = as.character(seq(from = 1980, to = 2016, by = 4))) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Set3")

[Figure: Genre Popularity over Years]

Action appears to have been the dominant genre since 2000, with Sports usually the second most popular. Global sales peaked around 2008 and then dropped sharply.
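Each stacked bar is the sum of Global_Sales within each genre for one year. To check a reading such as "Action has led since 2000" numerically rather than by eye, one can tabulate those sums and take the largest genre per year. A base-R sketch on hypothetical toy rows (genres and values invented; Year kept as text, which the "N/A" entries in the data suggest):

```r
# Hypothetical toy rows (values invented); Year stored as text.
toy <- data.frame(
  Year         = c("2007", "2007", "2007", "2008", "2008", "2008"),
  Genre        = c("Action", "Sports", "Sports", "Action", "Action", "Shooter"),
  Global_Sales = c(3.0, 2.0, 1.5, 4.0, 0.5, 2.5),
  stringsAsFactors = FALSE
)

# Year x Genre matrix of summed global sales (the stacked segment heights).
by_year_genre <- tapply(toy$Global_Sales, list(toy$Year, toy$Genre), sum)
by_year_genre[is.na(by_year_genre)] <- 0  # empty Year/Genre cells become 0

# Genre with the largest total in each year.
top_genre <- colnames(by_year_genre)[apply(by_year_genre, 1, which.max)]
names(top_genre) <- rownames(by_year_genre)
print(top_genre)
```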

2. Genre Sales by Regions

vgsales %>% 
  group_by(Genre) %>% 
  summarise_if(is.numeric, sum) %>% 
  ungroup() %>% 
  gather(Region, Sales, NA_Sales:Global_Sales) %>% 
  mutate(Region = str_replace(Region, "_Sales$", "") %>% 
           factor(levels = c("NA", "EU", "JP", "Other", "Global"),
                  labels = c("North America", "Europe", "Japan", "Other", "Global"))) %>% 
  ggplot(aes(x = Region, y = Genre)) +
  geom_tile(aes(fill = Sales)) +
  labs(title = "Genre Sales by Regions", x = "Region", y = "Genre") +
  scale_fill_continuous(name = "Sales\n(in million copies)") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5), legend.title.align = 0)

[Figure: Genre Sales by Regions]

The sales numbers are summed over all years to remove the time dimension. The graph shows that players in North America and Europe have very similar tastes: both prefer Action, Sports, Shooter, and Platform games over other genres, although in absolute terms more games are sold in North America than in Europe.

In Japan, however, the most popular genre is Role-Playing, possibly because of cultural differences.

Globally, Action and Sports are the most popular genres.
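The heat map is built from genre totals per region with the years summed away, so the leading genre in each region can also be read off directly. A base-R sketch of that aggregation, on hypothetical toy rows with invented values:

```r
# Hypothetical toy rows with the vgsales regional columns (values invented).
toy <- data.frame(
  Genre       = c("Action", "Action", "Role-Playing", "Sports"),
  NA_Sales    = c(2.0, 1.0, 0.5, 2.5),
  EU_Sales    = c(1.5, 0.8, 0.3, 1.2),
  JP_Sales    = c(0.1, 0.2, 1.9, 0.1),
  Other_Sales = c(0.4, 0.2, 0.1, 0.3),
  stringsAsFactors = FALSE
)
toy$Global_Sales <- rowSums(toy[c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")])

regions <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales")

# Collapse the time dimension: one row per genre, summed over all games.
genre_totals <- aggregate(toy[regions], by = list(Genre = toy$Genre), FUN = sum)

# Leading genre in each region (one column of the heat map each).
leader <- sapply(regions, function(r) {
  as.character(genre_totals$Genre[which.max(genre_totals[[r]])])
})
print(leader)
```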

3. Global Sales by Genre of the Top 10 Publishers

The Top 10 publishers by cumulative global sales across all years are:

top10 <- vgsales %>% 
  group_by(Publisher) %>% 
  summarise(Cum_Sales = sum(Global_Sales)) %>% 
  arrange(desc(Cum_Sales)) %>% 
  top_n(10, wt = Cum_Sales)
top10 %>% pandoc.table(style = "rmarkdown")
## 
## 
## |          Publisher           |  Cum_Sales  |
## |:----------------------------:|:-----------:|
## |           Nintendo           |   1786.56   |
## |       Electronic Arts        |   1110.32   |
## |          Activision          |   727.46    |
## | Sony Computer Entertainment  |   607.50    |
## |           Ubisoft            |   474.72    |
## |     Take-Two Interactive     |   399.54    |
## |             THQ              |   340.77    |
## | Konami Digital Entertainment |   283.64    |
## |             Sega             |   272.99    |
## |      Namco Bandai Games      |   254.09    |
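Ranking publishers by cumulative sales is a grouped sum followed by a descending sort. Note that dplyr's top_n() keeps ties, so it can return more than n rows. A base-R sketch of the same ranking on hypothetical toy rows (publisher names and figures invented), where head() cuts at exactly n:

```r
# Hypothetical toy rows: publisher names and sales are invented.
toy <- data.frame(
  Publisher    = c("Pub A", "Pub B", "Pub A", "Pub C", "Pub B", "Pub D"),
  Global_Sales = c(5.0, 3.0, 2.5, 4.0, 0.5, 1.0),
  stringsAsFactors = FALSE
)

# Cumulative global sales per publisher, largest first.
cum_sales <- sort(tapply(toy$Global_Sales, toy$Publisher, sum), decreasing = TRUE)

# Keep the top n (3 here instead of 10). Unlike dplyr::top_n(), head() stops at
# exactly n entries and silently drops any publisher tied with the last one.
n <- 3
top <- head(cum_sales, n)
print(top)
```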
publishers <- top10$Publisher
vgsales %>% 
  filter(Publisher %in% publishers) %>% 
  ggplot(aes(x = Genre, y = Global_Sales)) +
  geom_bar(stat = "identity", aes(fill = Publisher)) +
  coord_flip() +
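  # labs() names aesthetics, not positions: after coord_flip(), y is the horizontal axis
  labs(title = "Global Sales of the Top 10 Publishers by Genre", x = "Genre", y = "Cumulative Global Sales (in million copies)") +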
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Set3")

[Figure: global sales by genre for the Top 10 publishers]

Within the Top 10 publishers, Sports games have the greatest global sales, followed by Action games. Electronic Arts dominates Sports, while Take-Two Interactive is the No. 1 Action publisher. Puzzle, Role-Playing, and Platform games do not have the largest sales, but Nintendo leads all three, and Shooter games are Activision's territory.
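Statements such as "Electronic Arts dominates Sports" compare, within each genre, the summed sales of each publisher. A base-R sketch that finds the leading publisher per genre with xtabs(), on hypothetical toy rows (names and values invented):

```r
# Hypothetical toy rows: publishers, genres and sales are invented.
toy <- data.frame(
  Publisher    = c("Pub A", "Pub B", "Pub A", "Pub B", "Pub C"),
  Genre        = c("Sports", "Sports", "Puzzle", "Shooter", "Shooter"),
  Global_Sales = c(4.0, 1.0, 2.0, 3.0, 3.5),
  stringsAsFactors = FALSE
)

# Genre x Publisher table of summed global sales (missing combinations are 0).
tab <- xtabs(Global_Sales ~ Genre + Publisher, data = toy)

# For each genre (row), the publisher with the largest summed sales.
leader <- colnames(tab)[apply(tab, 1, which.max)]
names(leader) <- rownames(tab)
print(leader)
```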

4. Top 5 Publishers (Global Sales) over the Decades

The graph below shows how the Top 5 publishers have changed over time. Note that the data for the 2010s are incomplete, as we are still in the middle of the decade.

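top5bydecades <- vgsales %>% 
  # Year is not numeric (unknown years are "N/A"), so convert via character;
  # right = FALSE gives [1980, 1990), [1990, 2000), ... to match the labels
  mutate(decade = cut(as.numeric(as.character(Year)), breaks = c(1980, 1990, 2000, 2010, 2020), 
                      labels = c("1980s", "1990s", "2000s", "2010s"),
                      right = FALSE, ordered_result = TRUE)) %>% 
  filter(!is.na(decade)) %>% 
  group_by(decade, Publisher) %>% 
  summarise(Cum_Sales = sum(Global_Sales)) %>%  # result is still grouped by decade
  arrange(decade, desc(Cum_Sales)) %>% 
  top_n(5, wt = Cum_Sales)                      # top 5 within each decade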
top5bydecades %>% 
  ggplot(aes(x = decade, y = Cum_Sales)) +
  geom_label_repel(aes(label = Publisher), label.size = 0.2) +
  labs(title = "Top 5 Publishers over the Decades", x = "Decade", y = "Cumulative Sales") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5))

[Figure: Top 5 Publishers over the Decades]

Remarkably, Nintendo is the only publisher that has been in the Top 5 in all four decades. Electronic Arts, founded in 1982, has climbed the list step by step and, as of 2016, is the No. 1 game publisher of the current decade.

The gaming industry also seems to have peaked in the 2000s: even allowing for the current decade being only about half over, the gap between the 2000s and the 2010s is large.
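Binning years into decades with cut() depends on its right argument. By default intervals are closed on the right, so breaks of c(1980, 1990, ...) would drop 1980 entirely and put 1990 into the "1980s" bin; right = FALSE gives left-closed decades such as [1980, 1990). The sketch below shows both behaviors on a few sample years, then takes the top publishers per decade on hypothetical toy rows (names and values invented; k = 2 instead of 5 to keep it small).

```r
years  <- c(1980, 1989, 1990, 2009, 2010, 2016)
breaks <- c(1980, 1990, 2000, 2010, 2020)
labels <- c("1980s", "1990s", "2000s", "2010s")

# Default right = TRUE: intervals are (1980, 1990], so 1980 becomes NA
# and 1990 is labeled "1980s".
default_bins <- cut(years, breaks = breaks, labels = labels)

# right = FALSE: intervals are [1980, 1990), matching the decade labels.
decade <- cut(years, breaks = breaks, labels = labels, right = FALSE)
print(data.frame(years, default_bins, decade))

# Top publishers per decade on hypothetical toy rows (invented values).
toy <- data.frame(
  Year         = c(1985, 1988, 1995, 1999, 1999),
  Publisher    = c("Pub A", "Pub B", "Pub A", "Pub B", "Pub C"),
  Global_Sales = c(2.0, 3.0, 1.0, 4.0, 2.5),
  stringsAsFactors = FALSE
)
toy$decade <- cut(toy$Year, breaks = breaks, labels = labels, right = FALSE)

# Sum per decade and publisher, then keep the top k within each decade.
k <- 2
sums <- aggregate(Global_Sales ~ decade + Publisher, data = toy, FUN = sum)
top_by_decade <- lapply(split(sums, sums$decade, drop = TRUE), function(d) {
  head(d[order(-d$Global_Sales), c("Publisher", "Global_Sales")], k)
})
print(top_by_decade)
```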

5. XBOX vs PlayStation

xbox_ps <- vgsales %>% 
  filter(Platform %in% c("XB", "X360", "XOne", "PS", str_c("PS", 2:4))) %>% 
  mutate(Platform = plyr::mapvalues(Platform, from = c("XB", "X360", "XOne", "PS", str_c("PS", 2:4)),
                                    to = c(str_c("XBOX", c("", " 360", " One")), str_c("PlayStation ", 1:4)))) %>% 
  group_by(Year, Platform) %>% 
  summarise_if(is.numeric, sum) %>% 
  ungroup() %>% 
  gather(Region, Sales, NA_Sales:Global_Sales) %>% 
  mutate(Region = str_replace(Region, "_Sales$", "") %>% 
           factor(levels = c("NA", "EU", "JP", "Other", "Global"),
                  labels = c("North America", "Europe", "Japan", "Other", "Global"))) %>% 
  filter(!str_detect(Year, "N/A|2017"))
 
xbox_ps %>% 
  ggplot(aes(x = Year, y = Sales)) +
  geom_bar(aes(fill = Platform), stat = "identity") +
  facet_grid(Region~., scales = "free") +
  theme_minimal() +
  scale_fill_manual(values = c("PlayStation 1" = "#F7E8C3", "PlayStation 2" = "#EF7079", "PlayStation 3" = "#BF382A", "PlayStation 4" = "#683531", 
                               "XBOX" = "#D5EEFF", "XBOX 360" = "#007CB9", "XBOX One" = "#005689"))

[Figure: XBOX vs PlayStation game sales by year, faceted by region]

As two of the most famous gaming consoles, XBOX and PlayStation have been competing against each other for over a decade.

From the graph above, PlayStation holds an outright monopoly in Japan, the brand's home market, while XBOX has grown considerably in popularity worldwide since its release in 2001. In North America, XBOX and PlayStation appear to split the market roughly equally, but everywhere else PlayStation has the larger share.

Although the two were introduced at around the same time, PlayStation 4 tends to outperform XBOX One in game sales in every market.
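The recoding step in the code above uses plyr::mapvalues(); in base R a named lookup vector does the same job. Grouping each console into its family (a hypothetical helper column here) also makes a claim like "PlayStation dominates Japan" easy to express as a share of regional sales. A sketch on hypothetical toy rows with invented values:

```r
# Lookup from the dataset's platform codes to display names.
platform_names <- c(XB = "XBOX", X360 = "XBOX 360", XOne = "XBOX One",
                    PS = "PlayStation 1", PS2 = "PlayStation 2",
                    PS3 = "PlayStation 3", PS4 = "PlayStation 4")

# Hypothetical toy rows (values invented).
toy <- data.frame(
  Platform = c("PS2", "XB", "PS4", "XOne", "Wii"),
  NA_Sales = c(2.0, 1.5, 1.0, 0.8, 3.0),
  JP_Sales = c(0.9, 0.0, 0.3, 0.0, 1.0),
  stringsAsFactors = FALSE
)

# Keep only XBOX and PlayStation rows, then recode via the lookup vector.
toy <- toy[toy$Platform %in% names(platform_names), ]
toy$Platform <- unname(platform_names[toy$Platform])
toy$Family <- ifelse(startsWith(toy$Platform, "XBOX"), "XBOX", "PlayStation")

# Each family's share of combined XBOX + PlayStation sales, per region.
totals <- rowsum(as.matrix(toy[c("NA_Sales", "JP_Sales")]), toy$Family)
shares <- sweep(totals, 2, colSums(totals), "/")
print(round(shares, 2))
```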


Original Creation Date: December 04, 2016
