搜尋感興趣的網誌

所有文章連結

2022年1月28日 星期五

Hotel Bookings Analytic – R 語法解析

Hotel Bookings Analytic – R 語法解析


上一篇我將Hotel Bookings透過R進行分析的文章放了上來,這篇我想分享一下開始分析時我的流程(英文好的大大這部分可以省去很多麻煩..)、遇到的疑難問題  ,以及我選擇的解決方案,也會針對裡面有用到的一些Function進行解析與使用時機,對於跟我一樣的新手數據分析師來說可以當作分析實作的參考    


若大大們有更好的建議,請不吝指教!!若有需要返回看Hotel Bookings的大大~

請點hotel bookings analytic – 案例分析報告

也可至kaggle觀看,或是想直接看code,請點Github

2022年1月26日 星期三

Hotel Bookings Analytic - 案例分析報告

數據分析研究 - hotel bookings


這是上完google data analytic 課程後自我進行的數據分析實例,放上來讓大家可以做個參考。

並沒有像網路上的神人分析的那麼徹底,很多還加上了機器學習的領域....讓我這個小小的數據分析新手自嘆不如,但多看多學還是好事~對於google data analytic課程感興趣的朋友,也放上相關的連結,感興趣的朋友可以先行查閱,網路上也有不少的資訊分享,至於自己進修數據分析課程的相關內容,整理好後也會逐步po出來

google課程連結 >> coursera - google data analyze


夜深了,還是趕緊進入主題比較好,免得肚子裡的深夜食堂開張又得去搜刮食物了。        

這次分析的主軸是兩家飯店,Resort HotelCity Hotel    

整個數據集包含了32列與119390行,說真的在Kaggle或是其他數據集取得的數據量體都不會太小,直接用excel打開的話一定跑不出全部的數據    

我是用RStudio寫Markdown報告,用VScode寫R編程,數據集就存回MySQL可以空閒時練習Query語法,至於為什麼要這麼麻煩呢,因為VScode與RStudio兩者都有我覺得好用的地方,當然還是會慢慢摸索找出個最好的整合辦法,只是現階段來說先以兩者的優點來進行分析與報告產出還是較方便,畢竟花不到一個月的時間上完Google Data Analyze課程,對於英文不擅長的我來說已經很吃力了,還要快速的整合這些..那乾脆先睡比較實在   

  

以下是我用R Markdown寫完的分析報告,我的Kggle上也有相同的報告書    

若大大們對於編程的內容有更好的建議,請不吝指教,分析的內容並不是全面性的,而是以飯店基本面的客流量、銷售管道、旅客重複與否性質做分析    

未來也可以後續再做進化,納入更多的因素,相信會有更深一層的分析報告。

---------------------------分隔線-----------------------------------

bookings_hotel_analytic_in_R

Rex_Li

2022/1/25

案例研究

透過兩家不同類型的飯店,Resort Hotel與City Hotel,進行分析彼此的基本情況、來源管道與旅客交互情形。

分析階段與流程

分為五個階段,快速連結如下 :

確定問題

1. 確定利益相關者

  • 飯店經營者。
  • 營運團隊。

2. 需要解決的問題

  • 飯店基本情況分析 : 包含預定取消率、預訂情形、總旅客量體。
  • 飯店銷售管道分析 : 不同來源市場的訂單量、不同來源市場的預定取消率。
  • 飯店客流量分析 : 月份預定時間分布情形、均客單價、假、平日客流量。
  • 旅客情形分析 : 是否為重複客人、預付相關情形。

3. 分析目標

  • 分析之結果如何套用於實際營運。
  • 分析之趨勢如何調整營銷策略。

數據來源、數據資料確認與檢視

  • 特定數據集出處引用 : Hotel booking demand From the paper: hotel booking demand datasets(Attribution 4.0 International (CC BY 4.0)),作者Jesse Mostipak。
  • 來源由kaggle上取得,檔案為可信任之公開數據集。
  • 數據內容由32cols與119390rows組成,包含飯店名稱、是否取消預定、預定期時間、細分來源市場等..
  • 數據存放 : MySQL、Kaggle
  • 編碼存放 : Kaggle、Github
  • 數據操作語言 : R
  • 數據分析報告格式 : R Markdown
  • IDE : RStudio、VScode

清洗、調整數據內容與紀錄

載入所需Packages

library(tidyverse)
library(dplyr)
library(tidyr)
library(here)
library(skimr)
library(janitor)
library(lubridate)
library(ggplot2)

CSV File Import,CSV檔案導入

# Data Set Locarion
setwd("D:/Github_version_file_R/data_set/data_frame_hotel")
# File Check
hotel_df <- read.csv("hotel_bookings.csv")

Check Data Value,檔案數據預覽

預覽數據結果 :  
    1. NA值 : 4個,位於children處。   
    2. NULL值 : 112593個,位於company處。   
    3. NULL值 : 16340個,位於agent處。 
    4. NUll值 : 488個,位於country處。    
    5. Undefined不明確紀錄值 : 1169個,位於meal處。
    6. Undefined不明確紀錄值 : 5個,位於distribution_channel處。
    7. 拼寫檢查 : 無錯誤。    
    8. 確認數據為119390行,32列。 
str(hotel_df)
## 'data.frame':    119390 obs. of  32 variables:
##  $ hotel                         : chr  "Resort Hotel" "Resort Hotel" "Resort Hotel" "Resort Hotel" ...
##  $ is_canceled                   : int  0 0 0 0 0 0 0 0 1 1 ...
##  $ lead_time                     : int  342 737 7 13 14 14 0 9 85 75 ...
##  $ arrival_date_year             : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ arrival_date_month            : chr  "July" "July" "July" "July" ...
##  $ arrival_date_week_number      : int  27 27 27 27 27 27 27 27 27 27 ...
##  $ arrival_date_day_of_month     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ stays_in_weekend_nights       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ stays_in_week_nights          : int  0 0 1 1 2 2 2 2 3 3 ...
##  $ adults                        : int  2 2 1 1 2 2 2 2 2 2 ...
##  $ children                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "BB" "BB" "BB" "BB" ...
##  $ country                       : chr  "PRT" "PRT" "GBR" "GBR" ...
##  $ market_segment                : chr  "Direct" "Direct" "Direct" "Corporate" ...
##  $ distribution_channel          : chr  "Direct" "Direct" "Direct" "Corporate" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "C" "C" "A" "A" ...
##  $ assigned_room_type            : chr  "C" "C" "C" "A" ...
##  $ booking_changes               : int  3 4 0 0 0 0 0 0 0 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
##  $ agent                         : chr  "NULL" "NULL" "NULL" "304" ...
##  $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient" "Transient" "Transient" ...
##  $ adr                           : num  0 0 75 75 98 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  0 0 0 0 1 1 0 1 1 0 ...
##  $ reservation_status            : chr  "Check-Out" "Check-Out" "Check-Out" "Check-Out" ...
##  $ reservation_status_date       : chr  "2015-07-01" "2015-07-01" "2015-07-02" "2015-07-02" ...
head(hotel_df)
##          hotel is_canceled lead_time arrival_date_year arrival_date_month
## 1 Resort Hotel           0       342              2015               July
## 2 Resort Hotel           0       737              2015               July
## 3 Resort Hotel           0         7              2015               July
## 4 Resort Hotel           0        13              2015               July
## 5 Resort Hotel           0        14              2015               July
## 6 Resort Hotel           0        14              2015               July
##   arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
## 1                       27                         1                       0
## 2                       27                         1                       0
## 3                       27                         1                       0
## 4                       27                         1                       0
## 5                       27                         1                       0
## 6                       27                         1                       0
##   stays_in_week_nights adults children babies meal country market_segment
## 1                    0      2        0      0   BB     PRT         Direct
## 2                    0      2        0      0   BB     PRT         Direct
## 3                    1      1        0      0   BB     GBR         Direct
## 4                    1      1        0      0   BB     GBR      Corporate
## 5                    2      2        0      0   BB     GBR      Online TA
## 6                    2      2        0      0   BB     GBR      Online TA
##   distribution_channel is_repeated_guest previous_cancellations
## 1               Direct                 0                      0
## 2               Direct                 0                      0
## 3               Direct                 0                      0
## 4            Corporate                 0                      0
## 5                TA/TO                 0                      0
## 6                TA/TO                 0                      0
##   previous_bookings_not_canceled reserved_room_type assigned_room_type
## 1                              0                  C                  C
## 2                              0                  C                  C
## 3                              0                  A                  C
## 4                              0                  A                  A
## 5                              0                  A                  A
## 6                              0                  A                  A
##   booking_changes deposit_type agent company days_in_waiting_list customer_type
## 1               3   No Deposit  NULL    NULL                    0     Transient
## 2               4   No Deposit  NULL    NULL                    0     Transient
## 3               0   No Deposit  NULL    NULL                    0     Transient
## 4               0   No Deposit   304    NULL                    0     Transient
## 5               0   No Deposit   240    NULL                    0     Transient
## 6               0   No Deposit   240    NULL                    0     Transient
##   adr required_car_parking_spaces total_of_special_requests reservation_status
## 1   0                           0                         0          Check-Out
## 2   0                           0                         0          Check-Out
## 3  75                           0                         0          Check-Out
## 4  75                           0                         0          Check-Out
## 5  98                           0                         1          Check-Out
## 6  98                           0                         1          Check-Out
##   reservation_status_date
## 1              2015-07-01
## 2              2015-07-01
## 3              2015-07-02
## 4              2015-07-02
## 5              2015-07-03
## 6              2015-07-03
colnames(hotel_df)
##  [1] "hotel"                          "is_canceled"                   
##  [3] "lead_time"                      "arrival_date_year"             
##  [5] "arrival_date_month"             "arrival_date_week_number"      
##  [7] "arrival_date_day_of_month"      "stays_in_weekend_nights"       
##  [9] "stays_in_week_nights"           "adults"                        
## [11] "children"                       "babies"                        
## [13] "meal"                           "country"                       
## [15] "market_segment"                 "distribution_channel"          
## [17] "is_repeated_guest"              "previous_cancellations"        
## [19] "previous_bookings_not_canceled" "reserved_room_type"            
## [21] "assigned_room_type"             "booking_changes"               
## [23] "deposit_type"                   "agent"                         
## [25] "company"                        "days_in_waiting_list"          
## [27] "customer_type"                  "adr"                           
## [29] "required_car_parking_spaces"    "total_of_special_requests"     
## [31] "reservation_status"             "reservation_status_date"
nrow(hotel_df) # 119390行
## [1] 119390
ncol(hotel_df) # 32列
## [1] 32
if (sum(is.na(hotel_df)) > 0) {
    print(sum(is.na(hotel_df)))
} else {
    print("not na")
}
## [1] 4
# check unique >> chr col
unique(hotel_df$hotel)
## [1] "Resort Hotel" "City Hotel"
unique(hotel_df$meal)
## [1] "BB"        "FB"        "HB"        "SC"        "Undefined"
unique(hotel_df$country)
##   [1] "PRT"  "GBR"  "USA"  "ESP"  "IRL"  "FRA"  "NULL" "ROU"  "NOR"  "OMN" 
##  [11] "ARG"  "POL"  "DEU"  "BEL"  "CHE"  "CN"   "GRC"  "ITA"  "NLD"  "DNK" 
##  [21] "RUS"  "SWE"  "AUS"  "EST"  "CZE"  "BRA"  "FIN"  "MOZ"  "BWA"  "LUX" 
##  [31] "SVN"  "ALB"  "IND"  "CHN"  "MEX"  "MAR"  "UKR"  "SMR"  "LVA"  "PRI" 
##  [41] "SRB"  "CHL"  "AUT"  "BLR"  "LTU"  "TUR"  "ZAF"  "AGO"  "ISR"  "CYM" 
##  [51] "ZMB"  "CPV"  "ZWE"  "DZA"  "KOR"  "CRI"  "HUN"  "ARE"  "TUN"  "JAM" 
##  [61] "HRV"  "HKG"  "IRN"  "GEO"  "AND"  "GIB"  "URY"  "JEY"  "CAF"  "CYP" 
##  [71] "COL"  "GGY"  "KWT"  "NGA"  "MDV"  "VEN"  "SVK"  "FJI"  "KAZ"  "PAK" 
##  [81] "IDN"  "LBN"  "PHL"  "SEN"  "SYC"  "AZE"  "BHR"  "NZL"  "THA"  "DOM" 
##  [91] "MKD"  "MYS"  "ARM"  "JPN"  "LKA"  "CUB"  "CMR"  "BIH"  "MUS"  "COM" 
## [101] "SUR"  "UGA"  "BGR"  "CIV"  "JOR"  "SYR"  "SGP"  "BDI"  "SAU"  "VNM" 
## [111] "PLW"  "QAT"  "EGY"  "PER"  "MLT"  "MWI"  "ECU"  "MDG"  "ISL"  "UZB" 
## [121] "NPL"  "BHS"  "MAC"  "TGO"  "TWN"  "DJI"  "STP"  "KNA"  "ETH"  "IRQ" 
## [131] "HND"  "RWA"  "KHM"  "MCO"  "BGD"  "IMN"  "TJK"  "NIC"  "BEN"  "VGB" 
## [141] "TZA"  "GAB"  "GHA"  "TMP"  "GLP"  "KEN"  "LIE"  "GNB"  "MNE"  "UMI" 
## [151] "MYT"  "FRO"  "MMR"  "PAN"  "BFA"  "LBY"  "MLI"  "NAM"  "BOL"  "PRY" 
## [161] "BRB"  "ABW"  "AIA"  "SLV"  "DMA"  "PYF"  "GUY"  "LCA"  "ATA"  "GTM" 
## [171] "ASM"  "MRT"  "NCL"  "KIR"  "SDN"  "ATF"  "SLE"  "LAO"
unique(hotel_df$market_segment)
## [1] "Direct"        "Corporate"     "Online TA"     "Offline TA/TO"
## [5] "Complementary" "Groups"        "Undefined"     "Aviation"
unique(hotel_df$distribution_channel)
## [1] "Direct"    "Corporate" "TA/TO"     "Undefined" "GDS"
unique(hotel_df$deposit_type)
## [1] "No Deposit" "Refundable" "Non Refund"
unique(hotel_df$reserved_room_type)
##  [1] "C" "A" "D" "E" "G" "F" "H" "L" "P" "B"
unique(hotel_df$assigned_room_type)
##  [1] "C" "A" "D" "E" "G" "F" "I" "B" "H" "P" "L" "K"
unique(hotel_df$agent)
##   [1] "NULL" "304"  "240"  "303"  "15"   "241"  "8"    "250"  "115"  "5"   
##  [11] "175"  "134"  "156"  "243"  "242"  "3"    "105"  "40"   "147"  "306" 
##  [21] "184"  "96"   "2"    "127"  "95"   "146"  "9"    "177"  "6"    "143" 
##  [31] "244"  "149"  "167"  "300"  "171"  "305"  "67"   "196"  "152"  "142" 
##  [41] "261"  "104"  "36"   "26"   "29"   "258"  "110"  "71"   "181"  "88"  
##  [51] "251"  "275"  "69"   "248"  "208"  "256"  "314"  "126"  "281"  "273" 
##  [61] "253"  "185"  "330"  "334"  "328"  "326"  "321"  "324"  "313"  "38"  
##  [71] "155"  "68"   "335"  "308"  "332"  "94"   "348"  "310"  "339"  "375" 
##  [81] "66"   "327"  "387"  "298"  "91"   "245"  "385"  "257"  "393"  "168" 
##  [91] "405"  "249"  "315"  "75"   "128"  "307"  "11"   "436"  "1"    "201" 
## [101] "183"  "223"  "368"  "336"  "291"  "464"  "411"  "481"  "10"   "154" 
## [111] "468"  "410"  "390"  "440"  "495"  "492"  "493"  "434"  "57"   "531" 
## [121] "420"  "483"  "526"  "472"  "429"  "16"   "446"  "34"   "78"   "139" 
## [131] "252"  "270"  "47"   "114"  "301"  "193"  "182"  "135"  "350"  "195" 
## [141] "352"  "355"  "159"  "363"  "384"  "360"  "331"  "367"  "64"   "406" 
## [151] "163"  "414"  "333"  "427"  "431"  "430"  "426"  "438"  "433"  "418" 
## [161] "441"  "282"  "432"  "72"   "450"  "180"  "454"  "455"  "59"   "451" 
## [171] "254"  "358"  "469"  "165"  "467"  "510"  "337"  "476"  "502"  "527" 
## [181] "479"  "508"  "535"  "302"  "497"  "187"  "13"   "7"    "27"   "14"  
## [191] "22"   "17"   "28"   "42"   "20"   "19"   "45"   "37"   "61"   "39"  
## [201] "21"   "24"   "41"   "50"   "30"   "54"   "52"   "12"   "44"   "31"  
## [211] "83"   "32"   "63"   "60"   "55"   "56"   "89"   "87"   "118"  "86"  
## [221] "85"   "210"  "214"  "129"  "179"  "138"  "174"  "170"  "153"  "93"  
## [231] "151"  "119"  "35"   "173"  "58"   "53"   "133"  "79"   "235"  "192" 
## [241] "191"  "236"  "162"  "215"  "157"  "287"  "132"  "234"  "98"   "77"  
## [251] "103"  "107"  "262"  "220"  "121"  "205"  "378"  "23"   "296"  "290" 
## [261] "229"  "33"   "286"  "276"  "425"  "484"  "323"  "403"  "219"  "394" 
## [271] "509"  "111"  "423"  "4"    "70"   "82"   "81"   "74"   "92"   "99"  
## [281] "90"   "112"  "117"  "106"  "148"  "158"  "144"  "211"  "213"  "216" 
## [291] "232"  "150"  "267"  "227"  "247"  "278"  "280"  "285"  "289"  "269" 
## [301] "295"  "265"  "288"  "122"  "294"  "325"  "341"  "344"  "346"  "359" 
## [311] "283"  "364"  "370"  "371"  "25"   "141"  "391"  "397"  "416"  "404" 
## [321] "299"  "197"  "73"   "354"  "444"  "408"  "461"  "388"  "453"  "459" 
## [331] "474"  "475"  "480"  "449"
unique(hotel_df$company)
##   [1] "NULL" "110"  "113"  "270"  "178"  "240"  "154"  "144"  "307"  "268" 
##  [11] "59"   "204"  "312"  "318"  "94"   "174"  "274"  "195"  "223"  "317" 
##  [21] "281"  "118"  "53"   "286"  "12"   "47"   "324"  "342"  "373"  "371" 
##  [31] "383"  "86"   "82"   "218"  "88"   "31"   "397"  "392"  "405"  "331" 
##  [41] "367"  "20"   "83"   "416"  "51"   "395"  "102"  "34"   "84"   "360" 
##  [51] "394"  "457"  "382"  "461"  "478"  "386"  "112"  "486"  "421"  "9"   
##  [61] "308"  "135"  "224"  "504"  "269"  "356"  "498"  "390"  "513"  "203" 
##  [71] "263"  "477"  "521"  "169"  "515"  "445"  "337"  "251"  "428"  "292" 
##  [81] "388"  "130"  "250"  "355"  "254"  "543"  "531"  "528"  "62"   "120" 
##  [91] "42"   "81"   "116"  "530"  "103"  "39"   "16"   "92"   "61"   "501" 
## [101] "165"  "291"  "290"  "43"   "325"  "192"  "108"  "200"  "465"  "287" 
## [111] "297"  "490"  "482"  "207"  "282"  "437"  "225"  "329"  "272"  "28"  
## [121] "77"   "338"  "72"   "246"  "319"  "146"  "159"  "380"  "323"  "511" 
## [131] "407"  "278"  "80"   "403"  "399"  "14"   "137"  "343"  "346"  "347" 
## [141] "349"  "289"  "351"  "353"  "54"   "99"   "358"  "361"  "362"  "366" 
## [151] "372"  "365"  "277"  "109"  "377"  "379"  "22"   "378"  "330"  "364" 
## [161] "401"  "232"  "255"  "384"  "167"  "212"  "514"  "391"  "400"  "376" 
## [171] "402"  "396"  "302"  "398"  "6"    "370"  "369"  "409"  "168"  "104" 
## [181] "408"  "413"  "148"  "10"   "333"  "419"  "415"  "424"  "425"  "423" 
## [191] "422"  "435"  "439"  "442"  "448"  "443"  "454"  "444"  "52"   "459" 
## [201] "458"  "456"  "460"  "447"  "470"  "466"  "484"  "184"  "485"  "32"  
## [211] "487"  "491"  "494"  "193"  "516"  "496"  "499"  "29"   "78"   "520" 
## [221] "507"  "506"  "512"  "126"  "64"   "242"  "518"  "523"  "539"  "534" 
## [231] "436"  "525"  "541"  "40"   "455"  "410"  "45"   "38"   "49"   "48"  
## [241] "67"   "68"   "65"   "91"   "37"   "8"    "179"  "209"  "219"  "221" 
## [251] "227"  "153"  "186"  "253"  "202"  "216"  "275"  "233"  "280"  "309" 
## [261] "321"  "93"   "316"  "85"   "107"  "350"  "279"  "334"  "348"  "150" 
## [271] "73"   "385"  "418"  "197"  "450"  "452"  "115"  "46"   "76"   "96"  
## [281] "100"  "105"  "101"  "122"  "11"   "139"  "142"  "127"  "143"  "140" 
## [291] "149"  "163"  "160"  "180"  "238"  "183"  "222"  "185"  "217"  "215" 
## [301] "213"  "237"  "230"  "234"  "35"   "245"  "158"  "258"  "259"  "260" 
## [311] "411"  "257"  "271"  "18"   "106"  "210"  "273"  "71"   "284"  "301" 
## [321] "305"  "293"  "264"  "311"  "304"  "313"  "288"  "320"  "314"  "332" 
## [331] "341"  "352"  "243"  "368"  "393"  "132"  "220"  "412"  "420"  "426" 
## [341] "417"  "429"  "433"  "446"  "357"  "479"  "483"  "489"  "229"  "481" 
## [351] "497"  "451"  "492"
unique(hotel_df$customer_type)
## [1] "Transient"       "Contract"        "Transient-Party" "Group"
unique(hotel_df$reservation_status)
## [1] "Check-Out" "Canceled"  "No-Show"
unique(hotel_df$is_repeated_guest)
## [1] 0 1

Fix Data,清洗數據、填補空值、調整數值

1. 修正NA值 :   
    -   children僅有4個NA,以0作為修復填充不影響整體數據分析。   
hotel_df[is.na(hotel_df)] <- 0
sum(is.na(hotel_df))
## [1] 0
2. 進行欄位屬性調整 :   
    -   將chr調整為factor,進行類別整併與確認欄位levels,方便後續進行分析作業。  
# 將chr轉為factor確認類別levels
hotel_df <- hotel_df %>%
    mutate(
        hotel = as.factor(hotel),
        is_canceled = as.factor(is_canceled),
        meal = as.factor(meal),
        country = as.factor(country),
        market_segment = as.factor(market_segment),
        reserved_room_type = as.factor(reserved_room_type),
        assigned_room_type = as.factor(assigned_room_type),
        deposit_type = as.factor(deposit_type),
        agent = as.factor(agent),
        company = as.factor(company),
        customer_type = as.factor(customer_type),
        reservation_status = as.factor(reservation_status),
        reservation_status_date = as.factor(reservation_status_date),
        is_repeated_guest = as.factor(is_repeated_guest)
    )

# 確認col levels
levels(hotel_df$agent) # null level >> 334
##   [1] "1"    "10"   "103"  "104"  "105"  "106"  "107"  "11"   "110"  "111" 
##  [11] "112"  "114"  "115"  "117"  "118"  "119"  "12"   "121"  "122"  "126" 
##  [21] "127"  "128"  "129"  "13"   "132"  "133"  "134"  "135"  "138"  "139" 
##  [31] "14"   "141"  "142"  "143"  "144"  "146"  "147"  "148"  "149"  "15"  
##  [41] "150"  "151"  "152"  "153"  "154"  "155"  "156"  "157"  "158"  "159" 
##  [51] "16"   "162"  "163"  "165"  "167"  "168"  "17"   "170"  "171"  "173" 
##  [61] "174"  "175"  "177"  "179"  "180"  "181"  "182"  "183"  "184"  "185" 
##  [71] "187"  "19"   "191"  "192"  "193"  "195"  "196"  "197"  "2"    "20"  
##  [81] "201"  "205"  "208"  "21"   "210"  "211"  "213"  "214"  "215"  "216" 
##  [91] "219"  "22"   "220"  "223"  "227"  "229"  "23"   "232"  "234"  "235" 
## [101] "236"  "24"   "240"  "241"  "242"  "243"  "244"  "245"  "247"  "248" 
## [111] "249"  "25"   "250"  "251"  "252"  "253"  "254"  "256"  "257"  "258" 
## [121] "26"   "261"  "262"  "265"  "267"  "269"  "27"   "270"  "273"  "275" 
## [131] "276"  "278"  "28"   "280"  "281"  "282"  "283"  "285"  "286"  "287" 
## [141] "288"  "289"  "29"   "290"  "291"  "294"  "295"  "296"  "298"  "299" 
## [151] "3"    "30"   "300"  "301"  "302"  "303"  "304"  "305"  "306"  "307" 
## [161] "308"  "31"   "310"  "313"  "314"  "315"  "32"   "321"  "323"  "324" 
## [171] "325"  "326"  "327"  "328"  "33"   "330"  "331"  "332"  "333"  "334" 
## [181] "335"  "336"  "337"  "339"  "34"   "341"  "344"  "346"  "348"  "35"  
## [191] "350"  "352"  "354"  "355"  "358"  "359"  "36"   "360"  "363"  "364" 
## [201] "367"  "368"  "37"   "370"  "371"  "375"  "378"  "38"   "384"  "385" 
## [211] "387"  "388"  "39"   "390"  "391"  "393"  "394"  "397"  "4"    "40"  
## [221] "403"  "404"  "405"  "406"  "408"  "41"   "410"  "411"  "414"  "416" 
## [231] "418"  "42"   "420"  "423"  "425"  "426"  "427"  "429"  "430"  "431" 
## [241] "432"  "433"  "434"  "436"  "438"  "44"   "440"  "441"  "444"  "446" 
## [251] "449"  "45"   "450"  "451"  "453"  "454"  "455"  "459"  "461"  "464" 
## [261] "467"  "468"  "469"  "47"   "472"  "474"  "475"  "476"  "479"  "480" 
## [271] "481"  "483"  "484"  "492"  "493"  "495"  "497"  "5"    "50"   "502" 
## [281] "508"  "509"  "510"  "52"   "526"  "527"  "53"   "531"  "535"  "54"  
## [291] "55"   "56"   "57"   "58"   "59"   "6"    "60"   "61"   "63"   "64"  
## [301] "66"   "67"   "68"   "69"   "7"    "70"   "71"   "72"   "73"   "74"  
## [311] "75"   "77"   "78"   "79"   "8"    "81"   "82"   "83"   "85"   "86"  
## [321] "87"   "88"   "89"   "9"    "90"   "91"   "92"   "93"   "94"   "95"  
## [331] "96"   "98"   "99"   "NULL"
levels(hotel_df$company) # null level >> 353
##   [1] "10"   "100"  "101"  "102"  "103"  "104"  "105"  "106"  "107"  "108" 
##  [11] "109"  "11"   "110"  "112"  "113"  "115"  "116"  "118"  "12"   "120" 
##  [21] "122"  "126"  "127"  "130"  "132"  "135"  "137"  "139"  "14"   "140" 
##  [31] "142"  "143"  "144"  "146"  "148"  "149"  "150"  "153"  "154"  "158" 
##  [41] "159"  "16"   "160"  "163"  "165"  "167"  "168"  "169"  "174"  "178" 
##  [51] "179"  "18"   "180"  "183"  "184"  "185"  "186"  "192"  "193"  "195" 
##  [61] "197"  "20"   "200"  "202"  "203"  "204"  "207"  "209"  "210"  "212" 
##  [71] "213"  "215"  "216"  "217"  "218"  "219"  "22"   "220"  "221"  "222" 
##  [81] "223"  "224"  "225"  "227"  "229"  "230"  "232"  "233"  "234"  "237" 
##  [91] "238"  "240"  "242"  "243"  "245"  "246"  "250"  "251"  "253"  "254" 
## [101] "255"  "257"  "258"  "259"  "260"  "263"  "264"  "268"  "269"  "270" 
## [111] "271"  "272"  "273"  "274"  "275"  "277"  "278"  "279"  "28"   "280" 
## [121] "281"  "282"  "284"  "286"  "287"  "288"  "289"  "29"   "290"  "291" 
## [131] "292"  "293"  "297"  "301"  "302"  "304"  "305"  "307"  "308"  "309" 
## [141] "31"   "311"  "312"  "313"  "314"  "316"  "317"  "318"  "319"  "32"  
## [151] "320"  "321"  "323"  "324"  "325"  "329"  "330"  "331"  "332"  "333" 
## [161] "334"  "337"  "338"  "34"   "341"  "342"  "343"  "346"  "347"  "348" 
## [171] "349"  "35"   "350"  "351"  "352"  "353"  "355"  "356"  "357"  "358" 
## [181] "360"  "361"  "362"  "364"  "365"  "366"  "367"  "368"  "369"  "37"  
## [191] "370"  "371"  "372"  "373"  "376"  "377"  "378"  "379"  "38"   "380" 
## [201] "382"  "383"  "384"  "385"  "386"  "388"  "39"   "390"  "391"  "392" 
## [211] "393"  "394"  "395"  "396"  "397"  "398"  "399"  "40"   "400"  "401" 
## [221] "402"  "403"  "405"  "407"  "408"  "409"  "410"  "411"  "412"  "413" 
## [231] "415"  "416"  "417"  "418"  "419"  "42"   "420"  "421"  "422"  "423" 
## [241] "424"  "425"  "426"  "428"  "429"  "43"   "433"  "435"  "436"  "437" 
## [251] "439"  "442"  "443"  "444"  "445"  "446"  "447"  "448"  "45"   "450" 
## [261] "451"  "452"  "454"  "455"  "456"  "457"  "458"  "459"  "46"   "460" 
## [271] "461"  "465"  "466"  "47"   "470"  "477"  "478"  "479"  "48"   "481" 
## [281] "482"  "483"  "484"  "485"  "486"  "487"  "489"  "49"   "490"  "491" 
## [291] "492"  "494"  "496"  "497"  "498"  "499"  "501"  "504"  "506"  "507" 
## [301] "51"   "511"  "512"  "513"  "514"  "515"  "516"  "518"  "52"   "520" 
## [311] "521"  "523"  "525"  "528"  "53"   "530"  "531"  "534"  "539"  "54"  
## [321] "541"  "543"  "59"   "6"    "61"   "62"   "64"   "65"   "67"   "68"  
## [331] "71"   "72"   "73"   "76"   "77"   "78"   "8"    "80"   "81"   "82"  
## [341] "83"   "84"   "85"   "86"   "88"   "9"    "91"   "92"   "93"   "94"  
## [351] "96"   "99"   "NULL"
levels(hotel_df$is_canceled)
## [1] "0" "1"
levels(hotel_df$is_repeated_guest)
## [1] "0" "1"
#進行數據彙總預覽
summary(hotel_df)
##           hotel       is_canceled   lead_time   arrival_date_year
##  City Hotel  :79330   0:75166     Min.   :  0   Min.   :2015     
##  Resort Hotel:40060   1:44224     1st Qu.: 18   1st Qu.:2016     
##                                   Median : 69   Median :2016     
##                                   Mean   :104   Mean   :2016     
##                                   3rd Qu.:160   3rd Qu.:2017     
##                                   Max.   :737   Max.   :2017     
##                                                                  
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:119390      Min.   : 1.00            Min.   : 1.0             
##  Class :character   1st Qu.:16.00            1st Qu.: 8.0             
##  Mode  :character   Median :28.00            Median :16.0             
##                     Mean   :27.17            Mean   :15.8             
##                     3rd Qu.:38.00            3rd Qu.:23.0             
##                     Max.   :53.00            Max.   :31.0             
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults      
##  Min.   : 0.0000         Min.   : 0.0         Min.   : 0.000  
##  1st Qu.: 0.0000         1st Qu.: 1.0         1st Qu.: 2.000  
##  Median : 1.0000         Median : 2.0         Median : 2.000  
##  Mean   : 0.9276         Mean   : 2.5         Mean   : 1.856  
##  3rd Qu.: 2.0000         3rd Qu.: 3.0         3rd Qu.: 2.000  
##  Max.   :19.0000         Max.   :50.0         Max.   :55.000  
##                                                               
##     children           babies                 meal          country     
##  Min.   : 0.0000   Min.   : 0.000000   BB       :92310   PRT    :48590  
##  1st Qu.: 0.0000   1st Qu.: 0.000000   FB       :  798   GBR    :12129  
##  Median : 0.0000   Median : 0.000000   HB       :14463   FRA    :10415  
##  Mean   : 0.1039   Mean   : 0.007949   SC       :10650   ESP    : 8568  
##  3rd Qu.: 0.0000   3rd Qu.: 0.000000   Undefined: 1169   DEU    : 7287  
##  Max.   :10.0000   Max.   :10.000000                     ITA    : 3766  
##                                                          (Other):28635  
##        market_segment  distribution_channel is_repeated_guest
##  Online TA    :56477   Length:119390        0:115580         
##  Offline TA/TO:24219   Class :character     1:  3810         
##  Groups       :19811   Mode  :character                      
##  Direct       :12606                                         
##  Corporate    : 5295                                         
##  Complementary:  743                                         
##  (Other)      :  239                                         
##  previous_cancellations previous_bookings_not_canceled reserved_room_type
##  Min.   : 0.00000       Min.   : 0.0000                A      :85994     
##  1st Qu.: 0.00000       1st Qu.: 0.0000                D      :19201     
##  Median : 0.00000       Median : 0.0000                E      : 6535     
##  Mean   : 0.08712       Mean   : 0.1371                F      : 2897     
##  3rd Qu.: 0.00000       3rd Qu.: 0.0000                G      : 2094     
##  Max.   :26.00000       Max.   :72.0000                B      : 1118     
##                                                        (Other): 1551     
##  assigned_room_type booking_changes       deposit_type        agent      
##  A      :74053      Min.   : 0.0000   No Deposit:104641   9      :31961  
##  D      :25322      1st Qu.: 0.0000   Non Refund: 14587   NULL   :16340  
##  E      : 7806      Median : 0.0000   Refundable:   162   240    :13922  
##  F      : 3751      Mean   : 0.2211                       1      : 7191  
##  G      : 2553      3rd Qu.: 0.0000                       14     : 3640  
##  C      : 2375      Max.   :21.0000                       7      : 3539  
##  (Other): 3530                                            (Other):42797  
##     company       days_in_waiting_list         customer_type  
##  NULL   :112593   Min.   :  0.000      Contract       : 4076  
##  40     :   927   1st Qu.:  0.000      Group          :  577  
##  223    :   784   Median :  0.000      Transient      :89613  
##  67     :   267   Mean   :  2.321      Transient-Party:25124  
##  45     :   250   3rd Qu.:  0.000                             
##  153    :   215   Max.   :391.000                             
##  (Other):  4354                                               
##       adr          required_car_parking_spaces total_of_special_requests
##  Min.   :  -6.38   Min.   :0.00000             Min.   :0.0000           
##  1st Qu.:  69.29   1st Qu.:0.00000             1st Qu.:0.0000           
##  Median :  94.58   Median :0.00000             Median :0.0000           
##  Mean   : 101.83   Mean   :0.06252             Mean   :0.5714           
##  3rd Qu.: 126.00   3rd Qu.:0.00000             3rd Qu.:1.0000           
##  Max.   :5400.00   Max.   :8.00000             Max.   :5.0000           
##                                                                         
##  reservation_status reservation_status_date
##  Canceled :43017    2015-10-21:  1461      
##  Check-Out:75166    2015-07-06:   805      
##  No-Show  : 1207    2016-11-25:   790      
##                     2015-01-01:   763      
##                     2016-01-18:   625      
##                     2015-07-02:   469      
##                     (Other)   :114477
3. 進行NULL值與其他值調整 :     
    -   agent欄位原始NULL值不以填充值或刪除的方式處理,以免影響整體數據集數值分析,選擇將NULL值轉換為no_record(沒有紀錄),維持原始數據集的內容,僅做單純的變更紀錄。    
    -   company欄位NULL值處理方式與agent欄位相同。  
    -   is_canceled欄位調整紀錄模式,0調整為check_in,1為canceled,方便閱覽與分析。 
    -   is_repeated_guest欄位調整模式與is_canceled欄位相同。    
# null值轉換,agent、Company
null_agent <- which(levels(hotel_df$agent) == "NULL")
levels(hotel_df$agent)[null_agent] <- "no_record"

null_company <- which(levels(hotel_df$company) == "NULL")
levels(hotel_df$company)[null_company] <- "no_record"

# 數值調整
null_check <- which(levels(hotel_df$is_canceled) == "0")
levels(hotel_df$is_canceled)[null_check] <- "check_in"
null_canceled <- which(levels(hotel_df$is_canceled) == "1")
levels(hotel_df$is_canceled)[null_canceled] <- "canceled"

null_customer <- which(levels(hotel_df$is_repeated_guest) == "0")
levels(hotel_df$is_repeated_guest)[null_customer] <- "new_customer"
null_repeated <- which(levels(hotel_df$is_repeated_guest) == "1")
levels(hotel_df$is_repeated_guest)[null_repeated] <- "repeated_customer"
4.  增加欄位 : 
    -   增加顧客總數(包含嬰兒)、孩童總數(不包含嬰兒)、平均客單(不含嬰兒)。  
# 增加顧客總數、孩童總數、平均客單(小孩加大人,不含嬰兒)
hotel_df <- hotel_df %>%
    mutate(all_guest = adults + children + babies) %>%
    mutate(all_kids = children + babies) %>%
    mutate(avg_adr = (adr / (adults + children)))

數據可視化

1. 飯店基本情況

  • 飯店取消率
# 飯店取消率
ggplot(data = hotel_df) +
    geom_bar(mapping = aes(x = is_canceled, fill = hotel)) +
    scale_fill_manual(values = c("#ffd700", "steelblue")) +
    labs(title = "甚麼樣的原因促使客戶取消?", subtitle = "各飯店取消率") +
    xlab("是否取消") +
    ylab("數量計算")

  • 飯店預訂情形
# 飯店預訂情形
ggplot(data = hotel_df) +
    geom_bar(mapping = aes(x = reservation_status, fill = hotel)) +
    labs(title = "預定後實際入住情形?", subtitle = "各飯店預定關係圖") +
    scale_fill_manual(values = c("deeppink", "mediumseagreen")) +
    xlab("預訂後入住情形") +
    ylab("數量")

  • 總旅客量體
# 總客戶量 & 月份
ggplot(data = hotel_df) +
    geom_point(mapping = aes(x = arrival_date_month, y = all_guest, color = hotel, shape = hotel)) +
    labs(title = "不同飯店每月來客是否平均?", subtitle = "飯店總來客與月份關係") +
    xlab("月份") +
    ylab("旅客總數") +
    theme(axis.text.x = element_text(angle = 60)) +
    facet_wrap(~hotel)

2. 飯店銷售管道

  • 不同來源市場的訂單量
# 不同來源市場的訂單量
ggplot(data = hotel_df) +
    geom_point(mapping = aes(x = market_segment, y = all_guest, color = hotel, shape = hotel)) +
    labs(title = "客戶總量&來源", subtitle = "訂單來源分布情形") +
    xlab("旅客來源") +
    ylab("旅客總量") +
    theme(axis.text.x = element_text(angle = 60)) +
    facet_wrap(~hotel)

  • 不同來源市場的預定取消率
# 不同來源市場的取消率
ggplot(data = hotel_df) +
    geom_bar(mapping = aes(x = is_canceled, fill = market_segment)) +
    labs(title = "訂單增加時入住率是否提升?", subtitle = "來源市場間的取消率") +
    xlab("是否取消") +
    ylab("數量統計")

3. 飯店客流量

  • 月份預定時間分布情形
# 各月分預定時間分布
ggplot(data = hotel_df) +
    geom_boxplot(mapping = aes(x = arrival_date_month, y = lead_time, fill = hotel)) +
    labs(title = "預定時間與準備作業", subtitle = "訂房日期至到達日期間的時間") +
    scale_fill_manual(values = c("yellowgreen", "orange3")) +
    xlab("月份") +
    ylab("預定時間") +
    theme(axis.text.x = element_text(angle = 45))

  • 平均客單價
# 均客單價
ggplot(data = hotel_df) +
    geom_point(mapping = aes(x = arrival_date_month, y = avg_adr, color = hotel, shape = hotel)) +
    labs(title = "各月客單是否穩定?", subtitle = "平均客單價 / 月") +
    xlab("月份") +
    ylab("平均客單(不含嬰兒)") +
    theme(axis.text.x = element_text(angle = 45)) +
    facet_wrap(~hotel)

  • 假、平日客流量
# 假日家庭客流量
ggplot(data = hotel_df) +
    geom_point(mapping = aes(x = children, y = stays_in_weekend_nights, fill = children)) +
    labs(title = "周末住宿是否以孩子居多?", subtitle = "孩童總數與周末關係圖") +
    xlab("孩童總數") +
    ylab("周末住宿")

# 平日家庭客流量
ggplot(data = hotel_df) +
    geom_point(mapping = aes(x = children, y = stays_in_week_nights, fill = children)) +
    labs(title = "平日是否均為非家庭旅客?", subtitle = "平日入住與孩童總數關係圖") +
    xlab("孩童總數") +
    ylab("平日住宿")

4. 旅客情形

  • 是否為重複客人
# 是否重複客
ggplot(data = hotel_df) +
    geom_bar(mapping = aes(x = is_repeated_guest, fill = hotel)) +
    labs(title = "是否不斷有回流客?", subtitle = "各飯店新舊旅客關係圖") +
    scale_fill_manual(values = c("mediumblue", "sienna4")) +
    xlab("新/舊客戶") +
    ylab("數量")

  • 預付相關情形
# 旅客預付情形
ggplot(data = hotel_df) +
    geom_point(mapping = aes(x = deposit_type, y = adults, color = hotel, shape = hotel)) +
    labs(title = "旅客付訂與退款方式", subtitle = "預付方式與是否退款關係圖") +
    xlab("預付型態") +
    ylab("旅客(不含孩童)") +
    facet_wrap(~hotel)

分析結果報告

對於以上數據集自檢視、清洗、調整與可視化之後有幾點進行報告 :

1. 結果分析

-   City Hotel的取消情形明顯大於Resort Hotel,預定之後的實際入住Check Out與Canceled的數量也幾乎持平上升,大量的訂單湧入,卻也大量的取消流失訂單,不僅沒有實際收益,工作人員進行房務準備時也容易手忙腳亂。   
-   Ciry Hotel每月來客幾乎維持持平,Resort Hotel則是於九、十月會有爆發性的來客增加。    
-   旅客來源的部分City Hotel較多的來源來自線上訂購與公司訂單,Resort Hotel則是臨櫃客戶與代理旅行社。    
-   旅客與來源的關係線上預訂、團體的取消率最高,相反的臨櫃旅客、旅行社、公司預定擁有較好的入住率。    
-   均客單City Hotel二月最差,Resort Hotel則是於一、二月為最低點。    
-   周末與平日家庭來客,排除沒有孩童的家庭、情侶、單人等情形,1 ~ 2個孩童為比例最高的家庭來客,無關乎平日或假日。   
-   重複性客人無論City Hotel或是Resort Hotel均表現不理想,依靠完全的新旅客維持營運,對於持續性營收來說並不是理想的情況。    
-   預付款與退款,City Hotel幾乎維持等比例,Resort Hotel則是高比例的臨櫃訂房,提升實際入住情形。  
-   在均客單下降的月份,可以看到比對月來客時並無明顯降低趨勢,可能在均客單下降的月份,做了過多的促銷活動。    

2. 依照分析結果進行建議

City Hotel

-   建議增加預收訂金程序,並明訂告知不退款情況,使旅客下訂之前自我進行確認,減少後續預約取消意願。    
-   進行訂房來源比例分配,活動促銷範圍可以增加旅行社配合模式與現場來客活動。    
-   廣告投放與房型促銷,建議以沒有孩童的家庭、單人或情侶客層,以及1~2位孩童家庭作為主要客層。   
-   單純依靠新旅客並非理想營運模式,建議逐步增加回流旅客住宿活動,提升回流旅客二次消費。    
-   建議旅客預訂後可以增加Chick In活動,或是其他能夠提升旅客住宿前的附加價值感,降低預定後的取消率。    
-   旅客意見回條,建議可新增對於取消預定可能的原因與旅客實際建議,逐步調整旅客入住體驗。  

Resort Hotel

-   來客於九月、十月有爆發性成長,但均客單卻沒有與之增幅,建議可以調整低客單月份促銷的力度,以附加價值增加旅客體驗,而不是以價格作為促銷首選。    
-   單純依靠新旅客並非理想營運模式,建議逐步增加回流旅客住宿活動,提升回流旅客二次消費。    
-   跨國旅客比率過低,建議增加非本地旅行社合作、廣告投放範圍,凸顯區域觀光價值,配合接駁行程,增加透過航空旅遊的旅客比率。    
-   廣告投放與房型促銷,建議以沒有孩童的家庭、單人或情侶客層,以及1~2位孩童家庭作為主要客層。   

飯店內數據紀錄 :

-   數據的NULL值過多,顯示紀錄旅客基本資料時,大多數不會理會包含國家、公司、旅行社代號的欄位填寫,也可能內部進行紀錄時常遺漏,建議於填寫資訊或內部紀錄時添加其他選項,以免造成數據過多空值產生。    
-   數據集內若為類別,以數字形式填入雖減少輸入時間與工作壓力,建議可以增加備註,例如"0" > "check_in","1" > "canceled",對於內部檢視數據時能更快速理解數據內容。    

以上為本次數據分析檢視、清洗、調整、可視化與結果分析過程,感謝您的時間,若有其他可供指教之處,請不吝指教,感謝您。





其他文章

看看精選文章

納希克房價分析 | Nashik Apartment Price Analyze – 語法解析(上)

  這次 Nashik 的房價分析有上傳至 Kaggle ,有興趣的朋友可以前往閱覽, RMarkdown PDF 報告存放在 Google 雲端,程式碼則是存放於 Github ,照慣例會分享好用的函式語法,雖說基本的 Packages 與語法可能很多人都會完整的閱覽,但是實際...