Wednesday, March 23, 2022

R Packages dplyr - select


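During data inspection and cleaning, the select function makes it easy to find the variables you need, pin them down, and combine them into a new data frame. It is usually paired with filter for conditional row filtering, which lets you narrow the data down to the variables and values you want to examine.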


Basic syntax

select(dataset, variable, ...)
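As a minimal sketch of the basic syntax, the snippet below uses a small made-up data frame (`rides` and its values are hypothetical, not the Cyclistic data). It shows `select()` on its own and then combined with `filter()`:

```r
library(dplyr)

# Hypothetical toy data, not the Cyclistic dataset
rides <- data.frame(
    ride_id = c("a1", "a2", "a3"),
    member_casual = c("member", "casual", "member"),
    ride_length = c(446L, 1048L, 252L)
)

# Keep two columns; the result is a new data frame
picked <- select(rides, ride_id, ride_length)

# Filter rows first, then keep only the columns of interest
long_rides <- select(filter(rides, ride_length > 300), ride_id, member_casual)
```

The original `rides` data frame is left unchanged; `select()` always returns a new object.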


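Several helper functions can be used together with select to make lookups faster, since datasets today often have many columns once loaded. These helper functions are demonstrated below as well.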


This walkthrough uses the Cyclistic dataset.

# Load dplyr and the dataset
library(dplyr)
cyclistic_dplyr <- read.csv("combine_datas_clearn.csv")
# Keep the first six rows for the demonstration
dplyr_operate <- cyclistic_dplyr[1:6, ]

# Check the data contents
r$> str(dplyr_operate)
'data.frame':   6 obs. of  16 variables:
 $ ride_id            : chr  "22178529" "22178530" "22178531"
 $ started_at         : chr  "2019-04-01 00:02:22" "2019-04-01 00
 $ ended_at           : chr  "2019-04-01 00:09:48" "2019-04-01 00
 $ rideable_type      : chr  "6251" "6226" "5649" "4151" ...
 $ start_station_id   : int  81 317 283 26 202 420
 $ start_station_name : chr  "Daley Center Plaza" "Wood St & Tayl
 $ end_station_id     : int  56 59 174 133 129 426
 $ end_station_name   : chr  "Desplaines St & Kinzie St" "Wabash
 $ member_casual      : chr  "member" "member" "member" "member"
 $ year               : int  2019 2019 2019 2019 2019 2019
 $ month              : chr  "April" "April" "April" "April" ...
 $ week               : chr  "Monday" "Monday" "Monday" "Monday"
 $ day                : int  1 1 1 1 1 1
 $ hour               : int  0 0 0 0 0 0
 $ ride_length        : int  446 1048 252 357 1007 257
 $ ride_length_minutes: num  7.43 17.47 4.2 5.95 16.78 ...


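Basic variable selection: pick the year, month, and weekday columns and store them in a new variable

# Select year, month, and weekday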
select_df <- select(dplyr_operate, year, month, week)

# Output
r$> select_df
  year month   week
1 2019 April Monday
2 2019 April Monday
3 2019 April Monday
4 2019 April Monday
5 2019 April Monday
6 2019 April Monday
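One detail worth a quick sketch: the columns come back in the order they are listed in the call, not in their original order. The `toy` data frame here is hypothetical and only mirrors three of the column names above:

```r
library(dplyr)

# Hypothetical one-row data frame
toy <- data.frame(year = 2019L, month = "April", week = "Monday")

# Columns are returned in the order given to select()
reordered <- select(toy, week, year)
```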


: >> Range selection

# Range selection: every column from year through day
scope_select <- select(dplyr_operate, year:day)

# Output
r$> scope_select
  year month   week day
1 2019 April Monday   1
2 2019 April Monday   1
3 2019 April Monday   1
4 2019 April Monday   1
5 2019 April Monday   1
6 2019 April Monday   1
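The range follows the column order of the data frame, so it can also be written with column positions. A small sketch on a hypothetical `toy` data frame (the names mirror the example, the values are made up):

```r
library(dplyr)

# Hypothetical toy data; scalar values are recycled to two rows
toy <- data.frame(id = 1:2, year = 2019L, month = "April",
                  week = "Monday", day = 1L)

by_name <- select(toy, year:day)    # columns from year through day
by_position <- select(toy, 2:5)     # the same columns, addressed by position
```

Selecting by name is usually the safer choice, since positions shift whenever columns are added or removed.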


-c() >> Exclude the listed columns from the output

# Drop the columns that are not needed
select_norequired <- select(dplyr_operate, -c(
    ride_id,
    started_at,
    ended_at,
    rideable_type,
    start_station_id,
    start_station_name,
    end_station_id,
    end_station_name,
    ride_length,
    ride_length_minutes))

# Output
r$> select_norequired
  member_casual year month   week day hour
1        member 2019 April Monday   1    0
2        member 2019 April Monday   1    0
3        member 2019 April Monday   1    0
4        member 2019 April Monday   1    0
5        member 2019 April Monday   1    0
6        member 2019 April Monday   1    0
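For comparison, here is a short sketch of the common ways to write an exclusion, using a hypothetical `toy` data frame. The `!` form assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(ride_id = c("a1", "a2"),
                  member_casual = c("member", "casual"),
                  year = 2019L,
                  ride_length = c(446L, 1048L))

drop_one <- select(toy, -ride_id)                     # drop a single column
drop_many <- select(toy, -c(ride_id, ride_length))    # drop several columns
drop_bang <- select(toy, !c(ride_id, ride_length))    # same result using !
```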


`starts_with()` >> Select columns by the start of their names; remember to put the prefix in quotes (" ")

# Keep every column whose name starts with "st"
start_select <- select(dplyr_operate, starts_with(
    "st"
))

# Output
r$> start_select
           started_at start_station_id        start_station_name
1 2019-04-01 00:02:22               81        Daley Center Plaza
2 2019-04-01 00:03:02              317       Wood St & Taylor St
3 2019-04-01 00:11:07              283 LaSalle St & Jackson Blvd
4 2019-04-01 00:13:01               26  McClurg Ct & Illinois St
5 2019-04-01 00:19:26              202      Halsted St & 18th St
6 2019-04-01 00:19:39              420       Ellis Ave & 55th St
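Two further behaviours of `starts_with()` may be useful here: it accepts a vector of prefixes, and it ignores case unless told otherwise. The sketch below assumes a reasonably recent dplyr/tidyselect and uses a hypothetical `toy` data frame:

```r
library(dplyr)

# Hypothetical one-row data frame
toy <- data.frame(started_at = "2019-04-01 00:02:22",
                  start_station_id = 81L,
                  ended_at = "2019-04-01 00:09:48",
                  end_station_id = 56L,
                  year = 2019L)

# A character vector of prefixes selects the union of the matches
both <- select(toy, starts_with(c("st", "end")))

# Matching ignores case by default; ignore.case = FALSE makes it strict,
# so an upper-case prefix matches nothing here
strict <- select(toy, starts_with("ST", ignore.case = FALSE))
```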


`ends_with()` >> Select columns by the end of their names; remember to put the suffix in quotes (" ")

# Keep every column whose name ends with "e"
end_select <- select(dplyr_operate, ends_with(
    "e"
))

# Output
r$> end_select
  rideable_type        start_station_name          end_station_name
1          6251        Daley Center Plaza Desplaines St & Kinzie St
2          6226       Wood St & Taylor St Wabash Ave & Roosevelt Rd
3          5649 LaSalle St & Jackson Blvd     Canal St & Madison St
4          4151  McClurg Ct & Illinois St  Kingsbury St & Kinzie St
5          3270      Halsted St & 18th St Blue Island Ave & 18th St
6          3123       Ellis Ave & 55th St       Ellis Ave & 60th St
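A longer suffix is often more practical than a single letter. As a sketch on a hypothetical `toy` data frame, the suffix "_id" picks out the identifier columns, and helpers can be mixed with plain column names in the same call:

```r
library(dplyr)

# Hypothetical one-row data frame
toy <- data.frame(ride_id = "22178529",
                  start_station_id = 81L,
                  end_station_id = 56L,
                  end_station_name = "Desplaines St & Kinzie St")

# All columns whose names end in "_id"
id_cols <- select(toy, ends_with("_id"))

# A plain column name followed by a helper; order follows the call
mixed <- select(toy, end_station_name, ends_with("_id"))
```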


`contains()` >> Fuzzy lookup: selects every column whose name contains the given string

# Fuzzy selection: keep every column whose name contains "ea"
contains_select <- select(dplyr_operate, contains("ea"))

# Output
r$> contains_select
  rideable_type year
1          6251 2019
2          6226 2019
3          5649 2019
4          4151 2019
5          3270 2019
6          3123 2019
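`contains()` looks for a literal substring. When a pattern is needed, the related helper `matches()` takes a regular expression instead. A hedged sketch on a hypothetical `toy` data frame:

```r
library(dplyr)

# Hypothetical one-row data frame
toy <- data.frame(rideable_type = "6251", year = 2019L,
                  week = "Monday", day = 1L)

literal <- select(toy, contains("ea"))            # literal substring match
regex <- select(toy, matches("^(year|day)$"))     # regular expression match
none <- select(toy, contains("zz"))               # no match: zero columns, no error
```

A helper that matches nothing returns a data frame with zero columns rather than raising an error, so an unexpected empty result is worth checking against the search string.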
