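The select function makes it easy, while inspecting and tidying data, to locate the variables you need and combine them into a new data frame. It is usually paired with filter for conditional filtering, which narrows the data down to the variables and values you want to examine.
Basic syntax
select(dataset, variable, ...)
Several helper functions can be used together with select to speed up the lookup, which matters because datasets today often have many columns once loaded. These helpers are demonstrated below as well.
This walkthrough uses the Cyclistic dataset.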
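# Load dplyr (which provides select) and the dataset
library(dplyr)
cyclistic_dplyr <- read.csv("combine_datas_clearn.csv")
# Keep the first six rows for the demonstrations
dplyr_operate <- cyclistic_dplyr[1:6, ]
# Check the data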
r$> str(dplyr_operate)
'data.frame': 6 obs. of 16 variables:
$ ride_id : chr "22178529" "22178530" "22178531"
$ started_at : chr "2019-04-01 00:02:22" "2019-04-01 00
$ ended_at : chr "2019-04-01 00:09:48" "2019-04-01 00
$ rideable_type : chr "6251" "6226" "5649" "4151" ...
$ start_station_id : int 81 317 283 26 202 420
$ start_station_name : chr "Daley Center Plaza" "Wood St & Tayl
$ end_station_id : int 56 59 174 133 129 426
$ end_station_name : chr "Desplaines St & Kinzie St" "Wabash
$ member_casual : chr "member" "member" "member" "member"
$ year : int 2019 2019 2019 2019 2019 2019
$ month : chr "April" "April" "April" "April" ...
$ week : chr "Monday" "Monday" "Monday" "Monday"
$ day : int 1 1 1 1 1 1
$ hour : int 0 0 0 0 0 0
$ ride_length : int 446 1048 252 357 1007 257
$ ride_length_minutes: num 7.43 17.47 4.2 5.95 16.78 ...
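Plain variable selection: pick the year, month, and weekday columns and store them in a new variable
# Select year, month, and weekday (the week column holds weekday names)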
select_df <- select(dplyr_operate, year, month, week)
# Output
r$> select_df
year month week
1 2019 April Monday
2 2019 April Monday
3 2019 April Monday
4 2019 April Monday
5 2019 April Monday
6 2019 April Monday
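The introduction mentions pairing select with filter, so here is a minimal sketch of that combination. The `rides` data frame is a hypothetical stand-in for the Cyclistic sample (only the column names are borrowed from it), and the pipe form is just one way to chain the two calls.

```r
library(dplyr)

# Hypothetical toy data; not the real Cyclistic rows
rides <- data.frame(
  ride_id = c("a1", "a2", "a3", "a4"),
  member_casual = c("member", "casual", "member", "casual"),
  year = c(2019L, 2019L, 2019L, 2019L),
  month = c("April", "April", "May", "May"),
  ride_length = c(446L, 1048L, 252L, 357L),
  stringsAsFactors = FALSE
)

# filter() keeps rows that satisfy the condition,
# select() then keeps only the listed columns
member_rides <- rides %>%
  filter(member_casual == "member") %>%
  select(ride_id, month, ride_length)

print(member_rides)
```

Filtering first matters here because member_casual is not among the selected columns; once select has dropped it, filter can no longer refer to it.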
: >> range selection, which keeps every column from the first name through the last
# Range selection
scope_select <- select(dplyr_operate, c(year : day))
# Output
r$> scope_select
year month week day
1 2019 April Monday 1
2 2019 April Monday 1
3 2019 April Monday 1
4 2019 April Monday 1
5 2019 April Monday 1
6 2019 April Monday 1
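As a small aside on the range form, the c() wrapper is optional, and a range can also be given by column position. The sketch below uses a hypothetical `rides` data frame with the same column order as the relevant part of the sample.

```r
library(dplyr)

# Hypothetical toy data with the same column order as the sample's date fields
rides <- data.frame(
  ride_id = c("a1", "a2"),
  year = c(2019L, 2019L),
  month = c("April", "April"),
  week = c("Monday", "Tuesday"),
  day = c(1L, 2L),
  hour = c(0L, 13L),
  stringsAsFactors = FALSE
)

# Range by name, without the c() wrapper
by_name <- select(rides, year:day)

# The same range by position (columns 2 through 5)
by_position <- select(rides, 2:5)

print(names(by_name))
```

Ranges by name tend to be safer than positions, since positions silently shift when columns are added or reordered.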
-c() >> list the columns you do not want; everything else is kept
# Drop the columns that are not needed
select_norequired <- select(dplyr_operate, -c(
ride_id,
started_at,
ended_at,
rideable_type,
start_station_id,
start_station_name,
end_station_id,
end_station_name,
ride_length,
ride_length_minutes))
# Output
r$> select_norequired
member_casual year month week day hour
1 member 2019 April Monday 1 0
2 member 2019 April Monday 1 0
3 member 2019 April Monday 1 0
4 member 2019 April Monday 1 0
5 member 2019 April Monday 1 0
6 member 2019 April Monday 1 0
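When the unwanted columns sit next to each other, the exclusion list can be shortened by putting a range inside -c(). This is a sketch on a hypothetical `rides` data frame, not a replacement for the explicit list above.

```r
library(dplyr)

# Hypothetical toy data; column names borrowed from the sample
rides <- data.frame(
  ride_id = c("a1", "a2"),
  started_at = c("2019-04-01 00:02:22", "2019-04-01 00:03:02"),
  ended_at = c("2019-04-01 00:09:48", "2019-04-01 00:20:30"),
  member_casual = c("member", "member"),
  year = c(2019L, 2019L),
  ride_length = c(446L, 1048L),
  stringsAsFactors = FALSE
)

# Drop the adjacent columns ride_id..ended_at as a range,
# plus the single column ride_length
kept <- select(rides, -c(ride_id:ended_at, ride_length))

print(names(kept))
```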
`starts_with()` >> select columns by name prefix; remember to wrap the prefix in quotes " "
# Keep every column whose name starts with st
start_select <- select(dplyr_operate, starts_with(
"st"
))
# Output
r$> start_select
started_at start_station_id start_station_name
1 2019-04-01 00:02:22 81 Daley Center Plaza
2 2019-04-01 00:03:02 317 Wood St & Taylor St
3 2019-04-01 00:11:07 283 LaSalle St & Jackson Blvd
4 2019-04-01 00:13:01 26 McClurg Ct & Illinois St
5 2019-04-01 00:19:26 202 Halsted St & 18th St
6 2019-04-01 00:19:39 420 Ellis Ave & 55th St
`ends_with()` >> select columns by name suffix; remember to wrap the suffix in quotes " "
# Keep every column whose name ends with e
end_select <- select(dplyr_operate, ends_with(
"e"
))
# Output
r$> end_select
rideable_type start_station_name end_station_name
1 6251 Daley Center Plaza Desplaines St & Kinzie St
2 6226 Wood St & Taylor St Wabash Ave & Roosevelt Rd
3 5649 LaSalle St & Jackson Blvd Canal St & Madison St
4 4151 McClurg Ct & Illinois St Kingsbury St & Kinzie St
5 3270 Halsted St & 18th St Blue Island Ave & 18th St
6 3123 Ellis Ave & 55th St Ellis Ave & 60th St
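`contains()` >> substring match; selects every variable whose name contains the given string
# Substring match: keep every column whose name contains ea
contains_select <- select(dplyr_operate, contains("ea"))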
# Output
r$> contains_select
rideable_type year
1 6251 2019
2 6226 2019
3 5649 2019
4 4151 2019
5 3270 2019
6 3123 2019
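The helpers can also be combined in a single select call, and they ignore case by default. The sketch below illustrates both points on a hypothetical `rides` data frame; the uppercase pattern "EA" is chosen only to show the case handling.

```r
library(dplyr)

# Hypothetical toy data; column names borrowed from the sample
rides <- data.frame(
  started_at = c("2019-04-01 00:02:22", "2019-04-01 00:03:02"),
  start_station_id = c(81L, 317L),
  start_station_name = c("Daley Center Plaza", "Wood St & Taylor St"),
  rideable_type = c("6251", "6226"),
  year = c(2019L, 2019L),
  stringsAsFactors = FALSE
)

# Several helpers in one call: columns matching either helper are kept,
# and a column matched twice appears only once
st_or_e <- select(rides, starts_with("st"), ends_with("e"))

# Matching is case-insensitive unless ignore.case = FALSE is passed
loose <- select(rides, contains("EA"))
strict <- select(rides, contains("EA", ignore.case = FALSE))

print(names(st_or_e))
print(names(loose))
print(ncol(strict))
```

A helper that matches nothing returns a data frame with zero columns rather than raising an error, so it is worth checking the result when a pattern is typed by hand.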