小雷's Programming & Analytic Journal: R Packages dplyr - distinct

Friday, March 25, 2022

R Packages dplyr - distinct

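When you are facing a pile of messy data and want to find which values are unique, the dplyr package has a function, distinct(), that makes the query easy. One thing to note: distinct() works on data frames, so it cannot be applied to a column extracted with "$".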

Basic syntax

distinct(dataset, column, .keep_all = FALSE)

Parameters

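.keep_all = FALSE >> Defaults to FALSE, which returns only the columns used to determine uniqueness. If set to TRUE, the other columns are kept as well, using the values from the first row of each unique combination.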
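To make the effect of .keep_all concrete, here is a minimal sketch. The tibble `df` and its columns `id` and `label` are hypothetical, made up only for this illustration.

```r
library(dplyr)

# Hypothetical example data: the first two rows share the same `id`
df <- tibble(
  id    = c(1, 1, 2),
  label = c("a", "b", "c")
)

# .keep_all = FALSE (default): only the `id` column is returned
ids_only <- distinct(df, id)

# .keep_all = TRUE: `label` is kept too, taken from the first row of each `id`
ids_with_label <- distinct(df, id, .keep_all = TRUE)

print(ids_only)
print(ids_with_label)
```

With the default, the result has a single column; with .keep_all = TRUE, the label "b" is dropped because it belongs to the second row for id 1.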

Basic query

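# Load dplyr (it also re-exports tibble())
library(dplyr)

# Create a data frame that contains an NA
bind_df_1 <- tibble(
    number = c(1, 2, -3, 4, -5, 1, 1, -5, 7),
    letter = c("ap", "ef", "tg", "hk", "bu", "xi", "ux", "it", NA)
)

# Find the unique rows
distinct(bind_df_1)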

# Output

r$> distinct(bind_df_1)
# A tibble: 9 x 2
  number letter
   <dbl> <chr>
1      1 ap
2      2 ef
3     -3 tg
4      4 hk
5     -5 bu
6      1 xi
7      1 ux
8     -5 it
9      7 NA
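All 9 rows come back because no two rows of bind_df_1 match in both columns. As a small sketch of what happens when a whole row really is duplicated, the hypothetical tibble `dup_df` below repeats its first row:

```r
library(dplyr)

# Hypothetical data: rows 1 and 3 are exact duplicates
dup_df <- tibble(
  number = c(1, 2, 1),
  letter = c("ap", "ef", "ap")
)

# With no columns specified, uniqueness is judged on all columns together
deduped <- distinct(dup_df)
print(deduped)
```

Here the third row is dropped, leaving two rows.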

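Querying with the parameter added: when no column is specified, uniqueness is judged on all columns together, so .keep_all = TRUE has nothing extra to keep and the result is the same as the basic query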

# Add the parameter

distinct(bind_df_1, .keep_all = TRUE)

# Output

r$> distinct(bind_df_1, .keep_all = TRUE)
# A tibble: 9 x 2
  number letter
   <dbl> <chr>
1      1 ap
2      2 ef
3     -3 tg
4      4 hk
5     -5 bu
6      1 xi
7      1 ux
8     -5 it
9      7 NA
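The two outputs so far look the same. A quick way to check this in code rather than by eye is sketched below; the variable names `without_flag`, `with_flag`, and `same` are only for illustration.

```r
library(dplyr)

bind_df_1 <- tibble(
  number = c(1, 2, -3, 4, -5, 1, 1, -5, 7),
  letter = c("ap", "ef", "tg", "hk", "bu", "xi", "ux", "it", NA)
)

# No columns specified in either call
without_flag <- distinct(bind_df_1)
with_flag    <- distinct(bind_df_1, .keep_all = TRUE)

# Compare the two results
same <- isTRUE(all.equal(without_flag, with_flag))
print(same)
```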

Specifying a column returns only the unique values of that column

# Specify a column

distinct(bind_df_1, number)

# Output

r$> distinct(bind_df_1, number)
# A tibble: 6 x 1
  number
   <dbl>
1      1
2      2
3     -3
4      4
5     -5
6      7

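Specifying a column together with the parameter shows the unique values of that column along with "the first corresponding value" from the other columns; later rows with the same value are not shown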

# Specify a column and add the parameter

distinct(bind_df_1, number, .keep_all = TRUE)

# Output

r$> distinct(bind_df_1, number, .keep_all = TRUE)
# A tibble: 6 x 2
  number letter
   <dbl> <chr>
1      1 ap
2      2 ef
3     -3 tg
4      4 hk
5     -5 bu
6      7 NA
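Since only the first row for each value is kept, the row order decides which "corresponding value" survives. One possible way to influence that, shown here only as a sketch, is to sort with arrange() before calling distinct(). The sort key (the alphabetically last letter first) is an arbitrary choice for illustration.

```r
library(dplyr)

bind_df_1 <- tibble(
  number = c(1, 2, -3, 4, -5, 1, 1, -5, 7),
  letter = c("ap", "ef", "tg", "hk", "bu", "xi", "ux", "it", NA)
)

# Sort so that, within each `number`, the alphabetically last letter comes first,
# then keep the first row per `number`
kept_last <- bind_df_1 %>%
  arrange(number, desc(letter)) %>%
  distinct(number, .keep_all = TRUE)

print(kept_last)
```

Note that the result now follows the sorted order of number rather than the original row order, and number 1 is paired with "xi" instead of "ap".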

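A simple query on a column selected with $: the difference between unique and distinct. distinct() has no method for a bare vector, while base R's unique() accepts one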

# Select the column with $

distinct(bind_df_1$number)

# Output

r$> distinct(bind_df_1$number)
Error in UseMethod("distinct") :
  no applicable method for 'distinct' applied to an object of class "c('double', 'numeric')"

# Use unique() from the base package with a column selected by $

unique(bind_df_1$number)

# Output

r$> unique(bind_df_1$number)
[1]  1  2 -3  4 -5  7
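If a plain vector is what you want but you prefer to stay within dplyr, one option is to run distinct() on the data frame and then extract the column with pull(). This is a sketch of that approach, not the only way to do it; the names `via_distinct` and `via_unique` are for illustration.

```r
library(dplyr)

bind_df_1 <- tibble(
  number = c(1, 2, -3, 4, -5, 1, 1, -5, 7),
  letter = c("ap", "ef", "tg", "hk", "bu", "xi", "ux", "it", NA)
)

# distinct() on the data frame, then pull() the column out as a vector
via_distinct <- bind_df_1 %>%
  distinct(number) %>%
  pull(number)

# Base R equivalent on the extracted vector
via_unique <- unique(bind_df_1$number)

print(via_distinct)
print(identical(via_distinct, via_unique))
```

Both approaches keep values in order of first appearance, so the two vectors match.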
