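I recently spent a lot of time on a Python Kivy app, which delayed my data analysis output. I then happened to notice a new data set on Kaggle. It was originally intended for house price prediction, but understanding the overall picture before predicting prices makes the later prediction work go more smoothly, so this post focuses on an overall analysis.
As usual, the complete Rmarkdown report is available on the cloud drive for anyone who needs it, and the useful syntax from this project will be written up as a syntax walkthrough.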
Xiao Lei's GitHub
>> Click here for the link
To view the full Rmarkdown report, please download the PDF from the cloud drive: click here for the link
----------------------------------------------------------------------------------------------------
Nashik Apartment Price Analysis
Rex_Li
2022/6/1
Nashik Apartment Price Analysis
About Nashik
========================================================================================================
Nashik is the fourth-largest city in Maharashtra, about 200 km from both Mumbai and Pune. It is attracting attention as a holiday hotspot and as a location for retirement-home investment. As property prices in Pune and Mumbai soar, Nashik's social infrastructure is gradually improving, its economy is developing, and more people are choosing the area as their permanent residence.
Main Targets
========================================================================================================
Explore the data and identify trends in house selling prices.
Define a cost-performance ratio based on selling price, EMI, and area obtained, in order to find the best investment properties.
Seven Analysis Phases and Processes
========================================================================================================
1. Check data set values.
2. Parse all problems.
3. Clean data process.
4. Data consolidation.
5. Trends analysis.
6. Visualization chart.
7. Conclusion.
Phase One : Check data set values.
Dataset source citation : https://www.kaggle.com/datasets/rushikeshdane20/nashik-apartment-price-prediction
The data was obtained from Kaggle; the file is a trusted public data set.
Data set author : Rushikesh Dane 20.
Data storage : MySQL, Kaggle.
Code location : Kaggle, GitHub.
Coding language : R.
IDE : RStudio, VS Code.
- Import the R packages and the data set.
# Import packages
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
library(tidyr)
library(ggplot2)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
# import data set
setwd("D:\\Github_version_file_R\\data_set\\marchine_learning_data\\Nashik_apartment_price_prediction")
nashik_house <- read.csv("final_data.csv")
- Check the contents of the original data set.
# check data set
str(nashik_house)
## 'data.frame': 5496 obs. of 12 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ address : chr "Sheetal Vihar, Bhagwant Nagar, Dr.Homi Bhabha Nagar,Nashik" "Samraat Dream Citi, Samta Nagar, Nashik" "Suryaprakash Apartment,Nashik Road, Nashik" "Adishvar Residency,Nashik Road, Nashik" ...
## $ owners : chr "Mahendra Kotwal" "Jiten Dadarkar" "Pankaj" "Saurav" ...
## $ housetype : chr "Apartment" "Apartment" "Apartment" "Apartment" ...
## $ house_condition: chr "old" "old" "old" "old" ...
## $ BHK : num 3 2 2 2 2 2 2 3 2 2 ...
## $ price : num 75 41 53.4 55 27 ...
## $ per_month_emi : num 39.7 21.7 28.2 29.1 14.3 ...
## $ total_sqft : num 1550 1000 970 1000 853 ...
## $ cordinates : chr "Sheetal Vihar" "Samraat Dream Citi" "Surya Prakash" "Nashik Road, Vadner Dumala, Nashik, Maharashtra, 422401" ...
## $ latitude : num 20 20 20 19.9 20 ...
## $ longitude : num 73.8 73.8 73.8 73.8 73.8 ...
dim(nashik_house)
## [1] 5496 12
colnames(nashik_house)
## [1] "X" "address" "owners" "housetype"
## [5] "house_condition" "BHK" "price" "per_month_emi"
## [9] "total_sqft" "cordinates" "latitude" "longitude"
# check fixed values
table(nashik_house$housetype)
##
## Apartment Independent house
## 4323 1173
table(nashik_house$house_condition)
##
## new old
## 1846 3650
table(nashik_house$BHK)
##
## 1 2 2.5 3 3.5 4 5 6 7 8 10
## 1559 2487 2 1017 1 205 33 14 8 3 4
any(is.na(nashik_house))
## [1] TRUE
The original data set has 12 columns and 5,496 rows.
There are 4,323 apartment records and 1,173 independent-house records.
There are 1,846 new houses and 3,650 old (second-hand) houses.
Room counts range from 1 BHK to 10 BHK (including 2 records with 2.5 BHK and 1 record with 3.5 BHK).
The data set contains NA values.
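`any(is.na())` only reports that missing values exist somewhere. A minimal sketch of how to see where they are and how many rows would survive cleaning, using a small made-up data frame (`demo` and its values are illustrative assumptions, not rows from the Kaggle file):

```r
# Hypothetical toy data: column names mirror the data set, the values do not
demo <- data.frame(
  price = c(75, NA, 53.4, 55),
  total_sqft = c(1550, 1000, NA, NA),
  BHK = c(3, 2, 2, 2)
)

# Number of NA values in each column
na_per_col <- colSums(is.na(demo))
print(na_per_col)

# Number of rows without any NA, i.e. rows complete.cases() would keep
rows_kept <- sum(complete.cases(demo))
print(rows_kept)
```

On the real data the same two calls would be `colSums(is.na(nashik_house))` and `sum(complete.cases(nashik_house))`.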
Phase Two : Parse all problems.
1. Identify stakeholders and teams
Primary stakeholders : all owner-occupiers and investors who are looking to buy a house.
2. Issues that need resolving
There are many NA values in the data.
Some records have a fractional (0.5) room count, which is not reasonable.
Prices are quoted in Lakh (units of 100,000 INR) rather than as plain currency amounts.
Phase Three : Clean data process.
- Convert the prices to plain INR and USD amounts, and work on a copy of the data set so that mistakes cannot alter the original data.
- 1 Lakh = 100,000 INR, 1K = 1,000, 1 INR = 0.013 USD
house_df <- nashik_house
# create col,1 Lakh = 100,000 INR(Rs)
house_df <- house_df %>%
mutate(
price_INR = price * 100000,
month_emi = per_month_emi * 1000
)
# add USD price and price per sqft, 1 INR = 0.013 USD
house_df <- house_df %>%
mutate(
price_USD = price_INR * 0.013,
price_INR_sqft = price_INR / total_sqft
)
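As a quick sanity check of the conversion factors, the sketch below converts a few Lakh prices by hand. The helper `lakh_to_usd()` is a hypothetical name introduced only for this illustration; the factors are the ones stated above.

```r
# Conversion factors stated in the text
LAKH_TO_INR <- 100000
INR_TO_USD <- 0.013

# Hypothetical helper, not part of the original report
lakh_to_usd <- function(price_lakh) {
  price_lakh * LAKH_TO_INR * INR_TO_USD
}

# First listing shown by str(): 75 Lakh
price_inr <- 75 * LAKH_TO_INR
price_usd <- lakh_to_usd(75)
print(c(INR = price_inr, USD = price_usd))

# Lowest and highest prices in the cleaned data: 1 and 700 Lakh
range_usd <- lakh_to_usd(c(1, 700))
print(range_usd)
```

The last two values, about 1,300 USD and 910,000 USD, are the same figures quoted in the conclusion.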
- Convert the room count to character type to facilitate later consolidation.
# change BHK type to character
house_df$BHK <- as.character(house_df$BHK)
- Remove the invalid room counts 2.5 and 3.5.
# remove incorrect number of rooms
house_df <- house_df[!(house_df$BHK == 2.5 | house_df$BHK == 3.5), ]
- Each selling price is set by the seller, so missing values are not filled with approximations, to avoid a large gap from the real price. Rows containing NA are deleted instead.
# remove null data
house_df <- house_df[complete.cases(house_df), ]
- Remove unused columns.
# remove col
house_df <- house_df %>%
select(-c("X", "latitude", "longitude"))
- Confirm the data after cleaning.
# check NA
any(is.na(house_df))
## [1] FALSE
# check data set
dim(house_df)
## [1] 3871 13
# check new fixed values
table(house_df$housetype)
##
## Apartment Independent house
## 2814 1057
table(house_df$house_condition)
##
## new old
## 1769 2102
table(house_df$BHK)
##
## 1 10 2 3 4 5 6 7 8
## 919 4 1898 830 167 30 14 7 2
# check values range
range(house_df$total_sqft)
## [1] 150 40000
range(house_df$price)
## [1] 1 700
range(house_df$price_INR)
## [1] 1e+05 7e+07
range(house_df$per_month_emi)
## [1] 1.05 529.00
range(house_df$month_emi)
## [1] 1050 529000
Explore data values
After cleaning, the data set has 13 columns and 3,871 rows.
Selling prices range from 100,000 INR to 70,000,000 INR, a huge gap.
Area (square feet) and monthly EMI show similarly wide ranges.
Phase Four : Data consolidation.
- Aggregate the count, mean, and standard deviation, and compute each value's percentage share.
house_df_v2 <- house_df %>%
group_by(BHK) %>%
summarise(
total = n(),
avg_price_INR = mean(price_INR),
sd_price_INR = sd(price_INR),
avg_sqft = round(mean(total_sqft), digits = 2),
avg_emi_INR = round(mean(month_emi), digits = 2)
) %>%
mutate(
price_percentage = paste(round(avg_price_INR / sum(avg_price_INR) * 100, digits = 2), "%"),
total_percentage = paste(round(total / sum(total) * 100, digits = 2), "%"),
sqft_percentage = paste(round((avg_sqft / sum(avg_sqft)) * 100, digits = 2), "%"),
emi_percentage = paste(round((avg_emi_INR / sum(avg_emi_INR)) * 100, digits = 2), "%")
)
- Remove the space between the number and the percent sign.
# remove percentage symbol spaces
house_df_v2$price_percentage <- gsub(" ", "", house_df_v2$price_percentage)
house_df_v2$total_percentage <- gsub(" ", "", house_df_v2$total_percentage)
house_df_v2$sqft_percentage <- gsub(" ", "", house_df_v2$sqft_percentage)
house_df_v2$emi_percentage <- gsub(" ", "", house_df_v2$emi_percentage)
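The space appears because `paste()` separates its arguments with a space by default. A minimal sketch of two ways to skip the `gsub()` cleanup, using the 1, 2, and 3 BHK counts from `table(house_df$BHK)` above (the variable names here are illustrative):

```r
# Listing counts for 1, 2, and 3 BHK after cleaning
total <- c(919, 1898, 830)
share <- total / 3871 * 100 # 3871 = rows after cleaning

# paste() inserts a space by default, paste0() does not
with_space <- paste(round(share, 2), "%")
no_space <- paste0(round(share, 2), "%")

# sprintf() also forces exactly two decimals
fixed <- sprintf("%.2f%%", share)

print(with_space)
print(no_space)
print(fixed)
```

`sprintf()` keeps trailing zeros (for example `10.10%`), whereas `round()` followed by `paste0()` would print `10.1%`, so the choice depends on how the labels should look on the charts.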
- Fix the order of the room counts for later plotting.
# reorder rooms type
house_df_v2$BHK <- factor(house_df_v2$BHK, levels = c("1", "2", "3", "4", "5", "6", "7", "8", "10"))
- Aggregate by house type: sum the totals, then compute each type's percentage.
# house type summary
type_price <- house_df %>%
group_by(housetype) %>%
summarise(
total = n(),
price_INR_sum = sum(price_INR)
) %>%
mutate(
price_percentage = paste(round(price_INR_sum / sum(price_INR_sum) * 100, digits = 2), "%"),
total_percentage = paste(round(total / sum(total) * 100, digits = 2), "%")
)
# remove percentage symbol spaces
type_price$price_percentage <- gsub(" ", "", type_price$price_percentage)
type_price$total_percentage <- gsub(" ", "", type_price$total_percentage)
- Aggregate by new/old condition: sum the totals, then compute each condition's percentage.
# old & new condition summary
condition_df <- house_df %>%
group_by(house_condition) %>%
summarise(
total = n(),
price_INR_sum = sum(price_INR)
) %>%
mutate(
price_percentage = paste(round(price_INR_sum / sum(price_INR_sum) * 100, digits = 2), "%"),
total_percentage = paste(round(total / sum(total) * 100, digits = 2), "%")
)
# remove percentage symbol spaces
condition_df$price_percentage <- gsub(" ", "", condition_df$price_percentage)
condition_df$total_percentage <- gsub(" ", "", condition_df$total_percentage)
Adjust the EMI and area ratings
- According to data published by the International Monetary Fund, India's per capita income in 2021 was 2,190.901 USD, which converts to about 168,538 INR per year, or about 14,044.83 INR per month. Measured against per capita income, the house price range is far too high. On this basis, EMI and area are graded into levels to find properties suitable for investment or owner-occupation under these constraints, and to establish a cost-performance ratio for each property.
EMI and House Area Levels
- The baseline is the average monthly income of 14,044.83 INR; 30% of it is about 4,213.449 INR. The level table below is derived in the same way.
1 ~ 1400INR >> expected (0% ~ 10%)
1401 ~ 5600INR >> affordable (10% ~ 40%)
5601 ~ 11200INR >> Investable (40% ~ 80%)
11201 ~ 100000INR >> costly (far beyond the per capita range)
100001 ~ 529000INR >> inflated (far beyond the per capita range)
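The arithmetic behind the baseline can be reproduced in a few lines. This is a sketch under one assumption: the table limits of 1,400, 5,600, and 11,200 INR look like 10%, 40%, and 80% of a monthly income rounded down to 14,000 INR. That rounding is inferred from the table rather than stated in the text.

```r
# Annual per capita income in INR, as quoted in the text
annual_income_inr <- 168538
monthly_income_inr <- annual_income_inr / 12

# Share of monthly income at the upper limit of the first three levels
cut_points <- c(expected = 0.10, affordable = 0.40, Investable = 0.80)

# Limits from the unrounded monthly income
exact_limits <- monthly_income_inr * cut_points

# Limits from a monthly income rounded to 14,000 INR (assumption)
table_limits <- 14000 * cut_points

print(round(monthly_income_inr, 2))
print(round(monthly_income_inr * 0.30, 2))
print(round(exact_limits))
print(table_limits)
```

The unrounded limits come out slightly higher than the table values, so a listing very close to a boundary could land in a different level depending on which baseline is used.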
- Area is measured in square feet. One Taiwanese ping is about 35.583 sqft, and common living spaces are about 20 to 45 ping, which converts to about 711 to 1,600 sqft. The level table below follows this standard.
1 ~ 710Sqft >> small
711 ~ 1600Sqft >> median
1601 ~ 2300Sqft >> large
2301 ~ 40000Sqft >> huge
- Assign an EMI level and a sqft level to each property.
level_df <- house_df
level_df <- level_df %>%
mutate(
price_INR_ratio = price_INR,
sqft_level = total_sqft,
emi_level = month_emi
)
# classification sqft
level_df$sqft_level <- recode( # car package
level_df$sqft_level, "lo:710 = 'small'; 711:1600 = 'median'; 1601:2300 = 'large'; 2301:40000 = 'huge'",
as.factor = TRUE,
levels = c("small", "median", "large", "huge")
)
# classification emi
level_df$emi_level <- recode(
level_df$emi_level,
"lo:1400 = 'expected'; 1401:5600 = 'affordable'; 5601:11200 = 'Investable'; 11201:100000 = 'costly'; 100001:529000 = 'inflated'",
as.factor = TRUE,
levels = c("expected", "affordable", "Investable", "costly", "inflated")
)
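The ranges passed to `recode()` are written with integer end points, so a fractional value that falls between two ranges (710.5 sqft, for example) would, as far as I can tell, match none of the rules. One alternative, shown as a sketch and not as the method used in this report, is base R's `cut()`, whose intervals are contiguous:

```r
# Upper bound of each level, taken from the sqft level table
sqft_breaks <- c(0, 710, 1600, 2300, 40000)
sqft_labels <- c("small", "median", "large", "huge")

# Illustrative areas, including a fractional value between 710 and 711
sqft <- c(150, 710, 710.5, 1600, 2300, 40000)

# right = TRUE (the default) makes every interval (lower, upper]
sqft_level <- cut(sqft, breaks = sqft_breaks, labels = sqft_labels)
print(data.frame(sqft, sqft_level))
```

The result is a factor whose levels are already in the order small, median, large, huge, so no extra `levels` argument is needed.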
- Find properties that are still affordable at the per capita income level.
suitable_object <- level_df %>%
select(address, owners, housetype, house_condition, BHK, month_emi, price_INR, emi_level, sqft_level) %>%
filter(emi_level == "affordable", sqft_level == "median", price_INR < 1000000) %>%
view()
suitable_object
## address owners
## 764 Dhanlaxmi, Ghatkopar, Nashik Awani Kakkad
## 4594 Gangapur, Nashik SUNIL KALE
## 4595 Konark Nagar, Nashik BHAVESH GURAV
## 5178 lahvit gav patill galli,Nashik Road, Nashik Pankaj Dhumal
## 5179 Prabhat Nagar, Nashik Ravindra
## housetype house_condition BHK month_emi price_INR emi_level
## 764 Apartment old 2 1590 300000 affordable
## 4594 Independent house new 2 2120 400000 affordable
## 4595 Independent house new 2 3100 585000 affordable
## 5178 Independent house old 2 5220 985000 affordable
## 5179 Independent house old 1 4770 900000 affordable
## sqft_level
## 764 median
## 4594 median
## 4595 median
## 5178 median
## 5179 median
Phase Five : Trends Analysis.
Based on the data cleaning and aggregation above, a cross-analysis is carried out on five aspects: house type, selling price, EMI, area, and new/old condition.
1. By overall listing volume, the three most common room counts are 2 BHK > 1 BHK > 3 BHK, accounting for 49.03%, 23.74%, and 21.44% of the total.
2. Although a few 1 to 3 BHK listings are priced unusually high, their prices fluctuate less overall than those of other room counts. From 4 BHK upward, prices fluctuate noticeably and have not stabilized.
3. By house type, apartments are the most listed type at 72.7% of all listings, yet their total sales value exceeds that of independent houses by only 17.34%, which shows that independent houses are relatively expensive locally.
4. There is no obvious gap between new and old houses: new and old account for 45.7% and 54.3% of listings and for 46.38% and 53.62% of total sales value, so condition does not affect the total sales value.
5. 7 and 8 BHK have the largest shares of average area, 23.6% and 26.6%, far above 10 BHK at 11.48% and 5 and 6 BHK at 10.1% and 10.2%. In terms of monthly installment pressure, 3, 4, 5, and 6 BHK each account for more than 10%; 5 BHK carries the heaviest repayment burden at 19.26%, with a monthly EMI above 50,000 INR.
6. Setting per capita income aside, the trend of area against monthly payment peaks where EMI is between 20,000 INR and 30,000 INR, meaning the cost-performance ratio is best in this range: the largest usable area for the lowest payment pressure.
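The report does not spell out a formula for the cost-performance ratio. One possible reading, shown purely as an assumption, is the area obtained per 1,000 INR of monthly EMI. The listings below are made up for illustration and are not taken from the data set.

```r
# Hypothetical listings, values are illustrative only
toy <- data.frame(
  id = c("A", "B", "C"),
  total_sqft = c(600, 1200, 2400),
  month_emi = c(10000, 15000, 80000)
)

# Assumed definition: square feet obtained per 1,000 INR of monthly EMI
toy$sqft_per_1k_emi <- toy$total_sqft / (toy$month_emi / 1000)

# Sort so the best ratio comes first
toy <- toy[order(-toy$sqft_per_1k_emi), ]
print(toy)
```

With this definition listing B ranks first (80 sqft per 1,000 INR), ahead of A (60) and C (30). Applied to `level_df`, the same two lines would rank real listings by `total_sqft` and `month_emi`.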
Phase Six : Visualization chart.
- Price ranges for each room count.
# Rooms type & Price relation
price_max_min <- ggplot(
data = house_df,
mapping = aes(x = reorder(BHK, price), y = price, fill = BHK)
) +
geom_boxplot() +
scale_y_continuous(
name = "Price Unit : Lakh",
breaks = seq(0, max(house_df$price), 100) # tick every 100 Lakh
) +
scale_x_discrete(name = "Rooms Type") +
labs(
title = "Price range comparison by room count",
caption = "1 Lakh = 100,000 INR"
) + # add note
guides(fill = guide_legend(title = "Rooms Type"))
price_max_min
- Share of each room count among the houses on the market.
# Rooms type & Total number
price_bhk <- ggplot(
data = house_df_v2,
mapping = aes(x = reorder(BHK, total), y = total, fill = BHK)
) +
geom_col() +
scale_fill_brewer(palette = "Set1") +
geom_text(
aes(label = total_percentage),
vjust = 0.5, # adjust position
hjust = 0.5
) +
labs(
title = "Comparison of room counts on the market",
subtitle = "Total listings and share by room count"
) +
xlab("Rooms Type") +
scale_y_continuous(
name = "Total Number",
breaks = seq(0, max(house_df_v2$total), 150) # tick every 150 listings
) +
coord_flip() + # graph flip
guides(fill = guide_legend(title = "Rooms Type"))
price_bhk
- Monthly EMI payment by room count.
# EMI & BHK
bhk_emi <- ggplot(
data = house_df_v2,
mapping = aes(x = reorder(BHK, -avg_emi_INR), y = avg_emi_INR, fill = BHK)
) +
geom_col() +
scale_fill_brewer(palette = "Set3") +
geom_text(
aes(label = emi_percentage),
vjust = -0.3
) +
labs(
title = "Does a higher payment scale with the number of rooms?",
subtitle = "Room count vs. monthly EMI",
caption = "Price Unit : INR"
) +
xlab("Rooms type") +
ylab("AVG EMI") +
guides(fill = guide_legend(title = "Rooms Type"))
bhk_emi
- Average area obtained by room count.
# SQFT & BHK
bhk_sqft <- ggplot(
data = house_df_v2,
mapping = aes(x = reorder(BHK, -avg_sqft), y = avg_sqft, fill = BHK)
) +
geom_col() +
scale_fill_brewer(palette = "Paired") +
geom_text(
aes(label = sqft_percentage),
vjust = -0.3
) +
labs(
title = "Do more rooms mean a larger area?",
subtitle = "Room count vs. area",
caption = "Area Unit : Square Feet"
) +
xlab("Rooms Type") +
ylab("AVG Sqft") +
guides(fill = guide_legend(title = "Rooms Type"))
bhk_sqft
- Ratio between independent houses and apartments.
# Apartment & Independent percentage
# note: slice sizes use the price sums, labels show each type's share of listings
pie(
type_price$price_INR_sum,
labels = type_price$total_percentage,
col = c("slateblue3", "tan2"),
main = "Different House Type Percentage"
)
# add notes
legend(
"topright",
legend = c("Apartment", "Independent"),
title = "House Type",
fill = c("slateblue3", "tan2"),
cex = 1.2
)
- Ratio between new and old houses.
# New & Old house percentage
pie(
condition_df$price_INR_sum,
labels = condition_df$price_percentage,
col = c("#d425ae", "#5bb91d"),
main = "New and Old House Percentage"
)
legend(
"topright",
legend = c("New House", "Old House"),
title = "House Condition",
fill = c("#d425ae", "#5bb91d"),
cex = 1.2 # text size
)
- Trend between the monthly EMI payment and the area obtained.
# EMI vs SQFT
emi_sqft <- ggplot(
data = house_df_v2,
mapping = aes(x = avg_emi_INR, y = avg_sqft)) +
geom_point(
alpha = 0.6, # transparency
size = 2.0) +
geom_smooth(
method = "loess",
aes(x = avg_emi_INR, y = avg_sqft)) +
labs(
title = "Does a higher monthly EMI buy a larger area?",
subtitle = "Average EMI vs. average area",
caption = "EMI Unit = INR") +
xlab("AVG EMI") +
scale_y_continuous(
name = "AVG Square Feet",
breaks = c(0, max(house_df_v2$avg_sqft)))
emi_sqft
## `geom_smooth()` using formula 'y ~ x'
- Distribution of area levels across EMI levels.
# Sqft and EMI Level distribution
emi_sqft_level <- ggplot(
data = level_df,
mapping = aes(
x = emi_level,
fill = sqft_level
)) +
geom_bar() +
guides(fill = guide_legend(title = "Sqft Level")) +
xlab("EMI Level") +
ylab("Total") +
labs(title = "Distribution after classification", subtitle = "EMI & Sqft")
emi_sqft_level
- Distribution of room counts across area levels.
# Sqft Level and BHK distribution
sqft_bhk_level <- ggplot(
data = level_df,
mapping = aes(
x = sqft_level,
fill = BHK
)) +
geom_bar() +
guides(fill = guide_legend(title = "Rooms Type")) +
xlab("Sqft Level") +
ylab("Total") +
labs(title = "Area distribution by room count", subtitle = "BHK & Sqft Level")
sqft_bhk_level
Phase Seven : Conclusion.
Conclusion after analysis
India's population ranks among the top three in the world, and its economy continues to grow rapidly. However, low per capita income and a wide gap between rich and poor mean that not everyone can enjoy a normal standard of living, and housing prices are one example. In this data set, house prices range from as low as 1,300 USD (about 39,000 TWD) to as high as 910,000 USD (about 27,300,000 TWD). This gap is unlikely to close in the short term and will require adjustments in national policy and other areas. For someone whose income is right around the per capita level, the data built through this analysis can be used to find suitable properties.
Source
Per capita income : International Monetary Fund
Demographic and economic profile : World Bank data (WorldBank.org)
========================================================================================================