Data Load
import pandas as pd

# Load the data and print the number of rows and columns
# (DataUrl holds the path or URL of the CSV file)
df = pd.read_csv(DataUrl)
Ans = df.shape
Ans
(10127, 19)
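To try the loading step without access to DataUrl, here is a minimal sketch that reads a small inline CSV through io.StringIO. The rows are made-up toy values that only borrow the column names, not records from the real dataset. shape always comes back as a (rows, columns) tuple.

```python
import io

import pandas as pd

# Toy CSV text standing in for the real file (hypothetical values)
csv_text = """Customer_Age,Gender,Income_Category
45,M,$60K - $80K
49,F,Less than $40K
51,M,$80K - $120K
"""

df_toy = pd.read_csv(io.StringIO(csv_text))

# shape is a (rows, columns) tuple, like (10127, 19) for the real data
print(df_toy.shape)  # (3, 3)
```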
Mapping
# Using map, convert the Income_Category categories as shown below
# and store the result in a new newIncome column
# Unknown : N
# Less than $40K : a
# $40K - $60K : b
# $60K - $80K : c
# $80K - $120K : d
# $120K + : e
dic = {
    'Unknown': 'N',
    'Less than $40K': 'a',
    '$40K - $60K': 'b',
    '$60K - $80K': 'c',
    '$80K - $120K': 'd',
    '$120K +': 'e',
}
# Look up each category in the dictionary
df['newIncome'] = df.Income_Category.map(lambda x: dic[x])
Ans = df['newIncome']
Ans.head(4)
0 c
1 a
2 d
3 a
Name: newIncome, dtype: object
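Series.map also accepts a dictionary directly, so the lambda is optional. The two forms behave differently when a value is not a key. With a dict, the missing value becomes NaN. With lambda x: dic[x], map raises a KeyError. A small sketch on toy values, where 'Other' is a hypothetical label added only to show the missing-key case:

```python
import pandas as pd

dic = {
    'Unknown': 'N',
    'Less than $40K': 'a',
    '$40K - $60K': 'b',
    '$60K - $80K': 'c',
    '$80K - $120K': 'd',
    '$120K +': 'e',
}

# Toy column; 'Other' is a hypothetical label not present in dic
s = pd.Series(['$60K - $80K', 'Less than $40K', 'Other'])

# Passing the dict itself: unmatched values become NaN instead of raising
mapped = s.map(dic)
print(mapped.tolist())

# The lambda version fails on the first unmatched value
raised = False
try:
    s.map(lambda x: dic[x])
except KeyError:
    raised = True
print('lambda raised KeyError:', raised)
```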
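A lambda is a way of writing a small function in a single line.
Python's built-in map takes a function and a sequence (any iterable) as parameters and applies the function to each element. In Python 3 it returns a lazy iterator, so wrap it in list() to get a list. pandas Series.map, used above, applies the function (or dictionary) to every element of the Series and returns a new Series.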
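A short sketch contrasting the two map functions just described, on toy numbers: the built-in returns an iterator that must be materialized with list(), while Series.map returns a Series.

```python
import pandas as pd

# Built-in map: returns a lazy iterator in Python 3
squares_iter = map(lambda x: x * x, [1, 2, 3])
squares = list(squares_iter)

# pandas Series.map: returns a new Series with the same index
s = pd.Series([1, 2, 3])
squares_series = s.map(lambda x: x * x)

print(squares)                  # [1, 4, 9]
print(squares_series.tolist())  # [1, 4, 9]
```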
Apply
# Using apply, convert the Income_Category categories as shown below
# and store the result in a new newIncome column
# Unknown : N
# Less than $40K : a
# $40K - $60K : b
# $60K - $80K : c
# $80K - $120K : d
# $120K + : e
def changeCategory(x):
    # The label in the data is 'Unknown' (capital U); comparing against
    # 'unknown' would never match and those rows would come back as None
    if x == 'Unknown':
        return 'N'
    elif x == 'Less than $40K':
        return 'a'
    elif x == '$40K - $60K':
        return 'b'
    elif x == '$60K - $80K':
        return 'c'
    elif x == '$80K - $120K':
        return 'd'
    elif x == '$120K +':
        return 'e'
df['newIncome'] = df.Income_Category.apply(changeCategory)
Ans = df['newIncome']
Ans.head(4)
0 c
1 a
2 d
3 a
Name: newIncome, dtype: object
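When a function passed to apply has no branch that matches a value, Python falls off the end of the function and returns None for that element, which is easy to miss. One defensive option, sketched here, is a dictionary lookup with dict.get and an explicit fallback. The helper name change_category and the '?' fallback are arbitrary choices made for this illustration.

```python
import pandas as pd

dic = {
    'Unknown': 'N',
    'Less than $40K': 'a',
    '$40K - $60K': 'b',
    '$60K - $80K': 'c',
    '$80K - $120K': 'd',
    '$120K +': 'e',
}

def change_category(x):
    # Hypothetical helper: dict.get returns '?' for labels missing from dic
    return dic.get(x, '?')

# Toy values; the lowercase 'unknown' does not match the 'Unknown' key
s = pd.Series(['Unknown', '$120K +', 'unknown'])
result = s.apply(change_category)
print(result.tolist())  # ['N', 'e', '?']
```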
Mapping - Frequency count 1
# Use Customer_Age to define age bands in an AgeState column
# (0~9 : 0, 10~19 : 10, 20~29 : 20 … then print the frequency of each band)
df['AgeState'] = df.Customer_Age.map(lambda x: x // 10 * 10)
Ans = df['AgeState'].value_counts().sort_index()
Ans
AgeState
20 195
30 1841
40 4561
50 2998
60 530
70 2
Name: count, dtype: int64
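The same banding also works without map, because floor division is applied element-wise to a numeric Series. A sketch on toy ages (not values from the dataset):

```python
import pandas as pd

ages = pd.Series([26, 34, 41, 47, 58, 73])  # toy ages

# Vectorized band: e.g. 47 // 10 * 10 == 40
age_state = ages // 10 * 10

# Frequency of each band, ordered by band
counts = age_state.value_counts().sort_index()
print(counts.to_dict())  # {20: 1, 30: 1, 40: 2, 50: 1, 70: 1}
```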
Mapping - Frequency count 2
# Set Education_Level values that contain the word Graduate to 1 and all others to 0,
# store them in a newEduLevel column, and print the frequency of each value
df['newEduLevel'] = df.Education_Level.map(lambda x : 1 if 'Graduate' in x else 0)
# value_counts must be called with (); without the parentheses Ans is
# just the bound method, not the frequency table
Ans = df['newEduLevel'].value_counts()
Ans
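A vectorized alternative to the lambda is Series.str.contains, which returns a boolean Series that astype(int) turns into 1/0. The education labels below are toy values. regex=False treats 'Graduate' as a literal substring, and matching is case-sensitive by default.

```python
import pandas as pd

# Toy labels (illustrative only)
edu = pd.Series(['High School', 'Graduate', 'Post-Graduate', 'Uneducated', 'Doctorate'])

# True where the text contains 'Graduate', then True/False -> 1/0
new_edu = edu.str.contains('Graduate', regex=False).astype(int)

# Note the parentheses: value_counts() is a method call
counts = new_edu.value_counts()
print(new_edu.tolist())  # [0, 1, 1, 0, 0]
```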
Mapping - Frequency count 3
# Define newLimit as 1 where Credit_Limit is 4500 or more and 0 otherwise,
# then print the frequency of each newLimit value
df['newLimit'] = df.Credit_Limit.map(lambda x : 1 if x >= 4500 else 0)
# Call value_counts() with parentheses to get the frequency table
Ans = df['newLimit'].value_counts()
Ans
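The threshold rule can also be written as a comparison on the whole Series, with no per-element function. A sketch on toy credit limits, including 4500.0 exactly to show that >= counts the boundary as 1:

```python
import pandas as pd

limits = pd.Series([12691.0, 8256.0, 3418.0, 3313.0, 4500.0])  # toy values

# Boolean comparison, then True/False -> 1/0
new_limit = (limits >= 4500).astype(int)

counts = new_limit.value_counts()
print(new_limit.tolist())  # [1, 1, 0, 0, 1]
```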
Apply - Frequency count 1
# Define a newState column that is 1 when Marital_Status is Married and Card_Category is Platinum,
# and 0 otherwise; print the frequency of each newState value
def check(x):
    # With axis=1, x is one row, so its columns are available as attributes
    if x.Marital_Status == "Married" and x.Card_Category == "Platinum":
        return 1
    else:
        return 0
df['newState'] = df.apply(check, axis=1)
Ans = df['newState'].value_counts()
Ans
newState
0 10120
1 7
Name: count, dtype: int64
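Row-wise apply calls a Python function once per row. A boolean mask built from two column comparisons gives the same result in one vectorized expression. The sketch below uses a hypothetical four-row toy DataFrame and checks that both approaches agree.

```python
import pandas as pd

# Hypothetical toy rows
df_toy = pd.DataFrame({
    'Marital_Status': ['Married', 'Single', 'Married', 'Married'],
    'Card_Category': ['Platinum', 'Platinum', 'Blue', 'Platinum'],
})

def check(x):
    # Same rule as above, applied row by row
    if x.Marital_Status == "Married" and x.Card_Category == "Platinum":
        return 1
    return 0

via_apply = df_toy.apply(check, axis=1)

# Vectorized: combine the two conditions with & (parentheses are required)
mask = (df_toy.Marital_Status == 'Married') & (df_toy.Card_Category == 'Platinum')
via_mask = mask.astype(int)

print(via_mask.tolist())  # [1, 0, 0, 1]
```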
Apply - Frequency count 2
# Replace M with male and F with female, overwriting the Gender column,
# then print the frequency of each value
def ChangeGender(x):
    if x == 'M':
        return 'male'
    else:
        return 'female'
df['Gender'] = df.Gender.apply(ChangeGender)
Ans = df['Gender'].value_counts()
Ans
Gender
female 5358
male 4769
Name: count, dtype: int64
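ChangeGender sends every value other than 'M' to 'female'. Because the cell overwrites Gender in place, running it a second time would turn every 'male' into 'female' ('male' != 'M'). A dictionary passed to map is stricter: unexpected codes become NaN instead of silently becoming 'female'. In the sketch below, 'X' is a hypothetical unexpected code, not something taken from the dataset.

```python
import pandas as pd

gender = pd.Series(['M', 'F', 'F', 'M', 'X'])  # 'X' is hypothetical

# Only the listed codes are translated; anything else becomes NaN
via_map = gender.map({'M': 'male', 'F': 'female'})

# The if/else version labels 'X' as 'female'
via_apply = gender.apply(lambda x: 'male' if x == 'M' else 'female')

print(via_map.tolist())
print(via_apply.tolist())
```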