

[Python] Pandas _ Apply, Map

๋ฐ˜์‘ํ˜•

✅ Data Load

    import pandas as pd

    # Load the data and print the number of rows and columns
    # DataUrl holds the path or URL of the CSV file
    df = pd.read_csv(DataUrl)
    Ans = df.shape

    Ans
(10127, 19)

✅ Mapping

    # Use map to convert the Income_Category categories as follows and store the result in a newIncome column

    # Unknown : N
    # Less than $40K : a
    # $40K - $60K : b
    # $60K - $80K : c
    # $80K - $120K : d
    # $120K + : e

    dic = {
        'Unknown' : 'N',
        'Less than $40K' : 'a',
        '$40K - $60K' : 'b',
        '$60K - $80K' : 'c',
        '$80K - $120K' : 'd',
        '$120K +' : 'e'
    }

    df['newIncome'] = df.Income_Category.map(lambda x : dic[x])

    Ans = df['newIncome']
    Ans.head(4)
0    c
1    a
2    d
3    a
Name: newIncome, dtype: object
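Because `dic` already holds every mapping, a pandas Series can also take the dict directly instead of a lambda. The sketch below uses a few hypothetical sample values (including an unexpected label, 'Other') to show the difference: `dic[x]` inside a lambda raises a KeyError for a missing key, while `map(dic)` turns it into NaN.

```python
import pandas as pd

# Hypothetical sample values; in the post the column comes from df.Income_Category
income = pd.Series(['$60K - $80K', 'Less than $40K', '$80K - $120K', 'Unknown', 'Other'])

dic = {
    'Unknown': 'N',
    'Less than $40K': 'a',
    '$40K - $60K': 'b',
    '$60K - $80K': 'c',
    '$80K - $120K': 'd',
    '$120K +': 'e',
}

# Passing the dict directly: labels missing from dic become NaN
new_income = income.map(dic)
print(new_income.tolist())

# The lambda version looks the key up with dic[x], so an unknown label raises KeyError
try:
    income.map(lambda x: dic[x])
    lambda_failed = False
except KeyError:
    lambda_failed = True
print('lambda raised KeyError:', lambda_failed)
```

Whether a missing key should fail loudly (KeyError) or quietly (NaN) is a design choice; with `map(dic)`, checking `new_income.isna().sum()` afterwards catches unmapped labels.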

โœ”๏ธ Lambda ๋ž€ ํ•จ์ˆ˜๋ฅผ ํ•œ ์ค„๋กœ ํ‘œํ˜„ํ•˜๋Š” ํ•จ์ˆ˜ ๊ธฐ๋ฒ•

โœ”๏ธ Map ํ•จ์ˆ˜ ๋ž€ ํ•จ์ˆ˜์™€ Sequenceํ˜• ๋ฐ์ดํ„ฐ๋ฅผ Parameter๋กœ ์ž…๋ ฅ๋ฐ›์•„,

      ๊ฐ element ๋งˆ๋‹ค ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•˜์—ฌ List ๋กœ ๋ฐ˜ํ™˜ํ•˜๋Š” ํ•จ์ˆ˜

 

✅ Apply

    # Use apply to convert the Income_Category categories as follows and store the result in a newIncome column

    # Unknown : N
    # Less than $40K : a
    # $40K - $60K : b
    # $60K - $80K : c
    # $80K - $120K : d
    # $120K + : e


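    def changeCategory(x):
        # Labels are compared exactly, so 'Unknown' must be capitalized as in the data
        if x == 'Unknown':
            return 'N'
        elif x == 'Less than $40K':
            return 'a'
        elif x == '$40K - $60K':
            return 'b'
        elif x == '$60K - $80K':
            return 'c'
        elif x == '$80K - $120K':
            return 'd'
        elif x == '$120K +':
            return 'e'
        # Any other value falls through and returns None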

    df['newIncome'] = df.Income_Category.apply(changeCategory)

    Ans = df['newIncome']

    Ans.head(4)
0    c
1    a
2    d
3    a
Name: newIncome, dtype: object
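An if/elif chain with no final else silently returns None for any value it does not match, which is easy to miss when the first few rows look correct. The sketch below (hypothetical sample values, with a deliberately lowercase 'unknown' comparison) shows one way to catch such fall-throughs: count the missing values after apply.

```python
import pandas as pd

# Hypothetical sample values
income = pd.Series(['Unknown', 'Less than $40K', '$120K +'])

def change_category_buggy(x):
    if x == 'unknown':  # lowercase: never matches the label 'Unknown'
        return 'N'
    elif x == 'Less than $40K':
        return 'a'
    elif x == '$120K +':
        return 'e'
    # no else: unmatched values fall through and return None

result = income.apply(change_category_buggy)

# Count rows that no branch handled
unmatched = result.isna().sum()
print('unmatched rows:', unmatched)  # unmatched rows: 1
print(income[result.isna()].tolist())  # ['Unknown']
```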

✅ Mapping - Frequency Counts 1

    # Use the Customer_Age values to define age bins in an AgeState column
    # (0-9: 0, 10-19: 10, 20-29: 20, ...) and print the frequency of each bin

    df['AgeState'] = df.Customer_Age.map(lambda x : x//10 *10)
    Ans = df['AgeState'].value_counts().sort_index()

    Ans
AgeState
20     195
30    1841
40    4561
50    2998
60     530
70       2
Name: count, dtype: int64
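Floor division also works on a whole Series at once, so the lambda is not strictly needed; pd.cut is another option when interval labels such as [40, 50) are preferred over the bin's starting age. A sketch with hypothetical ages:

```python
import pandas as pd

ages = pd.Series([26, 35, 49, 51, 68, 73])  # hypothetical sample ages

# Vectorized equivalent of map(lambda x : x//10 *10)
age_state = ages // 10 * 10
print(age_state.tolist())  # [20, 30, 40, 50, 60, 70]

# pd.cut with left-closed bins: 20 <= age < 30 falls in [20, 30), and so on
age_bins = pd.cut(ages, bins=list(range(20, 90, 10)), right=False)
print(age_bins.value_counts().sort_index())
```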

✅ Mapping - Frequency Counts 2

    # Set newEduLevel to 1 if the Education_Level value contains the word Graduate, otherwise 0,
    # then print the frequency of each value

    df['newEduLevel'] = df.Education_Level.map(lambda x : 1 if 'Graduate' in x else 0)
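    Ans = df['newEduLevel'].value_counts()

    Ans

✔️ value_counts must be called with parentheses. Without them (Ans = df['newEduLevel'].value_counts), Ans holds the bound method itself rather than the frequency table, and printing it shows the method wrapped around the whole Series:

<bound method IndexOpsMixin.value_counts of 0        0
1        1
2        1
3        0
4        0
        ..
10122    1
10123    0
10124    0
10125    1
10126    1
Name: newEduLevel, Length: 10127, dtype: int64>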
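The same flag can be built without a lambda using the vectorized string method Series.str.contains; regex=False treats 'Graduate' as a plain substring. A sketch with hypothetical education labels:

```python
import pandas as pd

# Hypothetical sample labels
edu = pd.Series(['High School', 'Graduate', 'Post-Graduate', 'Uneducated', 'College'])

# True where the label contains the substring 'Graduate', then cast True/False to 1/0
new_edu = edu.str.contains('Graduate', regex=False).astype(int)

counts = new_edu.value_counts()  # with (): returns the counts, not the method
print(counts)
```

One practical difference: if the column contained NaN, `'Graduate' in x` would raise a TypeError inside the lambda, whereas str.contains returns NaN for those rows; passing na=False fills them before astype(int).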

✅ Mapping - Frequency Counts 3

    # Define newLimit as 1 if Credit_Limit is 4500 or more, otherwise 0,
    # then print the frequency of each newLimit value

    df['newLimit'] = df.Credit_Limit.map(lambda x : 1 if x >= 4500 else 0)
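    Ans = df['newLimit'].value_counts()

    Ans

✔️ Here too, value_counts needs parentheses. Without them (Ans = df['newLimit'].value_counts), printing Ans shows the bound method and the full Series instead of the counts:

<bound method IndexOpsMixin.value_counts of 0        1
1        1
2        0
3        0
4        1
        ..
10122    0
10123    0
10124    1
10125    1
10126    1
Name: newLimit, Length: 10127, dtype: int64>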
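A comparison on the whole column gives the same 0/1 flag directly, since True and False cast to 1 and 0. A sketch with hypothetical credit limits (4500 itself counts as 1 because the condition is >=):

```python
import pandas as pd

limits = pd.Series([12000.0, 8000.0, 3400.0, 3300.0, 4700.0, 4500.0])  # hypothetical values

# Boolean Series (limit >= 4500) cast to 0/1
new_limit = (limits >= 4500).astype(int)

counts = new_limit.value_counts()
print(new_limit.tolist())  # [1, 1, 0, 0, 1, 1]
print(counts)
```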

✅ Apply - Frequency Counts 1

    # Define a newState column that is 1 if Marital_Status is Married and Card_Category is Platinum,
    # otherwise 0, then print the frequency of each newState value

    def check(x):
        # x is one row (a Series) because apply is called with axis=1
        if x.Marital_Status == "Married" and x.Card_Category == "Platinum":
            return 1
        else:
            return 0

    df['newState'] = df.apply(check, axis=1)

    Ans = df['newState'].value_counts()

    Ans
newState
0    10120
1        7
Name: count, dtype: int64
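apply with axis=1 calls check once per row, which tends to be slow on large frames. Combining two boolean masks with & gives the same column in one vectorized step; each comparison needs its own parentheses because & binds tighter than ==. A sketch on a hypothetical four-row frame, checked against the row-wise version:

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    'Marital_Status': ['Married', 'Single', 'Married', 'Married'],
    'Card_Category': ['Platinum', 'Platinum', 'Blue', 'Platinum'],
})

def check(x):
    # Row-wise version, as in the post
    if x.Marital_Status == "Married" and x.Card_Category == "Platinum":
        return 1
    else:
        return 0

# Vectorized version: element-wise AND of two boolean Series, cast to 0/1
mask = (df['Marital_Status'] == 'Married') & (df['Card_Category'] == 'Platinum')
df['newState'] = mask.astype(int)

print(df['newState'].tolist())  # [1, 0, 0, 1]
print(df['newState'].value_counts())
```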

✅ Apply - Frequency Counts 2

    # Replace the Gender values M with male and F with female, storing the result back in the Gender column,
    # then print the frequency of each value

    def ChangeGender(x):
        if x == 'M':
            return 'male'
        else:
            return 'female'

    df['Gender'] = df.Gender.apply(ChangeGender)

    Ans = df['Gender'].value_counts()

    Ans
Gender
female    5358
male      4769
Name: count, dtype: int64
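Because the else branch turns every non-'M' value into 'female', re-running the cell is risky: on the second run 'male' no longer equals 'M', so every row becomes 'female'. Mapping with an explicit dict makes such problems visible instead; unexpected codes (the hypothetical 'U' below) become NaN rather than 'female'.

```python
import pandas as pd

gender = pd.Series(['M', 'F', 'F', 'M', 'U'])  # hypothetical codes; 'U' is unexpected

def ChangeGender(x):
    if x == 'M':
        return 'male'
    else:
        return 'female'

# Pitfall: applying the function twice turns 'male' into 'female'
twice = gender.apply(ChangeGender).apply(ChangeGender)
print(twice.value_counts())

# Explicit dict: only 'M' and 'F' are translated, anything else becomes NaN
mapped = gender.map({'M': 'male', 'F': 'female'})
print(mapped.tolist())
print('unmapped:', mapped.isna().sum())  # unmapped: 1
```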
๋ฐ˜์‘ํ˜•