[Python] Pandas _ Time Series

๋ฐ˜์‘ํ˜•

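✅ Data Load

    # Load the data and inspect the data types
    # (DataUrl is assumed to be defined earlier; this post does not show it)
    import pandas as pd

    df = pd.read_csv(DataUrl)

    df.info()  # prints the summary directly and returns None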
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6574 entries, 0 to 6573
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Yr_Mo_Dy  6574 non-null   object 
 1   RPT       6568 non-null   float64
 2   VAL       6571 non-null   float64
 3   ROS       6572 non-null   float64
 4   KIL       6569 non-null   float64
 5   SHA       6572 non-null   float64
 6   BIR       6574 non-null   float64
 7   DUB       6571 non-null   float64
 8   CLA       6572 non-null   float64
 9   MUL       6571 non-null   float64
 10  CLO       6573 non-null   float64
 11  BEL       6574 non-null   float64
 12  MAL       6570 non-null   float64
dtypes: float64(12), object(1)
memory usage: 667.8+ KB

✅ Convert to datetime type

    # Convert Yr_Mo_Dy to the datetime64 type that pandas recognizes

    df.Yr_Mo_Dy = pd.to_datetime(df.Yr_Mo_Dy)
    Ans = df.Yr_Mo_Dy

    Ans.head(4)
0   2061-01-01
1   2061-01-02
2   2061-01-03
3   2061-01-04
Name: Yr_Mo_Dy, dtype: datetime64[ns]

✅ Print the unique year values

    # Print every unique year present in Yr_Mo_Dy

    Ans = df.Yr_Mo_Dy.dt.year.unique()
    Ans
array([2061, 2062, 2063, 2064, 2065, 2066, 2067, 2068, 2069, 2070, 1971,
       1972, 1973, 1974, 1975, 1976, 1977, 1978])

✅ Correct the wrong century

    # Every row whose year in Yr_Mo_Dy is 2061 or later is bad data
    # For those rows, subtract 100 from the year and store the corrected date in the Yr_Mo_Dy column

    import datetime

    def fix_century(x):    
 
        year = x.year - 100 if x.year >= 2061 else x.year
 
        return pd.to_datetime(datetime.date(year, x.month, x.day))

    df['Yr_Mo_Dy'] = df['Yr_Mo_Dy'].apply(fix_century)

    Ans = df.tail(4)

    Ans
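
The same correction can also be written without a row-by-row `apply`. The sketch below is one possible vectorized variant, not the method used in this post. The three sample dates are made up for illustration, and it assumes every affected date is exactly 100 years too late.

```python
import pandas as pd

# Made-up sample dates: two with the wrong century, one already correct
dates = pd.Series(pd.to_datetime(["2061-01-01", "2070-12-31", "1971-01-01"]))

# Rows that need the correction (same threshold as fix_century)
needs_fix = dates.dt.year >= 2061

# Shift every date back 100 years, then use the shifted value only where needed
shifted = dates - pd.DateOffset(years=100)
fixed = dates.mask(needs_fix, shifted)

print(fixed.dt.year.tolist())
```

`Series.mask` replaces values where the condition is True, so rows that were already correct are left untouched.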

 

✅ Mean of each column by year

    Ans = df.groupby(df.Yr_Mo_Dy.dt.year).mean()

    Ans.head(4)

 

✅ Weekday mapping

    # Create a weekday column and map each date to its weekday number (Monday: 0 through Sunday: 6)

    df['weekday'] = df.Yr_Mo_Dy.dt.weekday

    Ans = df['weekday'].to_frame()

    Ans
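
As a quick sanity check of the numbering, the sketch below applies `dt.weekday` and `dt.day_name()` to two dates whose weekdays are known. The dates are chosen only for illustration and are not part of the dataset.

```python
import pandas as pd

# 2024-01-01 was a Monday and 2024-01-07 a Sunday
days = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-07"]))

numbers = days.dt.weekday    # Monday=0 ... Sunday=6
names = days.dt.day_name()   # English weekday names

print(numbers.tolist(), names.tolist())
```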

 

✅ Mean of each column by month

    # Compute the mean of every column for each calendar month, regardless of year and day

    Ans = df.groupby(df.Yr_Mo_Dy.dt.month).mean()

    Ans.head(4)

 

✅ Fill missing values

    # Fill every missing value with the previous value in its column; if the first row is missing, fill it with the next value

    # ffill()/bfill() replace the deprecated fillna(method=...) form
    df = df.ffill().bfill()

    df.isnull().sum()
Yr_Mo_Dy    0
RPT         0
VAL         0
ROS         0
KIL         0
SHA         0
BIR         0
DUB         0
CLA         0
MUL         0
CLO         0
BEL         0
MAL         0
weekday     0
dtype: int64
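
To see why the backward fill is needed after the forward fill, here is a minimal sketch on a made-up column whose first value is missing. The data is hypothetical; only the column name is borrowed from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up column: the first row has no earlier value to copy from
toy = pd.DataFrame({"RPT": [np.nan, 1.0, np.nan, 3.0]})

forward_only = toy.ffill()      # the leading NaN survives
filled = toy.ffill().bfill()    # the leading NaN is taken from the next row

print(forward_only["RPT"].tolist())
print(filled["RPT"].tolist())
```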

✅ Compute column means

    # Compute the mean of every column by year-month

    Ans = df.groupby(df.Yr_Mo_Dy.dt.to_period('M')).mean()

    Ans.head(3)
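
Grouping by `dt.month` and grouping by `dt.to_period('M')` are easy to confuse. The sketch below contrasts the two on a small made-up frame; the `date` column name and the values are assumptions for illustration.

```python
import pandas as pd

# Made-up data: two Januaries in different years plus one February
toy = pd.DataFrame({
    "date": pd.to_datetime(["1961-01-10", "1961-01-20", "1962-01-10", "1962-02-10"]),
    "RPT": [10.0, 20.0, 30.0, 40.0],
})

# dt.month pools the same calendar month across all years
by_month = toy.groupby(toy["date"].dt.month)["RPT"].mean()

# to_period('M') keeps each year-month as its own group
by_year_month = toy.groupby(toy["date"].dt.to_period("M"))["RPT"].mean()

print(by_month)
print(by_year_month)
```

Here `by_month` has two groups (January pooled over both years), while `by_year_month` has three.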

 

✅ Difference a column

    # Take the first difference of the RPT column, day over day

    Ans = df['RPT'].diff()

    Ans
0        NaN
1      -0.33
2       3.79
3      -7.92
4       2.75
        ... 
6569    3.75
6570   -4.37
6571    0.79
6572    4.50
6573    1.83
Name: RPT, Length: 6574, dtype: float64
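
A first difference is each value minus the value in the previous row, which is why the first entry is NaN. A small sketch with made-up numbers shows that `diff()` matches the manual `shift` form:

```python
import pandas as pd

# Made-up values, not taken from the dataset
s = pd.Series([10, 13, 9, 15])

first_diff = s.diff()       # value minus the previous row
manual = s - s.shift(1)     # the same computation written out

print(first_diff.tolist())
```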

✅ Compute moving averages

    # Compute the moving average of the RPT and VAL columns over a one-week (7-row) window

    Ans = df[['RPT','VAL']].rolling(7).mean()

    Ans.head(9)
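
`rolling(7)` counts rows, not calendar days, and by default yields NaN until a full window is available. The sketch below illustrates this on a made-up series; `min_periods=1` is shown only as an optional variation, not something the post uses.

```python
import pandas as pd

# Made-up series 1.0, 2.0, ..., 9.0
s = pd.Series(range(1, 10), dtype="float64")

strict = s.rolling(7).mean()                   # first 6 rows are NaN
lenient = s.rolling(7, min_periods=1).mean()   # averages whatever is available

print(strict.tolist())
print(lenient.tolist())
```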

 

✅ Load the fine dust data

    DataUrl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/seoul_pm.csv'

    df = pd.read_csv(DataUrl)

✅ Convert to datetime

    # Convert the (년-월-일:시) column to a datetime type that pandas recognizes
    # In the data provided by the city of Seoul, hour 0 is written as hour 24

    import datetime

    def Change_date(x):
        # Split '<date>:<hour>' into its two parts
        date, hour = x.split(':')[:2]

        if hour == '24':
            # Hour 24 means midnight at the start of the following day
            return pd.to_datetime(date + ' 00:00:00') + datetime.timedelta(days=1)

        return pd.to_datetime(date + ' ' + hour + ':00:00')

    df['(년-월-일:시)'] = df['(년-월-일:시)'].apply(Change_date)

    Ans = df

    Ans.head(3)
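
The hour-24 rule can also be handled without a special case by adding the hour as a timedelta, since 24 hours past midnight is midnight of the next day. This is a sketch of an alternative, not the post's method, and the two sample strings assume the `YYYY-MM-DD:HH` layout implied by the split on `:`.

```python
import pandas as pd

# Made-up samples in the assumed '<date>:<hour>' layout
raw = pd.Series(["2021-05-15:15", "2021-05-15:24"])

parts = raw.str.split(":", expand=True)
day = pd.to_datetime(parts[0])    # date part at 00:00
hour = parts[1].astype(int)       # hour part as an integer, 24 allowed

# Adding 24 hours rolls over to 00:00 of the following day
stamps = day + pd.to_timedelta(hour, unit="h")

print(stamps.tolist())
```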

 

✅ Store the English weekday name for each date

    # Store the English weekday name of each date in the dayName column

    df['dayName'] = df['(년-월-일:시)'].dt.day_name()

    Ans = df['dayName']

    Ans.head(5)
0    Saturday
1    Saturday
2    Saturday
3    Saturday
4    Saturday
Name: dayName, dtype: object

✅ Count the frequency of each PM10 grade per weekday

    # Count how often each PM10 grade occurs on each weekday

    Ans1 = df.groupby(['dayName','PM10등급'],as_index=False).size()

    Ans1.head()

 

 

    Ans2 = Ans1.pivot(index='dayName',columns='PM10등급',values='size').fillna(0)

    Ans2
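
The group-then-pivot steps produce a frequency table, which `pd.crosstab` can build in a single call. The sketch below uses a made-up frame; the `grade` column name and its values are hypothetical stand-ins for the PM10 grade column.

```python
import pandas as pd

# Made-up rows: weekday name and a hypothetical grade label
toy = pd.DataFrame({
    "dayName": ["Monday", "Monday", "Tuesday", "Tuesday", "Tuesday"],
    "grade": ["good", "bad", "good", "good", "good"],
})

# Rows are weekdays, columns are grades, cells are counts (0 where absent)
counts = pd.crosstab(toy["dayName"], toy["grade"])

print(counts)
```

Combinations that never occur come out as 0 directly, so no `fillna(0)` step is needed in this variant.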

 

✅ Check time continuity and missing values

    # Check that the timestamps are continuous with no gaps

    check = len(df['(년-월-일:시)'].diff().unique())
    # Differencing the timestamps gives NaT for the first row; if every later difference is identical, the series is treated as continuous

    if check ==2:
        Ans = True

    else:
        Ans = False

    Ans
True
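
Counting unique differences passes for any constant spacing, whatever its size. A slightly stricter sketch compares every difference against an explicit step; the helper name `is_evenly_spaced` and the sample timestamps are hypothetical.

```python
import pandas as pd

def is_evenly_spaced(stamps, step):
    """Return True if every consecutive gap in `stamps` equals `step`."""
    deltas = stamps.diff().dropna()   # drop the leading NaT
    return bool((deltas == step).all())

one_hour = pd.Timedelta(hours=1)

# Made-up series: one regular, one with a missing 03:00 row
hourly = pd.Series(pd.to_datetime(
    ["2021-05-15 01:00", "2021-05-15 02:00", "2021-05-15 03:00"]))
gappy = pd.Series(pd.to_datetime(
    ["2021-05-15 01:00", "2021-05-15 02:00", "2021-05-15 04:00"]))

print(is_evenly_spaced(hourly, one_hour), is_evenly_spaced(gappy, one_hour))
```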

✅ Compute mean values

    # Compute the mean PM10 at 10 a.m. and at 10 p.m. (hour 22) separately

    # .loc selects by hour label, so the result stays correct even if some hour is absent
    Ans = df.groupby(df['(년-월-일:시)'].dt.hour)['PM10'].mean().loc[[10,22]].to_frame()

    Ans
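
After grouping by hour, the index holds the hour values themselves, so selecting by label picks the intended hours even when some hours are missing. The sketch below shows this on a made-up frame; the `time` column name and the values are assumptions.

```python
import pandas as pd

# Made-up readings: only hours 9, 10 and 22 are present
toy = pd.DataFrame({
    "time": pd.to_datetime(["2021-05-15 09:00", "2021-05-15 10:00",
                            "2021-05-16 10:00", "2021-05-15 22:00"]),
    "PM10": [30.0, 40.0, 60.0, 80.0],
})

hourly_mean = toy.groupby(toy["time"].dt.hour)["PM10"].mean()

# Label-based selection; positions 10 and 22 do not exist in this 3-row result
selected = hourly_mean.loc[[10, 22]]

print(selected)
```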

 

✅ Make the date column the index

    # Set the date column as the index

    df.set_index('(년-월-일:시)',inplace=True,drop=True)

    Ans = df

    Ans.head(3)
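
With datetimes in the index, rows can be selected by a date string and aggregated with `resample`. This is a brief sketch of what the index makes possible, on made-up values rather than the Seoul data.

```python
import pandas as pd

# Made-up hourly readings spanning midnight
idx = pd.to_datetime(["2021-05-15 22:00", "2021-05-15 23:00",
                      "2021-05-16 00:00", "2021-05-16 01:00"])
toy = pd.DataFrame({"PM10": [10.0, 20.0, 30.0, 50.0]}, index=idx)

one_day = toy.loc["2021-05-16"]          # all rows on that date
daily_mean = toy.resample("D").mean()    # one row per calendar day

print(one_day)
print(daily_mean)
```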

 

๋ฐ˜์‘ํ˜•

'Python' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[Python]Pandas _ Merge, Concat  (0) 2023.08.29
[Python] Pandas _ Pivot  (0) 2023.08.24
[Python] Pandas _ Apply, Map  (0) 2023.08.24
[Python] Pandas data ์ฒ˜๋ฆฌ  (0) 2023.05.01
[Python] Pandas data: auto-mpg data ์‹œ๊ฐํ™”  (0) 2023.04.30