
[Statistical Modeling] Time Series Analysis - Stationarity and Differencing

xod22 2022. 3. 19. 00:02

2022.03.18 - [Data Analysis/04. Data Analysis] - [Statistical Modeling] Time Series Analysis

 


Continuing from the previous post, this time I'm going to study stationarity and differencing!

 

 

Stationary and non-stationary series

 

: A time series with a trend or seasonality is not a stationary series, because the trend and the seasonality affect the value of the series differently at different points in time!
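To make the point above concrete, here is a minimal sketch on made-up data (not the a10.csv dataset): the same random noise with and without an upward trend. The seed, the slope of 0.1, and the helper name `half_means` are all assumptions for illustration. If the mean of the first half and the mean of the second half differ a lot, the series is clearly not stationary.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the run is repeatable
n = 200
noise = rng.normal(0.0, 1.0, n)       # white noise: a candidate stationary series
trended = 0.1 * np.arange(n) + noise  # the same noise plus an upward linear trend

def half_means(x):
    """Return the mean of the first half and the mean of the second half of x."""
    mid = len(x) // 2
    return x[:mid].mean(), x[mid:].mean()

# Absolute shift in the mean between the two halves
noise_gap = abs(half_means(noise)[1] - half_means(noise)[0])
trend_gap = abs(half_means(trended)[1] - half_means(trended)[0])

print(f"white noise mean shift: {noise_gap:.2f}")
print(f"trended mean shift:     {trend_gap:.2f}")
```

The trended series shows a much larger shift in its mean, which is exactly the kind of time dependence that stationarity rules out.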

 

1. Import packages

from statsmodels.tsa.stattools import adfuller, kpss
import pandas as pd

 

 

2. Load the data


df=pd.read_csv('a10.csv', parse_dates=['date'], index_col='date')

 

 

3. Test for stationarity

 

As ways to test for stationarity, I'm going to try out the ADF test and the KPSS test.

 

-ADF test

#H0 (null hypothesis): not stationary
#H1 (alternative hypothesis): stationary

result=adfuller(df.value.values)

print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')

The p-value is greater than 0.05, so the null hypothesis cannot be rejected. The time series is non-stationary.

 

-KPSS test

#H0 (null hypothesis): stationary
#H1 (alternative hypothesis): not stationary

result=kpss(df['value'], regression='c')

print(f'KPSS Statistic: {result[0]}')
print(f'p-value: {result[1]}')

The p-value is smaller than 0.05, so the null hypothesis is rejected. In other words, the time series is non-stationary.
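Because the two tests have opposite null hypotheses, it is easy to mix up how to read them. The following is a minimal sketch of the reading rule used in this post; the function name `combine_verdict` and the p-values passed in are hypothetical, not the actual results for a10.csv.

```python
def combine_verdict(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into one plain-language verdict."""
    adf_stationary = adf_p < alpha       # ADF: rejecting H0 points to stationarity
    kpss_stationary = kpss_p >= alpha    # KPSS: failing to reject H0 points to stationarity
    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary"
    return "inconclusive"                # the two tests disagree

# Made-up p-values: large for ADF, small for KPSS (the pattern described in this post)
print(combine_verdict(0.99, 0.01))  # non-stationary
```

When the two tests disagree, the sketch deliberately returns "inconclusive" instead of picking a side; how to handle that case is a judgment call.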

 


Differencing: non-stationary -> stationary

 

*How to make time series data stationary

-> take the differences of the series data

 

That is, if Y_t is the value at time t, the first difference is Y'_t = Y_t - Y_{t-1}. Simply put, you subtract the previous value from the current value!

If taking the first difference does not make the data stationary,

repeat the process and take the second difference. => Repeat until the series becomes stationary..!!
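Written out, the first and second differences described above look like this (the primes are just my notation for "differenced once" and "differenced twice"):

```latex
\begin{aligned}
Y'_t  &= Y_t - Y_{t-1} \\
Y''_t &= Y'_t - Y'_{t-1} = Y_t - 2Y_{t-1} + Y_{t-2}
\end{aligned}
```

So the second difference is simply the first difference applied to the already differenced series.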

 

 

1. Load the data

df=pd.read_csv('a10.csv', parse_dates=['date'], index_col='date')

 

 

2. Differencing

diff=df.diff()

# the first row has no previous value and becomes NaN, so drop missing values before continuing
diff=diff.dropna()

- diff function: computes current value - previous value, along axis=0 (down the rows) by default
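To see what `diff` actually does, here is a small sketch on made-up numbers (the squares of 1 to 5, not the a10.csv data). It also shows why `dropna` is needed: the first row has nothing to subtract.

```python
import pandas as pd

s = pd.Series([1, 4, 9, 16, 25], name="value")  # made-up values: squares of 1..5

first = s.diff()                # current minus previous; the first row has no previous value
manual = s - s.shift(1)         # the same calculation written out by hand
second = first.diff().dropna()  # difference of the differences

print(first.tolist())   # [nan, 3.0, 5.0, 7.0, 9.0]
print(second.tolist())  # [2.0, 2.0, 2.0]
```

The quadratic growth turns into a constant after two rounds of differencing, which is the intuition behind repeating the step.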

 

 

3. Check whether the series has become stationary

result=adfuller(diff.value.values)
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')

- H0 (null hypothesis): not stationary
- H1 (alternative hypothesis): stationary


The p-value is greater than 0.05, so the null hypothesis cannot be rejected. => still non-stationary

 

 

4. Repeat one more time!

diff2=diff.diff()
diff2=diff2.dropna()
result=adfuller(diff2.value.values)
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')

- H0 (null hypothesis): not stationary
- H1 (alternative hypothesis): stationary

 

The null hypothesis is rejected! In other words, the series is now stationary.
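The "difference, test, repeat" routine in this section can be wrapped in a small loop. This is only a sketch: the helper name `difference_until`, the `max_d` cap, and the toy predicate are my own assumptions. The stationarity check is passed in as a function so the ADF rule from this post (or KPSS) can be plugged in.

```python
import pandas as pd

def difference_until(series, is_stationary, max_d=2):
    """Difference `series` until `is_stationary(series)` is true or max_d is reached.

    Returns (d, differenced_series), where d is the number of differences taken.
    """
    current = series.dropna()
    d = 0
    while d < max_d and not is_stationary(current):
        current = current.diff().dropna()  # one more round of differencing
        d += 1
    return d, current

# With statsmodels, the predicate could be the ADF rule used in this post:
#   is_stationary = lambda s: adfuller(s.values)[1] < 0.05
# Here a toy predicate (hypothetical) stands in so the sketch runs on its own.
toy = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])  # made-up values
is_constant = lambda s: s.nunique() == 1

d, out = difference_until(toy, is_constant, max_d=3)
print(d, out.tolist())  # 2 [2.0, 2.0, 2.0, 2.0]
```

Note that the loop stops at `max_d` even if the check never passes, so the returned series should still be tested before it is used.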

 

 

Plot: removing trend and seasonality from time series data

 

~Removing the trend~

 

1. Import packages and load the data

from scipy import signal
import matplotlib.pyplot as plt
df=pd.read_csv('a10.csv', parse_dates=['date'], index_col='date')

 

2. Remove the linear trend from the time series values

detrended=signal.detrend(df.value.values)
plt.plot(detrended)
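By default, `signal.detrend` removes a linear trend, meaning it subtracts the least-squares straight line from the data. The sketch below reproduces that idea with plain NumPy on a made-up series; `linear_detrend` is a hypothetical helper of mine, not a SciPy function.

```python
import numpy as np

def linear_detrend(y):
    """Subtract the least-squares straight line from y."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)  # fit y ~ slope * t + intercept
    return y - (slope * t + intercept)

t = np.arange(48)
y = 2.0 + 0.5 * t + np.sin(2 * np.pi * t / 12)  # made-up series: a line plus a 12-step cycle
residual = linear_detrend(y)

print(abs(residual.mean()) < 1e-8)  # what is left is centered on zero
```

Only the straight-line part is removed, so the repeating cycle stays in the residual.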

 

3. Removing the trend using time series decomposition

from statsmodels.tsa.seasonal import seasonal_decompose

- multiplicative : multiplicative model
- extrapolate_trend='freq' : the rolling window used to build the trend component inevitably produces NaN values in trend and resid, so this option fills in those NaN values.

result_mul=seasonal_decompose(df['value'], model='multiplicative', extrapolate_trend='freq')
detrended=df.value.values-result_mul.trend
plt.plot(detrended)

With the upward trend removed, the seasonality is much easier to see.
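To show where the NaN values that `extrapolate_trend` fills in come from, here is a sketch of a centered moving-average trend in plain NumPy. My understanding is that `seasonal_decompose` uses a filter of this shape for an even period such as 12, but treat that as an assumption; `centered_ma_trend` is a hypothetical helper and the straight-line input is made up.

```python
import numpy as np

def centered_ma_trend(y, period=12):
    """Centered 2 x period moving average (assumes an even period).

    The first and last period // 2 points stay NaN because the window does not fit there.
    """
    y = np.asarray(y, dtype=float)
    weights = np.array([0.5] + [1.0] * (period - 1) + [0.5]) / period  # weights sum to 1
    half = period // 2
    trend = np.full(len(y), np.nan)
    trend[half:len(y) - half] = np.convolve(y, weights, mode="valid")
    return trend

y = np.arange(36, dtype=float)  # made-up series: a pure straight line
trend = centered_ma_trend(y)

print(int(np.isnan(trend).sum()))  # 12 -> six NaN values at each end
```

Without an option like `extrapolate_trend`, those edge points would have no trend estimate at all.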

 

 

~Removing the seasonality~

from statsmodels.tsa.seasonal import seasonal_decompose
result_mul=seasonal_decompose(df['value'], model='multiplicative')
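# in the multiplicative model, dividing by the seasonal component removes the seasonality
deseasonalized=df.value.values / result_mul.seasonal
plt.plot(deseasonalized)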

With the seasonality removed, the upward trend is much easier to see.
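Why division? In a multiplicative model the series is the product of its components (value = trend x seasonal x residual), so dividing by one component leaves the others. Here is a minimal sketch on made-up components with no residual term; all the numbers are assumptions for illustration.

```python
import numpy as np

t = np.arange(48)
trend = 10.0 + 0.5 * t                             # made-up upward trend
seasonal = 1.0 + 0.3 * np.sin(2 * np.pi * t / 12)  # made-up 12-step seasonal factor around 1
y = trend * seasonal                               # multiplicative model, no residual

deseasonalized = y / seasonal  # dividing out the seasonal factor leaves the trend
seasonal_only = y / trend      # dividing out the trend leaves the seasonal factor

print(np.allclose(deseasonalized, trend), np.allclose(seasonal_only, seasonal))  # True True
```

The same reasoning suggests that, under a strictly multiplicative model, dividing by the trend (instead of subtracting it) isolates the seasonal factor most cleanly.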
