ARIMA Time Series Forecasting, Part 1

import numpy as np
import pandas as pd

Time series

  • Timestamp
  • Time interval
  • Fixed period
# D: calendar day
# H: hour (spelled 'h' in pandas 2.2+)
# M: month end (spelled 'ME' in pandas 2.2+)
rng = pd.date_range('2016/07/01', periods=10, freq='D')
rng
DatetimeIndex(['2016-07-01', '2016-07-02', '2016-07-03', '2016-07-04',
               '2016-07-05', '2016-07-06', '2016-07-07', '2016-07-08',
               '2016-07-09', '2016-07-10'],
              dtype='datetime64[ns]', freq='D')
rng = pd.date_range('2016/07/01', periods=10, freq='3D')
rng
DatetimeIndex(['2016-07-01', '2016-07-04', '2016-07-07', '2016-07-10',
               '2016-07-13', '2016-07-16', '2016-07-19', '2016-07-22',
               '2016-07-25', '2016-07-28'],
              dtype='datetime64[ns]', freq='3D')
time = pd.Series(np.random.randn(20), index=pd.date_range('2016/1/1', periods=20))
time
2016-01-01   -0.512570
2016-01-02   -1.077638
2016-01-03    0.126473
2016-01-04   -1.242304
2016-01-05   -0.311126
2016-01-06   -0.380349
2016-01-07    1.459504
2016-01-08   -0.328805
2016-01-09    0.537477
2016-01-10   -0.377715
2016-01-11   -0.036280
2016-01-12   -2.522750
2016-01-13   -0.936564
2016-01-14    0.220823
2016-01-15   -0.515707
2016-01-16   -0.338733
2016-01-17    1.403778
2016-01-18    1.316850
2016-01-19   -0.988479
2016-01-20   -1.655101
Freq: D, dtype: float64
data = pd.date_range('2020/01/01', '2020/01/20', freq='D')
data
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
               '2020-01-09', '2020-01-10', '2020-01-11', '2020-01-12',
               '2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16',
               '2020-01-17', '2020-01-18', '2020-01-19', '2020-01-20'],
              dtype='datetime64[ns]', freq='D')
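
The calls above pin down a range either with a start plus `periods` or with a start plus an end. As a small sketch (the variable names are illustrative only), the two spellings give the same index when they cover the same days:

```python
import pandas as pd

# The same 20 days, specified two ways: start + periods, and start + end.
by_periods = pd.date_range('2020/01/01', periods=20, freq='D')
by_end = pd.date_range('2020/01/01', '2020/01/20', freq='D')

print(by_periods.equals(by_end))   # True
print(len(by_end))                 # 20

# A multiple such as '3D' steps three days at a time and stops
# at the last step that does not pass the end date.
every_third = pd.date_range('2020/01/01', '2020/01/20', freq='3D')
print(len(every_third))            # 7
print(every_third[-1])             # 2020-01-19 00:00:00
```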

Filtering

time.truncate(before='2016-1-10')   # drop rows before 2016-01-10; use after= to trim the tail instead
2016-01-10   -0.377715
2016-01-11   -0.036280
2016-01-12   -2.522750
2016-01-13   -0.936564
2016-01-14    0.220823
2016-01-15   -0.515707
2016-01-16   -0.338733
2016-01-17    1.403778
2016-01-18    1.316850
2016-01-19   -0.988479
2016-01-20   -1.655101
Freq: D, dtype: float64
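
To make the `before` / `after` behaviour concrete, here is a minimal sketch on a deterministic series (the values 0..19 and the variable names are my own, not from the original). It also shows that label slicing with `.loc` gives the same result as `truncate(before=...)`:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2016/1/1', periods=20)
s = pd.Series(np.arange(20.0), index=idx)

head_cut = s.truncate(before='2016-1-10')   # keeps 2016-01-10 onward
tail_cut = s.truncate(after='2016-1-10')    # keeps up to and including 2016-01-10
window = s.truncate(before='2016-1-5', after='2016-1-10')

# Label slicing on a DatetimeIndex includes the boundary date as well.
same_as_head = s.loc['2016-01-10':]

print(len(head_cut), len(tail_cut), len(window))   # 11 10 6
print(head_cut.equals(same_as_head))               # True
```
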
pd.Timestamp('2020/01/01')
Timestamp('2020-01-01 00:00:00')
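
The three concepts listed at the top of the post are usually represented in pandas by `pd.Timestamp`, `pd.Timedelta` and `pd.Period`. That mapping is my assumption about what the list refers to; the sketch below just shows how the three types relate to one another:

```python
import pandas as pd

ts0 = pd.Timestamp('2020/01/01')          # a single instant
step = pd.Timedelta(days=3)               # a duration between two instants
month = pd.Period('2020-01', freq='M')    # a fixed span: all of January 2020

print(ts0 + step)                # 2020-01-04 00:00:00
print(month.start_time)          # 2020-01-01 00:00:00
print(month.end_time.date())     # 2020-01-31

# A timestamp either falls inside a period or it does not.
inside = month.start_time <= ts0 <= month.end_time
print(inside)                    # True
```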

Resampling

  • Upsampling
  • Downsampling
rng = pd.date_range('2020/01/01', periods=90, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.head()
2020-01-01    1.316377
2020-01-02    1.979066
2020-01-03   -1.101733
2020-01-04   -0.154840
2020-01-05    0.661635
Freq: D, dtype: float64
ts.resample('M').sum()  # downsample daily data to monthly sums ('M' is spelled 'ME' in pandas 2.2+)
2020-01-31    6.521808
2020-02-29    1.520356
2020-03-31   -6.827269
Freq: M, dtype: float64
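
Because the series above is random, the monthly sums are hard to check by eye. A small sketch with a series of ones (my own test data) makes the binning visible: each monthly sum is simply the number of days that fell into that month. It uses the `'MS'` (month start) alias, which only changes the bin label and avoids the `'M'` / `'ME'` spelling difference between pandas versions:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2020/01/01', periods=90, freq='D')
ones = pd.Series(np.ones(len(rng)), index=rng)

# Downsample: one bin per calendar month, labelled by the first day of the month.
monthly = ones.resample('MS').sum()
print(monthly.tolist())        # [31.0, 29.0, 30.0]

# Any other aggregation works the same way, e.g. the mean of each bin.
monthly_mean = ones.resample('MS').mean()
print(monthly_mean.tolist())   # [1.0, 1.0, 1.0]
```

The last bin holds 30 days rather than 31 because 90 daily periods starting on 2020-01-01 end on 2020-03-30.
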
day3D = ts.resample('3D').sum()
print(day3D)
2020-01-01    2.193710
2020-01-04    0.216971
2020-01-07   -1.321336
2020-01-10    1.099015
2020-01-13    3.568412
2020-01-16    2.904386
2020-01-19   -2.107415
2020-01-22   -0.324793
2020-01-25   -1.017733
2020-01-28    0.357197
2020-01-31    0.586223
2020-02-03   -1.602844
2020-02-06    0.676791
2020-02-09   -2.443876
2020-02-12   -0.546803
2020-02-15    4.927371
2020-02-18   -0.726225
2020-02-21    1.586811
2020-02-24    0.500058
2020-02-27   -0.483757
2020-03-01   -0.520131
2020-03-04    3.073532
2020-03-07   -2.036933
2020-03-10   -3.002754
2020-03-13    1.658833
2020-03-16   -0.804828
2020-03-19   -1.198226
2020-03-22   -3.494161
2020-03-25    1.942300
2020-03-28   -2.444900
Freq: 3D, dtype: float64
# Upsampling: go back to daily frequency; the newly created days are NaN
print(day3D.resample('D').asfreq())
2020-01-01    2.193710
2020-01-02         NaN
2020-01-03         NaN
2020-01-04    0.216971
2020-01-05         NaN
2020-01-06         NaN
2020-01-07   -1.321336
2020-01-08         NaN
2020-01-09         NaN
2020-01-10    1.099015
2020-01-11         NaN
2020-01-12         NaN
2020-01-13    3.568412
2020-01-14         NaN
2020-01-15         NaN
2020-01-16    2.904386
2020-01-17         NaN
2020-01-18         NaN
2020-01-19   -2.107415
2020-01-20         NaN
2020-01-21         NaN
2020-01-22   -0.324793
2020-01-23         NaN
2020-01-24         NaN
2020-01-25   -1.017733
2020-01-26         NaN
2020-01-27         NaN
2020-01-28    0.357197
2020-01-29         NaN
2020-01-30         NaN
                ...   
2020-02-28         NaN
2020-02-29         NaN
2020-03-01   -0.520131
2020-03-02         NaN
2020-03-03         NaN
2020-03-04    3.073532
2020-03-05         NaN
2020-03-06         NaN
2020-03-07   -2.036933
2020-03-08         NaN
2020-03-09         NaN
2020-03-10   -3.002754
2020-03-11         NaN
2020-03-12         NaN
2020-03-13    1.658833
2020-03-14         NaN
2020-03-15         NaN
2020-03-16   -0.804828
2020-03-17         NaN
2020-03-18         NaN
2020-03-19   -1.198226
2020-03-20         NaN
2020-03-21         NaN
2020-03-22   -3.494161
2020-03-23         NaN
2020-03-24         NaN
2020-03-25    1.942300
2020-03-26         NaN
2020-03-27         NaN
2020-03-28   -2.444900
Freq: D, Length: 88, dtype: float64

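Interpolation methods:

  • ffill: fill a missing value with the preceding value
  • bfill: fill a missing value with the following value
  • interpolate: fill missing values by linear interpolation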
day3D.resample('D').ffill(limit=1).head()  # forward-fill at most one day per gap
2020-01-01    2.193710
2020-01-02    2.193710
2020-01-03         NaN
2020-01-04    0.216971
2020-01-05    0.216971
Freq: D, dtype: float64
day3D.resample('D').interpolate('linear').head()
2020-01-01    2.193710
2020-01-02    1.534797
2020-01-03    0.875884
2020-01-04    0.216971
2020-01-05   -0.295798
Freq: D, dtype: float64
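
The three fill methods listed above are easier to compare side by side on a tiny series with round numbers. This is a minimal sketch with made-up values (0, 3, 6 spaced three days apart), not the random data used in the post:

```python
import pandas as pd

sparse = pd.Series([0.0, 3.0, 6.0],
                   index=pd.date_range('2020/01/01', periods=3, freq='3D'))

# Upsample to daily frequency; each call below fills the gaps differently.
gaps = sparse.resample('D').asfreq()
forward = sparse.resample('D').ffill()
forward_1 = sparse.resample('D').ffill(limit=1)
backward = sparse.resample('D').bfill()
linear = sparse.resample('D').interpolate('linear')

print(gaps.tolist())       # [0.0, nan, nan, 3.0, nan, nan, 6.0]
print(forward.tolist())    # [0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 6.0]
print(forward_1.tolist())  # [0.0, 0.0, nan, 3.0, 3.0, nan, 6.0]
print(backward.tolist())   # [0.0, 3.0, 3.0, 3.0, 6.0, 6.0, 6.0]
print(linear.tolist())     # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```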

Rolling windows

r = ts.rolling(window=10)
r
Rolling [window=10,center=False,axis=0]
# Other aggregations on the same window: r.max(), r.median(), r.std(), r.skew(), r.sum(), r.var()
print(r.mean())
2020-01-01         NaN
2020-01-02         NaN
2020-01-03         NaN
2020-01-04         NaN
2020-01-05         NaN
2020-01-06         NaN
2020-01-07         NaN
2020-01-08         NaN
2020-01-09         NaN
2020-01-10    0.227167
2020-01-11    0.078212
2020-01-12   -0.110708
2020-01-13    0.148539
2020-01-14    0.297714
2020-01-15    0.305627
2020-01-16    0.575904
2020-01-17    0.611440
2020-01-18    0.700247
2020-01-19    0.621096
2020-01-20    0.320521
2020-01-21    0.445525
2020-01-22    0.198997
2020-01-23    0.040407
2020-01-24    0.121294
2020-01-25   -0.011793
2020-01-26   -0.331101
2020-01-27   -0.350714
2020-01-28   -0.319107
2020-01-29   -0.204708
2020-01-30    0.009153
                ...   
2020-03-01    0.117644
2020-03-02   -0.073529
2020-03-03   -0.047548
2020-03-04    0.160732
2020-03-05    0.345953
2020-03-06    0.334522
2020-03-07    0.287958
2020-03-08    0.273020
2020-03-09    0.063349
2020-03-10   -0.136058
2020-03-11   -0.151545
2020-03-12   -0.130524
2020-03-13   -0.149661
2020-03-14   -0.384004
2020-03-15   -0.271879
2020-03-16   -0.500738
2020-03-17   -0.359947
2020-03-18   -0.357707
2020-03-19   -0.265655
2020-03-20   -0.161695
2020-03-21   -0.088838
2020-03-22   -0.185694
2020-03-23   -0.330775
2020-03-24   -0.407565
2020-03-25   -0.457954
2020-03-26   -0.254431
2020-03-27   -0.414623
2020-03-28   -0.291319
2020-03-29   -0.424687
2020-03-30   -0.384974
Freq: D, Length: 90, dtype: float64
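
The first nine values above are NaN because a window of 10 is not full until the tenth observation. A short sketch with a smaller window and the values 1..7 (my own example data) shows the same effect, checks one entry by hand, and shows how `min_periods` relaxes the requirement:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1.0, 8.0),
              index=pd.date_range('2020/01/01', periods=7, freq='D'))

# Each value is the mean of the current observation and the two before it.
r3 = s.rolling(window=3).mean()
print(r3.tolist())        # [nan, nan, 2.0, 3.0, 4.0, 5.0, 6.0]

# Check one entry by hand: position 4 averages positions 2, 3 and 4.
manual = s.iloc[2:5].mean()
print(manual)             # 4.0

# min_periods=1 emits a value as soon as one observation is available.
partial = s.rolling(window=3, min_periods=1).mean()
print(partial.tolist())   # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0]
```
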
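import matplotlib.pyplot as plt
# Jupyter magic: render figures inline (omit outside a notebook)
%matplotlib inline
plt.figure(figsize=(15, 5))
# Raw daily series as a red dashed line
ts.plot(style='r--')
# 10-day rolling mean as a solid blue line
ts.rolling(window=10).mean().plot(style='b')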
<matplotlib.axes._subplots.AxesSubplot at 0x123521400>

[Figure: the daily series (red dashed line) overlaid with its 10-day rolling mean (solid blue line)]


ARIMA Time Series Forecasting, Part 1
https://zhangfuli.github.io/2020/08/18/ARIMA时间序列预测-1/
Author
张富利
Published
August 18, 2020
License