import numpy as np
import pandas as pd
Time series
# 10 consecutive days starting on 2016-07-01
rng = pd.date_range('2016/07/01', periods=10, freq='D')
rng
DatetimeIndex(['2016-07-01', '2016-07-02', '2016-07-03', '2016-07-04',
'2016-07-05', '2016-07-06', '2016-07-07', '2016-07-08',
'2016-07-09', '2016-07-10'],
dtype='datetime64[ns]', freq='D')
# 10 timestamps spaced 3 days apart
rng = pd.date_range('2016/07/01', periods=10, freq='3D')
rng
DatetimeIndex(['2016-07-01', '2016-07-04', '2016-07-07', '2016-07-10',
'2016-07-13', '2016-07-16', '2016-07-19', '2016-07-22',
'2016-07-25', '2016-07-28'],
dtype='datetime64[ns]', freq='3D')
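The `freq` argument takes other offset aliases besides `'D'` and `'3D'`. A minimal sketch, assuming the business-day alias `'B'` and the weekly alias `'W-MON'`; the variable names `bdays` and `mondays` are made up for illustration:

```python
import pandas as pd

# Business days only: weekends are skipped (2016-07-01 is a Friday).
bdays = pd.date_range('2016-07-01', periods=5, freq='B')

# Every Monday between two dates; both endpoints are inclusive.
mondays = pd.date_range('2016-07-01', '2016-07-31', freq='W-MON')

print(bdays)
print(mondays)
```

Passing a start and an end date instead of `periods` lets pandas work out how many timestamps fit in the range.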
# 20 random values indexed by consecutive days (freq defaults to 'D')
time = pd.Series(np.random.randn(20), index=pd.date_range('2016/1/1', periods=20))
time
2016-01-01 -0.512570
2016-01-02 -1.077638
2016-01-03 0.126473
2016-01-04 -1.242304
2016-01-05 -0.311126
2016-01-06 -0.380349
2016-01-07 1.459504
2016-01-08 -0.328805
2016-01-09 0.537477
2016-01-10 -0.377715
2016-01-11 -0.036280
2016-01-12 -2.522750
2016-01-13 -0.936564
2016-01-14 0.220823
2016-01-15 -0.515707
2016-01-16 -0.338733
2016-01-17 1.403778
2016-01-18 1.316850
2016-01-19 -0.988479
2016-01-20 -1.655101
Freq: D, dtype: float64
# A range can also be given by its start and end dates
data = pd.date_range('2020/01/01', '2020/01/20', freq='D')
data
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
'2020-01-09', '2020-01-10', '2020-01-11', '2020-01-12',
'2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16',
'2020-01-17', '2020-01-18', '2020-01-19', '2020-01-20'],
dtype='datetime64[ns]', freq='D')
Filtering
# Drop every row dated before 2016-01-10
time.truncate(before='2016-1-10')
2016-01-10 -0.377715
2016-01-11 -0.036280
2016-01-12 -2.522750
2016-01-13 -0.936564
2016-01-14 0.220823
2016-01-15 -0.515707
2016-01-16 -0.338733
2016-01-17 1.403778
2016-01-18 1.316850
2016-01-19 -0.988479
2016-01-20 -1.655101
Freq: D, dtype: float64
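`truncate` also accepts an `after` bound, and the same selection can be written as a label slice. A small sketch on a deterministic series (the names `s`, `middle`, and `sliced` are illustrative and not from the original):

```python
import numpy as np
import pandas as pd

# 20 daily values 0.0 .. 19.0, so the result is easy to check by eye.
s = pd.Series(np.arange(20.0), index=pd.date_range('2016-01-01', periods=20))

# truncate keeps rows from `before` through `after`, both inclusive.
middle = s.truncate(before='2016-01-10', after='2016-01-15')

# Label-based slicing on a DatetimeIndex is inclusive on both ends too.
sliced = s.loc['2016-01-10':'2016-01-15']

print(middle)
print(middle.equals(sliced))
```

Both forms return the six rows from 2016-01-10 to 2016-01-15.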
pd.Timestamp('2020/01/01')
Timestamp('2020-01-01 00:00:00')
Resampling
# 90 days of random data starting on 2020-01-01
rng = pd.date_range('2020/01/01', periods=90, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.head()
2020-01-01 1.316377
2020-01-02 1.979066
2020-01-03 -1.101733
2020-01-04 -0.154840
2020-01-05 0.661635
Freq: D, dtype: float64
# Downsample to month-end totals
ts.resample('M').sum()
2020-01-31 6.521808
2020-02-29 1.520356
2020-03-31 -6.827269
Freq: M, dtype: float64
# Downsample to 3-day totals
day3D = ts.resample('3D').sum()
print(day3D)
2020-01-01 2.193710
2020-01-04 0.216971
2020-01-07 -1.321336
2020-01-10 1.099015
2020-01-13 3.568412
2020-01-16 2.904386
2020-01-19 -2.107415
2020-01-22 -0.324793
2020-01-25 -1.017733
2020-01-28 0.357197
2020-01-31 0.586223
2020-02-03 -1.602844
2020-02-06 0.676791
2020-02-09 -2.443876
2020-02-12 -0.546803
2020-02-15 4.927371
2020-02-18 -0.726225
2020-02-21 1.586811
2020-02-24 0.500058
2020-02-27 -0.483757
2020-03-01 -0.520131
2020-03-04 3.073532
2020-03-07 -2.036933
2020-03-10 -3.002754
2020-03-13 1.658833
2020-03-16 -0.804828
2020-03-19 -1.198226
2020-03-22 -3.494161
2020-03-25 1.942300
2020-03-28 -2.444900
Freq: 3D, dtype: float64
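With random data it is hard to see what each bin contains. A minimal sketch on a small deterministic series shows that each `'3D'` bin is labeled by its first day and covers three consecutive days; the names `s`, `by3`, and `stats` are illustrative only:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2020-01-01', periods=9, freq='D')
s = pd.Series(np.arange(1.0, 10.0), index=idx)   # 1.0, 2.0, ..., 9.0

# Three bins: (1+2+3), (4+5+6), (7+8+9)
by3 = s.resample('3D').sum()
print(by3)

# Several aggregations per bin in one call.
stats = s.resample('3D').agg(['sum', 'mean', 'max'])
print(stats)
```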
# Upsample back to daily frequency; days with no data become NaN
print(day3D.resample('D').asfreq())
2020-01-01 2.193710
2020-01-02 NaN
2020-01-03 NaN
2020-01-04 0.216971
2020-01-05 NaN
2020-01-06 NaN
2020-01-07 -1.321336
2020-01-08 NaN
2020-01-09 NaN
2020-01-10 1.099015
2020-01-11 NaN
2020-01-12 NaN
2020-01-13 3.568412
2020-01-14 NaN
2020-01-15 NaN
2020-01-16 2.904386
2020-01-17 NaN
2020-01-18 NaN
2020-01-19 -2.107415
2020-01-20 NaN
2020-01-21 NaN
2020-01-22 -0.324793
2020-01-23 NaN
2020-01-24 NaN
2020-01-25 -1.017733
2020-01-26 NaN
2020-01-27 NaN
2020-01-28 0.357197
2020-01-29 NaN
2020-01-30 NaN
...
2020-02-28 NaN
2020-02-29 NaN
2020-03-01 -0.520131
2020-03-02 NaN
2020-03-03 NaN
2020-03-04 3.073532
2020-03-05 NaN
2020-03-06 NaN
2020-03-07 -2.036933
2020-03-08 NaN
2020-03-09 NaN
2020-03-10 -3.002754
2020-03-11 NaN
2020-03-12 NaN
2020-03-13 1.658833
2020-03-14 NaN
2020-03-15 NaN
2020-03-16 -0.804828
2020-03-17 NaN
2020-03-18 NaN
2020-03-19 -1.198226
2020-03-20 NaN
2020-03-21 NaN
2020-03-22 -3.494161
2020-03-23 NaN
2020-03-24 NaN
2020-03-25 1.942300
2020-03-26 NaN
2020-03-27 NaN
2020-03-28 -2.444900
Freq: D, Length: 88, dtype: float64
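Fill methods for the gaps created by upsampling:
ffill: fill each gap with the previous value
bfill: fill each gap with the next value
interpolate: fill each gap by linear interpolation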
# Forward-fill at most one missing day after each known value
day3D.resample('D').ffill(limit=1).head()
2020-01-01 2.193710
2020-01-02 2.193710
2020-01-03 NaN
2020-01-04 0.216971
2020-01-05 0.216971
Freq: D, dtype: float64
day3D.resample('D').interpolate('linear').head()
2020-01-01 2.193710
2020-01-02 1.534797
2020-01-03 0.875884
2020-01-04 0.216971
2020-01-05 -0.295798
Freq: D, dtype: float64
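The fill methods listed above, including `bfill`, which is not demonstrated, can be compared side by side. A rough sketch on a three-point series; `sparse`, `daily`, and `filled` are names chosen for this example:

```python
import pandas as pd

# One value every 3 days: 0.0, 3.0, 6.0
sparse = pd.Series([0.0, 3.0, 6.0],
                   index=pd.date_range('2020-01-01', periods=3, freq='3D'))

daily = sparse.resample('D')
filled = pd.DataFrame({
    'asfreq': daily.asfreq(),               # gaps left as NaN
    'ffill': daily.ffill(),                 # previous known value
    'bfill': daily.bfill(),                 # next known value
    'linear': daily.interpolate('linear'),  # straight line between neighbors
})
print(filled)
```

Because the known values are 0, 3, and 6 at three-day spacing, the `linear` column comes out as 0, 1, 2, ..., 6, which makes the difference from the two step-shaped fills easy to see.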
Rolling windows
# A rolling window over 10 consecutive observations
r = ts.rolling(window=10)
r
Rolling [window=10,center=False,axis=0]
# Mean of each 10-day window; the first 9 rows are NaN because the window is not yet full
r.mean()
2020-01-01 NaN
2020-01-02 NaN
2020-01-03 NaN
2020-01-04 NaN
2020-01-05 NaN
2020-01-06 NaN
2020-01-07 NaN
2020-01-08 NaN
2020-01-09 NaN
2020-01-10 0.227167
2020-01-11 0.078212
2020-01-12 -0.110708
2020-01-13 0.148539
2020-01-14 0.297714
2020-01-15 0.305627
2020-01-16 0.575904
2020-01-17 0.611440
2020-01-18 0.700247
2020-01-19 0.621096
2020-01-20 0.320521
2020-01-21 0.445525
2020-01-22 0.198997
2020-01-23 0.040407
2020-01-24 0.121294
2020-01-25 -0.011793
2020-01-26 -0.331101
2020-01-27 -0.350714
2020-01-28 -0.319107
2020-01-29 -0.204708
2020-01-30 0.009153
...
2020-03-01 0.117644
2020-03-02 -0.073529
2020-03-03 -0.047548
2020-03-04 0.160732
2020-03-05 0.345953
2020-03-06 0.334522
2020-03-07 0.287958
2020-03-08 0.273020
2020-03-09 0.063349
2020-03-10 -0.136058
2020-03-11 -0.151545
2020-03-12 -0.130524
2020-03-13 -0.149661
2020-03-14 -0.384004
2020-03-15 -0.271879
2020-03-16 -0.500738
2020-03-17 -0.359947
2020-03-18 -0.357707
2020-03-19 -0.265655
2020-03-20 -0.161695
2020-03-21 -0.088838
2020-03-22 -0.185694
2020-03-23 -0.330775
2020-03-24 -0.407565
2020-03-25 -0.457954
2020-03-26 -0.254431
2020-03-27 -0.414623
2020-03-28 -0.291319
2020-03-29 -0.424687
2020-03-30 -0.384974
Freq: D, Length: 90, dtype: float64
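The leading NaN values appear because a window yields a result only once it holds `window` observations. A minimal sketch, assuming the `min_periods` argument is acceptable for your use case, shows how to get a value from partially filled windows; `s`, `strict`, and `lenient` are illustrative names:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1.0, 7.0),
              index=pd.date_range('2020-01-01', periods=6, freq='D'))

# Default: NaN until 3 values are available.
strict = s.rolling(window=3).mean()

# min_periods=1: average whatever is in the window so far.
lenient = s.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({'value': s, 'strict': strict, 'lenient': lenient}))
```

Early values computed with `min_periods=1` average fewer points, so they are noisier than the rest of the series.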
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(15, 5))
ts.plot(style='r--')                          # raw series, red dashed line
ts.rolling(window=10).mean().plot(style='b')  # 10-day rolling mean, blue line
<matplotlib.axes._subplots.AxesSubplot at 0x123521400>