
A Detailed Guide to pandas, Python's Data Analysis Module


This article demonstrates, by example, how to use pandas, Python's data analysis module. It is shared here for reference; the details follow:

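I. Introduction

pandas (Python Data Analysis Library) is a data analysis module built on NumPy. It provides a large number of standard data models and the tools needed to work efficiently with large datasets, and it is one of the main reasons Python has become an efficient and powerful environment for data analysis.

pandas has historically provided three main data structures:

1) Series, a labeled one-dimensional array.

2) DataFrame, a labeled, size-mutable two-dimensional table.

3) Panel, a labeled, size-mutable three-dimensional array. Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so current versions offer only Series and DataFrame.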
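
Because current pandas releases no longer include Panel, three-dimensional labeled data is usually held in a DataFrame with a MultiIndex. The following is a minimal sketch of that idea, not part of the original article; the labels `item1`, `r1`, `x` and so on are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical labels, used only for this illustration.
items = ["item1", "item2"]
rows = ["r1", "r2", "r3"]

# Two index levels (item, row) plus the columns give three labeled axes.
index = pd.MultiIndex.from_product([items, rows], names=["item", "row"])
panel_like = pd.DataFrame(
    np.arange(12).reshape(6, 2), index=index, columns=["x", "y"]
)

# Selecting one item from the outer level yields an ordinary 2-D DataFrame.
item1 = panel_like.loc["item1"]
print(item1)
```

Selecting on the outer index level plays the role that picking one "item" of a Panel used to play.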

II. Code

1. Creating a one-dimensional array

>>> import pandas as pd
>>> import numpy as np
>>> x = pd.Series([1, 3, 5, np.nan])  # np.nan marks a missing value
>>> print(x)
0    1.0
1    3.0
2    5.0
3    NaN
dtype: float64
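
A Series can also carry explicit labels instead of the default integer index. The sketch below is an illustrative addition (the labels `a` to `d` are assumptions, not from the article); it shows label-based and position-based access, and that reductions skip missing values by default.

```python
import numpy as np
import pandas as pd

# Same values as above, but with explicit (hypothetical) labels.
s = pd.Series([1, 3, 5, np.nan], index=["a", "b", "c", "d"])

by_label = s["b"]         # access by label
by_position = s.iloc[2]   # access by integer position
total = s.sum()           # NaN is skipped by default
n_valid = s.count()       # number of non-missing entries

print(by_label, by_position, total, n_valid)
```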
          

2. Creating two-dimensional arrays

>>> dates = pd.date_range(start='20170101', end='20171231', freq='D')  # daily frequency
>>> print(dates)
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10',
               ...
               '2017-12-22', '2017-12-23', '2017-12-24', '2017-12-25',
               '2017-12-26', '2017-12-27', '2017-12-28', '2017-12-29',
               '2017-12-30', '2017-12-31'],
              dtype='datetime64[ns]', length=365, freq='D')
>>> dates = pd.date_range(start='20170101', end='20171231', freq='M')  # month-end frequency (pandas 2.2+ deprecates 'M' in favor of 'ME')
>>> print(dates)
DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
               '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
               '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31'],
              dtype='datetime64[ns]', freq='M')
>>> df = pd.DataFrame(np.random.randn(12, 4), index=dates, columns=list('ABCD'))  # 12x4 standard-normal samples
>>> print(df)
                   A         B         C         D
2017-01-31 -0.682556  0.244102  0.450855  0.236475
2017-02-28 -0.630060  0.590667  0.482438  0.225697
2017-03-31  1.066989  0.319339  1.094953  1.716053
2017-04-30  0.334944 -0.053049 -1.009493 -1.039470
2017-05-31 -0.380778 -0.044429  0.075647  0.931243
2017-06-30  0.867540  0.872197 -0.738974 -1.114596
2017-07-31  0.423371 -1.086386  0.183820 -0.438921
2017-08-31  1.285163  0.634134 -0.472973  1.281057
2017-09-30 -1.002832 -0.888122 -1.316014 -0.070637
2017-10-31  1.735617 -0.253815  0.554403  1.536211
2017-11-30  2.030384  0.667556  1.012698  0.239479
2017-12-31  2.059718 -0.089050  1.420517  0.224578
>>> df = pd.DataFrame([[np.random.randint(1, 100) for j in range(4)] for i in range(12)], index=dates, columns=list('ABCD'))  # random integers in [1, 100)
>>> print(df)
             A   B   C   D
2017-01-31  7532522
2017-02-28  70  99  70  98
2017-03-31  99  47  75  67
2017-04-30  33  70  17  49
2017-05-31  62  88  68  91
2017-06-30  19  75  18  44
2017-07-31  50  85  65  82
2017-08-31  5628776
2017-09-30  6173111
2017-10-31  8296692
2017-11-30  6359194
2017-12-31  79  58  69  33
>>> # Build a DataFrame from a dict: each key becomes a column, and the scalar 'foo' is broadcast to every row
>>> df = pd.DataFrame({'A': [np.random.randint(1, 100) for i in range(4)],
...                    'B': pd.date_range(start='20130101', periods=4, freq='D'),
...                    'C': pd.Series([1, 2, 3, 4], index=list(range(4)), dtype='float32'),
...                    'D': np.array([3] * 4, dtype='int32'),
...                    'E': pd.Categorical(["test", "train", "test", "train"]),
...                    'F': 'foo'})
>>> print(df)
    A          B    C  D      E    F
0  15 2013-01-01  1.0  3   test  foo
1  11 2013-01-02  2.0  3  train  foo
2  91 2013-01-03  3.0  3   test  foo
3  91 2013-01-04  4.0  3  train  foo
>>> # Same construction, but the labels of Series 'C' become the row index of the DataFrame
>>> df = pd.DataFrame({'A': [np.random.randint(1, 100) for i in range(4)],
...                    'B': pd.date_range(start='20130101', periods=4, freq='D'),
...                    'C': pd.Series([1, 2, 3, 4], index=['zhang', 'li', 'zhou', 'wang'], dtype='float32'),
...                    'D': np.array([3] * 4, dtype='int32'),
...                    'E': pd.Categorical(["test", "train", "test", "train"]),
...                    'F': 'foo'})
>>> print(df)
        A          B    C  D      E    F
zhang  36 2013-01-01  1.0  3   test  foo
li     86 2013-01-02  2.0  3  train  foo
zhou   10 2013-01-03  3.0  3   test  foo
wang   79 2013-01-04  4.0  3  train  foo
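
Each column of a DataFrame built this way keeps the dtype of the object it was created from. As a hedged illustration (not part of the original article), the sketch below rebuilds the table with fixed values in column `A` instead of random ones and inspects the result. Only the dtypes that were set explicitly are checked, since the defaults for the other columns can differ between pandas versions and platforms.

```python
import numpy as np
import pandas as pd

names = ["zhang", "li", "zhou", "wang"]
df = pd.DataFrame({
    "A": [36, 86, 10, 79],  # fixed values standing in for the random integers
    "B": pd.date_range(start="20130101", periods=4, freq="D"),
    "C": pd.Series([1, 2, 3, 4], index=names, dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",  # scalar, broadcast to every row
})

# One dtype per column; the row labels come from Series C.
print(df.dtypes)
print(list(df.index))
```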
          

3. Viewing two-dimensional data

>>> df.head()  # shows the first 5 rows by default
        A          B    C  D      E    F
zhang  36 2013-01-01  1.0  3   test  foo
li     86 2013-01-02  2.0  3  train  foo
zhou   10 2013-01-03  3.0  3   test  foo
wang   79 2013-01-04  4.0  3  train  foo
>>> df.head(3)  # first 3 rows
        A          B    C  D      E    F
zhang  36 2013-01-01  1.0  3   test  foo
li     86 2013-01-02  2.0  3  train  foo
zhou   10 2013-01-03  3.0  3   test  foo
>>> df.tail(2)  # last 2 rows
       A          B    C  D      E    F
zhou  10 2013-01-03  3.0  3   test  foo
wang  79 2013-01-04  4.0  3  train  foo
          

4. Viewing the index, column names, and data of a two-dimensional table

>>> df.index
Index(['zhang', 'li', 'zhou', 'wang'], dtype='object')
>>> df.columns
Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')
>>> df.values
array([[36, Timestamp('2013-01-01 00:00:00'), 1.0, 3, 'test', 'foo'],
       [86, Timestamp('2013-01-02 00:00:00'), 2.0, 3, 'train', 'foo'],
       [10, Timestamp('2013-01-03 00:00:00'), 3.0, 3, 'test', 'foo'],
       [79, Timestamp('2013-01-04 00:00:00'), 4.0, 3, 'train', 'foo']], dtype=object)
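
The `dtype=object` above arises because the columns have mixed types. Newer pandas versions (0.24 and later) also offer `DataFrame.to_numpy()` as the recommended way to obtain the underlying array. The sketch below is an illustrative addition that uses only the numeric columns `A` and `C` from the table above, so that the result has a single numeric dtype.

```python
import numpy as np
import pandas as pd

# Numeric columns only, with the values shown in the article's table.
numeric = pd.DataFrame(
    {"A": [36, 86, 10, 79], "C": [1.0, 2.0, 3.0, 4.0]},
    index=["zhang", "li", "zhou", "wang"],
)

# Integer and float columns are upcast to a common float64 dtype.
arr = numeric.to_numpy()
print(arr.dtype, arr.shape)
```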
          

5. Viewing summary statistics

>>> df.describe()  # count, mean, standard deviation, minimum, quartiles, maximum of the numeric columns
               A         C    D
count   4.000000  4.000000  4.0
mean   52.750000  2.500000  3.0
std    36.068222  1.290994  0.0
min    10.000000  1.000000  3.0
25%    29.500000  1.750000  3.0
50%    57.500000  2.500000  3.0
75%    80.750000  3.250000  3.0
max    86.000000  4.000000  3.0
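
The figures reported by `describe()` can be reproduced with the individual reduction methods. The sketch below is an illustrative addition that uses the values of column `A` from the table above; it shows that `std` is the sample standard deviation (`ddof=1`) and that the quartiles use linear interpolation.

```python
import pandas as pd

a = pd.Series([36, 86, 10, 79], name="A")

summary = a.describe()
sample_std = a.std(ddof=1)       # pandas uses ddof=1 by default
first_quartile = a.quantile(0.25)  # linear interpolation between 10 and 36
median = a.median()

print(summary)
```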
          

6. Transposing two-dimensional data

>>> df.T
                 zhang                   li                 zhou  \
A                   36                   86                   10
B  2013-01-01 00:00:00  2013-01-02 00:00:00  2013-01-03 00:00:00
C                    1                    2                    3
D                    3                    3                    3
E                 test                train                 test
F                  foo                  foo                  foo

                  wang
A                   79
B  2013-01-04 00:00:00
C                    4
D                    3
E                train
F                  foo
          

7. Sorting

>>> df.sort_index(axis=0, ascending=False)  # sort by row labels, descending
        A          B    C  D      E    F
zhou   10 2013-01-03  3.0  3   test  foo
zhang  36 2013-01-01  1.0  3   test  foo
wang   79 2013-01-04  4.0  3  train  foo
li     86 2013-01-02  2.0  3  train  foo
>>> df.sort_index(axis=1, ascending=False)  # sort by column labels, descending
         F      E  D    C          B   A
zhang  foo   test  3  1.0 2013-01-01  36
li     foo  train  3  2.0 2013-01-02  86
zhou   foo   test  3  3.0 2013-01-03  10
wang   foo  train  3  4.0 2013-01-04  79
>>> df.sort_index(axis=0, ascending=True)  # sort by row labels, ascending
        A          B    C  D      E    F
li     86 2013-01-02  2.0  3  train  foo
wang   79 2013-01-04  4.0  3  train  foo
zhang  36 2013-01-01  1.0  3   test  foo
zhou   10 2013-01-03  3.0  3   test  foo
>>> df.sort_values(by='A')  # sort by the values in column A
        A          B    C  D      E    F
zhou   10 2013-01-03  3.0  3   test  foo
zhang  36 2013-01-01  1.0  3   test  foo
wang   79 2013-01-04  4.0  3  train  foo
li     86 2013-01-02  2.0  3  train  foo
>>> df.sort_values(by='A', ascending=False)  # descending order
        A          B    C  D      E    F
li     86 2013-01-02  2.0  3  train  foo
wang   79 2013-01-04  4.0  3  train  foo
zhang  36 2013-01-01  1.0  3   test  foo
zhou   10 2013-01-03  3.0  3   test  foo
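
`sort_values` also accepts several columns, each with its own direction. The sketch below is an illustrative addition that reuses the values of columns `A` and `E` from the table above: rows are sorted by `E` ascending, and ties are broken by `A` descending.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [36, 86, 10, 79], "E": ["test", "train", "test", "train"]},
    index=["zhang", "li", "zhou", "wang"],
)

# Primary key E (ascending), secondary key A (descending).
ordered = df.sort_values(by=["E", "A"], ascending=[True, False])
print(ordered)
```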
          

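8. Selecting data

>>> # Column A here differs from the earlier listings, so df was evidently re-created (with new random values) before this step
>>> df['A']  # select a column
zhang     1
li        1
zhou     60
wang     58
Name: A, dtype: int64
>>> df[0:2]  # select multiple rows with a slice
       A          B    C  D      E    F
zhang  1 2013-01-01  1.0  3   test  foo
li     1 2013-01-02  2.0  3  train  foo
>>> df.loc[:, ['A', 'C']]  # select multiple columns
        A    C
zhang   1  1.0
li      1  2.0
zhou   60  3.0
wang   58  4.0
>>> df.loc[['zhang', 'zhou'], ['A', 'D', 'E']]  # select specific rows and columns at the same time
        A  D     E
zhang   1  3  test
zhou   60  3  test
>>> df.loc['zhang', ['A', 'D', 'E']]  # a single row label returns a Series
A       1
D       3
E    test
Name: zhang, dtype: object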
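
Besides label-based `loc`, rows and columns can be selected by integer position with `iloc`, or by a boolean condition. The sketch below is an illustrative addition that reuses the values of columns `A`, `D` and `E` from the output above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 1, 60, 58],
        "D": [3, 3, 3, 3],
        "E": ["test", "train", "test", "train"],
    },
    index=["zhang", "li", "zhou", "wang"],
)

# Position-based: first two rows, first and third columns.
first_two = df.iloc[0:2, [0, 2]]

# Condition-based: rows where A exceeds 50.
large_a = df[df["A"] > 50]

# Conditions combine with & and |; each one needs its own parentheses.
large_test = df.loc[(df["A"] > 50) & (df["E"] == "test"), ["A", "E"]]

print(first_two, large_a, large_test, sep="\n\n")
```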
          

9. Modifying and setting data

>>> df.iat[0, 2] = 3  # set the value at a given row and column position
>>> print(df)
        A          B    C  D      E    F
zhang   1 2013-01-01  3.0  3   test  foo
li      1 2013-01-02  2.0  3  train  foo
zhou   60 2013-01-03  3.0  3   test  foo
wang   58 2013-01-04  4.0  3  train  foo
>>> df.loc[:, 'D'] = [np.random.randint(50, 60) for i in range(4)]  # replace the values of a whole column
>>> print(df)
        A          B    C   D      E    F
zhang   1 2013-01-01  3.0  57   test  foo
li      1 2013-01-02  2.0  52  train  foo
zhou   60 2013-01-03  3.0  57   test  foo
wang   58 2013-01-04  4.0  56  train  foo
>>> df['C'] = -df['C']  # negate the values of a column
>>> print(df)
        A          B    C   D      E    F
zhang   1 2013-01-01 -3.0  57   test  foo
li      1 2013-01-02 -2.0  52  train  foo
zhou   60 2013-01-03 -3.0  57   test  foo
wang   58 2013-01-04 -4.0  56  train  foo
          

10. Handling missing values

>>> df1 = df.reindex(index=['zhang', 'li', 'zhou', 'wang'], columns=list(df.columns) + ['G'])  # the new column G is filled with NaN
>>> print(df1)
        A          B    C   D      E    F   G
zhang   1 2013-01-01 -3.0  57   test  foo NaN
li      1 2013-01-02 -2.0  52  train  foo NaN
zhou   60 2013-01-03 -3.0  57   test  foo NaN
wang   58 2013-01-04 -4.0  56  train  foo NaN
>>> df1.iat[0, 6] = 3  # set one element; the rest of the column stays NaN
>>> print(df1)
        A          B    C   D      E    F    G
zhang   1 2013-01-01 -3.0  57   test  foo  3.0
li      1 2013-01-02 -2.0  52  train  foo  NaN
zhou   60 2013-01-03 -3.0  57   test  foo  NaN
wang   58 2013-01-04 -4.0  56  train  foo  NaN
>>> pd.isnull(df1)  # test for missing values; returns a DataFrame of True/False
           A      B      C      D      E      F      G
zhang  False  False  False  False  False  False  False
li     False  False  False  False  False  False   True
zhou   False  False  False  False  False  False   True
wang   False  False  False  False  False  False   True
>>> df1.dropna()  # return only the rows that contain no missing values
       A          B    C   D     E    F    G
zhang  1 2013-01-01 -3.0  57  test  foo  3.0
>>> df1['G'] = df1['G'].fillna(5)  # fill missing values with a given value (assigning the result avoids chained inplace modification, which recent pandas versions warn about)
>>> print(df1)
        A          B    C   D      E    F    G
zhang   1 2013-01-01 -3.0  57   test  foo  3.0
li      1 2013-01-02 -2.0  52  train  foo  5.0
zhou   60 2013-01-03 -3.0  57   test  foo  5.0
wang   58 2013-01-04 -4.0  56  train  foo  5.0
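
`dropna` and `fillna` take options that control which rows are dropped and which columns are filled. The sketch below is an illustrative addition that uses only columns `A` and `G` from the table above, before filling. It counts the missing values per column, compares `how="any"` with `how="all"`, and fills per column with a dict. Neither call modifies `df1` itself.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"A": [1, 1, 60, 58], "G": [3.0, np.nan, np.nan, np.nan]},
    index=["zhang", "li", "zhou", "wang"],
)

n_missing = df1.isna().sum()          # missing values per column
any_missing = df1.dropna(how="any")   # drop rows with at least one NaN
all_missing = df1.dropna(how="all")   # drop rows only if every value is NaN
filled = df1.fillna({"G": 5})         # per-column fill values; returns a new DataFrame

print(n_missing, any_missing, filled, sep="\n\n")
```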
          

11. Operating on data

>>> df1.mean(numeric_only=True)  # column means, ignoring missing values; numeric_only=True is required in pandas 2.0+ when non-numeric columns are present
A    30.0
C    -3.0
D    55.5
G     4.5
dtype: float64
>>> df.mean(axis=1, numeric_only=True)  # row-wise means
zhang    18.333333
li       17.000000
zhou     38.000000
wang     36.666667
dtype: float64
>>> df1.shift(1)  # shift the data down by one row
          A          B    C     D      E    F    G
zhang   NaN        NaT  NaN   NaN    NaN  NaN  NaN
li      1.0 2013-01-01 -3.0  57.0   test  foo  3.0
zhou    1.0 2013-01-02 -2.0  52.0  train  foo  5.0
wang   60.0 2013-01-03 -3.0  57.0   test  foo  5.0
>>> df1['D'].value_counts()  # count how often each value occurs
57    2
56    1
52    1
Name: D, dtype: int64
>>> print(df1)
        A          B    C   D      E    F    G
zhang   1 2013-01-01 -3.0  57   test  foo  3.0
li      1 2013-01-02 -2.0  52  train  foo  5.0
zhou   60 2013-01-03 -3.0  57   test  foo  5.0
wang   58 2013-01-04 -4.0  56  train  foo  5.0
>>> df2 = pd.DataFrame(np.random.randn(10, 4))
>>> print(df2)
          0         1         2         3
0 -0.939904 -1.856658 -0.281965  0.203624
1  0.350162  0.060674 -0.914808  0.135735
2 -1.031384 -1.611274  0.341546 -0.363671
3  0.139464 -0.050959 -0.810610 -0.772648
4 -1.146810 -0.791608  1.488790 -0.490004
5 -0.100707 -0.763545 -0.071274 -0.298142
6 -0.212014  0.809709  0.693196  0.980568
7 -0.812985 -0.000325 -0.675101 -0.217394
8  0.066969 -0.084609 -0.433099  0.535616
9 -0.319120 -0.532854  1.321712 -1.751913
>>> p1 = df2[:3]  # split the rows into three pieces
>>> print(p1)
          0         1         2         3
0 -0.939904 -1.856658 -0.281965  0.203624
1  0.350162  0.060674 -0.914808  0.135735
2 -1.031384 -1.611274  0.341546 -0.363671
>>> p2 = df2[3:7]
>>> print(p2)
          0         1         2         3
3  0.139464 -0.050959 -0.810610 -0.772648
4 -1.146810 -0.791608  1.488790 -0.490004
5 -0.100707 -0.763545 -0.071274 -0.298142
6 -0.212014  0.809709  0.693196  0.980568
>>> p3 = df2[7:]
>>> print(p3)
          0         1         2         3
7 -0.812985 -0.000325 -0.675101 -0.217394
8  0.066969 -0.084609 -0.433099  0.535616
9 -0.319120 -0.532854  1.321712 -1.751913
>>> df3 = pd.concat([p1, p2, p3])  # concatenate the pieces row-wise
>>> print(df3)
          0         1         2         3
0 -0.939904 -1.856658 -0.281965  0.203624
1  0.350162  0.060674 -0.914808  0.135735
2 -1.031384 -1.611274  0.341546 -0.363671
3  0.139464 -0.050959 -0.810610 -0.772648
4 -1.146810 -0.791608  1.488790 -0.490004
5 -0.100707 -0.763545 -0.071274 -0.298142
6 -0.212014  0.809709  0.693196  0.980568
7 -0.812985 -0.000325 -0.675101 -0.217394
8  0.066969 -0.084609 -0.433099  0.535616
9 -0.319120 -0.532854  1.321712 -1.751913
>>> df2 == df3  # element-wise comparison: the reassembled table matches the original
      0     1     2     3
0  True  True  True  True
1  True  True  True  True
2  True  True  True  True
3  True  True  True  True
4  True  True  True  True
5  True  True  True  True
6  True  True  True  True
7  True  True  True  True
8  True  True  True  True
9  True  True  True  True
>>> df4 = pd.DataFrame({'A': [np.random.randint(1, 5) for i in range(8)],
...                     'B': [np.random.randint(10, 15) for i in range(8)],
...                     'C': [np.random.randint(20, 30) for i in range(8)],
...                     'D': [np.random.randint(80, 100) for i in range(8)]})
>>> print(df4)
   A   B   C   D
0  4  11  24  91
1  1  13  28  95
2  2  12  27  91
3  1  12  20  87
4  3  11  24  96
5  1  13  21  99
6  3  11  22  95
7  2  13  26  98
>>> df4.groupby('A').sum()  # group by column A and sum each group
    B   C    D
A
1  38  69  281
2  25  53  189
3  22  46  191
4  11  24   91
>>> df4.groupby(['A', 'B']).mean()  # group by two keys and average each group
         C     D
A B
1 12  20.0  87.0
  13  24.5  97.0
2 12  27.0  91.0
  13  26.0  98.0
3 11  23.0  95.5
4 11  24.0  91.0
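
A grouped table can also be reduced with a different function per column. The sketch below is an illustrative addition that rebuilds `df4` from the values printed above and uses named aggregation (available in pandas 0.25 and later); the output column names `c_mean`, `d_max` and `n_rows` are arbitrary choices made for this example.

```python
import pandas as pd

# df4 rebuilt from the values printed in the article.
df4 = pd.DataFrame({
    "A": [4, 1, 2, 1, 3, 1, 3, 2],
    "B": [11, 13, 12, 12, 11, 13, 11, 13],
    "C": [24, 28, 27, 20, 24, 21, 22, 26],
    "D": [91, 95, 91, 87, 96, 99, 95, 98],
})

# Named aggregation: output_name=(input_column, function).
stats = df4.groupby("A").agg(
    c_mean=("C", "mean"),
    d_max=("D", "max"),
    n_rows=("B", "count"),
)
print(stats)
```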
          

12. Plotting with matplotlib

>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> df = pd.DataFrame(np.random.randn(1000, 2), columns=['B', 'C']).cumsum()  # cumulative sums of 1000 random steps per column
>>> print(df)
             B          C
0     0.089886   0.511081
1     1.323766   1.584758
2     1.489479  -0.438671
3     0.831331  -0.398021
4    -0.248233   0.494418
5    -0.013085   0.684518
6     0.666951  -1.422161
7     1.768838  -0.658786
8     2.661080   0.648505
9     1.951751   0.836261
10    3.538785   1.657475
11    3.254034   2.052609
12    4.248620   1.568401
13    4.077173   0.055622
14    3.452590  -0.200314
15    2.627620  -0.408829
16    3.690537  -0.210440
17    3.184924   0.365447
18    3.646556  -0.150044
19    4.164563  -0.023405
20    2.391447   0.517872
21    2.865153   0.686649
22    3.623183   0.663927
23    1.545117   0.151044
24    3.595924   0.903619
25    3.013804   1.855083
26    4.438801   1.014572
27    5.155216   0.882628
28    4.431457   0.741509
29    2.841949   0.709991
..         ...        ...
970  -7.910646 -13.738689
971  -7.318091 -14.811335
972  -9.144376 -15.466873
973  -9.538658 -15.367167
974  -9.061114 -16.822726
975  -9.803798 -17.368350
976 -10.180575 -17.270180
977 -10.601352 -17.671543
978 -10.804909 -19.535919
979 -10.397964 -20.361419
980 -10.979640 -20.300267
981  -8.738223 -20.202669
982  -9.339929 -21.528973
983  -9.780686 -20.902152
984 -11.072655 -21.235735
985 -10.849717 -20.439201
986 -10.953247 -19.708973
987 -13.032707 -18.687553
988 -12.984567 -19.557132
989 -13.508836 -18.747584
990 -13.420713 -19.883180
991 -11.718125 -20.474092
992 -11.936512 -21.360752
993 -14.225655 -22.006776
994 -13.524940 -20.844519
995 -14.088767 -20.492952
996 -14.169056 -20.666777
997 -14.798708 -19.960555
998 -15.766568 -19.395622
999 -17.281143 -19.089793

[1000 rows x 2 columns]
>>> df['A'] = pd.Series(list(range(len(df))))  # add column A (0..999) to serve as the x axis
>>> print(df)
             B          C    A
0     0.089886   0.511081    0
1     1.323766   1.584758    1
2     1.489479  -0.438671    2
3     0.831331  -0.398021    3
4    -0.248233   0.494418    4
5    -0.013085   0.684518    5
6     0.666951  -1.422161    6
7     1.768838  -0.658786    7
8     2.661080   0.648505    8
9     1.951751   0.836261    9
10    3.538785   1.657475   10
11    3.254034   2.052609   11
12    4.248620   1.568401   12
13    4.077173   0.055622   13
14    3.452590  -0.200314   14
15    2.627620  -0.408829   15
16    3.690537  -0.210440   16
17    3.184924   0.365447   17
18    3.646556  -0.150044   18
19    4.164563  -0.023405   19
20    2.391447   0.517872   20
21    2.865153   0.686649   21
22    3.623183   0.663927   22
23    1.545117   0.151044   23
24    3.595924   0.903619   24
25    3.013804   1.855083   25
26    4.438801   1.014572   26
27    5.155216   0.882628   27
28    4.431457   0.741509   28
29    2.841949   0.709991   29
..         ...        ...  ...
970  -7.910646 -13.738689  970
971  -7.318091 -14.811335  971
972  -9.144376 -15.466873  972
973  -9.538658 -15.367167  973
974  -9.061114 -16.822726  974
975  -9.803798 -17.368350  975
976 -10.180575 -17.270180  976
977 -10.601352 -17.671543  977
978 -10.804909 -19.535919  978
979 -10.397964 -20.361419  979
980 -10.979640 -20.300267  980
981  -8.738223 -20.202669  981
982  -9.339929 -21.528973  982
983  -9.780686 -20.902152  983
984 -11.072655 -21.235735  984
985 -10.849717 -20.439201  985
986 -10.953247 -19.708973  986
987 -13.032707 -18.687553  987
988 -12.984567 -19.557132  988
989 -13.508836 -18.747584  989
990 -13.420713 -19.883180  990
991 -11.718125 -20.474092  991
992 -11.936512 -21.360752  992
993 -14.225655 -22.006776  993
994 -13.524940 -20.844519  994
995 -14.088767 -20.492952  995
996 -14.169056 -20.666777  996
997 -14.798708 -19.960555  997
998 -15.766568 -19.395622  998
999 -17.281143 -19.089793  999

[1000 rows x 3 columns]
>>> plt.figure()
>>> df.plot(x='A')  # plot B and C as lines against A
>>> plt.show()
              
            
          

Result

[Figure 1: line plot of the cumulative sums B and C against A]

            
>>> df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])  # uniform random values in [0, 1)
>>> print(df)
          a         b         c         d
0  0.504434  0.190875  0.001687  0.327372
1  0.406844  0.602029  0.912075  0.815889
2  0.828534  0.985910  0.094662  0.552089
3  0.198843  0.818785  0.750649  0.967054
4  0.498494  0.151378  0.417506  0.264438
5  0.655288  0.672788  0.088616  0.433270
6  0.493127  0.009254  0.179479  0.396655
7  0.419386  0.910986  0.020004  0.229063
8  0.671469  0.612189  0.374920  0.407093
9  0.414978  0.033499  0.756025  0.717849
>>> df.plot(kind='bar')  # grouped vertical bars, one group per row
>>> plt.show()
            
          

Result

[Figure 2: grouped bar chart of columns a, b, c and d for each of the 10 rows]

            
>>> df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
>>> df.plot(kind='barh', stacked=True)  # stacked horizontal bars
>>> plt.show()
            
          

[Figure 3: stacked horizontal bar chart of columns a, b, c and d]
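
When no display is available (for example in a script or on a server), `plt.show()` does not open a window, and the figure is usually written out instead. The sketch below is a hedged illustration rather than the article's own code: it assumes matplotlib's non-interactive `Agg` backend, uses a seeded random generator for reproducibility, and writes the PNG into an in-memory buffer. Passing a file name to `savefig` would write it to disk instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible
df = pd.DataFrame(rng.standard_normal((1000, 2)), columns=["B", "C"]).cumsum()
df["A"] = np.arange(len(df))

# DataFrame.plot returns the matplotlib Axes it drew on.
ax = df.plot(x="A")
ax.set_ylabel("cumulative sum")
fig = ax.get_figure()

# Render the figure as PNG into memory.
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
plt.close(fig)  # release the figure once it has been saved
```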


I hope this article is helpful for your Python programming.

