Python pandas – the read_csv parameters parse_dates and date_parser

Test environment: CentOS 8 (virtual machine)

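When loading a file with the pandas function read_csv, the parameters parse_dates and date_parser can be used to convert date columns while the data is being read. Reference: https://stackoverflow.com/questions/17465045/can-pandas-automatically-read-dates-from-a-csv-file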

Example (the TimeStamp column uses a 12-hour AM/PM format, which differs from Python's default datetime format)

[root@localhost ~]# cat ts.csv
TimeStamp,Value
12/23/2022 6:36:22 PM,100
12/23/2022 6:37:22 PM,210
12/23/2022 6:38:22 PM,320

[root@localhost ~]# python3
Python 3.6.8 (default, Sep 10 2021, 09:13:53)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import pandas as pd
>>> df = pd.read_csv('ts.csv')

Read this way, the TimeStamp column has dtype object rather than datetime64.

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   TimeStamp   3 non-null      object
 1   Value       3 non-null      int64
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
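
The same check can be reproduced without creating ts.csv on disk. The following is a minimal sketch, assuming the three rows shown above are fed to read_csv through io.StringIO; the names CSV_TEXT and df_raw are illustrative and not part of the original session.

```python
import io

import pandas as pd

# Same three rows as ts.csv, held in memory so no file is needed.
CSV_TEXT = """TimeStamp,Value
12/23/2022 6:36:22 PM,100
12/23/2022 6:37:22 PM,210
12/23/2022 6:38:22 PM,320
"""

df_raw = pd.read_csv(io.StringIO(CSV_TEXT))

# Without parse_dates the column holds plain strings, not datetimes.
is_datetime = pd.api.types.is_datetime64_any_dtype(df_raw["TimeStamp"])
first_value = df_raw["TimeStamp"].iloc[0]
print(is_datetime, type(first_value).__name__)
```

Checking with pd.api.types.is_datetime64_any_dtype avoids depending on the exact dtype name that a given pandas version reports for string columns.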

Use the read_csv parameters parse_dates and date_parser together with a custom function, dateparse, to do the conversion.

>>> from datetime import datetime
>>> import pandas as pd
>>>
>>> # %I reads the hour on a 12-hour clock so that %p (AM/PM) is applied;
>>> # with %H the PM marker would be ignored and 6 PM would be parsed as 06:00.
>>> dateparse = lambda x: datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p')
>>>
>>> df = pd.read_csv('ts.csv', parse_dates=['TimeStamp'], date_parser=dateparse)

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   TimeStamp  3 non-null      datetime64[ns]
 1   Value      3 non-null      int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 176.0 bytes

>>> df
            TimeStamp  Value
0 2022-12-23 18:36:22    100
1 2022-12-23 18:37:22    210
2 2022-12-23 18:38:22    320
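
The choice between %H and %I matters here: in strptime, %p only affects the parsed hour when the hour itself is read with %I. The sketch below first shows that difference on a single value. It then shows an alternative that converts the column after reading with pd.to_datetime, which may be preferable on newer pandas releases because date_parser is deprecated since pandas 2.0 in favour of date_format. The names STAMP and CSV_TEXT are illustrative, and the CSV content is assumed to match ts.csv above.

```python
import io
from datetime import datetime

import pandas as pd

STAMP = "12/23/2022 6:36:22 PM"

# %H reads the hour as 24-hour, so the trailing PM has no effect on it.
with_24h = datetime.strptime(STAMP, "%m/%d/%Y %H:%M:%S %p")
# %I reads the hour as 12-hour, so PM shifts 6 to 18.
with_12h = datetime.strptime(STAMP, "%m/%d/%Y %I:%M:%S %p")
print(with_24h.hour, with_12h.hour)  # 6 18

# Alternative: read the column as text, then convert it explicitly.
CSV_TEXT = """TimeStamp,Value
12/23/2022 6:36:22 PM,100
12/23/2022 6:37:22 PM,210
12/23/2022 6:38:22 PM,320
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
df["TimeStamp"] = pd.to_datetime(df["TimeStamp"], format="%m/%d/%Y %I:%M:%S %p")
print(df["TimeStamp"].dt.hour.tolist())  # [18, 18, 18]
```

Converting after the read keeps the format string in one place and does not rely on the date_parser parameter being available.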
