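Test environment: CentOS 8 (virtual machine).
When reading a CSV file with the pandas function read_csv, the parameters parse_dates and date_parser can be used to convert date columns while the data is being read. Reference: https://stackoverflow.com/questions/17465045/can-pandas-automatically-read-dates-from-a-csv-file
Example (the TimeStamp column in this file uses a 12-hour AM/PM format, which differs from Python's default datetime format):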
[root@localhost ~]# cat ts.csv
TimeStamp,Value
12/23/2022 6:36:22 PM,100
12/23/2022 6:37:22 PM,210
12/23/2022 6:38:22 PM,320
[root@localhost ~]# python3
Python 3.6.8 (default, Sep 10 2021, 09:13:53)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.read_csv('ts.csv')
Read this way, the TimeStamp column has dtype object (plain strings) rather than datetime64:
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   TimeStamp  3 non-null      object
 1   Value      3 non-null      int64
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
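Because the column holds plain strings, each value has to be parsed with a format string that matches it exactly. The following is a minimal sketch using only the standard library, with the first TimeStamp value from ts.csv copied in as a literal. It is meant to show how the hour directive interacts with the AM/PM marker: %p only shifts the hour when the hour is parsed with %I (12-hour clock), not with %H (24-hour clock). The variable names are illustrative only.

```python
from datetime import datetime

raw = '12/23/2022 6:36:22 PM'  # first TimeStamp value from ts.csv

# %I parses the hour on a 12-hour clock, so %p (AM/PM) is applied to it.
with_12h = datetime.strptime(raw, '%m/%d/%Y %I:%M:%S %p')

# %H parses the hour on a 24-hour clock; %p is still matched in the
# input but does not change the resulting hour.
with_24h = datetime.strptime(raw, '%m/%d/%Y %H:%M:%S %p')

print(with_12h)  # 2022-12-23 18:36:22
print(with_24h)  # 2022-12-23 06:36:22
```

For timestamps like the ones in this file, the %I form is the one that preserves the afternoon time.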
Use the read_csv parameters parse_dates and date_parser together with a custom function, dateparse, to do the conversion. The hour is parsed with %I (12-hour clock) so that the AM/PM marker %p takes effect; with %H the %p directive is ignored and 6:36:22 PM would be read as 06:36:22.
>>> from datetime import datetime
>>> import pandas as pd
>>> dateparse = lambda x: datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p')
>>> df = pd.read_csv('ts.csv', parse_dates=['TimeStamp'], date_parser=dateparse)
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   TimeStamp  3 non-null      datetime64[ns]
 1   Value      3 non-null      int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 176.0 bytes
>>> df
            TimeStamp  Value
0 2022-12-23 18:36:22    100
1 2022-12-23 18:37:22    210
2 2022-12-23 18:38:22    320
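As an alternative, the column can be converted after reading with pd.to_datetime and an explicit format string. This may be worth knowing because, as far as I am aware, date_parser is deprecated in pandas 2.0 and later, while pd.to_datetime with format works on both old and new releases. The sketch below is not from the referenced article; it embeds the same three rows in a string (read through io.StringIO) purely so that it runs without a ts.csv file on disk.

```python
import io

import pandas as pd

# Same content as ts.csv, inlined so the sketch is self-contained.
csv_text = """TimeStamp,Value
12/23/2022 6:36:22 PM,100
12/23/2022 6:37:22 PM,210
12/23/2022 6:38:22 PM,320
"""

# Read the column as plain strings first.
df = pd.read_csv(io.StringIO(csv_text))

# Convert it with an explicit format; %I plus %p handles the 12-hour clock.
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'], format='%m/%d/%Y %I:%M:%S %p')

print(df.dtypes)
print(df)
```

Passing the format explicitly also avoids relying on pandas to guess the date layout, which can matter for ambiguous day/month orderings.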