測試環境為 CentOS 8 (虛擬機)
參考資料 – https://towardsdatascience.com/pandasql-interesting-way-to-run-sql-queries-in-python-18a4fc36406a
通常使用 Python 時會將資料儲存在 SQLite 裡面,再透過 SQLite 語法做資料分析,但有時候資料可能是檔案形式,這時候會 Pandas 來處理資料,但語法會不習慣的可以使用 pandasql 套件來直接針對 panda 的資料下 SQLite 語法,下面以 股票 CSV 格式為範例.
[root@localhost ~]# wget https://www.twse.com.tw/exchangeReport/STOCK_DAY_ALL?response=open_data -O tw_stock_20230221.csv
[root@localhost ~]# cat tw_stock_20230221.csv 證券代號,證券名稱,成交股數,成交金額,開盤價,最高價,最低價,收盤價,漲跌價差,成交筆數 "0050","元大台灣50","4772315","567003197","119.00","119.10","118.55","118.95","-0.0500","9128" "0051","元大中型100","19315","1086348","56.05","56.30","56.05","56.25","0.2000","134" "0052","富邦科技","251513","26328173","104.85","105.00","104.45","104.75","-0.1500","423" "0053","元大電子","8378","488691","58.60","58.60","58.25","58.60","-0.1000","958" "0055","元大MSCI金融","159738","3609835","22.70","22.70","22.54","22.55","-0.1900","279" "0056","元大高股息","29201103","803042808","27.41","27.55","27.36","27.53","0.1200","17948" "0057","富邦摩台","3882","335388","86.50","86.65","86.45","86.65","-0.2000","885"
安裝所需模組
[root@localhost ~]# pip install pandas [root@localhost ~]# pip install pandasql
匯入模組
[root@localhost ~]# python3 Python 3.6.8 (default, Sep 10 2021, 09:13:53) [GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux Type "help", "copyright", "credits" or "license" for more information.
程式:
import pandas as pd from pandasql import sqldf df = pd.read_csv('tw_stock_20230221.csv') query = "SELECT * FROM df WHERE 收盤價 > 1000" sqldf(query)
程式說明:
匯入所需套件.
>>> import pandas as pd >>> from pandasql import sqldf
透過 DataFrame 物件將 csv 檔案轉換過來.
>>> df = pd.read_csv('tw_stock_20230221.csv') >>> df 證券代號 證券名稱 成交股數 成交金額 開盤價 最高價 最低價 收盤價 漲跌價差 成交筆數 0 0050 元大台灣50 4772315.0 567003197.0 119.00 119.10 118.55 118.95 -0.05 9128.0 1 0051 元大中型100 19315.0 1086348.0 56.05 56.30 56.05 56.25 0.20 134.0 2 0052 富邦科技 251513.0 26328173.0 104.85 105.00 104.45 104.75 -0.15 423.0 3 0053 元大電子 8378.0 488691.0 58.60 58.60 58.25 58.60 -0.10 958.0 4 0055 元大MSCI金融 159738.0 3609835.0 22.70 22.70 22.54 22.55 -0.19 279.0 ... ... ... ... ... ... ... ... ... ... ... 1175 9944 新麗 74472.0 1526030.0 20.55 20.55 20.35 20.35 -0.20 98.0 1176 9945 潤泰新 2535015.0 112377835.0 44.40 44.45 44.20 44.40 0.00 1939.0 1177 9946 三發地產 77156.0 928860.0 12.00 12.10 11.95 12.05 0.05 74.0 1178 9955 佳龍 229733.0 4771269.0 20.95 20.95 20.65 20.80 -0.15 200.0 1179 9958 世紀鋼 3969464.0 423086085.0 108.00 108.50 104.50 105.50 -1.00 3143.0 [1180 rows x 10 columns]
檢視資料欄位與型別.
>>> df.info() <class 'pandas.core.frame.DataFrame'> RangeIndex: 1180 entries, 0 to 1179 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 證券代號 1180 non-null object 1 證券名稱 1180 non-null object 2 成交股數 1169 non-null float64 3 成交金額 1169 non-null float64 4 開盤價 1167 non-null float64 5 最高價 1167 non-null float64 6 最低價 1167 non-null float64 7 收盤價 1167 non-null float64 8 漲跌價差 1180 non-null float64 9 成交筆數 1169 non-null float64 dtypes: float64(8), object(2) memory usage: 92.3+ KB
自訂 SQL 語法.
>>> query = "SELECT * FROM df WHERE 收盤價 > 1000"
透過 sqldf 物件執行剛剛設定的 SQL 語法(收盤價 > 1000)
>>> sqldf(query) 證券代號 證券名稱 成交股數 成交金額 開盤價 最高價 最低價 收盤價 漲跌價差 成交筆數 0 1590 亞德客-KY 548937.0 5.980080e+08 1070.0 1100.0 1065.0 1090.0 35.0 616.0 1 3008 大立光 512586.0 1.153767e+09 2210.0 2270.0 2210.0 2260.0 70.0 2522.0 2 3443 創意 6632633.0 6.792996e+09 1010.0 1045.0 998.0 1035.0 40.0 6906.0 3 5269 祥碩 1844419.0 1.822878e+09 965.0 1010.0 950.0 1005.0 55.0 2896.0 4 6409 旭隼 135592.0 2.191790e+08 1590.0 1655.0 1575.0 1640.0 65.0 376.0
沒有解決問題,試試搜尋本站其他內容