Python – pandasql

Loading

測試環境為 CentOS 8 (虛擬機)

參考資料 – https://towardsdatascience.com/pandasql-interesting-way-to-run-sql-queries-in-python-18a4fc36406a

通常使用 Python 時會將資料儲存在 SQLite 裡面,再透過 SQLite 語法做資料分析,但有時候資料可能是檔案形式,這時候會 Pandas 來處理資料,但語法會不習慣的可以使用 pandasql 套件來直接針對 panda 的資料下 SQLite 語法,下面以 股票 CSV 格式為範例.

[root@localhost ~]# wget https://www.twse.com.tw/exchangeReport/STOCK_DAY_ALL?response=open_data  -O tw_stock_20230221.csv
[root@localhost ~]# cat tw_stock_20230221.csv
證券代號,證券名稱,成交股數,成交金額,開盤價,最高價,最低價,收盤價,漲跌價差,成交筆數
"0050","元大台灣50","4772315","567003197","119.00","119.10","118.55","118.95","-0.0500","9128"
"0051","元大中型100","19315","1086348","56.05","56.30","56.05","56.25","0.2000","134"
"0052","富邦科技","251513","26328173","104.85","105.00","104.45","104.75","-0.1500","423"
"0053","元大電子","8378","488691","58.60","58.60","58.25","58.60","-0.1000","958"
"0055","元大MSCI金融","159738","3609835","22.70","22.70","22.54","22.55","-0.1900","279"
"0056","元大高股息","29201103","803042808","27.41","27.55","27.36","27.53","0.1200","17948"
"0057","富邦摩台","3882","335388","86.50","86.65","86.45","86.65","-0.2000","885"

安裝所需模組

[root@localhost ~]# pip install pandas
[root@localhost ~]# pip install pandasql

匯入模組

[root@localhost ~]# python3
Python 3.6.8 (default, Sep 10 2021, 09:13:53)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.

程式:

import pandas as pd
from pandasql import sqldf
df = pd.read_csv('tw_stock_20230221.csv')
query = "SELECT * FROM df WHERE 收盤價 > 1000"
sqldf(query)

程式說明:
匯入所需套件.

>>> import pandas as pd
>>> from pandasql import sqldf

透過 DataFrame 物件將 csv 檔案轉換過來.

>>> df = pd.read_csv('tw_stock_20230221.csv')
>>> df
      證券代號      證券名稱       成交股數         成交金額     開盤價     最高價     最低價     收盤價  漲跌價差    成交筆數
0     0050    元大台灣50  4772315.0  567003197.0  119.00  119.10  118.55  118.95 -0.05  9128.0
1     0051   元大中型100    19315.0    1086348.0   56.05   56.30   56.05   56.25  0.20   134.0
2     0052      富邦科技   251513.0   26328173.0  104.85  105.00  104.45  104.75 -0.15   423.0
3     0053      元大電子     8378.0     488691.0   58.60   58.60   58.25   58.60 -0.10   958.0
4     0055  元大MSCI金融   159738.0    3609835.0   22.70   22.70   22.54   22.55 -0.19   279.0
...    ...       ...        ...          ...     ...     ...     ...     ...   ...     ...
1175  9944        新麗    74472.0    1526030.0   20.55   20.55   20.35   20.35 -0.20    98.0
1176  9945       潤泰新  2535015.0  112377835.0   44.40   44.45   44.20   44.40  0.00  1939.0
1177  9946      三發地產    77156.0     928860.0   12.00   12.10   11.95   12.05  0.05    74.0
1178  9955        佳龍   229733.0    4771269.0   20.95   20.95   20.65   20.80 -0.15   200.0
1179  9958       世紀鋼  3969464.0  423086085.0  108.00  108.50  104.50  105.50 -1.00  3143.0

[1180 rows x 10 columns]

檢視資料欄位與型別.

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1180 entries, 0 to 1179
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   證券代號    1180 non-null   object
 1   證券名稱    1180 non-null   object
 2   成交股數    1169 non-null   float64
 3   成交金額    1169 non-null   float64
 4   開盤價     1167 non-null   float64
 5   最高價     1167 non-null   float64
 6   最低價     1167 non-null   float64
 7   收盤價     1167 non-null   float64
 8   漲跌價差    1180 non-null   float64
 9   成交筆數    1169 non-null   float64
dtypes: float64(8), object(2)
memory usage: 92.3+ KB

自訂 SQL 語法.

>>> query = "SELECT * FROM df WHERE 收盤價 > 1000"

透過 sqldf 物件執行剛剛設定的 SQL 語法(收盤價 > 1000)

>>> sqldf(query)
   證券代號    證券名稱       成交股數          成交金額     開盤價     最高價     最低價     收盤價  漲跌價差    成交筆數
0  1590  亞德客-KY   548937.0  5.980080e+08  1070.0  1100.0  1065.0  1090.0  35.0   616.0
1  3008     大立光   512586.0  1.153767e+09  2210.0  2270.0  2210.0  2260.0  70.0  2522.0
2  3443      創意  6632633.0  6.792996e+09  1010.0  1045.0   998.0  1035.0  40.0  6906.0
3  5269      祥碩  1844419.0  1.822878e+09   965.0  1010.0   950.0  1005.0  55.0  2896.0
4  6409      旭隼   135592.0  2.191790e+08  1590.0  1655.0  1575.0  1640.0  65.0   376.0
沒有解決問題,試試搜尋本站其他內容

發佈留言

發佈留言必須填寫的電子郵件地址不會公開。 必填欄位標示為 *

這個網站採用 Akismet 服務減少垃圾留言。進一步了解 Akismet 如何處理網站訪客的留言資料