先发制人2026
72.36M · 2026-03-22
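pyarrow is a high-performance Python library for processing large-scale columnar data. It can help you:
pyarrow is widely used in the following real-world scenarios: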
pip install pyarrow
# If the installation is slow, a mirror hosted in China is recommended
pip install pyarrow -i
Convert a simple Pandas DataFrame to an Arrow Table and inspect its data types.
import pandas as pd
import pyarrow as pa
# Create a simple Pandas DataFrame
data = {'col1': [1, 2, 3, 4], 'col2': ['A', 'B', 'C', 'D']}
df = pd.DataFrame(data)
print("Original Pandas DataFrame:")
print(df)
print("\nDataFrame types:")
print(df.dtypes)
# Convert the Pandas DataFrame to a pyarrow Table
arrow_table = pa.Table.from_pandas(df)
print("\nConverted pyarrow Table:")
print(arrow_table)
# Inspect the schema of arrow_table
print("\npyarrow Table Schema:")
print(arrow_table.schema)
# Example 1: branch on a condition
# If the first field is int64, print a tip
if arrow_table.schema[0].type == pa.int64():
    print(f"\nTip: The first column '{arrow_table.schema[0].name}' is indeed an int64 type.")
else:
    print(f"\nTip: The first column '{arrow_table.schema[0].name}' is not an int64 type. It's {arrow_table.schema[0].type}.")
# Example 2: check whether the table has more than two columns
if len(arrow_table.column_names) > 2:
    print("\nThis Arrow Table has more than two columns.")
else:
    print("\nThis Arrow Table has two columns or fewer.")
Running this code online with PythonRun produces the following output:
Original Pandas DataFrame:
col1 col2
0 1 A
1 2 B
2 3 C
3 4 D
DataFrame types:
col1 int64
col2 object
dtype: object
Converted pyarrow Table:
pyarrow.Table
col1: int64
col2: string
----
col1: [[1,2,3,4]]
col2: [["A","B","C","D"]]
pyarrow Table Schema:
col1: int64
col2: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 490
Tip: The first column 'col1' is indeed an int64 type.
This Arrow Table has two columns or fewer.
[Figure: flowchart of the example code, drawn with the Mermaid online editor]