Tag: Module - 思考者

Python module: converting a pandas.DataFrame to Excel-format bytes

Background

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

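pandas is a data analysis toolkit that is hard to do without when analyzing data in Python. Its official documentation, pandas/docs, is rich in examples, and its author, Wes McKinney, also wrote the book Python for Data Analysis, which is now in its 3rd edition.
Volunteers in China have also shared a free Chinese translation of the 2nd edition: 简书-《Python数据分析》2nd.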
Python for Data Analysis, 3rd Edition

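I once implemented a single-machine ETL (extract, transform, load) program in which all three steps were built on pandas. Its most frequently used feature, however, was merely to run a batch of SQL queries on a schedule, write the results to Excel, and send those Excel files as email attachments, which is really using a sledgehammer to crack a nut 😂.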
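The last step of that workflow, attaching workbooks that exist only as bytes in memory, can be sketched with the standard library alone. This is a minimal illustration, not the original program: the function name build_report_email, the addresses, and the body text are all hypothetical, and the payload below is placeholder bytes rather than a real workbook.

```python
from email.message import EmailMessage

# MIME subtype registered for .xlsx workbooks.
XLSX_SUBTYPE = "vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def build_report_email(sender, recipient, subject, attachments):
    """Build an email whose attachments come from in-memory bytes.

    `attachments` maps a file name to the bytes of that file.
    (Hypothetical helper, for illustration only.)
    """
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content("The requested reports are attached.")
    for filename, payload in attachments.items():
        # No temporary file is needed: the bytes are attached directly.
        message.add_attachment(
            payload,
            maintype="application",
            subtype=XLSX_SUBTYPE,
            filename=filename,
        )
    return message


# Placeholder payload; a real caller would pass actual workbook bytes.
demo = build_report_email(
    "etl@example.com", "reader@example.com", "Daily report",
    {"report.xlsx": b"PK\x03\x04placeholder"},
)
```

The resulting message could then be handed to smtplib.SMTP.send_message; the server details are left out because they depend on the deployment.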

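In fact, I am not fond of Excel files. Leaving aside how slowly Excel opens them, a worksheet holds at most 1048576 (2^20) rows and 16384 (2^14) columns, and a worksheet name is limited to 31 characters. So in my view, CSV is the better option.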

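In that ETL program, only one pandas DataFrame could be read from each source, but several DataFrames could be written to a single target on output. One such case is writing several DataFrames to the same Excel workbook: if that target is later used as a source, it becomes awkward to handle. If it is the final output rather than an intermediate stage of the pipeline, it is acceptable.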
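Writing several DataFrames into one workbook and getting the result as bytes, as the title describes, might look roughly like the sketch below. It is not the author's module: the function name dataframes_to_excel_bytes and the sample data are hypothetical, and it assumes the openpyxl package is installed as the Excel engine.

```python
import io

import pandas as pd


def dataframes_to_excel_bytes(frames):
    """Write each DataFrame to its own worksheet and return the .xlsx bytes.

    `frames` maps a worksheet name to a DataFrame.
    (Hypothetical helper; assumes openpyxl is available.)
    """
    buffer = io.BytesIO()
    # The workbook is written into the buffer when the writer is closed.
    with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)
    return buffer.getvalue()


# Made-up sample data, for illustration only.
workbook_bytes = dataframes_to_excel_bytes({
    "orders": pd.DataFrame({"id": [1, 2], "amount": [10, 20]}),
    "customers": pd.DataFrame({"id": [7], "name": ["Ann"]}),
})
```

Reading such a workbook back with pd.read_excel(..., sheet_name=None) returns a dict of DataFrames rather than a single one, which is one reason a multi-sheet workbook is awkward as a pipeline source.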

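The biggest problem with Excel is the limited size of a worksheet. If your table exceeds 1048576×16384, that is, if a rectangle of that size cannot completely cover your data, you have to split the DataFrame. In general, I do not recommend splitting. My advice is to use one worksheet per workbook, and if a single worksheet cannot hold the data, to use CSV or HDF5 instead.

Python module: packing multiple arguments into one

Background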
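The size check behind that advice can be written down in a few lines. The sketch below is only an illustration: the names fits_in_worksheet and choose_format are hypothetical, and it assumes a single-row header, which takes up one worksheet row on export.

```python
# Worksheet limits quoted in the text above.
EXCEL_MAX_ROWS = 2 ** 20   # 1048576
EXCEL_MAX_COLS = 2 ** 14   # 16384


def fits_in_worksheet(n_rows, n_cols, header=True, index=False):
    """Return True if a table of the given shape fits in one worksheet.

    Assumption: a header occupies one extra row and an exported index
    occupies one extra column.
    """
    total_rows = n_rows + (1 if header else 0)
    total_cols = n_cols + (1 if index else 0)
    return total_rows <= EXCEL_MAX_ROWS and total_cols <= EXCEL_MAX_COLS


def choose_format(n_rows, n_cols):
    """Pick "xlsx" when the table fits in one worksheet, else "csv"."""
    return "xlsx" if fits_in_worksheet(n_rows, n_cols) else "csv"
```

For a DataFrame df, the shape would come from len(df) and len(df.columns). Note that a table with exactly 1048576 data rows no longer fits once a header row is written.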

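When calling a Python function (See also realpython - Defining Your Own Python Function), you usually need to pass in some arguments. The names a function declares are called parameters, and the values passed in at call time are called arguments (See also 5 Types of Arguments in Python Function Definitions).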

See also the FAQ question on the difference between arguments and parameters, the inspect.Parameter class, the Function definitions section, and PEP 362.

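I often need to collect some arguments, possibly update them later, and reuse them whenever needed. One approach is a partial function, functools.partial, but that requires binding the arguments to a specific function.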

>>> from functools import partial
>>> basetwo = partial(int, base=2)
>>> basetwo.__doc__ = 'Convert base 2 string to an int.'
>>> basetwo('10010')
18
>>> basethree = partial(basetwo, base=3)
>>> basethree.__doc__ = 'Convert base 3 string to an int.'
>>> basethree('10010')
84

So I implemented a class, Args, that collects positional arguments (See also stackoverflow - Understanding positional arguments in Python) and keyword arguments (See also realpython - Python args and kwargs: Demystified) in one go and lets you reuse them directly, as often as needed.

>>> from args import UpdativeArgs
>>> args = UpdativeArgs('10010', base=2)
>>> args(int)
18
>>> args.update(base=3)
>>> args(int)
84

2022-08-27

Loading a Python script as a configuration file

Script tools
