Git-pandas v1.0.0, or how to check for a stable release
Published: 2019-05-11


In the process of making the v1.0.0 release of git-pandas, I had one primary goal: to simplify and solidify the interface to git-pandas objects (the ProjectDirectory and the Repository). At the end of the day, what makes a project like git-pandas more useful than one-off analysis or rolling your own interface is consistent and predictable interfaces to commonly used functions.

So with that in mind, I was interested in the various input parameters for the functions.  What I wanted to avoid was something like:

df = repo.functionA(file_extension='py')
df = repo.functionB(file_ext='py')

So to quickly get an idea of where things stood, I looked to the inspect module in the standard Python library. With this, we can load git-pandas into memory, find all of the classes in it, and get a dictionary of the arguments to each function. I've actually left this script in the git-pandas repo, if you'd like to use it, but let's dig into it here:

The first step is to extract the objects (classes and functions are really what we are interested in here) from the module.  In our case, we are just looking at those objects directly importable via “from gitpandas import foo”.

import inspect


def extract_objects(m, classes=True, functions=False):
    # collect the classes and/or functions found at the top level of module m
    out = {}
    if classes:
        m_dict = {k: v for k, v in m.__dict__.items() if inspect.isclass(v)}
        out.update(m_dict)
    if functions:
        m_dict = {k: v for k, v in m.__dict__.items() if inspect.isfunction(v)}
        out.update(m_dict)
    return out

Here we use the __dict__ attribute of the module to iterate through the objects stored in it, checking whether each one is a class or a function, and collecting the matches in a dictionary for further analysis.
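
To see what that filtering returns in practice, here is a minimal sketch run against the standard library's json module. The choice of json is only an assumption for illustration; any importable module, including gitpandas, could be substituted.

```python
import inspect
import json  # stand-in module for illustration; not part of the original script

# Everything defined or imported at module level lives in the module's __dict__
classes = {k: v for k, v in json.__dict__.items() if inspect.isclass(v)}
functions = {k: v for k, v in json.__dict__.items() if inspect.isfunction(v)}

print(sorted(classes))    # names of classes reachable from the module
print(sorted(functions))  # names of plain functions reachable from the module
```

Note that this picks up anything importable from the module, including names it re-exports from submodules, which matches the "from gitpandas import foo" scope described above.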

Next we need to find the arguments for each function, or function in a class:

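def get_signatures(m, remove_self=True):
    # optionally drop the conventional 'self' instance argument
    excludes = ['self'] if remove_self else []
    out = {}
    for key, obj in m.items():
        if inspect.isclass(obj):
            # record each method of the class as Class.method
            for k, v in obj.__dict__.items():
                try:
                    # getfullargspec replaces getargspec, which was removed in Python 3.11
                    args = inspect.getfullargspec(v).args
                except TypeError:
                    # not an introspectable callable (e.g. a docstring or property)
                    continue
                out[str(key) + '.' + k] = [x for x in args if x not in excludes]
        else:
            # plain function: record it under its own name
            out[key] = [x for x in inspect.getfullargspec(obj).args if x not in excludes]
    return out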

To denote methods that belong to a class, we use Class.method notation. Optionally, we can exclude the 'self' parameter, which by convention names the instance in a class's methods.
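
As a small illustration of the resulting keys and values, the sketch below introspects a toy class. The Repo class and its method names are hypothetical stand-ins, not the real git-pandas API, and the sketch uses inspect.getfullargspec.

```python
import inspect


class Repo:
    """Hypothetical stand-in for a git-pandas object."""

    def commit_history(self, branch='master', limit=None):
        pass

    def blame(self, extensions=None, ignore_dir=None):
        pass


sigs = {}
for name, attr in Repo.__dict__.items():
    try:
        args = inspect.getfullargspec(attr).args
    except TypeError:
        continue  # skip non-callables such as __doc__ or __module__
    # drop 'self' and key the result as Class.method
    sigs['Repo.' + name] = [a for a in args if a != 'self']

print(sigs)
```

Only the two methods survive the filter; the other entries in the class dictionary are not callables and are skipped.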

Finally, we can take this dictionary of functions and arguments, and find the unique set of arguments for the module.

def get_distinct_params(m):
    # union of every argument name across all recorded signatures
    out = set()
    for k in m.keys():
        out.update(m[k])
    return out

So pulling these three together, we can just do:

sigs = get_signatures(extract_objects(module))
print(get_distinct_params(sigs))

And find out that git-pandas has only a handful of possible arguments:

  1. extensions: a list of file extensions to analyze
  2. by: a categorical option for how to aggregate or pivot a dataframe (e.g.: by author or by project)
  3. branch: the git branch to analyze
  4. limit: a max number of rows to return
  5. verbose: whether or not to log out detailed information
  6. filename: a specific file to analyze
  7. committer: a boolean for whether to perform analysis on the committer (as opposed to author)
  8. working_dir: the directory your repository or repositories are in
  9. ignore_dir: a list of directories to ignore in the analysis
  10. coverage: a boolean for whether or not to include coverage data in a resultset
  11. skip: an integer number of rows to skip in the return set (so limit 10, skip 2 would return rows 0, 2, 4, 6, … and 18)
  12. num_datapoints: a total number of datapoints (evenly spaced across the whole dataset) to return
  13. normalize: boolean for whether to return normalized or absolute values
  14. ignore_repos: which repositories to ignore when assembling a ProjectDirectory
  15. days: the number of days of data to return (since now)
  16. rev: the specific revision to analyze
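
The interaction between limit and skip in item 11 is easiest to see with plain list slicing. This is only a sketch of the behavior described in the list, assuming skip acts as a step and limit caps the result afterwards; the helper name apply_skip_and_limit is hypothetical and this is not how git-pandas itself is implemented.

```python
def apply_skip_and_limit(rows, limit=None, skip=None):
    """Keep every `skip`-th row, then cap the result at `limit` rows."""
    if skip is not None:
        rows = rows[::skip]
    if limit is not None:
        rows = rows[:limit]
    return rows


rows = list(range(100))  # stand-in for the row positions of a result set
selected = apply_skip_and_limit(rows, limit=10, skip=2)
print(selected)
```

With limit 10 and skip 2, the selection is the ten even row positions from 0 through 18, matching the example in the list.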

Going forward, the aim is to keep this list short and logical while growing the functionality.
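
One way to keep an eye on that is to invert the signature dictionary and flag parameter names that look like near-duplicates. The sketch below is an assumption-laden illustration, not part of the git-pandas script: the helper names, the sample signatures, and the 0.6 similarity cutoff are all hypothetical, and the fuzzy matching uses difflib from the standard library.

```python
from collections import defaultdict
from difflib import get_close_matches


def params_to_functions(sigs):
    """Invert {function: [params]} into {param: [functions]}."""
    index = defaultdict(list)
    for func, params in sigs.items():
        for p in params:
            index[p].append(func)
    return dict(index)


def similar_params(names, cutoff=0.6):
    """Return sorted pairs of parameter names that look like near-duplicates."""
    names = sorted(names)
    pairs = set()
    for i, name in enumerate(names):
        # only compare against later names so each pair is reported once
        for match in get_close_matches(name, names[i + 1:], cutoff=cutoff):
            pairs.add((name, match))
    return sorted(pairs)


# Hypothetical signatures mirroring the file_extension / file_ext inconsistency
sigs = {
    'Repository.functionA': ['file_extension', 'branch'],
    'Repository.functionB': ['file_ext', 'branch'],
}
index = params_to_functions(sigs)
print(index['branch'])               # functions that share the 'branch' argument
print(similar_params(index.keys()))  # candidate near-duplicate names to review
```

The flagged pairs are only candidates for review; a cutoff that is too low will report unrelated names, so the threshold is worth tuning per project.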

So check out the new release on PyPI, or at the source.
