Identifying consecutive occurrences of a value

You could do:

df['consecutive'] = df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count

to get:

   Count  consecutive
0      1            1
1      0            0
2      1            2
3      1            2
4      0            0
5      0            0
6      1            3
7      1            3
8      1            3
9      0            0
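To make the one-liner easier to follow, here is a minimal sketch that splits it into named intermediate steps. The sample `df` is an assumption reconstructed from the output shown above, and the names `run_start`, `run_id` and `run_size` are only illustrative.

```python
import pandas as pd

# Sample data, assumed to match the output table above.
df = pd.DataFrame({'Count': [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]})

# True wherever a value differs from the previous row, i.e. a new run starts.
run_start = df.Count != df.Count.shift()

# The cumulative sum gives every run of equal values its own integer label.
run_id = run_start.cumsum()

# Length of each run, broadcast back to every row belonging to that run.
run_size = df.Count.groupby(run_id).transform('size')

# Multiplying by Count zeroes out the runs of 0s (this relies on 0/1 data).
df['consecutive'] = run_size * df.Count

print(df)
```

Note that the final multiplication only yields run lengths because `Count` holds 0s and 1s; for other values the run size would need to be masked instead.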

From here, you can set any threshold:

threshold = 2
df['consecutive'] = (df.consecutive >= threshold).astype(int)

to get:

   Count  consecutive
0      1            0
1      0            0
2      1            1
3      1            1
4      0            0
5      0            0
6      1            1
7      1            1
8      1            1
9      0            0

Or, in a single step:

(df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count >= threshold).astype(int)
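If the flagging is needed in more than one place, the single-step expression could be wrapped in a small helper. This is only a sketch: the function name `flag_runs` is hypothetical, and it assumes the input Series contains only 0s and 1s.

```python
import pandas as pd

def flag_runs(s: pd.Series, threshold: int) -> pd.Series:
    """Return 1 for rows inside a run of 1s of length >= threshold, else 0.

    Hypothetical helper; assumes `s` holds only 0s and 1s.
    """
    # Label each run of equal consecutive values.
    run_id = (s != s.shift()).cumsum()
    # Run length per row; runs of 0s are zeroed by multiplying with `s`.
    run_size = s.groupby(run_id).transform('size') * s
    return (run_size >= threshold).astype(int)

counts = pd.Series([1, 0, 1, 1, 0, 0, 1, 1, 1, 0], name='Count')
flags_2 = flag_runs(counts, threshold=2)
flags_3 = flag_runs(counts, threshold=3)
print(flags_2.tolist())
print(flags_3.tolist())
```

With `threshold=2` this reproduces the column shown above; raising the threshold to 3 keeps only the run at rows 6 to 8.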

In terms of efficiency, using pandas methods gives a significant speedup as the size of the problem grows:

df = pd.concat([df for _ in range(1000)])
%timeit (df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count >= threshold).astype(int)
1000 loops, best of 3: 1.47 ms per loop

Compared with:

%%timeit
# requires `from itertools import groupby` to have been run beforehand
l = []
for k, g in groupby(df.Count):
    size = sum(1 for _ in g)
    if k == 1 and size >= 2:
        l = l + [1]*size
    else:
        l = l + [0]*size
pd.Series(l)
10 loops, best of 3: 76.7 ms per loop
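Outside IPython, the comparison can be reproduced with the standard `timeit` module. The sketch below is an approximation of the benchmark above, not the exact setup: it builds the 10,000-row frame directly (so it has a fresh index rather than the repeated one `pd.concat` produces), the function names `vectorized` and `looped` are illustrative, and the measured times will vary by machine and pandas version.

```python
from itertools import groupby
import timeit

import pandas as pd

threshold = 2
# Assumed sample pattern repeated 1000 times (10,000 rows in total).
df = pd.DataFrame({'Count': [1, 0, 1, 1, 0, 0, 1, 1, 1, 0] * 1000})

def vectorized():
    c = df.Count
    run_id = (c != c.shift()).cumsum()
    return (c.groupby(run_id).transform('size') * c >= threshold).astype(int)

def looped():
    flags = []
    for k, g in groupby(df.Count):
        size = sum(1 for _ in g)
        flag = 1 if k == 1 and size >= threshold else 0
        flags.extend([flag] * size)
    return pd.Series(flags)

# Check that both approaches agree before timing them.
same = vectorized().tolist() == looped().tolist()

t_vec = timeit.timeit(vectorized, number=10)
t_loop = timeit.timeit(looped, number=10)
print(f"identical results: {same}")
print(f"vectorized: {t_vec:.4f}s, looped: {t_loop:.4f}s (10 runs each)")
```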
