[python] numpy.histogram: binning data to plot a CDF or PDF

Struggler J. 2017. 8. 23. 01:40

For PDF,

import numpy as np

bins = np.linspace(0, 1, 100) # 100 evenly spaced bin edges from 0 to 1, i.e. 99 bins (use 101 for exactly 100 bins)

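counts = np.histogram(data, bins)[0] # data: 1-D array of samples; counts: number of samples in each bin

bin_data = counts/counts.sum() # fraction of the in-range samples in each bin; divide by the bin width as well for a density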

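# np.histogram returns two arrays: the first holds the number of samples in each bin, and the second holds the bin edges (one element longer than the counts), not one representative value per bin.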
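To make the two return values concrete, here is a small sketch with synthetic data. `data`, `rng`, `pmf`, `density` and `centers` are illustrative names I am assuming, since the post does not define `data`. It shows the counts and edges, the per-bin fraction, the density that `density=True` returns, and bin midpoints for the x axis of a plot.

```python
import numpy as np

# Synthetic samples in [0, 1); stands in for the post's undefined `data`.
rng = np.random.default_rng(0)
data = rng.random(1000)

bins = np.linspace(0, 1, 11)              # 11 edges -> 10 bins
counts, edges = np.histogram(data, bins)  # edges has one more element than counts

# Fraction of samples per bin (sums to 1 because every sample is inside the bin range).
pmf = counts / counts.sum()

# Density per bin: fraction divided by bin width, which is what density=True computes.
density = np.histogram(data, bins, density=True)[0]

# Midpoint of each bin, usable as the x value when plotting.
centers = 0.5 * (edges[:-1] + edges[1:])
```

With equal-width bins the fraction and the density differ only by a constant factor (the bin width), so either gives the same shape when plotted.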

For CDF,

import numpy as np

bins = np.linspace(0, 1, 100) # 100 evenly spaced bin edges from 0 to 1, i.e. 99 bins (use 101 for exactly 100 bins)

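# weights=data sums the values that fall in each bin; dividing by the counts gives the mean value per bin.
# Note: this is a per-bin average, not a running total. A cumulative distribution in the usual sense would be np.cumsum of the counts divided by their total.

cdf = np.histogram(data, bins, weights=data)[0]/np.histogram(data, bins)[0]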

But! If a bin is empty, the division is 0/0 and the result for that bin is nan.

If that annoys you,

pdf = np.histogram(data, bins)[0]

zeroIDX = np.where(pdf==0)

cdf[zeroIDX] = 0

will put 0 there instead of nan.
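Another way to avoid the nan is to skip the division for empty bins in the first place. The following is only a sketch with made-up data; the names `sums`, `counts` and `ratio` are mine, not from the post. It uses `np.divide` with its `out` and `where` arguments so that empty bins keep a preset 0.

```python
import numpy as np

data = np.array([0.05, 0.15, 0.15, 0.95])   # synthetic; most bins stay empty
bins = np.linspace(0, 1, 11)                # 11 edges -> 10 bins

sums = np.histogram(data, bins, weights=data)[0]   # sum of the values in each bin
counts = np.histogram(data, bins)[0]               # number of values in each bin

# Divide only where counts > 0; bins with no samples keep the 0 already in `out`.
ratio = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
```

This also avoids the RuntimeWarning that the plain 0/0 division prints.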

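The operations above (element-wise division, and assignment through the index arrays returned by np.where) are features of NumPy arrays.

Since the binning is done with np, every result is already a NumPy array, which is why this approach works.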
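If what you want is a cumulative curve in the usual sense (the fraction of samples up to each bin), one option is a running sum of the bin counts. This is a sketch under my own assumptions, with synthetic data and the hypothetical names `rng` and `cum`; it is not the method used above.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(1000)                 # synthetic samples in [0, 1)

bins = np.linspace(0, 1, 101)           # 101 edges -> 100 bins
counts, edges = np.histogram(data, bins)

# Running total of the counts, scaled so the last value is 1.
cum = np.cumsum(counts) / counts.sum()

# cum[i] is the fraction of samples in bins 0..i, i.e. below edges[i + 1].
```

Because the only division is by the total count, empty bins do not produce nan here; they just leave the curve flat.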