I'm running a simple HMM using scikit-learn's `hmmlearn` module. It works with fully observed data, but fails when I pass it observations with missing data. A small example:
```python
import numpy as np
import hmmlearn
import hmmlearn.hmm as hmm

transmat = np.array([[0.9, 0.1],
                     [0.1, 0.9]])
emitmat = np.array([[0.5, 0.5],
                    [0.9, 0.1]])
# does not work: cannot have missing data
obs = np.array([0, 1] * 5 + [np.nan] * 5)
# works
#obs = np.array([0, 1] * 5 + [1] * 5)
startprob = np.array([0.5, 0.5])
h = hmm.MultinomialHMM(n_components=2,
                       startprob=startprob,
                       transmat=transmat)
h.emissionprob_ = emitmat
print obs, type(obs)
posteriors = h.predict_proba(obs)
print posteriors
```
If `obs` is fully observed (every element is 0 or 1), it works, but I want estimates for the unobserved data points. I tried encoding these as `np.nan` or `None`, but neither works. It gives the error `IndexError: arrays used as indices must be of integer (or boolean) type` (in `hmm.py`, line 430, in `_compute_log_likelihood`).
How can this be done in `hmmlearn`?
Currently there's no way of doing missing data imputation using `hmmlearn`.
As an ad hoc approach, you can partition the observation sequence into fully observed subsequences, and for each subsequence either pick the most probable next state and observation, or simulate them randomly from the transition and emission probabilities. Note that this strategy can lead to inconsistencies on subsequence boundaries.
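A minimal sketch of that ad hoc strategy, written in plain NumPy so it does not depend on any particular `hmmlearn` version. The helper names `forward_filter`, `fill_gap` and `impute` are hypothetical and are not part of `hmmlearn`. It assumes discrete observations with missing values encoded as `np.nan`. Each observed run is filtered independently, starting again from `startprob`, which is exactly where the boundary inconsistencies mentioned above come from.

```python
import numpy as np

def forward_filter(run, startprob, transmat, emitmat):
    """Filtered state distribution at the last step of a fully observed run."""
    alpha = startprob * emitmat[:, run[0]]
    alpha /= alpha.sum()
    for symbol in run[1:]:
        # Predict one step ahead, then weight by the emission likelihood.
        alpha = (alpha @ transmat) * emitmat[:, symbol]
        alpha /= alpha.sum()
    return alpha

def fill_gap(state_dist, n_missing, transmat, emitmat, rng=None):
    """Fill n_missing steps that follow a step with belief state_dist.

    rng=None picks the most probable next state and symbol at each step;
    passing a numpy Generator simulates them instead.
    """
    if rng is None:
        state = int(np.argmax(state_dist))
    else:
        state = int(rng.choice(len(state_dist), p=state_dist))
    symbols = []
    for _ in range(n_missing):
        if rng is None:
            state = int(np.argmax(transmat[state]))
            symbols.append(int(np.argmax(emitmat[state])))
        else:
            state = int(rng.choice(transmat.shape[1], p=transmat[state]))
            symbols.append(int(rng.choice(emitmat.shape[1], p=emitmat[state])))
    return symbols

def impute(obs, startprob, transmat, emitmat, rng=None):
    """Replace NaN entries of obs using the ad hoc strategy."""
    obs = np.asarray(obs, dtype=float)
    missing = np.isnan(obs)
    filled = obs.copy()
    # Assumption: a gap at the very start is filled from startprob.
    state_dist = startprob
    t = 0
    while t < len(obs):
        # Find the end of the current run (all observed or all missing).
        end = t
        while end < len(obs) and missing[end] == missing[t]:
            end += 1
        if missing[t]:
            filled[t:end] = fill_gap(state_dist, end - t, transmat, emitmat, rng)
        else:
            run = obs[t:end].astype(int)
            state_dist = forward_filter(run, startprob, transmat, emitmat)
        t = end
    return filled.astype(int)

transmat = np.array([[0.9, 0.1], [0.1, 0.9]])
emitmat = np.array([[0.5, 0.5], [0.9, 0.1]])
startprob = np.array([0.5, 0.5])
obs = [0, 1] * 5 + [np.nan] * 5

most_probable = impute(obs, startprob, transmat, emitmat)
simulated = impute(obs, startprob, transmat, emitmat,
                   rng=np.random.default_rng(0))
print(most_probable)
print(simulated)
```

The filled array contains only integers, so it can be passed to the model in the same way as the fully observed sequence. The filled values are guesses and should not be treated as data when estimating parameters.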