python - Missing data in hmmlearn from scikit-learn


I'm running a simple HMM using scikit-learn's hmmlearn module. It works for fully observed data, but fails when I pass it observations with missing data. A small example:

    import numpy as np
    import hmmlearn
    import hmmlearn.hmm as hmm

    transmat = np.array([[0.9, 0.1],
                         [0.1, 0.9]])
    emitmat = np.array([[0.5, 0.5],
                        [0.9, 0.1]])

    # does not work: cannot have missing data
    obs = np.array([0, 1] * 5 + [np.nan] * 5)

    # works
    #obs = np.array([0, 1] * 5 + [1] * 5)

    startprob = np.array([0.5, 0.5])
    h = hmm.MultinomialHMM(n_components=2,
                           startprob=startprob,
                           transmat=transmat)
    h.emissionprob_ = emitmat
    print obs, type(obs)
    posteriors = h.predict_proba(obs)
    print posteriors

If obs is fully observed (every element is 0 or 1) it works, but I want to get estimates for the unobserved data points. I tried encoding these as np.nan or None, but neither works. It gives the error "IndexError: arrays used as indices must be of integer (or boolean) type" (raised in hmm.py, line 430, in _compute_log_likelihood).

How can this be done in hmmlearn?

Currently there's no way of doing missing data imputation using hmmlearn.

As an ad hoc approach, you can partition the observation sequence into fully observed subsequences and, for each subsequence, either pick the most likely next state and observation or simulate them randomly from the transition and emission probabilities. Note that this strategy can lead to inconsistencies on the subsequence boundaries.
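A minimal sketch of the simulation variant, in plain Python so it does not depend on any particular hmmlearn version. The helper names (`split_runs`, `sample`, `fill_missing`) and the `last_state_fn` callback are hypothetical, not part of hmmlearn; missing values are assumed to be encoded as `None`. In practice `last_state_fn` would decode the preceding observed subsequence with the fitted model (for example, taking the last entry of its `predict` output); here it is a stand-in that always returns state 0.

```python
import random

def split_runs(obs):
    """Split obs into (is_missing, start, stop) runs; None marks a missing value."""
    runs, start = [], 0
    for i in range(1, len(obs) + 1):
        if i == len(obs) or (obs[i] is None) != (obs[start] is None):
            runs.append((obs[start] is None, start, i))
            start = i
    return runs

def sample(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs)[0]

def fill_missing(obs, startprob, transmat, emitmat, last_state_fn, rng):
    """Fill each missing run by simulating states and symbols forward in time."""
    filled = list(obs)
    runs = split_runs(obs)
    for k, (is_missing, start, stop) in enumerate(runs):
        if not is_missing:
            continue
        if k == 0:
            # Leading gap: no earlier state, so draw the first state from startprob.
            state = sample(startprob, rng)
            filled[start] = sample(emitmat[state], rng)
            start += 1
        else:
            # Hypothetical callback: last hidden state of the preceding observed run.
            _, prev_start, prev_stop = runs[k - 1]
            state = last_state_fn(obs[prev_start:prev_stop])
        for t in range(start, stop):
            state = sample(transmat[state], rng)      # next hidden state
            filled[t] = sample(emitmat[state], rng)   # symbol emitted by that state
    return filled

startprob = [0.5, 0.5]
transmat = [[0.9, 0.1], [0.1, 0.9]]
emitmat = [[0.5, 0.5], [0.9, 0.1]]
obs = [0, 1] * 5 + [None] * 5

rng = random.Random(0)
# Stand-in for decoding; replace with the model's decoded last state.
filled = fill_missing(obs, startprob, transmat, emitmat, lambda seq: 0, rng)
print(filled)
```

The filled sequence contains only integers, so it can be passed to the model like any fully observed sequence. Because each gap is simulated forward from the preceding subsequence only, the simulated values ignore the observations that follow the gap, which is where the boundary inconsistencies come from.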

