Python - String encoding of win32print results


I have a non-literal string, obtained programmatically: the title of a document that was printed from an online page.

When I try to commit it to MongoDB, I get:

bson.errors.InvalidStringData: strings in documents must be valid UTF-8: 'wxpython: windows styles , events hunter \xab mouse vs. python'

The string retrieval code:

    import win32print

    # printers is assumed to be a list of printer tuples (for example from
    # win32print.EnumPrinters), with the printer name at index 2.
    for printstats in printers:
        handle = win32print.OpenPrinter(printstats[2])
        queued = win32print.EnumJobs(handle, 0, -1, 1)

        for printjob in queued:
            username = printjob['pUserName']
            computer = printjob['pMachineName']
            document = printjob['pDocument']
            identity = printjob['JobId']
            jobstate = printjob['Status']

    print document
    > "wxpython: windows styles , events hunter « mouse vs. python"

From the comments on the other answers, I can see that the error is:

bson.errors.InvalidStringData: strings in documents must be valid UTF-8: 'wxpython: windows styles , events hunter \xab mouse vs. python'

Since « is encoded as \xab, the string is a byte string in a single-byte encoding such as ISO-8859-1 (Latin-1), ISO-8859-15, or Windows-1252. Which one it is depends on the locale of the machine.
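As a small illustration of that reasoning, the sketch below takes a made-up byte string containing \xab (a stand-in for the job title, not real win32print output), confirms that it is not valid UTF-8, and shows that all three candidate codecs map the byte to «.

```python
# Hypothetical sample bytes standing in for the document title.
raw = b"events hunter \xab mouse vs. python"

# A lone 0xAB byte is a UTF-8 continuation byte, so strict decoding fails.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# Each of the single-byte codecs named above maps 0xAB to the « character.
decoded = dict(
    (name, raw.decode(name))
    for name in ("iso-8859-1", "iso-8859-15", "cp1252")
)

print(is_valid_utf8)
print(sorted(decoded))
```

Because the three codecs agree on this byte, the error message alone cannot tell them apart; they differ only at other byte values.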

You need to decode it to unicode before passing it to MongoDB, which supports unicode strings (it is not limited to ASCII, as you assert):

    document = printjob['pDocument'].decode("latin-1")

    >>> print type(document)
    <type 'unicode'>

You can then pass document to the Python MongoDB driver.
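The same decoding step could be applied to every field of a job before it is stored. The following is only a sketch: to_text is a hypothetical helper, and the printjob dictionary is made-up sample data shaped like an EnumJobs entry, not real output.

```python
def to_text(value, encoding="latin-1"):
    """Decode byte strings to unicode text; leave other values untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical job entry; real values would come from win32print.EnumJobs.
printjob = {
    "pDocument": b"events hunter \xab mouse vs. python",
    "pUserName": b"alice",
    "JobId": 7,
}

# Build a record that contains only unicode text and plain numbers.
record = dict((key, to_text(value)) for key, value in printjob.items())

print(record["JobId"], type(record["pDocument"]))
```

A record built this way holds text rather than raw bytes, which is what the driver's UTF-8 check expects.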

To make the code portable across Windows machines with different locales, you can use the codec alias mbcs (in place of 'latin-1'). mbcs is automatically translated to the encoding of the configured Windows locale (thanks @roeland).
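Since the mbcs codec is only available on Windows, a script that may also run elsewhere could check for it first. The sketch below assumes a hypothetical helper named pick_codec and a latin-1 fallback; both are illustrative choices, not part of the original answer.

```python
import codecs


def pick_codec(fallback="latin-1"):
    """Return 'mbcs' where Python provides it (Windows), else the fallback."""
    try:
        codecs.lookup("mbcs")
        return "mbcs"
    except LookupError:
        return fallback


codec = pick_codec()

# Hypothetical sample bytes; errors="replace" avoids an exception if the
# active Windows code page has no mapping for the 0xAB byte.
title = b"hunter \xab mouse".decode(codec, "replace")

print(codec)
```

On a non-Windows machine this selects latin-1, so the result only matches the Windows behaviour when the original locale was a Western European one.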

