Overview of what I'm trying to do: I have encrypted versions of files that I need to read into pandas. For a couple of reasons it is better to decrypt into a stream rather than to a file, so that's my main interest below, although I also attempt to decrypt to a file as an intermediate step (but that isn't working either).
I'm able to get this working for CSV, but not for either HDF or Stata (I'd accept an answer that works for either HDF or Stata, though the answer might be the same for both, which is why I'm combining them in one question).
The code for encrypting/decrypting files is taken from another Stack Overflow question (which I can't find at the moment).

    import pandas as pd
    import io
    from Crypto import Random
    from Crypto.Cipher import AES

    def pad(s):
        return s + b"\0" * (AES.block_size - len(s) % AES.block_size)

    def encrypt(message, key, key_size=256):
        message = pad(message)
        iv = Random.new().read(AES.block_size)
        cipher = AES.new(key, AES.MODE_CBC, iv)
        return iv + cipher.encrypt(message)

    def decrypt(ciphertext, key):
        iv = ciphertext[:AES.block_size]
        cipher = AES.new(key, AES.MODE_CBC, iv)
        plaintext = cipher.decrypt(ciphertext[AES.block_size:])
        return plaintext.rstrip(b"\0")

    def encrypt_file(file_name, key):
        with open(file_name, 'rb') as fo:
            plaintext = fo.read()
        enc = encrypt(plaintext, key)
        with open(file_name + ".enc", 'wb') as fo:
            fo.write(enc)

    def decrypt_file(file_name, key):
        with open(file_name, 'rb') as fo:
            ciphertext = fo.read()
        dec = decrypt(ciphertext, key)
        with open(file_name[:-4], 'wb') as fo:
            fo.write(dec)

And here's my attempt to extend the code to decrypt to a stream rather than a file.

    def decrypt_stream(file_name, key):
        with open(file_name, 'rb') as fo:
            ciphertext = fo.read()
        dec = decrypt(ciphertext, key)
        cipherbyte = io.BytesIO()
        cipherbyte.write(dec)
        cipherbyte.seek(0)
        return cipherbyte

Finally, here's the sample program with sample data attempting to make this work:

    key = 'this example key'[:16]

    df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

    df.to_csv('test.csv', index=False)
    df.to_hdf('test.h5', 'test', mode='w')
    df.to_stata('test.dta')

    encrypt_file('test.csv', key)
    encrypt_file('test.h5', key)
    encrypt_file('test.dta', key)

    decrypt_file('test.csv.enc', key)
    decrypt_file('test.h5.enc', key)
    decrypt_file('test.dta.enc', key)

    # CSV works here, but HDF and Stata don't.
    # I'm less interested in this part but include it for completeness.
    df_from_file = pd.read_csv('test.csv')
    df_from_file = pd.read_hdf('test.h5', 'test')
    df_from_file = pd.read_stata('test.dta')

    # CSV works here, but HDF and Stata don't.
    # The HDF and Stata lines below are the ones I need working.
    df_from_stream = pd.read_csv(decrypt_stream('test.csv.enc', key))
    df_from_stream = pd.read_hdf(decrypt_stream('test.h5.enc', key), 'test')
    df_from_stream = pd.read_stata(decrypt_stream('test.dta.enc', key))

Unfortunately I don't think I can shrink this code any further and still have a complete example.
Again, my hope is to have all 4 non-working lines above working (file and stream, for HDF and Stata), but I'm happy to accept an answer that works for either the HDF stream alone or the Stata stream alone.
Also, I'm open to other encryption alternatives; I just used some existing PyCrypto-based code that I found here on SO. My work explicitly requires 256-bit AES, but beyond that I'm open, so the solution needn't be based on the PyCrypto library or the specific code example above.
Info on my setup:

    Python: 3.4.3
    pandas: 0.17.0 (Anaconda 2.3.0 distribution)
    Mac OS: 10.11.3

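The biggest issue is the padding/unpadding method. It assumes that the null character can't be part of the actual content. Since Stata/HDF files are binary, stripping trailing nulls can also strip real data, so it's safer to pad using the number of extra bytes we add, encoded as a character. That number is then used during unpadding to remove exactly the bytes that were added.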
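To make the difference concrete, here is a small sketch that uses only the standard library, with no encryption involved. `BLOCK_SIZE` stands in for `AES.block_size` (16 bytes), and the function names `pad_nulls`, `unpad_nulls`, `pad_counted` and `unpad_counted` are made up for this illustration. It is written for Python 3.

```python
BLOCK_SIZE = 16  # same value as AES.block_size; hard-coded to avoid needing PyCrypto


def pad_nulls(s):
    # Null padding, as in the question's pad().
    return s + b"\0" * (BLOCK_SIZE - len(s) % BLOCK_SIZE)


def unpad_nulls(s):
    # Strips every trailing null, including ones that were real data.
    return s.rstrip(b"\0")


def pad_counted(s):
    # Append n bytes, each holding the value n (1 <= n <= BLOCK_SIZE).
    n = BLOCK_SIZE - len(s) % BLOCK_SIZE
    return s + bytes([n]) * n


def unpad_counted(s):
    # The last byte says how many padding bytes to drop.
    return s[:-s[-1]]


payload = b"\x01\x02binary\x00\x00"  # binary content that ends in null bytes

print(unpad_nulls(pad_nulls(payload)) == payload)      # False: trailing nulls are lost
print(unpad_counted(pad_counted(payload)) == payload)  # True
```

Any binary file whose last bytes happen to be zero gets truncated by the null-stripping version, which may be why the decrypted HDF and Stata files become unreadable while the text-only CSV survives.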
Also, for the time being, `read_hdf` doesn't support reading from a file-like object, even if the API documentation claims so. If we restrict ourselves to the Stata format, the following code will do what you need:

    # Python 2 version
    import pandas as pd
    import io
    from Crypto import Random
    from Crypto.Cipher import AES

    def pad(s):
        # Append n characters, each encoding the pad length n.
        n = AES.block_size - len(s) % AES.block_size
        return s + n * chr(n)

    def unpad(s):
        # The last character says how many characters to drop.
        return s[:-ord(s[-1])]

    def encrypt(message, key, key_size=256):
        message = pad(message)
        iv = Random.new().read(AES.block_size)
        cipher = AES.new(key, AES.MODE_CBC, iv)
        return iv + cipher.encrypt(message)

    def decrypt(ciphertext, key):
        iv = ciphertext[:AES.block_size]
        cipher = AES.new(key, AES.MODE_CBC, iv)
        plaintext = cipher.decrypt(ciphertext[AES.block_size:])
        return unpad(plaintext)

    def encrypt_file(file_name, key):
        with open(file_name, 'rb') as fo:
            plaintext = fo.read()
        enc = encrypt(plaintext, key)
        with open(file_name + ".enc", 'wb') as fo:
            fo.write(enc)

    def decrypt_stream(file_name, key):
        with open(file_name, 'rb') as fo:
            ciphertext = fo.read()
        dec = decrypt(ciphertext, key)
        cipherbyte = io.BytesIO()
        cipherbyte.write(dec)
        cipherbyte.seek(0)
        return cipherbyte

    key = 'this example key'[:16]

    df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

    df.to_stata('test.dta')

    encrypt_file('test.dta', key)

    print pd.read_stata(decrypt_stream('test.dta.enc', key))

Output:

       index  x  y
    0      0  1  3
    1      1  2  4

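One thing worth checking against the 256-bit requirement: as far as I know, PyCrypto selects the AES variant from the key length (16, 24 or 32 bytes), so the 16-character key used above would give AES-128, and the `key_size=256` argument of `encrypt` is never used. Below is a minimal sketch of deriving a 32-byte key from a passphrase with the standard library. The `derive_key` helper, the salt handling and the iteration count are my own assumptions, not part of the original code.

```python
import hashlib
import os


def derive_key(passphrase, salt, iterations=200000):
    """Derive a 32-byte (256-bit) key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is not secret, but the same salt is needed again to decrypt,
# so it would have to be stored next to the ciphertext.
salt = os.urandom(16)
key = derive_key("this example key", salt)

print(len(key))  # 32
```

The resulting `key` is a `bytes` object of length 32 that could be passed to `encrypt_file` and `decrypt_stream` in place of the 16-character string.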
In Python 3 you can use the following `pad` and `unpad` versions, since indexing a `bytes` object already returns an integer:

    def pad(s):
        n = AES.block_size - len(s) % AES.block_size
        return s + bytearray([n] * n)

    def unpad(s):
        return s[:-s[-1]]

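For the HDF case, where `read_hdf` needs a path on disk, one possible workaround is to write the decrypted bytes to a short-lived temporary file and delete it right after reading. This is only a sketch: `read_via_temp_file` is a hypothetical helper, not a pandas function, and the plaintext does touch the disk briefly, which may or may not be acceptable for your use case.

```python
import os
import tempfile


def read_via_temp_file(data, reader, suffix=""):
    """Write `data` (bytes) to a temporary file, call reader(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # The file is closed here, so the reader can open it by path.
        return reader(path)
    finally:
        # Remove the plaintext copy even if the reader raised.
        os.remove(path)


# Intended use with the functions from the answer (requires pandas and PyTables):
# df = read_via_temp_file(
#     decrypt_stream('test.h5.enc', key).read(),
#     lambda path: pd.read_hdf(path, 'test'),
#     suffix='.h5',
# )
```

The reader is passed in as a callable so the same helper works for any function that only accepts a file name.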