Efficient way to check whether high-dimensional arrays overlap between two ndarrays in Python


For example, I have two ndarrays: train_dataset has shape (10000, 28, 28) and val_dateset has shape (2000, 28, 28).

Apart from iterating, is there an efficient way to use NumPy array functions to find the overlap between the two ndarrays?

Memory permitting, you could use broadcasting, like so:

val_dateset[(train_dataset[:,None] == val_dateset).all(axis=(2,3)).any(0)]

Sample run:

In [55]: train_dataset
Out[55]:
array([[[1, 1],
        [1, 1]],

       [[1, 0],
        [0, 0]],

       [[0, 0],
        [0, 1]],

       [[0, 1],
        [0, 0]],

       [[1, 1],
        [1, 0]]])

In [56]: val_dateset
Out[56]:
array([[[0, 1],
        [1, 0]],

       [[1, 1],
        [1, 1]],

       [[0, 0],
        [0, 1]]])

In [57]: val_dateset[(train_dataset[:,None] == val_dateset).all(axis=(2,3)).any(0)]
Out[57]:
array([[[1, 1],
        [1, 1]],

       [[0, 0],
        [0, 1]]])
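On the "memory permitting" caveat: for the shapes in the question, the broadcasted comparison builds an intermediate boolean array of shape (10000, 2000, 28, 28), which is roughly 15.7 GB at one byte per element. If that is too much, one possible workaround is to run the same comparison over slices of train_dataset and OR the per-slice masks together. The sketch below is only an illustration; the helper name find_overlap_chunked and the chunk_size parameter are made up here and are not part of NumPy.

```python
import numpy as np

def find_overlap_chunked(train, val, chunk_size=256):
    """Return the blocks of `val` that also occur in `train`.

    Hypothetical helper: same broadcasted comparison as above, but applied
    to `chunk_size` training blocks at a time to cap peak memory.
    """
    mask = np.zeros(val.shape[0], dtype=bool)
    for start in range(0, train.shape[0], chunk_size):
        chunk = train[start:start + chunk_size]
        # (chunk, 1, h, w) == (n_val, h, w) -> (chunk, n_val, h, w)
        # all over the block axes -> (chunk, n_val); any over the chunk axis -> (n_val,)
        mask |= (chunk[:, None] == val).all(axis=(2, 3)).any(axis=0)
    return val[mask]

# Same small arrays as in the sample run
train_dataset = np.array([[[1, 1], [1, 1]],
                          [[1, 0], [0, 0]],
                          [[0, 0], [0, 1]],
                          [[0, 1], [0, 0]],
                          [[1, 1], [1, 0]]])
val_dateset = np.array([[[0, 1], [1, 0]],
                        [[1, 1], [1, 1]],
                        [[0, 0], [0, 1]]])

overlap_chunked = find_overlap_chunked(train_dataset, val_dateset, chunk_size=2)
```

This still uses a Python loop, but only over chunks, so the per-element work stays vectorized.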

If the elements are integers, you could collapse every block along axis=(1,2) of the input arrays into a scalar, treating each block as a linearly indexable number, and then efficiently use np.in1d or np.intersect1d to find the matches.
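A minimal sketch of that idea on the small sample arrays, under some assumptions: the values are non-negative integers, and each flattened block is read as the digits of a number in a chosen base, so base ** (block size) must fit in int64. That holds for tiny blocks like these, but would generally overflow for 28 x 28 blocks with a wide value range. The helper name collapse_blocks is made up for illustration, and np.isin is used here as the newer counterpart of np.in1d.

```python
import numpy as np

def collapse_blocks(arr, base):
    """Hypothetical helper: map each (h, w) block to one integer ID.

    Assumes every value v satisfies 0 <= v < base and that
    base ** (h * w) fits in int64, so distinct blocks get distinct IDs.
    """
    flat = arr.reshape(arr.shape[0], -1).astype(np.int64)
    # Positional weights: 1, base, base**2, ...
    weights = base ** np.arange(flat.shape[1], dtype=np.int64)
    return flat @ weights

train_dataset = np.array([[[1, 1], [1, 1]],
                          [[1, 0], [0, 0]],
                          [[0, 0], [0, 1]],
                          [[0, 1], [0, 0]],
                          [[1, 1], [1, 0]]])
val_dateset = np.array([[[0, 1], [1, 0]],
                        [[1, 1], [1, 1]],
                        [[0, 0], [0, 1]]])

# Smallest base that covers every value in both arrays
base = int(max(train_dataset.max(), val_dateset.max())) + 1

train_ids = collapse_blocks(train_dataset, base)
val_ids = collapse_blocks(val_dateset, base)

# 1D membership test on the scalar IDs
mask = np.isin(val_ids, train_ids)
overlap = val_dateset[mask]
```

For these inputs, overlap holds the same two blocks as the broadcasting version, without building the large 4D intermediate array.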

