python - How to match fields from two lists and further filter based upon the values in subsequent fields?
Edit: this question was answered on Reddit. Here is the link, if you are interested in the answer to the problem: https://www.reddit.com/r/learnpython/comments/42ibhg/how_to_match_fields_from_two_lists_and_further/
I am attempting to match the POS and ALT strings from file1 against file2, which is simple enough. However, file2 has values from the 17th split element/column through to the last element/column (the 340th), each containing a string such as 1/1:1.2.2:51:12, which I want to filter on.
I want to extract the rows in file2 that contain/match the POS and ALT from file1. Thereafter, I want to further filter the matched results on the values in the 17th split element/column onwards. To do that, those values have to be split on ":" so I can filter for split[0] == "1/1" and split[2] > 50. The problem is that I have no idea how to do this.
I imagine I will have to iterate over these values and split them, but I am not sure how to do this, as the code is presently in a loop and the values I want to filter are in columns, not rows.
Any advice would be appreciated; I have sat with this problem since Friday and have yet to find a solution.
    import os, itertools, re

    file1 = open("file1.txt", "r")
    file2 = open("file2.txt", "r")
    matched = []
    for (x), (y) in itertools.product(file2, file1):
        if not x.startswith("#"):
            cells_y = y.split("\t")
            pos_y = cells_y[0]
            alt_y = cells_y[3]
            cells_x = x.split("\t")
            pos_x = cells_x[0] + ":" + cells_x[1]
            alt_x = cells_x[4]
            if pos_y in pos_x and alt_y in alt_x:
                matched.append(x)

    for z in matched:
        cells_z = z.split("\t")
        if cells_z[16:len(cells_z)]:
            pass  # stuck here: not sure how to filter on these columns
Your requirement is not entirely clear, but you might mean something like this:
    for x, y in itertools.product(file2, file1):
        if x.startswith("#"):
            continue
        cells_y = y.rstrip("\n").split("\t")
        pos_y = cells_y[0]
        alt_y = cells_y[3]
        cells_x = x.rstrip("\n").split("\t")
        pos_x = cells_x[0] + ":" + cells_x[1]
        alt_x = cells_x[4]
        if pos_y != pos_x:
            continue
        if alt_y != alt_x:
            continue
        # Look for at least one column of the file2 row, from the 17th
        # (index 16) onwards, that has "1/1" first and a third value above 50.
        extra_match = False
        for field in cells_x[16:]:
            x_extra = field.split(":")
            if len(x_extra) < 3 or x_extra[0] != "1/1":
                continue
            if int(x_extra[2]) <= 50:  # compare as a number, not a string
                continue
            extra_match = True
            break
        if not extra_match:
            continue
        xy = x + y
        matched.append(xy)
I chose to concatenate x and y when appending to the matched list, since I wasn't sure which of the two you want to keep. If you don't need both, feel free to append just x or just y instead.
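If the files are large, pairing every line of file2 with every line of file1 gets slow. As an alternative, here is a minimal sketch that reads the file1 keys into a set first and then scans file2 once. The function names `field_passes` and `filter_rows` and their parameters are hypothetical, made up for this illustration. It also assumes that an exact match on POS and ALT is wanted, that a row should be kept when at least one column from the 17th onwards passes, and that the third ":"-separated value is an integer.

```python
def field_passes(field, first_value="1/1", min_third=50):
    """Check one column value such as '1/1:1.2.2:51:12'.

    Passes when the first part equals first_value and the third part,
    read as an integer, is greater than min_third.
    """
    parts = field.strip().split(":")
    if len(parts) < 3 or parts[0] != first_value:
        return False
    try:
        return int(parts[2]) > min_third
    except ValueError:
        # Third part is not an integer (assumption: treat that as a failure).
        return False


def filter_rows(file1_lines, file2_lines, first_col=16):
    """Return the file2 lines whose POS/ALT appear in file1 and that have
    at least one passing value from column index first_col onwards."""
    # Collect the (pos, alt) pairs from file1 once.
    wanted = set()
    for line in file1_lines:
        cells = line.rstrip("\n").split("\t")
        if len(cells) > 3:
            wanted.add((cells[0], cells[3]))

    kept = []
    for line in file2_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cells = line.rstrip("\n").split("\t")
        if len(cells) <= first_col:
            continue  # row has no columns to filter on
        key = (cells[0] + ":" + cells[1], cells[4])
        if key not in wanted:
            continue
        if any(field_passes(f) for f in cells[first_col:]):
            kept.append(line)
    return kept
```

With the files from the question, this could be called as `filter_rows(open("file1.txt"), open("file2.txt"))`. If every column has to pass rather than just one, swapping `any` for `all` should be enough.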