Google Cloud Dataflow - write window-wise results to different files


Is it possible to write window-wise results to different files? That is, can I append a time-based file prefix, or create time-wise directories, so that I can access a particular window's result without additional filtering (as in Apache Spark)?

The answer depends on whether you are using windowing in batch or streaming mode.

In streaming mode, the Cloud Dataflow service doesn't support writing to files at this time. In that case, you'd want to use the BigQuery sink instead, which supports per-window sharding.

Code example (see the javadoc for more details):

PCollection<TableRow> quotes = ...;
quotes.apply(Window.<TableRow>into(CalendarWindows.days(1)))
    .apply(BigQueryIO.Write
        .named("Write")
        .withSchema(schema)
        .to(new SerializableFunction<BoundedWindow, String>() {
          public String apply(BoundedWindow window) {
            // The cast below is safe because CalendarWindows.days(1) produces IntervalWindows.
            String dayString = DateTimeFormat.forPattern("yyyy_MM_dd")
                .withZone(DateTimeZone.UTC)
                .print(((IntervalWindow) window).start());
            return "my-project:output.output_table_" + dayString;
          }
        }));

In batch mode, TextIO.Write doesn't have a convenience method ready for this purpose, but you can implement something similar without much trouble. For example, one way to accomplish it is via the Partition transform, with its outputs piped to separate TextIO.Write sinks, as sketched below.
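A minimal sketch of that batch-mode approach (not code from the answer itself): it assumes a pipeline over a known, fixed range of days, elements that carry a parsable ISO-8601 timestamp as their first comma-separated field, and a hypothetical output bucket path. The Partition transform routes each element to a per-day partition, and each partition gets its own TextIO.Write sink with a day-specific prefix.

import org.joda.time.DateTimeZone;
import org.joda.time.Days;
import org.joda.time.Duration;
import org.joda.time.Instant;
import org.joda.time.format.DateTimeFormat;

import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.transforms.Partition;
import com.google.cloud.dataflow.sdk.values.PCollection;
import com.google.cloud.dataflow.sdk.values.PCollectionList;

final Instant startDay = Instant.parse("2015-01-01T00:00:00Z");  // hypothetical start of range
final int numDays = 7;                                           // hypothetical number of days

PCollection<String> lines = ...;  // elements whose timestamps fall within the range

// Route each element to a per-day partition based on its timestamp.
PCollectionList<String> byDay = lines.apply(
    Partition.of(numDays, new Partition.PartitionFn<String>() {
      @Override
      public int partitionFor(String elem, int numPartitions) {
        // Assumes the first comma-separated field is an ISO-8601 timestamp.
        Instant ts = Instant.parse(elem.split(",")[0]);
        int day = Days.daysBetween(startDay, ts).getDays();
        // Clamp so out-of-range elements still land in a valid partition.
        return Math.max(0, Math.min(numPartitions - 1, day));
      }
    }));

// Give each partition its own TextIO.Write sink with a day-specific output prefix.
for (int i = 0; i < numDays; i++) {
  String dayString = DateTimeFormat.forPattern("yyyy_MM_dd")
      .withZone(DateTimeZone.UTC)
      .print(startDay.plus(Duration.standardDays(i)));
  byDay.get(i).apply(
      TextIO.Write.named("WriteDay_" + dayString)
          .to("gs://my-bucket/output/" + dayString + "/results"));
}

Because the number of sinks is fixed at pipeline construction time, this pattern only works when the set of windows (here, days) is known up front; that is the main limitation compared to the per-window sharding the BigQuery sink provides.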

