merlin.batch package
Submodules
merlin.batch.big_query_util module
- merlin.batch.big_query_util.valid_column(column_name: str) → bool
Validate BigQuery column name
- Parameters
column_name – BigQuery column name
- Returns
boolean
Rules based on this page: https://cloud.google.com/bigquery/docs/schemas#column_names
* A column name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
* It must start with a letter or underscore
* Maximum length 128
- merlin.batch.big_query_util.valid_columns(columns) → bool
Validate multiple BigQuery columns
- Parameters
columns – List of columns
- Returns
boolean
- merlin.batch.big_query_util.valid_dataset(dataset: str) → bool
Validate BigQuery dataset name
- Parameters
dataset – BigQuery dataset name
- Returns
boolean
Rules based on this page: https://cloud.google.com/bigquery/docs/datasets#dataset-naming
* May contain up to 1,024 characters
* Can contain letters (upper or lower case), numbers, and underscores
- merlin.batch.big_query_util.valid_table_id(table_id: str) → bool
Validate that a BigQuery source table ID follows the format project_id.dataset.table
- Parameters
table_id – Source table
- Returns
boolean
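As an illustration of the project_id.dataset.table format, the following is a minimal sketch and not the library's implementation. The function name `looks_like_table_id` is hypothetical. It assumes the ID is split on dots into exactly three parts, applies the dataset and table naming rules quoted on this page to the last two parts, and only requires the project part to be non-empty, since this page does not state project ID rules.

```python
import re

# Letters, digits, and underscores only, per the dataset and table rules on this page.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
_NAME_MAX_LENGTH = 1024


def looks_like_table_id(table_id: str) -> bool:
    """Hypothetical check for the project_id.dataset.table shape."""
    parts = table_id.split(".")
    if len(parts) != 3:
        return False
    project_id, dataset, table = parts
    # Assumption: project ID rules are not given here, so only reject an empty part.
    if not project_id:
        return False
    return all(
        len(name) <= _NAME_MAX_LENGTH and _NAME_PATTERN.fullmatch(name) is not None
        for name in (dataset, table)
    )


print(looks_like_table_id("my-project.my_dataset.my_table"))  # True
print(looks_like_table_id("my_dataset.my_table"))             # False
```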
- merlin.batch.big_query_util.valid_table_name(table_name: str) → bool
Validate BigQuery table name
- Parameters
table_name – BigQuery table name
- Returns
boolean
Rules based on this page: https://cloud.google.com/bigquery/docs/tables#table_naming
* A table name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
* Maximum length 1024
- merlin.batch.big_query_util.validate_text(text: str, pattern: str, max_length: int) → bool
Validate text based on regex pattern and maximum length allowed
- Parameters
text – Text to validate
pattern – Regular expression pattern to validate text
max_length – Maximum length allowed
- Returns
boolean
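A function with this signature could be sketched as below. This is an assumption about the behaviour, not the library's source: the name `validate_text_sketch` is hypothetical, and the sketch assumes the whole string must match the pattern (`re.fullmatch`). The two patterns are transcribed from the column and table naming rules quoted on this page.

```python
import re


def validate_text_sketch(text: str, pattern: str, max_length: int) -> bool:
    """Return True when text fits max_length and the whole string matches pattern."""
    if len(text) > max_length:
        return False
    # Assumption: the entire text must match, not just a prefix.
    return re.fullmatch(pattern, text) is not None


# Patterns written from the rules listed on this page (hypothetical constants).
COLUMN_PATTERN = r"[A-Za-z_][A-Za-z0-9_]*"  # must start with a letter or underscore
TABLE_PATTERN = r"[A-Za-z0-9_]+"            # letters, digits, underscores

print(validate_text_sketch("user_id", COLUMN_PATTERN, 128))        # True
print(validate_text_sketch("1st_col", COLUMN_PATTERN, 128))        # False
print(validate_text_sketch("2024_events", TABLE_PATTERN, 1024))    # True
```

Keeping the length check separate from the pattern lets the same helper serve the column, dataset, and table rules with different limits.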
merlin.batch.config module
- class merlin.batch.config.PredictionJobConfig(source: Source, sink: Sink, service_account_name: str, result_type: ResultType = ResultType.DOUBLE, item_type: ResultType = ResultType.DOUBLE, resource_request: Optional[PredictionJobResourceRequest] = None, env_vars: Optional[Dict[str, str]] = None)
Bases: object
- __init__(source: Source, sink: Sink, service_account_name: str, result_type: ResultType = ResultType.DOUBLE, item_type: ResultType = ResultType.DOUBLE, resource_request: Optional[PredictionJobResourceRequest] = None, env_vars: Optional[Dict[str, str]] = None)
Create configuration for starting a prediction job
- Parameters
source – source configuration. See merlin.batch.source package.
sink – sink configuration. See merlin.batch.sink package
service_account_name – secret name containing the service account for executing the prediction job.
result_type – type of the prediction result (defaults to ResultType.DOUBLE).
item_type – item type of the prediction result if result_type is ResultType.ARRAY. Otherwise it is ignored.
resource_request – optional resource request for starting the prediction job. If not given, the system default is used.
env_vars – optional environment variables as a dictionary of key-value pairs.
- property env_vars: Optional[Dict[str, str]]
- property item_type: ResultType
- property resource_request: Optional[PredictionJobResourceRequest]
- property result_type: ResultType
- property service_account_name: str
- class merlin.batch.config.PredictionJobResourceRequest(driver_cpu_request: str, driver_memory_request: str, executor_cpu_request: str, executor_memory_request: str, executor_replica: int)
Bases: object
Resource request configuration for starting a prediction job
- __init__(driver_cpu_request: str, driver_memory_request: str, executor_cpu_request: str, executor_memory_request: str, executor_replica: int)
Create resource request object
- Parameters
driver_cpu_request – driver’s CPU request in Kubernetes request format (e.g. 500m, 1, 2)
driver_memory_request – driver’s memory request in Kubernetes request format (e.g. 512Mi, 1Gi, 2Gi)
executor_cpu_request – executor’s CPU request in Kubernetes request format (e.g. 500m, 1, 2)
executor_memory_request – executor’s memory request in Kubernetes request format (e.g. 512Mi, 1Gi, 2Gi)
executor_replica – number of executors to use
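Because these requests are passed as plain strings, a quick local sanity check before submitting a job may catch typos early. The sketch below is hypothetical and not part of the merlin SDK. The helper names are illustrative, and the patterns cover only the common plain and suffixed forms shown in the examples above, not the full Kubernetes quantity grammar.

```python
import re

# Assumption: CPU is either whole millicores ("500m") or a plain number ("1", "0.5").
_CPU_PATTERN = re.compile(r"\d+m|\d+(\.\d+)?")
# Assumption: memory is a number with an optional binary or decimal suffix.
_MEMORY_PATTERN = re.compile(r"\d+(\.\d+)?(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?")


def looks_like_cpu_quantity(value: str) -> bool:
    """Hypothetical check for strings such as '500m', '1', or '2'."""
    return _CPU_PATTERN.fullmatch(value) is not None


def looks_like_memory_quantity(value: str) -> bool:
    """Hypothetical check for strings such as '512Mi', '1Gi', or '2Gi'."""
    return _MEMORY_PATTERN.fullmatch(value) is not None


print(looks_like_cpu_quantity("500m"))      # True
print(looks_like_memory_quantity("1Gi"))    # True
print(looks_like_memory_quantity("1GB"))    # False
```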
merlin.batch.job module
merlin.batch.sink module
- class merlin.batch.sink.BigQuerySink(table: str, staging_bucket: str, result_column: str, save_mode: SaveMode = SaveMode.ERRORIFEXISTS, options: Optional[MutableMapping[str, str]] = None)
Bases: Sink
Sink contract for BigQuery to create prediction job
- __init__(table: str, staging_bucket: str, result_column: str, save_mode: SaveMode = SaveMode.ERRORIFEXISTS, options: Optional[MutableMapping[str, str]] = None)
- Parameters
table – table id of destination BQ table in format gcp-project.dataset.table_name
staging_bucket – temporary GCS bucket for staging write into BQ table
result_column – column name that will be used to store prediction result.
save_mode – save mode. Defaults to SaveMode.ERRORIFEXISTS, which fails if the destination table already exists.
options – additional sink options to configure the prediction job.
- property options: Optional[MutableMapping[str, str]]
- property result_column: str
- property staging_bucket: str
- property table: str
merlin.batch.source module
- class merlin.batch.source.BigQuerySource(table: str, features: Iterable[str], options: Optional[MutableMapping[str, str]] = None)
Bases: Source
Source contract for BigQuery to create prediction job
- __init__(table: str, features: Iterable[str], options: Optional[MutableMapping[str, str]] = None)
- Parameters
table – table id of the source in the format gcp-project.dataset.table_name
features – list of features to be used for prediction. Each feature must match a column name in the source table.
options – additional options to configure the source.
- property features: Iterable[str]
- property options: Optional[MutableMapping[str, str]]
- property table: str