
getml_mlflow.autolog

autolog(
    *,
    log_data_information: bool = True,
    log_data_as_artifact: bool = True,
    log_function_parameters: bool = True,
    log_function_return: bool = True,
    log_function_as_trace: bool = True,
    log_pipeline_parameters: bool = True,
    log_pipeline_tags: bool = True,
    log_pipeline_scores: bool = True,
    log_pipeline_features: bool = True,
    log_pipeline_columns: bool = True,
    log_pipeline_targets: bool = True,
    log_pipeline_data_model: bool = True,
    log_pipeline_as_artifact: bool = True,
    log_system_metrics: bool = True,
    disable: bool = False,
    silent: bool = False,
    create_runs: bool = True,
    extra_tags: Optional[Dict[str, str]] = None,
    getml_project_path: Optional[Path] = None,
    tracking_uri: Optional[str] = None
) -> None

Enable automatic logging of getML operations to MLflow.

This function enables automatic logging of the following operations to MLflow:

  • pipeline creation, loading and operations (fit, score, predict, transform)
  • project setting and switching.

Pipeline parameters, performance metrics, dataframe metadata, and other relevant information are captured and displayed in the MLflow UI. Dataframes passed as function parameters or returned by functions can also be logged as artifacts.

The artifacts are stored in the artifacts-destination location set when running the mlflow ui command. By default, this is the artifacts directory in the current working directory.

In the UI, getML pipelines correspond to MLflow runs, functions correspond to sub-runs, and projects correspond to experiments.

For a detailed introduction to this MLflow integration, including setup, working with artifact pipelines, and more, please refer to our Tracking with MLflow guide. The guide provides examples and configuration options to help you get the most out of the integration.
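
As a rough sketch of that mapping (the project name below is illustrative; a running getML Engine and MLflow tracking server are assumed):

import getml
import getml_mlflow

getml_mlflow.autolog(tracking_uri="http://localhost:5000")

# Setting a getML project corresponds to an MLflow experiment (see the mapping above).
getml.engine.set_project("loans")

# From here on, pipeline operations such as pipe.fit(container.train) are recorded
# as runs, with sub-runs for the individual function calls.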

PARAMETER DESCRIPTION
log_data_information

Whether to log metadata about a DataFrame or View (e.g., number of rows & columns, column names, roles).

The roles are indicated with the following emojis in the MLflow UI:

  • 🗃 for categorical columns
  • 🔗 for join keys
  • 🔢 for numerical columns
  • 🎯 for target column(s)
  • 📝 for text columns
  • ⏰ for timestamp columns
  • 🧮 for unused float columns
  • 🧵 for unused string columns

TYPE: bool DEFAULT: True
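
For reference, those roles are assigned on the getML side; a minimal sketch of how they end up on a DataFrame (column names and data are made up) could look like this:

import pandas as pd

import getml

df = getml.DataFrame.from_pandas(
    pd.DataFrame(
        {"customer_id": ["a", "b"], "amount": [10.0, 20.0], "default": [0.0, 1.0]}
    ),
    name="population",
)
df.set_role(["customer_id"], getml.data.roles.join_key)  # logged with 🔗
df.set_role(["amount"], getml.data.roles.numerical)      # logged with 🔢
df.set_role(["default"], getml.data.roles.target)        # logged with 🎯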

log_data_as_artifact

Whether to log DataFrame, View or Subset function parameters as .parquet artifacts. It also enables logging of DataFrame objects returned by functions. In the MLflow UI, the artifacts are available for download. log_function_parameters or log_function_return must be True for this to take effect.

TYPE: bool DEFAULT: True
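
Logged parquet artifacts can also be fetched programmatically with MLflow's artifact API; in this sketch, the run ID and artifact path are placeholders and depend on how the integration names the files:

import mlflow

local_path = mlflow.artifacts.download_artifacts(
    run_id="<run_id>",                   # copy from the MLflow UI
    artifact_path="population.parquet",  # hypothetical artifact name
    dst_path="downloaded_artifacts",
)
print(local_path)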

log_function_parameters

Whether to log parameters passed to getML functions (e.g., pipe.fit()) in the MLflow UI. To log DataFrame, View or Subset function parameters as artifacts, log_data_as_artifact must also be set to True.

TYPE: bool DEFAULT: True

log_function_return

Whether to log return values of getML functions as artifacts. For example, it enables logging of DataFrame (as .parquet) and numpy.ndarray (as .npy) returned by transform() or predict() methods. log_data_as_artifact must also be True for DataFrame logging.

TYPE: bool DEFAULT: True
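
For example, a .npy artifact produced by predict() can be downloaded from its run (e.g., with mlflow.artifacts.download_artifacts) and read back with NumPy; the file path below is illustrative:

import numpy as np

predictions = np.load("downloaded_artifacts/predictions.npy")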

log_function_as_trace

Whether to log function calls as MLflow traces for detailed execution flow.

TYPE: bool DEFAULT: True

log_pipeline_parameters

Whether to log parameters of a pipeline.

TYPE: bool DEFAULT: True

log_pipeline_tags

Whether to log tags of a pipeline.

TYPE: bool DEFAULT: True

log_pipeline_scores

Whether to log scores (metrics) of a pipeline.

TYPE: bool DEFAULT: True

log_pipeline_features

Whether to log features learned during pipeline fitting.

TYPE: bool DEFAULT: True

log_pipeline_columns

Whether to log a pipeline's columns (those whose importance can be calculated).

TYPE: bool DEFAULT: True

log_pipeline_targets

Whether to log targets of a pipeline.

TYPE: bool DEFAULT: True

log_pipeline_data_model

Whether to log the data model provided in the pipeline. It is available as an HTML artifact to view or download.

TYPE: bool DEFAULT: True

log_pipeline_as_artifact

Whether to save pipelines as MLflow artifacts.

Docker configuration, download_artifact_pipeline() and switch_to_artifact_pipeline()

When using this parameter with Docker, you'll need to set up proper bind mounts to allow pipeline artifact logging. For detailed instructions on working with artifact pipelines, Docker configurations, and related functions like download_artifact_pipeline() and switch_to_artifact_pipeline(), please refer to the Tracking with MLflow guide.

TYPE: bool DEFAULT: True
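
A minimal sketch of enabling pipeline artifacts while pointing the integration at a non-default getML projects directory (the path is illustrative; see getml_project_path below):

from pathlib import Path

import getml_mlflow

getml_mlflow.autolog(
    log_pipeline_as_artifact=True,
    getml_project_path=Path("/data/getml/projects"),  # adjust to your setup
)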

log_system_metrics

Whether to log system metrics (CPU, memory usage) during pipeline fitting. Metrics are available for getML Enterprise only.

TYPE: bool DEFAULT: True

disable

If True, disables all getML autologging.

TYPE: bool DEFAULT: False
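
For example, to switch autologging off again after it has been enabled:

import getml_mlflow

getml_mlflow.autolog(disable=True)  # reverts the getML autologging patches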

silent

If True, suppresses all informational logging messages.

TYPE: bool DEFAULT: False

create_runs

If True, creates new MLflow runs automatically when logging. You may set it to False and log under your own run. For example:

import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("your_experiment_name")
with mlflow.start_run(run_name="your_run_name"):
    pipe.fit(container.train)

TYPE: bool DEFAULT: True

extra_tags

Additional custom tags to log with each MLflow run.

TYPE: Dict[str, str] DEFAULT: None
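
For example (tag keys and values are illustrative):

import getml_mlflow

getml_mlflow.autolog(extra_tags={"team": "risk", "stage": "experimentation"})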

getml_project_path

Path to the getML projects directory. Used for accessing and logging pipeline artifacts when log_pipeline_as_artifact=True. If not provided, defaults to $HOME/.getML/projects.

TYPE: Path DEFAULT: None

tracking_uri

MLflow tracking server URI. If not provided, uses http://localhost:5000.

TYPE: str DEFAULT: None

Examples:

Basic usage with default settings:

import getml
import getml_mlflow
getml_mlflow.autolog()
# Subsequent getML pipeline operations will be logged to MLflow

Custom configuration:

getml_mlflow.autolog(
    log_pipeline_as_artifact=True,
    log_system_metrics=False,
    tracking_uri="http://localhost:5000"
)

Source code in getml_mlflow/autologging.py
@autologging_integration(FLAVOR_NAME)
def autolog(
    *,
    log_data_information: bool = True,
    log_data_as_artifact: bool = True,
    log_function_parameters: bool = True,
    log_function_return: bool = True,
    log_function_as_trace: bool = True,
    log_pipeline_parameters: bool = True,
    log_pipeline_tags: bool = True,
    log_pipeline_scores: bool = True,
    log_pipeline_features: bool = True,
    log_pipeline_columns: bool = True,
    log_pipeline_targets: bool = True,
    log_pipeline_data_model: bool = True,
    log_pipeline_as_artifact: bool = True,
    log_system_metrics: bool = True,
    disable: bool = False,
    silent: bool = False,
    create_runs: bool = True,
    extra_tags: Optional[Dict[str, str]] = None,
    getml_project_path: Optional[Path] = None,
    tracking_uri: Optional[str] = None,
) -> None:
    """Enable automatic logging of getML operations to MLflow.

    This function enables automatic logging of the following operations to MLflow:

    - pipeline creation, loading and operations (fit, score, predict, transform)
    - project setting and switching.

    Pipeline parameters, performance metrics, dataframe metadata, and other relevant
    information are captured and displayed in the MLflow UI. Dataframes passed as
    function parameters or returned by functions can also be logged as artifacts.

    The artifacts are stored in the `artifacts-destination` location set when running
    the `mlflow ui` command. By default, this is the `artifacts` directory in the
    current working directory.

    In the UI, getML pipelines correspond to MLflow runs, functions correspond to
    sub-runs, and projects correspond to experiments.

    For a detailed introduction to this MLflow integration, including setup, working with
    artifact pipelines, and more, please refer to our
    [Tracking with MLflow][mlflow-integration-guide] guide. The guide provides examples
    and configuration options to help you get the most out of the integration.

    Args:
        log_data_information (bool, optional): Whether to log metadata about
            a `DataFrame` or `View` (e.g., number of rows & columns, column names, roles).

            The [`roles`][getml.data.roles] are indicated with the
            following emojis in the MLflow UI:

            - 🗃 for categorical columns
            - 🔗 for join keys
            - 🔢 for numerical columns
            - 🎯 for target column(s)
            - 📝 for text columns
            - ⏰ for timestamp columns
            - 🧮 for unused float columns
            - 🧵 for unused string columns

        log_data_as_artifact (bool, optional): Whether to log `DataFrame`,
            `View` or `Subset` function parameters as `.parquet` artifacts. It also
            enables logging of `DataFrame` objects returned by functions. In the MLflow
            UI, the artifacts are available for download. `log_function_parameters` or
            `log_function_return` must be `True` for this to take effect.

        log_function_parameters (bool, optional): Whether to log parameters passed to
            getML functions (e.g., `pipe.fit()`) in the MLflow UI. To log `DataFrame`,
            `View` or `Subset` function parameters as artifacts,
            `log_data_as_artifact` must also be set to `True`.

        log_function_return (bool, optional): Whether to log return values of getML
            functions as artifacts. For example, it enables logging of `DataFrame`
            (as `.parquet`) and `numpy.ndarray` (as `.npy`) returned by `transform()`
            or `predict()` methods. `log_data_as_artifact` must also be `True`
            for `DataFrame` logging.

        log_function_as_trace (bool, optional): Whether to log function calls as MLflow
            traces for detailed execution flow.

        log_pipeline_parameters (bool, optional): Whether to log
            [`parameters`][getml.pipeline.Pipeline] of a pipeline.

        log_pipeline_tags (bool, optional): Whether to log
            [`tags`][getml.pipeline.Pipeline] of a pipeline.

        log_pipeline_scores (bool, optional): Whether to log [`scores`][getml.pipeline.Scores]
            (metrics) of a pipeline.

        log_pipeline_features (bool, optional): Whether to log [`features`][getml.pipeline.Features]
            learned during pipeline fitting.

        log_pipeline_columns (bool, optional): Whether to log a pipeline's
            [`columns`][getml.pipeline.Columns] (those whose importance can be calculated).

        log_pipeline_targets (bool, optional): Whether to log
            [`targets`][getml.pipeline.Pipeline.targets] of a pipeline.

        log_pipeline_data_model (bool, optional): Whether to log the
            [`data model`][getml.data.DataModel] provided in the pipeline. It is available
            as an HTML artifact to view or download.

        log_pipeline_as_artifact (bool, optional): Whether to save pipelines as
            MLflow artifacts.

            ??? note "Docker configuration, `download_artifact_pipeline()` and `switch_to_artifact_pipeline()`"

                When using this parameter with Docker, you'll need to set up proper bind
                mounts to allow pipeline artifact logging. For detailed instructions on
                working with artifact pipelines, Docker configurations, and related
                functions like `download_artifact_pipeline()` and `switch_to_artifact_pipeline()`,
                please refer to the [Tracking with MLflow][mlflow-integration-guide] guide.

        log_system_metrics (bool, optional): Whether to log system metrics (CPU, memory usage)
            during pipeline fitting. Metrics are available for getML Enterprise only.

        disable (bool, optional): If True, disables all getML autologging.

        silent (bool, optional): If True, suppresses all informational logging messages.

        create_runs (bool, optional): If True, creates new MLflow runs automatically
            when logging. You may set it to False and log under your own run. For example:

            ```python
            import mlflow
            mlflow.set_tracking_uri("http://localhost:5000")
            mlflow.set_experiment("your_experiment_name")
            with mlflow.start_run(run_name="your_run_name"):
                pipe.fit(container.train)
            ```

        extra_tags (Dict[str, str], optional): Additional custom tags to log with each MLflow run.

        getml_project_path (Path, optional): Path to the getML projects directory.
            Used for accessing and logging pipeline artifacts when
            `log_pipeline_as_artifact=True`. If not provided, defaults to
            `$HOME/.getML/projects`.

        tracking_uri (str, optional): MLflow tracking server URI. If not provided,
            uses `http://localhost:5000`.


    Examples:
        Basic usage with default settings:

        ```python
        import getml
        import getml_mlflow
        getml_mlflow.autolog()
        # Subsequent getML pipeline operations will be logged to MLflow
        ```

        Custom configuration:
        ```python
        getml_mlflow.autolog(
            log_pipeline_as_artifact=True,
            log_system_metrics=False,
            tracking_uri="http://localhost:5000"
        )
        ```
    """
    if disable:
        revert_patches(FLAVOR_NAME)
        return

    tracking_uri = tracking_uri or DEFAULT_MLFLOW_TRACKING_URI
    mlflow.set_tracking_uri(tracking_uri)

    logging_configuration: LoggingConfiguration = LoggingConfiguration(
        general=GeneralLoggingConfiguration(
            mlflow_client=MlflowClient(tracking_uri=tracking_uri),
            log_system_metrics=log_system_metrics,
            silent=silent,
            create_runs=create_runs,
            extra_tags=extra_tags,
            getml_project_path=getml_project_path,
        ),
        data_container=DataContainerLoggingConfiguration(
            log_information=log_data_information,
            log_as_artifact=log_data_as_artifact,
        ),
        function=FunctionLoggingConfiguration(
            log_parameters=log_function_parameters,
            log_return=log_function_return,
            log_as_trace=log_function_as_trace,
        ),
        pipeline=PipelineLoggingConfiguration(
            log_parameters=log_pipeline_parameters,
            log_tags=log_pipeline_tags,
            log_scores=log_pipeline_scores,
            log_features=log_pipeline_features,
            log_columns=log_pipeline_columns,
            log_targets=log_pipeline_targets,
            log_data_model=log_pipeline_data_model,
            log_as_artifact=log_pipeline_as_artifact,
        ),
    )

    for function in FUNCTIONS_TO_PATCH:
        safe_patch(
            autologging_integration=FLAVOR_NAME,
            destination=function.destination,
            function_name=function.function_name,
            patch_function=(
                with_kwargs(logging_configuration=logging_configuration)(
                    function.patch_function
                )
                if function.with_logging_configuration
                else function.patch_function
            ),
            manage_run=False,
        )