The getML Suite

The getML ecosystem comprises three fundamental components: the Engine, the Python API, and the Monitor.

Engine

Written in C++, the getML Engine is the core of the Suite and does all the heavy lifting. It is responsible for data management, feature engineering, and machine learning.

Starting the Engine

How you start the Engine depends on the method used to install the getML Suite. Refer to the installation instructions to learn more about each method.

Shutting down the Engine

Depending on how you started the Engine, there are different ways to shut it down:

  • In the Python API: getml.engine.shutdown()
  • On the command-line interface (CLI): Press Ctrl-C or run getML -stop
  • For a Docker container: Press Ctrl-C
  • In the Monitor (Enterprise edition): Click the 'Shutdown' tab in the sidebar

Logging

The Engine keeps a log about what it is currently doing.

The easiest way to view the log is to click the 'Log' tab in the sidebar of the getML Monitor. The Engine also writes its log to the command line when it is started from the command-line interface.

Python API

Control the Engine with the getML Python API, which provides handles to the objects in the Engine and all other tools needed for end-to-end data science projects. For an in-depth description of its individual classes and methods, check out the Python API documentation.

Note

  • The classes in the Python API act as handles to objects in the getML Engine.
  • When you connect to or create a project:
    • The API establishes a socket connection to the Engine through a designated port.
    • All subsequent commands are sent to the Engine via this connection.
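
To make the handle idea above concrete, here is a toy sketch that uses only the Python standard library. It does not reproduce getML's actual wire protocol: the `ToyEngine`, `ToyRequestHandler`, and `ToyHandle` classes, the JSON message format, and the OS-assigned port are all assumptions made for illustration.

```python
import json
import socket
import socketserver
import threading


class ToyEngine(socketserver.ThreadingTCPServer):
    """Hypothetical stand-in for an engine: owns the data, answers commands."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self):
        # Port 0 lets the operating system pick a free port.
        super().__init__(("127.0.0.1", 0), ToyRequestHandler)
        self.objects = {}  # object name -> values, held only on this side


class ToyRequestHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One JSON command per line; answer each with one JSON line.
        for line in self.rfile:
            cmd = json.loads(line)
            store = self.server.objects
            if cmd["op"] == "create":
                store[cmd["name"]] = list(cmd["values"])
                reply = {"ok": True}
            elif cmd["op"] == "nrows":
                reply = {"ok": True, "nrows": len(store[cmd["name"]])}
            else:
                reply = {"ok": False}
            self.wfile.write((json.dumps(reply) + "\n").encode())
            self.wfile.flush()


class ToyHandle:
    """Hypothetical client-side handle: stores a name, never the data."""

    def __init__(self, port, name):
        self.name = name
        self._sock = socket.create_connection(("127.0.0.1", port))
        self._file = self._sock.makefile("rw")

    def _send(self, **cmd):
        # Every command travels over the same open connection.
        self._file.write(json.dumps(cmd) + "\n")
        self._file.flush()
        return json.loads(self._file.readline())

    def create(self, values):
        return self._send(op="create", name=self.name, values=values)["ok"]

    def nrows(self):
        return self._send(op="nrows", name=self.name)["nrows"]

    def close(self):
        self._file.close()
        self._sock.close()


engine = ToyEngine()
threading.Thread(target=engine.serve_forever, daemon=True).start()

handle = ToyHandle(engine.server_address[1], "my_data")
handle.create([1.0, 2.0, 3.0])
row_count = handle.nrows()  # computed on the "engine" side
handle.close()
engine.shutdown()
engine.server_close()
```

The point of the design is that the handle carries only a name and a connection, so large data never has to cross into the Python process.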

Setting up a new project

Set a project in the getML Engine using set_project().

import getml
getml.engine.launch()
getml.engine.set_project("test")

Note

If the project name does not match an existing project, a new one will be created.

Managing projects

To get a list of all available projects, use list_projects(). To remove an entire project, use delete_project().

getml.engine.list_projects()
getml.engine.delete_project("test")

For more information, refer to the Managing projects section.
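
A common pattern built from these calls is starting a project from a clean slate. The helper below is a sketch under stated assumptions: `fresh_project` is a hypothetical name, it assumes that `list_projects()` returns the project names, and it takes the engine module as a parameter so the logic can be tried against a stand-in (`FakeEngine`, also hypothetical) without a running Engine. Note that deleting a project removes its data for good.

```python
def fresh_project(engine, name):
    """Delete `name` if it exists, then set it as the (new) current project.

    `engine` is expected to behave like `getml.engine`. Assumption:
    `list_projects()` returns the names of the existing projects.
    """
    existed = name in engine.list_projects()
    if existed:
        engine.delete_project(name)
    engine.set_project(name)  # creates the project if it does not exist
    return existed


class FakeEngine:
    """Hypothetical stand-in mimicking the three calls used above."""

    def __init__(self, projects):
        self._projects = list(projects)
        self.current = None

    def list_projects(self):
        return list(self._projects)

    def delete_project(self, name):
        self._projects.remove(name)

    def set_project(self, name):
        if name not in self._projects:
            self._projects.append(name)
        self.current = name


fake = FakeEngine(["test", "other"])
had_old_version = fresh_project(fake, "test")
```

With a running Engine, the equivalent call would be `fresh_project(getml.engine, "test")`.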

DataFrames

Create a DataFrame by calling, for example:

data = getml.data.DataFrame.from_csv(
    "path/to/my/data.csv", 
    "my_data"
)

This creates a data frame object in the getML Engine, imports the provided data, and returns a handle to the object as a DataFrame in the Python API.

Note

There are many other methods to create a DataFrame, including from_db(), from_json(), or from_pandas(). For a full list of available methods, refer to the Importing data section.

Synchronization

When you apply a method that modifies a DataFrame, such as add(), the change is automatically reflected in both the Engine and Python. Under the hood, the Python API sends the corresponding command (here, to create a new column) to the getML Engine. As soon as the Engine is done, it informs the Python API, which then calls refresh() to update the Python handle.
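
The command, acknowledgement, and refresh sequence can be sketched in a few lines of plain Python. This is an in-memory toy, not getML's implementation: `ToyEngineFrame` and `ToyFrameHandle` are hypothetical classes, and a direct method call stands in for the socket round trip.

```python
class ToyEngineFrame:
    """Hypothetical engine-side object that owns the actual columns."""

    def __init__(self):
        self.columns = {}

    def add_column(self, name, values):
        self.columns[name] = list(values)
        return True  # acknowledgement that the command has finished


class ToyFrameHandle:
    """Hypothetical Python-side handle that caches only metadata."""

    def __init__(self, engine_frame):
        self._engine_frame = engine_frame
        self.colnames = []

    def refresh(self):
        # Re-read the metadata so the handle matches the engine's state.
        self.colnames = sorted(self._engine_frame.columns)
        return self

    def add(self, name, values):
        # 1. send the command, 2. wait for the acknowledgement, 3. refresh.
        if self._engine_frame.add_column(name, values):
            self.refresh()


engine_frame = ToyEngineFrame()
handle = ToyFrameHandle(engine_frame)
handle.add("price", [9.99, 19.99])
handle.add("amount", [3, 1])
```

After each `add()`, the cached `colnames` on the handle agree with the columns held by the engine-side object, without the caller ever invoking `refresh()` directly.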

Saving

Warning

DataFrames are never saved automatically and never loaded automatically. All unsaved changes to a DataFrame will be lost when restarting the Engine.

To get a list of all your current data frames, access the container via:

getml.project.data_frames
# or
getml.data.list_data_frames()

You can save a specific data frame to disk using the .save() method on the DataFrame:

# by index
getml.project.data_frames[0].save()
# by name
getml.project.data_frames["my_data"].save()

To save all data frames associated with the current project, use the .save() method on the Container:

getml.project.data_frames.save()
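
Because unsaved changes are lost on restart, it can be convenient to pair the save with the shutdown. The helper below is a minimal sketch: `save_all_and_shutdown` is a hypothetical name, and the `SimpleNamespace` stand-ins only record the calls so the ordering can be checked without a running Engine.

```python
from types import SimpleNamespace


def save_all_and_shutdown(project, engine):
    """Save every data frame of the current project, then stop the engine.

    `project` and `engine` are expected to behave like `getml.project`
    and `getml.engine`.
    """
    project.data_frames.save()
    engine.shutdown()


# Stand-ins that record the order of the calls (illustration only).
calls = []
fake_project = SimpleNamespace(
    data_frames=SimpleNamespace(save=lambda: calls.append("save"))
)
fake_engine = SimpleNamespace(shutdown=lambda: calls.append("shutdown"))

save_all_and_shutdown(fake_project, fake_engine)
```

With a running Engine, the call would be `save_all_and_shutdown(getml.project, getml.engine)`.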

Loading

To load a specific DataFrame, use load_data_frame() or DataFrame().load():

df = getml.data.load_data_frame("my_data")
# Forces the API to load the version stored on disk over the one held in memory
df = getml.data.DataFrame("my_data").load()

Use .load() on the Container to load all data frames associated with the current project:

getml.project.data_frames.load()

Note

If a DataFrame is already available in memory (for example "my_data" from above), load_data_frame() will return a handle to that data frame. If no such DataFrame is held in memory, the function will try to load the data frame from disk and then return a handle. If that is unsuccessful, an exception is thrown.
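
The lookup order described in this note, and how it differs from forcing a load from disk, can be sketched with two dictionaries. This is a toy model rather than getML's code: `toy_load_data_frame` and `toy_force_load` are hypothetical names, strings stand in for data frames, and `KeyError` is an assumed exception type.

```python
def toy_load_data_frame(name, in_memory, on_disk):
    """Mimic load_data_frame(): prefer memory, fall back to disk."""
    if name in in_memory:
        return in_memory[name]
    if name in on_disk:
        in_memory[name] = on_disk[name]  # load from disk into memory
        return in_memory[name]
    raise KeyError(f"no data frame named {name!r} in memory or on disk")


def toy_force_load(name, in_memory, on_disk):
    """Mimic DataFrame(name).load(): always take the version on disk."""
    if name not in on_disk:
        raise KeyError(f"no data frame named {name!r} on disk")
    in_memory[name] = on_disk[name]
    return in_memory[name]


in_memory = {"my_data": "edited, unsaved version"}
on_disk = {"my_data": "saved version", "archived": "saved archive"}

first = toy_load_data_frame("my_data", in_memory, on_disk)    # from memory
second = toy_load_data_frame("archived", in_memory, on_disk)  # from disk
forced = toy_force_load("my_data", in_memory, on_disk)        # disk wins
```

In this model, the forced load discards the unsaved in-memory edits, which is why the two loading routes are worth telling apart.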

Pipelines

The lifecycle of a Pipeline is simpler than that of a DataFrame: the getML Engine automatically saves all changes made to a pipeline and automatically loads all pipelines within a project. Pipelines are created within the Python API using constructors, where they are defined by a set of hyperparameters.

Note

The actual weights of the machine learning algorithms are stored exclusively in the getML Engine and are not transferred to the Python API.

Any changes made through methods such as fit() are automatically updated in both the Engine and the Python API.

By using set_project(), you can load an existing project, and all associated pipelines will be automatically loaded into memory. To view all pipelines in the current project, access the Pipelines container via getml.project.pipelines.

The function list_pipelines() lists all available pipelines within a project:

getml.pipeline.list_pipelines()

To create a corresponding handle in the Python API, use the load() function:

pipe = getml.pipeline.load(NAME_OF_THE_PIPELINE)

Monitor

Enterprise edition

This feature is exclusive to the Enterprise edition and is not available in the Community edition. Discover the benefits of the Enterprise edition and compare the features of both editions.

For licensing information and technical support, please contact us.

The Monitor provides information on the data imported into the Engine, as well as on the trained pipelines and their performance. It is written in Go and compiled into a binary separate from the getML Engine.

Accessing the Monitor

The Monitor runs on the same machine as the Engine, using sockets for communication. By default, it opens an HTTP port (1709) for browser access. To view the Monitor, enter the following address in your browser's navigation bar:

http://localhost:1709

Note that the HTTP port is accessible only from the host machine running the getML Suite.

The main purpose of the Monitor is to provide visual feedback to support your data science projects.

Tip

If you experience issues opening the Monitor, try the following steps:

  • Manually shut down and restart the Engine using getml.engine.shutdown() and getml.engine.launch().
  • Kill the associated background process in the terminal and restart the Engine.
  • Close all tabs and windows where the Monitor was previously running and try again.
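
Before working through the steps above, it can help to check whether anything is accepting connections on the Monitor's port at all. The function below is a generic TCP reachability check written with the standard library. `monitor_is_reachable` is a hypothetical helper, not part of the getML API, and a successful connection only shows that some process is listening on the port, not that it is the Monitor.

```python
import socket


def monitor_is_reachable(host="localhost", port=1709, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


reachable = monitor_is_reachable()
```

If `reachable` is `False`, restarting the Engine as described above is a sensible next step.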

To get started, head over to the installation instructions.