getml.sqlite3

This module contains wrappers around sqlite3 and related utility functions, which enable you to productionize pipelines using only sqlite3 and Python, fully based on open-source code.

This requires SQLite version 3.33.0 or above. To check the sqlite3 version of your Python distribution, do the following:

import sqlite3
sqlite3.sqlite_version
Example

For our example we will assume that you want to productionize the CORA project.

First, we want to transpile the features into SQL code, like this:

# Set targets to False, if you want an inference pipeline.
pipe1.features.to_sql(targets=True).save("cora")
This transpiles the features learned by pipe1 into a set of SQLite3 scripts ready to be executed. These scripts are contained in a folder called "cora".

We also assume that you have the three tables needed for the CORA project in the form of pandas.DataFrames (other data sources are possible).

We want to create a new sqlite3 connection and then read in the data:

conn = getml.sqlite3.connect("cora.db")

getml.sqlite3.read_pandas(
    conn, table_name="cites", data_frame=cites, if_exists="replace")

getml.sqlite3.read_pandas(
    conn, table_name="content", data_frame=content, if_exists="replace")

getml.sqlite3.read_pandas(
    conn, table_name="paper", data_frame=paper, if_exists="replace")
Now we can execute the scripts we have just created:
getml.sqlite3.execute(conn, "cora")
The transpiled pipeline will always create a table called "FEATURES", which contains the features. Here is how we retrieve them:
features = getml.sqlite3.to_pandas(conn, "FEATURES")
Your features are now in a pandas DataFrame, ready to be passed to your favorite machine learning library.

To build stable data science pipelines, it is often a good idea to ensure type safety by hard-coding your table schemas. You can use the sniff_csv and sniff_pandas methods to do that:

getml.sqlite3.sniff_pandas("cites", cites)
This generates the SQLite3 code that creates the "cites" table. You can hard-code that statement into your pipeline. This ensures that the data always has the correct types, avoiding awkward problems later on.
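
For illustration, a minimal sketch of how the hard-coded schema could be wired in (assuming a fresh database, so the generated CREATE TABLE statement can be executed as-is):

schema = getml.sqlite3.sniff_pandas("cites", cites)
print(schema)  # copy the generated CREATE TABLE statement and hard-code it

# In the production pipeline: create the table from the hard-coded schema,
# then append the data so the column types stay fixed.
conn.executescript(schema)
getml.sqlite3.read_pandas(
    conn, table_name="cites", data_frame=cites, if_exists="append")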

connect

connect(database: str) -> Connection

Generates a new sqlite3 connection.

This connection contains all customized aggregations and transformation functions needed to execute the SQL pipeline generated by getML. Other than that it behaves just like a normal sqlite3 connection from the Python standard library.

PARAMETER DESCRIPTION
database

Filename of the database. Use ':memory:' to create an in-memory database.

TYPE: str

RETURNS DESCRIPTION
Connection

A new sqlite3 connection with all custom functions and aggregations registered.

Source code in getml/sqlite3/connect.py
def connect(database: str) -> sqlite3.Connection:
    """
    Generates a new sqlite3 connection.

    This connection contains all customized aggregations
    and transformation functions needed to execute the
    SQL pipeline generated by getML. Other than that
    it behaves just like a normal sqlite3 connection from
    the Python standard library.

    Args:
        database:
            Filename of the database. Use ':memory:' to
            create an in-memory database.

    Returns:
        A new sqlite3 connection with all custom
            functions and aggregations registered.
    """

    if not isinstance(database, str):
        raise TypeError("'database' must be of type str")

    if sqlite3.sqlite_version < "3.33.0":
        raise ValueError(
            "getML requires SQLite version 3.33.0 or above. Found version "
            + sqlite3.sqlite_version
            + ". Please upgrade Python and/or the Python sqlite3 package."
        )

    conn = sqlite3.connect(database)

    conn.create_function("contains", 2, _contains)
    conn.create_function("email_domain", 1, _email_domain)
    conn.create_function("get_word", 2, _get_word)
    conn.create_function("num_words", 1, _num_words)

    conn.create_aggregate("COUNT_ABOVE_MEAN", 1, _CountAboveMean)  # type: ignore
    conn.create_aggregate("COUNT_BELOW_MEAN", 1, _CountBelowMean)  # type: ignore
    conn.create_aggregate("COUNT_DISTINCT_OVER_COUNT", 1, _CountDistinctOverCount)  # type: ignore
    conn.create_aggregate("EWMA_1S", 2, _EWMA1S)  # type: ignore
    conn.create_aggregate("EWMA_1M", 2, _EWMA1M)  # type: ignore
    conn.create_aggregate("EWMA_1H", 2, _EWMA1H)  # type: ignore
    conn.create_aggregate("EWMA_1D", 2, _EWMA1D)  # type: ignore
    conn.create_aggregate("EWMA_7D", 2, _EWMA7D)  # type: ignore
    conn.create_aggregate("EWMA_30D", 2, _EWMA30D)  # type: ignore
    conn.create_aggregate("EWMA_90D", 2, _EWMA90D)  # type: ignore
    conn.create_aggregate("EWMA_365D", 2, _EWMA365D)  # type: ignore
    conn.create_aggregate("EWMA_TREND_1S", 2, _EWMATrend1S)  # type: ignore
    conn.create_aggregate("EWMA_TREND_1M", 2, _EWMATrend1M)  # type: ignore
    conn.create_aggregate("EWMA_TREND_1H", 2, _EWMATrend1H)  # type: ignore
    conn.create_aggregate("EWMA_TREND_1D", 2, _EWMATrend1D)  # type: ignore
    conn.create_aggregate("EWMA_TREND_7D", 2, _EWMATrend7D)  # type: ignore
    conn.create_aggregate("EWMA_TREND_30D", 2, _EWMATrend30D)  # type: ignore
    conn.create_aggregate("EWMA_TREND_90D", 2, _EWMATrend90D)  # type: ignore
    conn.create_aggregate("EWMA_TREND_365D", 2, _EWMATrend365D)  # type: ignore
    conn.create_aggregate("FIRST", 2, _First)  # type: ignore
    conn.create_aggregate("KURTOSIS", 1, _Kurtosis)
    conn.create_aggregate("LAST", 2, _Last)  # type: ignore
    conn.create_aggregate("MEDIAN", 1, _Median)
    conn.create_aggregate("MODE", 1, _Mode)
    conn.create_aggregate("NUM_MAX", 1, _NumMax)  # type: ignore
    conn.create_aggregate("NUM_MIN", 1, _NumMin)  # type: ignore
    conn.create_aggregate("Q1", 1, _Q1)  # type: ignore
    conn.create_aggregate("Q5", 1, _Q5)  # type: ignore
    conn.create_aggregate("Q10", 1, _Q10)  # type: ignore
    conn.create_aggregate("Q25", 1, _Q25)  # type: ignore
    conn.create_aggregate("Q75", 1, _Q75)  # type: ignore
    conn.create_aggregate("Q90", 1, _Q90)  # type: ignore
    conn.create_aggregate("Q95", 1, _Q95)  # type: ignore
    conn.create_aggregate("Q99", 1, _Q99)  # type: ignore
    conn.create_aggregate("SKEW", 1, _Skew)
    conn.create_aggregate("STDDEV", 1, _Stddev)
    conn.create_aggregate("TIME_SINCE_FIRST_MAXIMUM", 2, _TimeSinceFirstMaximum)  # type: ignore
    conn.create_aggregate("TIME_SINCE_FIRST_MINIMUM", 2, _TimeSinceFirstMinimum)  # type: ignore
    conn.create_aggregate("TIME_SINCE_LAST_MAXIMUM", 2, _TimeSinceLastMaximum)  # type: ignore
    conn.create_aggregate("TIME_SINCE_LAST_MINIMUM", 2, _TimeSinceLastMinimum)  # type: ignore
    conn.create_aggregate("TREND", 2, _Trend)  # type: ignore
    conn.create_aggregate("VAR", 1, _Var)
    conn.create_aggregate("VARIATION_COEFFICIENT", 1, _VariationCoefficient)

    return conn
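
For illustration, a minimal usage sketch (table and values are made up): the returned connection can call the custom aggregations registered above, such as MEDIAN, directly from SQL:

import getml

conn = getml.sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings(value REAL)")
conn.executemany("INSERT INTO readings VALUES (?)", [(1.0,), (2.0,), (10.0,)])
print(conn.execute("SELECT MEDIAN(value) FROM readings").fetchone()[0])  # 2.0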

execute

execute(conn: Connection, fname: str) -> None

Executes an SQL script or several SQL scripts on SQLite3.

PARAMETER DESCRIPTION
conn

A sqlite3 connection created by connect.

TYPE: Connection

fname

The name of an SQL script or of a folder containing SQL scripts. If you pass a folder, the SQL scripts must have the ending '.sql'.

TYPE: str

Source code in getml/sqlite3/execute.py
def execute(conn: sqlite3.Connection, fname: str) -> None:
    """
    Executes an SQL script or several SQL scripts on SQLite3.

    Args:
        conn:
            A sqlite3 connection created by [`connect`][getml.sqlite3.connect.connect].

        fname:
            The names of the SQL script or a folder containing SQL scripts.
            If you decide to pass a folder, the SQL scripts must have the ending '.sql'.
    """
    # ------------------------------------------------------------

    if not isinstance(conn, sqlite3.Connection):
        raise TypeError("'conn' must be an sqlite3.Connection object")

    if not isinstance(fname, str):
        raise TypeError("'fname' must be of type str")

    # ------------------------------------------------------------

    # Store temporary object in-memory.
    conn.execute("PRAGMA temp_store=2;")

    if os.path.isdir(fname):
        scripts = _retrieve_scripts(fname, ".sql")
        for script in scripts:
            execute(conn, script)
        return

    _log("Executing " + fname + "...")

    with open(fname, encoding="utf-8") as sqlfile:
        queries = sqlfile.read().split(";")

    for query in queries:
        conn.execute(query + ";")

    conn.commit()
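
For illustration, a short sketch based on the CORA example above (the single-script file name is hypothetical):

conn = getml.sqlite3.connect("cora.db")

# Run every '.sql' script inside the folder written by .to_sql(...).save("cora"),
getml.sqlite3.execute(conn, "cora")

# or run one specific script (hypothetical file name).
getml.sqlite3.execute(conn, "cora/0_staging.sql")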

read_csv

read_csv(
    conn: Connection,
    fnames: Union[str, List[str]],
    table_name: str,
    header: bool = True,
    if_exists: str = "append",
    quotechar: str = '"',
    sep: str = ",",
    skip: int = 0,
    colnames: Optional[List[str]] = None,
) -> None

Reads a list of CSV files and writes them into an sqlite3 table.

PARAMETER DESCRIPTION
conn

A sqlite3 connection created by connect.

TYPE: Connection

fnames

The names of the CSV files.

TYPE: Union[str, List[str]]

table_name

The name of the table to write to.

TYPE: str

header

Whether the csv file contains a header. If True, the first line is skipped and column names are inferred accordingly.

TYPE: bool DEFAULT: True

quotechar

The string escape character.

TYPE: str DEFAULT: '"'

if_exists

How to behave if the table already exists:

  • 'fail': Raise a ValueError.
  • 'replace': Drop the table before inserting new values.
  • 'append': Insert new values to the existing table.

TYPE: str DEFAULT: 'append'

sep

The field separator.

TYPE: str DEFAULT: ','

skip

The number of lines to skip (before a possible header).

TYPE: int DEFAULT: 0

colnames

The first line of a CSV file usually contains the column names. When this is not the case, you can explicitly pass them. If you pass colnames, it is assumed that the CSV files do not contain a header, thus overriding the 'header' variable.

TYPE: Optional[List[str]] DEFAULT: None

Source code in getml/sqlite3/read_csv.py
def read_csv(
    conn: sqlite3.Connection,
    fnames: Union[str, List[str]],
    table_name: str,
    header: bool = True,
    if_exists: str = "append",
    quotechar: str = '"',
    sep: str = ",",
    skip: int = 0,
    colnames: Optional[List[str]] = None,
) -> None:
    """
    Reads a list of CSV files and writes them into an sqlite3 table.

    Args:
        conn:
            A sqlite3 connection created by [`connect`][getml.sqlite3.connect.connect].

        fnames:
            The names of the CSV files.

        table_name:
            The name of the table to write to.

        header:
            Whether the csv file contains a header. If True, the first line
            is skipped and column names are inferred accordingly.

        quotechar:
            The string escape character.

        if_exists:
            How to behave if the table already exists:

            - 'fail': Raise a ValueError.
            - 'replace': Drop the table before inserting new values.
            - 'append': Insert new values to the existing table.

        sep:
            The field separator.

        skip:
            The number of lines to skip (before a possible header)

        colnames:
            The first line of a CSV file
            usually contains the column names. When this is not the case, you can
            explicitly pass them. If you pass colnames, it is assumed that the
            CSV files do not contain a header, thus overriding the 'header' variable.
    """
    # ------------------------------------------------------------

    if not isinstance(fnames, list):
        fnames = [fnames]

    # ------------------------------------------------------------

    if not isinstance(conn, sqlite3.Connection):
        raise TypeError("'conn' must be an sqlite3.Connection object")

    if not _is_non_empty_typed_list(fnames, str):
        raise TypeError("'fnames' must be a string or a non-empty list of strings")

    if not isinstance(table_name, str):
        raise TypeError("'table_name' must be a string")

    if not isinstance(header, bool):
        raise TypeError("'header' must be a bool")

    if not isinstance(quotechar, str):
        raise TypeError("'quotechar' must be a str")

    if not isinstance(if_exists, str):
        raise TypeError("'if_exists' must be a str")

    if not isinstance(sep, str):
        raise TypeError("'sep' must be a str")

    if not isinstance(skip, int):
        raise TypeError("'skip' must be an int")

    if colnames is not None and not _is_typed_list(colnames, str):
        raise TypeError("'colnames' must be a list of strings or None")

    # ------------------------------------------------------------

    schema = sniff_csv(
        fnames=fnames,
        table_name=table_name,
        header=header,
        quotechar=quotechar,
        sep=sep,
        skip=skip,
        colnames=colnames,
    )

    _create_table(conn, table_name, schema, if_exists)

    for fname in fnames:
        _log("Loading '" + fname + "' into '" + table_name + "'...")
        data = _read_csv_file(fname, sep, quotechar, header and not colnames, skip)
        read_list(conn, table_name, data)
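
For illustration, a hedged sketch (the CSV file names are hypothetical):

conn = getml.sqlite3.connect("cora.db")
getml.sqlite3.read_csv(
    conn,
    fnames=["cites_part1.csv", "cites_part2.csv"],  # hypothetical file names
    table_name="cites",
    if_exists="replace",
)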

read_list

read_list(
    conn: Connection, table_name: str, data: List[List[Any]]
) -> None

Reads data into an sqlite3 table.

PARAMETER DESCRIPTION
conn

A sqlite3 connection created by connect.

TYPE: Connection

table_name

The name of the table to write to.

TYPE: str

data

The data to insert into the table. Every list represents one row to be read into the table.

TYPE: List[List[Any]]

Source code in getml/sqlite3/read_list.py
def read_list(conn: sqlite3.Connection, table_name: str, data: List[List[Any]]) -> None:
    """
    Reads data into an sqlite3 table.

    Args:
        conn:
            A sqlite3 connection created by [`connect`][getml.sqlite3.connect.connect].

        table_name:
            The name of the table to write to.

        data:
            The data to insert into the table.
            Every list represents one row to be read into the table.
    """

    # ------------------------------------------------------------

    if not isinstance(conn, sqlite3.Connection):
        raise TypeError("'conn' must be an sqlite3.Connection object")

    if not isinstance(table_name, str):
        raise TypeError("'table_name' must be a string")

    if not isinstance(data, list):
        raise TypeError("'data' must be a list of lists")

    # ------------------------------------------------------------

    ncols = _get_num_columns(conn, table_name)
    old_length = len(data)
    data = [line for line in data if len(line) == ncols]
    placeholders = "(" + ",".join(["?"] * ncols) + ")"
    query = 'INSERT INTO "' + table_name + '" VALUES ' + placeholders
    conn.executemany(query, data)
    conn.commit()
    _log(
        "Read "
        + str(len(data))
        + " lines. "
        + str(old_length - len(data))
        + " invalid lines."
    )
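
For illustration, a small sketch (table and values are made up). Rows whose length does not match the table's number of columns are dropped as invalid:

conn = getml.sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points(x REAL, y REAL)")
getml.sqlite3.read_list(conn, "points", [[1.0, 2.0], [3.0, 4.0], [5.0]])  # the last row is skipped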

read_pandas

read_pandas(
    conn: Connection,
    table_name: str,
    data_frame: DataFrame,
    if_exists: Literal[
        "fail", "replace", "append"
    ] = "append",
) -> None

Loads a pandas.DataFrame into SQLite3.

PARAMETER DESCRIPTION
conn

A sqlite3 connection created by connect.

TYPE: Connection

table_name

The name of the table to write to.

TYPE: str

data_frame

The pandas.DataFrame to read into the table. The column names must match the column names of the target table in the SQLite3 database, but their order is not important.

TYPE: DataFrame

if_exists

How to behave if the table already exists:

  • 'fail': Raise a ValueError.
  • 'replace': Drop the table before inserting new values.
  • 'append': Insert new values into the existing table.

TYPE: Literal['fail', 'replace', 'append'] DEFAULT: 'append'

Source code in getml/sqlite3/read_pandas.py
def read_pandas(
    conn: sqlite3.Connection,
    table_name: str,
    data_frame: pd.DataFrame,
    if_exists: Literal["fail", "replace", "append"] = "append",
) -> None:
    """
    Loads a pandas.DataFrame into SQLite3.

    Args:
        conn:
            A sqlite3 connection created by [`connect`][getml.sqlite3.connect.connect].

        table_name:
            The name of the table to write to.

        data_frame:
            The pandas.DataFrame to read
            into the table. The column names must match the column
            names of the target table in the SQLite3 database, but
            their order is not important.

        if_exists:
            How to behave if the table already exists:

            - 'fail': Raise a ValueError.
            - 'replace': Drop the table before inserting new values.
            - 'append': Insert new values into the existing table.
    """

    # ------------------------------------------------------------

    if not isinstance(conn, sqlite3.Connection):
        raise TypeError("'conn' must be an sqlite3.Connection object")

    if not isinstance(table_name, str):
        raise TypeError("'table_name' must be a str")

    if not isinstance(data_frame, pd.DataFrame):
        raise TypeError("'data_frame' must be a pandas.DataFrame")

    if not isinstance(if_exists, str):
        raise TypeError("'if_exists' must be a str")

    # ------------------------------------------------------------

    _log("Loading pandas.DataFrame into '" + table_name + "'...")

    schema = sniff_pandas(table_name, data_frame)

    _create_table(conn, table_name, schema, if_exists)

    colnames = _get_colnames(conn, table_name)
    data = data_frame[colnames].values.tolist()
    data = [
        [
            field
            if isinstance(field, (numbers.Number, str)) or field is None
            else str(field)
            for field in row
        ]
        for row in data
    ]
    read_list(conn, table_name, data)
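
For illustration, a minimal sketch (the data is made up):

import getml
import pandas as pd

conn = getml.sqlite3.connect(":memory:")
cites = pd.DataFrame({"cited": [35, 40], "citing": [887, 905]})
getml.sqlite3.read_pandas(
    conn, table_name="cites", data_frame=cites, if_exists="replace")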

sniff_csv

sniff_csv(
    fnames: Union[str, List[str]],
    table_name: str,
    header: bool = True,
    num_lines_sniffed: int = 1000,
    quotechar: str = '"',
    sep: str = ",",
    skip: int = 0,
    colnames: Optional[List[str]] = None,
) -> str

Sniffs a list of csv files.

PARAMETER DESCRIPTION
fnames

The list of CSV file names to be read.

TYPE: Union[str, List[str]]

table_name

Name of the table in which the data is to be inserted.

TYPE: str

header

Whether the csv file contains a header. If True, the first line is skipped and column names are inferred accordingly.

TYPE: bool DEFAULT: True

num_lines_sniffed

Number of lines analyzed by the sniffer.

TYPE: int DEFAULT: 1000

quotechar

The character used to wrap strings.

TYPE: str DEFAULT: '"'

sep

The separator used for separating fields.

TYPE: str DEFAULT: ','

skip

Number of lines to skip at the beginning of each file.

TYPE: int DEFAULT: 0

colnames

The first line of a CSV file usually contains the column names. When this is not the case, you can explicitly pass them. If you pass colnames, it is assumed that the CSV files do not contain a header, thus overriding the 'header' variable.

TYPE: Optional[List[str]] DEFAULT: None

RETURNS DESCRIPTION
str

Appropriate CREATE TABLE statement.

Source code in getml/sqlite3/sniff_csv.py
def sniff_csv(
    fnames: Union[str, List[str]],
    table_name: str,
    header: bool = True,
    num_lines_sniffed: int = 1000,
    quotechar: str = '"',
    sep: str = ",",
    skip: int = 0,
    colnames: Optional[List[str]] = None,
) -> str:
    """
    Sniffs a list of csv files.

    Args:
        fnames:
            The list of CSV file names to be read.

        table_name:
            Name of the table in which the data is to be inserted.

        header:
            Whether the csv file contains a header. If True, the first line
            is skipped and column names are inferred accordingly.

        num_lines_sniffed:
            Number of lines analyzed by the sniffer.

        quotechar:
            The character used to wrap strings.

        sep:
            The separator used for separating fields.

        skip:
            Number of lines to skip at the beginning of each
            file.

        colnames:
            The first line of a CSV file
            usually contains the column names. When this is not the case, you can
            explicitly pass them. If you pass colnames, it is assumed that the
            CSV files do not contain a header, thus overriding the 'header' variable.

    Returns:
            Appropriate `CREATE TABLE` statement.
    """

    # ------------------------------------------------------------

    if not isinstance(fnames, list):
        fnames = [fnames]

    # ------------------------------------------------------------

    if not _is_non_empty_typed_list(fnames, str):
        raise TypeError("'fnames' must be a string or a non-empty list of strings")

    if not isinstance(table_name, str):
        raise TypeError("'table_name' must be a string")

    if not isinstance(header, bool):
        raise TypeError("'header' must be a bool")

    if not isinstance(num_lines_sniffed, int):
        raise TypeError("'num_lines_sniffed' must be a int")

    if not isinstance(quotechar, str):
        raise TypeError("'quotechar' must be a str")

    if not isinstance(sep, str):
        raise TypeError("'sep' must be a str")

    if not isinstance(skip, int):
        raise TypeError("'skip' must be an int")

    if colnames is not None and not _is_typed_list(colnames, str):
        raise TypeError("'colnames' must be a list of strings or None")

    # ------------------------------------------------------------

    header_lines = 0 if header and not colnames else None

    def read(fname):
        return pd.read_csv(
            fname,
            nrows=num_lines_sniffed,
            header=header_lines,
            sep=sep,
            quotechar=quotechar,
            skiprows=skip,
            names=colnames,
        )

    data_frames = [read(fname) for fname in fnames]

    merged = pd.concat(data_frames, join="inner")

    return sniff_pandas(table_name, merged)
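
For illustration, a minimal sketch (the file name is hypothetical):

schema = getml.sqlite3.sniff_csv(fnames="cites.csv", table_name="cites")
print(schema)  # the inferred CREATE TABLE statement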

sniff_pandas

sniff_pandas(table_name: str, data_frame: DataFrame) -> str

Sniffs a pandas data frame.

PARAMETER DESCRIPTION
table_name

Name of the table in which the data is to be inserted.

TYPE: str

data_frame

The pandas.DataFrame to read into the table.

TYPE: DataFrame

RETURNS DESCRIPTION
str

Appropriate CREATE TABLE statement.

Source code in getml/sqlite3/sniff_pandas.py
def sniff_pandas(table_name: str, data_frame: pd.DataFrame) -> str:
    """
    Sniffs a pandas data frame.

    Args:
        table_name:
            Name of the table in which the data is to be inserted.

        data_frame:
            The pandas.DataFrame to read into the table.

    Returns:
            Appropriate `CREATE TABLE` statement.
    """
    # ------------------------------------------------------------

    if not isinstance(table_name, str):
        raise TypeError("'table_name' must be a str")

    if not isinstance(data_frame, pd.DataFrame):
        raise TypeError("'data_frame' must be a pandas.DataFrame")

    # ------------------------------------------------------------

    colnames = data_frame.columns
    coltypes = data_frame.dtypes

    sql_types: Dict[str, List[str]] = {"INTEGER": [], "REAL": [], "TEXT": []}

    for cname, ctype in zip(colnames, coltypes):
        if _is_int_type(ctype):
            sql_types["INTEGER"].append(cname)
            continue
        if _is_numerical_type_numpy(ctype):
            sql_types["REAL"].append(cname)
        else:
            sql_types["TEXT"].append(cname)

    return _generate_schema(table_name, sql_types)
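
For illustration, a minimal sketch (the data is made up; the exact formatting of the generated statement may differ):

import pandas as pd

cites = pd.DataFrame({"cited": [35, 40], "citing": [887, 905]})
print(getml.sqlite3.sniff_pandas("cites", cites))  # a CREATE TABLE statement with INTEGER columns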

to_list

to_list(
    conn: Connection, query: str
) -> Tuple[List[str], List[list]]

Transforms a query or table into a list of lists. Returns a tuple which contains the column names and the actual data.

PARAMETER DESCRIPTION
conn

A sqlite3 connection created by connect.

TYPE: Connection

query

The query used to get the table. You can also pass the name of the table, in which case the entire table will be imported.

TYPE: str

RETURNS DESCRIPTION
Tuple[List[str], List[list]]

The column names and the data as a list of lists.

Source code in getml/sqlite3/to_list.py
def to_list(conn: sqlite3.Connection, query: str) -> Tuple[List[str], List[list]]:
    """
    Transforms a query or table into a list of lists. Returns
    a tuple which contains the column names and the actual data.

    Args:
        conn:
            A sqlite3 connection created by [`connect`][getml.sqlite3.connect.connect].

        query:
            The query used to get the table. You can also
            pass the name of the table, in which case the entire
            table will be imported.

    Returns:
            The column names and the data as a list of lists.
    """
    # ------------------------------------------------------------

    if not isinstance(conn, sqlite3.Connection):
        raise TypeError("'conn' must be an sqlite3.Connection object")

    if not isinstance(query, str):
        raise TypeError("'query' must be a str")

    # ------------------------------------------------------------

    query = _handle_query(query)
    cursor = conn.execute(query)
    colnames = [description[0] for description in cursor.description]
    data = cursor.fetchall()
    return colnames, data
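
For illustration, a short sketch reusing the FEATURES table from the module-level example; either a table name or a full query can be passed:

colnames, data = getml.sqlite3.to_list(conn, "FEATURES")
colnames, data = getml.sqlite3.to_list(conn, "SELECT * FROM FEATURES LIMIT 10")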

to_pandas

to_pandas(conn: Connection, query: str) -> DataFrame

Returns a table as a pandas.DataFrame.

PARAMETER DESCRIPTION
conn

A sqlite3 connection created by connect.

TYPE: Connection

query

The query used to get the table. You can also pass the name of the table, in which case the entire table will be imported.

TYPE: str

RETURNS DESCRIPTION
DataFrame

The table as a pandas.DataFrame.

Source code in getml/sqlite3/to_pandas.py
def to_pandas(conn: sqlite3.Connection, query: str) -> pd.DataFrame:
    """
    Returns a table as a pandas.DataFrame.

    Args:
        conn:
            A sqlite3 connection created by [`connect`][getml.sqlite3.connect.connect].

        query:
            The query used to get the table. You can also
            pass the name of the table, in which case the entire
            table will be imported.

    Returns:
            The table as a pandas.DataFrame.
    """
    # ------------------------------------------------------------

    if not isinstance(conn, sqlite3.Connection):
        raise TypeError("'conn' must be an sqlite3.Connection object")

    if not isinstance(query, str):
        raise TypeError("'query' must be a str")

    # ------------------------------------------------------------

    colnames, data = to_list(conn, query)
    data_frame = pd.DataFrame(data)
    data_frame.columns = colnames
    return data_frame
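
For illustration, a short sketch reusing the FEATURES table from the module-level example:

features = getml.sqlite3.to_pandas(conn, "FEATURES")
sample = getml.sqlite3.to_pandas(conn, "SELECT * FROM FEATURES LIMIT 10")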