getml.sqlite3
This module contains wrappers around sqlite3 and related utility functions, which enable you to productionize pipelines using only sqlite3 and Python, fully based on open-source code.
This requires SQLite version 3.33.0 or above. To check the sqlite3 version of your Python distribution, do the following:
import sqlite3
sqlite3.sqlite_version
Example
For our example we will assume that you want to productionize the CORA project.
First, we want to transpile the features into SQL code, like this:
# Set targets to False, if you want an inference pipeline.
pipe1.features.to_sql(targets=True).save("cora")
We also assume that you have the three tables needed for the CORA project in the form of pandas.DataFrames (other data sources are possible).
We want to create a new sqlite3 connection and then read in the data:
conn = getml.sqlite3.connect("cora.db")
getml.sqlite3.read_pandas(
conn, table_name="cites", data_frame=cites, if_exists="replace")
getml.sqlite3.read_pandas(
conn, table_name="content", data_frame=content, if_exists="replace")
getml.sqlite3.read_pandas(
conn, table_name="paper", data_frame=paper, if_exists="replace")
conn = getml.sqlite3.execute(conn, "cora")
features = getml.sqlite3.to_pandas(conn, "FEATURES")
To build stable data science pipelines, it is often a good idea to ensure type safety by hard-coding your table schema. You can use the sniff... methods to do that:
getml.sqlite3.sniff_pandas("cites", cites)
connect
connect(database: str) -> Connection
Generates a new sqlite3 connection.
This connection contains all customized aggregations and transformation functions needed to execute the SQL pipeline generated by getML. Other than that it behaves just like a normal sqlite3 connection from the Python standard library.
PARAMETER | DESCRIPTION |
---|---|
database |
Filename of the database. Use ':memory:' to create an in-memory database.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Connection
|
A new sqlite3 connection with all custom functions and aggregations registered. |
Source code in getml/sqlite3/connect.py
71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 |
|
execute
execute(conn: Connection, fname: str) -> None
Executes an SQL script or several SQL scripts on SQLite3.
PARAMETER | DESCRIPTION |
---|---|
conn |
A sqlite3 connection created by
TYPE:
|
fname |
The names of the SQL script or a folder containing SQL scripts. If you decide to pass a folder, the SQL scripts must have the ending '.sql'.
TYPE:
|
Source code in getml/sqlite3/execute.py
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 |
|
read_csv
read_csv(
conn: Connection,
fnames: Union[str, List[str]],
table_name: str,
header: bool = True,
if_exists: str = "append",
quotechar: str = '"',
sep: str = ",",
skip: int = 0,
colnames: Optional[List[str]] = None,
) -> None
Reads a list of CSV files and writes them into an sqlite3 table.
PARAMETER | DESCRIPTION |
---|---|
conn |
A sqlite3 connection created by
TYPE:
|
fnames |
The names of the CSV files. |
table_name |
The name of the table to write to.
TYPE:
|
header |
Whether the csv file contains a header. If True, the first line is skipped and column names are inferred accordingly.
TYPE:
|
quotechar |
The string escape character.
TYPE:
|
if_exists |
How to behave if the table already exists:
TYPE:
|
sep |
The field separator.
TYPE:
|
skip |
The number of lines to skip (before a possible header)
TYPE:
|
colnames |
The first line of a CSV file usually contains the column names. When this is not the case, you can explicitly pass them. If you pass colnames, it is assumed that the CSV files do not contain a header, thus overriding the 'header' variable. |
Source code in getml/sqlite3/read_csv.py
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 |
|
read_list
read_list(
conn: Connection, table_name: str, data: List[List[Any]]
) -> None
Reads data into an sqlite3 table.
PARAMETER | DESCRIPTION |
---|---|
conn |
A sqlite3 connection created by
TYPE:
|
table_name |
The name of the table to write to.
TYPE:
|
data |
The data to insert into the table. Every list represents one row to be read into the table. |
Source code in getml/sqlite3/read_list.py
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
|
read_pandas
read_pandas(
conn: Connection,
table_name: str,
data_frame: DataFrame,
if_exists: Literal[
"fail", "replace", "append"
] = "append",
) -> None
Loads a pandas.DataFrame into SQLite3.
PARAMETER | DESCRIPTION |
---|---|
conn |
A sqlite3 connection created by
TYPE:
|
table_name |
The name of the table to write to.
TYPE:
|
data_frame |
The pandas.DataFrame to read into the table. The column names must match the column names of the target table in the SQLite3 database, but their order is not important.
TYPE:
|
if_exists |
How to behave if the table already exists:
TYPE:
|
Source code in getml/sqlite3/read_pandas.py
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 |
|
sniff_csv
sniff_csv(
fnames: Union[str, List[str]],
table_name: str,
header: bool = True,
num_lines_sniffed: int = 1000,
quotechar: str = '"',
sep: str = ",",
skip: int = 0,
colnames: Optional[List[str]] = None,
) -> str
Sniffs a list of csv files.
PARAMETER | DESCRIPTION |
---|---|
fnames |
The list of CSV file names to be read. |
table_name |
Name of the table in which the data is to be inserted.
TYPE:
|
header |
Whether the csv file contains a header. If True, the first line is skipped and column names are inferred accordingly.
TYPE:
|
num_lines_sniffed |
Number of lines analyzed by the sniffer.
TYPE:
|
quotechar |
The character used to wrap strings.
TYPE:
|
sep |
The separator used for separating fields.
TYPE:
|
skip |
Number of lines to skip at the beginning of each file.
TYPE:
|
colnames |
The first line of a CSV file usually contains the column names. When this is not the case, you can explicitly pass them. If you pass colnames, it is assumed that the CSV files do not contain a header, thus overriding the 'header' variable. |
RETURNS | DESCRIPTION |
---|---|
str
|
Appropriate |
Source code in getml/sqlite3/sniff_csv.py
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 |
|
sniff_pandas
Sniffs a pandas data frame.
PARAMETER | DESCRIPTION |
---|---|
table_name |
Name of the table in which the data is to be inserted.
TYPE:
|
data_frame |
The pandas.DataFrame to read into the table.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
str
|
Appropriate |
Source code in getml/sqlite3/sniff_pandas.py
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
|
to_list
Transforms a query or table into a list of lists. Returns a tuple which contains the column names and the actual data.
PARAMETER | DESCRIPTION |
---|---|
conn |
A sqlite3 connection created by
TYPE:
|
query |
The query used to get the table. You can also pass the name of the table, in which case the entire table will be imported.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
Tuple[List[str], List[list]]
|
The column names and the data as a list of lists. |
Source code in getml/sqlite3/to_list.py
29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
|
to_pandas
to_pandas(conn: Connection, query: str) -> DataFrame
Returns a table as a pandas.DataFrame.
PARAMETER | DESCRIPTION |
---|---|
conn |
A sqlite3 connection created by
TYPE:
|
query |
The query used to get the table. You can also pass the name of the table, in which case the entire table will be imported.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
DataFrame
|
The table as a pandas.DataFrame. |
Source code in getml/sqlite3/to_pandas.py
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
|