cfl_data_utils.database package¶

pylint: disable=missing-docstring

Submodules¶

cfl_data_utils.database._database module¶

Extendable Database class for connecting to different database types within Chetwood Financial.

TODO

Implement time out functionality for reducing idle connections

class cfl_data_utils.database._database.Database(ssh_host=None, ssh_port=22, ssh_username=None, pkey_path=None, db_name=None, db_bind_address=None, db_host='127.0.0.1', db_port=None, db_user=None, db_password=None, stubbed=False, max_idle_time=30, disable_ssh_tunnel=False)[source]¶

Bases: object

Base database class

This is to be used as a super class by different database connectors

db_name¶

Name of the database to connect to

Type:	str

ssh_host¶

Host IP of the SSH tunnel

Type:	str

ssh_username¶

User for access to SSH tunnel

Type:	str

pkey_path¶

Path to the .pem file containing the PKey

Type:	str

db_bind_address¶

Binding address for the database

Type:	str

db_user¶

Database username

Type:	str

db_password¶

Database password

Type:	str

db_host¶

Host IP for the database, usually 127.0.0.1 after SSH tunneling

Type:	str

db_port¶

DB port

Type:	int

ssh_port¶

SSH Tunnel port, defaults to 22

Type:	int

stubbed¶

Flag to stub database calls

Type:	bool

max_idle_time¶

Maximum time connection can idle before being disconnected

Type:	int

commit()[source]¶

Commits all changes to database

Returns:	the Cursor object for continued usage

connect_to_db(disable_ssh_tunnel)[source]¶

Placeholder method for the actual connection to the DB

Parameters:	disable_ssh_tunnel (bool) – Determines whether an SSH tunnel should be used

df_from_query(stmt, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None)[source]¶

Generates a pandas dataframe from an SQL query. All parameter defaults match those of read_sql_query.

Parameters:

stmt (str) – The query to be executed by the Cursor object
index_col (Union[Iterable[str], str]) – Column(s) to set as index(MultiIndex).
coerce_float (bool) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point. Useful for SQL result sets.
params (Union[List, Tuple, Dict]) – List of parameters to pass to execute method. The syntax used to pass parameters is database driver dependent.
parse_dates (Union[List, Dict]) –
- List of column names to parse as dates.
- Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.
- Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime() Especially useful with databases without native Datetime support, such as SQLite.
chunksize (int) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.

Returns:

DataFrame object

df_to_table(df, name, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)[source]¶

Write records stored in a DataFrame to a SQL database.

Parameters:

df (DataFrame) – DataFrame to convert
name (str) – Name of table to be created
schema (str) – Specify the schema (if database flavor supports this)
if_exists (str) – How to behave if the table already exists
index (bool) – Write DataFrame index as a column. Uses index_label as the column name in the table.
index_label (Union[str, List]) – Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
chunksize (int) – Rows will be written in batches of this size at a time. By default, all rows will be written at once.
dtype (dict) – Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode.
method (Union[None, str, callable]) – Controls the SQL insertion clause used: None : Uses standard SQL INSERT clause (one per row). ‘multi’: Pass multiple values in a single INSERT clause. callable with signature (pd_table, conn, keys, data_iter)

disconnect(silent: bool = False)[source]¶

Disconnects from the database and the SSH tunnel

Parameters:	silent (bool) – suppresses final print statement

executemany(stmt: str, data)[source]¶

Parameters:	stmt (str) – The query to be executed by the Cursor object data (Union[List, Tuple]) – The data to be processed
Returns:	the Cursor object for continued usage

query(sql, commit=True)[source]¶

Executes a query passed in by using the DatabaseManager object

Parameters:	sql (str) – The sql query to be executed by the Cursor object commit (bool) – Committing on queries can be disabled for rapid writing (e.g. q_one commit at end)
Returns:	the Cursor object for continued usage

setup()[source]¶: Allows each database subclass to be setup differently according to the relevant dialect and driver

validate_setup(disable_ssh_tunnel)[source]¶

Validate the setup of the database: args, connections etc.

Parameters:	disable_ssh_tunnel (bool) – Determines whether an SSH tunnel should be used

cfl_data_utils.database.postgresql_manager module¶

Provides the PostgreSQLManager class to connect to a PostgreSQL instance

class cfl_data_utils.database.postgresql_manager.PostgreSQLManager(ssh_host=None, ssh_port=22, ssh_username=None, pkey_path=None, db_name=None, db_bind_address=None, db_host='127.0.0.1', db_port=None, db_user=None, db_password=None, stubbed=False, max_idle_time=30, disable_ssh_tunnel=False)[source]¶

Bases: cfl_data_utils.database._database.Database

Extension of the Database class specifically for connecting to a PostgreSQL instance

connect_to_db(disable_ssh_tunnel=False)[source]¶: Open the connection to the database

setup()[source]¶: Setup the SSH Tunnel and Database connection

cfl_data_utils.database.rds_manager module¶

Provides the RDSManager class to connect to the existing AWS RDS in use by CFL

class cfl_data_utils.database.rds_manager.RDSManager(ssh_host=None, ssh_port=22, ssh_username=None, pkey_path=None, db_name=None, db_bind_address=None, db_host='127.0.0.1', db_port=None, db_user=None, db_password=None, stubbed=False, max_idle_time=30, disable_ssh_tunnel=False)[source]¶

Bases: cfl_data_utils.database._database.Database

Extension of the Database class specifically for connecting to a MySQL AWS RDS

connect_to_db(disable_ssh_tunnel=False)[source]¶

Open the connection to the database

Parameters:	disable_ssh_tunnel (bool) – Determines whether an SSH tunnel should be used

setup()[source]¶: Setup the SSH Tunnel and Database connection