cfl_data_utils.database package

pylint: disable=missing-docstring

Submodules

cfl_data_utils.database._database module

Extendable Database class for connecting to different database types within Chetwood Financial.

TODO
  • Implement time out functionality for reducing idle connections
class cfl_data_utils.database._database.Database(ssh_host=None, ssh_port=22, ssh_username=None, pkey_path=None, db_name=None, db_bind_address=None, db_host='127.0.0.1', db_port=None, db_user=None, db_password=None, stubbed=False, max_idle_time=30, disable_ssh_tunnel=False)[source]

Bases: object

Base database class

This is to be used as a super class by different database connectors

db_name

Name of the database to connect to

Type:str
ssh_host

Host IP of the SSH tunnel

Type:str
ssh_username

User for access to SSH tunnel

Type:str
pkey_path

Path to the .pem file containing the PKey

Type:str
db_bind_address

Binding address for the database

Type:str
db_user

Database username

Type:str
db_password

Database password

Type:str
db_host

Host IP for the database, usually 127.0.0.1 after SSH tunneling

Type:str
db_port

DB port

Type:int
ssh_port

SSH Tunnel port, defaults to 22

Type:int
stubbed

Flag to stub database calls

Type:bool
max_idle_time

Maximum time connection can idle before being disconnected

Type:int
commit()[source]

Commits all changes to database

Returns:the Cursor object for continued usage
connect_to_db(disable_ssh_tunnel)[source]

Placeholder method for the actual connection to the DB

Parameters:disable_ssh_tunnel (bool) – Determines whether an SSH tunnel should be used
df_from_query(stmt, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None)[source]

Generates a pandas dataframe from an SQL query. All parameter defaults match those of read_sql_query.

Parameters:
  • stmt (str) – The query to be executed by the Cursor object
  • index_col (Union[Iterable[str], str]) – Column(s) to set as index(MultiIndex).
  • coerce_float (bool) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point. Useful for SQL result sets.
  • params (Union[List, Tuple, Dict]) – List of parameters to pass to execute method. The syntax used to pass parameters is database driver dependent.
  • parse_dates (Union[List, Dict]) –
    • List of column names to parse as dates.
    • Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.
    • Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime() Especially useful with databases without native Datetime support, such as SQLite.
  • chunksize (int) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.
Returns:

DataFrame object

df_to_table(df, name, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)[source]

Write records stored in a DataFrame to a SQL database.

Parameters:
  • df (DataFrame) – DataFrame to convert
  • name (str) – Name of table to be created
  • schema (str) – Specify the schema (if database flavor supports this)
  • if_exists (str) – How to behave if the table already exists
  • index (bool) – Write DataFrame index as a column. Uses index_label as the column name in the table.
  • index_label (Union[str, List]) – Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
  • chunksize (int) – Rows will be written in batches of this size at a time. By default, all rows will be written at once.
  • dtype (dict) – Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode.
  • method (Union[None, str, callable]) – Controls the SQL insertion clause used: None : Uses standard SQL INSERT clause (one per row). ‘multi’: Pass multiple values in a single INSERT clause. callable with signature (pd_table, conn, keys, data_iter)
disconnect(silent: bool = False)[source]

Disconnects from the database and the SSH tunnel

Parameters:silent (bool) – suppresses final print statement
executemany(stmt: str, data)[source]
Parameters:
  • stmt (str) – The query to be executed by the Cursor object
  • data (Union[List, Tuple]) – The data to be processed
Returns:

the Cursor object for continued usage

query(sql, commit=True)[source]

Executes a query passed in by using the DatabaseManager object

Parameters:
  • sql (str) – The sql query to be executed by the Cursor object
  • commit (bool) – Committing on queries can be disabled for rapid writing (e.g. q_one commit at end)
Returns:

the Cursor object for continued usage

setup()[source]

Allows each database subclass to be setup differently according to the relevant dialect and driver

validate_setup(disable_ssh_tunnel)[source]

Validate the setup of the database: args, connections etc.

Parameters:disable_ssh_tunnel (bool) – Determines whether an SSH tunnel should be used

cfl_data_utils.database.postgresql_manager module

Provides the PostgreSQLManager class to connect to a PostgreSQL instance

class cfl_data_utils.database.postgresql_manager.PostgreSQLManager(ssh_host=None, ssh_port=22, ssh_username=None, pkey_path=None, db_name=None, db_bind_address=None, db_host='127.0.0.1', db_port=None, db_user=None, db_password=None, stubbed=False, max_idle_time=30, disable_ssh_tunnel=False)[source]

Bases: cfl_data_utils.database._database.Database

Extension of the Database class specifically for connecting to a PostgreSQL instance

connect_to_db(disable_ssh_tunnel=False)[source]

Open the connection to the database

setup()[source]

Setup the SSH Tunnel and Database connection

cfl_data_utils.database.rds_manager module

Provides the RDSManager class to connect to the existing AWS RDS in use by CFL

class cfl_data_utils.database.rds_manager.RDSManager(ssh_host=None, ssh_port=22, ssh_username=None, pkey_path=None, db_name=None, db_bind_address=None, db_host='127.0.0.1', db_port=None, db_user=None, db_password=None, stubbed=False, max_idle_time=30, disable_ssh_tunnel=False)[source]

Bases: cfl_data_utils.database._database.Database

Extension of the Database class specifically for connecting to a MySQL AWS RDS

connect_to_db(disable_ssh_tunnel=False)[source]

Open the connection to the database

Parameters:disable_ssh_tunnel (bool) – Determines whether an SSH tunnel should be used
setup()[source]

Setup the SSH Tunnel and Database connection