cfl_data_utils.database package¶
pylint: disable=missing-docstring
Submodules¶
cfl_data_utils.database._database module¶
Extendable Database class for connecting to different database types within Chetwood Financial.
- TODO
- Implement time out functionality for reducing idle connections
-
class
cfl_data_utils.database._database.
Database
(ssh_host=None, ssh_port=22, ssh_username=None, pkey_path=None, db_name=None, db_bind_address=None, db_host='127.0.0.1', db_port=None, db_user=None, db_password=None, stubbed=False, max_idle_time=30, disable_ssh_tunnel=False)[source]¶ Bases:
object
Base database class
This is to be used as a super class by different database connectors
-
db_name
¶ Name of the database to connect to
Type: str
-
ssh_host
¶ Host IP of the SSH tunnel
Type: str
-
ssh_username
¶ User for access to SSH tunnel
Type: str
-
pkey_path
¶ Path to the .pem file containing the PKey
Type: str
-
db_bind_address
¶ Binding address for the database
Type: str
-
db_user
¶ Database username
Type: str
-
db_password
¶ Database password
Type: str
-
db_host
¶ Host IP for the database, usually 127.0.0.1 after SSH tunneling
Type: str
-
db_port
¶ DB port
Type: int
-
ssh_port
¶ SSH Tunnel port, defaults to 22
Type: int
-
stubbed
¶ Flag to stub database calls
Type: bool
-
max_idle_time
¶ Maximum time connection can idle before being disconnected
Type: int
-
connect_to_db
(disable_ssh_tunnel)[source]¶ Placeholder method for the actual connection to the DB
Parameters: disable_ssh_tunnel (bool) – Determines whether an SSH tunnel should be used
-
df_from_query
(stmt, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None)[source]¶ Generates a pandas dataframe from an SQL query. All parameter defaults match those of read_sql_query.
Parameters: - stmt (str) – The query to be executed by the Cursor object
- index_col (Union[Iterable[str], str]) – Column(s) to set as index(MultiIndex).
- coerce_float (bool) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point. Useful for SQL result sets.
- params (Union[List, Tuple, Dict]) – List of parameters to pass to execute method. The syntax used to pass parameters is database driver dependent.
- parse_dates (Union[List, Dict]) –
- List of column names to parse as dates.
- Dict of
{column_name: format string}
where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps. - Dict of
{column_name: arg dict}
, where the arg dict corresponds to the keyword arguments ofpandas.to_datetime()
Especially useful with databases without native Datetime support, such as SQLite.
- chunksize (int) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.
Returns: DataFrame object
-
df_to_table
(df, name, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)[source]¶ Write records stored in a DataFrame to a SQL database.
Parameters: - df (DataFrame) – DataFrame to convert
- name (str) – Name of table to be created
- schema (str) – Specify the schema (if database flavor supports this)
- if_exists (str) – How to behave if the table already exists
- index (bool) – Write DataFrame index as a column. Uses index_label as the column name in the table.
- index_label (Union[str, List]) – Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
- chunksize (int) – Rows will be written in batches of this size at a time. By default, all rows will be written at once.
- dtype (dict) – Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode.
- method (Union[None, str, callable]) – Controls the SQL insertion clause used: None : Uses standard SQL INSERT clause (one per row). ‘multi’: Pass multiple values in a single INSERT clause. callable with signature (pd_table, conn, keys, data_iter)
-
disconnect
(silent: bool = False)[source]¶ Disconnects from the database and the SSH tunnel
Parameters: silent (bool) – suppresses final print statement
-
executemany
(stmt: str, data)[source]¶ Parameters: - stmt (str) – The query to be executed by the Cursor object
- data (Union[List, Tuple]) – The data to be processed
Returns: the Cursor object for continued usage
-
query
(sql, commit=True)[source]¶ Executes a query passed in by using the DatabaseManager object
Parameters: - sql (str) – The sql query to be executed by the Cursor object
- commit (bool) – Committing on queries can be disabled for rapid writing (e.g. q_one commit at end)
Returns: the Cursor object for continued usage
-
cfl_data_utils.database.postgresql_manager module¶
Provides the PostgreSQLManager class to connect to a PostgreSQL instance
-
class
cfl_data_utils.database.postgresql_manager.
PostgreSQLManager
(ssh_host=None, ssh_port=22, ssh_username=None, pkey_path=None, db_name=None, db_bind_address=None, db_host='127.0.0.1', db_port=None, db_user=None, db_password=None, stubbed=False, max_idle_time=30, disable_ssh_tunnel=False)[source]¶ Bases:
cfl_data_utils.database._database.Database
Extension of the Database class specifically for connecting to a PostgreSQL instance
cfl_data_utils.database.rds_manager module¶
Provides the RDSManager class to connect to the existing AWS RDS in use by CFL
-
class
cfl_data_utils.database.rds_manager.
RDSManager
(ssh_host=None, ssh_port=22, ssh_username=None, pkey_path=None, db_name=None, db_bind_address=None, db_host='127.0.0.1', db_port=None, db_user=None, db_password=None, stubbed=False, max_idle_time=30, disable_ssh_tunnel=False)[source]¶ Bases:
cfl_data_utils.database._database.Database
Extension of the Database class specifically for connecting to a MySQL AWS RDS