Direct database access

Introduction

Data providers (companies and other organisations) as well as authorised researchers can have direct access to their own data on the master PIISAC database. This allows any type of data extraction, analysis, mapping and reporting, which may enable analyses beyond those current available in existing data management systems. 

Contents

Who is this for?

Direct database access is designed for any data provider (company or other) or authorised researcher that needs low-level database access to authorised subsets of the PIISAC data.

What data can I access?

You can only access a limited subset of the available data, depending on the situation:

  • Company representative
    • Full data from your own company, but no data from any other company. You may also have access to public data sources that have been integrated into PIISAC.
  • Other data provider
    • Full access to the data you have provided, but no data provided by companies or other data providers. You may also have access to public data sources that have been integrated into PIISAC.
  • Authorised researcher
    • You will be provided with access to a prepared data set relevant to your research. This is likely to contain data from multiple sources (for example, multiple companies), but be de-identified and filtered to include only the required fields. 

How do I get access?

You can request direct database access by asking the data manager (Ausvet). 

Does it cost?

For data providers (e.g. participating companies) there is no extra cost for accessing the data – this is included as part of your participating in the system. 

For researchers, it is necessary to develop custom views to provide secure access to authorised subsets of the data. You will be charged a fee for the development of these views and setting up secure access. 

What skills do I need to use the data?

When you use direct database access, you are linking directly to the master PostgreSQL database. In order to extract data, you will need at least basic skills in SQL (Structured Query Language).  There is comprehensive documentation about PostgreSQL available, including detailed information on using the SELECT statement to extract data, and tutorial to get started. 

In addition, you need to know how to use the data once it is extracted.  You may want to summarise the data,  do statistical analysis, or visualise the data using graphs or maps. You will therefore need to have an understanding of the skills for data analysis and visualisation, using your choice of software. 

What tools do I need to use the data?

You will need:

  • a computer
  • a fixed IP address (for security reasons)
  • client software that can connect to the database and perform the analysis or visualisation that I want. 

This guide will explain how to use two different open source (free) software packages, but many others are available,

PgAdmin: a PostgreSQL client that allows you to ‘talk’ to the master database using SQL commands. This is very useful for exploring, extracting and summarising data. 

R statistical environment: This is powerful data analysis and visualisation software. It can be complex to learn how to use it, but is extremely powerful once you have mastered it.

QGIS: This is a geographical information system that is able to load data from a wide variety of data sources (including PostgreSQL), and display it on maps. 

Is direct database access secure?

There are several layers of security to ensure that your data is safe:

  • Fixed IP: You can only access your data from a single IP address that you have nominated. If anybody tries to access your account from a different IP address (i.e. from a different site) then they will be blocked. 
  • User name and password: To access your data you will need a user name and password, issued to you by Ausvet. This gives you access to your secure area of the PIISAC database. Of course, you need to protect this carefully. If other people discover your user name and password, they could access your data (but only if they access it from the authorised IP address – i.e. your computer). 
  • SSL encryption: All data transferred between the master database and your computer is encrypted using banking-level encryption standards. This makes it almost impossible for anybody to intercept the data while it is being transferred to your computer
  • Database permissions: Your account has been set up to provide you with permission to access only authorised data. Depending on the tool you use, you may be able to see that other tables and views exist, but you won’t be able to view the data in those tables and views. 

So how do I connect to the database?

Step 1: You will request access and tell Ausvet the fixed IP address that you will use to access the data.

Step 2: Ausvet will give you your security credentials which will include

  • Your user name
  • Your password
  • The IP address of the master database
  • The name of the database
  • The port used to access the database

Step 3: Use these details in your software to connect to the database, extract your data, and start analysing!

How to connect with different software?

This section contains instructions for three different software packages, depending on your needs. They are all based on the use of the Windows 10 operating system. For other systems, check the software website for instructions. 

pgAdmin

pgAdmin  4 is a PostgreSQL client, allowing communication with the PostgreSQL database using SQL commands. 

  1. Download the software (latest version, for PostgreSQL 10.5) from here (83 Mb) Install the software as usual for windows. 
  2. Start pgAdmin 4
  3. From the opening page, quick links, click on ‘Add New Server’
  4. In the Create – Server dialog:
    1. Enter a name for the server, for example ‘PIISAC’
    2. Click on the Connection tab
    3. Enter the port from your connection credentials
    4. For the Maintenance Database, enter the database name from your security credentials
    5. Enter your username
    6. Enter your password
    7. Do not check the ‘save password’ box
    8. Leave Role and Service blank
    9. Click on the SSL tab
    10. Change SSL Mode to ‘Require’
    11. Click ‘Save’
  5. On the browser tab click on the new database entry: PIISAC
  6. Click on the + sign next to Databases
  7. Click on the + sign next to Omnisyan
  8. Click on the + sign next to Schemas
  9. In the list of schemas, find the schema for your company or research project, and click on it
  10. Click on the + sign next the Views. This is a list of the data views that you have access to. 
  11. From the main menu, click on the Tools menu, then select ‘Query Tool’ to start writing an SQL query
  12. Enter the a query to extract data. For example (depending on the views you have access to) you could write: SELECT * from daily_records limit 10;
  13. Run the query by clicking on the the Run icon
  14. You will see the results of the query in the window. 

Note that pgAdmin 4 runs in a web browser. If you have any problems, you can try an older desktop version, pgAdmin 3

R

R is an advanced statistical environment with powerful analytical and graphical capabilities. It has become the de facto standard for publishing statistical tools. 

Part of the power of R is the enormous and constantly growing library of plug in tools (packages). These can be used to extend the system capabilities in terms of analysis, graphical outputs and connections. We will be using one of these packages to connect to PostgreSQL. 

R is a command-line driven tool, which means you need to know the right commands and syntax. The default installation has a simple graphical interface, but RStudio provides a more powerful interface, so we will be installing both the core R system, and RStudio. 

  1. Download the core R install files for Windows from a mirror site near you
  2. Install the software using the usual Windows method
  3. Download the RStudio interface from here and install as usual.
  4. Run RStudio
    1. From the Tools menu, click Install Packages
    2. Under Packages, enter RPostgres and click Install. Note, there is another package called RPostgreSQL, but this is not the one we want. 
  5. Connect to the PIISAC database from R
    1. The following R script illustrates the code to connect to the PIISAC database (using your connection credentials to replace the values in red), query the database, and display the results. You can use it as a template for your own data query and analysis.
# load the package to connect to the database
library(DBI)
# connect using the RPostgres connector, and your connection credentials
con <- dbConnect(RPostgres::Postgres(),
 dbname = 'database_name',
 host = 'IP_address', 
 port = port_number, 
 user = 'your_username', 
 sslmode = "require",
 password = 'your_password')

# ===========================================
# Example 1: number of records and date range
# ===========================================
# query the database
query <- dbSendQuery(con,
 "SELECT 
 count(*) as total_records,
 min(status_date) as start_date,
 max(status_date) as end_date
 FROM daily_records")

# get the results of the query
result <- dbFetch(query)

# display the result on the screen
result

# clear the query
dbClearResult(query)

# ===========================================
# Example 2: list the first 5 rows of the table
# ===========================================
query2 <- dbSendQuery(con, "SELECT * FROM daily_records limit 5")
result2 <- dbFetch(query2)
result2
dbClearResult(query2)

# ===========================================
# Example 3: query SRS mortality 
# by month of production, for 2017
# =========================================== 
query3 <- dbSendQuery(con, "SELECT srs_mort, 
 1+(productionday/30) as prod_month
FROM daily_records 
where date_part('year',status_date) = 2017")
result3 <- dbFetch(query3)

# plot the result 
plot(result3$prod_month, result3$srs_mort, 
 main="SRS mortality by production month",
 xlab="Production month", ylab="SRS mortality (fish)")
dbClearResult(query3)

# close the database connection when finished
dbDisconnect(con)

 

 

 

QGIS

QGIS is sophisticated geographical information system software for accessing spatial data, creating maps, and doing spatial data analysis. You will need an understanding of GIS concepts to use it effectively. The PIISAC PostgreSQL database incorporates PostGIS which enables you to use Spatial SQL to do powerful spatial data manipulation and analysis, in an SQL query. 

  1. Download and install QGIS using the normal approach. There are two installers available for Windows. The OSGeo4W is more flexible and powerful, but more complicated as well. The standalone installer for the most recent version of QGIS is usually the best one to use. 
  2. Once installed open the software.
  3. From the menu, select Layer, then Add layer, then Add PostGIS layer
  4. In the Data Source Manager dialog, under Connections, click the New button
  5. In the Create a New PostGIS Connection dialgog, enter the required information from your security credentials:
    1. Name: PIISAC 
    2. Service: leave blank
    3. Host: from your security credentials
    4. Port: from your security credentials
    5. Database: from your security credentials
    6. SSL Mode: Require
    7. Authentication: click on the Basic tab
      1. Enter your user name from your credentials, and click the Store check box
      2. Enter your password from your credentials but leave the Store checkbox unchecked. (You will have to enter your password each time you start a session, but this is more secure). 
    8. Click the Test Connection to make sure everything is working, then click OK
  6. Click Close. 
  7. There are two main ways to load data from the PIISAC database into a map in QGIS: loading a spatial layer (a table); or defining the spatial and other data you want using spatial SQL.
    1. Loading a spatial data layer from a table
      1. On the main QGIS window, find the Browser pane (usually top left), which lists the various possible data sources
      2. Find PostGIS and click on the arrow beside it to list the available connections
      3. Find the connection you created (PIISAC), and click on the arrow beside it to view the database schemas available. There will be many schemas listed but you don’t have access to most of them. The schemas that you need to use are your own dedicated schema (named after your company or research project); and the reference schema.
      4. Find your dedicated schema and click on the arrow to show the available tables
      5. Select one of the tables listed, and right-click. Then click the option Add selected layer(s) to canvas. The layer will appear on the map. 
      6. This approach is quick an easy, but only allows you to see the spatial data and other data in a single table. 
    2. Defining spatial data using SQL
      1. Using SQL allows you to join data from multiple tables, summaries it, and display it linked to the relevant spatial data. This is more complex but much more powerful. 
      2. From the main menu, select the Database menu, the DB Manager
      3. Click on the arrow next to PostGIS to list the available database connections
      4. Click on the arrow next to your connection (PIISAC) to open the database
      5. Click on the schemas to show the tables. Note that some table will not be available for you to access. 
      6. To write SQL to define the data you want, select Database from the main menu, and then SQL Window. Alternatively, click on the second icon. 
      7. Write your SQL in the top pane of the window. You can use Spatial SQL functions from PostGIS as well – look at the documentation
      8. Click the Execute button to run the SQL and see the results in the lower pane.
      9. Click the Load as new layer check box, then 
        1. Specify the key value column (i.e. the unique record ID)
        2. Specify the geometry column
        3. Click Load to display the layer with the selected attributes on the main map screen. (It will take a while for large datasets or complex spatial data to be downloaded from the server to your computer, so be patient). 
      10. You can then change the style of the features of your map as you wish. 

Example of spatial SQL to produce as simple map of total antibiotic usage by site for one year. 

select 
 row_number() over () as gid, 
 s.geom as geom,
 a.sitename, 
 sum(dose) as total_antibiotics
from antibiotics a
join sites s on s.sitename = a.sitename
where a.status_date between '2016-01-01' and '2017-01-01'
group by a.sitename, s.geom

What if I don’t have a fixed IP address?

By limiting access to specific IP addresses, we are able to prevent a huge proportion of data hacking attempts, and dramatically improve security.  By allowing access from any IP address, your data is at much greater risk. 

If you don’t have a static IP address, your options are:

  1. Contact your internet service provider and request moving to a static IP address.
  2. Check with your provider what the IP address range that may addressed to you is. Your can be assigned a group of IP addresses instead of just a single one. While this is still not ideal, it is much better no protection at all. 
  3. If you are only using your access occasionally, you could contact Ausvet to updated your IP address when you need to use the system (but you will lose access when your IP address changes). This is not ideal either, but may be adequate for very occasional access. 
  4. Ask Ausvet to remove the IP address security control. Sorry, we are responsible for the safety of your data, and we will say ‘No’.