Data providers (companies and other organisations) as well as authorised researchers can have direct access to their own data on the master PIISAC database. This allows any type of data extraction, analysis, mapping and reporting, which may enable analyses beyond those current available in existing data management systems.
- Who is this for?
- What data can I access?
- How do I get access?
- Does it cost?
- What skills do I need to use the data?
- What tools do I need to use the data?
- Is direct database access secure?
- So how do I connect to the database?
- How to connect with different software
- What if don’t have a fixed IP address?
Direct database access is designed for any data provider (company or other) or authorised researcher that needs low-level database access to authorised subsets of the PIISAC data.
You can only access a limited subset of the available data, depending on the situation:
- Company representative
- Full data from your own company, but no data from any other company. You may also have access to public data sources that have been integrated into PIISAC.
- Other data provider
- Full access to the data you have provided, but no data provided by companies or other data providers. You may also have access to public data sources that have been integrated into PIISAC.
- Authorised researcher
- You will be provided with access to a prepared data set relevant to your research. This is likely to contain data from multiple sources (for example, multiple companies), but be de-identified and filtered to include only the required fields.
You can request direct database access by asking the data manager (Ausvet).
For data providers (e.g. participating companies) there is no extra cost for accessing the data – this is included as part of your participating in the system.
For researchers, it is necessary to develop custom views to provide secure access to authorised subsets of the data. You will be charged a fee for the development of these views and setting up secure access.
When you use direct database access, you are linking directly to the master PostgreSQL database. In order to extract data, you will need at least basic skills in SQL (Structured Query Language). There is comprehensive documentation about PostgreSQL available, including detailed information on using the SELECT statement to extract data, and tutorial to get started.
In addition, you need to know how to use the data once it is extracted. You may want to summarise the data, do statistical analysis, or visualise the data using graphs or maps. You will therefore need to have an understanding of the skills for data analysis and visualisation, using your choice of software.
You will need:
- a computer
- a fixed IP address (for security reasons)
- client software that can connect to the database and perform the analysis or visualisation that I want.
This guide will explain how to use two different open source (free) software packages, but many others are available,
PgAdmin: a PostgreSQL client that allows you to ‘talk’ to the master database using SQL commands. This is very useful for exploring, extracting and summarising data.
R statistical environment: This is powerful data analysis and visualisation software. It can be complex to learn how to use it, but is extremely powerful once you have mastered it.
QGIS: This is a geographical information system that is able to load data from a wide variety of data sources (including PostgreSQL), and display it on maps.
There are several layers of security to ensure that your data is safe:
- Fixed IP: You can only access your data from a single IP address that you have nominated. If anybody tries to access your account from a different IP address (i.e. from a different site) then they will be blocked.
- User name and password: To access your data you will need a user name and password, issued to you by Ausvet. This gives you access to your secure area of the PIISAC database. Of course, you need to protect this carefully. If other people discover your user name and password, they could access your data (but only if they access it from the authorised IP address – i.e. your computer).
- SSL encryption: All data transferred between the master database and your computer is encrypted using banking-level encryption standards. This makes it almost impossible for anybody to intercept the data while it is being transferred to your computer
- Database permissions: Your account has been set up to provide you with permission to access only authorised data. Depending on the tool you use, you may be able to see that other tables and views exist, but you won’t be able to view the data in those tables and views.
Step 1: You will request access and tell Ausvet the fixed IP address that you will use to access the data.
Step 2: Ausvet will give you your security credentials which will include
- Your user name
- Your password
- The IP address of the master database
- The name of the database
- The port used to access the database
Step 3: Use these details in your software to connect to the database, extract your data, and start analysing!
This section contains instructions for three different software packages, depending on your needs. They are all based on the use of the Windows 10 operating system. For other systems, check the software website for instructions.
pgAdmin 4 is a PostgreSQL client, allowing communication with the PostgreSQL database using SQL commands.
- Download the software (latest version, for PostgreSQL 10.5) from here (83 Mb) Install the software as usual for windows.
- Start pgAdmin 4
- From the opening page, quick links, click on ‘Add New Server’
- In the Create – Server dialog:
- Enter a name for the server, for example ‘PIISAC’
- Click on the Connection tab
- Enter the port from your connection credentials
- For the Maintenance Database, enter the database name from your security credentials
- Enter your username
- Enter your password
- Do not check the ‘save password’ box
- Leave Role and Service blank
- Click on the SSL tab
- Change SSL Mode to ‘Require’
- Click ‘Save’
- On the browser tab click on the new database entry: PIISAC
- Click on the + sign next to Databases
- Click on the + sign next to Omnisyan
- Click on the + sign next to Schemas
- In the list of schemas, find the schema for your company or research project, and click on it
- Click on the + sign next the Views. This is a list of the data views that you have access to.
- From the main menu, click on the Tools menu, then select ‘Query Tool’ to start writing an SQL query
- Enter the a query to extract data. For example (depending on the views you have access to) you could write: SELECT * from daily_records limit 10;
- Run the query by clicking on the the Run icon
- You will see the results of the query in the window.
Note that pgAdmin 4 runs in a web browser. If you have any problems, you can try an older desktop version, pgAdmin 3.
R is an advanced statistical environment with powerful analytical and graphical capabilities. It has become the de facto standard for publishing statistical tools.
Part of the power of R is the enormous and constantly growing library of plug in tools (packages). These can be used to extend the system capabilities in terms of analysis, graphical outputs and connections. We will be using one of these packages to connect to PostgreSQL.
R is a command-line driven tool, which means you need to know the right commands and syntax. The default installation has a simple graphical interface, but RStudio provides a more powerful interface, so we will be installing both the core R system, and RStudio.
- Download the core R install files for Windows from a mirror site near you
- Install the software using the usual Windows method
- Download the RStudio interface from here and install as usual.
- Run RStudio
- From the Tools menu, click Install Packages
- Under Packages, enter RPostgres and click Install. Note, there is another package called RPostgreSQL, but this is not the one we want.
- Connect to the PIISAC database from R
- The following R script illustrates the code to connect to the PIISAC database (using your connection credentials to replace the values in red), query the database, and display the results. You can use it as a template for your own data query and analysis.
# load the package to connect to the database library(DBI) # connect using the RPostgres connector, and your connection credentials con <- dbConnect(RPostgres::Postgres(), dbname = 'database_name', host = 'IP_address', port = port_number, user = 'your_username', sslmode = "require", password = 'your_password') # =========================================== # Example 1: number of records and date range # =========================================== # query the database query <- dbSendQuery(con, "SELECT count(*) as total_records, min(status_date) as start_date, max(status_date) as end_date FROM daily_records") # get the results of the query result <- dbFetch(query) # display the result on the screen result # clear the query dbClearResult(query) # =========================================== # Example 2: list the first 5 rows of the table # =========================================== query2 <- dbSendQuery(con, "SELECT * FROM daily_records limit 5") result2 <- dbFetch(query2) result2 dbClearResult(query2) # =========================================== # Example 3: query SRS mortality # by month of production, for 2017 # =========================================== query3 <- dbSendQuery(con, "SELECT srs_mort, 1+(productionday/30) as prod_month FROM daily_records where date_part('year',status_date) = 2017") result3 <- dbFetch(query3) # plot the result plot(result3$prod_month, result3$srs_mort, main="SRS mortality by production month", xlab="Production month", ylab="SRS mortality (fish)") dbClearResult(query3) # close the database connection when finished dbDisconnect(con)
QGIS is sophisticated geographical information system software for accessing spatial data, creating maps, and doing spatial data analysis. You will need an understanding of GIS concepts to use it effectively. The PIISAC PostgreSQL database incorporates PostGIS which enables you to use Spatial SQL to do powerful spatial data manipulation and analysis, in an SQL query.
- Download and install QGIS using the normal approach. There are two installers available for Windows. The OSGeo4W is more flexible and powerful, but more complicated as well. The standalone installer for the most recent version of QGIS is usually the best one to use.
- Once installed open the software.
- From the menu, select Layer, then Add layer, then Add PostGIS layer.
- In the Data Source Manager dialog, under Connections, click the New button
- In the Create a New PostGIS Connection dialgog, enter the required information from your security credentials:
- Name: PIISAC
- Service: leave blank
- Host: from your security credentials
- Port: from your security credentials
- Database: from your security credentials
- SSL Mode: Require
- Authentication: click on the Basic tab
- Enter your user name from your credentials, and click the Store check box
- Enter your password from your credentials but leave the Store checkbox unchecked. (You will have to enter your password each time you start a session, but this is more secure).
- Click the Test Connection to make sure everything is working, then click OK
- Click Close.
- There are two main ways to load data from the PIISAC database into a map in QGIS: loading a spatial layer (a table); or defining the spatial and other data you want using spatial SQL.
- Loading a spatial data layer from a table
- On the main QGIS window, find the Browser pane (usually top left), which lists the various possible data sources
- Find PostGIS and click on the arrow beside it to list the available connections
- Find the connection you created (PIISAC), and click on the arrow beside it to view the database schemas available. There will be many schemas listed but you don’t have access to most of them. The schemas that you need to use are your own dedicated schema (named after your company or research project); and the reference schema.
- Find your dedicated schema and click on the arrow to show the available tables
- Select one of the tables listed, and right-click. Then click the option Add selected layer(s) to canvas. The layer will appear on the map.
- This approach is quick an easy, but only allows you to see the spatial data and other data in a single table.
- Defining spatial data using SQL
- Using SQL allows you to join data from multiple tables, summaries it, and display it linked to the relevant spatial data. This is more complex but much more powerful.
- From the main menu, select the Database menu, the DB Manager
- Click on the arrow next to PostGIS to list the available database connections
- Click on the arrow next to your connection (PIISAC) to open the database
- Click on the schemas to show the tables. Note that some table will not be available for you to access.
- To write SQL to define the data you want, select Database from the main menu, and then SQL Window. Alternatively, click on the second icon.
- Write your SQL in the top pane of the window. You can use Spatial SQL functions from PostGIS as well – look at the documentation.
- Click the Execute button to run the SQL and see the results in the lower pane.
- Click the Load as new layer check box, then
- Specify the key value column (i.e. the unique record ID)
- Specify the geometry column
- Click Load to display the layer with the selected attributes on the main map screen. (It will take a while for large datasets or complex spatial data to be downloaded from the server to your computer, so be patient).
- You can then change the style of the features of your map as you wish.
- Loading a spatial data layer from a table
Example of spatial SQL to produce as simple map of total antibiotic usage by site for one year.
select row_number() over () as gid, s.geom as geom, a.sitename, sum(dose) as total_antibiotics from antibiotics a join sites s on s.sitename = a.sitename where a.status_date between '2016-01-01' and '2017-01-01' group by a.sitename, s.geom
By limiting access to specific IP addresses, we are able to prevent a huge proportion of data hacking attempts, and dramatically improve security. By allowing access from any IP address, your data is at much greater risk.
If you don’t have a static IP address, your options are:
- Contact your internet service provider and request moving to a static IP address.
- Check with your provider what the IP address range that may addressed to you is. Your can be assigned a group of IP addresses instead of just a single one. While this is still not ideal, it is much better no protection at all.
- If you are only using your access occasionally, you could contact Ausvet to updated your IP address when you need to use the system (but you will lose access when your IP address changes). This is not ideal either, but may be adequate for very occasional access.
- Ask Ausvet to remove the IP address security control. Sorry, we are responsible for the safety of your data, and we will say ‘No’.