--- title: "An introduction to winfapReader" output: rmarkdown::html_document: theme: cerulean toc: TRUE vignette: > %\VignetteIndexEntry{IntroVignette} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r,echo=FALSE} isRNRFA <- requireNamespace("rnrfa", quietly = TRUE) isZOO <- requireNamespace("zoo", quietly = TRUE) isHTTR <- requireNamespace("httr", quietly = TRUE) isUTF8 <- requireNamespace("utf8", quietly = TRUE) is1 <- (isHTTR & isUTF8) is2 <- (is1 & isRNRFA & isZOO) ``` ```{r,echo=FALSE,eval=isUTF8} library(utf8) ``` ```{r,echo=FALSE,eval=TRUE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) #'arg' should be one of “default”, “cerulean”, “journal”, “flatly”, “darkly”, “readable”, “spacelab”, “united”, “cosmo”, “lumen”, “paper”, “sandstone”, “simplex”, “yeti” ``` The `winfapReader` package contains functions to interact with the information on extremes of instantaneous river flow in the United Kingdom (UK) made available by the [National River Flow Archive](https://nrfa.ceh.ac.uk) (NRFA). These information underlie most flood risk estimation projects in the UK, which are typically carried out using the [Flood Estimation Handbook](https://www.ceh.ac.uk/our-science/projects/flood-estimation-handbook) (FEH) statistical method and their [updates](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/291096/scho0608boff-e-e.pdf) as implemented by the software [WINFAP](https://www.hydrosolutions.co.uk/software/winfap-4/). Consequently, the NRFA publishes and routinely updates files which are suitable to be read by WINFAP: the collection of these files is referred to as the peak flow dataset and can be found [here](https://nrfa.ceh.ac.uk/peak-flow-dataset). The winfapReader package allows the user to interact with three different file-types: + The .AM files which contain the Annual Maximum (AMAX) peaks: these correspond to the largest river flow event in any given water year (which runs from October 1st to September 30th) + The .CD3 files which contain the Catchment Descriptors: these correspond to a set of descriptors for the catchment upstream the gauging station and for the station itself. + The .PT files which contain the peaks over threshold: these correspond to all peaks which are larger than a given threshold. The threshold is fixed by the NRFA and it should be such that there is an average of 3 to 5 peaks over threshold (POT) events per water year. It has often been reported that the POT records in different stations have varying reliability: since most flood frequency estimation methods used in the UK rely on annual maxima the AMAX records go trough a higher scrutiny than then POT records. Users should treat the information about peaks over threshold with caution and thorough quality checks should be performed before analysing them. The `winfapReader` package allows you to read into your R session the `.AM`, `.CD3` and`.PT` files. Importantly it is aware of the typical structure of the files in which rejected annual maxima and missing period of records for the peaks over threshold are recorded, and merges this information with the flow records. This allows the user to have all useful information to decide which parts of the record to include in the analysis. Recently the NRFA has developed an [API](https://nrfa.ceh.ac.uk/news-and-media/news/nrfa-releases-api-improve-data-access) which allows for a programmatic interaction with their datasets: the information about annual maxima, catchment descriptors and peaks over threshold can also be retrieved using this API. Beside the information on extremes for flood frequency estimation the NRFA maintains and distributes daily river flow records and several other river flow related variables, such as catchment averaged rainfall: the [`rnrfa`](https://cran.r-project.org/package=rnrfa) package allows one to retrieve these information and more with its `rnrfa::gdf` and `rnrfa::get_ts` functions (see more on this at the end of the vignette). The `winfapReader` package focuses only on handling river flow extremes information and has two sets of functions: + the `read_amax`, `read_cd3` and `read_pot` functions read the information from the `.AM`, `.CD3` and `.PT` files *once these have been downloaded into a local folder* + the `get_amax`, `get_cd` and `get_pot` functions get the information from the API: these functions therefore only work when an internet connection is available It is difficult to showcase the use of the `read_*` functions since these rely on the location of the WINFAP files within the users' working environment. Only the use of the `get_*` function will be showcased below. For the annual maxima and peaks over threshold the two sets of functions give the same output. ```{r setup} library(winfapReader) ### the get_* functions only works once you are connected to the internet ### they also need one to have the library httr installed ### verify if you have the library with (!requireNamespace("httr", quietly = TRUE)) ### if FALSE install it with ### install.packages("httr") ``` ### The `get_amax`function The `get_amax` function allows one to obtain information on annual maxima from the NRFA. The `read_amax` function will produce the same output as the `get_amax` function but is based on the locally saved files. ```{r showAmax, eval=is1} if(curl::has_internet()) amaxEx <- get_amax(c(42003,72014)) names(amaxEx); class(amaxEx) # let's look at only one of these a42003 <- amaxEx[["42003"]] ## what is the output head(a42003) ``` For each station the function outputs a `data.frame` with information on the station number, the water year, the date in which the highest flow in the water year was recorded, the river flow value and the river stage value (when available) for all annual maxima recorded at a station. Moreover it gives the information on whether the NRFA has deemed the maximum in a given year to be reliable or whether this has been rejected. The function can query the API for more than one station at the time: in that case the output is a named list with each element corresponding to a station id. ### The `get_pot`function The `get_pot` function allows one to obtain information on peaks over threshold data from the NRFA. The `read_pot` function will produce the same output as the `get_pot` function but is based on the locally saved files. ```{r showPOT, eval=is1} if(curl::has_internet()) potEx <- get_pot(c(42003,72014)) names(potEx); class(potEx) # let's look at only one of these p42003 <- potEx[["42003"]] ## what is the output class(p42003); names(p42003) ``` For each station the function outputs a list with three elements: * `tablePOT`: a `data.frame` with all the recorded exceedances above the threshold in the NRFA record. In particular information on the exceedance date, water year, peak flow and river stage are given. ```{r showtablePOT, eval=is1} head(p42003$tablePOT) ## notice: several events in the 1982 no events in 1983 ``` * `WaterYearInfo`: a `data.frame` with information on the percentage of valid record in each water year in the record. The potPercComplete column is derived by calculating the percentage of days which are not included in the POT Gaps or the POT rejected headers in the NRFA .PT files. The column potThreshold gives the information of the flow threshold used to extract the peaks for the station: this is a constant for each station. ```{r showWaterYearInfo, eval=is1} head(p42003$WaterYearInfo) ``` * `dateRange` gives the range of dates spanned by the POT record. This range might be wider than the range of the dates in the `tablePOT` table since it records the period in which the station was operational and no threshold exceedances occurred. ```{r showdateRange, eval=is1} (p42003$dateRange) ``` The function has an argument `getAmax` which defaults to `FALSE`. If `getAmax = TRUE` then information on the annual maxima is included in the `WaterYearInfo` table. ```{r showWaterYearInfowithAmax, eval=is1} p42003withAmax <- get_pot(42003, getAmax = TRUE) head(p42003withAmax$WaterYearInfo, 10) ``` Notice that in the period when no POT records are available all POT related information are set to NA. On the other hand, the fact that the annual maximum in water year 1983 is below the threshold confirms that the fact that no POT record are present for that water year is related to low flows throughout the water year rather than a mistake in the POT record. Notice also that for several of the first years in the record the annual maxima values are rejected and the proportion of valid POT records (as shown by `potPercComplete`) is null: the early part of the record for this station has been deemed by the NRFA to be unreliable and any analysis of this flow record should probably discard the information till water year 1995. ### The `get_cd`function The `get_cd` function allows the user to obtain information on the station (for example its location) and on the catchment upstream the station itself (for example the catchment area and the annual mean altitude for the catchment). More detail on several of the catchment descriptors can be found on the [NRFA website](https://nrfa.ceh.ac.uk/about-data) and in the FEH. The function gives a slightly different set of information than the `read_cd3` function, due to the difference in information made available by the NRFA API. ```{r showCD, eval=is1} if(curl::has_internet()) cdEx <- get_cd(c(42003,72014)) names(cdEx); class(cdEx) # let's look at only one of these c42003 <- cdEx[["42003"]] ## what is the output class(c42003); names(c42003) ``` The function has an argument `fields` which governs the amount of information obtained from the API. If `fields = "feh"` (the default) only the basic information used in the FEH methods is output. If `fields="all"` a data.frame with 104 columns is output. This contains several information about the station and the catchment, including data availability, land cover information and much more. ```{r showCDall, eval=is1} if(curl::has_internet()) cd42003all <- get_cd(42003, fields = "all") names(cd42003all) ``` ### Combining the `winfapReader` and `rnrfa` packages The `rnrfa` package provides a unique way to query several types of data from the NRFA. Information about extremes can also be retrieved using the `rnrfa` package, although there are some differences in the output provided when the data of interest are the peaks over threshold records. The `rnrfa::catalogue` function allows one to pull the list of stations (and related metadata), falling within a given bounding box. The metadata retrieved by the function are similar to the ones derived from `winfapReader::get_cd`. This function can be used to identify the stations in an area for which peak flow information can be obtained with `winfapReader`. The code below for example identifies stations surrounding the city of Lancaster and then displays the annual maxima flow with red lines indicating Rejected flow values. Notice that you would need to have the `rnrfa` package installed for the code below to work. ```{r, eval=is2} ## Lancaster coordinates: 54.04, -2.8 ## let's look around the city rivLanc <- rnrfa::catalogue(bbox = list(lat_min = 54.04-0.2, lat_max = 54.04+0.2, lon_min = -2.8-0.2, lon_max = -2.8+0.2)) ### let's select stations which have been deemed to be suitable for pooling ### that's the highest quality flag for annual maxima table(rivLanc[,"feh-pooling"]) ### 5 stations are suitable for pooling rivLanc <- subset(rivLanc,subset = as.vector(rivLanc[,"feh-pooling",drop=TRUE])) rivLanc[,1:3] ### notice that rnrfa outputs a tibble and not a data.frame idLanc <- rivLanc[,"id",drop=TRUE] ## a vector of ids amaxLanc <- winfapReader::get_amax(idLanc) names(amaxLanc) ``` Now display the stations all together in a panel. ```{r, echo=TRUE, eval=is2} par(mfrow=c(2,3)) invisible( sapply(amaxLanc, function(x) with(x,plot(WaterYear,Flow, type="h",col=ifelse(Rejected,2,4), main = unique(Station))))) ``` The large events which have hit the area in 2015 can be seen in the flow series plots. The `rnrfa` package also allows to pull the annual maximum flow recorded at any station. To also obtain the information about the water year which the NRFA has deemed to be of poor quality and therefore rejected set the `full_info` argument to `TRUE`. ```{r, eval=is2} par(mfrow=c(1,1)) ### the annual maxima for 72014 from rnrfa maxflow72014 <- rnrfa::get_ts(72014, type = "amax-flow", full_info = TRUE) ### the annual maxima for 72014 from winfapReader xx <- amaxLanc[["72014"]][,c("Date","Flow","Rejected")] plot(xx[,"Flow"], maxflow72014[,"amax-flow"], xlab = "data from winfapReader", ylab = "data from rnrfa"); abline(0,1) ### same information which(xx$Rejected) ## but two years should be rejected which(maxflow72014$rejected == 1) ## same two years ``` To obtain the POT records in `rnrfa` use `type = "pot-flow"`: using the `full_info = TRUE` option ensures that a rejected flag is given for the periods in which the POT records have been found to be unreliable or missing (see the NRFA website for more details on this). The `rejected` flag is built using the same information used to build the `WaterYearInfo` table in the `winfapReader::get_pot` function. The additional information provided in `WaterYearInfo` is useful to identify the years in which no POT record is found because the records are missing/unreliable and not because the threshold was never exceeded. ```{r, eval=is2} par(mfrow=c(1,1)) # the pot records for 75001 from rnrfa pot75001 <- rnrfa::get_ts(75001, type = "pot-flow", full_info = TRUE) pot75001[9:12,] # using winfapReader p75001 <- get_pot(75001) p75001$tablePOT[9:12,] # the same peaks are identified p75001$WaterYearInfo[1:5,] ### but notice that 1975 had a low proportion missing records # the lack of data in 1975 is due to all flow being low ``` The two packages can be used together to retrieve different type of information about river flow: in the example below daily gauged flow for the Conder at Galgate (station 72014) is displayed together with annual maxima (which are extracted from the instantaneous river flow). The latter are typically larger and can be seen to start further in the past than the daily flow data. ```{r dayAndAmax, eval=is2} ### get daily data from NRFA daily72014 <- rnrfa::get_ts(72014, type = "gdf") ## make daily data into data.frame daily72014 <- data.frame(Day = zoo::index(daily72014), DFlow = as.vector(daily72014)) plot(xx[,c("Date","Flow")], col = ifelse(xx$Rejected, 2, 4), pch = 4, ylim =c(0,1.05*max(xx$Flow))) title(main = "The Conder at Galgate") points(daily72014, type="l") ```