This readme file was generated on 2023-04-21 by Carolin Vorstius.

GENERAL INFORMATION

Title of dataset: Climate change risk to raw water quality data set 2: Subset catchment characteristics & water quality summary
DOI: 10.15132/10000198
File name: DS2_SWCatchmentWaterQualitySubset.csv
Context of data creation: The data were created as part of a PhD thesis investigating links between catchment characteristics and water quality to assess impacts of climate change on raw water quality.
Description of dataset: The file contains information describing catchment characteristics for selected public water supply sources in Scotland, together with summary statistics of selected water quality indicators per catchment. The data were created using catchment boundaries, water quality data (2011 - 2016) from routine source water sampling, and freely available datasets describing natural and anthropogenic conditions (see section Methodological Information). The dataset comprises a subset of catchments provided by Scotland's public water supplier, Scottish Water.
Author:
  Name: Carolin Vorstius
  ORCID: 0000-0003-0627-3854
  Institution: University of Dundee
  Address: Nethergate, Dundee, DD1 4HN, Scotland, UK
  Email:
Supervisor:
  Name: John Rowan
  Institution: University of Dundee
  Email:
Date of data creation: 2016-10-01 – 2017-03-31
Geographic location: Scotland
Funding sources: The PhD was funded by the Scottish Government under the Hydro Nation Scholarship Programme.

SHARING/ACCESS INFORMATION

Licences: CC-BY (except water quality data)
Links to publications: Assessing and managing risks from climate change in drinking water supply sources – safeguarding raw water quality through improving catchment resilience (PhD thesis)
Links to other datasets: The following datasets were also created under the above named thesis:
  Climate change risk to raw water quality data set 1: Catchment characteristics; DOI 10.15132/10000197; “DS1_SWCatchmentdataall.csv”
  Climate change risk to raw water quality data set 3: Water quality (TOC, Iron, Manganese, Turbidity & Colour); DOI 10.15132/10000199; “DS3_TOCColourSeries.csv”
  Climate change risk to raw water quality data set 4: Colour time series; DOI 10.15132/10000200; “DS4_ColourTimeSeries20102016.csv”
  Climate change risk to raw water quality data set 5: TOC & climate data; DOI 10.15132/10000201; “DS5_TOCClimateSeries20132016.csv”
  Climate change risk to raw water quality data set 6: E. coli & rainfall data; DOI 10.15132/10000202; “DS6_EcoliRainfallSeries20102016.csv”
Data from other sources: To create this data, other source data has been used and licences need to be considered for reuse of this dataset (see links to datasets provided). To reuse water quality data, permission from Scottish Water needs to be sought.

METHODOLOGICAL INFORMATION

Methods for data creation: Catchment characteristics data created using an ESRI shapefile provided by Scottish Water for catchment boundaries of supply sources and datasets as described below. Data created in ArcGIS unless otherwise stated. Water quality summary statistics were calculated in R.

Variables:

GENERAL
ID: Catchment ID
Kind of source: River, Loch, Impounding reservoir; information provided with catchment boundaries.
Area: in km^2, calculated from shapefile.

TOPOGRAPHY: OS Terrain DEM (50m) downloaded from Digimap ( DEM tiles put together using the ‘Mosaic’ tool.
ElevationMean: Mean catchment elevation in m AOD, derived through ‘Zonal Statistics as Table’ (or ‘Zonal Statistics’ in QGIS).
ElevationMax: Maximum catchment elevation, as above.
ElevationMin: Minimum catchment elevation, as above.
ElevationReliefRatio: (Mean elevation – minimum elevation) / (maximum elevation – mean elevation)
BasinLength: Maximum length of river basin in m, calculated by putting a rectangle (by width) around the catchment using the ‘Minimum Bounding Geometry’ tool (add geometry characteristics).
ReliefRatio: (Maximum elevation – Mean elevation) / Maximum length of river basin
ElongationRatio: Ratio of the diameter of a circle with the same area as that of the basin, to the maximum basin length.
SlopeLittle: Percentage of slopes in the catchment with 0-3°. Calculated using the ‘Slope’ tool, raster then converted to shapefile, percentage for each catchment calculated using ‘Tabulate Intersection’.
SlopeModerate: Percentage of slope in the catchment with 4-16°. Calculated as above.
SlopeSteep: Percentage of slope in the catchment with >16°. Calculated as above.
Aspect: Percentage of aspects in the catchment facing south or southwest (158°-247°). Calculated using the ‘Aspect’ tool, raster reclassified into 0 (0-157 and 248-360 degrees) and 1 (158-247 degrees, South and Southwest), then converted into shapefile, percentage for each catchment calculated using ‘Tabulate Intersection’.

CONTINENTALITY
DistanceSea: Distance in m to nearest sea. Using polyline file of the Scottish coastline (, derived using the ‘Generate Near Table’ tool, using the coastline as ‘Near feature’.
AnnualTemperatureRange: Taken as the difference between the maximum and the minimum mean monthly temperature.
Conrads: Conrad's continentality index (CCI). Hyper-oceanic when the CCI is between -20 and 20; oceanic/maritime when the CCI is between 20 and 50; sub-continental when the CCI is between 50 and 60; continental when the CCI is between 60 and 80; and extreme/hyper-continental when the CCI is between 80 and 120 (Gadiwala et al., 2013). Calculated as (1.7 * annual temperature range / sin(latitude + 10)) - 14 (Conrad, 1946; Snow, 2005). Latitude derived through converting the XY coordinates of the centroids of the catchment polygons on

GEOLOGY: BGS Geology 625k ( Converted shapefile into raster using the overall class, then reclassified raster into smaller classes. Then converted into shapefile and used ‘Tabulate Intersection’ to derive percentages for each catchment.
GeologyIgneous: Percentage of igneous and metamorphic bedrock in the catchment.
GeologyLimestone: Percentage of limestone bedrock in the catchment.
GeologySandstone: Percentage of sandstone bedrock in the catchment.
GeologySedimentary: Percentage of other sedimentary bedrock in the catchment.

SOILS: 1:250,000 Soil Map (National Soil Map) (
HOST1: Percentage of catchment with very well drained soils (includes HOST classes 1,2,3,4,5,11). Derived using ‘Tabulate Intersection’ on the HOST class and percentages added.
HOST2: Percentage of catchment with well drained soils (includes HOST classes 9,10,14,16,17). As above.
HOST3: Percentage of catchment with poorly drained soils (includes HOST classes 6,7,8,15,18,21,24,25). As above.
HOST4: Percentage of catchment with very poorly drained soils (includes HOST classes 12,19,20,22,23,26,27,28,29). As above.
TOCAverage: Topsoil organic carbon content, average over the catchment. Derived from map of topsoil organic carbon concentration (, percentage of different values per catchment determined using ‘Tabulate Intersection’.
BFIAverage: Baseflow index value averaged over the catchment. Percentage of different values per catchment determined using ‘Tabulate Intersection’.
SPRAverage: Standard percentage runoff value averaged over the catchment. As above.
Dominant BFI: Baseflow index value that dominates (highest percentage) in the catchment. As above.
Dominant SPR: Standard Percentage Runoff that dominates (highest percentage) in the catchment. As above.
Peat: Percentage of the catchment area with soils with a peaty main component. Selected all polygons from the soil shapefile that had a peat main component and saved as new shapefile, then ‘Tabulate Intersection’ for deriving the percentage per catchment.
ErodedPeat: Percentage of catchment area with eroded peat. Selected all polygons that had Eroded Peat included in description and saved as new shapefile, then ‘Tabulate Intersection’ for deriving the percentage per catchment.

LAND COVER AND USE
Urban: Percentage of catchment area under urban land use. CEH Land cover map (shapefile) 2007, downloaded from Digimap. Converted shapefile into raster, reclassified into groups, then converted to shapefile and percentage of cover per catchment determined using the ‘Tabulate Intersection’ tool.
Water: Percentage of catchment area covered by water. As above.
MixedWood: Percentage of catchment area with mixed woodland. As above.
Arable: Percentage of catchment area under arable agricultural use. As above.
ImprovedGrass: Percentage of catchment area with improved grassland cover. As above.
SemiNat: Percentage of catchment area with semi-natural cover. As above.
DecidWood: Percentage of catchment area with deciduous woodland. As above.
ConifWood: Percentage of catchment area with coniferous woodland. As above.
Water15: Percentage of catchment area covered by water. CEH Land cover map (shapefile) 2015, downloaded from Digimap. Converted shapefile into raster, reclassified into groups, then converted to shapefile and percentage of cover per catchment determined using the ‘Tabulate Intersection’ tool.
Other15: Percentage of catchment area with semi-natural cover. As above.
Heather15: Percentage of catchment area with heathland cover. As above.
Conif15: Percentage of catchment area with coniferous woodland. As above.
Decid15: Percentage of catchment area with deciduous woodland. As above.
Urban15: Percentage of catchment area under urban land use. As above.
Arable15: Percentage of catchment area under arable agricultural use. As above.
Imprgrass15: Percentage of catchment area with improved grassland cover. As above.
PrimeLand8120: Percentage of catchment area with a land capability class 1, 2, or 3.1 (prime land), calculated as in Brown et al. (2008), using climate data 1981-2000.
PrimeLand2050mean: Percentage of catchment area with a land capability class 1, 2, or 3.1 (prime land), calculated as in Brown et al. (2008), using UKCP18 climate data, regional model, PPE mean, 2041-2060.
LCA3_5_8120: Percentage of catchment area with a land capability class 3.2, 4, or 5, calculated as in Brown et al. (2008), using climate data 1981-2000.
LCA3_5_2050mean: Percentage of catchment area with a land capability class 3.2, 4, or 5, calculated as in Brown et al. (2008), using UKCP18 climate data, regional model, PPE mean, 2041-2060.
PrimeLandIncl3_8120: Percentage of catchment area with a land capability class 1, 2, 3.1 or 3.2, calculated as in Brown et al. (2008), using climate data 1981-2000.
LCA4_5_8120: Percentage of catchment area with a land capability class 4 or 5, calculated as in Brown et al. (2008), using climate data 1981-2000.
PrimeLandIncl3_2050mean: Percentage of catchment area with a land capability class 1, 2, 3.1 or 3.2, calculated as in Brown et al. (2008), using UKCP18 climate data, regional model, PPE mean, 2041-2060.
LCA4_5_2050mean: Percentage of catchment area with a land capability class 4 or 5, calculated as in Brown et al. (2008), using UKCP18 climate data, regional model, PPE mean, 2041-2060.
LCA4_8120: Percentage of catchment area with a land capability class 4, calculated as in Brown et al.
(2008), using climate data 1981-2000.
LCA4_2050mean: Percentage of catchment area with a land capability class 4, calculated as in Brown et al. (2008), using UKCP18 climate data, regional model, PPE mean, 2041-2060.
LCA5_8120: Percentage of catchment area with a land capability class 5, calculated as in Brown et al. (2008), using climate data 1981-2000.
LCA5_2050mean: Percentage of catchment area with a land capability class 5, calculated as in Brown et al. (2008), using UKCP18 climate data, regional model, PPE mean, 2041-2060.
Rough8120: Percentage of catchment area with a land capability class 6, calculated as in Brown et al. (2008), using climate data 1981-2000.
Rough2050mean: Percentage of catchment area with a land capability class 6, calculated as in Brown et al. (2008), using UKCP18 climate data, regional model, PPE mean, 2041-2060.
ProtectedArea: Percentage of catchment area under designation of either: Local Nature Reserve, National Nature Reserve, SSSI, SAC, Ramsar Wetlands, World Heritage Sites, Biosphere Reserves, Country Parks, SPA. Derived from individual shapefiles for conservation designations; Scottish Natural Heritage (now NatureScot; Shapefiles merged into one shapefile, then used ‘Tabulate Intersection’ to derive percentage per catchment.
Deer: Mean number of deer per km^2 in the catchment. Derived from deer count density polygons (Scottish Natural Heritage, now NatureScot,, identified using ‘Zonal Statistics as Table’.
Cattle: Average number of cattle per km^2 in the parish where the catchment is situated, calculated from Scottish Government Agricultural Statistics; provided for 2013-2016, per parish. Average number per km^2 calculated per parish, then percentage of area with average number per catchment calculated using ‘Tabulate Intersection’ and catchment average calculated.
Sheep: Average number of sheep per km^2 in the parish where the catchment is situated, as above.
SepticTank: Number of septic tanks in the catchment, from the SEPA septic tank register.

CLIMATE: 5km grid data from Met Office/CEDA Archive (
TempMeanAnnual: Mean annual temperature in °C, catchment average, long-term average over time series 1981-2010. Data reclassified into 100m raster, then Zonal Statistics in QGIS.
PrecTotAnnual: Monthly mean rainfall in mm, catchment average, long-term average over time series 1981-2010. As above.
PrecdaysAnnual: Mean number of days per month with rainfall >10 mm, catchment average, average over the years 2007-2011. For each year from 2007-2011: reclassified into 100m raster, then using Zonal Statistics in QGIS, and averaging over the five years.
pwsurplus81: Summer effective rainfall (excess of rainfall over potential evapotranspiration, accumulated daily from April - September (Brown, 2017)), averaged over the period 1981-2000, in mm.
aat55for810: Annual accumulated temperature over 5.5°C, averaged over the period 1981-2000.
pwsurplus50: Summer effective rainfall (excess of rainfall over potential evapotranspiration, accumulated daily from April - September (Brown, 2017)), averaged from projection from UKCP18, regional model, PPE mean, for the period 2041-2060, in mm.
aat55for205: Annual accumulated temperature over 5.5°C, averaged from projection from UKCP18, regional model, PPE mean, for the period 2041-2060.

WATER QUALITY: Sample data from 2011-2016 provided by Scottish Water from routine source water sampling. Permission to reuse must be sought from Scottish Water.
Alu5, Alu25, Alu50, Alu75, Alu95: 5th, 25th, 50th, 75th and 95th percentiles for aluminium concentration in µg Al/l.
Col5, Col25, Col50, Col75, Col95: 5th, 25th, 50th, 75th and 95th percentiles for colour in mg/l Pt/Co.
Iron5, Iron25, Iron50, Iron75, Iron95: 5th, 25th, 50th, 75th and 95th percentiles for iron concentration in µg Fe/l.
Mang5, Mang25, Mang50, Mang75, Mang95: 5th, 25th, 50th, 75th and 95th percentiles for manganese concentration in µg Mn/l.
Turb5, Turb25, Turb50, Turb75, Turb95: 5th, 25th, 50th, 75th and 95th percentiles for turbidity in NTU.
pH5, pH25, pH50, pH75, pH95: 5th, 25th, 50th, 75th and 95th percentiles for pH values.
Coli5, Coli25, Coli50, Coli75, Coli95: 5th, 25th, 50th, 75th and 95th percentiles for coliform concentration in CFU per 100 ml.
Ecoli5, Ecoli25, Ecoli50, Ecoli75, Ecoli95: 5th, 25th, 50th, 75th and 95th percentiles for E. coli concentration in CFU per 100 ml.
AluMean, AluSD, AluMin, AluMax: Mean value, standard deviation, minimum value and maximum value for aluminium concentrations in µg Al/l.
ColMean, ColSD, ColMin, ColMax: Mean value, standard deviation, minimum value and maximum value for colour in mg/l Pt/Co.
IronMean, IronSD, IronMin, IronMax: Mean value, standard deviation, minimum value and maximum value for iron concentrations in µg Fe/l.
MangMean, MangSD, MangMin, MangMax: Mean value, standard deviation, minimum value and maximum value for manganese concentrations in µg Mn/l.
TurbMean, TurbSD, TurbMin, TurbMax: Mean value, standard deviation, minimum value and maximum value for turbidity in NTU.
pHMean, pHSD, pHMin, pHMax: Mean value, standard deviation, minimum value and maximum value for pH values.
ColiMean, ColiSD, ColiMin, ColiMax: Mean value, standard deviation, minimum value and maximum value for coliform concentrations in CFU per 100 ml.
EcoliMan, EcoliSD, EcoliMin, EcoliMax: Mean value, standard deviation, minimum value and maximum value for E. coli concentrations in CFU per 100 ml.

ANALYSIS RESULTS: Various results from water quality and catchment characteristics analysis (see PhD thesis named above)
PC1, PC2, PC3, PC4, PC5: Scores for principal components 1-5 from a principal component analysis on median concentrations for all eight water quality parameters.
Cluster: Assigned cluster from a partitioning around medoids cluster analysis using median concentrations of all eight water quality parameters.
TOCMedian: Median total organic carbon concentrations (in mgC/l) from samples taken from 2013-2016 in Scottish Water's routine source water monitoring. Permission to reuse must be sought from Scottish Water.
TOC_RangedevMedian: (Maximum total organic carbon concentration - Minimum total organic carbon concentration) / Median total organic carbon concentration (in mgC/l), from samples taken from 2013-2016 in Scottish Water's routine source water monitoring. Permission to reuse must be sought from Scottish Water.
ColourCurveShape: Assigned category for shape of colour seasonal pattern, from a shape-based clustering analysis, when using the daily mean for 2011-2016.
ColourCurveShape2018: Assigned category for shape of colour seasonal pattern, from a shape-based clustering analysis, when using interpolated daily values for 2018.
ColourCurveShape2012: Assigned category for shape of colour seasonal pattern, from a shape-based clustering analysis, when using interpolated daily values for 2012.
Category_Ecoli: Assigned category from a rainfall sensitivity analysis using Spearman's rank correlation analysis with E. coli concentrations and different rainfall amount variables.
Category_TOC: Assigned category from a climate sensitivity analysis using Spearman's rank correlation analysis with total organic carbon concentrations and different rainfall amount and temperature variables.
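
ILLUSTRATIVE SKETCH (not part of the original data processing): Three of the variables described in this readme are defined by formulas: Conrads, ElongationRatio and TOC_RangedevMedian. The Python sketch below shows one way those formulas could be evaluated. It is not the code used to create the dataset, which was produced in ArcGIS, QGIS and R. All function names are hypothetical. It assumes that latitude is given in decimal degrees, that each continentality class includes its lower bound and excludes its upper bound (the readme does not state how boundary values are treated), and that Area (km^2) has to be converted to m^2 to match BasinLength (m).

```python
import math
import statistics


def conrad_index(annual_temp_range_c, latitude_deg):
    """Conrad's continentality index: 1.7 * A / sin(latitude + 10) - 14.

    Assumption: latitude is in decimal degrees, so the angle is converted
    to radians before taking the sine.
    """
    angle = math.radians(latitude_deg + 10.0)
    return 1.7 * annual_temp_range_c / math.sin(angle) - 14.0


def classify_conrad(cci):
    """Map a CCI value to the classes quoted from Gadiwala et al. (2013).

    Assumption: lower bound inclusive, upper bound exclusive.
    """
    classes = [
        (-20, 20, "hyper-oceanic"),
        (20, 50, "oceanic/maritime"),
        (50, 60, "sub-continental"),
        (60, 80, "continental"),
        (80, 120, "extreme/hyper-continental"),
    ]
    for lower, upper, label in classes:
        if lower <= cci < upper:
            return label
    return "outside quoted range"


def elongation_ratio(area_km2, basin_length_m):
    """Diameter of a circle with the basin's area, divided by basin length."""
    area_m2 = area_km2 * 1_000_000.0  # km^2 -> m^2 to match length in m
    diameter_m = 2.0 * math.sqrt(area_m2 / math.pi)
    return diameter_m / basin_length_m


def range_over_median(concentrations):
    """(maximum - minimum) / median, as described for TOC_RangedevMedian."""
    values = list(concentrations)
    return (max(values) - min(values)) / statistics.median(values)


if __name__ == "__main__":
    # Made-up inputs for illustration only; these are not dataset values.
    cci = conrad_index(annual_temp_range_c=10.0, latitude_deg=57.0)
    print(round(cci, 2), classify_conrad(cci))
    print(round(elongation_ratio(area_km2=10.0, basin_length_m=5000.0), 3))
    print(range_over_median([2.0, 4.0, 10.0]))
```

The inputs in the usage block are invented for illustration. To apply the functions to the dataset, the corresponding columns (AnnualTemperatureRange, Area, BasinLength) and the catchment centroid latitude would need to be supplied.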