-->

Pages

Friday, January 12, 2018

Birding Data

Warning: this is a long post. One day I'll get around to explaining things in more detail, but for now I just wanted to get this down while it is still fresh in my mind.

A few years back my family decided to do a Big Year competition. Whoever spotted the most species after 1 year would win. We used the Audobon birding app as a means of keeping track of totals. The app is good because it also has a birding directory and a nice social element to it, we could post photos and add notes to the bird we logged. It also kept track of where the bird was logged, and I believe it passed on that data to ebird, so we were doing some citizen science work at the same time. There were a couple of things that we didn't like about the app method. There was no way to sort the birds, so finding duplicates was very tedious. After a year of looking at birds you can't really remember if you already saw some of them. The data had a desktop version that was, and is, very clunky to use. This last year we wanted to do another Big Year, but we wanted a different method for logging birds. We looked around at different options but I didn't like any of them, so I set out to create my own system for logging birds.

I created an applet through IFTTT that used the 'note' and 'google sheet' widgets.Every time you added a note a new row was added to a google sheet with the note, date, latitude, and longitude. I created a shared google folder with all the sheets and had everyone add the applet to their phone. Now, as people saw new birds they could log it in the applet and all the data was saved in a spreadsheet that would be easy to manipulate, sort, identify duplicates, etc.

There was no method for logging extra notes about your sighting, unless you added something in along with the bird name, so I created a shared instagram account for sharing pictures of birds. Now we had a solid substitute for the Audobon app that gave us complete control of the data, kept everything we liked, and fixed a lot of the issues we didn't like. The only thing I couldn't recreate was adding our data to ebird, that would have to be done separately.

I needed a method for displaying our leaderboard and I also wanted a notification system for notifying everyone in the group when someone logged a bird. For the leaderboard I found google sites to be a very simple way of displaying data on a public site. I created a new leader board sheet that pulled in data from everyone's individual sheets and sorted it daily using a custom script:

function sortdaily(){
  // select correct sheet
   var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheets()[0];

  // select range (exclude top row)
  var range0 = sheet.getRange("B2:C16");
  // actually do the sorting
  range0.sort({column: 3, ascending: false});

  // select range (exclude top row)
  var range1 = sheet.getRange("D2:F16");
  // actually do the sorting
  range1.sort({column: 5, ascending: false});

  // select range (exclude top row)
  var range2 = sheet.getRange("G2:I16");
  // actually do the sorting
  range2.sort({column: 8, ascending: false});

Sorting by each category, so this script goes on and on, but you get the idea.

I set up multiple categories so that we could have multiple winners and goals to work for. We had members living in more bird diverse areas, others that traveled a lot, it wasn't fun to compete when it was so easy for some. I set up categories for most birds, most days logging a bird, ave birds per logged day, biggest day, biggest month, largest area, smallest area, birds per sq mile, furthest north, south, east, and west, distance traveled, biggest state, most unique states.  As time went on the leader board got bigger and bigger. At this point I'm simply annoying my family with updates. I don't think anyone actually checks my website anymore.

Here's my code for taking a latitude and longitude and returning a state:
function reverseGeocode(addressType, lat, lng) {
    Utilities.sleep(1500);

    if (typeof addressType != 'string') {
        throw new Error("addressType should be a string.");
    }

    if (typeof lat != 'number') {
        throw new Error("lat should be a number");
    }

    if (typeof lng != 'number') {
        throw new Error("lng should be a number");
    }
    var response = Maps.newGeocoder().reverseGeocode(lat, lng),
        key      = '';
    response.results.some(function (result) {
        result.address_components.some(function (address_component) {
            return address_component.types.some(function (type) {
                if (type == addressType) {
                    key = address_component.long_name;
                    return true;
                }
            });
        });
    });
    return key;
}
Now you take a cell in google sheets with this argument "=reverseGeocode("administrative_area_level_1",D1,E1)" and the state that the point was logged in is returned.

To pull distance from latitude and longitude you must first convert degrees to radians and then you can get a distance. I used this script:

function distance() { var sheet = SpreadsheetApp.getActiveSheet(); var range = sheet.getDataRange(); var row = 0 var total = sheet.getSheetValues(1, 8, 1, 1) var totalend = total -1 Logger.log(totalend) var totald = 0 while (row < totalend){ var row = row + 1 var row2 = row + 1 var lat1 = sheet.getSheetValues(row, 4, 1, 1) var lat2 = sheet.getSheetValues(row2, 4, 1, 1) var lon1 = sheet.getSheetValues(row, 5, 1, 1) var lon2 = sheet.getSheetValues(row2, 5, 1, 1) if (lat1>1 && lat2 >1 && lon1 <-1 && lon2 <-1){ Logger.log(lat1) Logger.log(lat2) Logger.log(lon1) Logger.log(lon2) var R = 6371000; // metres var φ1 = (lat1/180)*Math.PI//.toRadians(); var φ2 = (lat2/180)*Math.PI//.toRadians(); var Δφ = ((lat2-lat1)/180)*Math.PI//.toRadians(); var Δλ = ((lon2-lon1)/180)*Math.PI//.toRadians(); var a = Math.sin(Δφ/2) * Math.sin(Δφ/2) + Math.cos(φ1) * Math.cos(φ2) * Math.sin(Δλ/2) * Math.sin(Δλ/2); var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a)); var d = R * c; Logger.log(d) var totald = totald + d Logger.log(totald) } } var cell = sheet.getRange(9,8); cell.setValue(totald); var cellmiles = sheet.getRange(10,8); cellmiles.setValue(totald*0.000621371); }

And then added up the distance between every bird logged. I used a variation of this script to also determine the biggest and smallest area so as to reward those who were and weren't travelling as much:
function mbr_distance() { var sheet = SpreadsheetApp.getActiveSheet(); var range = sheet.getDataRange(); //grab the corners of the MBR var minx = sheet.getSheetValues(6, 18, 1, 1) var miny = sheet.getSheetValues(6, 19, 1, 1) var maxx = sheet.getSheetValues(4, 18, 1, 1) var maxy = sheet.getSheetValues(4, 19, 1, 1) Logger.log(minx) Logger.log(miny) Logger.log(maxx) Logger.log(maxy) //convert degrees to meters and distance for horizontal line of rectangle and store in spreadsheet var R = 6371000; // metres var φ1 = (miny/180)*Math.PI//.toRadians(); var φ2 = (maxy/180)*Math.PI//.toRadians(); var Δφ = ((maxy-miny)/180)*Math.PI//.toRadians(); var Δλ = ((minx-minx)/180)*Math.PI//.toRadians(); var a = Math.sin(Δφ/2) * Math.sin(Δφ/2) + Math.cos(φ1) * Math.cos(φ2) * Math.sin(Δλ/2) * Math.sin(Δλ/2); var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a)); var d1 = R * c; Logger.log(d1) var cell = sheet.getRange(2,19); //convert to miles cell.setValue(d1*0.000621371) //convert degrees to meters and distance for vertical line of rectangle and store in spreadsheet var R = 6371000; // metres var φ1 = (maxy/180)*Math.PI//.toRadians(); var φ2 = (maxy/180)*Math.PI//.toRadians(); var Δφ = ((maxy-maxy)/180)*Math.PI//.toRadians(); var Δλ = ((minx-maxx)/180)*Math.PI//.toRadians(); var a = Math.sin(Δφ/2) * Math.sin(Δφ/2) + Math.cos(φ1) * Math.cos(φ2) * Math.sin(Δλ/2) * Math.sin(Δλ/2); var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a)); var d2 = R * c; Logger.log(d2) var cell = sheet.getRange(2,18); //convert to miles cell.setValue(d2*0.000621371) }
My categories were getting bigger and bigger, here is how the individual sheets now look:

To notify people as birds were logged I got 'maker' status in IFTTT which allowed for me to make more complicated applets. I made a new applet that updated a google sheet AND a slack channel. So now whoever wants to subscribe to the slack channel can get real time updates of the birds the group is logging.

I created columns that would split up the date given to me by IFTTT. The format given was like so "January 06, 2018 at 03:06PM". I used a sheet function to create a new column that would take the first three characters of the month and put that in a new column and another column that removed the time. The two new columns for the example would be "Jan" and "January 06, 2018". This allowed me to create new categories based on months and days. I have to manually run this function on each row as new birds were logged. I could have automated it, but I enjoyed seeing everybody's new birds and this gave me an excuse to do that. I quickly lost track of which sheets needed updating, and with 15 sheets to manage I needed help. I created another value in the sheet that would subtract the total rows in one column filled out by IFTTT from another column that holds my manual updates. If the value was 0 the sheet was up to date, if it was more than 0 I needed to run the function over new rows. The 'update' values were all pulled in the spreadsheet and sorted along with all the other categories. I created another javascript program that would alert me by email when a sheet was more than 10 birds outdated and needed manual update:

function sendEmailalert() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var range = sheet.getDataRange();
  var flag = sheet.getSheetValues(2, 41, 1, 1);
  Logger.log(flag)
  if (flag > 9)
  {var emailAddress = "clark929@gmail.com";
   var message = "Dear Future Adam," + '\n' + "I hope this finds you well, this letter is to inform you that the birding database has grown out of control and needs weeding." + '\n' + "-Killick"
   var subject = "Here we go again";
   MailApp.sendEmail(emailAddress, subject, message);}
}
Now that the year is over I may automate the update for those that want to keep using this service.

For the end of the year I wanted to do something fun with the data. I created maps for each person based on their birds with lines going in order from bird to bird. I also created a timelapse map of the birds logged throughout the year. Converting all the sheets into point and line files and converting them to kml used a python script I created:

from arcpy.sa import *
import glob
from shutil import copy2
import shutil
arcpy.CheckOutExtension('spatial')
env.workspace = r'H:\TEST\bird2017\big_year_2017_12_19\Big Year 2017\csv\test'
files = glob.glob(r'H:\TEST\bird2017\big_year_2017_12_19\Big Year 2017\csv\test\*.csv')
sr = arcpy.SpatialReference(4326)
print files
for file in files:
    #make sure we have the right files
    print file
    print file[:-4]
    #create variables
    output = file[:-4] + "_points"
    output_mbr = file[:-4] + "_mbr"
    output_lines = file[:-4] + "_lines"
    print output
    #print saved_output
    #create points from lat long and save as a layer file
    arcpy.MakeXYEventLayer_management(file, "lat_xstart", "long_ystart", output)
    arcpy.SaveToLayerFile_management(output, output)
    print file[-13:] + " points created"
    #find the minimum bounding rectangle of these points and save as a shapefile and then a layer file
    arcpy.MinimumBoundingGeometry_management (output, output_mbr, "ENVELOPE")
    arcpy.MakeFeatureLayer_management (output_mbr + ".shp", output_mbr)
    arcpy.SaveToLayerFile_management(output_mbr, output_mbr)
    print file[-13:] + " MBR created"
    #create line segments from each point and save as a shapefile and then as a layer file
    arcpy.XYToLine_management (file, output_lines, "lat_xstart", "long_ystart", "lat_xend", "long_yend", "GEODESIC", "fid", sr)
    arcpy.MakeFeatureLayer_management (output_lines + ".shp", output_lines)
    arcpy.SaveToLayerFile_management(output_lines, output_lines)
    print file[-13:] + " Lines created"
    ##convert to kml
    arcpy.LayerToKML_conversion (output + ".lyr", output + ".kmz")
    arcpy.LayerToKML_conversion (output_mbr + ".lyr", output_mbr + ".kmz")
    arcpy.LayerToKML_conversion (output_lines + ".lyr", output_lines + ".kmz")

Here is the end product of that script for my sheet:

Here is the R script used to create the timelapse map, followed by the gif created:
library(ggplot2)
library(maps)
library(ggthemes)
library("data.table", lib.loc="~/R/R-3.3.2/library")
library(tibble)
library(lubridate)
library(gganimate)
library(animation)

##set workspace where all csvs are located
setwd("H:/TEST/bird2017/big_year_2017_12_19/Big Year 2017")
## read in all csvs from GDAL zonal script
county <- map_data("county")
states <- map_data("state")
ut_county_df <-subset(county, region =="utah")
ut_df <- subset(states, region == "utah")
ut_df
fl_df <- subset(states, region == "florida")
fl_df
birds = fread("H:/TEST/bird2017/big_year_2017_12_19/Big Year 2017/allbirds_merge_no_ak_clean.csv")
hist(birds$julian, breaks=c(30))
utbirds = fread("H:/TEST/bird2017/big_year_2017_12_19/Big Year 2017/allbirds_merge_utah_clean.csv")
utbirds = fread("H:/TEST/bird2017/big_year_2017_12_19/Big Year 2017/allbirds_merge_utah_clean_onlybirddates.csv")

flbirds = fread("H:/TEST/bird2017/big_year_2017_12_19/Big Year 2017/allbirds_merge_florida_clean.csv")
birds_cumsum = fread("H:/TEST/bird2017/big_year_2017_12_19/Big Year 2017/allbirds_merge_cumulative_sum.csv")
plot(birds_cumsum)
cumsum(birds_cumsum$sum)
plot(cumsum(birds_cumsum$sum),xlab="Julian Date", ylab="Total Number of birds", pch=20, panel.first = grid())
##works for ggplot
usa <- map_data("usa")
##works for gganimate
usa <- ggplot() + borders("state", colour = "white", fill = "gray") + theme_map()


##plots all sites with USA borders
ggplot() + geom_polygon(data = usa, aes(x=long, y = lat, group = group)) + geom_point(aes(x = long, y = lat), data = birds, colour = 'red', alpha = .1, size = 5) + geom_point(aes(x = long, y = lat), data = birds, colour = 'black', alpha = 1, size = 1)
##plots UTAH borders.
ut_base<- ggplot() + geom_polygon(data = ut_df, aes(x=long, y = lat, group = group))+geom_polygon(data = ut_county_df, aes(x=long, y = lat, group = group), fill = NA, color = "white") + coord_fixed(1.3)# + geom_point(aes(x = long, y = lat), data = utbirds, colour = 'red', alpha = .1, size = 5) + geom_point(aes(x = long, y = lat), data = utbirds, colour = 'black', alpha = 1, size = 1)
ut_base
##plots FLL borders
fl_base<- ggplot() + geom_polygon(data = fl_df, aes(x=long, y = lat, group = group))+ coord_fixed(1.3)# + geom_point(aes(x = long, y = lat), data = utbirds, colour = 'red', alpha = .1, size = 5) + geom_point(aes(x = long, y = lat), data = utbirds, colour = 'black', alpha = 1, size = 1)
fl_base
#plots all sites with borders. Used as input for gganimate
map <-usa + geom_point(aes(x = long, y = lat, frame = julian, cumulative = TRUE), data = birds, colour = 'red', alpha = .1, size = 5) +  geom_point(aes(x = long, y = lat, frame = julian, cumulative = TRUE), data = birds, colour = 'black', alpha = 1, size = 1)
utmap <-ut_base + geom_point(aes(x = long, y = lat, frame = julian, cumulative = TRUE), data = utbirds, colour = 'red', alpha = .1, size = 5) +  geom_point(aes(x = long, y = lat, frame = julian, cumulative = TRUE), data = utbirds, colour = 'black', alpha = 1, size = 1)
utmap
flmap <-fl_base + geom_point(aes(x = long, y = lat, frame = julian, cumulative = TRUE), data = flbirds, colour = 'red', alpha = .1, size = 5) +  geom_point(aes(x = long, y = lat, frame = julian, cumulative = TRUE), data = flbirds, colour = 'black', alpha = 1, size = 1)
flmap
ani.options(interval = 0.2)
gganimate(map, interval = .2, ani.width = 600, ani.height = 400)
gganimate(utmap, interval = .2)
gganimate(flmap, interval = .2)

It's very messy, but I don't have the time to clean it up. Sometime in the future I'll have to clean up the code and explain better what is going on there. 
These gifs below scroll through the year, you can see the year and julian day in the top left. 
Scrolling julian date is in top left. Can you see when and where we took road trips?


This one doesn't scroll through every day of the year like the other two, but only skips to the days that logged a bird, making the animation much shorter.

Here are the totals by month created in Google Sheets:
Easy to see that we got a lot at the beginning of the year and during spring migration, but things tapered off after that.