Plotting schools Port Elizabeth using shell scripting
Earlier this year I was in Port Elizabeth, South Africa to help the organization Ready4Life with their Smart School Network project.
To gain an overview of all the schools that can be connected to the Smart School Network we wanted to plot all the schools in Port Elizabeth and the surrounded townships onto Google Maps.
The script explained in this post can be found on the wiki.
My South African IT colleague was able to find his way on the ‘net’ and found a list of all the schools in Port Elizabeth. As you can see in the image the list of schools he found had an suburb for every school.
In this overview there was not an exact address or sometimes there was no suburb provided for a school. We could solve this challenge by manually looking up every school.
Since the list consisted of 175 schools manually looking up every school would take quite some time.
My South African friend tried really hard to look up every school and collect coordinates, but at the end of the day he only found the coordinates for a couple of schools.
The next morning I thought we can do this easier, time to create a shell script for it. We copied all the information from the webpage into Excel and with the help of some Excel formulas created an easy overview of all the schools.
This overview could then be exported as a CSV file that used the ‘;’ (semicolon) sign as a separator to separate the data of the different fields such as the name of the school, suburb or telephone number.
Exporting all the data to a CSV file made it easy to use it in Linux and write a script to process the data. Less then an hour of scripting resulted in the script that’s explained further in this post. It’s not that ‘fancy’ but it gets the job done with just a couple lines of code.
I’m going to use the Google Maps Places API to retrieve coordinates for a school. For those who don’t know what an API is, in short ‘an API is the messenger that runs and delivers your request to the provider you’re requesting it from, and then delivers the response back to you’. For more information see this visual explanation.
To be more accurate I’m using the Google Maps Text Search Service. A short description from the Google Maps API documentation:
“The Google Places API Text Search Service is a web service that returns information about a set of places based on a string — for example “pizza in New York” or “shoe stores near Ottawa”. The service responds with a list of places matching the text string and any location bias that has been set.”
This means that I’m asking Google Maps for coordinates (latitude, longitude) given a point of interest. In this case it’s based on the school’s name and place, for example Philip Nikiwe P School Port Elizabeth needs to result into 33.8974800 , 25.5875800 (coordinates).
Before using this Google Maps service you need to retrieve an API key first, to retrieve an API key have a look at the documentation.
After saving all the retrieved coordinates, names of the schools and addresses into a new CSV file we could upload the data to Google’s Fusion Tables to automatically plot it onto Google Maps. The result can be viewed here.
Now it’s time for the technical part! Whenever you write a shell script you need to start with ‘#!/bin/bash‘ so the computer knows what interpreter (language) is going to be used to process the instructions.
The ‘#‘ (hash) sign in a shell script will allow you to write comments, like reminders.
Lets start with the first line of code ‘the while loop’. The basic principle of the while loop in this case is while reading a file do something.
For this script it means while reading a line from the file schools1.csv do read all the input into different variables that are separated by the ‘;’ sign to store the data.
This makes it possible to read a line and store the name of a school into the variable called ‘name’ or to read the telephone number and store it into the variable ‘tel’.
while IFS=”;” read -r name tel phase suburb special
done < “schools1.csv”
Onto the next line:
poi=$(echo $name | sed “s| |+|g” | sed “s|To Be Updated||g” | sed “s|Port Elizabeth||g” | sed ‘s|$|+Port+Elizabeth|’)
Here we create a variable called ‘poi‘ (short for point of interest). The data in this variable will be used in a next step to request data via the Google Maps API. We need to prepare the data a bit so it has always the same structure, otherwise it’s possible that some of the requests won’t give a correct result due to misunderstanding the data when performing a request.
The first thing we do is ‘echo $name‘, which will put the name of the school into the variable ‘poi‘. The data could sometimes be containing more then only the name of a school. For this reason we are going to replace or remove certain text strings with the ‘sed‘ command so the data has always a certain structure.
The sed command ‘sed “s| |+|g”‘ will replace all empty spaces (also called white space) with a ‘+’ sign, this is necessary for the API request. By replacing empty spaces with a ‘+’ sign the Google Maps API will know that there needs to be an empty space when proccesing the request.
Going back to the school example: Philip Nikiwe P School Port Elizabeth will translate into Philip+Nikiwe+P+School+Port+Elizabeth. When you look at an URL of a website you will never see an empty space in the URL (it’s not allowed), normally it’s converted to a ‘+’ (plus) or ‘-‘ (hyphen) or ‘_’ (underscore) sign.
The next two commands (sed “s|To Be Updated||g” | sed “s|Port Elizabeth||g”) will replace ‘To be Updated’ and ‘Port Elizabeth’ with nothing, it’s removing it if it’s a part of the data.
The last command ‘sed “s|$|+Port+Elizabeth|”‘ will append ‘+Port+Elizabeth’ to all the data. By doing this Philip+Nikiwe+P+School will result into Philip+Nikiwe+P+School+Port+Elizabeth.
The different variables in the shell script.
Next up doing the actual request to the Google Maps API. This is done by the next line in the script:
request=$(curl –silent “https://maps.googleapis.com/maps/api/place/textsearch/xml?query=”$poi”&key=insertyourkey”)
Here we create a variable called ‘request‘. We use the Linux ‘curl‘ program to do a web request (ask a question). You can compare it to you typing the web request in the URL field of your browser. The ‘–silent‘ will suppress output to the command line, this means the generated output won’t be shown on the screen.
Next is the Google Maps API URL address. The full request will look something like:
request=$(curl –silent “https://maps.googleapis.com/maps/api/place/textsearch/xml?query=Philip+Nikiwe+P+School+Port+Elizabeth&key=APIKEY1234”)
As you can see after textsearch we define that the response needs to be in XML format. For those reading that don’t know what XML means the Wikipedia definition:
“Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.”
To make it more visual in the image below the XML output of the request for Philip Nikiwe P School Port Elizabeth.
Photographs by Christiaan Colen available on Flickr
Next step: formatted=$(echo $request | grep -o -P ‘(?<=<formatted_address>).*(?=</formatted_address>)’)
Here we create a variable called ‘formatted‘. The purpose of this variable is to get the ‘formatted address’ Google Maps provided. According to the Google documentation:
“formatted_address is a string containing the human-readable address of this location. Often this address is equivalent to the “postal address,” which sometimes differs from country to country.”
How do we get the formatted address?
It’s quite easy since the output we requested is XML we only have to get the values between the formatted_address tags. For the variable we first ‘echo‘ (‘print’) the output of the web request and then use ‘grep‘ to get all the data between the ‘<formatted_address>‘ and ‘</formatted_address>‘ strings. ‘grep -o‘ means that it will only take the data that is asked, no data surrounding the found data. That’s it.
Back to the Philip Nikiwe P School Port Elizabeth example, the output of the above command will result into ‘New Brighton 1, Ibahyi, 6200, South Africa’.
The same is done for the latitude (<lat>) and longtitude (<lng>) values.
Getting the latitude and longitude values.
Once we have these three values the next step is to ‘echo‘ (‘print’) all the data to the screen so you can see it while the script is running.
After generating the output to the screen the script will go to ‘sleep‘ for 5 seconds. This means that no command is executed for 5 seconds and had to be done because I got problems with asking the Google Maps API too much too fast.
The last step:
echo $name”;”$formatted”;”$lat”;”$long >> schools_w_coordinates.csv
This will store all the data into the file ‘schools_w_coordinates.csv‘ separated by ‘;’ sign.
In case of the Philip Nikiwe P School example it will look like:
echo Philip Nikiwe P School;New Brighton 1, Ibahyi, 6200, South Africa;-33.8974800;25.5875800
The ‘>>‘ (two times greater than sign) is used to append the data to the file every time an entry in the while loop is done. If you use one ‘>‘ sign then it will create a new file for every entry that is processed which means (since the filename is the same for every entry) it would overwrite the existing file.