Mapping public transit in Bangalore

In February, I spent some time looking at BMTC data from openbangalore.org to understand the network better. This post first appeared on Mapbox. 

Buses are Bangalore’s most popular mode of transport. The Bangalore Metropolitan Transport Corporation (BMTC), one of the oldest transport organizations in India, operates over 2,000 routes with a fleet of about 6,500 buses. BMTC recorded a daily ridership of 5.02 million in September 2015, which is on the order of daily subway ridership in New York City.

To understand this massive network better, we need open data. Public transit data in India are not available by default, but activist groups like Open Bangalore go out and create them. I spent some time last week analyzing the network, location of bus stops, timing and distribution.

Longest route

BMTC is known for its many long routes. Route 600 is the longest, making a round trip around the city that covers 117 km in about 5 hours. There are 5 trips a day, and these buses are packed throughout.

Frequency

Next, I wanted to look at the frequency of different routes. In the image below, stroke thickness indicates how many trips each route makes in a day. You can see that north Bangalore has fewer but more frequent routes, whereas the south has more routes that run less frequently. You can also see the Outer Ring Road, which circles the entire city.

Reachability

I defined reachability as the set of destinations a passenger can get to from a given stop without changing buses. The BMTC network operates long but direct routes covering the entire city. The map shows straight lines between bus stops that are connected by a single route. The farthest you can get is from Krishnarajendra Market to the eastward town of Biskuru: roughly 49 km as the crow flies.
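The reachability idea above can be sketched in a few lines of Python. This is only an illustration, not the script behind the map: the route ids, stop ids, coordinates and function names are all hypothetical, and the sketch assumes that every stop on a shared route is reachable in either direction.

```python
import math

# Hypothetical sample data: route id -> ordered stop ids, stop id -> (lat, lon).
ROUTES = {
    "R1": ["A", "B", "C"],
    "R2": ["C", "D"],
}
COORDS = {
    "A": (12.96, 77.57),
    "B": (12.98, 77.60),
    "C": (13.00, 77.70),
    "D": (13.10, 77.75),
}


def haversine_km(a, b):
    """Great-circle ("as the crow flies") distance in km between two (lat, lon) pairs."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def reachable(stop, routes):
    """Stops that share at least one route with `stop`, i.e. no change of bus."""
    found = set()
    for stops in routes.values():
        if stop in stops:
            found.update(stops)
    found.discard(stop)
    return found


def farthest_direct(stop, routes, coords):
    """Return (stop_id, km) for the farthest directly reachable stop, or None."""
    candidates = reachable(stop, routes)
    if not candidates:
        return None
    best = max(candidates, key=lambda s: haversine_km(coords[stop], coords[s]))
    return best, haversine_km(coords[stop], coords[best])
```

Calling farthest_direct("A", ROUTES, COORDS) on the sample data picks stop "C", since "D" can only be reached from "A" by changing buses.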

 

Direction

In which directions does BMTC run? It is interesting that BMTC covers the city North – South (blue) and East – West (brown) in almost equal measure.
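One way to make such a split is to look at the compass bearing between a route's first and last stop. The sketch below is an assumption about how the classification could be done, not a description of the method used for the map; the function names and the 45-degree cut-off are hypothetical choices.

```python
import math


def bearing_deg(a, b):
    """Initial compass bearing in degrees (0 = north, 90 = east) from a to b."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360) % 360


def classify(first_stop, last_stop):
    """Label a route 'north-south' or 'east-west' from its end points (lat, lon)."""
    # Fold opposite directions together: a northbound and a southbound
    # trip along the same corridor should get the same label.
    theta = bearing_deg(first_stop, last_stop) % 180
    return "north-south" if theta < 45 or theta >= 135 else "east-west"
```

Using only the end points ignores how a route bends along the way, so a circular route like Route 600 would need different handling.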

Coverage

BMTC routes are categorized into different series, numbered 1 – 9 and lettered A – W. I analyzed coverage for series 2 (blue) and 3 (green); together they make up almost 76% of the entire network.
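A share like this can be computed by grouping routes by series and counting. The sketch below rests on two assumptions that the post does not confirm: that a route's series is the first character of its route number, and that coverage is measured by the number of routes. The route numbers in the sample are illustrative, not taken from the dataset.

```python
from collections import Counter


def series_share(route_numbers):
    """Map each series (assumed: first character of the route number) to its share of routes."""
    counts = Counter(number[0].upper() for number in route_numbers if number)
    total = sum(counts.values())
    return {series: count / total for series, count in counts.items()}


# Illustrative route numbers only.
sample = ["201", "210A", "335E", "341", "600", "G4", "276", "365"]
shares = series_share(sample)
combined = shares.get("2", 0) + shares.get("3", 0)
```

Measuring coverage by route length or by number of stops instead would only change what gets summed per series.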

For this analysis, I used QGIS and Turf.js to inspect the route data. You can see some of the scripts on GitHub, and the maps were all made using Mapbox Studio.

2.5 years.

I joined the Karnataka Learning Partnership in May 2012. We just launched version 11.0 of our web platform.

I’ve been meaning to write this post for a while now. I recently marked 2.5 years of association with the Karnataka Learning Partnership. This is not my going away post and I’m not going away. I want to talk about how amazing the environment is for someone entirely new to the public education space. I want to talk about how extraordinary the team is. I want to talk about how different our processes are. And I want to talk about how patience, understanding and teamwork will go a long way.

If you haven’t heard about the Akshara Foundation or the Karnataka Learning Partnership: KLP is incubated at Akshara Foundation, and the two work closely to improve schooling in Karnataka. KLP does a great deal of work on streamlining data collection processes, building dashboards and acting as an interface between several organisations in the education space in Karnataka and India.

When I joined KLP, it was a team of three: me, Megha Vishwanath, and Shivangi Desai, headed by the inimitable Gautam John. My role was largely around organising the wealth of spatial data that we had just collected and collated from various sources. Wrangling dirty data and organising things makes me happy. KLP has a pretty strong software infrastructure history. Even though Python and PostgreSQL made it into the mix, there was other unmaintained open source software and a lot of unseen complexity in the code base. To be true to my open map heart, I rewrote the maps infrastructure. It was mostly a matter of throwing a couple of new endpoints into the API and writing the front-end in JavaScript. Pretty basic. I wasn’t quite ready to dive into the madness of our existing API, so I made sure I touched little of the legacy code base. A couple of months down the line, I built a dashboard for showing SSLC performance. We worked on an array of projects right after: the DISE dashboard, reports, status and a lot more. We were quite sure our infrastructure needed an overhaul and soon started talking about it. I was fairly open to the idea, but concerned, considering that there were only three of us and a rewrite is a lot of work.

Today, we are seven people. I must mention the others, because I owe them so much for showing me how diverse a team can be and for accepting that everybody is trying: Gautam John, Bibhas Debnath, Sanjay Bhangar, Vinayak Hegde, Brijesh Poojary and Deviprasad Adiga. Megha and Shivangi did much of the tough work of getting everything in place for us to start. Everything we have built is based on what we already had. If there is a new feature, we have made sure that all of us thoroughly understand why it is required and how it should work. Our team is geographically dispersed across Bombay, Calcutta, Pune and Bangalore. This is not the only thing we are doing. Each of us is involved in several other exciting projects and has a weird schedule that barely overlaps with the others. To me, it is amazing to see how we have pulled this new platform together in under five months. We met twice in the process, and the rest of it happened over video conferences and telephone calls. I personally have had a very rough time over the last two months, which was when we were gearing up for the crunch period. I was moved by the way the team let me be involved in a very limited capacity, yet made sure that what needed to be done was taken care of.

To be honest, rewrites are hard. I try to avoid them as much as possible but when software becomes a liability, you have to make sure it works and continues to do so. Rewrites are usually a fairly intense process, because you are never starting from a clean slate. For the most part, you have a team which doesn’t quite understand most of how the legacy system works. Disconnect within the team is not uncommon, but it’s difficult. I have seen how the seven of us accommodated each of the ideas on the table. It’s about listening and understanding each other. It’s about respecting and realising that the other person is also committed to building the best product. It’s about being polite and being patient. And finally it’s about embracing the fact that there are only three certainties – life, death and fucked up data.

Update: We also work with some of the best technologists on and off, most recently Vamsee Kanakala and Rahul Gonsalves along with his team at Uncommon.

Data and Maps workshop for IT for Change, Bangalore.

A few weeks back, I ran a three-day workshop for the developers at IT for Change in Bangalore. They have been quite involved in OpenStreetMap, mobilising graduate students in various towns of Karnataka to map public infrastructure. Apart from this spatial data, they have collected demographic data on several aspects of these towns. The workshop was intended to give them complete coverage of geospatial technology, data and representation. The outline and code are available on GitHub.

Designing a New Map Portal for Karnataka Learning Partnership.

I wrote a rather detailed post about the new maps for Karnataka Learning Partnership on the geohackers.in blog.

The map is an important part of our project, action and process because it serves as the pivot point of navigation. I will quickly talk about the data and tools before we discuss the design aspects.

We have a fairly large dataset of schools in Karnataka: the name of each school, its location, the number of girls and boys, and so on. Fortunately, the data was clean and properly stored in a PostgreSQL database with PostGIS extensions. Most of my task was to modify the API to return GeoJSON to the client using the ST_AsGeoJSON function, and to export the data.
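PostGIS's ST_AsGeoJSON returns a geometry as GeoJSON text, so the API still has to wrap each row into a Feature and the result into a FeatureCollection. The sketch below shows that wrapping step in Python. It is not the KLP code: the column names (name, girls, boys, geojson) are hypothetical, and the rows are hard-coded where a real endpoint would run a database query.

```python
import json


def rows_to_feature_collection(rows):
    """Wrap rows whose 'geojson' column holds ST_AsGeoJSON output into a FeatureCollection."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            # ST_AsGeoJSON yields the geometry as a JSON string, so parse it.
            "geometry": json.loads(row["geojson"]),
            # Every other column becomes a property of the feature.
            "properties": {key: value for key, value in row.items() if key != "geojson"},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical rows, shaped like the result of something such as
# SELECT name, girls, boys, ST_AsGeoJSON(geom) AS geojson FROM schools;
rows = [
    {"name": "School A", "girls": 120, "boys": 110,
     "geojson": '{"type": "Point", "coordinates": [77.59, 12.97]}'},
    {"name": "School B", "girls": 80, "boys": 95,
     "geojson": '{"type": "Point", "coordinates": [76.64, 12.30]}'},
]

collection = rows_to_feature_collection(rows)
payload = json.dumps(collection)  # what the API would send to the client
```

A web map client can usually load such a FeatureCollection directly as a layer.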


Indic Wikipedia: Visualizing Basic Parameters.

Riju (Sumandro) and I are working with the Centre for Internet and Society to understand how the Indic Wikipedia community is growing. Today, we published the first set of visualizations and a blog post about why, what and how we did this. Cross-posting from the CIS website.

Introduction

Understanding how the Indic, or Indian language, Wikipedia projects are growing is something that we have been interested in for quite some time. We were delighted to come across this opportunity from the Centre for Internet and Society (CIS) and the Wikimedia Foundation. We divided our analyses into three focus areas: (1) basic parameters, (2) geographic patterns of edits, and (3) the topics that receive the greatest number of edits. The existing infographics and data visualisations that we found about Indic Wikipedias mostly engaged with the first area, and also emphasised yearly aggregates. We thought a more granular (that is, monthly) understanding, and a focus on the geographic and thematic spread of the edits, would be very helpful in further appreciating the activities.

We began by collecting data about the following basic parameters:

  1. Number of Editors
  2. Number of Articles
  3. Page Views
  4. Number of Active Editors
  5. Number of New Articles
  6. Number of New Editors
  7. Edit Size

Acquiring the data

We explored the MediaWiki API, ToolServer and the Wikimedia Statistics Portal. There are several ways of obtaining data about Wikipedia in general, and depending on the use case, such as the quantity of data required or the need for customised or selective scraping, one or more of these methods can be chosen.

The API limits how much data you can access, and it is meant for retrieving actual Wikipedia entries. We, however, were looking for metadata about the articles (such as when an article was first created, and when and how many times it was edited) and not the actual contents of the Indic Wikipedias.

ToolServer is an excellent way of running custom scripts. However, it assumes that the user has substantial command over the back-end infrastructure and processes that Wikipedia runs on. We wrote a few scrapers to extract metadata about Indic Wikipedia projects from ToolServer, but not being experts in the Wikipedia back-end systems, we found scraping from ToolServer rather time- and effort-intensive.

The statistics portal is a well-organised and accessible place for collecting data for analysis. However, it did not have all the parameters and Wikipedia projects we were interested in.

In our search for Indic Wikipedia datasets, we realised that the Wikimedia Analytics Team (WAT) puts a lot of effort into writing scripts and collecting data at different levels. Wikimedia developer Yuvi Panda and the Access to Knowledge team at CIS, aware of our difficulty in obtaining the data, also pointed us towards the WAT. While we were already scraping data on some of the parameters, we approached the WAT, whose prompt and very supportive response greatly accelerated our work. The fantastic Wikimedia developers, especially Evan Rosen (a big ‘thank you’ to him), shared the needed data, which we cleaned up and archived at the GitHub repository for the project.

We obtained data for the period from January 2001 to December 2012. It appears that the Indic Wikipedia projects began their activities around 2005. A big part of cleaning the data involved identifying when each of the projects started and dropping the data from before that point. There are 20 Indic Wikipedia projects with 4,98,964 articles, 5,689 editors and over 3,35,49,102 readers.
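A cleaning step of this kind can be sketched as trimming each project's monthly series to start at its first month with any activity. This is an assumption about what the cleaning looked like, not the actual script; the function name and the sample values are hypothetical.

```python
def trim_before_start(series):
    """Drop leading months with no activity.

    `series` is a list of (month, value) pairs sorted by month; the
    project is assumed to start at the first non-zero, non-missing value.
    """
    for index, (_, value) in enumerate(series):
        if value:
            return series[index:]
    return []


# Hypothetical monthly counts for one project.
monthly = [("2001-01", 0), ("2001-02", None), ("2004-11", 3), ("2004-12", 0), ("2005-01", 7)]
trimmed = trim_before_start(monthly)
```

Zeros after the start month are kept, since a quiet month in a running project is real data rather than padding.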

Deciding upon chart types

We spent quite some time discussing different methods of visualising the data. The major difficulty is that there are too many entities to be plotted. As each language must be plotted as a separate entity (point, line, circle, etc.), the chart has a tendency to become cluttered and illegible. Even if we take only one variable, say New Editors, there will still be 20 points or lines to be plotted. Hence, using any of the conventional charts becomes difficult. For example, if we chose a line chart with New Editors on the Y-axis and months on the X-axis, there would be 20 lines, each of a different colour, representing different languages. Also, the five-to-six-year monthly timeline translates into 60-72 temporal data points.

We have adopted two strategies, and related chart types, to address this difficulty.

Firstly, we used a monthly calendar-like heatmap chart that limits the temporal spread of data to one year for each section of the chart and uses a positionally uniform set of columns for each language, so as to make reading the chart easier. Limiting each chart section to 12 months allows the user to focus on more granular movements of the variable concerned, say the number of New Editors per month. Representing each language in its own column, and not by an upwards-and-downwards moving line as in a line chart, makes it easier for the user to follow movements in each language (where movement is shown by the intensity of colour, as is characteristic of heatmaps) without the need for a separately coloured entity (point, line, circle) for each language.

Secondly, we used a motion chart, as made famous by Dr. Hans Rosling, that removes the temporal axis from the X- and Y-axes of the chart and uses animated transitions to represent temporal change. The motion chart has the unique ability to handle as many as five variables in an organised manner, using the following visual elements: X-axis, Y-axis, Z-axis (animated temporal transitions), size of bubbles, and colour of bubbles. It is, however, recommended that the represented variables be limited to a maximum of four for easier legibility. In our case, we have used the X- and Y-axes to plot various related variables (which can be selected by the user) such as New Editors and New Articles, the Z-axis to represent time, and the colour of the bubbles to represent a third, optional variable (which can also be selected by the user). Since different Indian language Wikipedia projects often take a wide range of values for most variables, using the size of the bubble to represent any of those variables is best avoided. Further, the motion chart gives the user a lot of controls to explore the various projects and variables according to their interest, and especially to compare particular projects and variables with each other.

After discussing the chart types with the Access to Knowledge team, we decided to use simpler line charts, each emphasising a single Indic Wikipedia project, on the language-specific pages that we will be creating next.

Calendar charts

Calendar Chart

We visualised three parameters using the calendar heatmap strategy: (1) New Articles, (2) New Editors, (3) Active Editors.

The New Articles Calendar shows the new articles posted on every Indic Wikipedia for every month since 2004. It was interesting to note the small number of new articles in 2012 for all the languages. The first language to have the highest number of new articles is Bengali. Hindi picks up around the same time with fewer articles. Except for Urdu and Nepali, every language saw a drop in the number of new articles. However, we should remember that a lower number of new articles does not necessarily indicate low overall activity in the project concerned.

As with new articles, we wanted to explore the patterns in the number of new editors across all of the Indic Wikipedia projects. As you run through the new editors calendar chart, it is evident that there is consistent growth in the editor base for a few projects such as Hindi, Marathi, Bengali, Telugu, Tamil, Kannada and Malayalam. If one takes a step back and compares this with the new articles chart, something is not very clear: in some of the projects, the number of editors grows but not many new articles are posted. We are very keen to understand why this has happened.

If we look at the active editors calendar, Tamil started with 2 active editors in January 2004 and, with a few ups and downs, grew to about 115 active editors in December 2012. Malayalam started slowly in late 2004 with 2 editors and grew to 155 active editors in December 2012. We are sure viewers will be able to find more patterns by studying the charts closely and comparing them.
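The calendar layout behind these charts amounts to a pivot: one block per year, one row or column per language, and twelve monthly cells whose values drive the colour intensity. The sketch below shows only that reshaping step, under an assumed input format of (language, "YYYY-MM", value) records; the function name is hypothetical and the three sample values are the ones quoted above.

```python
from collections import defaultdict


def calendar_grid(records):
    """Pivot (language, 'YYYY-MM', value) records into {year: {language: [12 cells]}}.

    Months with no record stay None so a renderer can draw them as empty cells.
    """
    grid = defaultdict(dict)
    for language, month, value in records:
        year, month_number = (int(part) for part in month.split("-"))
        cells = grid[year].setdefault(language, [None] * 12)
        cells[month_number - 1] = value
    return dict(grid)


# Active editor counts quoted in the text above.
records = [
    ("Tamil", "2004-01", 2),
    ("Tamil", "2012-12", 115),
    ("Malayalam", "2012-12", 155),
]
grid = calendar_grid(records)
```

Because every language keeps the same position in every yearly block, a reader can follow one language across years without a legend.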

Motion chart

Motion Chart

We developed a motion chart comparing five variables: (1) Active Editors (> 5 edits per month), (2) New Editors, (3) Total Editors, (4) New Articles, and (5) Total Articles. When the visualisation is opened, Total Editors is plotted on the X-axis, Total Articles is plotted on the Y-axis, the colour of the bubbles indicates the Active Editors (blue is low and red is high), and the sizes of the bubbles are kept the same for easier comparison.

The user can click on the drop down menus at the X- and Y-axes, and next to the size and colour variables, and make them represent different variables.

We chose to configure the X- and Y-axes to show the data in logarithmic scales and not in linear scales. Since most projects experience small increments over time and there exists a wide difference between the most and the least popular/active projects, the logarithmic scale is better suited to represent the changes in the given data. The user has the option to select linear scale at the end of both X- and Y-axes (click on “Log”).
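The effect of the logarithmic scale can be seen by computing where a value lands along an axis under each scale. The sketch below is illustrative only: the function name and the axis range are hypothetical, and the motion chart library does this internally.

```python
import math


def axis_position(value, lo, hi, log=False):
    """Fraction (0 to 1) of the way along an axis spanning lo..hi.

    With log=True all three numbers must be positive.
    """
    if log:
        value, lo, hi = (math.log10(v) for v in (value, lo, hi))
    return (value - lo) / (hi - lo)


# A project with 10 articles on an axis running from 1 to 100:
linear = axis_position(10, 1, 100)                 # about 0.09, squeezed near the origin
logarithmic = axis_position(10, 1, 100, log=True)  # 0.5, halfway along the axis
```

On the linear axis the small projects pile up near the origin, while on the logarithmic axis each tenfold increase takes the same amount of space, which keeps both small and large projects readable.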

As evident in the visualisation, the Newari project and the Hindi-Malayalam project cluster show very interesting contrasting dynamics — while both achieve similar Total Articles numbers, the latter is much more editor-heavy. This suggests a smaller but more active editor community for the Newari project.

Please click on the image of the motion chart below to open the interactive version in a separate window. The code can be accessed at the project repository on GitHub.