Gathering Indian Railways Data

Over the last few months, Sanjay and I have been gathering data about the Indian Railways. We started with an interest in schedules and worked from some old data that Sanjay had. We spent several weekends experimenting and gathering and verifying much of the data available on the Internet, and we are super happy to put all of this out for everyone to use.

Trains, routes, stations.

The dataset has three related subsets: trains, stations and schedules. The train number and the station code let you connect the three if you need to. The dataset has known problems. Some trains are old, have different schedules or no longer run. Some new trains are not part of this set. It is missing several stations and does not capture the correct location for many others. We are publishing it under the CC0 license, for everyone to use with no restrictions. Here's something that I made with this data:

Reachability map: all destinations you can get to from a station without switching trains.
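The post does not say how the reachability map was computed. One minimal sketch, assuming each schedule row carries a train number, a station code and a stop sequence number (the field names and codes below are hypothetical), is to walk each train's ordered stops and collect every station that comes after the origin. Counting only stations in the direction of travel is itself an assumption.

```python
from collections import defaultdict

# Hypothetical schedule rows; field names and station codes are illustrative only.
schedules = [
    {"train_number": "T1", "station_code": "AAA", "stop": 1},
    {"train_number": "T1", "station_code": "BBB", "stop": 2},
    {"train_number": "T1", "station_code": "CCC", "stop": 3},
    {"train_number": "T2", "station_code": "DDD", "stop": 1},
    {"train_number": "T2", "station_code": "BBB", "stop": 2},
    {"train_number": "T2", "station_code": "EEE", "stop": 3},
    {"train_number": "T3", "station_code": "CCC", "stop": 1},
    {"train_number": "T3", "station_code": "FFF", "stop": 2},
]


def reachable_without_switching(origin, schedules):
    """Return station codes reachable from `origin` on a single train."""
    # Group schedule rows by train number.
    routes = defaultdict(list)
    for row in schedules:
        routes[row["train_number"]].append(row)

    reachable = set()
    for rows in routes.values():
        # Order this train's stops by their sequence number.
        codes = [r["station_code"] for r in sorted(rows, key=lambda r: r["stop"])]
        if origin in codes:
            # Assumption: only stops after the origin, in the direction of travel, count.
            reachable.update(codes[codes.index(origin) + 1:])
    reachable.discard(origin)  # a looping route could list the origin again
    return reachable


print(sorted(reachable_without_switching("BBB", schedules)))  # ['CCC', 'EEE']
```

FFF does not appear for BBB here, because reaching it requires changing trains at CCC.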

We found this an interesting experiment and a great opportunity to learn about one of the largest railway systems in the world. You can read more and download the data here, and drop a line to me or Sanjay if you have questions. If you find something obviously wrong, open a ticket or make a pull request.
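As described above, the train number and the station code connect the three subsets. The sketch below joins them in plain Python; the field names and sample rows are hypothetical, not the dataset's actual schema. Because the stations subset is incomplete, stops whose station code has no match are skipped rather than raising an error.

```python
# Hypothetical sample rows; the field names are assumptions, not the real schema.
trains = [{"number": "T1", "name": "Example Express"}]
stations = [
    {"code": "AAA", "name": "Alpha"},
    {"code": "BBB", "name": "Beta"},
]
schedules = [
    {"train_number": "T1", "station_code": "BBB", "stop": 2},
    {"train_number": "T1", "station_code": "AAA", "stop": 1},
    # A stop whose station is missing from the stations subset.
    {"train_number": "T1", "station_code": "ZZZ", "stop": 3},
]


def describe_train(train_number, trains, stations, schedules):
    """Join the subsets: train number links trains to schedules,
    station code links schedules to stations."""
    names = {t["number"]: t["name"] for t in trains}
    by_code = {s["code"]: s["name"] for s in stations}
    rows = sorted(
        (r for r in schedules if r["train_number"] == train_number),
        key=lambda r: r["stop"],
    )
    # Skip stops whose station code has no entry in the stations subset.
    stops = [by_code[r["station_code"]] for r in rows if r["station_code"] in by_code]
    return names.get(train_number), stops


print(describe_train("T1", trains, stations, schedules))
# ('Example Express', ['Alpha', 'Beta'])
```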

OpenStreetMap as Infrastructure – The Fifth Elephant, Bangalore, 2014

Yesterday I spoke at the Fifth Elephant, the premier data conference in the country, run by HasGeek, about using OpenStreetMap as infrastructure for geographic datasets and applications. The talk featured Moabi, a collaborative mapping platform for monitoring natural resource extraction in the Democratic Republic of Congo. Find the summary of the narration here and a more detailed outline here. The slide deck is below.

Download (PDF, 6.12MB)



Bringing together map enthusiasts in Bangalore.

A few months back, Kaustubh and I started GeoBLR, Bangalore's monthly spatial gathering. A few weeks ago we held the fifth meetup. I have been meaning to write about why I'm organising GeoBLR and, more importantly, why I am taking it slow.

We started GeoBLR to be quite different from other events in town. To begin with, not every event has a speaker. GeoBLR events are more open in what we choose to do. GeoBLR works closely with DataMeet on issues specific to spatial data and maps. We have had several discussions about the spatial data situation in India and why building proper relationships at the government and non-government level is important for the community to gain access to datasets.

There has never been a space in Bangalore to talk about issues like this. When proprietary companies around the world are trying to be the single source of ground truth, opening map data is a critical task. Improving OpenStreetMap is at the core of our agenda. GeoBLR will be a space for people to come and talk about maps, discover new tools, identify missing datasets, find them, and solve spatial data related issues.

I have been getting a fair number of requests to turn GeoBLR into a workshop series, and I'm not keen on that idea. GeoBLR, in a way, is an experiment for me. From simple things like 'which day of the week or month is best for people to attend' to 'what model the event should follow', I'm learning a lot, and each event brings more clarity.

If you want to learn more, join us!

Conversations at the Mozilla Summit 2013.

I was privileged to attend the Mozilla Summit this year in Santa Clara. Over three days, I had some interesting conversations with people from different technology backgrounds, and this post is mostly a note for me to come back to.

Robert Kaiser and I had an interesting conversation about his maps application for FirefoxOS. His app uses only the HTML Canvas to render the tiles on the client side and handles all the interactions itself, with no third-party libraries.

I grabbed breakfast with Mark Giffin one morning, and we started talking about rendering Indic languages and making them print-ready. At Akshara, we use several techniques, such as Phantom.js, to achieve this. Mark suggested that we consider DITA, which provides comprehensive solutions for typesetting.

Amir Aharoni of the Wikimedia Foundation joined our discussion and suggested Firefox as part of the solution, pointing out from his experience with the language team at Wikimedia that Firefox renders Indic languages very well. That's mostly what we are doing at Akshara right now, but some local dialects need more work.

I have known James Hughman for a couple of years now, since his visit to India for Droidcon. He joined Mozilla recently, and I was excited that I would get to see him at the summit. We spoke about the books we are reading, the new DRM policies and much more.

Toby Elliot introduced the new location services that Mozilla is building. I had a chat with him about how we can use OpenStreetMap data and possibly help improve the infrastructure. There's a very exciting email thread going on between us right now to figure out how we can get this going.

Bill Walker was curious about the new maps project that we are doing in Congo. His brother, an archeologist, does a lot of mapping and has been considering building platforms for collaborative mapping. We shared and talked about some of the existing systems and how we can adapt them for custom use cases.

I spoke to more people than those above, but these are the conversations that will definitely continue and will probably lead to more posts!

The Reader’s Digest Great World Atlas of 1961.

I acquired the Reader’s Digest Great World Atlas of 1961, first edition, yesterday at a very old bookstore in Bangalore.


It’s an amazing addition to my collection of maps. Interestingly, I couldn’t find any information about the atlas on the Internet.

The atlas was 'planned' under the direction of the famous geographer Frank Debenham. With the involvement of the British and Foreign Bible Society, the British Broadcasting Corporation, FAO, WHO, the Information Service of India and numerous other organizations and individuals, the atlas is divided into four sections.

Paradise is somewhere in the far east. Jerusalem is the center of all nations and countries, and the world itself is a flat disk surrounded by oceans of water. So the monks, map-makers of the Middle Ages, saw the world they lived in.

Jerusalem was then considered the center of the world; the geographic center was first calculated in 1864, revised in 1973 and finalised in 2003 by Andrew J. Woods. The atlas attempts to correct these misconceptions by collecting the sum of knowledge from the explorations and scientific discoveries of its time.

The first section, called the Face of the World, portrays some fascinating relief maps like the ones below. They are more structurally and geographically detailed than any other old representation I've seen.


The atlas employs various projections, such as the Conic, Lambert's Azimuthal Equal Area, Bonne and Van der Grinten projections. The following map of the oceans is better depicted in the Van der Grinten projection.


There is also the following interesting illustration of continental drift.


There is more to the atlas than these. I hope to post them when I find time to read through this amazing record of history.


Designing a New Map Portal for Karnataka Learning Partnership.

I wrote a rather detailed post about the new maps for Karnataka Learning Partnership on the blog.

The map is an important part of our project, action and process because it serves as the pivot point of navigation. I will quickly talk about the data and tools before we discuss the design aspects.

We have a fairly large dataset of schools in Karnataka in a database, with each school's name, location, number of girls and boys, and so on. Fortunately, the data was clean and properly stored in a PostgreSQL database with the PostGIS extension. Most of my task was to modify the API to return GeoJSON to the client using the ST_AsGeoJSON function, and to export the data.
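A minimal sketch of the idea, assuming a hypothetical `schools` table with `name`, `num_boys`, `num_girls` and a PostGIS geometry column `location`. ST_AsGeoJSON returns each geometry as GeoJSON text, so the API only has to parse it and wrap it in a FeatureCollection. The rows are hard-coded here so the example runs without a database; with a driver such as psycopg2 they would come from `cursor.fetchall()`.

```python
import json

# Rows as a query like this might return them (table and columns are hypothetical):
#   SELECT name, num_boys, num_girls, ST_AsGeoJSON(location) FROM schools;
rows = [
    ("Example School A", 120, 130, '{"type":"Point","coordinates":[77.59,12.97]}'),
    ("Example School B", 80, 95, '{"type":"Point","coordinates":[76.64,12.30]}'),
]


def to_feature_collection(rows):
    """Wrap (name, boys, girls, geojson_text) rows in a GeoJSON FeatureCollection."""
    features = []
    for name, boys, girls, geometry in rows:
        features.append({
            "type": "Feature",
            # ST_AsGeoJSON already produced GeoJSON geometry text; parse it to a dict.
            "geometry": json.loads(geometry),
            "properties": {"name": name, "boys": boys, "girls": girls},
        })
    return {"type": "FeatureCollection", "features": features}


print(json.dumps(to_feature_collection(rows)))
```

Letting the database produce the geometry text keeps coordinate formatting out of the API code, which then only deals with attributes.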


Visualizing SSLC results over seven years.

I spent most of the last month building a dashboard for the SSLC results at the Karnataka Learning Partnership. We released it in beta yesterday, and here's the blog post I wrote detailing how we went about it.

Patterns in examination results are something we are always interested in at the Karnataka Learning Partnership. After the design jam in June 2012, where we tried to understand the SSLC data (its content and structure) and visualized the performance of Government and Private schools against each other, we decided to dig deeper and look for patterns over the past seven years. The results of this effort are what you find here, in beta.

The Karnataka Secondary Education Examination Board shared the data for the last seven years as several Microsoft Access databases. It came with very little metadata, and Megha did all the hard work of making sense of it and pulling it into a PostgreSQL database. Inconsistencies are everywhere, and the quality always depends on how you handle each exception in isolation. To begin with, we decided to look at three aspects of the data: the performance of Government and Private schools; performance in Mathematics, Kannada and English; and the performance of each gender. All three are tracked across seven years (from 2004-2005 to 2010-2011) for each district in Karnataka.

One important piece of data wrangling this time was aggregating the data at the district level. The raw data came at the educational district level and, unfortunately, we did not have geographic boundary shapes for that classification. What we had instead were geographic boundaries at the political level. We massaged the shapefiles, geocoded the data and converted it to GeoJSON in QGIS. We wrote a bunch of Python scripts to perform the aggregation and generate the JSON (JavaScript Object Notation) required for the visualization. Every bit of code that we wrote for this project is on GitHub.
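The project's own scripts are on GitHub; the sketch below only illustrates the general aggregation step under stated assumptions. It assumes a hypothetical lookup table from educational districts to political districts and hypothetical per-district counts of candidates who appeared and passed. It sums the counts before computing a pass percentage, rather than averaging percentages, so larger educational districts carry proportionate weight.

```python
import json
from collections import defaultdict

# Hypothetical mapping from educational districts to political districts.
EDU_TO_POLITICAL = {
    "Edu North": "District X",
    "Edu South": "District X",
    "Edu East": "District Y",
}

# Hypothetical rows: (educational district, year, appeared, passed).
results = [
    ("Edu North", "2010-2011", 100, 80),
    ("Edu South", "2010-2011", 300, 210),
    ("Edu East", "2010-2011", 200, 150),
]


def aggregate_by_district(results, mapping):
    """Roll educational-district counts up to political districts, per year."""
    totals = defaultdict(lambda: {"appeared": 0, "passed": 0})
    for edu_district, year, appeared, passed in results:
        key = (mapping[edu_district], year)  # a KeyError flags an unmapped district
        totals[key]["appeared"] += appeared
        totals[key]["passed"] += passed

    # Nest as {district: {year: stats}} so the result serialises cleanly to JSON.
    out = defaultdict(dict)
    for (district, year), t in totals.items():
        out[district][year] = {
            **t,
            # Percentage from summed counts, not an average of percentages.
            "pass_pct": round(100 * t["passed"] / t["appeared"], 1),
        }
    return dict(out)


print(json.dumps(aggregate_by_district(results, EDU_TO_POLITICAL), indent=2))
```

For District X this gives 72.5; simply averaging the two educational districts' percentages (80 and 70) would give 75.0.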

A dashboard of this sort is something we had never attempted, and honestly it took us a while to get the hang of it. I had tried D3.js sometime last year and found it amazing. D3 stands for Data-Driven Documents, a brilliant JavaScript library for making infographics on the web driven entirely by data. What makes D3.js awesome for me is that everything is rendered as SVG (Scalable Vector Graphics), so there are barely any limits to the representations and interactions you can bring into the browser. I've had good experiences using Twitter's Bootstrap to design quickly and keep the page layout and aesthetics consistent. There are some issues when you use D3.js and Bootstrap together, especially in the way Bootstrap manages events. The best approach is to trust D3.js and use only Bootstrap's scaffolding and layout features.

We found a few interesting facts from this exercise. As you might guess, private schools consistently perform better than government schools. Western districts like Udupi, Uttar Kannada and Belgaum perform better than the rest of the state. North Karnataka, especially Bidar, has performed terribly across the last seven years, and we are very curious to know why. Bangalore Rural performs better than Bangalore Urban. Government schools in Bangalore Rural do much better than those in Bangalore Urban and are comparable to private schools. Private schools lead in all three subjects across the last seven years. Girls perform much better than boys in private schools, consistently across seven years and in every district. Boys do better in Bangalore Urban, while girls dominate in Bangalore Rural.

This research will continue as we explore a few more aspects of the data and the dashboard comes out of beta.