OpenStreetMap as Infrastructure – The Fifth Elephant, Bangalore, 2014

Yesterday I spoke at the Fifth Elephant, the premier data conference in the country, run by HasGeek, about using OpenStreetMap as infrastructure for geographic datasets and applications. The talk featured Moabi – a collaborative mapping platform for monitoring natural resource extraction in the Democratic Republic of Congo. Find the summary of the narration here and a more detailed outline here. The slide deck is below.

Download (PDF, 6.12MB)

‘He who controls the map, controls the future’ – On Maps and Politics.

Last week, I spoke at the National Institute of Design, Bangalore about maps and how they are involved in politics. This is a topic that I have been wanting to explore. Many thanks to Riju and Sanjay for all their inputs. Here’s the slide deck.

Bringing together map enthusiasts in Bangalore.

A few months back, Kaustubh and I started GeoBLR – Bangalore’s monthly spatial gathering. A few weeks back we held the fifth meetup. I have been wanting to write about why I’m organising GeoBLR and, more importantly, why I am taking it slow.

We started GeoBLR to be quite different from other events in town. To begin with, we don’t necessarily have someone speaking at every event. The GeoBLR events are more open in terms of what we want to do. GeoBLR works closely with DataMeet on issues centred on spatial data and maps. We have had several discussions about the spatial data situation in India and how building proper relationships at the government and non-government level is important for the community to gain access to datasets.

There has never been a space in Bangalore to talk about issues like this. When proprietary companies around the world are trying to be the single source of ground truth, opening map data is a critical task. Improving OpenStreetMap is at the core of our agenda. GeoBLR will be a space for people to come and talk about maps, discover new tools, identify missing datasets, find them, and solve spatial data related issues.

I have been getting a fair number of requests to turn GeoBLR into a series of workshops, and I’m not keen on that idea. GeoBLR, in a way, is an experiment for me. From simple things like ‘which day of the week/month is best for people to attend the event’ to ‘what model the event should follow’, I’m learning a lot, and each subsequent event gets more clarity.

If you want to learn more, join us!

Crowdsourcing Data: Strategies, Stories, Tools.

I was a facilitator at the Techcamp Bangalore, where I introduced several strategies for crowdsourcing data to a broad group of non-profit organisations. My idea was to walk the participants through two stories that I’m personally part of – Akshara Foundation and the Humanitarian OpenStreetMap Team – by asking four key questions: what data to collect, from whom to collect it, how to collect it, and how to verify it. We had some interesting conversations at the event, but most participants were there to learn about more tools. I put together a small slide deck and a repository for collecting tools and reading material.

Data and Maps workshop for IT for Change, Bangalore.

A few weeks back, I ran a three-day workshop for the developers at IT for Change in Bangalore. They have been quite involved in OpenStreetMap, mobilising graduate students in various towns of Karnataka to map public infrastructure. Apart from this spatial data, they have collected demographic data about several aspects of these towns. The workshop was intended to give them a thorough grounding in geospatial technology, data and representation. The outline and code are available on GitHub.

Conversations at the Mozilla Summit 2013.

I was privileged to attend the Mozilla Summit this year in Santa Clara. Over the three days, I had some interesting conversations with people from different technology backgrounds, and this post is more of a self-reference for me to come back to.

Robert Kaiser and I had an interesting conversation about his maps application for FirefoxOS. His app uses just HTML Canvas to visualise the tiles on the client side and also takes care of all the interactions – no third-party libraries.

I grabbed breakfast with Mark Giffin one morning and we started talking about rendering Indic languages and making them print-ready. At Akshara, we use several techniques, such as PhantomJS, to achieve this. Mark suggested that we should consider DITA. DITA provides comprehensive solutions for typesetting.

Amir Aharoni of the Wikimedia Foundation joined our discussion and introduced Firefox as part of the solution, pointing out from his experience with the language team at Wikimedia that Firefox works very well at rendering Indic languages. That’s most of what we are doing at Akshara right now, but there are local dialects which need more work.

I have known James Hughman for a couple of years now, since his visit to India for the Droidcon. He joined Mozilla recently and I was excited that I would get to see him at the summit. We spoke about the books that we are reading, the new DRM policies and much more.

Toby Elliot introduced the new location services that Mozilla is building. I had a chat with him about how we can use OpenStreetMap data and probably help improve the infrastructure. There’s a very exciting email thread going on between us right now to figure out how we can get this going.

Bill Walker was curious about the new maps project that we are doing in Congo. His brother, an archaeologist, does a lot of mapping and has been considering building platforms for collaborative mapping. We talked about some of the existing systems and how we can adapt them for custom use cases.

I spoke to more people than those mentioned above, but these are the conversations that will definitely continue and probably make way for more posts!

The Reader’s Digest Great World Atlas of 1961.

I acquired the Reader’s Digest Great World Atlas of 1961, first edition, yesterday at a very old bookstore in Bangalore.

[Image: cover of the atlas]

It’s an amazing addition to my collection of maps. Interestingly, I couldn’t find any information about the atlas on the Internet.

The atlas was ‘planned’ under the direction of the famous geographer Frank Debenham. Produced with the involvement of the British and Foreign Bible Society, British Broadcasting Corporation, FOA, WHO, Information Service of India and numerous other organizations and individuals, the atlas is divided into four sections.

Paradise is somewhere in the far east. Jerusalem is the center of all nations and countries, and the world itself is a flat disk surrounded by oceans of water. So the monks, map-makers of the Middle Ages, saw the world they lived in.

Jerusalem was considered to be the center of the world, while the geographic center was first calculated in 1864, revised in 1973 by Andrew J. Woods and refined in 2003. The atlas attempts to fix these wrong notions by collecting the sum of knowledge from the explorations and scientific discoveries of that time.

The first section, called the Face of the World, portrays some fascinating relief maps like the ones below. They are the most structurally and geographically detailed that I’ve seen in any of the old representations.

[Images: two relief maps from the atlas]

The atlas employs various projections like the Conic, Lambert’s Azimuthal Equal Area, Bonne and the Van der Grinten projection. The following map of oceans is better depicted in the Van der Grinten projection.

[Image: map of the oceans in the Van der Grinten projection]

And here is an interesting illustration of continental drift.

[Image: illustration of continental drift]

There is more to the atlas than these. I hope to post more when I find time to read through this amazing record of history.

The Last 30 days.

I can’t believe that I’m sitting in my Bangalore home and writing this post after what happened in the last 30 days. I don’t have words to thank all those amazing people who took care of me over these days to bring me back and bouncing. Not quite there yet, but in a while. But I’m alive, for that matter.

I was between Italy and Germany during June 23 – July 9. We had an amazing time at the Info Activism Camp and later in Berlin with Kaustubh and Rome with Tin. It was fantastic. Towards the end of the trip I was quite tired from a sunstroke and an irregular fever. On my flight back the fever decided to test the case and did the trick. 109 degrees Fahrenheit with rigor. I arrived in Bangalore the next morning and went straight to a hospital.

From there until last week, I have been to 4 hospitals, consulted 9 doctors, and been subjected to 7 blood tests, 4 different radio-imaging scans, 3 antibiotics and a lot of stress. This was no fun. Not to any extent. I’ve cried and I’ve seen my mum crying at the same time. I was struck by an unidentifiable fever. I’ve lost weight and hair, and for whatever reasons my heart is heavy and life is rough.

It took a while to identify that I was suffering from a precursor of Enteric Fever. I’ve recovered now, though hopes weren’t too high in my mind. Time heals and patience counts.

I want to thank Rahul – for coming over to check on me while I was down in Bangalore, staying over without sleep, taking care of me and taking me to another hospital the next day. I want to thank my mum and dad. I’ll easily run out of words here. What they went through is nothing compared to the pain I suffered. My aunts and brothers – for sending me food and supporting mum whenever she was alone in the hospital. I want to thank Gautam – for taking care of everything so that I could stay away from work as long as I wanted, checking on me and sending me one of my favorite books when I was getting bored. Francesca and Ashima – for talking to me when I wanted to. Riju, Shashank and Ayesha – for letting me know that they miss me and that I need to be all right soon.

And thank you everyone – your prayers and wishes helped me through.

Designing a New Map Portal for Karnataka Learning Partnership.

I wrote a rather detailed post about the new maps for Karnataka Learning Partnership on the geohackers.in blog.

The map is an important part of our project, action and process because it serves as the pivot point of navigation. I will quickly talk about the data and tools before we discuss the design aspects.

We have a fairly large dataset of schools in Karnataka: the name of each school, its location, the number of girls and boys, and so on. Fortunately, the data was clean and properly stored in a PostgreSQL database with the PostGIS extension. Most of my task was to modify the API to serve GeoJSON to the client using the ST_AsGeoJSON function and to export the data.
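As a rough illustration of that last step, here is a minimal Python sketch of wrapping query results in a GeoJSON FeatureCollection. It is not the project's actual API code: the column names (name, girls, boys), the table name in the comment and the sample rows are all assumptions made up for the example. The only thing taken from PostGIS is that ST_AsGeoJSON returns the geometry as a JSON string.

```python
import json


def rows_to_feature_collection(rows):
    """Wrap (name, girls, boys, geometry_json) rows in a GeoJSON FeatureCollection.

    geometry_json is the text returned by ST_AsGeoJSON for one row.
    """
    features = []
    for name, girls, boys, geometry_json in rows:
        features.append({
            "type": "Feature",
            "geometry": json.loads(geometry_json),
            "properties": {"name": name, "girls": girls, "boys": boys},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical rows, shaped like the result of a query such as
#   SELECT name, girls, boys, ST_AsGeoJSON(geom) FROM schools;
# (the table and column names are assumptions for this sketch)
sample_rows = [
    ("School A", 120, 110, '{"type": "Point", "coordinates": [77.59, 12.97]}'),
    ("School B", 80, 95, '{"type": "Point", "coordinates": [76.64, 12.30]}'),
]

collection = rows_to_feature_collection(sample_rows)
print(json.dumps(collection, indent=2))
```

Keeping the attributes under "properties" and the parsed geometry under "geometry" is what lets a web map library consume the response directly.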

read more…

Indic Wikipedia: Visualizing Basic Parameters.

Riju (Sumandro) and I are working with The Centre for Internet and Society to understand how the Indic Wikipedia community is growing. Today, we published the first set of visualizations and a blog post about why, what and how we did this. Cross-posting from the CIS website.

Introduction

Understanding how the Indic or the Indian language Wikipedia projects are growing is something that we have been interested in for quite some time. We were delighted to come across this opportunity from the Centre For Internet and Society (CIS) and Wikimedia Foundation. We divided our analyses into three focus areas: (1) basic parameters, (2) geographic patterns of edits, and (3) the topics that receive the greatest number of edits. The existing infographics and data visualisations that we found about Indic Wikipedias mostly engaged with the first area, and also emphasised yearly aggregates. We thought a more granular, that is monthly, understanding and a focus on the geographic and thematic spread of the edits would be very helpful to further appreciate the activities.

We began by collecting data about the following basic parameters:

  1. Number of Editors
  2. Number of Articles
  3. Page Views
  4. Number of Active Editors
  5. Number of New Articles
  6. Number of New Editors
  7. Edit Size

Acquiring the data

We explored the MediaWiki API, ToolServer and the Wikimedia Statistics Portal. There are several ways of obtaining data about Wikipedia in general. Depending on the use case, such as the quantity of data required or the need for customised or selective data scraping, one or more of these methods of data gathering can be chosen.

The API limits how much data you can access, and it is meant to be used to access actual Wikipedia entries. We, however, were looking for metadata about the articles (such as when an article was first created, and when and how many times it was edited) and not the actual contents of the Indic Wikipedias.

ToolServer is an excellent way of running custom scripts, although it takes for granted that the user has substantial command over the back-end infrastructure and processes that Wikipedia runs on. We wrote a few scrapers to extract metadata about Indic Wikipedia projects from ToolServer but, not exactly being experts in the Wikipedia back-end systems, we found scraping from ToolServer rather time- and effort-intensive.

The statistics portal is a well-organised and accessible place for collecting data for analysis. However, it did not have all the parameters and Wikipedia projects we were interested in.

In our search for Indic Wikipedia datasets, we realised that the Wikimedia Analytics Team (WAT) puts a lot of effort into writing scripts and collecting various data at different levels. Wikimedia developer Yuvi Panda and the Access to Knowledge team at CIS, aware of our difficulty in obtaining the data, also pointed us towards the WAT. While we were already scraping data on some of the parameters, we approached the WAT, whose prompt and very supportive response much accelerated our work. The fantastic Wikimedia developers, especially Evan Rosen (a big ‘thank you’ to him), shared the needed data, which we cleaned up and archived in the GitHub repository for the project.

We obtained data for the period from January 2001 to December 2012. It appears that the Indic Wikipedia projects began their activities around 2005. A big part of cleaning the data involved identifying when each of the projects started and dropping the data from before that point. There are 20 Indic Wikipedia projects with 4,98,964 articles, 5,689 editors and over 3,35,49,102 readers.
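To give a sense of that cleaning step, here is a small Python sketch. It assumes, purely for illustration, that a project's start is the first month with a non-zero count; the project code and the numbers are hypothetical and are not taken from our dataset or scripts.

```python
def drop_before_start(monthly_counts):
    """Return the series from the first month with a non-zero count onwards.

    monthly_counts is a list of (month, count) pairs in chronological order.
    Months after the start are kept even when their count is zero.
    """
    for index, (_month, count) in enumerate(monthly_counts):
        if count > 0:
            return monthly_counts[index:]
    return []


# Hypothetical raw series: empty months before the project became active.
raw = {
    "project_a": [
        ("2004-11", 0),
        ("2004-12", 0),
        ("2005-01", 3),
        ("2005-02", 0),
        ("2005-03", 7),
    ],
    "project_b": [("2004-11", 0), ("2004-12", 0)],
}

cleaned = {code: drop_before_start(series) for code, series in raw.items()}
print(cleaned)
```

Zero months after the start are kept on purpose, since a quiet month in an active project is real data while the months before a project existed are not.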

Deciding upon chart types

We spent quite some time discussing different methods of visualising the data. The major difficulty is that there are too many entities to be plotted. As each language must be plotted as a separate entity (point, line, circle, etc.), the chart has a tendency to become cluttered and illegible. Even if we take only one variable, say New Editors, there will still be 20 points or lines to be plotted. Hence, using any of the conventional charts becomes difficult. For example, if we chose a line chart with New Editors on the Y-axis and months on the X-axis, there would be 20 lines, each of a different colour, representing different languages. Also, a monthly timeline of five to six years translates into 60-72 temporal data points.

We have adopted two strategies, and related chart types, to address this difficulty.

Firstly, we used a monthly calendar-like heatmap chart that limits the temporal spread of data to one year for each section of the chart and uses a positionally uniform set of columns for each language so as to make reading the chart easier. Limiting each chart section to 12 months allows the user to focus on more granular movements of the variable concerned, say the number of New Editors per month. Representing each language in its own column, and not by an upwards-and-downwards moving line as in a line chart, makes it easier for the user to follow movements in each language (where movement is shown by the intensity of colour, as is characteristic of heatmaps) without the need for a separate coloured entity (point, line, circle) for each language.
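The data behind such a chart can be thought of as one small grid per year: a column per language and a row per month. The following Python sketch shows one way to build that structure. It is an illustration rather than the code we used, and the record format (language, 'YYYY-MM', value) is an assumption; the three sample values are the Active Editors figures quoted later in this post.

```python
from collections import defaultdict


def calendar_grid(records):
    """Group (language, 'YYYY-MM', value) records into one grid per year.

    Returns {year: {language: [jan, feb, ..., dec]}}, with None for months
    that have no record, so every language column has the same 12 slots.
    """
    grid = defaultdict(lambda: defaultdict(lambda: [None] * 12))
    for language, month, value in records:
        year, month_number = month.split("-")
        grid[int(year)][language][int(month_number) - 1] = value
    return {year: dict(columns) for year, columns in grid.items()}


records = [
    ("Tamil", "2004-01", 2),
    ("Tamil", "2012-12", 115),
    ("Malayalam", "2012-12", 155),
]

grid = calendar_grid(records)
for year in sorted(grid):
    print(year, grid[year])
```

Each value would then be mapped to a colour intensity, and because every column has a fixed position, the eye can follow one language down the months without a legend of 20 colours.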

Secondly, we used a motion chart, as made famous by Dr. Hans Rosling, that removes the temporal axis from the X- and Y-axes of the chart and uses animated transitions to represent temporal change. A motion chart has the unique ability to handle as many as five variables in an organised manner, using the following visual elements: X-axis, Y-axis, Z-axis (animated temporal transitions), size of bubbles, and colour of bubbles. It is, however, recommended that the represented variables be limited to a maximum of four for easier legibility. In our case, we have used the X- and Y-axes to plot various related variables (which can be selected by the user) such as New Editors and New Articles, the Z-axis to represent time, and the colour of the bubbles to represent a third optional variable (which can also be selected by the user). Since different Indian language Wikipedia projects often take a wide range of values for most variables, we avoided using the size of the bubble to represent any of those variables. Further, the motion chart gives the user a lot of controls to explore the various projects and variables according to their interest, and especially to compare particular projects and variables with each other.

After discussing the chart types with the Access to Knowledge team, we decided to use simpler line charts, each emphasising a single Indic Wikipedia project, on the language-specific pages that we will be creating next.

Calendar charts

[Image: calendar chart]

We visualised three parameters using the calendar heatmap strategy: (1) New Articles, (2) New Editors, (3) Active Editors.

The New Articles Calendar shows the new articles posted on every Indic Wikipedia for every month since 2004. It was interesting to note the small number of new articles in 2012 across all the languages. The first language to show a large number of new articles is Bengali. Hindi picks up around the same time with fewer articles. Except for Urdu and Nepali, every language saw a drop in the number of new articles. However, we should remember that a lower number of new articles does not necessarily indicate low overall activity in the project concerned.

As with new articles, we wanted to explore the patterns in the number of new editors across all of the Indic Wikipedia projects. As you run through the new editors calendar chart, it is evident that there is consistent growth in the editor base for a few projects like Hindi, Marathi, Bengali, Telugu, Tamil, Kannada and Malayalam. If one takes a step back and compares this with the new articles chart, something is not very clear: in some of the projects, there is growth in the number of editors but not many new articles are posted. We are very keen to understand why this has happened.

If we look at the active editors calendar, Tamil started with 2 active editors in January 2004 and, with a few ups and downs, grew to about 115 active editors in December 2012. Malayalam started slowly in late 2004 with 2 editors and grew to 155 active editors in December 2012. We are sure viewers will be able to find more patterns by studying the charts closely and comparatively.

Motion chart

[Image: motion chart]

We developed a motion chart comparing five variables: (1) Active Editors (> 5 edits per month), (2) New Editors, (3) Total Editors, (4) New Articles, and (5) Total Articles. When the visualisation is opened, Total Editors is plotted on the X-axis, Total Articles is plotted on the Y-axis, the colour of the bubbles indicates the Active Editors (Blue is low and Red is high) and the sizes of the bubbles are kept the same for easier comparison.

The user can click on the drop down menus at the X- and Y-axes, and next to the size and colour variables, and make them represent different variables.

We chose to configure the X- and Y-axes to show the data in logarithmic scales and not in linear scales. Since most projects experience small increments over time and there exists a wide difference between the most and the least popular/active projects, the logarithmic scale is better suited to represent the changes in the given data. The user has the option to select linear scale at the end of both X- and Y-axes (click on “Log”).
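To see why the logarithmic scale helps here, consider where a value lands along an axis under each scale. The Python sketch below is only an illustration: the axis range of 1 to 100,000 and the sample values are hypothetical, not figures from our data.

```python
import math


def axis_position(value, low, high, scale="log"):
    """Return the position of value along an axis as a fraction from 0 to 1.

    low and high are the axis limits; for the log scale all three numbers
    must be greater than zero.
    """
    if scale == "log":
        return (math.log10(value) - math.log10(low)) / (
            math.log10(high) - math.log10(low)
        )
    return (value - low) / (high - low)


# Hypothetical axis range and project sizes.
low, high = 1, 100_000
for value in (100, 1_000, 50_000):
    linear = axis_position(value, low, high, scale="linear")
    log = axis_position(value, low, high, scale="log")
    print(f"{value:>6}: linear {linear:.4f}, log {log:.4f}")
```

On the linear axis the two smaller values are squeezed into the first one percent of the chart, while on the logarithmic axis each tenfold increase covers the same distance, so small projects stay visible next to large ones.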

As evident in the visualisation, the Newari project and the Hindi-Malayalam project cluster show very interesting contrasting dynamics — while both achieve similar Total Articles numbers, the latter is much more editor-heavy. This suggests a smaller but more active editor community for the Newari project.

Please click on the image of the motion chart below to open the interactive version in a separate window. The code can be accessed at the project repository on GitHub.