About the project

Why this project?

For a country of 1.2 billion people, India doesn't have a single public data portal that provides easy access to crime data. Getting India's crime data has always been painful. The National Crime Records Bureau releases an annual report, more than 600 pages long, detailing the different kinds of crimes recorded in the country. Below is a screenshot from the latest report, for 2014.

This report runs for hundreds of pages of tables and charts. Every year, journalists, social scientists, academics and anyone else interested in this data spend hours ploughing through these dense PDF documents to extract and analyze the information.

And if you want to compare crimes across cities and years, all you can do is download the PDF for each year and manually assemble a spreadsheet for analysis.

How does this website help?

This application frees the data trapped in these PDFs, so that you can easily compare crimes in different cities and across different years. You can choose a city and see which crimes affect its people the most. Or you can choose a crime and see which cities have the highest rates of that particular crime.
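The two comparisons described above (top crimes for a city, top cities for a crime) come down to computing a rate and sorting. The sketch below is only an illustration under stated assumptions, not the app's actual code: the rate is assumed to be cases per 100,000 residents (one lakh), and the city names, case counts and populations in `RECORDS` are hypothetical placeholders, not NCRB figures.

```python
# Hypothetical sample records: (city, crime, cases, population).
# Placeholder values for illustration only; these are NOT NCRB figures.
RECORDS = [
    ("City A", "Theft", 1200, 2_000_000),
    ("City A", "Burglary", 300, 2_000_000),
    ("City B", "Theft", 900, 1_000_000),
    ("City B", "Burglary", 450, 1_000_000),
    ("City C", "Theft", 500, 1_250_000),
    ("City C", "Burglary", 100, 1_250_000),
]


def crime_rate(cases, population, per=100_000):
    """Cases per `per` residents (assumed here to be 100,000, i.e. one lakh)."""
    return cases * per / population


def rank_cities(records, crime):
    """Return (city, rate) pairs for one crime, highest rate first."""
    rates = [
        (city, crime_rate(cases, pop))
        for city, name, cases, pop in records
        if name == crime
    ]
    return sorted(rates, key=lambda item: item[1], reverse=True)


def top_crimes(records, city):
    """Return (crime, rate) pairs for one city, highest rate first."""
    rates = [
        (name, crime_rate(cases, pop))
        for place, name, cases, pop in records
        if place == city
    ]
    return sorted(rates, key=lambda item: item[1], reverse=True)
```

For example, `rank_cities(RECORDS, "Theft")` puts City B first (90 thefts per 100,000 people) even though City A recorded more thefts in absolute terms, which is why comparing rates rather than raw counts matters when cities differ in size.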

We've made it easy for you to discern trends and patterns in the five most recent data sets available for crimes in India, covering 2010 to 2014. You don't have to download any PDF documents or convert them into spreadsheets.

The data

The data comes from India's National Crime Records Bureau, which releases a compiled report of crimes recorded by police stations all around the country. The app covers 2010 to 2014, the five most recent years for which data is available. The latest report can be found here. Reports from the previous years can be found here.

A few assumptions and modifications to the data are worth pointing out.

Credits

This application was built by Saurabh Datar as part of his thesis project for the Stanford Journalism program. It would not have been possible without the guidance of Dan Nguyen and the other professors at the Department of Communication at Stanford University.

The icons used on the home page were designed by parkjisun from The Noun Project.

The map of India is courtesy of Datameet. It was converted to TopoJSON using mapshaper.

Technical details

Since the data came as PDFs, ABBYY FineReader (courtesy of Dan Nguyen) was used to convert them into spreadsheets. The data was cleaned in Python, with Pandas used occasionally for wrangling and pivot tables.
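As a rough illustration of that cleaning step, the sketch below parses OCR'd count cells (stripping thousands separators and stray spaces, skipping blanks and dashes) and pivots long (city, year, count) rows into a city-by-year table. It is a hypothetical plain-Python version, not the project's actual script; the function names, the set of "missing" markers and the sample rows are all assumptions. With Pandas, the pivot would be roughly `df.pivot_table(index="city", columns="year", values="cases", aggfunc="sum")`.

```python
import re
from collections import defaultdict

# Cell values treated as "no data" (an assumption about how blanks appear after OCR).
MISSING = {"", "-", "NA"}


def parse_count(raw):
    """Turn an OCR'd cell like ' 1,234 ' into 1234; return None for missing values.

    Anything else that is not a plain integer (e.g. 'l23' from a misread '1')
    raises ValueError, so bad OCR is flagged instead of silently kept.
    """
    cleaned = re.sub(r"[,\s]", "", raw)
    if cleaned in MISSING:
        return None
    return int(cleaned)


def pivot(rows):
    """Pivot (city, year, raw_count) rows into {city: {year: total}}.

    Duplicate (city, year) rows are summed, mirroring aggfunc="sum".
    """
    table = defaultdict(dict)
    for city, year, raw in rows:
        count = parse_count(raw)
        if count is not None:
            table[city][year] = table[city].get(year, 0) + count
    return dict(table)


# Hypothetical rows as they might come out of a converted spreadsheet.
rows = [
    ("City A", 2013, "1,204"),
    ("City A", 2014, " 1,310 "),
    ("City B", 2013, "-"),
    ("City B", 2014, "987"),
]
print(pivot(rows))
```

Raising on unparseable cells rather than dropping them is a deliberate choice in this sketch: OCR errors are easier to fix by hand when they stop the script than when they quietly vanish from the totals.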

The app itself is powered by Python's Flask framework. The website mainly uses Bootstrap for styling and responsive design. The visualizations are powered by Highcharts. The map on the front page is built with D3.
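One plausible way for a Flask backend to feed Highcharts is to turn a city-by-year table into a chart configuration and send it to the browser as JSON. The sketch below assumes that design; it is not the app's actual code, and the function name `highcharts_config` and the sample table are hypothetical. It uses the standard Highcharts options `chart.type`, `title.text`, `xAxis.categories` and `series[].name`/`series[].data`.

```python
import json


def highcharts_config(table, years, title):
    """Build a Highcharts line-chart config: one series per city, one point per year.

    `table` is a {city: {year: count}} mapping. Years missing for a city become
    None, which serializes to JSON null and shows up as a gap in the line.
    """
    return {
        "chart": {"type": "line"},
        "title": {"text": title},
        "xAxis": {"categories": [str(year) for year in years]},
        "series": [
            {"name": city, "data": [counts.get(year) for year in years]}
            for city, counts in sorted(table.items())
        ],
    }


# Hypothetical table for illustration only.
table = {"City A": {2013: 1204, 2014: 1310}, "City B": {2014: 987}}
config = highcharts_config(table, [2013, 2014], "Theft")
print(json.dumps(config))
```

In a Flask view, the same dictionary could be returned with `flask.jsonify(config)` and passed to `Highcharts.chart(...)` on the page; keeping the config construction in a plain function like this makes it testable without running the web server.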