Methodology - Brazil Open Data Census

This page explains the methodology behind the 2017 (Local) Open Data Index. If you have any further questions or comments about our methodology, please reach out to us through the Open Data Index forum.

The Open Data Index (ODI) is an independent assessment of open government data publication from a civic perspective. ODI enables different open data stakeholders to track governments’ progress on open data release. ODI also allows governments to get direct feedback from data users. The Index gives both parties a baseline for discussion and analysis of the open data ecosystem in their country (or city) and internationally. We encourage all interested parties to participate in an open dialogue to allow for ownership of the results and to make the Index as relevant as possible.

Research scope

Like any other benchmarking tool, ODI tries to answer a question. In our case, the question is as follows: How do governments around the world publish open data?

From this question, other important questions emerge, such as:

Which governments readily publish open data? Which governments still need to improve open data publication?
What is the most open dataset? What is the least open dataset?
Which aspects of open data are easiest or hardest to implement?

In this year’s edition, we also experimented with measuring aspects of “practical openness” such as data findability and usability. These aspects are also acknowledged by the International Open Data Charter Principles. The information we gained from this assessment is displayed in the results and is available to download. It will also inform internal research, which can be tracked on GitHub.

What does ODI NOT cover?

ODI intentionally limits its inquiry to the publication of national government data. It does not look at other aspects of the common open data assessment framework such as context, use or impact. This narrow focus enables it to provide a standardised, robust, comparable assessment of open data around the world. Although we look only at publication, we have yet to cover data quality, which is a significant barrier to reuse. We hope we will be able to do this in the future.

Research assumptions

This section presents the key assumptions that were taken into consideration while collecting and assessing the data.

Different countries have different governance structures (federal vs. national government, etc.) and different policies regarding open data. The assumptions below inform our approach.

Assumption 1: Open data is defined by the Open Definition.

We define open data according to the Open Definition. The Open Definition is a set of principles that define openness of data and content. It is also simple and easy to operationalise. We note one small deviation from the current v2.1 of the Open Definition: the only part of our methodology that is not aligned with it is our assessment of ‘open, machine-readable’ formats. We give a full score to machine-readable formats even if their source code is not open. Instead, formats must be usable with at least one free and open source software tool. The Index thereby gives preference to practical openness over the formal openness of a format.

Assumption 2: The role of government in publishing data.

In the past, there have been questions about the role government should play to ensure the publication of open data. Government services may be privatised, which means the data can be owned and produced by a company and not the state. We assume that for the key data categories we survey, the government has a responsibility to ensure their publication, even if the data is held and managed by a third party.

Assumption 3: The (Local) Open Data Index is an indicator for cities.

We acknowledge that not all cities have the same political structure. It is possible that not all sub-national governments produce the same data, as they are potentially subject to different laws and procedures. ODI therefore assesses data publication at the city level, even where the data is not provided by the city government. Sometimes, “national” publication of open data overrides city-level efforts, for three main reasons:

The data describes national government processes or procedures (government entities operating at the highest administrative level).
The data is collected or produced by the national government or a national government agency (at the highest administrative level).
The data describes national parameters and public services for the entire national territory but is collected by sub-national actors. Only where we see legal and administrative autonomy from a higher level of government will ODI look into sub-national territories individually.

What data does the Index look at?

ODI measures the openness of clearly defined data categories. Any open data that does not fall within these categories is not considered in our assessment. All Index scores refer exclusively to our data categories and should be understood as a proxy for the availability of open government data at large. There are three reasons for this. Firstly, ODI assesses open government data that has proven to be useful for the public. User stories helped us to define the categories that are most useful for the public. Secondly, ODI is a comparative indicator. In the past, we have used broader categories and compared very different datasets, at the expense of comparability. Thirdly, a standardised procedure helps our researchers to reduce bias and personal judgement.

Each data category contains the following information:

A minimum of 3 characteristics: The data characteristics describe the mandatory content of a dataset. Usually, all data characteristics are required to qualify for assessment: if a dataset is missing one of the characteristics, the dataset is considered not published. For two categories, water quality and draft legislation, we have lowered the bar by making some characteristics optional. This is because we are trying to understand better what data is out there and to improve definitions for these datasets in the future.
Aggregation level: Some data is available at different levels of aggregation. For example, water quality data can exist for each individual water source, or it can be presented as total annual pollution for regions or the country. In most cases ODI assesses detailed, disaggregated data. Comprehensive data increases the use cases and broadens the insights people can draw from it. The International Open Data Charter also emphasises that the data should be published in its raw, original format as disaggregated data. Being clear about the aggregation level helps to guide our researchers looking for the correct dataset.
Time intervals: Different datasets are updated at different time intervals. Our survey includes the question “This data should be updated every [TIME INTERVAL]. Is it up-to-date?” to assess whether data is up-to-date. Data that is not up-to-date is often less useful.
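
The characteristics and time-interval checks described above can be sketched in Python. This is only an illustration under stated assumptions, not the Index's actual tooling: the function names, the short characteristic labels, and the idea of expressing an update interval as a number of days are all hypothetical.

```python
from datetime import date, timedelta


def qualifies(required, found, optional=frozenset()):
    """Return True if every mandatory characteristic is present.

    Characteristics listed in `optional` (as for water quality and
    draft legislation) are not needed for a dataset to qualify.
    """
    mandatory = set(required) - set(optional)
    return mandatory.issubset(found)


def is_up_to_date(last_update, interval_days, today):
    """Return True if the last update falls within the expected interval."""
    return today - last_update <= timedelta(days=interval_days)


# Hypothetical labels standing in for the Government Spending characteristics.
spending_required = {"office", "date", "vendor", "amount"}

# The vendor name is missing, so the dataset counts as not published.
print(qualifies(spending_required, {"office", "date", "amount"}))

# A quarterly dataset (assumed here to mean 90 days) updated 50 days ago.
print(is_up_to_date(date(2017, 1, 10), 90, today=date(2017, 3, 1)))
```

Treating optional characteristics as a set subtracted from the required ones keeps the default rule strict: a category is lenient only where it explicitly lists optional items.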

Governments often publish data on multiple websites, and in many files and formats. To make an informed and consistent decision about which data to pick, reviewers followed two approaches:

1) Choosing one reference dataset: Reviewers find one reference dataset or file that contains all relevant characteristics. They answer the survey using this dataset. This can be a CSV file, a shapefile, or data presented on a website. If reviewers have to choose between two or more similar datasets, they should choose the one that scores highest and document their choice in a comment.
2) Referencing multiple datasets (if one reference file is not available): If reviewers cannot find a single reference dataset because the data is split across many files, formats and places, they answer the survey with reference to several files. It is important that, taken together, these files contain all required data characteristics. Example: if one dataset displays votes on bills and is in a machine-readable format, but another one contains bill texts and is not machine-readable, then the data is not considered to be machine-readable.
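
The second approach can be sketched in Python as follows. The sketch reads the example above as "the answer is only as good as the weakest referenced file", which is an interpretation rather than a rule stated by the Index; the class, the function names and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferencedFile:
    name: str
    characteristics: frozenset
    machine_readable: bool


def covers_all(files, required):
    """True if the files together contain every required characteristic."""
    found = set()
    for f in files:
        found |= f.characteristics
    return set(required).issubset(found)


def all_machine_readable(files):
    """True only if at least one file is given and every file is machine-readable."""
    return len(files) > 0 and all(f.machine_readable for f in files)


# Hypothetical files mirroring the draft legislation example.
votes = ReferencedFile("votes.csv", frozenset({"votes"}), True)
texts = ReferencedFile("bills.pdf", frozenset({"content"}), False)

print(covers_all([votes, texts], {"votes", "content"}))  # both characteristics found
print(all_machine_readable([votes, texts]))              # the PDF lowers the answer
```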

The list of data categories

Our data categories reflect key data that is relevant for civil society at large. The categories have been developed in partnership with domain experts, including organisations championing open data in their respective fields. In some cases, we base our definition on international data production and reporting standards used by governments around the world. Each year we refine our definitions to reflect learnings from these experts.

Government Budget

What we look at? City government budget at a high level. This is planned government expenditure for the upcoming year, and not the actual expenditure. To develop this category the Index drew on work from Open Spending.

Why we look at it? Open budget data allows for well-informed publics. It shows what money is spent on, how public funds develop over time, and why certain activities are funded. See here for a list of cases in which budget data has been used in the past.

Characteristics: The following data must be online to qualify for assessment: Budget for each government department, ministry, or agency; Descriptions for budget sections; Level of granularity: Budget separated into sub-department, political program, or expenditure type.

Government Spending

What we look at? Records of actual (past) city government spending at a detailed transactional level. Data must display ongoing expenditure, including transactions. A database of contracts awarded or similar will not be considered sufficient. Also, a database only showing subsidies will not be sufficient. To develop this category the Index drew on work from Open Spending.

Why we look at it? Open spending data shows whether public money is used efficiently and effectively. It helps to understand spending patterns and to expose corruption, misuse, and waste.

Characteristics: The following data must be online to qualify for assessment: Government office which had the transaction; Date of transaction; Name of vendor; Nominal amount of individual transaction; Level of granularity: Individual record of each transaction.

Procurement

What we look at? All tenders and awards of the city government, aggregated by office. It does not look into procurement planning or other procurement phases such as implementation (i.e. actual money transfers, which are part of our spending category). To develop this category the Index drew on work from the Open Contracting Partnership.

Why we look at it? Open procurement data may enable fairer competition among companies, make it possible to detect fraud, and deliver better services for governments and citizens. Monitoring tenders helps new groups to participate in tenders and increases government compliance.

Characteristics: The following data must be online to qualify for assessment. Tender phase: Tenders per government office; Tender name; Tender description; Tender status. Award phase: Awards per government office; Award title; Award description; Value of the award; Supplier's name.

Election Results

What we look at? This data category looks at results for the latest mayoral electoral contest. Election data provides information about voting outcomes and the voting process. What are electoral majorities and minorities? How many votes are registered, invalid, or spoilt? The Index consulted the National Democratic Institute (NDI) to develop this data category. For more information, see the NDI’s Open Elections Data Initiative.

Why we look at it? To enable the highest level of transparency, the Index assesses polling station-level data. Polling stations are the locations at which voters cast their vote. Having this data allows for independent scrutiny of each stage of the voting and counting process. It also helps electoral stakeholders better target their voter education and mobilization efforts for the next elections.

Characteristics: The following data must be online to qualify for assessment: Results for mayoral electoral contests; Number of registered votes; Number of invalid votes; Number of spoiled votes (not required if the digital voting system being assessed does not recognize spoiled votes); Level of granularity: Data available at polling station level.

Company Register

What we look at? Lists of registered (limited liability) companies in the city. The submissions in this data category do not need to include detailed financial data such as balance sheets. This category draws on the work of OpenCorporates.

Why we look at it? Open data from company registers can serve many purposes, such as enabling customers and businesses to see with whom they are dealing, or to see where a company has registered offices.

Characteristics: The following data must be online to qualify for assessment: Name of company; Company address; Unique identifier of the company; Register available for entire city (usually assessed through a sample: it is answered with “Yes” if a register indicates companies in different regions).

Land Ownership

What we look at? Maps of land with a parcel layer that displays boundaries, together with a land registry containing information on registered parcels of land. The assessment criteria were developed in collaboration with Cadasta Foundation. For more information on land ownership datasets, see Cadasta Foundation's Data Overview.

Why we look at it? The Index focuses on assessing open land tenure data (describing the rules and processes of land property). Responsible use may enable tenure security and increase the transparency of land transactions.

Characteristics: The following characteristics must be included in the cadastral and registry information submitted: Parcel boundaries; Parcel ID; Property Value (price paid for transaction or tax value); Tenure Type (public, private, customary, etc.).

City Maps

What we look at? A geographical map of the city including traffic routes, stretches of water, and markings of heights. The map must be provided at a scale of at least 1:250,000 (1 cm = 2.5 km), a scale feasible for most countries. The Index developed this category based on a landmark report of the United Nations Committee of Experts on Global Geospatial Information Management (UNGGIM).
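
The scale requirement is simple arithmetic: at 1:250,000, one centimetre on the map stands for 250,000 cm, or 2.5 km, on the ground. A minimal Python sketch follows; the function names are hypothetical, and it assumes that "at least 1:250,000" means a scale denominator of 250,000 or smaller (that is, an equally or more detailed map).

```python
CM_PER_KM = 100_000
MAX_SCALE_DENOMINATOR = 250_000  # from the 1:250,000 requirement


def km_per_map_cm(scale_denominator):
    """Ground distance in km represented by 1 cm on the map."""
    return scale_denominator / CM_PER_KM


def meets_minimum_scale(scale_denominator):
    """True if the map is at least as detailed as 1:250,000."""
    return scale_denominator <= MAX_SCALE_DENOMINATOR


print(km_per_map_cm(250_000))          # 2.5
print(meets_minimum_scale(50_000))     # True: 1:50,000 is more detailed
print(meets_minimum_scale(1_000_000))  # False: 1:1,000,000 is too coarse
```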

Why we look at it? Geographic information is instrumental for many use cases, including journey planning, the mapping of topography, as well as demographic indicators.

Characteristics: The following data must be online to qualify for assessment: Markings of traffic routes; Markings of relief/heights; Markings of water stretches; City borders; Coordinates. Note: To qualify, data must contain geographic projections that make it possible to interpret coordinates.

Administrative Boundaries

What we look at? Data on administrative units or areas defined for the purpose of administration by a (local) government. The development of this category draws on the work of the FAO Global Administrative Unit Layers (GAUL) project, as well as the UNGIWG.

Why we look at it? Open data about administrative zones has many use cases: Who are the candidates in my region? Which government bodies administer my region? How is wealth distributed across regions?

Characteristics: The following data must be online to qualify for assessment: Boundary level 1 (e.g. administrative areas); Boundary level 2 (e.g. neighborhoods); Coordinates of administrative zones (latitude, longitude); Name of polygon (department, region, city); Borders of polygon. Note: To qualify, data must contain geographic projections that make it possible to interpret coordinates.

Locations

What we look at? A database of postcodes/zipcodes and the corresponding spatial locations regarding latitude and longitude (or similar coordinates in an openly published coordinate system). The data has to be available for the entire city. The Index drew on work of the Universal Postal Union to develop this category.

Why we look at it? Open location data shows the addresses of public and private buildings. While mainly used to route postal services, this data has many use cases: to calculate the number of persons in a city district, to provide homes with services, or for direct mailing and marketing.

Characteristics: The following data must be online to qualify for assessment: Zipcodes; Addresses (required if the zip code does not include the address); Coordinates (latitude, longitude); Data available for entire city. Note: To qualify, data must contain geographic projections that make it possible to interpret coordinates.

City Statistics

What we look at? Key city statistics on demographic and economic indicators such as Gross Domestic Product (GDP), or unemployment and population statistics. These statistics can be published as aggregates for the entire city.

Why we look at it? As Open Data Watch states: "Official statistics provide an indispensable element in the information system of a democratic society, serving the Government, the economy and the public with data about the economic, demographic, social and environmental situation".

Characteristics: The following data must be online to qualify for assessment: City Population (Required: census data, updated every year); Gross Domestic Product (measured in current or constant prices, updated quarterly, last update must not be more than 3 months ago); City unemployment (absolute numbers, or expressed as percentage of entire population, updated quarterly, last update must not be more than 3 months ago).

Draft Legislation

What we look at? Data about the bills discussed within the city council as well as votes on bills (not to be confused with passed national law). Data on bills must be available for the current legislation period. This data category draws on work by the National Democratic Institute (NDI) and the Declaration of Parliamentary Openness.

Why we look at it? Open data on the law-making process is crucial for parliamentary transparency: What does a bill text say and how does it change over time? Who introduces a bill? Who votes for and against it? Where is a bill discussed next so that the public can participate in debates?

Characteristics: The following data is required and must be online to qualify for assessment: Content of bill; Author of bill; Status of bill; Votes on bill per member of the city council; Transcripts of debates on bill; Available for current legislation period.

City Laws

What we look at? This data category requires all national laws and statutes to be available online, although it is not a requirement that information on legislative behaviour (e.g. voting records) is available. This data category draws on work by the National Democratic Institute (NDI) and the Declaration of Parliamentary Openness.

Why we look at it? Access to open data on a city's legal code (i.e. city laws) supports compliance with the law, makes it possible to keep track of legal changes, and enables public deliberation around a law.

Characteristics: The following data must be online to qualify for assessment: Content of the law / status; Date of last amendment; Amendments to the law (if applicable).

Air Quality

What we look at? Data about the daily mean concentration of air pollutants, especially those potentially harmful to human health. Data should be available for all air monitoring stations or zones in a city. The Index evaluates the openness of key pollutants as defined by the World Health Organisation (WHO).

Why we look at it? Air quality is a key factor for human health and the environment.

Characteristics: The following data must be online to qualify for assessment: Particulate matter (PM); Sulphur oxides (SOx); Nitrogen oxides (NOx); Carbon monoxide (CO); Ozone (O3); Volatile organic compounds (VOCs); Available per air monitoring station/zone.

Water Quality

What we look at? Water quality data by water source. The data category concerns the quality of designated drinking water sources. If data on designated drinking water sources is not available, it refers to environmental water sources (lakes, rivers, groundwater). Data for each water source is desirable.

Why we look at it? This information is essential for both the delivery of services and the prevention of diseases.

Characteristics: In order to satisfy the minimum requirements for this category, data should be available on the levels of the following chemicals: Fecal coliform; Arsenic; Fluoride levels; Nitrates; Total Dissolved Solids; Data per water source; Available for the entire city.

Public Schools

What we look at? School-level data on enrollment and location for the entire city, including all public schools (Kindergarten, Elementary and High School).

Why we look at it? This is one of the new dimensions for the city-level index. We understand that education is a very important aspect of development and should not be left out of the Index. The data required is the bare minimum of information on public schools in a city.

Characteristics: In order to satisfy the minimum requirements for this category, data submitted should meet the following criteria: Enrollment; Coordinates (Address or Latitude/Longitude); Data available per public school; Data available for Kindergarten, Elementary and High School.

Crime Statistics

What we look at? These are basic indicators of crime (robbery, murder, rape and firearm seizures), available for the entire city per neighborhood or equivalent. For firearms, data should include the seizure classification (e.g. weapon type and caliber) and whether the weapon is legal or illegal.

Why we look at it? Crime statistics are important everywhere, but at the city level in developing countries they are crucial. This new dimension draws on the most widely used indicators of insecurity, acknowledging that this is a very difficult phenomenon to measure.

Characteristics: In order to satisfy the minimum requirements for this category, data submitted should meet the following criteria: Number of robberies; Number of murders; Number of rapes; Firearm seizures; Data available per neighborhood or equivalent.

Public Transportation

What we look at? Information on public transport schedules, itineraries, stations/stops and bike lanes, for the entire city. Itineraries, stops/stations and schedules should be presented for every transportation mode (bus, subway, tramway, etc.) and for every line and stop/station.

Why we look at it? Public transportation is also a new dataset for the city-level index. It should not be left out of a discussion on open data for cities, since urban mobility is an issue discussed worldwide. The data required is essential for informing citizens and for app development.

Characteristics: In order to satisfy the minimum requirements for this category, data submitted should meet the following criteria: Stops/stations for every transportation mode; Line schedule for every stop/station; Itineraries for every line of every public transportation mode; Bike lane mapping for the entire city; Data should show connections between different types of transport.