During the fourth OGP Global Summit 2016, held in December this year, FutureTDM, together with Open Knowledge International, was selected to hold a session on text and data mining (TDM).
Introduction
The Open Government Partnership (OGP) consists of 70 member countries and hundreds of civil society organizations. Participants in the summit include heads of state and government, ministers, public servants, members of parliament, local authorities, civil society representatives, researchers, start-ups and digital innovators, who come together to share their experiences.
Workshop ‘Mine the Government’
The session was organised to discuss barriers and enablers for TDM in relation to governments and the data they make available. After a brief introduction of the FutureTDM project, Freyja van den Boom presented the findings and main insights from the stakeholder consultations on TDM.
After the FutureTDM project presentation, our invited speakers were introduced to share their experiences and insights with the OGP session participants.
‘Access to information’ does not mean you are free to use it
The first invited speaker was Prof. Mireille van Eechoud from the Institute for Information Law, University of Amsterdam, who spoke on some of the legal aspects of open data and TDM.
She explained how the current legal framework can be a barrier when it comes to access to and re-use of public sector information for TDM. She identified the following trends:
- Right to information laws are increasing worldwide, but they tend to be traditional and do not regulate (active) disclosure of data.
- Open licensing of OGP commitments could do with stronger legal backing.
- To what extent intellectual property laws allow TDM is controversial; the EU copyright reform would only allow TDM for non-commercial (academic) research, not for civil society (or business).
- Awareness of data protection issues is growing, but data protection remains a wicked problem in a ‘data drenched society’.
The presentations that followed shared insights from specific open data projects.
How can you do text and data mining if you can't even find the data in the first place!?
The first case study was The Global Open Data Index, which is an international ranking of the publication of open data. Katelyn Rogers from Open Knowledge International showed the current status of the Index and discussed the workings behind it.
Sharing her experiences with the Index, she discussed license ambiguity, problems with data quality and the discoverability of data, and finally the expertise gap, which included a lack of:
- Legal expertise to understand the data licensing
- Political expertise to understand data collection
- Technical expertise to understand data quality, usability and formats
- Country-specific knowledge.
All of these confirm barriers we have previously identified for text and data mining and provide useful additional insights for FutureTDM.
Diana Krebs, project manager for fiscal data projects with Open Knowledge International, presented the openspending.org use case on accessibility and data quality around obtaining fiscal data of EU member states.
Her team is currently working on a project called subsidystories.eu, which aims to offer all available European Regional Development Fund (ERDF) datasets on one platform. This thematic dataset collection is meant to help data journalists and CSOs analyze and mine the data around funding in the EU.
Diana explained the process of making datasets available on the platform. First the datasets have to be obtained, after which a lot of work goes into enhancing the quality of the available data for further use. Once the datasets are cleaned and the CSV file meets the criteria for uploading via the OpenSpending Packager, the data can be made available on the platform.
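To give a feel for what this kind of cleaning step can involve, here is a minimal Python sketch. It is not the subsidystories.eu pipeline and not the OpenSpending Packager's actual upload criteria, which were not detailed in the session. The required column names (`beneficiary`, `country`, `amount`) and the function `clean_rows` are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical required columns; the real upload criteria are not given in this post.
REQUIRED_COLUMNS = ["beneficiary", "country", "amount"]


def clean_rows(raw_csv):
    """Return (cleaned_rows, problems) for CSV content given as a string."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    header = [name.strip().lower() for name in (reader.fieldnames or [])]
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [], ["missing columns: " + ", ".join(missing)]

    cleaned, problems = [], []
    for row in reader:
        # Normalise column names and trim stray whitespace from every value.
        # Surplus cells (stored under the key None) are ignored.
        row = {
            k.strip().lower(): (v or "").strip()
            for k, v in row.items()
            if k is not None
        }
        if not any(row.values()):
            continue  # skip rows that contain only empty cells
        try:
            # Accept amounts such as "1,200.50" by dropping thousands separators.
            row["amount"] = float(row["amount"].replace(",", ""))
        except ValueError:
            problems.append(
                f"line {reader.line_num}: bad amount {row['amount']!r}"
            )
            continue
        cleaned.append(row)
    return cleaned, problems


# Made-up sample data, only to show how the function is called.
sample = 'Beneficiary, Country ,Amount\nAcme , SI ,"1,200.50"\n,,\nBeta,GR,n/a\n'
rows, issues = clean_rows(sample)
```

Rows that pass the checks end up in `rows`, while `issues` lists the lines that still need manual attention before the file would be ready for upload.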
A lot has improved in the field of open data over the years, with governments improving their data quality and following open data standards, the open data agenda and tools like OpenSpending. However, a lot of time still has to be spent on getting the data ready for analysis, which should no longer be necessary. For Diana, the OGP summit should function as a reminder to governments to handle their data with great care.
After the presentations there was time for a lively discussion in which the participants discussed their work, which included interesting projects in Greece and Slovenia. It was a very useful workshop that provided some new insights for the FutureTDM project.
You can find more information about the OGP conference online, including some of the session recordings, which have been made available here.
We will follow up with the participants and continue to reach out to the various TDM communities as the project continues. If you have anything you would like to share on any of the issues discussed, do not hesitate to contact us.
// All blog posts are the personal opinion of the bloggers. For more information see FutureTDM's DISCLAIMER on how we handle the blog. //