Policy Framework

Below we introduce a policy framework that we highly recommend for future TDM practices in Europe. It consists of a hierarchical model of principles and recommendations; the recommendations cover regulation and policy as well as ‘interventions’ or actions by influential stakeholders. The recommendations are categorised into the four areas in which we identified barriers to the uptake of TDM in the EU:

  • Legal Rules & Policies
  • Education, Skills & Awareness
  • Economy & Incentives
  • Technical & Infrastructure

For each area, we first provide a short recap of the barriers that we have established, followed by recommendations that aim to overcome them. To this end, we have formulated three fundamental principles that stakeholders should take into account when addressing barriers:

Awareness & Clarity

The benefits of TDM should be known, and it should be clear what they are, whether they can be achieved, and how.

TDM without boundaries

Within Europe, TDM should not be subject to different national rules; skills and knowledge should flow across domains and sectors; and standardisation and interoperability should be achieved for data, software and infrastructure.

Equitable access

There should be access to sources, tools, infrastructure and money to enable TDM in academia, industry and other types of organisations. However, legitimate opposing interests should be taken into account as well.

Every fundamental principle has resulted in subprinciples that guide more specific (groups of) stakeholders, followed by more concrete recommendations aimed at particular stakeholders.

Recap of barriers

Copyright & database law

·      Uncertain: scope of exceptions often unclear

·      Fragmented: different national scope of exceptions

·      Restrictive: broad reproduction right and narrow exceptions

Data protection law

·      Uncertain: how to comply

·      Fragmented: different national interpretations, bodies and policies

·      Restrictive: TDM of personal data potentially unlawful

Documentation

·      Uncertain: not always clear whether TDM permitted

·      Fragmented: TDM practitioners have to deal with many different licence conditions

·      Restrictive: TDM often not permitted under non-OA licences

Legend for colour coding the stakeholders

Stakeholders’ action in Content creation

Stakeholders’ action in Content dissemination

Stakeholders’ action in Text and data mining

Stakeholders’ action in Knowledge utilisation

AWARENESS AND CLARITY

Subprinciple 1 – Make clear rules on copyright, database and data protection law

@Lawmaker

Minimise borderline cases where it is unclear whether copyright or related laws apply, by clearly defining terms and concepts (e.g. research organisation, text and data mining or scientific research purposes) in law-making 1


@Policy makers

Provide explanatory documentation to accompany relevant laws, which TDM practitioners can refer to for guidance 2

Subprinciple 2 – Create guidelines on the law: what is permitted and what not?

@FutureTDM

Provide practitioner guidelines on how the law works, what the rules are and how to best comply 3

@European Data Protection Board & national data protection authorities

• Provide general guidelines on TDM to help practitioners comply with data protection law and other laws relating to information privacy and confidentiality

• Offer certification of data research and/or self-regulation and/or codes of conduct concerning TDM research or activities dealing with personal data, to provide certainty for practitioners

• Create guidelines on what is to be understood as archiving purposes and historical, statistical and scientific purposes to help practitioners understand when these apply 4


@Professional associations representing ‘personal-data intensive’ companies and research institutes

Draft self-regulation or codes of conduct ensuring compliance, in particular with information privacy and data protection rules, for companies and research institutes to follow 5


@Research institutes & (start-up) companies

Implement ethical and legal review practices for large research projects dealing with big data 6


@Research libraries & librarians

Create a physical or virtual centre or platform for knowledge on TDM licences and rules for researchers interested in TDM to consult


@Research funders

When paying for researchers to publish their results open access, require that grant recipients use well-known OA licences, attached to the relevant publication or dataset in human- and machine-readable metadata, to help harmonise licences in OA content. Where necessary, provide guidelines to help researchers do this.


@All content creators and all content providers/ disseminators

Use clear licence statements in the metadata and in the text, preferably well-known ones and in machine-readable form, to improve accessibility and re-usability of content by TDM processes 7
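
As an illustration of what a machine-readable licence statement makes possible, the following minimal Python sketch checks whether a metadata record declares a well-known open licence. The record layout, the field name `license`, the allow-list and the function name `licence_of` are assumptions made for this example only; they are not prescribed by this framework or by any particular metadata standard.

```python
import json

# Hypothetical allow-list of well-known open licences, keyed by canonical URL.
WELL_KNOWN_LICENCES = {
    "https://creativecommons.org/licenses/by/4.0/": "CC-BY-4.0",
    "https://creativecommons.org/publicdomain/zero/1.0/": "CC0-1.0",
}


def licence_of(record_json):
    """Return a short licence name if the record declares a well-known licence, else None."""
    record = json.loads(record_json)
    # "license" is an assumed field name for the licence URL.
    url = (record.get("license") or "").strip()
    # Normalise trivially different spellings of the same URL.
    url = url.replace("http://", "https://", 1)
    if url and not url.endswith("/"):
        url += "/"
    return WELL_KNOWN_LICENCES.get(url)


# Example record with the licence stated as a URL in the metadata.
example = json.dumps({
    "title": "An example article",
    "license": "http://creativecommons.org/licenses/by/4.0",
})
print(licence_of(example))
```

A TDM practitioner could run such a check over a whole collection before mining it. Records for which the function returns `None` would still need manual review, which is the kind of effort that clear, standard licence statements are meant to reduce.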

TDM WITHOUT BOUNDARIES

Subprinciple 1 – Create harmonised and mandatory rules on copyright, database and data protection law

@European lawmaker

• Introduce a harmonised, mandatory exception to copyright for TDM activities, and minimise the leeway for deviating interpretations by national legislators and courts to ensure harmonisation in practice as well as in principle

• Ensure this TDM exception allows for the same exempted activities under both copyright and database law

• Ensure this TDM exception does not discriminate between commercial and non-commercial organisations or activities, as this distinction may not always be possible in practice

• Ensure this TDM exception applies only in cases where practitioners have lawful access to content, to respect the rights of content owners 17

• Ensure that TDM results can be disseminated while protecting the interests of other stakeholders

• Create a fully harmonised data protection regime and minimise deviating interpretations of the General Data Protection Regulation to ensure harmonisation in practice as well as in principle

Subprinciple 2 – Harmonise licences for TDM 18

@Content creators and providers

As much as possible, minimise variation among TDM licences, and minimise the number of different TDM licences used to reduce the complexity of re-using content for TDM (with deviation only for reasons of privacy, data protection, etc.)


@Research institutions

Where practical, avoid signing licences that prohibit or restrict TDM, to avoid unnecessary restrictions on content

Subprinciple 3 – Harmonise interpretations of key principles in data protection law

@EDPB

Achieve consensus on key concepts such as personal data, historical, statistical and scientific purposes and archiving purposes to reduce uncertainty for TDM practitioners

Subprinciple 4 – Harmonise rights in research data 19

@Research funders

Align research data sharing and research publishing conditions required of research grant recipients to ensure that all public research in Europe is available under the same – TDM friendly – conditions


@European lawmaker

Discuss with Member States the benefits of mandatory rules that ensure that a researcher may publish publicly-funded research publications in open (university) repositories or scholarly communications networks for non-commercial purposes, with consideration for reasonable embargo periods 20

EQUITABLE ACCESS

Subprinciple 1 – Ensure legal rules reflect a fair balance between the interests of TDM practitioners and rightsholders (of copyright and database rights), and reaffirm that ideas and facts as such are not protected

@European lawmaker

Introduce a mandatory copyright exception for TDM activities that:

• requires lawful access 21 by the user

• is not overridable by contract 22

• only permits technical measures that are necessary, reasonable and proportionate to guarantee the security and integrity of content providers’ infrastructure 23

Make sure that circumvention of technical protection measures (TPMs) and digital rights management (DRM) is permitted, 24 without harming the said security and integrity


@EU and national governments

In relation to the previous recommendation, provide guidelines and best practices on what is regarded as reasonable and proportionate in the context of (technical) measures to guarantee the security and integrity of the system 25

Subprinciple 2 – Ensure rules and policies on data protection reflect a fair balance between privacy interests of data subjects and value creation from TDM

@Lawmakers

Make sure that TDM research is allowed when individuals’ privacy interests are not severely affected – and make clear when that is the case – to avoid unnecessary restrictions on TDM activity


@European Data Protection Board & national data protection authorities

Provide for procedures that promote access to data for TDM research, while safeguarding the privacy interests of data subjects

Subprinciple 3 – Encourage researchers to freely share scientific publications and research outcomes, especially when they are publicly funded, so they can be mined without restrictions 26

@Research funders

• Encourage publications from publicly-funded research to be published under an Open Access licence, such as CC-BY 4.0 (and not earlier versions, since they do not include sui generis database rights), to increase access to content for TDM activities 27

• Make sure that researchers have resources to publish in OA journals when (reasonable) article processing fees apply 28

• Where practical, mandate timely deposit of a peer-reviewed version of publications in an open repository, to provide access to research results for TDM activities


@Research institutions

• Develop OA policies that take into account the impact of research and the benefits of open access to research for TDM 29

• Provide researchers with sufficient information on OA licences, OA journals and repositories to make informed decisions about OA publishing


@Researchers

Where practical, make research outputs available in a publicly available repository under standard OA licences to improve accessibility for TDM

Subprinciple 4 – Encourage open access to underlying research data, especially in publicly-funded research 30

@Researchers

Require research projects to include a comprehensive data management plan, with emphasis on the value of open data approaches, that considers and implements safeguards that address conflicting interests related to privacy, data protection and confidentiality

Subprinciple 5 – Guarantee reproducibility of TDM research

@Lawmakers

Ensure that a TDM exception allows the retention of copies for reproducibility and verifiability of TDM research 31


@Publishers and other content providers

Allow TDM practitioners to retain long-term copies of content used for TDM for the purpose of reproducibility and verifiability of the TDM research 32

Recap of barriers

Education

·      Uncertain: poor awareness of potential benefits

·      Restrictive: lack of education in data management and legal/licensing issues

Skills

·      Fragmented: poor understanding of domain-specific needs; poor understanding in academia of knowledge transfer requirements; skills gap on both sides between domain experts and TDM experts

·      Restrictive: lack of access to skilled data analysts; high barrier to entry for use of complex tools; lack of skilled in-house experts

Knowledge access

·      Uncertain: lack of knowledge around whom to consult with queries

·      Fragmented: best practices/techniques not shared across domains/sectors

·      Restrictive: access to skilled practitioners prohibitively expensive for academics

AWARENESS AND CLARITY

Subprinciple 1 – Ensure all citizens in society have awareness and basic understanding of the potential impact and value of (big) data

 @Educational institutions and governments

Establish (and mandate) training and courses in data literacy, “computational thinking” and fundamental ideas about data science early in the educational system, as well as in lifelong learning, to foster societies that are better able to recognise the potential value of data and data analysis

Subprinciple 2 – Ensure all researchers and industries – including those in traditionally less data-driven fields – have awareness and basic understanding of the potential uses and benefits of TDM (OR: “Sell the concept better”)

@FutureTDM

Provide evidence of the positive results of TDM, including the economic growth potential, to demonstrate the concrete value of TDM


 @FutureTDM

Provide examples, best practices and use cases to sell the ‘TDM concept’, particularly in underrepresented fields, to ensure all sectors are able to benefit from the use of TDM


 @Universities, research organisations, library associations, medical community, businesses, and members of the content mining community

Advocate the benefits of content mining as trusted voices in their respective communities


 @Universities and higher education institutions

Integrate education about data management and analysis across all subjects, not just traditionally data-driven fields, to ensure all sectors can benefit from the use of TDM

Subprinciple 3 – Ensure everyone working with data has access to education, information and experts to consult about TDM, e.g. as regards legal issues, tools, algorithms, ethics, etc.

@Libraries

Serve as information hubs and provide training for researchers and citizen scientists in data, tools and TDM services, as well as advice on licences and legal concerns around TDM


@Governments

Establish central, comprehensive, accessible information resources for people interested in TDM, managed by an institution that can connect expertise on a national and international level, to make it simple for TDM practitioners to access comprehensive guidance and support

Subprinciple 4 – Raise awareness of the benefits that sharing research data can bring to researchers and to others

@Research libraries

Help researchers to develop skills in research data management (RDM) by:

• Explaining the benefits of sharing research data

• Sharing best practices around making research data discoverable and re-usable

• Building open science partnerships to demonstrate their value

Subprinciple 5 – Ensure users have realistic expectations of technical capabilities and limitations of TDM tools/services

@Developers/providers of TDM tools and services

Inform clients about what they can and cannot expect as output from TDM activities, so that they can make informed decisions about how to employ TDM. Where appropriate, this could be achieved by opening up the algorithms used, to reveal the choices made in the analysis that influence the outcomes of TDM

TDM WITHOUT BOUNDARIES

Subprinciple 1 – Make it easy for existing and potential TDM practitioners to discuss and share skills and best practices across disciplines and sectors

 @Universities and research libraries

Set up internal communication channels and platforms where TDM researchers and practitioners from different disciplines can discuss and share their skills and best practices, to foster cross-disciplinary knowledge transfer


 @Professional associations

Establish communication channels among TDM practitioners in different companies, disciplines and sectors of the economy, to foster cross-disciplinary knowledge transfer


  @Universities

Promote public-private partnerships (PPPs) in TDM research to bridge gaps in skills and needs between industry and academia and increase the impact of TDM research


  @TDM researchers and users in academia and industry

Participate in knowledge and skills exchanges between companies and/or researchers through conferences and seminars devoted to data mining, to facilitate sharing of best practices

Subprinciple 2 – Ensure education in TDM prepares practitioners for the diversity of TDM tools and applications

 @Educational institutions

Provide courses that prepare practitioners for a diverse TDM landscape by focussing on core principles and skills that can be applied to a variety of different tools in practice

 @Educational institutions, industry and researchers

Enter into dialogue so that courses in TDM prepare students for a diverse labour market

EQUITABLE ACCESS

Subprinciple 1 – Promote exchanges of skills and knowledge between companies and universities

 @Universities

Enhance knowledge transfer activities regarding text and data mining technologies, theories and other knowledge, e.g. through networking with TDM companies, carrying out contract research, participating in PPPs to develop technologies and knowledge, and improving the mobility of employees between universities and industries 33


 @Companies and universities

Initiate, support and collaborate on private learning initiatives that work at the interface of academia and industry 34

Subprinciple 2 – Ensure that minority groups are not disproportionately disadvantaged or discouraged from TDM activities

 @Universities and industry

Promote and raise visibility of underrepresented minorities in data analytics – e.g. as role models – who carry out interesting TDM projects


 @FutureTDM

Shared best practices of TDM should also promote the work of TDM practitioners from underrepresented minorities in the field

33 Cf. the different ways of knowledge transfer as identified by Finne, Håkon, Adrian Day, Andrea Piccaluga, André Spithoven, Patricia Walter, and Dorien Wellen, A Composite Indicator for Knowledge Transfer: Report from the European Commission’s Expert Group on Knowledge Transfer Indicators, 2011 https://ec.europa.eu/research/innovation-union/pdf/kti-report-final.pdf, section 2.1.

34 An example of such an initiative is S2DS http://www.s2ds.org

Recap of barriers

Investment issues

·      Uncertain: lack of understanding of TDM

·      Fragmented: organisational ‘data silos’ across different sectors and businesses

Limited use of potential

·      Uncertain: lack of understanding of TDM

·      Fragmented: organisational ‘data silos’ across different sectors and businesses

Supply & demand

·      Uncertain: companies are hesitant to invest heavily in TDM activities that may not produce expected results

Decision making

·      Uncertain: how to relate TDM to data-based management | how TDM brings organisational value

AWARENESS AND CLARITY

Subprinciple 1 – Promote TDM and the value it can bring to businesses and organisations

@FutureTDM and government agencies 

• Disseminate knowledge on TDM use cases and their effects and utilisations in a business context

• Disseminate success stories to show how TDM can benefit companies


@Research funders 

• Promote research that showcases and studies the financial benefits of TDM for SMEs

• Promote research that develops state-of-the-art ways of calculating return on investments in TDM for companies


@Governments, professional organisations, advocacy groups, businesses and universities

Organise hackathons and similar events to promote TDM and create awareness of its opportunities 35

Subprinciple 2 – Promote a data-savvy culture so that all stakeholders are aware of the potential benefits of TDM

 @Businesses

• Make analytics-based decision-making part of an organisational culture

• Broaden analytical capabilities portfolio to cover all areas from descriptive, diagnostic, predictive and prescriptive analysis, to human input, decision and, finally, action


 @FutureTDM, academia and consulting firms

• Show specific ways of calculating the positive effects of introducing TDM in organisations

• Develop and promote a better understanding of the business opportunities enabled by TDM

• Identify, validate and promote sustainable business models based on TDM in different sectors

35 An example of such an event is the yearly EU Hackathon, see e.g. the 2016 version http://2016.euhackathon.eu

TDM WITHOUT BOUNDARIES

Subprinciple 1 – Promote sharing among data ‘silos’ of different businesses and sectors

@FutureTDM and government agencies 

• Disseminate knowledge on TDM use cases and their effects and utilisations in a business context

• Disseminate success stories to show how TDM can benefit companies


@Research funders 

• Promote research that showcases and studies the financial benefits of TDM for SMEs

• Promote research that develops state-of-the-art ways of calculating return on investments in TDM for companies


@Governments, professional organisations, advocacy groups, businesses and universities

Organise hackathons and similar events to promote TDM and create awareness of its opportunities 35

Subprinciple 2 – Promote a data-savvy culture so that all stakeholders are aware of the potential benefits of TDM

 @Businesses

• Make analytics-based decision-making part of an organisational culture

• Broaden analytical capabilities portfolio to cover all areas from descriptive, diagnostic, predictive and prescriptive analysis, to human input, decision and, finally, action


 @FutureTDM, academia and consulting firms

• Show specific ways of calculating the positive effects of introducing TDM in organisations

• Develop and promote a better understanding of the business opportunities enabled by TDM

• Identify, validate and promote sustainable business models based on TDM in different sectors

EQUITABLE ACCESS

Subprinciple 1 – Promote sharing among data ‘silos’ of different businesses and sectors

  @Research funders

Provide more funding for research that uses TDM, taking into consideration the necessary budget for all stages of the TDM value chain (including infrastructure, storage and scaling)

Subprinciple 2 – Dedicate more funding to companies creating value from TDM

@Governments

Set up platforms where TDM companies, academic researchers and capital investors can meet and discuss the advantages of TDM products and services, and articulate their potential value to funders

Subprinciple 3 – Provide more recognition and acknowledgement for TDM uptake

 @Universities, research organisations, research funders and businesses

Introduce incentives to reward those who are using TDM, e.g. by noting and commending TDM in evaluation of processes and proposals

Recap of barriers

Data(sets)

·      Fragmented: data heterogeneity

·      Restrictive: poor quality of data, annotations and metadata

Tools & Infrastructure

·      Uncertain: user unfriendly interfaces

·      Fragmented: architectural mismatches | incompatibility

Languages

·      Fragmented: lack of availability of language resources

·      Restrictive: availability of language resources

Documentation

·      Uncertain: vagueness

·      Fragmented: mismatch between documentation and tool versions

·      Restrictive: absence of documentation

AWARENESS AND CLARITY

Subprinciple 1 – Provide clear documentation and user manuals for TDM tools, technologies and datasets, for other (interoperability-seeking) developers

@Developers 

• Write clear specifications for TDM tools, to help others use them

• Keep documentation up-to-date and accessible from a single, easy-to-find access point


@Creators of datasets and metadata curators

Maintain clear and up-to-date specifications and guidance for the use of annotations and other metadata schema, to help content owners use consistent metadata


 @FutureTDM

Share best practices around documentation of TDM tools and methods

Subprinciple 2 – Data(sets) should be consistent and complete

@Data(base) producers

As much as possible, provide for ‘clean’ datasets – that is, datasets that minimise the amount of processing and normalisation necessary for TDM activities


@Developers, research institutions, libraries and their representing organisations

• Develop appropriate platforms for annotating, amending and normalising datasets, to help create interoperable and re-usable data

• Draft guidelines for annotations that are strict and clear, and implemented accordingly


@Developers and researchers

Publicly share metadata of datasets, open to re-use and correction, to make it easier to accurately identify and understand the contents and context of datasets

Subprinciple 3 – Minimise barriers to entry for the use of TDM by lay-users or those with limited computational skills

@Developers

Create user-friendly TDM tools, workflows and infrastructure, e.g. through user-friendly interfaces, for the benefit of users with limited computational skills


@Research infrastructures

Adopt standards for interoperability to link datasets from various sources, and provide access to these via open, user-friendly APIs

TDM WITHOUT BOUNDARIES

Subprinciple 1 – Encourage consistency in the use of standards 36

@Registries, repositories and industry

Standardise data formats, communications protocols and middleware used by different components of a system to improve interoperability and make it easier to connect and use data from a variety of sources

Subprinciple 2 – Use open standards

@Content creators and providers

Provide datasets in open standards, instead of proprietary standards, to ensure everyone wishing to use those datasets has access to the relevant standards


 @Research funders

Require publicly-funded research to use open standards for tools and dataset formatting, to ensure data is as accessible as possible

Subprinciple 3 – Make TDM stronger among all languages 36

 @Developers

Adjust and tune existing tools to support more European languages, to ensure as wide a range of users as possible has access to these tools


 @Governments

Promote or incentivise support for more European languages with funding, contests or other instruments that reward these efforts made by developers

Subprinciple 4 – Ensure standards reflect the large variety of TDM tools and applications

@Developers

When working together on developing standards, take into account the variety of their users and applications, to ensure that standards are actually applicable to the breadth of the TDM landscape

Subprinciple 5 – Create a common infrastructure for all sciences

@EU government

Fund, enable, promote and/or initiate a common infrastructure where researchers can share, store, and access research outputs and data inter alia for TDM purposes. Make sure that current initiatives, such as OpenMinTeD 37 and the European Open Science Cloud 38, succeed

36 Standard setting in the context of (big) data technologies is also among the European Commission’s priorities. See European Commission, ‘Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions “ICT Standardisation Priorities for the Digital Single Market” COM(2016) 176 Final’, 2016 http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=15265, section 3.1.

37 http://openminted.eu.

38 http://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud.

EQUITABLE ACCESS

Subprinciple 1 – Increase access to and enhancement of TDM tools by making them available under open source licences

 @Research funders

Require that TDM tools and technologies developed through publicly-funded research are made available under an open source licence, to maximise the value they offer to society