Here we introduce a policy framework that we highly recommend for future TDM practices in Europe. It consists of a hierarchical model of principles and recommendations. The recommendations concern regulation and policy, as well as ‘interventions’ or actions by influential stakeholders. They are categorised in the four areas in which we identified barriers to the uptake of TDM in the EU:
Policy Framework
- Legal Rules & Policies
- Education, Skills & Awareness
- Economy & Incentives
- Technical & Infrastructure
For each area, we first provide a short recap of the barriers that we have established. Next, the recommendations will follow, which aim to overcome these barriers. For this purpose, we have formulated three fundamental principles that stakeholders should take into account when addressing barriers:
Awareness & Clarity
The benefits of TDM should be known, and it should be clear what they are, whether they can be achieved, and how.
TDM without boundaries
Within Europe, TDM should not be subject to different national rules, skills and knowledge should flow across domains and sectors, and standardisation and interoperability should be achieved for data, software and infrastructure.
Equitable access
There should be access to sources, tools, infrastructure and money to enable TDM in academia, industry and other types of organisations. However, legitimate opposing interests should be taken into account as well.
Every fundamental principle has resulted in subprinciples that guide more specific (groups of) stakeholders, followed by more concrete recommendations aimed at certain stakeholders.
Recap of barriers
Copyright & database law
· Uncertain: scope of exceptions often unclear
· Fragmented: different national scope of exceptions
· Restrictive: broad reproduction right and narrow exceptions
Data protection law
· Uncertain: how to comply
· Fragmented: different national interpretations, bodies and policies
· Restrictive: TDM of personal data potentially unlawful
Documentation
· Uncertain: not always clear whether TDM permitted
· Fragmented: TDM practitioners have to deal with many different licence conditions
· Restrictive: TDM often not permitted under non-OA licences
Legend for colour coding the stakeholders
Stakeholders’ action in Content creation
Stakeholders’ action in Content dissemination
Stakeholders’ action in Text and data mining
Stakeholders’ action in Knowledge utilisation
AWARENESS AND CLARITY
Subprinciple 1 – Make clear rules on copyright, database and data protection law
@Lawmaker
Minimise borderline cases where it is unclear whether copyright or related laws apply, by clearly defining terms and concepts (e.g. research organisation, text and data mining or scientific research purposes) in law-making 1
@Policy makers
Provide explanatory documentation to accompany relevant laws, which TDM practitioners can refer to for guidance 2
Subprinciple 2 – Create guidelines on the law: what is permitted and what not?
@FutureTDM
Provide practitioner guidelines on how the law works, what the rules are and how to best comply 3
@European Data Protection Board & national data protection authorities
• Provide general guidelines on TDM to help practitioners comply with data protection law and other laws relating to information privacy and confidentiality
• Offer certification of data research and/or self-regulation and/or codes of conduct concerning TDM research or activities dealing with personal data, to provide certainty for practitioners
• Create guidelines on what is to be understood as archiving purposes and historical, statistical and scientific purposes to help practitioners understand when these apply 4
@Professional associations representing ‘personal-data intensive’ companies and research institutes
Draft self-regulation or codes of conduct ensuring compliance, in particular with information privacy and data protection rules, for companies and research institutes to follow 5
@Research institutes & (start-up) companies
Implement ethical and legal review practices for large research projects dealing with big data 6
@Research libraries & librarians
Create a physical or virtual centre or platform for knowledge on TDM licences and rules for researchers interested in TDM to consult
@Research funders
When paying for researchers to publish their results open access, require that grant recipients use well-known OA licences, attached to the relevant publication or dataset in human- and machine-readable metadata, to help harmonise licences in OA content. Where necessary, provide guidelines to help researchers do this.
@All content creators and all content providers/ disseminators
Use clear licence statements in the metadata and in the text, preferably well-known ones and in machine-readable form, to improve accessibility and re-usability of content by TDM processes 7
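As an illustration of what a machine-readable licence statement makes possible, the following minimal sketch checks whether a metadata record declares a well-known open licence. The field name `license_url`, the function names and the short allow-list are assumptions made for this example only; they are not prescribed by the framework, and the list is neither complete nor legal advice.

```python
import json
from typing import Optional

# Illustrative allow-list of well-known open licence URLs (an assumption for
# this sketch, not an authoritative or complete list).
KNOWN_OPEN_LICENCES = {
    "https://creativecommons.org/licenses/by/4.0/": "CC BY 4.0",
    "https://creativecommons.org/publicdomain/zero/1.0/": "CC0 1.0",
}


def normalise_licence_url(url: str) -> str:
    """Lower-case the URL, force https and ensure a trailing slash."""
    url = url.strip().lower()
    if url.startswith("http://"):
        url = "https://" + url[len("http://"):]
    if not url.endswith("/"):
        url += "/"
    return url


def licence_label(record: dict) -> Optional[str]:
    """Return the licence name if the record declares a known open licence."""
    url = record.get("license_url")  # hypothetical metadata field name
    if not url:
        return None
    return KNOWN_OPEN_LICENCES.get(normalise_licence_url(url))


# Hypothetical metadata record, as a content provider might expose it.
record = json.loads(
    '{"title": "Example article",'
    ' "license_url": "http://creativecommons.org/licenses/by/4.0"}'
)
print(licence_label(record))
```

A TDM practitioner could run a check of this kind over a whole corpus to separate content with a clearly stated, well-known licence from content that needs manual review. This only works when providers state licences consistently in the metadata.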
TDM WITHOUT BOUNDARIES
Subprinciple 1 – Create harmonised and mandatory rules on copyright, database and data protection law
@European lawmaker
• Introduce a harmonised, mandatory exception to copyright for TDM activities, and minimise the leeway for deviating interpretations by national legislators and courts to ensure harmonisation in practice as well as in principle
• Ensure this TDM exception allows for the same exempted activities under both copyright and database law
• Ensure this TDM exception does not discriminate between commercial and non-commercial organisations or activities, as this distinction may not always be possible in practice
• Ensure this TDM exception applies only in cases where practitioners have lawful access to content, to respect the rights of content owners 17
• Ensure that TDM results can be disseminated while protecting the interests of other stakeholders
• Create a fully harmonised data protection regime and minimise deviating interpretations of the General Data Protection Regulation to ensure harmonisation in practice as well as in principle
Subprinciple 2 – Harmonise licences for TDM 18
@Content creators and providers
As much as possible, minimise variation among TDM licences, and minimise the number of different TDM licences used to reduce the complexity of re-using content for TDM (with deviation only for reasons of privacy, data protection, etc.)
@Research institutions
Where practical, avoid signing licences that prohibit or restrict TDM, to avoid unnecessary restrictions on content
Subprinciple 3 – Harmonise interpretations of key principles in data protection law
@EDPB
Achieve consensus on key concepts such as personal data, historical, statistical and scientific purposes and archiving purposes to reduce uncertainty for TDM practitioners
Subprinciple 4 – Harmonise rights in research data 19
@Research funders
Align research data sharing and research publishing conditions required of research grant recipients to ensure that all public research in Europe is available under the same – TDM friendly – conditions
@European lawmaker
Discuss with Member States the benefits of mandatory rules that ensure that a researcher may publish publicly-funded research publications in open (university) repositories or scholarly communications networks for non-commercial purposes, with consideration for reasonable embargo periods 20
EQUITABLE ACCESS
Subprinciple 1 – Ensure legal rules reflect a fair balance between the interests of TDM practitioners and rightsholders (of copyright and database rights), and reaffirm that ideas and facts as such are not protected
@European lawmaker:
Introduce a mandatory copyright exception for TDM activities that:
• requires lawful access 21 by the user
• is not overridable by contract 22
• only permits technical measures that are necessary, reasonable and proportionate to guarantee the security and integrity of content providers’ infrastructure 23
Make sure that circumvention of technical protection measures (TPMs) and digital rights management (DRM) is permitted, 24 without harming the said security and integrity
@EU and national governments
In relation to the previous recommendation, provide guidelines and best practices on what is regarded to be reasonable and proportional in the context of (technical) measures to guarantee the security and integrity of the system 25
Subprinciple 2 – Ensure rules and policies on data protection reflect a fair balance between privacy interests of data subjects and value creation from TDM
@Lawmakers
Make sure that TDM research is allowed when individuals’ privacy interests are not severely affected – and make clear when that is the case – to avoid unnecessary restrictions on TDM activity
@European Data Protection Board & national data protection authorities
Provide for procedures that promote access to data for TDM research, while safeguarding the privacy interests of data subjects
Subprinciple 3 – Encourage researchers to freely share scientific publications and research outcomes, especially when they are publicly funded, so they can be mined without restrictions 26
@Research funders
• Encourage publications from publicly-funded research to be published under an Open Access licence, such as CC-BY 4.0 (and not earlier versions, since they do not cover sui generis database rights), to increase access to content for TDM activities 27
• Make sure that researchers have resources to publish in OA journals when (reasonable) article processing fees apply 28
• Where practical, mandate timely deposit of a peer-reviewed version of publications in an open repository, to provide access to research results for TDM activities
@Research institutions
• Develop OA policies that take into account the impact of research and the benefits of open access to research for TDM 29
• Provide researchers with sufficient information on OA licences, OA journals and repositories to make informed decisions about OA publishing
@Researchers
Where practical, make research outputs available in a publicly available repository under standard OA licences to improve accessibility for TDM
Subprinciple 4 – Encourage open access to underlying research data, especially in publicly-funded research 30
@Researchers
Require research projects to include a comprehensive data management plan, with emphasis on the value of open data approaches, that considers and implements safeguards that address conflicting interests related to privacy, data protection and confidentiality
Subprinciple 5 – Guarantee reproducibility of TDM research
@Lawmakers
Ensure that a TDM exception allows the retention of copies for reproducibility and verifiability of TDM research 31
@Publishers and other content providers
Allow TDM practitioners to retain long-term copies of content used for TDM for the purpose of reproducibility and verifiability of the TDM research 32
Recap of barriers
Education
· Uncertainty: poor awareness of potential benefits
· Restrictive: lack of education in data management and legal/licensing issues
Skills
· Fragmented: poor understanding of domain-specific needs; poor understanding in academia of knowledge transfer requirements; skills gap on both sides between domain experts and TDM experts
· Restrictive: lack of access to skilled data analysts; high barrier to entry for use of complex tools; lack of skilled in-house experts
Knowledge access
· Uncertainty: lack of knowledge around whom to consult with queries
· Fragmented: best practices/techniques not shared across domains/sectors
· Restrictive: access to skilled practitioners prohibitively expensive for academics
AWARENESS AND CLARITY
Subprinciple 1 – Ensure all citizens in society have awareness and basic understanding of the potential impact and value of (big) data
@Educational institutions and governments
Establish (and mandate) training and courses in data literacy, “computational thinking” and fundamental ideas about data science early in the educational system, as well as in lifelong learning, to foster societies that are better able to recognise the potential value of data and data analysis
Subprinciple 2 – Ensure all researchers and industries – including those in traditionally less data-driven fields – have awareness and basic understanding of the potential uses and benefits of TDM (OR: “Sell the concept better”)
@FutureTDM
Provide evidence of the positive results of TDM, including the economic growth potential, to demonstrate the concrete value of TDM
@FutureTDM
Provide examples, best practices and use cases to sell the ‘TDM concept’, particularly in underrepresented fields, to ensure all sectors are able to benefit from the use of TDM
@Universities, research organisations, library associations, medical community, businesses, and members of the content mining community
Advocate the benefits of content mining as trusted voices in their respective communities
@Universities and higher education institutions
Integrate education about data management and analysis across all subjects, not just traditionally data-driven fields, to ensure all sectors can benefit from the use of TDM
Subprinciple 3 – Ensure everyone working with data has access to education, information and experts to consult about TDM, e.g. as regards legal issues, tools, algorithms, ethics, etc.
@Libraries
Serve as information hubs and provide training for researchers and citizen scientists in data, tools and TDM services, as well as advice on licences and legal concerns around TDM
@Governments
Establish central, comprehensive, accessible information resources for people interested in TDM, managed by an institution that can connect expertise on a national and international level, to make it simple for TDM practitioners to access comprehensive guidance and support
Subprinciple 4 – Raise awareness of the benefits that sharing research data can bring to researchers and to others
@Research libraries
Help researchers to develop skills in research data management (RDM) by:
• Explaining the benefits of sharing research data
• Sharing best practices around making research data discoverable and re-usable
• Building open science partnerships to demonstrate their value
Subprinciple 5 – Ensure users have realistic expectations of technical capabilities and limitations of TDM tools/services
@Developers/providers of TDM tools and services
Inform clients about what they can expect – and not expect – as output from TDM activities, so that clients can make informed decisions about how to use TDM. For example, if appropriate, this could be achieved by pursuing openness of the algorithms used, to reveal the choices made in the analysis that influence the outcomes of TDM
TDM WITHOUT BOUNDARIES
Subprinciple 1 – Make it easy for existing and potential TDM practitioners to discuss and share skills and best practices across disciplines and sectors
@Universities, and research libraries
Set up internal communication channels and platforms where TDM researchers and practitioners from different disciplines can discuss and share their skills and best practices, to foster cross-disciplinary knowledge transfer
@Professional associations
Establish communication channels among TDM practitioners in different companies, disciplines and sectors of the economy, to foster cross-disciplinary knowledge transfer
@Universities
Promote public-private partnerships (PPPs) in TDM research to bridge gaps in skills and needs between industry and academia and increase the impact of TDM research
@TDM researchers and users in academia and industry
Participate in knowledge and skills exchanges between companies and/or researchers through conferences and seminars devoted to data mining, to facilitate sharing of best practices
Subprinciple 2 – Ensure education in TDM prepares practitioners for the diversity of TDM tools and applications
@Educational institutions
Provide courses that prepare practitioners for a diverse TDM landscape by focussing on core principles and skills that can be applied to a variety of different tools in practice
@Educational institutions, industry and researchers
Enter into dialogue so that courses in TDM prepare students for a diverse labour market
EQUITABLE ACCESS
Subprinciple 1 – Promote exchanges of skills and knowledge between companies and universities
@Universities
Enhance knowledge transfer activities regarding text and data mining technologies, theories and other knowledge, e.g. through networking with TDM companies, carrying out contract research, participating in PPPs to develop technologies and knowledge, and improving the mobility of employees between universities and industries 33
@Companies and universities
Initiate, support and collaborate on private learning initiatives that work at the interface of academia and industry 34
Subprinciple 2 – Ensure that minority groups are not disproportionately disadvantaged or discouraged from TDM activities
@Universities and industry
Promote and raise visibility of underrepresented minorities in data analytics – e.g. as role models – who carry out interesting TDM projects
@FutureTDM
Shared best practices of TDM should also promote the work of TDM practitioners from underrepresented minorities in the field
33 Cf. the different ways of knowledge transfer as identified by Finne, Håkon, Adrian Day, Andrea Piccaluga, André Spithoven, Patricia Walter, and Dorien Wellen, A Composite Indicator for Knowledge Transfer: Report from the European Commission’s Expert Group on Knowledge Transfer Indicators, 2011 https://ec.europa.eu/research/innovation-union/pdf/kti-report-final.pdf, section 2.1.
34 An example of such an initiative is S2DS http://www.s2ds.org
Recap of barriers
Investment issues
· Uncertain: lack of understanding of TDM
· Fragmented: organisational ‘data silos’ across different sectors and businesses
Limited use of potential
· Uncertain: lack of understanding of TDM
· Fragmented: organisational ‘data silos’ across different sectors and businesses
Supply & demand
· Uncertainty: companies are hesitant to invest heavily in TDM activities that may not produce expected results
Decision making
· Uncertain: how to relate TDM to data-based management; how TDM brings organisational value
AWARENESS AND CLARITY
Subprinciple 1 – Promote TDM and the value it can bring to businesses and organisations
@FutureTDM and government agencies
• Disseminate knowledge on TDM use cases and their effects and utilisations in a business context
• Disseminate success stories to show how TDM can benefit companies
@Research funders
• Promote research that showcases and studies the financial benefits of TDM for SMEs
• Promote research that develops state-of-the-art ways of calculating return on investments in TDM for companies
@Governments, professional organisations, advocacy groups, businesses and universities
Organise hackathons and similar events to promote TDM and create awareness of its opportunities 35
Subprinciple 2 – Promote a data-savvy culture so that all stakeholders are aware of the potential benefits of TDM
@Businesses
• Make analytics-based decision-making part of an organisational culture
• Broaden analytical capabilities portfolio to cover all areas from descriptive, diagnostic, predictive and prescriptive analysis, to human input, decision and, finally, action
@FutureTDM, academia and consulting firms
• Show specific ways of calculating the positive effects of introducing TDM in organisations
• Develop and promote a better understanding of the business opportunities enabled by TDM
• Identify, validate and promote sustainable business models based on TDM in different sectors
35 An example of such an event is the yearly EU Hackathon, see e.g. the 2016 version http://2016.euhackathon.eu
TDM WITHOUT BOUNDARIES
Subprinciple 1 – Promote sharing among data ‘silos’ of different businesses and sectors
EQUITABLE ACCESS
Subprinciple 1 – Promote sharing among data ‘silos’ of different businesses and sectors
@Research funders
Provide more funding for research that uses TDM, taking into consideration the necessary budget for all stages of the TDM value chain (including infrastructure, storage and scaling)
Subprinciple 2 – Dedicate more funding to companies creating value from TDM
@Governments
Set up platforms where TDM companies, academic researchers and capital investors can meet and discuss the advantages of TDM products and services, and articulate their potential value to funders
Subprinciple 3 – Provide more recognition and acknowledgement for TDM uptake
@Universities, research organisations, research funders and businesses
Introduce incentives to reward those who are using TDM, e.g. by noting and commending TDM in evaluation of processes and proposals
Recap of barriers
Data(sets)
· Fragmented: data heterogeneity
· Restrictive: poor quality of data, annotations and metadata
Tools & Infrastructure
· Uncertain: user unfriendly interfaces
· Fragmented: architectural mismatches; incompatibility
Languages
· Fragmented: lack of availability of language resources
· Restrictive: availability of language resources
Documentation
· Uncertain: vagueness
· Fragmented: mismatch between documentation and tool versions
· Restrictive: absence of documentation
AWARENESS AND CLARITY
Subprinciple 1 – Provide clear documentation and user manuals for TDM tools, technologies and datasets, for other (interoperability-seeking) developers
@Developers
• Write clear specifications for TDM tools, to help others use them
• Keep documentation up-to-date and accessible from a single, easy-to-find access point
@Creators of datasets and metadata curators
Maintain clear and up-to-date specifications and guidance for the use of annotations and other metadata schema, to help content owners use consistent metadata
@FutureTDM
Share best practices around documentation of TDM tools and methods
Subprinciple 2 – Data(sets) should be consistent and complete
@Data(base) producers
As much as possible, provide for ‘clean’ datasets – that is, datasets that minimise the amount of processing and normalisation necessary for TDM activities
@Developers, research institutions, libraries and their representing organisations
• Develop appropriate platforms for annotating, amending and normalising datasets, to help create interoperable and re-usable data
• Draft guidelines for annotations that are strict and clear, and implemented accordingly
@Developers and researchers
Publicly share metadata of datasets, open to re-use and correction, to make it easier to accurately identify and understand the contents and context of datasets
Subprinciple 3 – Minimise barriers to entry for the use of TDM by lay-users or those with limited computational skills
@Developers
Create user-friendly TDM tools, workflows and infrastructure, e.g. through user-friendly interfaces, for the benefit of users with limited computational skills
@Research infrastructures
Adopt standards for interoperability to link datasets from various sources, and provide access to these via open, user-friendly APIs
TDM WITHOUT BOUNDARIES
Subprinciple 1 – Encourage consistency in the use of standards 36
@Registries, repositories and industry
Standardise data formats, communications protocols and middleware used by different components of a system to improve interoperability and make it easier to connect and use data from a variety of sources
Subprinciple 2 – Use open standards
@Content creators and providers
Provide datasets in open standards, instead of proprietary standards, to ensure everyone wishing to use those datasets has access to the relevant standards
@Research funders
Require publicly-funded research to use open standards for tools and dataset formatting, to ensure data is as accessible as possible
Subprinciple 3 – Make TDM stronger among all languages 36
@Developers
Adjust and tune existing tools to support more European languages, to ensure as wide a range of users as possible has access to these tools
@Governments
Promote or incentivise support for more European languages with funding, contests or other instruments that reward developers for these efforts
Subprinciple 4 – Ensure standards reflect the large variety of TDM tools and applications
@Developers
When working together on developing standards, take into account the variety of their users and applications, to ensure that standards are actually applicable to the breadth of the TDM landscape
Subprinciple 5 – Create a common infrastructure for all sciences
@EU government
Fund, enable, promote and/or initiate a common infrastructure where researchers can share, store, and access research outputs and data inter alia for TDM purposes. Make sure that current initiatives, such as OpenMinTeD 37 and the European Open Science Cloud 38, succeed
36 Standard setting in the context of (big) data technologies is also among the European Commission’s priorities. See European Commission, ‘Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions “ICT Standardisation Priorities for the Digital Single Market” COM(2016) 176 Final’, 2016 http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=15265, section 3.1.
37 http://openminted.eu.
38 http://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud.
EQUITABLE ACCESS
Subprinciple 1 – Increase access to and enhancement of TDM tools by making them available under open source licences
@Research funders
Require that TDM tools and technologies developed through publicly-funded research are made available under an open source licence, to maximise the value they offer to society