Twenty-five years ago, when Laurents Sesink was still a history student, his thesis on political internal relations involved a lot of reading and tally marks. Even then he thought, “There must be a better way to do this”, so he built a database and started to get into informatics and digitisation. Now he is the head of the Centre for Digital Scholarship at the library of Leiden University. The centre launched in July 2016 and is currently figuring out the best way to support research.
The Centre for Digital Scholarship at Leiden University is still very new. How did it start?
It was initiated by the executive board of the university, because they wanted scientists to be able to use new tools and technologies faster. There were already a lot of activities going on in the field of data management and open access at this university. That’s why the vision is to have a one-stop shop that works with, and refers to, the existing expertise spread throughout the university. The centre is explicitly not meant to take over this expertise, but to connect it better.
What does the Centre for Digital Scholarship actually do?
We do different things. Firstly, we help researchers make their publications open access and find open access publications. Secondly, we support them with research data management, which also includes digital preservation of data. But the really fun and new thing we are doing is supporting researchers with data science. Text and data mining is also part of this package.
Do other university libraries have a similar approach?
Mainly in the US and the UK, I think. In the Netherlands we are the first. In most Dutch university libraries these activities are part of ‘research support’, so it does happen. But in Leiden we chose to cluster them and to have dedicated staff.
Why is supporting researchers with data science so much fun?
Because it is new and because we are still figuring out how to organise it, what to focus on. That’s what makes it interesting. There are thousands of researchers in this university, and we only have 6 FTE in our centre. We hope to extend it to 12 FTE. But even then, we still have to make choices.
What kind of choices?
We cannot only support researchers and deliver results; we also have to train them to do, for example, text and data mining themselves. Currently we are setting this up through Library Carpentry, to make sure that our own librarians become good conversation partners when it comes to data science. We are starting this up together with the National Library and the Vrije Universiteit. We are also starting with Data Carpentry, together with the Dutch Techcentre for Life Sciences. Our own digital scholarship librarians are also doing a workshop, in order to be able to train others.
You told me before that you are working closely with scientists to find out what support they need. Can you tell me a bit more about this?
We use the Agile project methodology, with elements of Scrum. An example: one of the researchers we work with investigates Sino-Malaysian literature. We broke down the research into big steps. First we looked at the data: what is needed to make them FAIR (findable, accessible, interoperable and reusable)? In the second step we looked at the availability of the data. Books, for example, should still be recognisable as books. Then we moved on to the analyses: which analyses can this researcher do, and which tools are available?
Sounds like fun to work this closely with researchers!
Absolutely, but it is also very labour-intensive. So we learned that we need a different approach, especially when it comes to TDM. In the future, we will break down our support into three levels:
- Offering an introductory course on text and data mining, with examples. We also have to see if we can connect to something that already exists.
- Advising researchers who already have an understanding of TDM on the tools that are available, and on how they can adapt them to their needs.
- Supporting researchers who already know exactly what they need, but for whom the existing tools are not sufficient. We will only be able to offer this to a limited number of researchers.
We learned a lot from working with researchers that we could not have learned sitting at our own desks. And they learn from us, so it is a nice interaction.
Do you mainly work with humanities researchers?
When it comes to text and data mining: yes. Well, humanities, law, a bit of social sciences. Different disciplines come to us with different questions. Life scientists are more used to developing tools on their own. They are mainly looking for support with open access and research data management.
What do you think is the biggest challenge for text and data mining?
Keeping an overview of the work that has been done, the technologies, the results, the best software tools… There is already a lot, and new things pop up all the time. It can be challenging to figure out what works best in each case. And then: if you want TDM to be taken up more widely, knowledge has to improve. At some point, everybody should have the basic expertise, and then we can start training at the expert level.
What do you think is the role of libraries in the TDM landscape?
At the moment our work is mostly demand-driven. We should not offer things if there is no demand. This demand can come from policy, or directly from scientists. But I think it is also important to look ahead, for example to see which technologies are already being developed and which trends are emerging. That will make it easier to prepare for the transition. There will be bumps in the road, and it is much easier to deal with them if you are well prepared. Libraries are traditionally more focused on the administrative side of things, but I think we can be more adaptive as well, without losing reliability of course. Here we can learn from research institutes.
Martine Oudenhoven, LIBER