FEED Issue 08

GENIUS INTERVIEW Stephen McConnachie, BFI

non-proprietary codec and it’s lossless and compressed, so much more affordable for your huge collection. The container is Matroska – with the file extension .mkv. This is a new, revolutionary thing in our world, using FFV1 and Matroska as an open, non-proprietary solution built by a community of archivists. We’ll aim to preserve all of the video digitisation to that new FFV1-Matroska combination.

FEED: With that amount of material, how do you identify a title, catalogue it, assign metadata? Are you looking at using AI or other technology around that?

SM: For the film project, the 10,000 films we digitised were published on BFI Player. We had a clear understanding of what metadata we had to create to make it publicly searchable and comprehensible. That was baked into the project plan. As part of the budget, we made sure we resourced the cataloguing, and we went really deep on that. We went so far as to do geographic location cataloguing of latitude and longitude for each film, building a map we called ‘Britain on Film’. On BFI Player, you can go to the Britain on Film map of the UK, zoom in to your town and watch films tagged with that latitude and longitude. We wanted to give back to UK viewers and say, "here are films from where you grew up, or where you were born, or where you live now, or where your grandparents lived".

However, the video digitisation project is a much bigger challenge, because we’re scaling up massively to try to digitise 100,000 titles. Metadata is definitely a more difficult and complex challenge. So we are looking at the potential of AI. It won’t solve the problem, but it’s potentially a useful methodology to give us more metadata than we could ever achieve by manual, human cataloguing. Speech-to-text is a good example of what we hope to explore. A lot of the videotape will be from broadcast, so it will have not only spoken dialogue, but narration and commentary. And there’s object recognition and face detection.
We hope to explore those machine learning possibilities to see what metadata we can generate without having to undertake manual, human cataloguing. It’s a massive challenge for this project, because the scale of it is so large.

FEED: What storage infrastructure do you have to handle all this video data?

SM: Part of the five-year project to digitise film included building a preservation solution to store the zeros and ones. That was really a mammoth undertaking. We didn’t do a standard procurement where you write down your requirements in huge detail and someone sells you the equipment. Instead, we wrote down our objectives, our strategic aims, and procured a system integrator, Ovation Data. Their job was to make recommendations of all the system components to build our infrastructure, and that meant networking. We chose a 10Gbps network – you really need that when you’re moving these big files around all day every day. And they recommended a data tape storage solution. We went with Spectra Logic. We deployed two Spectra Logic tape libraries at opposite sides of our conservation centre in Hertfordshire. We stocked one with LTO-6 tape, popular with archives because of its open architecture and its documented road map. The other library we stocked with IBM tapes, which are much denser. With LTO-6, we get about 2.5 terabytes per tape. With IBM, about 8.5 terabytes per tape.

The other thing we procured from Spectra Logic was BlackPearl. BlackPearl is Spectra Logic’s gateway to the data tape libraries. The reason BlackPearl is revolutionary is it’s a big disk cache, a lot of storage on disk but with a REST API, and that means you can access the data in BlackPearl with REST API commands over the network. BlackPearl also has the intelligence to store your data into both tape libraries, maybe across multiple tapes because the files are so big: it becomes an intelligence layer in front of your data tape libraries and manages the complexity. Spectra Logic publishes SDKs for BlackPearl in Python and other languages, so we can write Python applications that integrate with it, liberating us to be much more in control of our own digital preservation destiny.

FEED: And that’s all being used now?

SM: Yes, we built and deployed it about three years ago and since then it’s been filling up with data from collections held by the BFI National Archive. By last count we’re just over three petabytes of content stored. And that means three petabytes times two, because we have a clone – so six petabytes in total. But it grows every day. We have an automated ingest pipeline. We record television off air – 17 channels of UK television – automatically, which flows into the preservation system.

FEED: Why didn’t you go with a cloud solution?
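As a back-of-envelope check on the storage figures quoted in the answer above, the sketch below works out roughly how many tapes three petabytes would occupy at each quoted capacity. It is illustrative only: it assumes decimal units (1 PB = 1,000 TB), that each library holds one complete copy, and that there is no compression or filesystem overhead, none of which the interview states. The function name `tapes_needed` is hypothetical.

```python
import math

TB_PER_PB = 1000  # assumption: decimal units, as tape capacities are usually quoted


def tapes_needed(content_pb: float, tb_per_tape: float) -> int:
    """Minimum whole tapes needed to hold content_pb petabytes, ignoring overhead."""
    return math.ceil(content_pb * TB_PER_PB / tb_per_tape)


content_pb = 3.0            # "just over three petabytes", as quoted
total_pb = content_pb * 2   # original plus clone: six petabytes in total

# Assumption: one full copy sits in each library.
lto6_tapes = tapes_needed(content_pb, 2.5)  # LTO-6 at about 2.5 TB per tape
ibm_tapes = tapes_needed(content_pb, 8.5)   # IBM tape at about 8.5 TB per tape
```

On these assumptions the LTO-6 copy needs about 1,200 tapes and the IBM copy about 353, which gives a sense of why the denser format is attractive at this scale.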

non-broadcast. It’s also artist moving image and workshop collective filmmaking.

FEED: What are the big challenges for the television archiving project?

SM: There are huge challenges, firstly because there are no more manufacturers of the equipment for videotape playback. My colleagues in the conservation teams have had to create an archive of videotape machinery to play these tapes in order to digitise them. The file standards for videotape digitisation are different, too. We don’t use DPX sequences because you don’t capture a frame of video in quite the same way. But there are exciting developments in this field. There’s a new moving image preservation file format called FFV1 (FastForward Video Codec 1), which comes from FFmpeg, a project to create a set of open, non-proprietary tools to use in video processing and preservation. Every archive and moving image organisation uses FFmpeg – it’s a

FOR FILM ARCHIVES, THAT YOU DOCUMENT THE FILM TO THE BEST QUALITY AND HIGHEST RESOLUTION YOU CAN ACHIEVE AT THE TIME IS A REALLY IMPORTANT PRINCIPLE
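For readers who want to see what the FFV1-in-Matroska combination looks like in practice, here is a minimal sketch that builds an FFmpeg command line in Python. It is not the BFI's actual workflow: the function name, the file names and the choice of settings (FFV1 version 3, intra-frame only, 16 slices with per-slice checksums, audio copied unchanged) are assumptions drawn from commonly used FFmpeg options.

```python
from pathlib import Path


def build_ffv1_command(source: str, dest_dir: str = ".") -> list[str]:
    """Return an ffmpeg argument list that transcodes source to FFV1 in Matroska."""
    # Hypothetical naming rule: same base name as the source, .mkv extension.
    output = Path(dest_dir) / (Path(source).stem + ".mkv")
    return [
        "ffmpeg",
        "-i", source,
        "-c:v", "ffv1",    # lossless FFV1 video codec
        "-level", "3",     # FFV1 version 3
        "-g", "1",         # every frame is a keyframe (intra-frame only)
        "-slices", "16",   # split each frame into 16 slices
        "-slicecrc", "1",  # store a checksum per slice for error detection
        "-c:a", "copy",    # pass the audio through unchanged
        str(output),
    ]
```

With FFmpeg installed, the list could be passed to `subprocess.run(build_ffv1_command("tape_0001.mov"), check=True)`, where `tape_0001.mov` stands in for a captured file. Building the arguments as a list rather than a single string avoids shell quoting problems with awkward file names.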

feedzine feed.zine feedmagazine.tv
