In 2016, our interviewee, Gordon Dunsire, received the IFLA medal “For distinguished service to IFLA and international librarianship, advancing the field of bibliographic data, linked data and the Semantic Web.” BDSLife went to visit him at his home in Edinburgh to find out more about his life and work…
Information, Catalogues and the Future
John Hudson: Can you tell us something about your background and how you became interested in cataloguing?
Gordon Dunsire: Well, it’s all accidental, and that’s a thread that’s going to run through the whole conversation. While I was at university, I got interested in information science. But even before I went to university, I came across a book in Foyles in London on information science and technology. This was ’69, so it was a relatively new idea, and I kept it in my head. I did physics and maths at university, and towards the end of my final year my thoughts started to drift to what I could actually do. I knew somebody who worked in the library; she was going to be a librarian. So, I wanted to do an MSc in information science at Sheffield; at the time it was all brand new. But their course pack advised that you work in a library for a year first. That was good advice. So, when I heard of a job going at Napier College as a library assistant, I applied. The interview was going so-so until they asked, “Why are you interested in working in a library?” I told them, and everything suddenly changed.
Napier was a small library school at the time. Rennie McElroy was the tutor, and he ran evening courses for the external professional exams of the LA (Library Association). They were hoping to get an employee to take the course so they could get proper feedback about it. They said, “If we offered you the job, would you take a day off a week to take a course and do what you want to do?” I said, “Okay, sounds like a deal.”
So I joined the library as a library assistant. It was a tiny place with about six staff. On the third day, I was shown the cataloguing department, which was a back room with a backlog, a manual typewriter, a stack of pink cards, a stack of white cards and nobody in it. Rennie demonstrated the cataloguing process, which was something like “I’ll put a 1 against the title, a 2 against the SOR [statement of responsibility], 3 against the principal author, and you just type it and transpose it at the same time”. I started doing this and I quite enjoyed it. I particularly enjoyed not having to interact with the customers out there, which I think is true of cataloguers. Most cataloguers prefer working with the material and the machines to working with humans.
What excited me was the intellectual challenge. I outpaced Rennie’s mark-up fairly quickly and realised that there was a kind of game going on: you had a variety of material brought in, you got things with no titles or things with stupid titles, and you had to think about what you were doing. There were no rules within that particular establishment. But things started to automate, and I have lived through the whole history of automation in cataloguing.
I started with a manual typewriter, but after about six months the librarian asked me if I could use an electric typewriter. I said fine; all the secretaries were getting them in the university. So, I taught myself to touch type during the summer and my productivity zipped up, especially as I no longer had to correct errors with Tipp-Ex but could use the backspace. From there I went through a whole series of automation devices. I persuaded the college to buy a NorthStar microcomputer for the library to do non-standard cataloguing jobs. We were members of the Scottish Libraries Automation Project, and we were getting trained in MARC, UKMARC as it was at the time, and AACR2 had just been published. That training was with the great Ruth Hope: I spent a week at the NLS [National Library of Scotland] and came back and carried on with what I was doing. We then got a work experience team to retro-convert the catalogue by typing the ISBNs into the British Library’s local cataloguing supply; Lynne Brindley [eventually Chief Executive of the British Library] was our contact. And so it just kept going. The standards kept coming in, the college was expanding, and the level of materials was going up and up to undergraduate and then postgraduate levels, so the whole thing just got more challenging and kept my interest up.
I had abandoned thoughts of information science at this point. I did the Scottish Certificate in Information Science on day release, and that’s where the book I co-authored, Linked Data and Bibliographic Information, really started for me. We went to Boston Spa for three days. It was quite inspirational. The thing about Boston Spa was that it had a copy of every journal; it was designed for that purpose, to put them all under one roof. That meant that you could do a citation search in situ. Back at the college, a lecturer doing research would come in and put in an ILL request, and the item would arrive three days later; two or three days after that, they would come in and say, “Now I want to trace such and such”, and that request went off and came back days later, and so it went on, typically for weeks. At Boston Spa you could do it all in hours. This was great stuff. I saw the power of mass: data in one place, properly catalogued and indexed.
On the very last day they said, “We want to show you something. It may not work, but we are part of a local area network with Hull Uni and the Uni of York, and Hull is connected via an undersea cable to Trondheim, and Trondheim is connected to the transatlantic undersea cable, which comes out in New York and joins another local area network, which includes the National Library of Medicine, which has MEDLINE online. So, we are going to try to do a MEDLINE search through six connections. Usually it doesn’t work first time; something breaks or loses connection.” We only had half an hour before we left, but that day I saw the Internet about three months before it was invented. This would have been, what year, August ’75, something like that? I didn’t realise any of this at the time, but I thought, “Wait a minute, the kind of technical aspects I am interested in are beginning to make an impact on what I do, what I am intellectually interested in.” That’s really where it all started to mesh. I carried on with the cataloguing, got promoted and then had the opportunity to move to Strathclyde, to the Centre for Digital Library Research.
ORIGINS OF LINKING
JH: It’s interesting, because that experience of pulling data or information from the other side of the Atlantic is something we take for granted now; it just spins around the globe. I remember the first satellite images of Calais coming in; it was phenomenal. It was another part of the world.
GD: I watched all the BBC things. I was really interested in this, passionately interested. I watched the first live transmission from Paris to London that the BBC did back in ’66 or ’67. I watched the first transatlantic live satellite communication on the BBC. I watched the landing on the moon. I remember one of the most stunning images I ever saw was taken by a lunar orbiter of one of the craters on the moon, and I remember thinking to myself, “This is science fiction. I mean, here’s a photograph taken from a spacecraft that’s orbiting the moon. No person has ever seen this before.” It was marvellous that this could happen. And you know, just the other day we got images of objects in the Kuiper Belt, and Voyager 2 is more than 10 billion miles out now and still going. This is about being human. It’s what humans do with their heads. They create machines that do more of the stuff that they can do.
JH: I want to bring you back to something. You refer to the transition from manual cataloguing, the writing on the cards to automated cataloguing and we get terms like MARC, for example. But what is actually meant by automated cataloguing? What is that doing? I mean where does the process start? Does a machine start from the very origin?
GD: It could do. One of the interesting, and somewhat disappointing, things, which is part of the work I’m doing with RDA just now, is the role of the catalogue itself as some kind of information retrieval engine. Even a bunch of cards in a card catalogue is organised as an information retrieval engine. So, automation means automating all aspects of that, from data capture through to the organisation of the data and eventually its use by other human beings. This is not AI; this is just pure automation. There is no machine intelligence involved in this.
It’s like using tools to improve the work that you do, the output that you have. It improves it in quantity, certainly, as I’ve alluded to with productivity going up, but it also improves quality. The machines can do spell-checking; they can compare headings and say these two things are different; and they can do all this very fast. To a certain extent you can programme them to act where you think it’s safe: where obvious typos exist, you can let the machine make the decisions and change the data. Increasingly, machines are also being used in the preparation of the original information resource.
Most things start off as e-text. The data is already in machine-readable form, so you can do two things with it by machine. You can do keyword extraction, which is the entire basis of Google: keyword extraction and the ability to measure the proximity of keywords. That’s why Google wants to see us typing enquiries, because we put keywords one after the other in a certain order and then they can use pattern matching. Then there’s also the use of pattern recognition within the data itself.
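As a rough illustration of the two techniques named above, keyword extraction and keyword proximity, here is a minimal Python sketch. It is not how Google works. The tiny stop-word list and the function names `extract_keywords` and `min_proximity` are assumptions made for the example. Keywords here are simply the most frequent non-stop-words, and proximity is the smallest gap between two terms, counted in words.

```python
from __future__ import annotations

import re
from collections import Counter

# Illustrative stop words only (an assumption); real indexers use much longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "it", "with"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n most frequent tokens that are not stop words."""
    counts = Counter(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts.most_common(top_n)


def min_proximity(text: str, term_a: str, term_b: str) -> int | None:
    """Smallest distance, in tokens, between occurrences of two lower-case terms.

    Returns None if either term does not occur in the text."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)
```

Take the sentence “The Great Wall of China was crossed by the Mongols. The wall is long.” In it, “wall” is the top keyword, and its second occurrence sits two words away from “mongols”. A search engine could read that closeness as a sign that the two terms are related.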
One of the things we’ve done in RDA is answer the question that lots of non-cataloguers working in libraries ask: “Why is our cataloguer still typing out, with two fingers, the title of this book?” It’s insane, when the book has already got machine-readable titling, with the layout of the title page, which is how humans know that it’s the intended title of the book. It’s an issue of simple pattern recognition, so you get machines to do that as well.
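One way to picture the title-page pattern recognition described here is a layout heuristic: the title proper is usually set in the largest type on the title page. The Python sketch below assumes that a hypothetical layout extractor has already produced text spans, each with a font size and a vertical position. `TextSpan` and `guess_title` are illustrative names. They are not part of RDA or any real tool. A rule this simple will be defeated by enough real title pages that a cataloguer would still need to check its output.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TextSpan:
    """One run of text from a (hypothetical) title-page layout extractor."""
    text: str
    font_size: float  # type size in points
    top: float        # distance from the top of the page; smaller means higher


def guess_title(spans: list[TextSpan]) -> str | None:
    """Guess the title proper as the largest-type text on the page.

    Every non-blank span set in the largest size is kept and read top to
    bottom, so a title broken over several lines is joined back together."""
    candidates = [s for s in spans if s.text.strip()]
    if not candidates:
        return None
    largest = max(s.font_size for s in candidates)
    title_lines = sorted(
        (s for s in candidates if s.font_size == largest), key=lambda s: s.top
    )
    return " ".join(s.text.strip() for s in title_lines)
```

Here is an example with invented spans. Given a 14-point author line, two 28-point lines reading “A Sample” and “Title Page”, and a 10-point imprint, the sketch returns “A Sample Title Page”.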
And so, I see machines being used in those two ways: for the resources themselves through mass-indexing and, secondly, by extracting more structured data from structured sources within the data itself. Descriptive cataloguing will vanish, it’s really not that important, machines can do a lot of it. In most instances, the actual resource is available anyway, so why spend too much time saying that there’s an introduction starting on page five when you can go to page five?
So that’s the boundary, as far as we go before you get to AI techniques. Machines are very, very good at doing this.
JH: So, once machines are really involved and you’ve got virtually unlimited storage, and certainly from the point of view of the texts of books in the world, which don’t actually occupy that many terabytes compared to what is available, the idea of describing something and then having its equivalent somewhere in full is quite pointless. You might as well just have the thing itself and extract things from it as and when you need them.
FIVE INFORMATION AGES
GD: Exactly. You know, traditional classical cataloguing has, as part of its function, to act as a surrogate for the collection itself; this is all classical stuff. You can arrange certain catalogue cards in the same order as the books on the shelf, which means that when books have been taken out on loan, the catalogue cards are a reminder that these books exist and where they should be. That is no longer necessary; that function has been completely subsumed by machines.
Where it gets interesting is going beyond that, onto a new way of looking at the world and, in particular, a new way of looking at information and how it is stored and disseminated. I split it into five information ages.
The first is cave paintings and representative art and, to a certain extent, abstract art. Obviously, the stuff has mostly vanished, as this goes way back in time, but basically the stuff is not reproducible. So, everything is a one-off and it’s generally not portable. The consumers have to go to the place, rather than the place coming to the consumer. The reason for this is memory. Preliterate oral cultures can only get so far in the transmission of information from generation to generation. The survival mechanism here is memory. The aim is to preserve things that would otherwise go when you die, so that descendants can get that information, situation, flavour, sensation back. This is the first information age: paintings in caves, etc.
Then there’s the invention of writing. This turns representation into symbolism. It makes it much more flexible to record things in more detail, and I believe that starts from the commercial pressure of trade. You exchange tokens. You keep the tokens sealed in a clay cylinder, both of you have the same, and you break it when you come back to complete the transaction, to prove that you sent three sheep, not two.
And then of course, you’ve got hundreds of these damn things in your bag as you go around trading. You have to put marks on the outside that say this is the one I break when I meet John. Then why can’t the marks say what’s inside? Three counters for three sheep can be represented by a sheep’s face three times. We don’t need the counters. This is the invention of writing.
Writing allows information to be recorded. Again, this is the pressure to record information, but it not only makes it more flexible, it makes it portable. The information can be carried with you in scrolls etc., very portable objects. You can take information around, to the consumer.
The third information age is the mechanisation of that process with the invention of printing. Suddenly, you’re doing the same thing, but now you can make many, many copies. This means that when you travel you can leave a copy with somebody else, or more than one of you can take copies and disperse them, disseminate them and so on.
The fourth information age starts roughly with Marconi’s radio transmission and goes up until about 1995 or ‘96. These ages are going from tens of thousands of years progressively, logarithmically, shorter and shorter, to decades.
The fifth information age is global telecommunications: the ability not just to transport a printed book from here to China in three months, but actually to transport it from here to China in two seconds. And that changes everything. It’s highly disruptive technology; it’s globalisation, the kind of thing that is going on in the world right now. This information age is still in its infancy, but everything is going to be completely immersive. We’re not quite there yet, but we’re going to be there real soon.
We’re all going to be wearing headsets or other devices that listen to what we say, understand it and transcribe it. They’ll be beaming virtual reality at our eyeballs so that we will be interacting both with the real world and with a simulation of the real world. Machines will do all of this automatically and we won’t even think about it. The machines will do eye tracking, so they know what you’re looking at and what you’re interested in, and they will be able to use ambient sounds, patterns, everything. It all becomes totally immersive, which is what I believe it’s like for children at the moment. The use of information is constantly interactive, 24/7.
So, you know, to like something or dislike it, thumbs up or thumbs down, for example, is a very, very crude mechanism for judgement, which is going to get a lot more sophisticated with machine interfaces and stuff.
IMMERSION AND METADATA
JH: So, in this ocean of information, we can virtually be anywhere we want at any one point, pulling anything out of that ocean and not even knowing, in some instances, that we wanted that information, because our wants have been predicted by the machinery that is working around us.
GD: Exactly. It’s absolutely totally going to happen. The question then, for our profession is, what’s the impact? I usually try and describe this by looking at Edinburgh. You are walking around, the machine knows that you’re a tourist, and you’ve told the machine that you’re interested in literary stuff. You pass a plaque saying, “Burns stayed here”. As you pass it, the machine is programmed to advertise this to you but a good machine will be able to tell from the slight crease of the brow that you’re not interested in that right now or an increase in heartbeat that you are very interested right now and subtly turn it off or on.
What I think catalogues will end up doing, I hope, is supporting this kind of immersive, interactive world, where much of the data may be fake. There’s nothing we can do about that; people will say what they want, and they will lie to you and steal and cheat and all that other stuff. But cataloguers have always seen their profession as a noble art: we tell the truth. I don’t hold to that particularly, but I do see a serious role for curated metadata in this fifth information age. I actually believe that governments in the West are about to pour untold quantities of money into this. Is the library profession even thinking along these lines? Of course, it’s not.
I think one of the ways forward here is more linked data, the stuff that we’re doing with the semantic web. I saw a glimpse of this in a programme that’s on the BBC at the moment about the Great Wall of China. It pulls together a lot of embryonic technology. They flew drones over the entire length of the Great Wall of China and stitched all the film together, so it’s a 90-minute or two-hour film just flying along the Great Wall. Beautiful. Beautiful. There is no sound of any sort. You have this serene thing going on, and captions come up. One of the captions said, “The drone is now approaching a section that was seen by an English traveller in 1890 who produced an image which has become the most well-known image of the Great Wall of China.” I thought, “Well, why aren’t you showing it to us?” And then of course it popped up.
But if it were an interactive thing, it should be able to tell from my skin resistance, as it said something like “This length of the Wall is where the Mongols invaded”, that I was interested, and it should pop up a whole ton of information or offer it to me. Where is that information going to come from? How’s it going to get linked? It is cataloguers who will produce the data that allows geolocation: the drone knows where it is, therefore the film knows what you’re looking at; that is then tied to that location in history, which then gets tied to the Mongols, and then you’re off. The resources are pulled in from libraries around the world, image banks, whatever, and you follow your nose.
Follow your nose is one of the new paradigms; kids do this. They’re not interested in structured information; they just want to say, “Oh, that’s interesting, and that’s interesting, and that’s interesting.” And of course, they’re naive in their trust, because they are children. But I think, again, that’s where we cataloguers can step in and say, well, there’s a trust to be had in this metadata, as opposed to, say, data that is simply trying to sell a bag of sweets to kids.
FOLLOW YOUR NOSE
JH: It’s interesting because I think there’s an element of that with the creative process, for example in poetry, insofar as if you don’t follow your nose you become very stilted and forced.
GD: Yes. And this analogy actually works on the job. You know, I’ll be doing something in the catalogue, and I will come across something that needs fixed. Okay, I’ll just fix it. But I remember where I started and I’ll come back to that. It’s the same thing in authority control: you’re doing something and you notice that a cross-reference is wrong, and you follow that, and then, wait a minute, that heading is wrong, or these two things are the same thing, so you merge them together.
This is very difficult to articulate but it seems to me that natural cataloguers have always been doing this and the machine’s going to allow us to do it much more quickly and much more comprehensively than in the past.
JH: In some ways this is the way the human brain works.
GD: This is exactly what I’m trying to say.
JH: We’re mirroring our brains but on a global scale.
THE UNIVERSE AS INFORMATION
GD: There are some wacky physical theories to support this. John Barrow, the anthropic cosmological principle, you know: why are we here? Why do we think? Well, it’s because the universe is arranged just so. It’s a just-so story. I don’t particularly agree with it, but bits of me do. Information and energy, and I’m talking physics now, are intimately tied up, and there are some good solid theories to suggest that if the universe is expanding into infinity, then what’s eventually going to happen is that entropy will remove all of the information in the universe. That means I know exactly where everything is: I’m here, there is an atom there and there’s an atom there, and that’s it, that’s all the information in the universe. There’s an atom every three cubic metres or something. The universe is information. The question is, where does that information come from? Because, like energy, it cannot be created from nothing; it can only be transformed. The total amount of information in the universe has to come from the Big Bang, and that’s a very strange sort of idea. But these are the cutting-edge ideas in physics at the moment.
Another way of looking at this fifth information age theory is that a lot of technology is assumed to be an extension of the human body. You build machines that can lift more than you can, see further, run faster, this kind of thing. Only now, in the fifth information age, is technology actually extending the basic function of the brain, it seems to me. So you end up in this immersive thing where the inputs to your brain and the outputs from your brain are all inextricably linked in some way. You know, again, this enters psychology theories: why can we think? Is this proof that there’s some kind of external order to the sensorium? If there wasn’t, no regular machine or organ could develop. There must be order to the world, in other words, an order I believe is governed by physical laws, which in turn seem to be based solidly somewhere in information, in concepts of information. So, to sum up, what we’re doing as cataloguers or metadata engineers is enabling the pathways in the extended brain, which function roughly similarly to the way our own heads function.
JH: And these paths are continually evolving?
GD: Absolutely and you know again in the fifth age this comes totally naturally. Everything you do or say can be recorded. We’re only at, I don’t know, a 1% or 2% level at the moment for most people but I have every reason to suspect that that’s going to climb to 90 plus percent fairly rapidly. The technologies now exist to do this and it seems there are powerful commercial pressures to do it because of advertising and other business models. So, I think it’s almost inevitable that it’s going to happen, but it’s going to be completely immersive. I like thinking about it like quantum mechanics. The situation, it seems to me, is that as soon as you use a piece of information, you change that piece of information. So, the mere fact the computer knows you’ve noticed something is adding another piece of information about you and about the thing that you’ve noticed and the interaction itself becomes another thing, another piece of information.
Now, people get freaked out when they try to imagine the scale of this. It doesn’t freak me out at all, because I think on cosmic scales anyway. I can easily imagine this happening, to the point, in fact, where most of the work that the machine is doing is preventing information from getting to you. Not enabling it but preventing it, because otherwise you’d go crazy.
THE NEED FOR FILTERING
JH: Our biological systems can’t support that kind of activity.
GD: There’s been this little hoo-ha about LSD in The Guardian recently, because researchers have done some testing that suggests it disrupts filtering mechanisms occurring in the hypothalamus. Those mechanisms are absolutely essential for you not to be overwhelmed by experience every second of the day. Similar mechanisms have to exist outside, in this bigger brain we’re talking about. But that bigger brain, of course, is subject to much greater interference than the inner brain. So filtering is going to become the most important thing, it seems to me. Hence the idea of Authority.
Authority is going to become very, very important. “Who said this? If it’s Donald Trump, I’m not interested”, kind of thing. This itself could turn into some kind of feedback loop, you know, so for example, the trust rating for the government of this country starts to dip and the Civil Service notices this and asks what to do to restore confidence. I see this as being highly interactive and driven by the mass, by the crowd. The crowd is never wrong.
The provenance of the data – and this is built into RDA – the provenance of the data is going to be very, very important in the future. That’s why I think Western governments are about to invest unimaginable quantities of money in resolving these problems, and we, the cataloguers, should be sitting there saying, give us one per cent. One per cent, so that we can transform the world. But I don’t think the profession is really thinking along these lines. It doesn’t recognise what it does; I’m talking about cataloguing in particular. RDA may change that. We’re hoping it will shift people’s attitudes and opinions, because Authority in all its forms becomes very important.
NOTIONS OF AUTHORITY
JH: When you say all its forms, can you unpack those forms?
GD: Yes. The traditional way of doing Authority in libraries is to pick one label and say, “This is the Authority for” and “We picked it because it’s unique”. Often the label is not natural; it’s been artificially constructed by the cataloguer, but its quality is that it’s unique in some kind of context. When you go to linked data, none of this is required, because with linked data your URI is your unique identifier, and that means for data management you don’t have to have Authority forms. But Authority forms are very useful for browsing and for human display; displaying the URI is useless.
You need some form of label to display, and it might as well be a common label, which is another word for an authoritative form, but you can switch labels. You can have labels aimed at kids, you can have labels that don’t include people’s dates of birth; all this kind of thing can be mediated by a machine. The idea of authority control as identity control is dead in the water now, right. Identity control, identity management, has shifted to the problem of URIs. That’s already happened among forward thinkers. You can have multiple URIs for the same person, so that is where that work now lies: determining that these are in fact the same person or the same book.
No machine can do that. There’s still a place for highly trained professionals to make the assessment, to look for the clues. This is what you do in Authority control anyway, but it has a different purpose: it’s really about identity, and about bringing the linked data about the same thing together. So that’s one form of Authority.
The second form is the common form: Authority as a commonly agreed form of name for display purposes, or for referring to this thing in this immersive world.
And then there’s a kind of self-authority, which is beginning to creep in. This is an ethical thing. I don’t actually like somebody else determining what my name is; I want to determine it myself. So, how can I do that? There are all sorts of ethical issues that have started to come up over this. If I change sex, for example, and I want to change my given name from Gordon to Gordana, how do I go about this if my Authority then clashes with the Authority given by the librarian?
And the other form of Authority is provenance, the normal use of the word. It’s not top-down; it’s not a Bible to answer the question. But we haven’t abandoned the idea of Authority handed down from on high in favour of, you know, an anybody-can-say-anything kind of thing. It’s mediated; it occupies the middle ground. National libraries are generally regarded as authoritative in these areas. But we know from VIAF (Virtual International Authority File), for example, that each National Library has its own different Authority form of somebody’s name, and VIAF is an identity management system that says these different forms actually refer to the same person.
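The split described in this exchange, with URIs for identity and interchangeable labels for display, can be sketched in a few lines of Python. Everything here is hypothetical. `IdentityRegistry` is an illustrative class, and the example.org URIs are placeholders. The `same_as` call stands for a cataloguer’s judgement, the kind of equivalence VIAF records; the code does not make that decision. A union-find structure tracks which URIs have been declared to refer to the same entity, while labels can be switched per audience.

```python
from __future__ import annotations


class IdentityRegistry:
    """Toy identity store: URIs identify things; labels are only for display."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}             # union-find parent pointers
        self._labels: dict[str, dict[str, str]] = {}  # URI -> {audience: label}

    def add(self, uri: str, labels: dict[str, str]) -> None:
        """Register a URI with display labels keyed by audience ("default" required)."""
        self._parent.setdefault(uri, uri)
        self._labels.setdefault(uri, {}).update(labels)

    def _find(self, uri: str) -> str:
        """Return the representative URI of uri's cluster, compressing the path."""
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        while uri != root:
            nxt = self._parent[uri]
            self._parent[uri] = root
            uri = nxt
        return root

    def same_as(self, a: str, b: str) -> None:
        """Record a human judgement that two URIs denote the same entity."""
        self._parent[self._find(a)] = self._find(b)

    def same_entity(self, a: str, b: str) -> bool:
        """True if the two URIs have been linked, directly or through others."""
        return self._find(a) == self._find(b)

    def label(self, uri: str, audience: str = "default") -> str:
        """Pick a display label for the audience, falling back to the default."""
        labels = self._labels[uri]
        return labels.get(audience, labels["default"])
```

In this design, the choice of display label never affects identity. Linking two URIs changes what `same_entity` reports, but each URI keeps its own labels, much as each national library keeps its own Authority form under a VIAF cluster.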
All meanings of Authority are wrapped up in something like VIAF, but Authority in the future is going to be a filtering mechanism, and I see this being a very, very dynamic thing. I’ve already alluded to this with governments. What about libraries, then? How does the National Library of Scotland retain or improve its position in the midst of all of this? They need to look at this, and other National Libraries are looking at all of this: the British Library is looking at it very much, and so are the DNB and the BnF. They have to take data in from other sources, and they cannot put their own provenance on it, so they have to manage Authority within their institutions. Authority comes in, gets guddled around and goes out again, and they have to be very careful that what comes in doesn’t contaminate what goes out. So the second big idea, after identity management, is Authority management. The best way of managing Authority is to do it all yourself: I did this, I did this. But I need to be able to say John did this, and then my Authority passes to what you did, and it can chain like this. Blockchain, by the way, is another emerging technology to handle this. Blockchain technologies are designed to prevent outside interference in the data. This is a means of guaranteeing machine-readable Authority within these new emerging systems.
I can see a role for National Libraries, and indeed organisations like BDS, here, where they act as mediators. There’s lots of different information coming in, and not all of it can be curated properly. Some of it has to be accepted as is. You know what the source is, and you are careful to retain the source or provenance. But it gets rebranded as your data: it all ends up in the main catalogue of the British Library, and the British Library now has a problem on its hands, because it publishes data that it had to take in without curation. So, data has to be much, much more granular. Every piece of data carries with it its own history.
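Two ideas from this exchange can be illustrated together with a simple hash chain: every piece of data carrying its own history, and Authority passing along a chain of agents. What follows is a minimal Python sketch under stated assumptions, not a blockchain, since there is no distribution or consensus, and not an RDA mechanism. Each hypothetical `ProvenanceEntry` records who asserted a statement and the SHA-256 digest of the previous entry. Altering any earlier entry therefore makes `verify` fail. The agent names and statements are invented for illustration.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEntry:
    agent: str      # who made the statement (hypothetical agents)
    statement: str  # the metadata statement itself
    prev_hash: str  # digest of the previous entry ("" for the first entry)
    digest: str     # SHA-256 over agent, statement and prev_hash


def _digest(agent: str, statement: str, prev_hash: str) -> str:
    """Hash a canonical JSON encoding of the entry's fields."""
    payload = json.dumps([agent, statement, prev_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(chain: list[ProvenanceEntry], agent: str, statement: str) -> list[ProvenanceEntry]:
    """Return a new chain with one more entry linked to the previous one."""
    prev = chain[-1].digest if chain else ""
    entry = ProvenanceEntry(agent, statement, prev, _digest(agent, statement, prev))
    return chain + [entry]


def verify(chain: list[ProvenanceEntry]) -> bool:
    """Check every link and every digest; any edit to history breaks the chain."""
    prev = ""
    for e in chain:
        if e.prev_hash != prev or e.digest != _digest(e.agent, e.statement, e.prev_hash):
            return False
        prev = e.digest
    return True
```

The chain only makes tampering detectable. Deciding whether to trust the agents named in it remains the mediating, curatorial job described above.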
JH: So, we can end up with huge quantities of data and machine interpretations of that. I mean when I say huge quantities, I mean like we’ve never seen before. And we need the computing power also, to deliver a grasp of that quickly.
GD: It’s not going to happen if it can’t happen virtually instantaneously.
JH: Yes. But the science fiction image of the machine effectively holding a conversation and surprising you. It’s not going to happen?
GD: I can imagine conversations with machines being a surprise. But only because you go, “Ah, of course. Yes. The machine knew there’s a connection between the Great Wall of China and Mongol invasion, well that was surprising.”
And humans are always able, and this is one of the killer comments I’ve got about AI, to make a joke out of it; they play on words. Humans use words as a game, and so, no matter what algorithm or rule you come up with, humans are going to thwart it. So the machine itself can’t keep up.
JH: You mean it’s going to be beaten by a pun.
GD: Yes, beaten by a pun, because new puns evolve all the time. I think that’s the mark of true intelligence; even children recognise puns. It’s built in, right? It’s built in. I had many fascinating conversations with a friend of mine when I was cataloguing cards way back then; he was doing work on child psychology and the psychology of autism, and we spoke about the similarity between the way children acquire language and the stuff I was doing in cataloguing. So, and this will be wrong, but very roughly speaking, at around the age of two a pre-set mechanism in the brain switches in, language becomes all-important, and children play with language. They love puns, love puns, as long as they’re simple puns. The example my friend gave me was of a child using a banana like a telephone. But there’s also a whole ton of other things which psycholinguistics can categorise as different kinds of fakery and punnery.
They’ll deny things that are in plain sight, they’ll invent things that are not in plain sight but these are not imaginary friends, these are linguistic inventions, linguistic games they are playing and all of this appears to be absolutely necessary for the development of the normal brain.
JH: So, is there a false arbitrary division between the mind and the outside world?
GD: I think so. I think metaphor is a continuous process inside the brain; you don’t notice it, but it’s there, constant metaphor: “This is like, this is like, this is like, this is like this”, and it’s possible that memory itself is based on these mechanisms. What I do know, and again, this is one of the reasons why I still refuse to accept that there’s going to be a breakthrough in AI, is that we have absolutely no knowledge of how and what goes on inside the brain. We don’t know why we sleep, and yet it’s clearly integral. We don’t know why all living creatures sleep. We know that they sleep because it gets dark, but what is the function of sleep? There’s lots we don’t know: we don’t know how memory works, and we don’t know how anybody comes up with a new idea. We don’t actually know what’s going on even in the language of the brain.
I think this is all tied up. I think there’s a deep, deep connection with all of this. There is a deep connection between poetry and language and information retrieval in this immersive cataloguing thing. When your memory is external, not internal, what then happens? You know, this interaction, this immersive interactive thing, is that a memory thing? Are we using this as an extension of memory? What does that mean? I think it is being constantly revisited. It’s all connected with language. What cataloguers are doing is, in the main, constructing labels for things, descriptors for things and those labels act as surrogates for memory. So, they trigger immediate things going on in the brain, but increasingly they are linked to other information out there that is not in your brain. I think it is really important that cataloguers will be, in the future, putting these labels on things, presenting things all linked up in ways that are useful.
JH: Does it give a cataloguer a new kind of responsibility?
GD: I think it does. That is an interesting question. The responsibility is not to tell the truth, that’s the first thing we get rid of because that’s almost impossible. There’s a responsibility to reflect a certain truth about the world, not some kind of absolute thing.
We are not describing a bibliographic universe where, if a publication claims to have been written by Moses, we take some kind of objective truth from the statement. Books lie just as much as humans do. I had a long fight about the treatment of fictitious and legendary characters. In classical cataloguing, it makes no difference practically; in Authority Control systems, it looks the same and is treated the same. So you’ve got statements of responsibility that say “Snoopy’s ABC by Snoopy”, and so you go, well, that’s on the label. Are there any other Snoopys around? Well, no, not really, except there’s a rapper called Snoop Dogg. We’d better put “Snoopy (fictitious character)” or “Snoopy (character)” as the label, but we’ll treat Snoopy, that label, as the objective reality, the entity, the book. Well, the Library Reference Model says no way. Responsible agents have to be persons. Snoopy’s ABC doesn’t magic itself out of nowhere, so we have to assume that Snoopy is a pseudonym. And now we’re fine, but you have to separate out the label from the referent.
And you can still treat the label in your Authority systems as Snoopy (fictitious character), whatever, whatever. But do not make the claim that Snoopy is a human or an agent, and that has become important with linked data, because we’re going to link our data with data that derives from the real world. And people in the real world know that cartoon dogs don’t write things. There’s a question I always ask people: who gets the royalty cheque in the real world? In fact, it’s illegal for anybody to open a bank account on behalf of anything that isn’t a human being. In most countries, you have to have proof of humanity. You can’t say “I’m opening a bank account on behalf of a cartoon dog”; there’s no way, it’s not going to happen, because we have legal obligations which assume that all people who have bank accounts are persons.
So, our data has to interact with data from elsewhere. You know, maybe nobody’s ever written a book about the Mongol invasion of China, but we have archaeological evidence, etc., and that doesn’t mean we exclude it from our searches. So that data has to interact with real-world data.
There are dangerous things happening with this; I mean, people got very upset about this, you know. I was accused at one point of denying animal rights, because it’s not just obvious pseudonyms. Lassie the Wonder Dog: that’s not a pseudonym, it’s the entity that appears as a dog in the film. But the dog isn’t responsible in any sense for its performance in the film. The dog doesn’t come on set going, “I think I’m going to extemporise today. I don’t like the directions you’re giving me.” Dogs don’t do that. They do what they’re trained to do.
You know, when I said this, that in cataloguing terms we cannot treat a dog as a self-conscious entity, I was accused of denying dogs their rights. And then people take great delight in telling me about the monkey who took a selfie, but once you go there, the whole thread unravels. In data terms, we have some famous examples. Most are corrected so we don’t know of them, but OCLC’s WorldCat has a service that generates a timeline for your publications. At one point a cataloguer was cataloguing the autobiography of James T Kirk, and treating it as if it was real. They were thinking: James T Kirk, okay. Well, does more than one James T Kirk exist on the planet? What’s his middle name? Tiberius. Kirk, James T (Tiberius)… oh, there’s still somebody else called this. I need to add his birth date. They looked in the autobiography. Born in 2043, no problem; and for about three weeks that appeared in WorldCat. James T Kirk, born in 2043, and the timeline just went haywire, as you might imagine. This guy was publishing books before he was born, and that’s the problem.
THE MAGNIFICATION OF ERRORS
JH: With machines generating more and more information via their own interactions, without us doing anything except occasionally querying the machine, is there not a danger that an error becomes a snowball, becomes a mountain, becomes a black hole?
GD: Absolutely, and history is littered with these errors, caused by humans, not necessarily by machines. The machine can obviously scale it. So yes, which is why we should be very careful about the provenance of data that’s purely from the machine. I would want to know more about the algorithms it’s using in order to arrive at a statement.
JH: You’re an expert in that but most of us aren’t. An algorithm to most people in the street means nothing at all.
GD: So, trust comes in and the crowd comes in. Most normal people, quite rightly, can’t make that assessment, and so they have to trust the provenance. Errors eventually come out because a datum has got to fit with all the other data. The sensorium is complete, is coherent; sooner or later these things bang up against reality. The crowd goes, oh, and the machine goes dislike, dislike, dislike, dislike, dislike, and that source of provenance is now way down in confidence; all this is happening in real time. I think it’s a self-correcting mechanism. I mean, this is something to be very afraid of. It shifts balances, but it doesn’t change the absolute nature of being human and how humans communicate. What it does is shift balances and scales.
NEW MODELS, OLD MODELS
JH: You can see the old school panicking at the thought. They want Encyclopaedia Britannica that tells them that this is the truth about Africa in the 19th century and that is the established authority.
GD: This kind of top-down Authority model cannot work, mainly because we know that there are two things here. One is, generally speaking, in our limited world that we’re describing in metadata, we’re all looking at the same thing with roughly the same brains, right? So, it doesn’t matter whether I’m Chinese or South African or South American or an American or a Scot; I’ve got a similar brain to all of these other people. I know that. Otherwise we wouldn’t be here, and we’re all looking at the same thing. I’m looking at the same book that you’re looking at, I’m looking at the same photograph that you’re looking at; are we going to describe it the same way? Of course we’re not. We can’t, because we describe things in metaphor, as we discussed already, and my metaphors are going to be different from yours because they’re culturally driven. When there are cultural differences, as with VIAF, there are likely to be differences in the descriptions, but we should embrace that and not regard it as a threat.
This is the middle ground I’m talking about: you still need the authority, you still need the National Library of China and the National Library of South Africa to act as the arbiters of this, but they don’t have to do it in the same way. It comes down to identity management. The way I describe something varies even within a single culture. You describe a book for a children’s audience differently from the way you would describe it for an adult audience, or even for a male or a female audience. I don’t know, you describe something differently for native speakers as opposed to people using a second language. We do this quite normally.
JH: Is this the promise of the semantic web?
GD: The promise of the semantic web? There was a buzz when the Scientific American article by Berners-Lee appeared, and I eventually got hold of a copy and read it and thought, this is complete nonsense, and I was really quite surprised. This guy needs to talk to a librarian. You know, of course, I was a librarian, and the reason I said that was because the article gave every impression that this could be done by machine. And then, by a great coincidence, I was sitting here having breakfast one day and John Humphrys was on the radio and he said, “I’ve got a very special guest with me today, it’s Tim Berners-Lee”. Within five minutes, he asked about the semantic web and Berners-Lee said, “Oh, yeah, so don’t get me wrong, it’s not something that’s going to come out of artificial intelligence. As human beings, we’re going to have to sit down and record these links; we’re going to have to say this is the same as that, because no machine can do it”. So, Humphrys said, “Isn’t this a massive amount of effort?” And Berners-Lee replied, “Yes it is. The scale is unimaginable, but that’s what they said when the internet first came out. When the first IBM 360 was produced, there’s the famous quote saying we might sell five of these worldwide and then we’ll move on to another model”.
When personal computers came out people said, “Only one in fifty households in the UK is expected to purchase one of these things”. And now we all walk around with enormous computing power on our smartphones and think nothing of it, and it’s ubiquitous, everybody’s got one.
JH: And several of them. You could probably count eight or nine in this room.
GD: Easily. And so, here we are again, it’s wrapped in the conversation. I think we’re reaching that immersive scale, where it’s still human beings who say “This is linked to that with authority”, whatever that means. The best you can do is get the machine to say “He said that”, and that will produce the scaling of the semantic web. Why hasn’t that happened so far? Partly because all the other shiny toys associated with this have been far more attractive. Google: I didn’t expect Google to do what it did. I remember warning people off Google years ago, saying that it could only go so far before the business model would collapse. It’s reaching that stage now. But it’s made so much money out of that business model that it is now able to invest in semantic technologies, and an awful lot is going on. They keep what they’re trying to do concealed from everybody, but ultimately all of these things are trying to substantiate all of this by machines using large quantities of data. That’s wrong; it’s never going to work like that. What we need to do is engage large numbers of human beings with this process, and that’s what I think is about to happen. There are large numbers of human beings engaged with the infrastructure. They’re increasingly engaged with the information through likes and dislikes, and if we can keep that level of simplicity, which you can with eye tracking and other stuff, then I think Berners-Lee’s idea starts to have traction. Large quantities of human beings, human brains, are involved in creating this.
The reason the semantic web hasn’t happened so far is distraction. Companies that could have made it happen have been distracted. The cataloguing profession hasn’t been able to keep up with any of this, and we’re still struggling to accept linked data. Everybody thinks, “Now that’s a wonderful thing”, but they don’t know what they’re talking about. It’s a paradigm shift, a disruptive technology, and as soon as people hit these things, they go off it; they retreat into classical mode. This is all causing delays.
PROGRESS IS HUMAN
JH: I really like the idea that you’re placing the nodes of development of the semantic web in the human being rather than in computers talking to each other whilst we walk around making cups of coffee. But in fact, it’s quite the opposite; it’s us–
GD: Absolutely the opposite. The machines can’t think. You know, the concept of machine semantics is extremely limited. There’s a lot of conning going on. You know, there are AI researchers who have made it their career for forty years now, and they are getting nowhere, but they’re saying that they are, saying all sorts of weird things. They’re not really getting anywhere, and they’re glossing over the situation.
JH: There seems to be a lot of investment in getting a robot to make a cup of coffee.
GD: It really is nonsense. You know, I think human brains are the semantic component of the semantic web. We have an infrastructure emerging there that enables that to be captured and recorded and more importantly to be dynamically adjusted. We have to think about this as a stage in humanity’s evolution. And I mean humanity’s evolution not the evolution of human beings.
We’re in a position, I think, where we can preserve local cultures; we can preserve all of that and, in fact, make it even better, that is, allow micro-cultures to flourish. We should embrace that, if we have an over-arching mechanism mediated by the machine that is doing the balancing.
One of the great revelations for people starting to use the internet for the first time is: all right, this is amazing. I thought I was the only person on the planet with an interest in science fiction and West Highland Terriers, and I suddenly discovered there are a thousand societies called the West Highland Terrier Science Fiction Appreciation Society. There are hundreds of thousands of people like me out there and I can communicate with them. And we formed a club, and from a club we formed a culture; this is where it’s going. It’s happening all the time already, and some of these cultures don’t produce anything worth recording, except you can’t help but record the stuff. So maybe it’s not worthy of other people’s attention, but it’s still sitting there. And if it is never discovered, then it has edited itself out of existence, so it’s done its job.
ARE LIBRARIES READY?
JH: There’s a question coming to my mind here which is potentially a bit controversial, but I think you’ve alluded to it a couple of times during our conversation so far: is our library infrastructure and cataloguing infrastructure, as handed to us through the library system, really rising to the challenge?
GD: Not at all. Some people are, with RDA. I was Chair of the RDA Committee for five years, and it happened to be at the right time, and because of my own professional history, I was able to talk fairly comfortably with people in Germany, France, Spain and the UK (curiously, not the United States) about what was going on in national libraries and how the cataloguing standards should be developed to help these processes out. One of the very first things that came up was this: RDA said that if you transcribe a title, then you’ve got to do various things to it: you’ve got to get the capitalisation correct, you’ve got to add punctuation to make it clearer, and all sorts of stuff. I knew from my own experience that this took a lot of time and effort, but I was also aware that other information retrieval services out on the web, like IMDb, don’t care at all if something’s all in caps or a name is in direct order or all this kind of thing. So, one of the things we did was build that into RDA as an option, so you can still do your capitalisation if you want to.
It’s incredible: the metadata record as a literary work, you know? So, one thing we were able to build in here was that more machine assistance needs to be built into the process. We should not be excluding things that you can do dumbly and cheaply with your smartphone. Take a photograph of the title page, run optical character recognition on it, and you’re done. The whole thing takes five seconds, rather than five minutes typing away pedantically. So, the use of existing technology was one of the things we were able to build into RDA to meet the challenges of the future.
But there’s the problem of overcoming AACR2 artefacts that they simply haven’t had time to sort out, the drag of an AACR legacy, where large libraries turn around and say, “We can’t do that.” I said, “What do you mean? It’s just a content standard.” “Yes, but if we follow the standard, it will cost us two million dollars to retro-convert the data, so we don’t want that in the standard”, and I’m going, “What about the rest of the world?”
Cataloguers are passionate about what they do. They really do recognise some aspects of what I’ve been saying. I could go into Napier University Library, go to the Philosophy section and still find the book that’s got my initials on it and say “I catalogued that”, because it’s got GD on the accession stamp, and then I could probably go and look at the catalogue record and see that it’s never been changed since I catalogued it, and that was 35 years ago. I’m proud of the fact that that record is still being used by students at Napier University way down the line, very proud of that; it makes me feel creative. Lots of cataloguers feel like this because cataloguing is an intellectual game, and you have to have a creative streak. Most cataloguers enjoy cataloguing and really want to do it. And by all working together, cataloguers now realise that collaboration and cooperation are the key, because the machines allow us to do it.
If we all work together, we can crack the problem, and that idea of professional community is still very strong. But I think cataloguers lack the paradigm to assess this new thing that’s going on. An awful lot of cataloguers, unfortunately, tend to have liberal arts rather than scientific backgrounds. It’s very, very difficult for people who haven’t had a scientific background to get some of what I’m talking about. Professionally, it doesn’t alter the core goals of the catalogue. I think there needs to be a paradigm shift in terms of attitude and appreciation of context: the machine context as well as the global context, where we’re all going. And I think confidence, that’s a big, big issue.
A QUESTION OF CONFIDENCE
JH: Where does that confidence come from?
GD: It comes from the standards, and this has been a drag, you see. An awful lot of cataloguers are highly conservative because they feel that if the standard changes, then somehow it’s demeaning what they’ve done already, or it’s very expensive to retro-convert, and an awful lot of them have got themselves into a comfort zone where what they’re doing is right and it doesn’t need to be changed. To a certain extent that’s true with RDA; we’ve made a promise to people. Apart from one or two well-known areas, which we warn people about, generally speaking you can carry on doing what you’re doing. But if you want to make what you do more effective, efficient and useful for future generations, we’ve also opened things up so that you can choose other options which allow these other things to kick in, including the obvious one, linked data, which runs right the way through RDA. An astonishing thing for me, and it was astonishing for the rest of the Steering Committee when I pointed it out, is that in the whole of the original RDA there was one mention of a URL, and that was it. One mention, and when it was mentioned, it was an example. There was no recognition of linked data, absolutely none. You know, we’re talking about a product that was released only ten years ago.
Nothing. Absolutely nothing in it. So, we’ve tried to build all that in. I think cataloguers quite rightly want to follow standards. I think the context of the standard has shifted radically, as we were saying. It’s no longer top-down authority. It’s not chaos either; it’s something in between, and cataloguers don’t know how to use machines or see machines mediating the stuff and producing the character-balancing. Sadly, most cataloguers are still head down, cataloguing for their organisation and their customers. They recognize that the data can be used elsewhere, but often, in their heads, is the idea that it will get re-curated or intermediated before it’s used elsewhere, and they’re completely and blissfully unaware of what’s going on at the British Library, for example. Eighty percent of ingest doesn’t get curated at all. They have to shove it straight through, and all the rest is handled by BDS. People don’t understand that, and it frightens them that the whole catalogue is going to fall apart. Well, it’s true, actually; it’s slowly degrading. But that’s only because you’re trying to measure it in terms of unique authority headings. These measurements are no longer useful.
JH: Do we need more investment overall, and more esteem-raising for the catalogue and the cataloguing process within our culture?
GD: Absolutely. The problem is that what makes a good cataloguer doesn’t attract esteem. Good cataloguers do tend to be more inward-looking than gregarious, you know, socially awkward except when they’re talking to other cataloguers. I think the qualities that made a good traditional, classical cataloguer are working somewhat against the perception of cataloguers. To me, it’s no accident that cataloguing is in a nosedive. When I started my career, cataloguing was seen as the supreme art of being a librarian, and that’s now completely reversed. We have to assume, as cataloguers, that we have been as much to blame for that reversal as economics or the false promise that it can all be done by machine and we don’t need cataloguers any longer.
So, I think there’s a rebalance required there. I think fake news gives us a golden opportunity and, yes, we should be prepared to be more political. But it does mean abandoning some long-cherished ideas about what cataloguing is. I mean quality, and by quality I don’t mean data provenance; I mean, like, is there a full stop at the end of the MARC tag? That kind of quality. For goodness’ sake, it doesn’t matter.
THE CATALOGUE IS FUNDAMENTAL
JH: In the world you’re describing, the projected world we’re heading towards, the cataloguer is pretty fundamental.
GD: Absolutely. Part of this is emerging with, say, Wikipedia. My wife does a lot of Wikipedia editing and has been engaged with Wikimedia. People are now talking about Wikidata for catalogues, stuff like this. This idea of a common, self-policing-plus-crowd-feedback way of moving forward seems to me to fit an awful lot of the qualities of cataloguing which we’ve mentioned.
I think one of the significant issues is the standards we’re using and how, or whether, they are applied. I won’t say whether it’s good or bad when a cataloguer has been asked to do something by somebody and they say, “No, the rules won’t let me”. Now, that’s good in some instances. I remember at Napier, the Librarian saying, “There’s a backlog. We should cut down on how much cataloguing we’re doing”. So I said, “Fine, let’s have the tutor librarians come in and I’ll take them through every MARC tag that we use, and we’ll decide which ones we can stop using.” I knew what I was doing. I went through every MARC tag, and the only vulnerable one was ‘300 physical description’, and that was the one that they were going to cut. So, they said, “Do we really need physical description?” and I said, “Well, not really. I mean, it doesn’t really matter if something’s 200 pages or 250 pages long.” They said, “But it does matter if it’s two pages long.” “What do you mean?” I said. “Well, we’ve got an awful lot of government publications that are only two pages long”.
The title says, “An Enquiry into the Universe, the World and Everything”, and it’s two pages long; does that make a difference? I think it would. And I said, “What about multi-volume works? Is it not of interest to a user that they’re going to have to borrow three volumes to get the whole thing, and it’s going to be three of their five physical borrowings?” Oh, yes. So that was it. In that sense, the rules, the standards, worked.
But I’ve seen it happen in reverse so often, stupid things. I had one last year. Somebody got in touch with me. They were transgender and their business was translating children’s books, and on a whole bunch of their early works it said ‘translated by’ followed by a male given name. Now they had changed gender and were going to translate under a female name. You can see where this is going. I give the cataloguer the biography, but they go and check it in the National Library’s catalogue, and the next thing is they come back and say that the name used to be male. The work just vanishes because it’s kids.
So, this person approached the National Library and said, “I would like you to change the heading. I know you can’t do it with the statements of responsibility, but for goodness’ sake, can you change the heading?” And the National Library turned around and said, “No, we can’t do that because RDA won’t let us”. I said to my colleagues, “This is nonsense, this is total nonsense. Is this true?” and they replied, “Yes”. I said, “Well, that’s got to become an option immediately. You know, we live in a world where this happens all the time.”
That leads onto a question about ethics and truth. What do we do about the person who comes in and says “I don’t want people to know how old I am so I want you to remove my date of birth from my heading?”
Gender is another one and, in some instances, religion is another one. This used to cause me a dilemma, you know, thinking, well, you have to reflect the objective nature of the stuff. But now I’m thinking, wait a minute. No, no, no. In RDA we just said it’s up to the library how it makes Authority Labels distinguishable. And so, if you’ve got a Jane Smith, and Jane Smith does not want her date of birth added to distinguish her, then use another device: Jane Smith of Glasgow. Just use another device, because we don’t have to have everything listed with dates in the next column. People stopped doing that years ago, and yet we’re insisting that the data has got to follow these rigid rules.
In the past, there was a reason for this, back in the 50s and 60s, but now there isn’t any reason. Do you have to choose one form of heading? No, because I can get a computer to rotate every single word in a person’s name and index it. You know, I think that needs to be gotten across. The standards need to shift, and this is what we’ve tried to do with RDA: shift away from this, it has to be said, Western, fixed word.
The catalogue record is perfect. This is my Holy Grail – and I throw this at people, you know – we’re acting like the Knights of the Round Table. We’re acting like every object has a perfect catalogue record. How do we measure perfection? We look at the rules. Does it measure up to the rules? Oh yes, it does in every respect therefore it is perfect. Right? This is like the Holy Grail. Most of us recognise that none of us is Percival. Somewhere out there, probably in the National Library, there’s the head of authority control and he or she is the closest we’ve got to Percival who sees the Grail? This is nonsense.
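Dunsire’s remark that a computer can rotate every word in a name and index it, so that no single form of heading has to be privileged, describes what is usually called a permuted or rotated index. What follows is a minimal sketch of that general technique under stated assumptions, not the method used by RDA, any national library or any particular system: the function names are hypothetical, names are split on whitespace, and matching is a simple case-insensitive prefix match.

```python
from collections import defaultdict
from typing import Dict, List, Set


def rotations(name: str) -> List[str]:
    """Return every rotation of the words in a name.

    'Jane Smith' -> ['Jane Smith', 'Smith Jane']
    """
    words = name.split()
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]


def build_index(names: List[str]) -> Dict[str, Set[str]]:
    """Map each lower-cased rotation back to the name form(s) it came from."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        for rotated in rotations(name):
            index[rotated.lower()].add(name)
    return dict(index)


def lookup(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the names having any rotation that starts with the query."""
    q = query.lower()
    hits: Set[str] = set()
    for key, originals in index.items():
        if key.startswith(q):
            hits |= originals
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(["Gordon Dunsire", "Jane Smith", "Jane Smith of Glasgow"])
    print(lookup(index, "smith"))    # ['Jane Smith', 'Jane Smith of Glasgow']
    print(lookup(index, "glasgow"))  # ['Jane Smith of Glasgow']
```

Searching for “smith” or “glasgow” finds “Jane Smith of Glasgow” without anyone having decided on a surname-first heading. A real index would also need normalisation (diacritics, punctuation) and a sorted structure instead of the linear scan used here for clarity.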
OPEN WORLD AND RDA
There’s a thing in the semantic web, the open world assumption, which I think is very powerful, that there’s always something new to be said about something. So, where’s your catalogue record, now? It can’t be perfect.
We might suddenly start measuring the colour of the paper. Well, why would you want to do that? Why do you record the fact that a book goes “xii, 156, xiii p.”? I mean, what’s that got to do with anything? Why not say 250 pages, and you’re done? Why are we doing this? It’s the simulation of the book, the simulacrum of the book, we’re trying to get across here, and that’s the shift I think we need to make. We do need standards; they’re not rules, they are best practices for conducting your business in a local context.
One of the things we’re trying to get across with RDA is that you can pretty well select any options. There are lots of options, everything’s an option, and that’s causing consternation: “Tell me what to do,” “Why should I be deciding which option to take?” Well, your boss will answer that. It’s kind of in direct opposition to cataloguing being like an intellectual game. The two things can’t coexist, but they do in people’s heads: “tell me what to do” and “it’s a game” at the same time. Choice but no choice.
So, I think what we say to cataloguers is: “You can use any of the options. But we have set up the semantics behind RDA, so it doesn’t matter how you describe an object, what fields you use or what recording methods you use; your data will inter-operate with the data of somebody else who has followed the same set. It may look radically different, the actual content may be completely and utterly different, but it will inter-operate, because that’s where the machine’s power lies.”
Hidden behind the surface gloss of RDA is this huge amount of structured, semantically coherent data. That’s what we’ve done. We’ve shifted the burden of keeping this coherent from the human to the machine, allowing the human to be creative. But the machine has to be coherent.
JH: That’s the first time I’ve actually understood its purpose.
GD: And that’s the first time I’ve said it in these terms. I love interviews like this because they get me going. Yes, that’s it, the coherency of the semantic web is embedded in the technical stuff the cataloguers don’t need to know about. Cataloguers should then feel liberated into following their nose, saying what they want about something and they need guidance from their own institutions because they do have local audiences and they do have local collections to look at.
JH: And institutions to satisfy.
GD: Exactly. But they can do that safe in the knowledge that whatever they do will be accommodated by the system, by the machine. I’m hoping publishers will see this, that publishers will follow RDA, and that their metadata can inter-operate at a certain level with curated metadata. It’s becoming a continuum. I’ve had lots of chats with Alan Danskin [of the British Library] about this because he’s interested in how metadata gets formulated for publishers’ purposes and then flows into the library systems, where sometimes it’s curated and sometimes it isn’t, but it all has to be brought together at the end. It can’t be discarded, because to discard even low-quality metadata is to prevent access to a resource. You have to take what you’ve got, but I can see it becoming a continuum much more than a chain, where publishers may be able to interact with the data even after it’s left them, and could be feeding back. And we know publishers are very interested in some aspects of this, the linked data part, because then they can see that their publication is being associated with this TV program, which is associated with this event, and maybe we should have a stall at this event.
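Dunsire’s point about RDA’s hidden semantics, that descriptions using different fields and different recording methods will still inter-operate, can be illustrated with a small sketch. It assumes each institution publishes a mapping from its own field names to shared property identifiers. The property IRIs under example.org, the resource identifier and the MARC-like field codes are hypothetical placeholders, and a Python set of triples stands in for a real RDF store; this is not how RDA or the RDA Registry is actually implemented.

```python
from typing import Dict, List, Set, Tuple

# A statement: (resource identifier, shared property, recorded value).
Triple = Tuple[str, str, str]

# Placeholder property IRIs standing in for a shared element set such as RDA's.
# These are illustrative only; they are NOT real RDA Registry identifiers.
TITLE = "https://example.org/shared/title"
EXTENT = "https://example.org/shared/extent"


def to_triples(subject: str, record: Dict[str, str], mapping: Dict[str, str]) -> List[Triple]:
    """Re-express a record keyed by local field names as shared-property triples.

    Fields with no entry in the mapping are skipped rather than guessed at.
    """
    return [(subject, mapping[field], value)
            for field, value in record.items() if field in mapping]


def values_for(graph: Set[Triple], subject: str, prop: str) -> List[str]:
    """Collect every value recorded for one property of one resource, whoever recorded it."""
    return sorted(v for s, p, v in graph if s == subject and p == prop)


book = "urn:example:enquiry"  # hypothetical identifier for a single resource

# Two institutions describe the same resource with different field names
# and different recording conventions (spelled-out vs abbreviated extent).
library_a = {"Title": "An Enquiry into the Universe, the World and Everything",
             "Extent": "2 pages"}
library_b = {"245a": "AN ENQUIRY INTO THE UNIVERSE, THE WORLD AND EVERYTHING",
             "300a": "2 p."}

# Each institution publishes how its local names map to the shared properties.
mapping_a = {"Title": TITLE, "Extent": EXTENT}
mapping_b = {"245a": TITLE, "300a": EXTENT}

# Merging is plain set union: neither record is rewritten to match the other.
graph = set(to_triples(book, library_a, mapping_a)) | set(to_triples(book, library_b, mapping_b))

print(values_for(graph, book, EXTENT))  # ['2 p.', '2 pages']
```

Both extents survive side by side. The shared properties make the two descriptions findable together, while reconciling “2 pages” with “2 p.” remains a separate, optional step.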
A BOOK, PERHAPS?
JH: I’m going to ask you just two more questions. Is the Five Ages of Information you talked about going to become a book?
GD: No. I have been discussing this with a friend for years, and he's urging me to do so. He says he's not come across this anywhere else, that it forced him into a paradigm shift, and that he now can't think of politics or anything else he is interested in without looking at it in this light, so he's really pushing me. But right now I'm exhausted by what I've done. I know how difficult it is to write a book, so I may or may not do this. I may use a different method, like blogging or something like that, but right now and for the foreseeable future, and that's certainly to the end of this year and probably next year, I've got too much other stuff that I need to finish off, and it's much easier just to idly muse about these things. But I'm definitely under pressure to do this. I still haven't worked it all out in my own head, and that's part of my reluctance. Yes, it's an interesting idea.
JH: To conclude, Lesley Whyte at BDS has asked me to ask you, what’s been the highlight of your career working in libraries?
GD: One of the stock answers is what I'm doing right now, and it's true. The stuff I've been doing with RDA has brought together everything I know, including the quantum mechanics we raised in this discussion. All the things I'm interested in suddenly come to bear on this activity right now, so I have to say that that's the high point. But another answer has nothing to do with that. Everything is transient; yes, there are those catalogue cards, all my catalogue records that last through time. But I've got tons of other stuff I've done, you know, like the Scottish Collections Network and stuff like this, which never came to anything.
The other thing I am particularly proud of now is my interaction with people. Cataloguers, people who end up in cataloguing departments, can be misfits. In my career I would say I've come across four or five people who have really been struggling to come to terms with the world; they're working but feel a failure. I've just been able to talk with them, show them things, and through cataloguing they became very, very confident people. They found their place in the world, wherever that is, and they thank me for it, so I'm very proud of that, my impact on people. And I think I only had to do that once to justify my existence; to be able to do it four or five times…
I assume I'm doing it in a much bigger way with people who are reading what I do and using the outputs that I create, but that direct influence on people… And this is a purely cataloguing thing, because people who end up in cataloguing usually can't find a place to fit anywhere else in the organisation. I think that it's not me; it's just that I'm able to show them this phenomenon about the world. This is what we've been talking about the whole conversation. What's in here is mirrored out there, and we can make sense of it. We can mediate it in some way, and that is your role in life. For a cataloguer, I think that's a very important thing, but of course it is what most people do naturally without thinking. But these are misfits I'm talking about.
I've got a little story. I went to a party at the National Library of Scotland with my wife, who has a student working with her on placement, and for various reasons the student comes over and sits with us. We're chatting away, and the student is at Strathclyde, doing an MSc at the Library School. I don't know how cataloguing came up, but she says to me, "Oh, do you know anything about cataloguing?"
And I said, "Yes. Yes. I'm a cataloguer."
And she says, "Oh, wow. Do you know of anybody who could tell me and some of my classmates what cataloguing is? We only get an hour of it in the course, so it doesn't seem that important. But when we look at job prospects, every job seems to say 'cataloguing experience' or 'knowledge of' and then a whole bunch of acronyms that we don't understand. So, do you know of anybody who could help us?"
And I said, “Well, yes. Me. I like doing this.”
I said, "I work at Strathclyde, so I know we can easily set something up in the University."
She says, “Oh that’s great. That is really fantastic. How much would you charge for this?”
And I said, "Sorry, you misunderstand. I work in your department. I'll be at arm's length, and it is part of my job. Here's the deal. I'm busy, so find five dates and I'll pick one of them, then I'll find a room, sort all that out, and I'll just turn up and give you a lecture on cataloguing."
So this happened. She had said, "Oh, it will just be me and a few classmates." I walked into the room and, you know, the entire class was there. All of them, and this is extracurricular; this is in their thesis preparation time. So I said, "Well, I've only got an hour and I've been thinking about what to say, so I'll take you over some of the components: content standards like the Anglo-American Cataloguing Rules, RDA and ISBD; encoding standards like MARC and Dublin Core; and infrastructure standards like relational database management systems, ILS systems and stuff like that."
And I said, "I've been trying to think about how to sum up. First of all, the other thing I need to tell you is cataloguers are anal. What makes a good cataloguer is the ability to find joy in picking up the next book off the backlog and thinking, 'My adventure starts again.' When you finish that, you pick the next book off the shelf, you don't know what's going to happen, and the adventure starts again."
"That's what makes a good cataloguer, and you just do this day in, day out and you never get bored. It's the most intellectually stimulating thing I know, but it takes a certain kind of person to do it. What do cataloguers do? Very broadly speaking, cataloguers interpret the intentions of the people who create and write things in ways that the people who read things can understand. Cataloguers bridge the mind of the author and the mind of the reader, and they act as a bridge because the author may use some terms and the reader may use others. We supply the missing middle bit."
And I was quite proud of that. Now, though, I disagree with that; I don't agree with myself any longer on that one. I don't think we're in the business of interpreting the minds of authors and creators; we're in the business of making sense of, or digesting, this huge quantity of disparate data out there into a form that makes it easier for readers to use themselves. It's not an intellectual pursuit, it's a social pursuit; that's the difference. In my head, it's gone from being pure intellectualism to being much more of a social pursuit.
JH: Gordon Dunsire, thank you very much.
GD: Thank you, because I’ve had at least one original thought during the conversation.