CCLA-Los Angeles Meeting Notes

Expert Workshop: Crowdsourcing and the Futures of Libraries

UCLA Learning Lab, February 6, 2015, 10 AM-4 PM

Summary

In February of 2015, the IMLS-funded Crowdsourcing Consortium for Libraries and Archives (CCLA) held an expert workshop on the topic of “Crowdsourcing and the Futures of Libraries” at the UCLA Learning Lab. The CCLA (www.crowdconsortium.org) aims to forge national and international partnerships to examine how crowdsourcing technologies, tools, and platforms can help libraries, museums, and archives; the UCLA-hosted workshop was the latest in an ongoing series of meetings and webinars conducted by the CCLA to extend this conversation. The daylong event, organized by Dr. Mary Flanagan, Sherman Fairchild Distinguished Professor in Digital Humanities and Founding Director of the Tiltfactor Game Design Research Laboratory at Dartmouth College, and facilitated by Jake Dunagan and Matthew Manos from the design and innovation consultancy firm verynice, focused both on identifying the present needs and challenges faced by libraries and museums and on forecasting the future trends and “signals” that will likely shape the future of crowdsourcing in the cultural heritage domain. The workshop’s fifteen invited attendees included leading crowdsourcing experts from universities (UCLA, UC-Berkeley, UC-Irvine, UC-Riverside, USC, and Stanford), libraries (e.g., the California Digital Library), museums (the Getty), archives (the National Archives, the Internet Archive), and organizations (OCLC).

Through a series of synthesizing discussions and creative explorations led by Dunagan and Manos, the game and passionate group converged on several key practical and ethical concerns regarding the current state of crowdsourcing (e.g., the issue of fair pay for crowdsourcing workers, the desire for greater openness in the sharing of platforms and data); conceptualized crowdsourcing platforms for hypothetical future scenarios presenting particular design constraints (e.g., centralized, paid crowdsourcing systems versus decentralized, unpaid ones); and generated an expansive list of “big questions” that should inform the planning (as well as the funding) of crowdsourcing activities and collaborations for libraries and archives. The key themes and ideas that emerged from this conversation will directly guide the agenda and activities of the CCLA, most notably the initiative’s culminating national meeting to be held in Washington, DC later in 2015.

Notes and supplementary photos from the workshop are listed below.

Meeting Facilitators

Jake Dunagan: Global Foresight Lead at verynice.co and a research affiliate at the Governance Futures Lab. He specializes in social invention and tactical media.

Matthew Manos: Founder of verynice.co and a neo-philanthropist, creative director, and author who is dedicated to disrupting the way the design industry operates.

 

Meeting Attendees

Steve Anderson: Associate Professor of Interactive Media & Games at the USC School of Cinematic Arts, studying the intersections of media, history, technology and culture.

Daren Brabham: Assistant Professor at USC Annenberg. His research focuses on the potential of crowdsourcing to serve the public good.

Tom Cramer: Chief Technology Strategist and Associate Director of Digital Library Systems and Services for the Stanford University Libraries.

Stephen Davison: Head of the UCLA Digital Library Program.

Susan Edwards: Head of the Social Sciences Division at the University of California, Berkeley.

Mary Elings: Archivist for Digital Collections of the Bancroft Library at the University of California, Berkeley. She manages all aspects of the Digital Collections Unit.

Mary Flanagan: Professor in Digital Humanities and Founder of the Tiltfactor game design research laboratory at Dartmouth College.

Brian Geiger: Director of the Center for Bibliographical Studies and Research at the University of California Riverside.

Geoff Kaufman: Postdoctoral fellow in social psychology at the Tiltfactor game design research laboratory at Dartmouth College.

Shu Liu: Metadata and Digital Resources Librarian at the UC Irvine Libraries. Shu leads the Libraries’ efforts in planning, implementation, and assessment of metadata schema and web interfaces related to e-research and digital scholarship services.

Dominic McDevitt-Parks: Cultural professional specializing in Wikipedia, digital engagement, open access, and open government. He works at the National Archives and Records Administration.

Matthew McKinley: Digital Project Specialist at UC Irvine, managing services related to digital object preservation and access, and contributing to local and UC system-wide digital curation policies and procedures.

Miriam Posner: Digital Humanities Program Coordinator at UCLA.

Merrilee Proffitt: Senior Program Officer in OCLC Research. She provides project management skills and expert support to institutions represented within the OCLC Research Library Partnership.

Alexis Rossi: Director of Web Services (Content & Access) at the Internet Archive, currently managing all aspects of Internet Archive collections work for movies, audio, TV, and books.

Kathleen Salomon: Assistant Director of the Getty Research Institute and Co-Director of the Getty Research Institute Library.

Adrian Turner: Data consultant at the California Digital Library (CDL). He supports Online Archive of California (OAC) and Calisphere operations and initiatives, and provides technical and user support services for new and existing libraries, archives, and museums in California that are contributing members to these two systems.

 

Overall Introduction and Context

Mary F: I wanted to give a little bit of background on the CCLA project. This all started a few years ago for me. I’ve really been obsessed with the idea of democratizing the archive and using games for this purpose (being that I’m a game designer; I teach at Dartmouth College). I have a project called Metadata Games, which is about using games to crowdsource metadata. We started this project, and it got a lot of attention, including from the IMLS, who invited me to give a talk at their WebWise conference. It seems that IMLS and other funding agencies are receiving a lot of crowdsourcing proposals, and other funding bodies are getting all kinds of crowdsourcing questions. As a larger community in libraries, museums, and archives, we don’t really have any kind of infrastructure for these projects; we don’t know who’s doing what; we don’t know what’s good or what’s bad.

I put in a proposal to create the Crowdsourcing Consortium for Libraries and Archives, and this has been funded as a six-month project, as of last fall. We started with a regional meeting at the Boston Public Library. We’ve also done some webinars and surveys, and this is our second in-person regional meeting between the coasts, and in May the conversations from these meetings will be taken to DC for a two-and-a-half day workshop with a lot of the funding agencies (IMLS, Sloan, and others will be present).

I’m really here to help engage this conversation with you. These conversations will directly inform the ways that our funding agencies and government think about crowdsourcing. It’s important work, and I really want to thank you for taking the time to contribute.

CCLA (which is the Crowdsourcing Consortium for Libraries and Archives) is what we make it. Our agenda is the agenda. There is no outside organization telling us what to do. We are it now, so we can decide what a national agenda for crowdsourcing is. We’re all coming to this as leaders from our organizations to give our ideas and perspectives about this work. We should disagree, we should have different experiences, we should have different engagement with these topics, and that’s important.

I wanted to make sure all of our voices are brought out in this conversation, so we have design facilitators who are here from verynice.co.

Jake: I’m based in Austin, Texas. I come from future studies and the political science program at the University of Hawaii. As a futurist, my work is all across the board, looking at neuro-politics (how people use knowledge about the brain to change relationships), political systems design, experiential futures (to create artifacts, performances, films, media, ways to put people in different possible states or futures). Our goal as futures-thinking, foresight people is to bridge the gap between what we can experience now and an inherently abstract future. Over the years I have developed a host of techniques and processes to help people bridge between the present and a range of possible futures. I have done a bit of work in crowdsourcing, particularly crowdsourcing foresight and forecasting. The history of futures studies is based on methods like the Delphi method (where you get small groups of 8-10 experts to aggregate their opinion). In my work at the Institute for the Future, where I was prior to verynice, we did quite a bit of crowdsourcing foresight and forecasting, creating game platforms to engage hundreds or thousands of people to think about the future, inputting all those signals and synthesizing those in useful ways.

Matthew Manos: I’m the founder, managing partner, and global strategy lead for verynice. What that means is I focus on developing new strategic workshops that we can do with clients and the community, sometimes around gaming actually, currently exploring how you can integrate dice and serendipity into decision making (or lack of decision making) in certain scenarios. I’m interested in experimental economies and business modeling in general. The reason we’re called verynice is that we give half of our work away for free to non-profit organizations. We have been doing this for about 7 years and actually started the company on this campus as an undergrad.

Jake: Verynice, the organization itself, is fewer than 20 people, but we have dozens and hundreds of people we engage with…

Matt: especially through crowdsourcing, which is something the company is very interested in, as it’s very core to our model for production of work. We’ve also engaged in future of library types of projects as well as design thinking and education types of projects.

 

Introduction to Agenda

Jake: We’re looking at big questions about the future of libraries and archives. If nothing else, we want to get the right sensibility in place. If we get the sensibility right of this fearless exploration, we’re going to uncover a lot of new insights together. That’s the fun part: bringing all these different kinds of expertise together and seeing what emergent insights can come out of it. So, big questions: how do we connect knowledge, data? How can we create the next generation of engagement tools for libraries and archives?

Our goals? We’re going to explore the tools and platforms and the ways that libraries and archives can use crowdsourcing technologies to enhance their collections and user experiences. We’re going to identify opportunities for crowdsourcing (CS), look at individual expertise and shareable assets, prototype the next generation of public engagement/citizen archivists/humanists, and explore ways of partnering with industry and FOSS initiatives.

You’re bringing your own expertise and experience in your different domains. We’re going to pull them together and see what comes out of it. A lot of this is going to be emergent. We have to be open to these emergent insights that are going to come up. We’re here to course correct — we have an agenda that is laid out in a structured way, but we want to be open to modifying as we need to.

 

Participant Introductions and “Signals from the Future”

Daren: What I’ve seen come up in the last five years or so is that the crowd is pushing back more than they had. A couple of quick examples. With the recent Doritos “Crash the Super Bowl” contest, there’s a kind of outcry right now in the video community about how the winner was actually quite professional, had a lot of professional training, access to movie sets, that kind of thing, a producer for “Tosh.0,” I think. So it’s designed to be this amateur space, and it’s not. That’s really been happening for a long time, but people are starting to get a little outraged. There have been lawsuits against Amazon’s Mechanical Turk for not paying fair wages. And there’s the No Spec movement, which is graphic designers banding together against these crowdsource contests for design. I think we’re starting to see a bit of a “civil rights era” for workers, and we might see this as a future ethical concern and labor concern.

Kathleen: Every time we work on anything that’s discipline-specific in the library, we wonder, “Is it a good thing or not? What does it do for scholars?” We’re always concerned about how we’re getting our material out to the edges — websites, digital publications, people aren’t addressing those. I just think the fact that we’re struggling with this is our signal point. This goes into the future of libraries.

Brian: My signal, which might be more of an anecdote. I received a letter from a small town in Texas with a handwritten note from a woman saying how much she loved the CDNC [California Digital Newspaper Collection], and she had enclosed a check for $500. I realized I needed to find out who else had donated, and I wrote a letter to each of them to thank them. It made me realize there’s a real community out there that we have to engage and connect with, which requires some commitment on the part of the center to figure out who your community is, to engage with the community in their media.

Matthew McKinley: One is the almost unconscious crowdsourcing, like reCAPTCHA. I think that’s a really cool thing: these dual-purpose design types of things that allow people to crowdsource without really thinking about it. My other one involves serendipitous discovery. I don’t know if you’ve heard of this news story about an art scholar discovering a lost painting in the movie “Stuart Little” in 1999. It was in the background of an apartment in one of the scenes, and it was a painting that was thought to have been lost for more than ninety years. That was a totally random, unexpected event. Just the idea of getting more eyeballs on more culture with more avenues for feedback will really enhance the opportunities for weird, awesome stuff to happen. You cannot plan for this sort of stuff, but if you make the environment where it can happen… this is a real human interest story that gets the world thinking about art and museums and libraries.

Susan: I guess my signal for the future is one from the past. I’m intrigued by whether crowdsourcing can help with “repatriation,” a concept that applies to physical remains, in terms of making connections to people and getting the stories back to the people who created them. I’m fascinated by the idea of metadata being embedded within archives that aren’t full-text searchable. Things from the past can have meaning in the future for people who produced them and who are excluded from consuming them. How do we bridge that gap? These are issues with access and social justice.

Mary E: We’ve been trying to get material out the door so that people can find it, in digital form, because it’s locked away in truly hidden special collections. My signal is that getting this stuff digitized is really something that’s being done now. A lot of this is about supporting access and aggregation at the very highest level in a dynamic way (which linked data will help us do). We’ve done hack-a-thons and edit-a-thons, which are little pots of crowdsourcing. We need a bigger, bigger effort to digitize and connect materials to people. Some materials are so fragile and in such a state that they cannot be digitized. Maybe 75-80% of a special collection is non-digitizable (for cost, condition, etc.).

Tom: Open annotation. We see a lot of systems where people are doing tagging or annotation or crowdsourcing within an application, but all of those things have potential use outside of that system. And by using linked data and very purposefully designing that in at the beginning of the system, it’s quite possible to take tags from Flickr and incorporate them into your library catalog, for example, or vice versa. That’s where I think there’s a lot coming, not just crowdsourcing in a particular environment, but crowdsourcing on the web as a web of things.

Alexis: One of the things we do [at the Internet Archive] is digitize books, which we’ve done since 2005, and there weren’t very many people doing that at the time. We wanted to have a more open, accessible version of what Google was doing that would be available for anyone to use. There are a lot of communities out there doing a spectacular job of digitizing media and providing metadata. About two years ago, a guy who works for me came to me and told me he belongs to a community that has a million CDs worth of music. I can’t necessarily make them public, due to rights issues, but it turns out there are communities out there. I couldn’t have done this for ten million dollars, yet there are communities out there doing it for us, for music, for artwork, for textbooks. That’s invaluable to us. It fed into this idea of “let’s let people make their own collections” and hopefully the platform is big enough for people to do it. There are clearly people out there who are experts at these things, who really care, who want to spend their time digitizing media and making them available. So let’s give them a way to do it.

Adrian: In terms of signal, there’s clearly a lot of crowdsourcing types of activities out there at different scales and dimensions. How can we create links or workflows between different “nodes” or institutions or crowdsourced data? There’s also this issue of determining what’s the “version of record,” especially as existing metadata is getting supplemented and added to by the crowd. What constitutes the version of record, the archival description at this point? It’s really the aggregate. What can we do to try to connect these things?

Miriam: My background is in film studies, and I’ve been marginally involved in a few crowdsourcing projects. My students and I have had good experience with augmenting metadata from cultural heritage institutions. I’m a little bit agnostic about CS. A lot of times when I’ve seen libraries express enthusiasm about CS, it’s been in cases when the work should really be paid. The parts I find exciting are related to the notion of reaching out to cultural heritage communities and incorporating their knowledge. I’m watching with interest but a certain level of concern.

Jake: Almost a revolution coming. How do you fairly reward contributions? What is a fair scheme for aggregating all this work together and giving back the value?

Shu: I am the project lead of our metadata games. Right now the project is ongoing, but we’re facing the challenge of assessing the quality of those tags. We’re interested in the motivation of users. For crowdsourcing projects in general, what is the motivation and what is the reward for user participation? The other is the assessment. We can get all kinds of information from users. I think the benefit of crowdsourcing rests on the assumption that public good exists. But what about public evil? People who provide information for their own interests. These are some of the question marks that lie ahead.

The underlying principle for the growth of the web is its openness and universality. I feel our current crowdsourcing activities are only at the beginning stage, because all the activities are surrounding the digital content life cycle. What’s next? How about we start thinking about the impact of making our cultural heritage content accessible on the web using CS? Maybe our gigantic user crowd could make videos or stories about how these materials have impacted their lives. People are spending more time watching TV than doing web activities; maybe people could do some kind of role play or make their own TV shows, somehow building upon our current crowdsourcing activities to make them more interesting. There’s more that could happen there.

Steve A: I took this in an overly literal design fiction kind of way. I came prepared to talk about signals I received from hybrid human robot forms telling us to stop thinking so small. Stop making a distinction between data and metadata, dissolve the distinctions to open up the possibility of “round-tripping” between archives and scholarly use. Forget all your dumb ideas about intellectual property, we don’t care about that anymore!

Dominic: At the National Archives, we have an online catalog API, one of the first writeable APIs in cultural heritage. To me that’s the thing that excites me for the future. With crowdsourcing we’re usually talking about data sets; if we want things to be participatory with the public, we need to allow easy, machine-readable reuse of the data and allow the data to be a two-way street. Eventually we’ll be working with a catalog in which all the fields of archival metadata can be written to by the public. How can we get the public to contribute?

Merrilee: The thing I’m most interested in is the important role that identifiers play. We’re very much moving through an environment where we’re moving from strings to things, where all communities need identifiers to disambiguate. The other signal from the future is the financial picture for cultural heritage institutions. With the cultural heritage sector, we never have the recovery period the rest of the economy has. We need to be mindful about continuing to make good choices. Finally, when I think about contributions from outside the community, I think less about crowdsourcing and more about finding allied organizations, either informal or formal, with people who share interests.

Mary: My signal that came to me from my past… in the 1990s when I worked at a software design firm in Austin, we were approached by a Saudi Arabian princess whose family had taken their collection of stuff during the first Gulf War and put it in warehouses. She wanted a system she could use to learn about this stuff, and this was before wifi. We had this rambling, mobile technology to go out to people to collect information directly, to get regular people’s data. It was crowdsourcing: going to people who had something to say and adding their voice.

Geoff: My signal has to do with all of the novel and emerging ways of describing, categorizing, and cataloging objects, ideas, and experiences that are appearing on social media platforms. Things like hashtags, memes, tumblrs, and so forth are creating new connections between ideas, and between people, and I’m interested in how crowdsourcing platforms will adjust and take advantage of all of these user-created, “folksonomic” systems. I think they really can set the stage for sometimes random, but also sometimes serendipitous connections and discoveries in the cultural heritage space.

 

Challenges, Complaints, and Needs for Libraries and Archives

List generated by attendees:

  • Protocol Path Dependence
  • Integrity of Data / Discovery
  • Communication Portals for Serendipity / Discovery
  • Citizen communities Data Assembly Lines
  • Labor Issues → Jobs become volunteer work
  • Role identifiers play (disambiguate, creating “authorities,” strings → things)
  • “Public Evil” – griefers, misinformation
  • Stories (user-generated, more interactive)
  • Synchronized Crowdsource Communities
  • Difficulty of Digitization/Importance of Access
  • Unconscious Crowdsourcing (dual-purpose)
  • Attention to User Experience → design, curation, more signal/less noise
  • Online Catalog API – tags, transcriptions
  • What are the actual benefits?
  • Crowdsourcing on 3rd Party Platform
  • Openness and Universality
  • Who controls the protocols?  Whose worldview?
  • Motivation and Reward for User Participation
  • Challenge of Incentives for the Contributors
  • “Gig Economy” Integration
  • Crowdsourcing more art than science
  • Open Annotation Protocols – Web of Things
  • Scan-a-Thon
  • (Re)building the Wheel – how to build on efforts?
  • Generating data but not publishing it
  • Vendor Relations
  • Attention of Leaders/Exec Funders
  • Budget/Staffing
  • Dilemma: More discoveries, fewer (internal) resources to process
  • Cultural Heritage Sector Busts w/o Boom
  • Less Crowdsourcing, More Alliances
  • Tensions: Trust vs Fluidity, Structure vs Efficiency, Authority vs Openness, Signal vs Noise, Professional (fragment, layer) vs Amateur, Sacred vs Profane, Finished vs Unfinished, Intentional vs Unintentional, Enclosures vs Networks, Standards vs “Good Enough”

 

Mapping the Future: Trends, Emerging Issues

Jake: Really important to expand one more time. You had your signals that came up. Now I want you to think even bigger, beyond your world, and I want to put up other big trends and emerging issues. I’ll give you some time to think more broadly about our landscape, push it out a bit further to think beyond the issues we’ve discussed already. It can be as baseline as demographic trends, data, or new emerging issues or signals. Think ten years ahead, that gives us a horizon to think about technological change. A step beyond libraries and archives, broader economic issues, social issues. Major trends that are drivers toward where we’re going, could be some of the more disruptive things that are going on.

List generated by attendees:

  • Globalization / Multilingual Expectation
  • Less tolerance for North/South divide
  • Class gap widens
  • Gender?
  • Globalization of Research / Tech Transfer
  • AI, NLP, etc. → Siri will do the digging for you and assemble answers
    • → Critical thought?
  • Open Government Data/Records
  • Crowdsource Decision Making
  • Most Internet content comes from Asia
  • Less interest in discovery via libraries/museums
  • Knowledge Chip
  • Too much data to handle
  • Crowdsourced translation of media
  • Expertise is not dependent on professional training
  • Experts are “connectors” of knowledge, not in-depth knowledge
  • Voice to text makes audio and video keyword searchable
  • Searching by image is standard — no one uses words in search bars anymore
  • Access and research via hardware/software emulation
  • Leisure becomes boutique and specialized
  • Books are no longer published, except as novelties
  • 50% of today’s academic institutions no longer exist due to unsustainable funding models
  • Crisis of Unis: MOOCs, financial issues, political strife, etc. → Universities provide significant support to Libs.
  • Humanities become hip
  • New repetitive stress injuries
  • New storage tech makes it possible to carry an entire library with you
  • More data, less persistence
  • Privatization of content (in .orgs, .com, .net sites)
  • Solving the copyright problem, so collections can travel online
  • Casual, incidental, mobile work (microvolunteering, but also remote sensing, ReCaptcha-type stuff)
  • Every device is networked and passive data/usage collection
  • Open control becomes the norm
  • Less tolerance for DRM/copyright
  • Institutions as full-fledged participants in communities
  • Crowds become hierarchical or bureaucratic, narrowing contribution
  • No need for batteries
  • Self-driving cars = more free time
  • Global WiFi coverage
  • Internet as utility
  • Lack of paper docs (letters, diaries) will create a gap in the primary source record unless contributed/shared
  • Motivating users by recognizing contributions as part of tenure profile or grad training
  • Motivating users through gamification mechanisms
  • Motivating users through discourse of public service or “service to country”

 

Sense-Making: Clustering Major Themes

Matt: First for context, what we’ll come to is a list of words.

Jake: This is leading to a process based on the 2X2 method of critical uncertainties. You look at a range of issues or possibilities, things that could go one way or another. You could look at a future where there’s a high versus low amount of trust. Then you plot that against another one to get four different zones. For example, high trust and high engagement or high trust and low engagement.
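
The 2X2 grid Jake describes can be sketched in a few lines of Python. This is only an illustration, not a tool used at the workshop: the function name scenario_quadrants and the "high"/"low" pole labels are assumptions, and the two axes (trust and engagement) are taken from his example.

```python
from itertools import product


def scenario_quadrants(axis_a, axis_b, poles=("high", "low")):
    """Cross two critical uncertainties to get the four scenario zones.

    axis_a and axis_b are the names of the two uncertainties; poles are the
    two directions each one could go (hypothetical labels for illustration).
    """
    return [
        f"{a_pole} {axis_a} and {b_pole} {axis_b}"
        for a_pole, b_pole in product(poles, repeat=2)
    ]


# Axes taken from the example in the discussion above.
for zone in scenario_quadrants("trust", "engagement"):
    print(zone)
```

Each of the four printed zones is a distinct future that a group can then design for, which is how the clusters later in the session get turned into scenarios.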

Matt: Want to arrive at two 2X2s in terms of a new product, service, or partnership. This world of community versus institutions. Community as in creators of content (people in history who have created content, discoverer of content, collector of content). Institution as libraries and archives, private sector, agencies. Connecting the two, there are issues of communication (how do we increase participation, measure participation), efficiency (making systems more interesting, engaging, etc.). Between these two (creating perfect communication, perfect efficiency), there are these emerging issues. We want to come to four of these, either through a vote or a hybridization.

Clusters generated by Matt and attendees:

Community: Incentives, Commitment, Perspective (e.g., gender), Credibility (what makes content credible or helpful?)/Assessment, Safety (safety of protecting content), Ethics (ethics of labor), Trust, Privacy, Helpful, Connected/Fragmented, Skills/Credentials

Institution: Integrity, Intentions (what will we do with this stuff?)/Utility/Commercialism/Outcome, Home (where does this stuff live?), Authority, Legality (e.g., IP), Access/Openness, Expense/Resources, Legibility (some things difficult to scan and archive), Scale, Controlled (Centralized versus Decentralized; related to “Home”), Sustainability (brings up issues of access and resources), Serendipity/Surprise, Investment (resource, emotion), Legacy

 

Closing Remarks

Mary F: I hope this feels like a beginning. This needs to be an ongoing conversation; we have so many questions, so many ideas here, and we are a small part of this community. This is a regional meeting that’s part of a national and international conversation. By being part of this conversation, it’s a chance not only to help shape where we go with these conversations but also to really make a difference. We are at a moment in time that’s given us a lot of agency. This is the time for us to figure out what this looks like, and we’re the people to do it.

I implore you to help us shape this [CCLA] website, provide your feedback about this meeting as we move forward to the DC meeting, and where we need to drill down. It seems like we need a technological subcommittee to throw out platforms. How do we move forward technically, how do we move forward in terms of the conversation, and how can we address all of these ethical and social questions that are a part of a bigger conversation? We can share a dialogue, share events, and so forth, to keep our eyes on the prize. It’s 2015 that we’re talking about a shared digital platform. Decisions are going to be made this year, and I want them to be smart decisions informed by people who really care.

Jake: The spirit I was talking about was here. Thank you all for your openness and energy. Gifting these high-level questions to the national meeting. It helps map our minds to the things we need to think about. There’s a sense of urgency, but also a sense of what we don’t want to do, and we want to understand the context of what’s going on. There’s a lot of emergent, really original ideas here, nuggets for thought that will be evocative and useful.

Matt: Aside from the layer of imagination you all were open to putting on this, something I also enjoyed seeing was the synchronicity of everyone’s passions in this field. We came in preparing for all of the possible angles this could take, and it ended up funneling in pretty fast around certain issues. The conversation, the language and key words that came up over and over again, was pretty exciting. Bringing all of you together really created a clear path toward creating something together.