[http://manuscripttranscription.blogspot.com] [http://beta.fromthepage.com]

Ben Brumfield, Collaborative Manuscript Transcription

from the page

I’d like to dig pretty deeply into one best practice. The best practice I would like to talk about today is turning the product of the crowdsourcing effort back over to the volunteers. Now what do I mean by “product”? I’m not talking about the final product: the item-level finding aids, the published research papers, the scholarly press. I’m talking about the raw product, the actual contributions of individual volunteers and their fellow volunteers: the corrected text of a newspaper article, the transcribed letter, the comments, the identifications that they have made personally or that people within their community have made.

Why should we do this? The first reason is that it’s the right thing to do. Yesterday we heard a lot about reciprocity and social justice, but this is a pretty old concept. Deuteronomy says, “Thou shalt not muzzle the ox when he treadeth out the grain.” Many, many transcription platforms support this out of the box. If you’re using a wiki-like system such as FromThePage, Wikisource, and I believe Scripto, it’s just a side effect of the platform. With others, users can’t get at any of their contributions. I would like to call out the Smithsonian Institution Transcription Center. They added the ability for users to download PDFs of all of their own contributions and all of the community’s contributions. They built this feature into their tools deliberately, because they believed it was the right thing to do.

Now that we’re done quoting the Bible, what are the instrumental reasons to do this? For one thing, if we expose the raw data early on, we can align our projects with the things that incentivize our volunteers. Our volunteers often do not care about our institutions. They are not passionate about clean metadata in the catalog system; that’s not what motivates them. They are doing this because they are immersing themselves in the subject matter. They are sitting with a bird watcher in 1918. They are marching alongside a soldier in the Civil War. Exposing the material they have immersed themselves in lets them share it and show others what excites them about it.

On one of the first projects I worked on, we got a super-volunteer early on who just blazed through all of our material. Afterwards I talked with him on the phone and asked, “What can we do to thank you for this?” He said the thing he would value most would be to print out, publish, and have a bound copy of the Mexican-American War diary he had transcribed, because the way the heritage organization he belongs to rewards and advances its members requires them to publish a book. The work he had done matched the contributions of any of his fellow amateur members of the Sons of the Republic of Texas, but unless we actually delivered it to him in a useful format, he wasn’t going to get that recognition. So there is this idea of extrinsic reward.

Another reason to expose the work in progress, these raw contributions, is to enhance recruitment. I have often told the story of one of the super-volunteers on a number of FromThePage projects who found the site through a vanity search. He Googled his own name, and the top results turned out to be all of the entries in the Julia Brumfield Diaries that mentioned a man with the same name: the diarist’s postman. He recognized that this was his great-uncle, jumped in, transcribed an entire diary on his own, and then drew on his previous experience analyzing ichthyology records and field books to transcribe scientific field books on the same platform.

If we had not exposed the work in progress from a previous volunteer, he would not have gotten there. It’s also possible that if we had exposed it in some other format, on some official site, he still wouldn’t have gotten there. He found the transcript within the crowdsourcing project, and he knew how to contribute immediately.

The last reason I’m advocating this is engagement and productivity. For the last three years I have been involved with a UK nonprofit called Free UK Genealogy. For 15 years now, they have had volunteers from around the world transcribing genealogical records of interest, such as census records and the civil registers of births, marriages and deaths. They did all of this using offline tools including spreadsheets, mailing CD-ROMs through the Royal Mail, and eventually getting the records into an online database that they can publish.

I was brought in to revise their transcription system, overhaul it completely, and bring it into the modern era. And I encountered incredible resistance. The volunteers had a system that worked for them. Why change it? What was this going to give them? After a few months we changed course and focused instead on the delivery system, a searchable website that people can go to and see. So we replaced the oldest of the systems, one called FreeREG, with a new one we built completely from scratch. Once we launched it, replacing the old delivery system, the old product of the volunteer effort, with a new one, two things happened. First, it was the most positive project launch I have been involved with in two decades of software development. People loved it. Second, it re-engaged the volunteers. Since we went live, we have seen contributions through the old spreadsheet-based system go up. We just passed 32 million records.

In conclusion, expose volunteer efforts within your crowdsourcing system as they’re produced. It’s the right thing to do.

This presentation was part of the workshop Engaging the Public: Best Practices for Crowdsourcing Across the Disciplines.