Several of our customers have asked for some 'best practices' help in getting the best possible results when Smartsourcing their work. I've looked for good reference sites but, frankly, haven't found any that aren't written for developers or quants. So, here are some of my own observed best practices in 4 areas.

1. Testing your questions for optimal results

To put context to my points, I'll use data from my recent submission. I had run a second survey to test the results of the Reader Quality Score data in last week's post. This run asked the question in a different way. Rather than "grade the substance and thoughtfulness of the blog's comments and therefore their Readers", I asked, "Based on what you read in an article's comments, do you think the Readers are productive in their daily work lives and why."

There wasn't much news here. The top and bottom halves of the lists remained within their previous halves. The order changed somewhat, the scale widened a bit, and the explanations for scoring were very entertaining. However, since I'd carefully managed the entire process throughout, it does serve as great data for some best practices suggestions. The 1,250 individual responses came in across 4 days last week (represented by 8, 9, 10 and 11 in the charts below, meaning April 8, 9, 10 and 11).

1) Test your question

Submitting 10 to 25 rows as a test is also wise, because even very large volumes of work tend to be completed within hours of submission. The work will therefore often be done before you have a chance to cancel the run when you see poor results.

In my test, over a hundred distinct workers submitted answers during the 4 days. The color bands depict volume by a specific worker. As you can see, most of the work was done on the first day. The x-axis labels refer to the day of the month: April 8, 9, 10 and 11.

2) Demand a High Approval Rating and Pay Well

Pay well so you attract the best workers.
The additional cost is more than offset by the reduction in time spent managing poor or fraudulent work. My rule of thumb for 'paying well' is to pay better than $10/hr. This means doing one of my own requests, determining how long it takes to do it right, and then applying this table to pricing, which equates to about $0.17/minute.

$0.02 per 7 secs

Vary the parameters when testing for quality of work at particular prices. Pricing can be dramatically reduced by improving the workers' speed and reducing any uncertainty associated with your expectations. Some of the key ways to do this are:
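
Returning to the $10/hr rule of thumb for a moment: the arithmetic behind it can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a tool we provide. The function name `price_cents` and the choice to round up to the next whole cent are hypothetical; rounding up simply keeps the effective rate at or above the target.

```python
def price_cents(task_seconds: int, hourly_rate_cents: int = 1000) -> int:
    """Price for one task, in whole cents, at a target hourly rate.

    Assumption (not from the post): round UP to the next cent so the
    effective pay never drops below the target rate.
    """
    # Integer ceiling division: ceil(task_seconds * rate / 3600)
    return -(-task_seconds * hourly_rate_cents // 3600)


if __name__ == "__main__":
    # Time yourself doing the task properly, then look up the price.
    for seconds in (7, 30, 60, 120):
        cents = price_cents(seconds)
        print(f"{seconds:>4} secs -> ${cents / 100:.2f}")
```

With the default rate, a 7-second task comes out at $0.02 and a 60-second task at $0.17, which matches the figures above.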

3) Build a reputation and set a tone for your tolerance of what is acceptable work

In my example, over half the work was through the door before I began accepting or rejecting the results on the second day (9 in the chart below; purple = Rejected, green = Approved). As you can see from the work submission timeline below, the results started out mostly good, but as time wore on without repercussion, the shoddy and fraudulent work increased.

It is important to note that some of the shoddy work was a result of workers assuming these requests were the same as previous work I'd submitted. Although the title, instructions and data fields were entirely different, their familiarity with me as a requester and the similar nature of the research were enough for them to perform the work incorrectly. Once I began accepting and rejecting, the work returned almost entirely to an acceptable set of results.

While hundreds of workers often contribute to your work, they tend to follow the 80-20 rule: eighty percent of the work is done by 20% of the workers. Because so much of the work was done before I began establishing a tone, I did not establish as high a concentration of success among that 20% of workers as I wanted. Each bar below represents a single worker's total submissions. Again, purple is rejected, green is approved.

Your best workers will most often participate across multiple days, so doing multiple runs is fine: they will search out the work they like each day and typically look to see if you've posted any more of what fits their interests. The graph below shows the worker id of the 20 biggest contributors across the 4 days of work submissions (each day a different color in the bar).

4) Submit work during the Daytime hours in the U.S.

Again, assume that a large portion of your work will be performed in the hours following your submission.

There are dozens of other dimensions to consider in Smartsourcing, but hopefully these will give you some useful baselines.

-Brent
Comments
How big is the MTurk community?
Great article, Brent. It's difficult to tell how big Amazon has built up the community. Do you think it can really scale for HITs that number in the thousands?
It's a very big MTurk community - 100s of Thousands
Shawn,
There are more than 250,000 people who regularly or semi-regularly sign on to do MTurk work. We routinely submit 10,000-HIT jobs that are done in a matter of days, if not hours (it depends on how much we are paying per HIT). Others submit jobs in the 100s of thousands of HITs with results in days. It's big.
-Brent