Crowd Sourcing and Game With A Purpose
Readings
- Designing Games with a Purpose, von Ahn, L., and Dabbish, L., CACM 8/2008.
- Crowdsourcing User Studies with Mechanical Turk, Kittur, A., Chi, E., and Suh, B., in Proc. of CHI 2008.
Reading Critiques
Vineet Raghu 1:21:03 11/28/2015
Designing Games with a Purpose
In this paper, the authors describe the design of games with a purpose (GWAP), which are computer games in which users' actions generate data that helps AI algorithms solve computationally difficult problems. A key aspect of these games is that they must be entertaining enough that users are willing to play them. Instead of trying to quantify entertainment value, the authors examine directly whether people are willing to play the games while the games still produce useful output for the underlying problem. The authors then describe three types of these games: input-agreement games, output-agreement games, and inversion-problem games.
Next, the authors detail specific design principles that should be met in order to build a successful GWAP. Some of these principles are that the game should involve randomness to maintain participants' interest, and that the game should be difficult enough to challenge players. It is interesting to note that these principles are fairly standard for all games and aren't specific to GWAP. Other principles apply more specifically to GWAP. The first of these is maintaining output accuracy through repetition of outputs. Another crucial goal is to prevent player collusion, which can be achieved through random matching of players.
The final segment of the paper discusses how to evaluate games with a purpose. One of the metrics is throughput, which quantifies the amount of data generated by users per unit time. The other is the "enjoyability" of the game, which comes down to playing time per user. The authors combine these into a single expected contribution metric that measures the success of a GWAP.
Overall, this article provides a very interesting introduction to GWAP. It gives a detailed description of the current state of these games and of design principles that can guide future progress in GWAP. My only critique of the article is that many of the principles seemed relatively straightforward; for example, it is somewhat obvious how to evaluate a GWAP, as the goals are clearly to get people to play and to get people to generate useful data.
--------------------------------------------------------------------------------------------------------
Crowdsourcing User Studies with Mechanical Turk
This article focuses on micro-task markets such as Amazon Mechanical Turk. These markets use crowdsourced workers to perform small tasks, which can generate useful data for experimenters running user studies. Specifically, the authors were trying to determine whether a micro-task market such as Mechanical Turk could be a useful venue for user studies. To examine this, the authors performed two experiments using Mechanical Turk. In both experiments, users were asked to read and rate a Wikipedia article that had previously been rated by a set of expert Wikipedia admins.
In the first experiment, users were simply asked to rate the article and provide textual feedback on how well the article was constructed. This resulted in many fraudulent or effortless responses that were deemed invalid, since they did not provide any reasonable feedback. In addition, these ratings were not statistically significantly correlated with the experts' article ratings. In the second experiment, users were given four verifiable questions that required them to read the article. As a result, the responses were much more reliable, and the ratings were statistically significantly correlated with the expert ratings.
These experiments point to some necessary design principles for user studies in a micro-task market. One is that verifiable questions, which confirm user effort, are extremely beneficial in guaranteeing truthful responses. In addition, designing the task so that completing it incorrectly requires as much effort as completing it properly pushes users to try their best. Mechanical Turk could be a possible environment for user studies, but it's clear from this article that more research on micro-task markets is needed before these studies can be done reliably. In particular, there must be more research into the proper design of tasks before they are released into the micro-task market landscape.
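The verifiable-question idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the paper: the `Response` record, the `is_valid` tolerance rule, and all of the data values are assumptions made up for the example. It drops responses whose verifiable answers do not match a known answer key, then compares the per-article mean rating with the expert rating using a Pearson correlation.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Response:
    article: str                                  # article the worker rated
    rating: float                                 # worker's quality rating
    answers: dict = field(default_factory=dict)   # answers to verifiable questions


def is_valid(response, answer_key, tolerance=1):
    """Keep a response only if every verifiable answer is close to the key.

    A missing answer counts as a mismatch. The tolerance is a hypothetical
    choice, not a value taken from the paper.
    """
    key = answer_key[response.article]
    return all(
        abs(response.answers.get(question, float("inf")) - truth) <= tolerance
        for question, truth in key.items()
    )


def mean_rating_per_article(responses):
    """Average the ratings of all responses for each article."""
    grouped = {}
    for r in responses:
        grouped.setdefault(r.article, []).append(r.rating)
    return {article: sum(v) / len(v) for article, v in grouped.items()}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up example data: an answer key (e.g. counts of references and images),
# expert ratings, and six worker responses, two of which are junk.
answer_key = {
    "A": {"references": 10, "images": 2},
    "B": {"references": 3, "images": 0},
    "C": {"references": 25, "images": 5},
}
expert = {"A": 5.0, "B": 2.0, "C": 6.0}
responses = [
    Response("A", 5, {"references": 10, "images": 2}),
    Response("A", 1, {"references": 0, "images": 0}),    # did not read the article
    Response("B", 2, {"references": 4, "images": 0}),
    Response("B", 7, {}),                                # skipped the questions
    Response("C", 6, {"references": 25, "images": 5}),
    Response("C", 7, {"references": 24, "images": 4}),
]

articles = sorted(expert)
valid = [r for r in responses if is_valid(r, answer_key)]
all_means = mean_rating_per_article(responses)
valid_means = mean_rating_per_article(valid)
r_all = pearson([all_means[a] for a in articles], [expert[a] for a in articles])
r_valid = pearson([valid_means[a] for a in articles], [expert[a] for a in articles])
```

In this toy data the filtered ratings track the expert ratings more closely than the unfiltered ones, which mirrors the direction of the result reported in the critique; the actual numbers carry no meaning.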
Mahbaneh Eshaghzadeh Torbati 17:12:36 11/29/2015
Critique for Designing Games With a Purpose: The authors of this paper use games as a front end for tasks that can only be done by humans. Their research is valuable, since it not only introduces the idea of games with a purpose (GWAPs) but also proposes examples and ways to improve such systems. Since computers are not yet smart enough to do every task, they need human contributions too. The problem is that users find many of these tasks boring, so people don't want to do them. At the same time, lots of people play games every day for long periods. By taking advantage of this behavior, a GWAP can encourage people to contribute to these tasks. Designing a game in which people complete the task by playing can even make the work interesting. If people spend this time on GWAPs, a lot of tasks can be done in a very short time. However, how to make the game fun needs a lot of consideration, and how to balance throughput and enjoyability also needs to be solved. A well-developed GWAP gets the task done with good quality while the user is also entertained. In today's market there are some similar products. Instead of getting a task done, they help the user learn something: some help users learn a language, and some help users exercise their brains. Even though they are different, the same idea leads all of them: using a game to help the user do something with pleasure. I think this idea can be used a lot in the future, since it can help human-computer interaction.
Critique for Crowdsourcing User Studies With Mechanical Turk: This article is about collecting data and running user studies online, which yields more feedback at less cost. The paper deals with micro-task markets and analyzes one example, Mechanical Turk. The main point is that a big sample size makes experimental results more accurate, and the paper tries to find a way to collect big samples at low cost. A micro-task market lets users do small tasks for a reward, which is great for collecting user feedback. With the traditional way of doing user studies, it was hard to reach a big and representative sample, and a non-representative sample may bias the result. Moreover, traditional methods are usually expensive. Mechanical Turk, an example of a micro-task market, makes user studies much easier to conduct. One advantage is that, thanks to the big sample size, the result can be more accurate. Users can preview a task and see how much money they can get from it. To check the tool's performance and validity, the authors ran a group of experiments, and the result is satisfactory: a well-designed task leads to ratings closer to the expert ratings. This kind of user study can get good results thanks to the big sample size. But there is one problem: when the task load becomes big, the time needed for the task increases, and people may have less interest in finishing the study even if they are paid for it. That may lead to a smaller sample size. Online user studies may also lead to security problems. Users' private information may leak, which most user study rules forbid, so increasing the security of online user studies would make them better.
Zihao Zhao 9:33:30 11/30/2015
"Designing Games With A Purpose" is an article that introduces three templates for GWAP designers. I am quite interested in this paper. The art and science of this research is that people spend many hours on computer games, and we can make use of the side effects of those games to do computations that are hard for computers to do alone. This kind of contribution can be applied to machine learning as well as computer vision. The three types of templates are output-agreement games, inversion-problem games, and input-agreement games. When I read this paper, I realized that I had played a kind of inversion-problem game, which I found quite attractive, when I was in China. The game was played in groups: one group member described an object and the other group members guessed it. I think it was interesting because it inherently builds in a competitive mechanism. When people compete with each other, the game becomes interesting. I think this can be a major way to increase player enjoyment. Since the attributes of games are very difficult for researchers to quantify, the authors applied some simple but useful metrics to evaluate a game's contribution and popularity. Throughput, which was introduced in the evaluation part, impressed me. Throughput here means the average number of problem instances solved per human-hour. Besides the three basic templates for designing a high-quality game with a purpose, the paper also suggests other principles for developing these games, such as showing the rankings of highly skilled players in order to motivate players and thus increase their enjoyment. I remember that when I played a game during my undergraduate study, I put in a large amount of time in order to improve my ranking.
----------------------------------------
"Crowdsourcing User Studies With Mechanical Turk" is an article that introduces studies on Amazon's micro-task market, Mechanical Turk. There are two studies in this paper, the second following the first with small changes. The hypothesis of these studies is that the rankings of wiki pages collected from Mechanical Turk users correlate strongly with those given by the administrators. However, the result of the first study reveals that there exists only a marginal correlation between the information from Mechanical Turk and the original information. This may be caused by participants "playing" with the study. The study also reveals that nearly half of the participants chose rankings randomly in order to collect the small payment. To eliminate the influence of these participants, the researchers designed another study with a slight change: adding some small questions before the ranking, concerning keywords from the questionnaire. This is useful because it separates the "playing" participants from the "true" participants. This paper is quite short, only 4 pages, but it has had great influence, with more than 800 citations. There is also no prototype construction in this study. I think the reason it is so successful is that it found a way to evaluate research in crowdsourced user studies. Besides, I learned a lesson from this paper: good research comes from a good idea. The most important idea in this paper is to add questions about the content of the wiki pages in order to figure out how familiar the user is with the material, and thus figure out who is just "playing" with the study. The problem of finding those who rank randomly is hard for computers to solve but easy for people, here the participants, to accomplish. This is a point of view from Computer-Supported Cooperative Work. In crowd computing, there exist some unsolved challenges. Specifically, in user studies on crowdsourcing, we cannot control the environment the participants are in. And the users come from all over the world, so it is hard to control for specific participants in order to draw conclusions about specific circumstances.
Priyanka Walke 0:16:25 12/1/2015
Reading Critique on Crowdsourcing User Studies with Mechanical Turk
This paper introduces Mechanical Turk, a crowdsourcing platform hosted by Amazon that assigns micro-tasks to a wide user base in exchange for monetary rewards. The paper notes that researchers face the problem of gathering people to take part in user studies. These studies may be long, and it is hard to find people willing to participate without any gain for the time they invest. Finding users through the internet at low cost is what Mechanical Turk provides. A major problem, also discussed in a previous paper, is users trying to game the system. These users can definitely harm the results one sets out to achieve, and their presence and effect can be clearly seen in the first experiment conducted by the authors. The silver lining can be seen in the second experiment, which uses quantifiable questions whose answers can be verified in order to segregate the malicious users. We can also ban such users to improve performance. The main benefit of using such a platform is that it gives ready access to users all over the world, and user studies can be conducted quickly. There are many variables that are not under our control while conducting the experiment and that hence affect our results. This method is highly conceivable, but we still need to work out many problems; using such techniques for local recruitment may prove beneficial, but applying them to many kinds of studies still seems a bit far-fetched. We can see that the ratings from comparatively amateur users seem to match those of expert admins, and a more detailed study could reveal some useful facts about how to design studies better, such as checking users' qualifications and reputation level, making the studies more interactive, and using some principles from the previous paper.
==========================================
Reading Critique on Designing Games with a Purpose
This paper motivates the use of games to collect user data for a variety of AI and ML tasks. Some computations that are easy for humans are still difficult for computers. As for the significance of GWAP, the average person is said to have spent approximately 10,000 hours playing games by the age of 21. Existing implementations include the ESP Game, Peekaboom, Phetch, Verbosity, etc. The paper discusses the design principles of GWAP in order to specify the major features and advantages of such games, as well as the key points that need to be considered while making them. It also includes guidelines for designing the games so that the actual work is incorporated into the game and playing it yields the desired outcome, giving a direct one-to-one correspondence. It provides three game-structure templates, namely input-agreement, output-agreement, and inversion-problem games. The purpose is to get people to play these games because they like them and not for any monetary benefit. This is achievable because people like playing games in teams in order to achieve the common goal of winning. The paper also addresses the problem of correctness in the labelling and identification games it discusses. The use of taboo words helps improve the variety of the information gathered. The paper can be summarized by saying that such an approach is promising in its ability to use crowdsourcing to contribute to the field of AI. The idea of creating a first-person shooter game for the task of system administration is a unique and promising concept.
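The output-agreement template and the taboo words mentioned in this critique can be illustrated with a short Python sketch. It is a simplified, assumed model rather than the actual implementation of the ESP Game or any other GWAP: the function name `output_agreement_round`, the two-player `"A"`/`"B"` encoding, and the event-list format are all hypothetical. A round ends as soon as one player enters a non-taboo word that the other player has already entered.

```python
def output_agreement_round(events, taboo_words=()):
    """Return the first label both players agree on, or None.

    `events` is a time-ordered list of (player, word) pairs, where player is
    "A" or "B". Words on the taboo list are rejected, which pushes players
    toward labels that have not been collected for this input before.
    """
    taboo = {w.strip().lower() for w in taboo_words}
    entered = {"A": set(), "B": set()}
    for player, word in events:
        word = word.strip().lower()
        if word in taboo:
            continue  # taboo words never count toward agreement
        entered[player].add(word)
        other = "B" if player == "A" else "A"
        if word in entered[other]:
            return word  # both players produced the same output
    return None


# Made-up round for one image: "dog" is taboo, so the agreed label is "grass".
events = [("A", "dog"), ("B", "dog"), ("A", "grass"), ("B", "park"), ("B", "Grass")]
label = output_agreement_round(events, taboo_words=["dog"])
```

Because neither player sees the other's guesses, the only sensible strategy is to type words that actually describe the shared input, which is why agreement serves as a partial check on correctness.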
Kent W. Nixon 14:37:40 12/1/2015
Designing Games With A Purpose
In this article, the authors discuss the lessons they have learned designing games with a purpose (GWAP), and a general framework for designing future GWAP. GWAP are used to crowdsource solutions to problems that are not computationally tractable with computers (image labeling, caption generation, etc.). To do this, GWAP cleverly disguise the work the users are doing as a fun game, so that people are more willing to do "work." The authors describe how such games must be designed to be fun, and discuss three main design styles such games can take. They also discuss how to avoid cheating and incorrect answers. This is an interesting part of the article, as it discusses how inputs from multiple users can be combined to identify the "correct" responses, with users being screened out as unreliable if they provide incorrect responses to appropriately seeded known-solution inputs. They also talk about how enforcing randomness in play partners (in the games which have them) minimizes the likelihood of collusion in the asymmetrical games. This article is not at all related to my research, but it was interesting in how it described the requirement that the underlying work be integrated into the core game mechanics of the GWAP. Certainly, if this is mishandled, the game would become too transparent and would no longer be a game. I actually visited the authors' site in hopes of trying some of these games, but found that it was closed in 2011.
Crowdsourcing User Studies With Mechanical Turk
In this article, the authors discuss the reliability of task crowdsourcing platforms such as Amazon's Mechanical Turk. The idea behind such systems is that a person or company may require some task to be completed which is difficult for a computer but easy for a human. Such tasks include identifying objects in images and extracting information from receipts. In these cases, the task may be posted to a crowdsourcing platform with an accompanying monetary payment for completion. Anyone may sign up for the service and complete the task. Of course, such a system does not guarantee that the resulting work is correct. The authors demonstrate this through a Wikipedia article quality measurement task, in which users are asked to read and rate articles. In an initial task, with no effort made to guarantee response credibility, a significant number of recorded responses were garbage, with lots of copy and paste occurring and answers being submitted after a minimal amount of time, far less than necessary to complete the task. This was also apparent in that the aggregate article scores did not match up with recorded expert reviews. Only after the authors added verification questions (which forced users to actually read the articles) did the response quality increase. This article was interesting and provided some insight, especially in regard to this class, where we have not received back any grades for 2 of the homework assignments or any of the readings. With minimal time left in the semester, I doubt anyone will ever read this, and grades will simply be based on whether or not something was actually submitted. So here you go.
Mingda Zhang 16:50:03 12/1/2015
Designing Games With a Purpose
In this paper, the authors propose the interesting idea of combining game playing with solving problems, especially those that are difficult for computers. They name it GWAP, Game With A Purpose. The authors introduce several design principles for developing and evaluating such games, and give a few well-known examples as successful pioneers. From my own perspective, this could be a win-win solution for many real-world problems. Researchers at the University of Washington have developed an online puzzle video game called Foldit. They constructed a game based on protein folding and encouraged players to find the most appropriate structure for proteins. In fact, they made some remarkable achievements: some of the most difficult protein structures were solved by thousands of players in a few weeks. The idea sounds very promising, but my concern is about motivation. As pointed out in the paper, people play games for fun, rather than out of some noble ambition to contribute to science. Therefore the developers have to find a suitable way to transform scientific problems into attractive games. This is indeed much harder than it seems.
Crowdsourcing User Studies with Mechanical Turk
This paper analyzes the advantages and drawbacks of Amazon's Mechanical Turk system through two evaluation experiments. The authors investigate the utility of a micro-task market for collecting user measurements. They also discuss design considerations and give some high-level suggestions for others. From my own experience, Amazon's Mechanical Turk provides a platform for distributing jobs over a wider area and thus might be good news for some user studies, but it also has some inherent concerns, such as job quality. In summary, this can be treated as a meaningful attempt that still requires improvements.
Chi Zhang 22:07:10 12/1/2015
Critiques on "Designing Games with a Purpose" by Chi Zhang. This paper describes a general method for seamlessly integrating computation and gameplay. However, there is still much work remaining to be done. The paper presents the GWAP approach, which represents a promising opportunity for everyone to contribute to artificial intelligence. According to the paper, by leveraging the human time spent playing games online, GWAP developers are able to capture large sets of training data that express uniquely human perceptual capabilities. It's a very good paper with very innovative ideas.
-------------------------------------------------------------
Critiques on "Crowdsourcing user studies with Mechanical Turk" by Chi Zhang. User studies are important for many aspects of the design process, and they involve techniques ranging from informal surveys to rigorous laboratory studies. But researchers are required to trade off sample size, time requirements, and monetary costs. The paper mainly investigates the utility of a micro-task market for collecting user measurements. It discusses design considerations for developing remote micro user evaluation tasks. This is a very interesting paper, and it offers many inspiring insights.
Adriano Maron 22:11:28 12/1/2015
Designing games with a purpose: GWAPs (Games With A Purpose) have become increasingly important in helping several areas of computer science, such as image search, artificial intelligence, and others. Success stories such as the ESP Game, Peekaboom, and Phetch demonstrate the effectiveness of using a large number of users to provide subjective information about some digital object. With this topic in mind, the article proposes means to develop and evaluate GWAPs so that more and better games can be created. Given that GWAPs do not provide a financial reward to users, it is necessary to apply certain techniques to create a game that keeps users entertained and motivated to keep playing. The authors propose 3 templates for transforming the computational problem to be solved into a GWAP: (i) output-agreement games, where multiple players see the same input (e.g., an image) and must agree on the output (e.g., a label) without knowledge of each other's choice; (ii) inversion-problem games, where one player describes an input and the other must guess that input from the descriptions; and (iii) input-agreement games, where players must decide whether they were given the same input, based on the outputs each player produces to describe it. The key factors among all templates are: Initial Setup (how many players and inputs), Rules (what the players are allowed to do), and Winning Condition. The combination of the rules and the winning condition must encourage the players to enter correct information in order to win, therefore creating a scenario where both players cooperate towards the common goal. Besides the generic templates, additional factors such as timed responses, scores, skill levels, and others increase player enjoyment and motivate players to keep playing. Finally, evaluation is another important factor in determining whether or not a game is fulfilling its purpose. Throughput (average number of problem instances solved per human-hour), lifetime play (amount of time played by each user), and expected contribution (number of problems solved by each user) are some of the relevant metrics. The first quantifies the efficiency of the game; the second, the enjoyability; the third measures the quality of a GWAP. This is a very interesting reading, and it provides a good overview of the general requirements for games based on crowdsourcing. This is definitely a promising area, where such studies are necessary. However, it would be interesting to see a detailed analysis of the GWAPs mentioned, using the metrics proposed in the paper.
===================================================
Crowdsourcing User Studies With Mechanical Turk: User evaluation that requires a large number of participants often requires a large effort in terms of work-hours and financial support. However, services such as Amazon's Mechanical Turk allow one to create micro-tasks that can be posted online, and many users can solve those tasks in exchange for a small monetary reward. In such an approach, a large number of users each spend little time solving the tasks, but the collection of those results is important for the researchers. This paper focuses on design decisions for such micro-tasks, identified after two experiments in which users of the service had to rate Wikipedia articles. The authors found that, when a simple rating system was provided, users gave close-to-random answers and did not read the articles in detail. For the second experiment, the authors asked for more detailed information about the articles, requiring more time from the users. I think this paper does not give any meaningful insight into the utility of Mechanical Turk. First of all, the monetary reward for the micro-tasks is very small, and it is obvious that users won't put a large effort into the task if that is not strictly necessary in order to receive the reward. Second, such a service is useful for tasks that require a small effort from the users, given that the reward becomes interesting for turkers only when they complete dozens of tasks in a short period of time. I find it hard to believe that one would spend 5 minutes reading a random article in exchange for a few cents.
Manali Shimpi 0:27:29 12/2/2015
Designing games with a purpose: This paper introduces methods that enable games to be designed in such a way that humans perform computations as a part of the game. These computations can then be enhanced using AI. It is very difficult for a computer to perform computations that humans do easily. These methods therefore provide a way for computers to learn such computations from users by collecting data. The authors call this GWAP, Games with a Purpose. The example of a GWAP mentioned in this paper is the ESP Game, the basis of Google Image Labeler. The types of games that can be turned into GWAP are input-agreement games, output-agreement games, and inversion-problem games. The efficiency of a GWAP is the combination of throughput and the ability to entertain users. There are challenges in achieving accurate output along with players' enjoyment. It is highly likely that answers provided by the players are subjective or under religious or cultural influence.
-------------------------------------------------------------------------------------------
Crowdsourcing user studies with Mechanical Turk: In this paper, the authors discuss a method of collecting data through micro-tasks. They also discuss design considerations for developing remote micro user evaluation tasks. As an example, the authors discuss Amazon's Mechanical Turk, which anyone can use to post tasks to workers all over the world and provide monetary and non-monetary rewards in exchange. The tasks involved require very little time and effort. It can be challenging to adapt the original problem into such a task. The authors performed two tests on this system. In the first test, users were asked to assess a Wikipedia page and analyze whether the changes made were correct. The users' answers were then compared with those of admins. It was found that many users tried to game the system by providing junk answers to gain maximum pay.
Lei Zhao 1:39:40 12/2/2015
Title: Designing Games with a Purpose
A game that lets users improve AI algorithms while they play for entertainment is called a GWAP. Three kinds of games mentioned in this paper have proved to be successful GWAPs: output-agreement games, inversion-problem games, and input-agreement games. GWAP (for Games With a Purpose) was an academic project at CMU that explored the idea of human computation games to solve problems that computers cannot solve. I think this is a very good idea for visual tasks. It is very difficult for computers to determine what is in an image. If users can help add tags to the image, later searches will be easier. The problem is how to get users to do such a job without it being boring. Entertainment is very important, as the authors suggest several times in this paper. The authors offer 3 successful kinds of games. I am interested in the unsuccessful kinds of games. I think the authors should list some of them and tell us why they were not successful.
Title: Crowdsourcing User Studies With Mechanical Turk
This paper investigates the utility of a micro-task market for collecting user measurements, and discusses design considerations for developing remote micro user evaluation tasks. Amazon's Mechanical Turk is the example the authors use to examine this utility. "Amazon Mechanical Turk is based on the idea that there are still many things that human beings can do much more effectively than computers, such as identifying objects in a photo or video, performing data de-duplication, transcribing audio recordings, or researching data details. Traditionally, tasks like this have been accomplished by hiring a large temporary workforce (which is time consuming, expensive, and difficult to scale) or have gone undone." I think the description Amazon gives is different from the description in this paper. I believe it is proper to say that Amazon Mechanical Turk is a GWAP, where the entertainment of the game is the small monetary reward. This paper also mentions that special care is needed in formulating tasks in order to harness the capabilities of the approach. Although micro-task markets offer quick access to a large user and data pool, we still need to pay extra attention to the safety and reliability of the data collected this way.
Jesse Davis 6:21:15 12/2/2015
Designing Games with a Purpose
This was a terrific article on the subject of designing games with the purpose of helping us accomplish tasks that are otherwise considered boring or repetitive. Personally, I've read the separate paper on the ESP Game, and it was a wonderful game whose designers went to great lengths to get as much clean data as they could (data that was uncorrupted by "trolls" who tried to trick the system). The article is very well written, and as a [hopefully] future big-name game designer I agreed with many of the points the paper made and felt the authors hit all (most) of their points in an organized manner that made sense. This is a great article that can help pave the way for future game designers looking to accomplish a similar goal.
Crowdsourcing User Studies with Mechanical Turk
This paper was an excellent study and summary of the Amazon Mechanical Turk tool. I myself used it last semester (Spring 2015) to gather data for my final project for Computer Vision. One of the neatest parts of this tool is that you are able to specify the "expertise" required of the user, so that you know you're getting a quality tester. However, the higher the expertise, the higher the demand for these testers and the less likely they will be to take on a task that isn't worth much payment-wise. Short paper, so this is a short summary; I feel this paper did a good job of summarizing and testing out the Amazon Mechanical Turk tool, and it makes the reader aware of the ins and outs as well as some of the downfalls of using it.
Darshan Balakrishna Shetty 8:33:00 12/2/2015
Designing games with a purpose: This paper shows how people can solve problems while playing games, so that we can use those techniques to solve hard problems. The paper presents three templates of games that let users provide useful annotations while they enjoy the game. The annotations here are very simple. I played those games, and I don’t think they’re fun. But this is an interesting research direction. Consider other interesting problems, such as autonomous driving: if we turned “Need for Speed” into a user data collector, we might be able to learn useful information for the computer. For the language translation problem, however, it is hard to think of a fun activity for users that also generates useful annotations. As the authors say, “[the fact that] people enjoy the game makes them want to continue playing, in turn producing more useful output.” This really inspires me; it is the reason why the real measure of utility for a GWAP is a combination of throughput and enjoyability. ---------------------------------------------------------------------------- Crowd sourcing User Studies with Mechanical Turk: This paper covers several aspects of the micro-task market Mechanical Turk. The authors first introduce Mechanical Turk, then briefly discuss its benefits, then present two experiments run on Mechanical Turk, and finally provide some design recommendations. I agree with the first design recommendation in this paper. Explicitly verifiable questions are very important in Mechanical Turk. Actually, a lot of attention is given to how to design these questions, which is very important. Those questions should be used to exclude malicious users, but it is difficult to guarantee that verifiable questions succeed in keeping those malicious users away.
Sudeepthi Manukonda 8:40:16 12/2/2015
“Designing Games with a Purpose” is an interesting paper that touches upon the idea of gaming and the logic behind it. There are many tasks that are trivial for humans but tedious to implement as computer programs. This is the challenge. People spend a lot of time playing games. The main idea behind these games is to combine computation and game play. The main question is how to design games that everyone will love playing and that also produce high-quality output. The game structures achieve several goals: encouraging players to produce correct outputs, partially verifying output correctness, and providing an enjoyable social experience. Games can be made more interesting by introducing challenge, competition, variation, and communication. Output accuracy can be ensured through random matching, player testing, repetition, and taboo outputs. GWAP success can be evaluated as the product of throughput and average lifetime play. To conclude, many questions arise by the end of this paper, and in fact the authors themselves end with only a few. Such questions invite brainstorming, and questions such as how to motivate accuracy as well as creativity and diversity, and what kinds of problems fall outside of GWAP, ensure a good future for the gaming field.
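The success measure described above can be made concrete with a small sketch. It assumes that throughput is measured in problem instances solved per human-hour and that average lifetime play is measured in hours per player; the function name and the numbers are hypothetical and chosen only for illustration, not taken from the paper.

```python
def expected_contribution(throughput_per_hour: float,
                          avg_lifetime_play_hours: float) -> float:
    """Expected number of problem instances one player solves overall.

    Computed as throughput (instances per human-hour) multiplied by
    average lifetime play (hours played per player).
    """
    if throughput_per_hour < 0 or avg_lifetime_play_hours < 0:
        raise ValueError("both quantities must be non-negative")
    return throughput_per_hour * avg_lifetime_play_hours


# Hypothetical game: 200 labels per human-hour, 1.5 hours of play per player.
labels_per_player = expected_contribution(200.0, 1.5)
print(labels_per_player)  # 300.0
```

Under this reading, a game can score well either by producing data quickly or by keeping players around longer, which is why both factors appear in the product.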
Matthew Barren 9:00:51 12/2/2015
Summary of Designing Games with a Purpose: The authors explore GWAPs, Games with a Purpose, where player output and interactions are used as data for computer system learning. Specifically, the authors focus on games that train computers by identifying knowledge through competition. GWAPs engage users in a competition to train a computer system about a set of data. Doing so allows users to have an enjoyable experience while the computer system becomes more knowledgeable about the content being evaluated. Added to the game are competition feedback and incentives that keep users engaged, such as timing and leaderboards. It is particularly interesting how the games are structured in a way that incentivizes accuracy. Of course, the games use point systems to motivate accuracy. In addition, the structure of the games itself encourages accuracy. For example, the output-agreement games require the users to write down similar answers for a particular stimulus; the more answers the users match, the more points they earn. Another key feature for avoiding inaccurate answers is framing. The games are framed in a manner that does not explicitly tell users to produce the correct answer; rather, this is an implicit outcome. Evaluation of GWAPs is important for assigning the best game type to achieve a particular result. The authors chose several key parameters to evaluate the multiple dimensions of a GWAP, such as throughput, lifetime play, and expected contribution. The importance of these parameters is that they balance measuring quality and efficiency in both computer and human contexts. This is key because there is a symbiotic relationship between the human and the computer in this instance: for both to get the desired results, each must want to contribute. Games with a purpose have the potential to be extended to many other learning situations. 
The games highlighted are more overt acts of training a computer system to recognize and recall. Instead, there is the potential for gaming to do more observing of human interactions and then draw conclusions from these interactions. This type of study will be far more abstract, but could provide richer insight.
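The output-agreement structure described above, where matching answers earn points, can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the function names, the 10-point scoring rule, and the two-agreement threshold are all hypothetical, and the taboo list and repetition check are simplified stand-ins for the accuracy mechanisms the paper lists.

```python
from collections import Counter

POINTS_PER_MATCH = 10  # assumed scoring rule, not from the paper


def normalize(word: str) -> str:
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return word.strip().lower()


def play_round(guesses_a, guesses_b, taboo=()):
    """Return the answers both players gave, excluding taboo words."""
    taboo_set = {normalize(w) for w in taboo}
    a = {normalize(w) for w in guesses_a} - taboo_set
    b = {normalize(w) for w in guesses_b} - taboo_set
    return a & b


def accepted_labels(rounds, min_agreements=2):
    """Keep labels matched in at least `min_agreements` separate rounds."""
    counts = Counter(label for matched in rounds for label in matched)
    return {label for label, n in counts.items() if n >= min_agreements}


# Two hypothetical rounds on the same image, played by different pairs.
round1 = play_round(["Dog", "grass", "ball"], ["dog", "park", "ball"])
round2 = play_round(["puppy", "dog"], ["dog", "leash"])
score_round1 = POINTS_PER_MATCH * len(round1)  # two matches in round 1
labels = accepted_labels([round1, round2])     # only "dog" matched twice
```

Neither player is told to be correct; agreement with an independent partner is simply the only way to score, which is the implicit incentive noted above.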
Ankita Mohapatra 9:58:55 12/2/2015
Designing Games with a Purpose This paper seems like a great research contribution because it covers some fundamental principles that will help guide others to create great games with a purpose that may have real benefit for society. We can assume the authors are credible because they have created many GWAPs. I immediately thought of the game Foldit, which may have been developed after this paper. The drawbacks highlighted in the related work section were interesting. Specifically, I began to wonder how this paper would recommend eliminating error from user contributions that could occur in the Open Mind Initiative. Another interesting observation was that gameplay should be tightly integrated with the work being accomplished to make the tasks appealing. The scope is refined so that “success” is measured in terms of human hours played, assuming that this represents users’ desire to play. After providing the basic templates of gameplay, the authors turn the focus to increasing player enjoyment, which will increase the hours that people play as well as improve the quality of their output. They outline game-design principles that increase enjoyment. A clear theme running through this discussion was the use of a variety and multiplicity of feedback for motivational purposes. Although this is inconsistent with earlier statements in the paper about success being measured in human hours, the metric of “efficiency” was introduced for gauging success in terms of number of problems solved per human hour. ------------------------------------------------------------------------- Crowdsourcing User Studies with Mechanical Turk The title of this paper states the unique opportunity that researchers have to improve user study sample size at low cost by utilizing “micro-task markets.” The authors claim to have discovered special considerations that must go into user studies taking advantage of the crowdsourcing approach. 
I have actually used Mechanical Turk briefly just after hearing about it by word of mouth. Many potential difficulties with using Mechanical Turk were identified. Most of these issues have to do with lack of knowledge about the users and the lack of mechanisms for ensuring quality responses from users who may be malicious. They conducted two experiments, one quantitative and one qualitative (I thought this was important), to try to assess the potential of micro-task markets in research studies. The researchers made a wise decision in giving users the “Featured article criteria” rubric; one would assume this improved the consistency of user responses by giving users a shared set of expectations. Results from that experiment, unsurprisingly, demonstrated that Mechanical Turk was extremely susceptible to users who wish to game the system (i.e., not provide meaningful responses) and receive rewards quickly. Their redesign of the experiment attempted to address these issues. I did not feel that the discussion of the redesign decisions was very clear. Why, for example, does providing concretely verifiable information reduce low-quality submissions? Were users blocked from submitting their responses if this information was incorrect? The verb “required” is used, but it’s not clear that a user could not simply type in random numbers and continue on. If my assumption is correct, then this was a clever technique to ensure that even users wishing to rush through their response would be required to do the same tasks for as long as the well-intended users, exposing them to the contents of the article and thereby improving their response quality.
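One way verifiable questions could be used, sketched here purely as an assumption and not as a description of what the authors did, is as a filter applied after responses are collected. The field names, the tolerance, and the sample data below are all hypothetical; whether the original study blocked submissions or filtered them afterwards is exactly the question left open above.

```python
from dataclasses import dataclass


@dataclass
class Response:
    worker_id: str
    reference_count: int  # answer to a hypothetical verifiable question
    image_count: int      # answer to a hypothetical verifiable question
    quality_rating: int   # the subjective answer the study cares about


def passes_checks(resp: Response, truth: dict, tolerance: int = 1) -> bool:
    """True if every verifiable answer is within `tolerance` of the known value."""
    return (abs(resp.reference_count - truth["reference_count"]) <= tolerance
            and abs(resp.image_count - truth["image_count"]) <= tolerance)


def filter_responses(responses, truth, tolerance=1):
    """Drop responses whose verifiable answers look like random input."""
    return [r for r in responses if passes_checks(r, truth, tolerance)]


# Hypothetical ground truth for one article and three hypothetical workers.
truth = {"reference_count": 12, "image_count": 3}
responses = [
    Response("w1", 12, 3, 5),
    Response("w2", 99, 0, 7),  # numbers typed at random fail the check
    Response("w3", 11, 3, 4),  # off by one, still within tolerance
]
kept_ids = [r.worker_id for r in filter_responses(responses, truth)]
```

A filter like this only catches careless or random answers; a worker who counts correctly but rushes the subjective rating would still pass, which matches the concern that verifiable questions cannot guarantee quality.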