Crowd Sourcing and Game With A Purpose

From CS2610 Fall 2016

slides


Readings

Reading Critiques

Haoran Zhang 22:20:28 10/30/2016

Designing Games with a Purpose: People play not because they are personally interested in solving an instance of a computational problem but because they wish to be entertained. Entertainment now plays an even more important role in games, and many people play purely for fun. It may therefore be helpful to design games that not only entertain people but also let them solve computational problems at the same time, so that the results can be used in experiments. The main idea is to increase player enjoyment and make the game more fun. It is essential that the number of tasks for players to complete within a given time period is calibrated to introduce challenge and that the time limit and time remaining are displayed throughout the game. The GWAP approach gives a set of rules for designing such a game and a way to evaluate it. I think this is a good idea: everyone loves to play games, and if people can play games while also helping designers build better games, everyone benefits from the system.
Crowdsourcing User Studies with Mechanical Turk: Amazon’s Mechanical Turk lets users complete tasks and get paid for finishing them. In the system, a user can preview a task and see the payment offered and how many instances remain available. Many data mining and machine learning researchers therefore use the system to collect data from a huge number of users: they post a task and let users complete it. Most of the time they only want users to work on the data itself, for example by annotating it. But the system can also be used for evaluation, by paying users to complete evaluation tasks. With this platform, researchers could collect more data for less money. In this paper, the authors discuss this possibility and report a few experiments. They show that such platforms are promising for conducting a variety of user study tasks, ranging from surveys to rapid prototyping to quantitative performance measures. But the platform also has limitations. For example, ecological validity cannot be guaranteed, since there is no easy way for experimenters to fully control the experimental setting, leading to potential issues such as different browser experiences or distractions in the physical environment. This is not its only problem, so the system is really helpful only for certain specific problems.

Zhenjiang Fan 19:10:44 11/17/2016

Designing Games with a Purpose: Introducing machine learning and AI techniques into the development of games should be interesting, but you have to think about how to integrate these techniques into games. The set of guidelines the article articulates for building GWAPs represents the first general method for seamlessly integrating computation and gameplay, though much work remains to be done. Indeed, the authors hope researchers will improve on the methods and metrics described here. Other GWAP templates likely exist beyond the three the article presents, and the authors hope future work will identify them. They also hope to better understand problem-template fit, that is, whether certain templates are better suited for some types of computational problems than others. The game templates the authors have developed thus far have focused on similarity as a way to ensure output correctness; players are rewarded for thinking like other players. This approach may not be optimal for certain types of problems; in particular, for tasks that require creativity, diverse viewpoints and perspectives are optimal for generating the broadest set of outputs. Developing new templates for such tasks could be an interesting area to explore. The data collected from games could be a great source of training data for both ML and AI.
Crowdsourcing User Studies With Mechanical Turk: User studies are a very important part of the design process. In this study the authors examined a single user task using Mechanical Turk, finding that even for a subjective task the use of task-relevant, verifiable questions led to consistent answers that matched expert judgments. However, Mechanical Turk also has a number of limitations. Some of these are common to online experimentation: for example, ecological validity cannot be guaranteed, since there is no easy way for experimenters to fully control the experimental setting, leading to potential issues such as different browser experiences or distractions in the physical environment. Further work is needed to understand the kinds of experiments that are well suited to user testing via micro-task markets and to determine effective techniques for promoting useful user participation. Using Mechanical Turk, hundreds of users can be recruited for highly interactive tasks at marginal cost within a timeframe of days or even minutes. But it has its limitations too; for example, it does not consider the unique context of the underlying user study.

Steven Faurie 14:16:19 11/30/2016

Steve Faurie, Designing Games With a Purpose: This paper describes the development of games that are intended to have players solve problems as a side effect of playing the game. One of the examples is a game where two isolated players are shown an image and have to agree on a label for it. Another game is similar to twenty questions: one user attempts to describe something to another user, who has to guess what is being described. The answers provided by the players can be used as labels or data for other tasks like machine learning or image recognition. The paper goes on to describe general rules of game development that should lead to more enjoyable games. Things like adding timers to increase difficulty and ranking systems so users can track their progress lead to increased engagement. The evaluation of GWAPs is interesting. You might think that the game that produces the most viable output in the shortest amount of time would be the best game. But you also need to account for how enjoyable the game is to play. For that reason the authors developed three metrics: throughput, the average number of problem instances solved per human-hour; expected contribution, the throughput multiplied by how much time you expect an individual user to put into the game; and ALP (average lifetime play), the amount of time you can expect one person to spend playing the game. This is an interesting concept and one I have heard has been successfully used to solve protein folding problems for biologists. …
Crowdsourcing User Studies with Mechanical Turk: This paper looks at the viability of using Mechanical Turk to conduct user studies. Mechanical Turk is operated by Amazon. It lets people perform small tasks for a small fee. One of the first experiments described by the authors showed how this incentivizes completing tasks quickly and perhaps not that accurately. In this experiment they asked users to rate how good certain Wikipedia articles were. They found that some users seemed to game the system by entering useless responses. They also found only a weak connection between how Wikipedia had rated an article and how Mechanical Turk users rated it. Experiment two was similar, but it required users to enter verifiable answers about the article along with a discussion of what was good and bad about it. They received fewer responses per user, but there were fewer invalid responses and the responses were closer to the expert ratings. The findings indicated that when using Mechanical Turk you should provide questions that have verifiable answers, answering correctly should not be much harder than entering nonsense, and you should track things like task duration to make sure users are actually paying attention. The authors concluded that Mechanical Turk has the potential to be useful in user studies, but researchers need to be aware of its limitations and know how to design experiments appropriately for the system.
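The three GWAP metrics described in this critique can be written down as a small calculation. The sketch below is only an illustration: the `PlaySession` record, the function names, and the sample numbers are all made up here, and ALP is computed as total play time divided by the number of distinct players, which is one plausible reading of "the amount of time you can expect one person to spend playing".

```python
from dataclasses import dataclass


@dataclass
class PlaySession:
    """One play session (hypothetical record layout, not from the paper)."""
    player_id: str
    hours_played: float
    instances_solved: int


def throughput(sessions):
    """Average number of problem instances solved per human-hour."""
    total_hours = sum(s.hours_played for s in sessions)
    total_solved = sum(s.instances_solved for s in sessions)
    return total_solved / total_hours


def average_lifetime_play(sessions):
    """ALP: average total hours played per distinct player."""
    hours_by_player = {}
    for s in sessions:
        hours_by_player[s.player_id] = (
            hours_by_player.get(s.player_id, 0.0) + s.hours_played
        )
    return sum(hours_by_player.values()) / len(hours_by_player)


def expected_contribution(sessions):
    """Expected contribution = throughput * ALP."""
    return throughput(sessions) * average_lifetime_play(sessions)


# Made-up sessions for two players, purely for illustration.
sessions = [
    PlaySession("a", 1.0, 200),
    PlaySession("a", 1.0, 240),
    PlaySession("b", 2.0, 360),
]
print(throughput(sessions), average_lifetime_play(sessions),
      expected_contribution(sessions))
```

A game with high throughput but very short lifetime play can end up with a lower expected contribution than a slower but more engaging game, which is the trade-off the critique points at.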

Anuradha Kulkarni 20:50:43 11/30/2016

Designing games with a purpose: This paper gives an insight into how people can solve problems while playing a game. These techniques can be employed in solving hard problems. A game that lets users help improve AI algorithms while being entertained is called a GWAP. Three kinds of games mentioned in this paper have proved to be successful GWAPs: output-agreement games, inversion-problem games, and input-agreement games. These three templates let users provide useful annotations while they enjoy the game. The annotations here are very simple. This is an interesting research direction. The main issue is that users don’t find all the games fun or interesting. The authors claim that the fact that people enjoy the game makes them want to continue playing, in turn producing more useful output. The authors mention three successful kinds of games but do not list any unsuccessful kinds.
Crowdsourcing User Studies with Mechanical Turk: This paper gives an insight into the utility of a micro-task market for collecting user measurements and discusses design considerations for developing remote micro user evaluation tasks. It covers several aspects of micro-task markets, and Mechanical Turk in particular. The paper starts with an introduction to Mechanical Turk, then moves on to its benefits, describes two experiments on Mechanical Turk, and provides some design recommendations. The paper also mentions that special care is needed in formulating tasks in order to harness the capabilities of the approach. Although micro-task markets offer quick access to a large user and data pool, extra attention is still needed to the safety and reliability of the data.
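As a rough illustration of the output-agreement template named above, the sketch below simulates one round in which two players who cannot see each other type guesses for the same input, and the round ends on the first label both have entered. The function name and the turn-by-turn interleaving of guesses are assumptions made for this example; a real game would run in real time and would have further rules that are not modeled here.

```python
def output_agreement_round(guesses_a, guesses_b):
    """Return the first label that both players have entered, or None.

    guesses_a / guesses_b are each player's guesses in the order typed.
    Guesses are interleaved turn by turn (an assumption for this sketch).
    """
    seen_a, seen_b = set(), set()
    for i in range(max(len(guesses_a), len(guesses_b))):
        if i < len(guesses_a):
            guess = guesses_a[i].strip().lower()
            seen_a.add(guess)
            if guess in seen_b:
                return guess
        if i < len(guesses_b):
            guess = guesses_b[i].strip().lower()
            seen_b.add(guess)
            if guess in seen_a:
                return guess
    return None


# Both players eventually type "grass", so that becomes the agreed label.
label = output_agreement_round(
    ["dog", "puppy", "grass"],
    ["animal", "Grass", "dog"],
)
print(label)
```

Because neither player sees the other's guesses, the easiest way to agree is to type something that actually describes the input, which is why an agreed label is treated as likely to be correct.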

Alireza Samadian Zakaria 22:28:11 11/30/2016

“Crowdsourcing User Studies with Mechanical Turk” is a paper about using Amazon’s Mechanical Turk as a tool for collecting data in user studies. Amazon’s Mechanical Turk is a system that assigns simple tasks to users, who earn a small amount of money by completing them. According to the paper, there are three challenges in using this system: tasks should be small, the answers should be bona fide, and the users are diverse and unknown, which can sometimes be an advantage. To test the utility of Mechanical Turk as a user study platform, the authors conducted two experiments. In the first one, they gave Mechanical Turk users a task in which they rated 14 Wikipedia articles based on the criteria mentioned in Wikipedia, and compared their answers to the answers from a group of Wikipedia admins. The correlation between these two sets of answers was marginally significant. In the second experiment, they added some warm-up questions that prepared users to rate the articles better. This time the results were better, with a p-value of less than 0.01; this shows that a well-designed task can produce judgments that match the experts’ judgments. At the end, the authors propose some methods that can be used to design a good task.
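The comparison described above, worker ratings against admin ratings per article, can be sketched as follows. This is only an illustration: it assumes a Pearson correlation over per-article mean ratings (the critique does not say which statistic the paper used), and the article IDs and ratings are made-up toy values, not data from the paper.

```python
from collections import defaultdict
from math import sqrt


def mean_rating_per_article(ratings):
    """ratings: iterable of (article_id, rating) pairs from many workers."""
    by_article = defaultdict(list)
    for article_id, rating in ratings:
        by_article[article_id].append(rating)
    return {a: sum(r) / len(r) for a, r in by_article.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical): two worker ratings per article, one expert rating.
worker_ratings = [("A", 5), ("A", 4), ("B", 2), ("B", 3), ("C", 6), ("C", 7)]
expert_ratings = {"A": 4, "B": 2, "C": 7}

means = mean_rating_per_article(worker_ratings)
articles = sorted(expert_ratings)
r = pearson([means[a] for a in articles],
            [expert_ratings[a] for a in articles])
print(round(r, 3))
```

With only a handful of articles, a correlation like this says little on its own; a significance test, as reported in the paper, is still needed to judge whether the agreement is more than chance.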

nannan wen 22:53:46 11/30/2016

Designing Games With A Purpose by Von Ahn, L., and Dabbish, L. Review: In this paper, the authors introduce three templates for GWAP designers. I think it is an interesting paper: many people spend a lot of time on computer games, and the paper uses what we can get from those games to perform computations that are hard to achieve otherwise. These kinds of applications can be applied to machine learning as well as computer vision. The tasks involved are easy for humans but very difficult for computers. As for the significance of GWAPs, the average person has spent thousands of hours playing games by the age of 21. There are some existing implementations such as ESP, Peekaboom, Phetch, and Verbosity. The paper discusses the design principles of GWAPs in order to specify the major features and advantages of such games, as well as the key point that useful work can be incorporated into those games so that playing them directly produces the desired outcome, giving a one-to-one correspondence. It provides three game-structure templates, namely input-agreement, output-agreement, and inversion-problem games. It also discusses output correctness, covering labelling and identification games.
Crowdsourcing User Studies with Mechanical Turk by Kittur, A., Chi, E., Suh, B. Review: In this paper, the authors introduce Mechanical Turk, a crowdsourcing platform hosted by Amazon that assigns micro-tasks to a wide user base in exchange for payment. The paper mentions that the problem researchers face is gathering people to take part in studies; participants are, of course, paid for taking part. These studies may be long, and it is hard to find people willing to participate without any return on their time. Mechanical Turk tries to provide a way to find users through the Internet at low cost. I have tried to run a survey on it before, and Mechanical Turk seems to do pretty well in this respect. A major problem discussed in a previous paper is users trying to game the system. These kinds of users can definitely harm the results, and their presence and effect can be clearly seen in the first experiment the authors conducted. However, I think the most important idea in this paper is to add some questions about the content of the wiki pages in order to find out whether participants are taking the study seriously or just playing with it.

Tazin Afrin 0:21:43 12/1/2016

Critique of “Designing Games With A Purpose”: More than 200 million aggregate hours of computer games are played each day. The data generated as a side effect of these games can be used to train AI algorithms, gain insight into user behavior, and solve computational problems. To channel human brainpower constructively, the authors present some general design principles for computer games with a purpose (GWAPs). An example of such a game is labeling images for a reward, like tapping a certain number of human faces within a time limit. A person does not play this game to solve a computational problem, but just to be entertained. But the data collected from this type of game can be used to solve a computational AI problem later. The GWAP approach is motivated by three factors: the increasing number of people around the world connected to the internet, tasks that are easy for humans, and the sheer amount of time that people spend on online gaming or just surfing online. The most important part is player entertainment. A GWAP must be designed to be enjoyable for the players: if players don’t think the game is fun, they will not want to continue, whereas they will continue on their own if they enjoy it. A game also needs to be challenging. One source of challenge is timed response. A time limit challenges the user to finish some work; if the player cannot, they may come back and do the task again. In this way data for repetitive work can also be collected. A scoring system also adds challenge, and some randomness in the game helps hold the player’s attention. Although a lot of work remains to be done, the set of methods given in this paper is quite fundamental. Future work could look for GWAP templates beyond those described in the paper. Overall, it is a promising opportunity for future research.
Critique of “Crowdsourcing user studies with Mechanical Turk”: Probably one of the most important and interesting topics we learned in this course is how to do a good user study and how to evaluate data. These days Amazon Mechanical Turk, or simply mturk, is one of the best tools for running a user study through crowdsourcing in a very short period of time, where it would otherwise take a long time. In this paper, the authors introduce the utility of micro-task markets such as mturk for collecting user data and present some design considerations for the tasks. Although these systems are automated and have huge potential, special care should be taken when performing a study on mturk. To evaluate this, the authors ran two experiments. In the first experiment, they found a high proportion of suspect ratings and only a marginal correlation between the workers’ quality ratings and those of expert admins. In experiment 2, however, they changed the study design a little and found a better match to expert ratings. From these experiments the authors found that the use of relevant and verifiable questions led to consistent answers. This result suggests that mturk could be used for rapid and iterative prototyping by asking users subjective questions. However, a limitation of mturk is that ecological validity cannot be guaranteed. It also does not support controlling participant assignment, which makes even a very simple between-subjects study nearly impossible. So although it is a great platform for some user studies, special care should be taken in designing the task.
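The idea of using verifiable questions to screen out suspect ratings can be sketched as a simple filter. Everything specific here is an assumption for illustration: the `Response` record, the question names `num_sections` and `num_images`, the exact-match rule, and the 30-second duration threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One worker's submission for one article (hypothetical layout)."""
    worker_id: str
    rating: int
    answers: dict          # verifiable question -> worker's answer
    seconds_spent: float


def is_suspect(response, answer_key, min_seconds=30.0):
    """Flag a response that was too fast or got a verifiable answer wrong.

    min_seconds is an arbitrary threshold chosen for this sketch.
    """
    if response.seconds_spent < min_seconds:
        return True
    for question, correct in answer_key.items():
        if response.answers.get(question) != correct:
            return True
    return False


def keep_valid(responses, answer_key, min_seconds=30.0):
    """Return only the responses that pass the suspect check."""
    return [r for r in responses
            if not is_suspect(r, answer_key, min_seconds)]


# Hypothetical answer key for one article.
answer_key = {"num_sections": 5, "num_images": 2}
responses = [
    Response("w1", 6, {"num_sections": 5, "num_images": 2}, 120.0),
    Response("w2", 7, {"num_sections": 5, "num_images": 2}, 4.0),   # too fast
    Response("w3", 3, {"num_sections": 1}, 90.0),                   # wrong/missing
]
valid = keep_valid(responses, answer_key)
print([r.worker_id for r in valid])
```

Exact matching is strict; in practice a small tolerance on counted answers might be more forgiving of honest mistakes, at the cost of letting more careless responses through.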

Keren Ye 1:34:09 12/1/2016

Designing Games With A Purpose: The article talks about games designed by the research group. It proposes the general idea at the very beginning, stating that people play not because they are personally interested in solving an instance of a computational problem but because they wish to be entertained. In the related work, the authors mention 1) networked individuals accomplishing work, 2) the Open Mind Initiative, 3) interactive machine learning, and 4) making work fun. In the next part, the authors state the key idea concretely: players want to be entertained. Building on this idea, the authors explore three game-structure templates that generalize successful instances of human computation games: output-agreement games, inversion-problem games, and input-agreement games. The authors claim that, in order to increase player enjoyment, it is essential that the number of tasks for players to complete within a given time period is calibrated to introduce challenge and that the time limit and time remaining are displayed throughout the game. The real measure of utility for a GWAP is therefore a combination of throughput and enjoyability. In the last part, the authors describe their evaluation methods in detail. In conclusion, by using such games, game developers are able to capture large sets of training data that express uniquely human perceptual capabilities. This data can contribute to the goal of developing computer programs and automated systems with advanced perceptual or intelligence skills.
Crowdsourcing User Studies With Mechanical Turk: The paper presents a brief introduction to using Mechanical Turk, which offers a potential paradigm for engaging a large number of users for low time and monetary costs to do user studies. It investigates the utility of a micro-task market for collecting user measurements and discusses design considerations for developing remote micro user evaluation tasks. The authors conducted two experiments to test the utility of Mechanical Turk as a user study platform. They used tasks that collected quantitative user ratings as well as qualitative feedback regarding the quality of Wikipedia articles. The results show that Mechanical Turk has some promising aspects; however, special care must be taken in the design of the task, especially for user measurements that are subjective or qualitative.

Xiaozhong Zhang 1:43:46 12/1/2016

Designing Games with a Purpose: The authors articulated a set of guidelines for building GWAPs, which represents the first general method for seamlessly integrating computation and gameplay. The game templates the authors developed have focused on similarity as a way to ensure output correctness; players are rewarded for thinking like other players. This approach may not be optimal for certain types of problems; in particular, for tasks that require creativity, diverse viewpoints and perspectives are optimal for generating the broadest set of outputs. The games the authors designed have focused on problems that are easily divided into subtasks. The GWAP approach represents a promising opportunity for everyone to contribute to the progress of AI. By leveraging the human time spent playing games online, GWAP game developers are able to capture large sets of training data that express uniquely human perceptual capabilities. This data can contribute to the goal of developing computer programs and automated systems with advanced perceptual or intelligence skills.
Crowdsourcing User Studies With Mechanical Turk: The paper claims that user studies are important for many aspects of the design process and involve techniques ranging from informal surveys to rigorous laboratory studies. However, the costs involved in engaging users often require practitioners to trade off between sample size, time requirements, and monetary costs. The paper therefore introduces one of the micro-task markets, Amazon's Mechanical Turk, which offers a potential paradigm for engaging a large number of users for low time and monetary costs. The authors investigated the utility of a micro-task market for collecting user measurements and discussed design considerations for developing remote micro user evaluation tasks. Although micro-task markets have great potential for rapidly collecting user measurements at low cost, the authors found that special care is needed in formulating tasks in order to harness the capabilities of the approach.

Debarun Das 4:17:51 12/1/2016

“Designing Games With a Purpose”: This article discusses the design and development of ‘Games With A Purpose’ (GWAPs). As a ‘side effect’ of playing these games, people perform tasks that computers are not able to perform. This means that users’ actions can generate data to help solve computationally difficult problems in AI. In the words of the authors, users are inclined to play a game because of entertainment, not because they want to solve a computationally hard problem. So entertainment is key to developing these games. The paper first discusses related work and the basic emerging concepts in this field (like the Open Mind Initiative, Interactive Machine Learning, Making Work Fun, etc.). Then it discusses the different types of games (keeping in mind the users’ desire to be entertained). Finally, it discusses the design guidelines for developing these games. Overall, I believe it is an interesting article and presents a promising topic.
“Crowdsourcing user studies with Mechanical Turk”: This paper discusses performing user studies and collecting data through micro-tasks. The authors first discuss a micro-task market called ‘Mechanical Turk’: its basic concepts, ideas, and benefits. The paper then discusses two experiments on Amazon’s Mechanical Turk. Finally, it discusses the limitations and advantages of the system and provides design recommendations that can be applied. Overall, this paper discusses a novel platform for user studies (ranging from surveys to prototyping, etc.) where users can be recruited at little cost. However, special care should be taken when the user study is qualitative. Although this was written in 2008, I believe it provided the basic idea and initiative for more research in this area.

Zuha Agha 9:01:50 12/1/2016

Designing Games with a Purpose. This paper discusses the design principles of games that have a purpose, where user interactions and data help train the computer to become more knowledgeable and solve difficult problems. The paper discusses the underlying themes for such games and describes several types of games, including input-agreement, output-agreement, and inversion-problem games. Some of the most important aspects of such games are that they should keep the player engaged, offer incentives to the player, and be challenging enough that the player does not get bored. Accuracy is another important aspect, and several strategies could be employed in the design to increase users’ accuracy, including making the user match similar answers explicitly or framing the task implicitly. The paper then discusses the methodology for evaluating games with a purpose and comes up with an expected contribution metric that simultaneously encodes the rate of data generated by the players and the level of their interest in the game. Overall, the paper draws interesting conclusions but seems too simplistic and very non-technical in my opinion.
Crowdsourcing User Studies with Amazon Mechanical Turk. In this paper, the authors examine the usefulness of crowdsourcing platforms such as Amazon Mechanical Turk for conducting user studies. To investigate, the authors conducted two experiments using Mechanical Turk in which users were asked to rate Wikipedia articles; the ratings were later compared against ground-truth ratings provided by expert Wikipedia admins. The first experiment required users to provide free-form feedback on the quality of a Wikipedia article, and the second experiment asked users to answer some questions about the article to ensure that they had read it. Results showed that the majority of MTurker responses agreed with the expert responses, which suggests that a user study conducted through crowdsourcing should be designed so that it ensures the user has invested effort in the task and is not responding randomly. In my opinion the topic of the paper is great, and it is very important to determine the validity of data collected via crowdsourcing platforms such as MTurk, as they are being used extensively for data collection, annotation, and surveys in the computer science research community. However, most data collection schemes in research papers rely on majority voting to improve the reliability of the data collected via crowdsourcing. I feel that the paper did not address the pitfalls of majority voting schemes, and the verifiability approach proposed in the paper is weak.
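For readers unfamiliar with the majority voting schemes this critique refers to, a minimal sketch is given below. It is not something the paper proposes; the function name and the choice to return `None` on a tie are assumptions made for this example.

```python
from collections import Counter


def majority_label(labels):
    """Return the most common label, or None if there is no clear winner."""
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    # Treat a tie for first place as "no majority" (a choice for this sketch).
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


# Three hypothetical workers label the same item.
print(majority_label(["cat", "cat", "dog"]))
```

One pitfall of this scheme is visible even in the sketch: if most workers answer carelessly in the same way, the careless answer wins the vote, which is why it is sometimes combined with checks on individual response quality.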