Crowd Sourcing and Game With A Purpose

From CS2610 Fall 2017

slides


Readings

Reading Critiques

Ahmed Magooda 23:40:45 11/27/2017

Opinion Space: A Scalable Tool for Browsing Online Comments: In this paper the authors aim to tackle some problems with online comments on topics, such as the huge number of comments and the dominance of strongly worded comments. The authors introduce "Opinion Space", an online tool designed to collect and visualize user opinions on topics. The system addresses these problems through dimensionality reduction, using a statistical and machine learning technique called principal component analysis (PCA). In exploring the design space of opinion representation, the authors propose several parts that the user can interact with. Upon writing an opinion in a comment, the user is presented with five opinion-profile propositions and asked to rate each one. This is where PCA comes in: the five rating dimensions are projected onto two dimensions. The authors then discuss their study. They first propose three interfaces (List, Grid, and Space), then present the interfaces in random order to 12 participants in a within-subject study, with the Space interface as the experimental condition and the List and Grid interfaces as two control conditions. Finally, the authors discuss five hypotheses related to these experiments.
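The projection step described in this critique can be illustrated with a small sketch. This is not the Opinion Space implementation: the ratings below are invented, the function names (`project_to_2d`, `principal_axes`, and so on) are hypothetical, and power iteration with deflation is just one simple way to obtain the first two principal components without external libraries.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(rows):
    """Subtract each column's mean so every proposition has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iterations=500, seed=0):
    """Power iteration: unit eigenvector and eigenvalue for the largest eigenvalue."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(cov))]
    norm = dot(v, v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in cov]
        norm = dot(w, w) ** 0.5
        if norm == 0:  # zero matrix: nothing left to find
            break
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in cov])
    return v, eigenvalue


def principal_axes(rows):
    """First two principal directions of the rating data."""
    cov = covariance(center(rows))
    v1, lam1 = top_component(cov)
    d = len(cov)
    # Deflation removes the first component so power iteration finds the second.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    v2, _ = top_component(deflated, seed=1)
    return v1, v2


def project_to_2d(rows):
    """Map each user's five ratings to a 2D point."""
    v1, v2 = principal_axes(rows)
    return [(dot(r, v1), dot(r, v2)) for r in center(rows)]


# Invented slider ratings in [0, 1]: one row per user, one column per proposition.
ratings = [
    [0.9, 0.8, 0.1, 0.2, 0.5],
    [0.8, 0.9, 0.2, 0.1, 0.4],
    [0.1, 0.2, 0.9, 0.8, 0.6],
    [0.2, 0.1, 0.8, 0.9, 0.5],
    [0.5, 0.5, 0.5, 0.5, 0.9],
    [0.5, 0.4, 0.6, 0.5, 0.1],
]
points = project_to_2d(ratings)
```

In this toy data, users 1 and 3 hold opposite views on the first four propositions, so they land on opposite sides of the first axis; the sign of each axis is arbitrary.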

Tahereh Arabghalizi 14:46:22 11/28/2017

Designing games with a purpose: This paper addresses the use of games to collect training data for different artificial intelligence and machine learning tasks. The authors aim to utilize the large amount of time that people spend on games. The goal is to make people want to play because they enjoy it and not because of some monetary benefit. This can be accomplished because people like playing games while collaborating with others to achieve the common goal of winning. The authors present three general classes of games that cover all the GWAPs they have created. They also describe a set of design principles along with a set of metrics for defining GWAP success. In conclusion, this paper is an impressive research contribution because it highlights how we can make crowdsourcing more entertaining and still achieve the desired results.
-----------------------------------------------------------------------------------------
Crowdsourcing User Studies with Mechanical Turk: This paper is an introduction to Mechanical Turk, a crowdsourcing platform hosted by Amazon, where micro-tasks, including a variety of user study tasks, are assigned to a wide user base for some monetary benefit. The main benefits of such a platform are that it gives access to users all over the world and that user studies can be conducted very quickly and in an unbiased way. However, we have no control over the setting or the experiment in general, and many variables can affect the validity of the outcomes. Therefore, as the authors suggest, good task design can lead to ratings that are closer to expert ratings. For instance, survey questions should have definite answers to keep participants from giving unreliable and inaccurate answers. But when the task becomes large, participants might lose interest in completing the user study, so it seems that such platforms may be reliable only for small and local experiments. In conclusion, this paper can be very useful for those interested in crowdsourcing user studies, but the tasks must be carefully designed.

Sanchayan Sarkar 1:18:55 11/29/2017

CRITIQUE 1 (Designing Games With a Purpose) In this paper, the authors posit design principles for channeling human brainpower toward solving computationally intensive artificial intelligence problems. The authors argue that the huge amount of data generated by playing video games can be used effectively to solve AI problems in the future. To that end, the authors give a set of principles for designing such “Games with a purpose” (GWAP). The paper also suggests that the player should not feel he is feeding data to an AI model but simply being entertained. I believe there is merit to this point. Behavioral psychology experiments often require one to do some primary task while a secondary task provides the input. Another merit of this approach is that, with the increase in the number of people connected digitally, data acquisition can be done much more efficiently and abundantly. The authors offer suggestions such as putting time constraints on the work or introducing scoring mechanisms, all to make the game interesting. Capturing the attention of the user would definitely help in gathering data efficiently. The only demerit of this method is that, in order to get effective data for AI, the scenarios need to be set up with special care. I wonder if that would restrict the entertainment factor of the game and thereby fail to grab the user’s locus of attention.
-------------------------------------------------------------------------------------------
CRITIQUE 2 (Crowdsourcing User Studies With Mechanical Turk) In this paper, the authors present the utility of micro-task markets such as Mechanical Turk for collecting user data, along with task design considerations. One of the merits of this paper is that the authors expose one of the issues with Mechanical Turk: the ratings do not correspond that well to those of the expert administrators. A possible solution is given by the authors in experiment 2, where the ratings are augmented with verifiable questions. This leads to a more consistent correlation between the ratings and those of the expert administrators. The paper shows one thing: Mechanical Turk is good for iterative models where one needs rapid prototyping, but it is not reliable as far as validity is concerned. The onus is on the designer to design careful user studies. This relates to current computer vision recognition problems, where Mechanical Turk is being used for high-level semantic recognition. However, the veracity of the ratings remains unreliable. This paper is a good read as it brings to light the importance of a carefully designed user study when dealing with crowdsourced ratings.

Spencer Gray 11:20:25 11/29/2017

In the first paper, Designing Games with a Purpose, the authors do not invent games with a purpose. However, they do create design principles to guide developers toward games that are not only fun but also help solve complex problems that a computer cannot solve on its own. With an increasing share of the world's population having access to the internet, it is logical that more people will use it for both work and play. The idea of harnessing this play, while not invented by the authors, is extremely insightful. It can help with many computationally difficult problems by crowdsourcing the computation in a way that the players do not even realize they are helping. This paper is significant in the HCI field because it provides guiding principles for designers to ensure that players contribute to these computations in a way that benefits both parties. Developers can use these guidelines as a starting point when designing their own games with a purpose. The availability of these guidelines will make such games easier to create and result in a large benefit to the artificial intelligence community. In the second paper, Crowdsourcing User Studies With Mechanical Turk, the authors describe a paradigm for recruiting users and collecting results in user studies in a cheap and efficient way. They analyze Amazon Mechanical Turk, an existing framework for crowdsourcing these user studies. Crowdsourcing user studies is extremely important because it is very costly to recruit and compensate users for a study. This leads to either high costs or a small number of users in a study, and a study with few users yields less accurate findings than one with many participants. This paper is important in the HCI field because it warns against the blind use of micro-task markets such as Mechanical Turk. The authors find micro-task markets to be very important and very helpful, but the studies must be designed carefully, especially those with qualitative questions. This is significant because it can completely change the way we do user studies, making them more accurate and widespread, as long as they are designed with care.

Mingzhi Yu 16:01:45 11/29/2017

Designing Games with a Purpose. This paper presents a series of design principles for designing and evaluating games that are built for a purpose. These games share a common feature: the players perform tasks while playing, and those tasks cannot be achieved by computers. This idea is interesting but not very new to us today. The concept seems similar to the concept of "Big Data." However, this does not diminish the contribution of the work, because it provides general design guidelines for many games of this kind (GWAPs) and proposes approaches to evaluate them. Designing a game that is not only entertaining but can also collect the desired information from users is not an easy task. The authors mention three templates: input-agreement, output-agreement, and inversion-problem. The authors also discuss how to ensure the correctness of the collected information through labeling and revision. In general, this work highlights the idea of using crowdsourcing during entertainment to achieve some purpose, which is valuable and worth thinking about. Crowdsourcing User Studies With Mechanical Turk. Today, I believe MT is not new or unfamiliar to most researchers. It is a platform designed by Amazon that enables crowdsourcing of user studies. Not only in academia but also in industry, user studies are a very significant process for evaluating a product. However, some studies are long, and it is always hard to recruit participants (unless they are paid well). MT aims to solve this problem by explicitly hiring workers on the internet for a reasonable payment. The workers can be recruited from all over the world, so a study can obtain however many participants it requires. This is a brilliant idea in general, even though we might sometimes doubt the correctness of the responses, since there are so many factors that are not entirely under the control of the experimenter. The status of a participant might be falsely reported, which could hurt the credibility of the studies. How to address these issues is crucial to the future of MT. In general, as the authors say in the paper, the platform is promising.

Xiaoting Li 20:45:43 11/29/2017

1. Designing Games with a Purpose: “Games with a purpose” are being used in the field of artificial intelligence because of three motivating factors: an increasing proportion of the world’s population has access to the Internet, certain tasks cannot be completed by computer programs, and people spend lots of time playing games. In this paper, the authors present general design principles for the development and evaluation of “games with a purpose”. The authors articulate three GWAP “templates” representing three general classes of games that cover all GWAPs created to date. In addition, the authors present several guidelines for designing GWAPs so that they are more efficient and productive and at the same time more entertaining. The authors use the ESP Game and Verbosity as examples to help the audience understand the ideas presented in the paper. 2. Crowdsourcing User Studies with Mechanical Turk: Even though micro-task markets such as Amazon’s Mechanical Turk offer a potential paradigm for engaging a large number of users at low time and monetary cost, not every task is a good candidate for such platforms. In this paper, the authors carry out two experiments on Amazon’s Mechanical Turk to investigate the utility of a micro-task market for collecting user measurements. The authors discuss the advantages and limitations of such platforms and propose some design recommendations.

Mehrnoosh Raoufi 23:04:51 11/29/2017

Designing Games With a Purpose: In this paper, the idea of a "game with a purpose", or GWAP, is explored. These games let users solve computational problems while they play, without being aware of it. Users simply enjoy the game because it entertains them. Moreover, playing GWAPs produces data with which AI algorithms can be trained. The tasks involved are ones that humans can perform easily but machines cannot. For instance, the authors mention sample games that lead users to provide useful descriptive annotations for images. The paper introduces three types of games that can successfully be used as GWAPs: output-agreement games, inversion-problem games, and input-agreement games. One of the challenges in this work is how to determine the accuracy of a result produced by a user playing a game, and how to weigh different results produced by different users playing the same game. These challenges limit the types of games that can be considered GWAPs. For future work, the authors indicate that further kinds of games, in addition to the three types they present, can be explored. I think it is an appealing research direction since it leverages the capability of the human brain to solve hard problems. In my opinion, it is a promising approach because the human brain can do better than a machine in some contexts.
----------------------------------------------------------------------------------
Crowdsourcing User Studies with Mechanical Turk: This paper investigates the potential of micro-task markets for general-purpose user studies. Amazon's Mechanical Turk, a micro-task market, is the case study in the paper. Since conducting a user study has always involved a trade-off between time and cost, the authors suggest that micro-task systems may offer user studies that are both low-cost and less time-consuming. To explore their hypothesis, they conducted two experiments. In the first experiment, they asked Mechanical Turk users to rate 14 Wikipedia articles based on principles given on the Wikipedia website. They then compared the collected ratings with reviews by Wikipedia admins. The results showed only a marginal correlation between turkers' ratings and those of the experts, and a high proportion of the ratings were suspect. In the second experiment, they made a subtle modification: before asking users to rate, they gave them some warm-up questions so that they became familiar with the primary principles of rating. This time, the results were more promising: turkers' ratings were close to those of the experts. The authors conclude that micro-task markets can be promising platforms for conducting various user studies if practitioners follow three recommendations. First, it is important to have verifiable questions as part of the task. Second, the task should be designed so that completing it accurately requires as much effort as, or less effort than, completing it maliciously or randomly. Third, it is useful to have multiple ways to detect suspect responses. To my mind, it was an interesting paper because it introduces a new perspective on user studies.
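The comparison between turker ratings and expert ratings described above can be sketched as follows. Everything here is an illustrative assumption: the ratings are invented, the field name `passed_check` is a hypothetical stand-in for "answered the verifiable questions correctly", and Pearson correlation is used only as one common choice, since the critiques do not say which statistic the paper reports.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / (spread_x * spread_y)


def mean_rating_per_article(responses, keep=lambda r: True):
    """Average the ratings of each article over the responses that pass `keep`."""
    by_article = {}
    for r in responses:
        if keep(r):
            by_article.setdefault(r["article"], []).append(r["rating"])
    return {a: sum(v) / len(v) for a, v in by_article.items()}


def correlation_with_experts(responses, expert, keep=lambda r: True):
    """Correlate per-article mean worker ratings with expert ratings."""
    means = mean_rating_per_article(responses, keep)
    articles = sorted(a for a in expert if a in means)
    return pearson([expert[a] for a in articles], [means[a] for a in articles])


# Invented data: expert ratings and worker ratings for four articles.
expert_ratings = {"A": 2, "B": 4, "C": 6, "D": 7}
responses = [
    {"article": "A", "rating": 3, "passed_check": True},
    {"article": "A", "rating": 7, "passed_check": False},
    {"article": "B", "rating": 4, "passed_check": True},
    {"article": "B", "rating": 1, "passed_check": False},
    {"article": "C", "rating": 5, "passed_check": True},
    {"article": "C", "rating": 4, "passed_check": False},
    {"article": "D", "rating": 7, "passed_check": True},
    {"article": "D", "rating": 2, "passed_check": False},
]

r_all = correlation_with_experts(responses, expert_ratings)
r_checked = correlation_with_experts(
    responses, expert_ratings, keep=lambda r: r["passed_check"]
)
```

With this toy data, dropping the responses that failed the check raises the correlation with the experts sharply. That mirrors the direction of the reported result only; it does not reproduce the paper's numbers.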

Yuhuan Jiang 23:18:53 11/29/2017

Paper Critiques for 11/30/2017 == Game With a Purpose == This paper describes games that both label data and entertain people. The authors coined the term Game with a Purpose (GWAP) for this type of game. The major contribution is the set of guidelines for building effective GWAPs. For example, a GWAP should be designed to be enjoyable. Players should not be explicitly instructed to produce labels. Instead, the labels should be produced implicitly as the users entertain themselves with the game. The paper also discusses how to increase the enjoyment of players. Three basic templates can be used as a starting point. An example GWAP, the ESP Game, is analyzed. == Crowdsourcing User Studies With Mechanical Turk == This paper discusses using Amazon Mechanical Turk for conducting user studies. It describes an experiment in which participants were asked to assess the quality of Wikipedia articles. The authors found only a weak correlation between Mechanical Turk user ratings and expert ratings. This result does not support the utility of Mechanical Turk, and it further reveals the susceptibility of the system to malicious user behavior. A modified version of the experiment was then conducted. This time, the rating task was changed to include quantitative and verifiable questions. A statistically significant positive correlation between Mechanical Turk and Wikipedia admin ratings was found. The main point the paper makes with these experiments is that when researchers conduct user studies using Amazon Mechanical Turk, the design of the experiment significantly affects the validity of the results.

Xingtian Dong 23:28:28 11/29/2017

1. Reading critique for ‘Designing Games with a Purpose’ This paper is really interesting. The authors argue that games should not focus only on entertainment but should also have a purpose. Games can collect data that is useful for AI while users are playing them. The authors also classify games into three classes: input-agreement games, output-agreement games, and inversion-problem games. What’s more, the authors lay out some principles for designing a successful game with a purpose, such as that the game should include randomness and should be difficult enough. Reading this paper reminded me that UPMU wanted to recruit students to develop games for patients, which would collect data about the patients while they play. Games with a purpose benefit both users and scientists: users are more willing to take part in experiments, and scientists can collect more data. This is really a win-win idea. 2. Reading critique for ‘Crowdsourcing User Studies with Mechanical Turk’ This paper is also about how to collect more data, but here the authors aim to run a large, accurate experiment online at lower cost. There are two studies in this paper, and the second has only small changes compared with the first. These experiments show that principles for user studies in a micro-task market are needed. One such principle is that verifiable questions, which ensure user effort, are a good way to encourage truthful responses. This paper provides ideas for making user studies better and cheaper. I think it reads more like a marketing paper, but it is also really useful for designing user studies for our own research. Besides, security is a big issue for online user studies, and we should try to avoid information leaks. In any case, micro-task markets will become more and more popular.

MuneebAlvi 0:12:29 11/30/2017

Critique of Designing Games with a Purpose: Summary: This reading describes various GWAPs, which allow users to play games while also teaching computers information that would not be easy for them to obtain themselves using methods like computer vision and machine learning. I had heard of games with a purpose before, but I was not aware of any examples. This reading describes various examples, and their benefits are pretty clear. For example, the ESP game allows computers to learn what objects are contained within images. This could probably be applied to Google Images' algorithms to obtain more accurate results. Verbosity could probably be used to make Google autocomplete more accurate or to bring the final search results closer to what the user was looking for. I wonder if many games that are designed purely for entertainment could be used to teach computers. Maybe they could use the voice communication between players to learn idioms or to learn how players work together in teams. Of course there are some privacy concerns, so maybe this data should be anonymized. One game that added a learning component later is Forza Motorsport. The fifth entry added Drivatars, which allow the AI to drive similarly to the players it is trying to copy. This could potentially be used to learn about player driving patterns in other situations. Critique of Crowdsourcing User Studies with Mechanical Turk: Summary: This reading describes how micro-task communities such as Amazon's Mechanical Turk can be used to conduct experiments in a cost- and time-efficient manner. I had never thought of using Mechanical Turk to assign tasks; I had always thought of it in terms of performing tasks for others. Therefore, I appreciated the authors' unique take of seeing whether an experiment with usable results could actually be conducted on such a site. Of course, the experimenters took a lot into account, such as malicious users, which is to be expected when crowdsourcing. I think the methods used in this paper show that crowdsourcing can be used for more than completing tasks or earning money. By conducting experiments this way, many participants can be recruited. However, because the users are in a nearly uncontrolled environment, the methods have to be constructed very carefully. The surveys and all methods should be designed with the understanding that participants might not cooperate in the way the experimenters would hope.

Akhil Yendluri 1:29:17 11/30/2017

Designing Games with Purpose
In this paper the authors talk about using games as a medium to solve today's computational problems. It is a very interesting concept, as everyone loves to play games. Games are now also used to teach difficult concepts to kids in a way that they understand. This approach of using games to solve problems is quite interesting and can be very effective too. The authors discuss GWAPs, or "Games With A Purpose", and present general design principles for the development and evaluation of such games. These principles help create a smooth transition between computation and gameplay. The authors present three GWAP templates but also conclude that many more remain to be analyzed. The authors close by saying that GWAPs contribute to the growth of AI in computers and hope that this will be a direction for future research.
Crowdsourcing User Studies With Mechanical Turk
This paper talks about the importance of user studies, such as laboratory studies. The authors compare the basic advantages and disadvantages of conventional user study methods with those of micro-task markets such as Amazon Mechanical Turk. The authors conduct multiple experiments to establish the relationship between the two. They conclude that Amazon's Mechanical Turk is effective for tasks ranging from rapid prototyping to quantitative performance measures. Users are more readily available than with traditional methods, but the authors conclude that the task must be designed carefully by analyzing the issue.

Ahmed Magooda 1:35:45 11/30/2017

Designing Games With A Purpose (GWAP): In this paper the authors discuss methods for designing games so that humans perform computations as part of the game, where these computations and interactions can be used as data for training computer systems. The authors focus on games that train computers by identifying knowledge through competition. The GWAP approach is motivated by three factors: (1) an increasing number of people have access to the internet; (2) what is easy for a human to perform is sometimes quite difficult for a computer to replicate; (3) people spend a lot of time playing video games. The authors introduce game types that can be modeled: output-agreement games, input-agreement games, and inversion-problem games. The authors then describe various methods to increase player enjoyment and output accuracy. Finally, the authors discuss the key parameters they selected to evaluate a GWAP: throughput, lifetime play, and expected contribution. The importance of these parameters is that they balance measuring quality and efficiency in both computer and human contexts.
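The three evaluation parameters named in this critique can be made concrete with a short sketch. It assumes the usual reading of the metrics: throughput is problem instances solved per human-hour, average lifetime play is the total time a player spends on the game averaged over players, and expected contribution is their product. The session log and function names are invented for illustration.

```python
def throughput(sessions):
    """Problem instances solved per human-hour, over all logged sessions."""
    total_solved = sum(solved for _, _, solved in sessions)
    total_hours = sum(hours for _, hours, _ in sessions)
    return total_solved / total_hours


def average_lifetime_play(sessions):
    """Total hours each player spent on the game, averaged across players."""
    hours_by_player = {}
    for player, hours, _ in sessions:
        hours_by_player[player] = hours_by_player.get(player, 0.0) + hours
    return sum(hours_by_player.values()) / len(hours_by_player)


def expected_contribution(sessions):
    """Problem instances a single player is expected to solve over their lifetime."""
    return throughput(sessions) * average_lifetime_play(sessions)


# Hypothetical session log: (player id, hours played, problem instances solved).
sessions = [
    ("p1", 0.5, 100),
    ("p1", 1.0, 220),
    ("p2", 0.5, 80),
    ("p3", 2.0, 400),
]

tp = throughput(sessions)              # 800 solved / 4.0 hours
alp = average_lifetime_play(sessions)  # (1.5 + 0.5 + 2.0) / 3 players
ec = expected_contribution(sessions)
```

Keeping throughput and lifetime play separate is what lets the metrics distinguish an efficient but boring game from an enjoyable but unproductive one.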

Ronian Zhang 1:51:45 11/30/2017

Designing games with a purpose: In this paper, the authors discuss games that are intended to collect data from players as a solution to current problems. The paper begins by giving the average number of hours that young Americans spend playing games. Then it gives an example of a game that helps label images, where the data is collected while the players are being entertained. The paper discusses general ways to develop such games and make them enjoyable. Adding a timer gets the player more involved. Keeping scores and adding ranks or a high-score list better motivates players. Adding more randomness makes the game more challenging. Assigning random partners is also a good way to motivate repeated play. The best part of the paper is that it not only talks about basic rules for designing this kind of game but also tries to provide a systematic way to evaluate these games. It uses the average number of problems solved per hour and the average overall amount of time the game is played by a player to evaluate the contribution. If the paper had evaluated several games and compared them using the proposed evaluation method, it might be more convincing.
-----------------------------------------------------------------------------------------
Crowdsourcing User Studies With Mechanical Turk: In cs2001, Dr. Adam said that in his own HCI project he used the Amazon Mechanical Turk platform to run the user study. Since we learned in class that the physical environment is important and that we need to control as much of it as we can, I truly wondered whether data collected on this platform would have this issue. This paper answers my question: it is true that users' conditions are hard to validate and that there is no full control over a user's own experimental setting. The paper says that the platform has been used to collect data for data mining or machine learning. The authors argue that the system could also be used as an evaluation system for conducting evaluative user studies. They conducted several experiments to show that the platform can be used in this way, but they also point out its possible limitations.

Ruochen Liu 8:58:37 11/30/2017

Designing Games with a Purpose: People, especially youngsters, spend a huge amount of energy and time playing games. They play games just for entertainment. But if we add other purposes into the design process of games, maybe we can have a win-win situation and make good use of this manpower and time to solve computational problems and train artificial intelligence algorithms. This is an interesting and promising area, and it is the topic of this paper. “Games with a purpose” are also called GWAPs. Many GWAPs, such as the ESP Game, Peekaboom, Phetch, and Verbosity, have been created. By studying them, the authors identify three game-structure templates that generalize successful instances of human computation games: output-agreement games, inversion-problem games, and input-agreement games. By designing games in these ways, computation and gameplay can be integrated easily. One possible common drawback of the three templates is that they all rely on agreement between players to ensure output correctness. This may not be suitable for creative and complicated games. GWAP, as a promising approach, can provide a huge amount of training data for AI. However, much work remains to be done. Crowdsourcing User Studies with Mechanical Turk: As we all know, user studies play an important role in the design process. Cost may be a main factor in conducting a user study. This paper introduces micro-task markets, such as Amazon's Mechanical Turk, as a potential way to run user studies at low cost in time and money. Micro-task markets are markets in which users can post tasks to be completed by others at specified prices. Personally, I think that besides supporting user studies, it is also a great platform for acquiring huge amounts of AI training data. Several limitations of Mechanical Turk are also mentioned in this paper. These limitations and drawbacks must be taken into account in the design of user studies. I am very willing to use this powerful and cost-efficient platform to conduct user studies in the future.
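The output-agreement template discussed in the GWAP part of this critique can be sketched in a few lines: two players see the same input, and a label is accepted only when both produce it independently. The function name, the turn-by-turn alternation, and the taboo-word list are simplifying assumptions for illustration, not a description of how the ESP Game is implemented.

```python
from itertools import zip_longest


def first_agreed_label(guesses_a, guesses_b, taboo_words=()):
    """Return the first label typed by both players, or None if they never agree.

    Guesses are processed in alternating turns (A, then B). Labels on the
    taboo list are ignored, so players must find a new description.
    """
    taboo = {w.strip().lower() for w in taboo_words}
    seen_a, seen_b = set(), set()
    for a, b in zip_longest(guesses_a, guesses_b):
        if a is not None:
            a = a.strip().lower()
            if a not in taboo:
                seen_a.add(a)
                if a in seen_b:  # player B already typed this label
                    return a
        if b is not None:
            b = b.strip().lower()
            if b not in taboo:
                seen_b.add(b)
                if b in seen_a:  # player A already typed this label
                    return b
    return None


# Hypothetical guesses from two players looking at the same image.
player_a = ["dog", "grass", "ball"]
player_b = ["puppy", "ball", "dog"]

label = first_agreed_label(player_a, player_b)
label_with_taboo = first_agreed_label(player_a, player_b, taboo_words=["ball"])
```

Because neither player can see the other's guesses, a match is unlikely unless the label really describes the input, which is the agreement-based correctness check this critique identifies as a shared limitation.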