Evaluation 1

From CS2610 Fall 2016

slides


Readings

Reading Critiques

Haoran Zhang 15:00:23 10/5/2016

Methodology Matters: Doing Research in the Behavioral and Social Sciences: In this paper, the author discusses the importance of methodology when doing research in the behavioral and social sciences. The paper does not talk about human-computer interaction or computer science directly; the author is in fact a psychologist. But I think HCI is connected to psychology whenever we want to run experiments, because in an HCI experiment we are really trying to explore the human mind, which is the same thing psychologists do. Thus, when designing HCI experiments, we need to be careful about the research process. Results depend on methods, and every method has its own limitations, so any set of results is limited. From this paper we learn what the limitations of each method are, so when we design experiments we can focus on the methods that are most helpful for our purposes. It is also not possible to maximize all desirable features of method in any one study; trade-offs and dilemmas are involved. In addition, each study must be interpreted in relation to other evidence bearing on the same questions. ----- Evaluating User Interface Systems Research: Unlike the first paper, which covered what to watch for when designing a study in the behavioral and social sciences, this paper talks about evaluation in user interface systems research. The author introduces a set of criteria for evaluating new UI systems work, believing that future systems will support more and more ways of interacting with computers: not only desktop computing but also off-the-desktop, nomadic, and physical devices and their software systems. At that point the old, simple usability-testing methods will no longer be sufficient. When we design a test we need to be careful not to fall into a trap, for example the usability trap, the fatal flaw fallacy, or the legacy code requirement. To evaluate the effectiveness of systems and tools, we need to consider situations, tasks, and users; importance; problems not previously solved; generality; reducing solution viscosity; empowering new design participants; and power in combination. We should also ask whether the new UI system can scale up to large problems. These are the questions the tests need to answer.

Tazin Afrin 14:50:10 10/7/2016

Critique of “Methodology Matters: Doing Research in the behavioral and social sciences”: The strategy, tactics, and operations of research methodology in the behavioral and social sciences are directly tied to the empirical evidence the research produces. Hence, to understand the evidence we have to understand the limitations and strengths of the methodology. In this chapter Joseph McGrath describes some components and tools for performing meaningful research in behavioral and social science. In this area of research, content, ideas, and techniques are the important set of three. More formally, the substantive domain has phenomena as its elements and patterns of phenomena as the relations between them; the conceptual domain contains properties and relations; and the modes of treatment are drawn from a methodological domain. Depending on which method is chosen, an experiment may have certain limitations and weaknesses, so multiple methods can be chosen to increase diversity. The goal of the experiment is to maximize the evidence over the population of actors, which is called generalizability, and to maximize precision and realism within the context. But it is not always possible to maximize all of them, hence the quadrant strategy is followed. The four quadrants are field strategies, experimental strategies, respondent strategies, and theoretical strategies. The author also holds that the idea of validity is central to any research methodology. Internal validity concerns whether the observed result can really be attributed to the factors being studied rather than to confounds, and if the ideas behind a study are not well defined, the study is prone to threats to construct validity. So overall, the evidence should be interpreted in terms of the strengths of the methodology; at the same time, these limits should not be seen as a flaw of the research but rather as a challenge. ------------------------------------------------------------------------------------- Critique of “Evaluating User Interface Systems Research”: The study presents a new set of criteria for evaluating user interface systems, because new devices may involve portable instruments and new software for interactive applications, where simple usability testing may not be sufficient; instead, proper evaluation criteria are required. First the author tries to identify the errors in evaluation and the effectiveness of evaluation. He asks why we need to evaluate a user interface system at all, and why we need a new system when a stable one is already running, like a Windows system. The answer is the forces for change and the value added by the architecture. Hardware and operating systems are constantly changing and can support more interactive work than before. Also, the generation of people who use user interface systems today is extremely comfortable with UIs and can easily learn and get used to a new interface. And with new architectures, faster, better, and more scalable products can be designed, which pushes application developers forward. To evaluate a system we need to understand the situations, the tasks, and the users; in the context of this STU triple, we also need to establish importance. Some problems have not been solved before, and solving such a problem is one of the most compelling claims for a tool. A new solution is also much stronger if it is generalizable. The evaluation of a new system shows whether there is evidence of contribution and progress. Simple metrics are easy to calculate, but they are not always meaningful. At the same time, the developer must avoid the trap of building only the components that the measurements can capture. Using the techniques described in this paper, complex systems can be compared with each other, which is why I found this paper very interesting.

Zhenjiang Fan 16:46:01 10/9/2016

Methodology Matters: Doing Research in the behavioral and social sciences ::::: As the author says, there are three basic elements in behavioral and social science research: content, ideas, and techniques and procedures. By content, the author means what research area, target, or theme you are going to focus on. By ideas, the author means your assumptions or views on the research subject: the assumptions you are going to prove or verify through experiments or studies. Techniques and procedures involve what tools, methods, and procedures you are going to use, including a plan for evaluating the experimental results. For a formal study of the subject, the author refers to these three sets of things as three distinct, though interrelated, domains: the substantive domain, from which we draw contents that seem worthy of our study and attention; the conceptual domain, from which we draw ideas that seem likely to give meaning to our results; and the methodological domain, from which we draw techniques that seem useful in conducting that research. The work then discusses each of these in turn. The author focuses mainly on the third, the methodological domain, which is reasonable given that researchers spend most of their time there: how to set up the experimental environment, how to manipulate features of the experimental systems, and so on. On techniques for manipulating features of an experiment, the author gives some tips: giving instructions to participants (e.g., trying to motivate them to try hard by telling them there will be a valuable prize for the best product); imposing constraints on features of the environment; selecting materials for use; giving feedback about prior performance; and using experimental confederates. The author then turns to which research methods and which research strategies we should use when conducting behavioral or social science research, and provides some specific strategies in detail. Since many research strategies are available, it is essential to choose the best one for our experiments. The author covers many things we need to pay attention to when conducting behavioral or social science research, but inevitably throws out too many concepts for every single sub-topic, which can sometimes lead to confusion. ::::: Evaluating User Interface Systems Research ::::: The paper begins by focusing on why current usability testing is not adequate for evaluating complex systems, though it then jumps to some seemingly irrelevant topics before concluding that we need appropriate criteria for evaluating systems architectures. As the paper mentions, a good UI system brings several benefits to the table: reduced development viscosity, least resistance to good solutions, lower skill barriers, power in common infrastructure, and enabling scale. As the paper states, the usability trap, the fatal flaw fallacy, and the legacy code requirement are ways in which misapplied evaluation methods can damage the interaction field. After discussing how these misapplied methods can distort evaluation, the paper presents its own criteria for evaluating systems: whether the interactive technology addresses a specific context, that is, the combination of a set of users, some tasks, and a set of situations; whether it demonstrates importance; whether it solves a problem left unsolved previously; whether it can be used generally; whether it reduces solution viscosity; whether it empowers new design participants; whether it supports combinations of more basic building blocks; and whether it can scale up. I think the paper mixes too many evaluation elements into its framework, which makes the evaluation system more complex, and some of these factors are sometimes not that important.

Keren Ye 0:47:49 10/10/2016

Methodology Matters: Doing Research in the behavioral and social sciences: As mentioned by the author, “this chapter is about some of the tools with which researchers in the social and behavioral sciences go about "doing" research. It raises some issues about strategy, tactics and operations. Especially, it points out some of the inherent limits, as well as the potential strengths, of various features of the research process by which behavioral and social scientists do research.” At the very beginning, the author talks about the basic features of the research process. He refers to them as three distinct, though interrelated, domains: the substantive domain, the conceptual domain, and the methodological domain, and gives detailed explanations with examples. Next, the author emphasizes the dilemma of research methods. He states that 1) methods enable but also limit evidence; 2) all methods are valuable, but all have weaknesses or limitations; 3) you can offset the different weaknesses of various methods by using multiple methods; 4) you can choose such multiple methods so that they have patterned diversity, that is, so that the strengths of some methods offset the weaknesses of others. So how can the dilemma be resolved? The author then discusses research strategies. First, he states that the objectives of research include generalizability, precision, and realism. Then the strategies are presented: the field strategies, the experimental strategies, the respondent strategies, and the theoretical strategies. The author covers study design, comparison techniques, and validity next. On comparison, he emphasizes assessing associations and differences, and proposes randomization and “true experiments” to help address these problems. He also covers sampling, allocation, and statistical inference approaches, and finally how to validate the findings. In the final part, the author talks about classes of measures and manipulation techniques. He first discusses potential classes of measures in social psychology, then explains six classes of data collection methods in detail: self-reports, trace measures, observations by a visible observer, observations by a hidden observer, public archival records, and private archival records, along with the strengths and weaknesses of each. Regarding the manipulation of variables, the author discusses selection, direct intervention, and inductions. The conclusions drawn by the author are quite important: 1) results depend on methods, all methods have limitations, and hence any set of results is limited; 2) it is not possible to maximize all desirable features of method in any one study; trade-offs and dilemmas are involved; 3) each study must be interpreted in relation to other evidence bearing on the same questions. Evaluating User Interface Systems Research: In this paper, the author claims that simple usability testing is not adequate for evaluating complex systems. The paper therefore explores the problems with evaluating systems work and presents a set of criteria for evaluating new UI systems work. The author first provides the background of the study and answers the question “why UI systems research?”. Simply speaking, there are two reasons: the forces for change, and the value added by UI systems architecture.
The paper next discusses the traps: the usability trap, the fatal flaw fallacy, and the legacy code requirement. To evaluate the effectiveness of systems and tools, the author offers some suggestions: 1) it is critical that interactive innovation be clearly set in a context of situations, tasks, and users; 2) before all other claims, a system, toolkit, or interactive technique must demonstrate importance; 3) focus on problems that have not previously been solved; 4) use generality to evaluate the solution: the greater the diversity and the larger the number of demonstrated solutions, the stronger the generality claim; 5) reduce solution viscosity; 6) empower new design participants; 7) seek power in combination. In sum, this paper shows a variety of alternative standards by which complex systems can be compared and evaluated. When we design a testing procedure, we should avoid the traps mentioned in the paper and examine our evaluation strategy carefully.

Steven Faurie 13:08:30 10/10/2016

Steve Faurie Methodology Matters: Doing Research in the Behavioral and Social Sciences: This paper describes the research process used in the behavioral and social sciences. The author describes three main features of the research process: the substantive domain, the conceptual domain, and the methodological domain. The substantive domain contains the actual things we are studying; it could be something like a person doing something. The conceptual domain contains the ideas behind those actions that make them interesting and worth studying. The methodological domain is how we study something: the techniques used to gather and measure data. The author notes that the methodological domain contains the more familiar “dependent” and “independent” variables. The chapter goes on to describe how to choose a setting for your study. It points out that you want to consider generalizability (will your findings hold in other settings?), precision (the ability to accurately measure your observations), and realism (how well the setting relates to the setting about which you hope to draw conclusions). The chapter then describes field studies, which involve going out and observing some phenomenon in the context in which it takes place; the researcher should try to be relatively unobtrusive. Experimental strategies involve conducting experiments. Respondent strategies are essentially surveys, or observations of responses to questions or stimuli the experimenter created. Theoretical strategies are intended to be generalizable; they are often based on previous work and might not involve any direct observations, and computer simulations also fall into this category. The paper then discusses comparisons in studies. An interesting topic it brings up is base rates: if you don't know how common something is in the general population, it is difficult to come to an experimental conclusion about it. If you run an experiment and see that 1 in 5 people do x in situation y, it might also be true that 1 in 5 people do x in any situation. Causation vs. correlation is also discussed, as is the difference question, which basically asks whether x is present when y is present and absent when y is absent; the paper describes ways to answer it. The paper also presents randomization as a way to make up for not being able to study every single combination of variables that could show up in your experiment (see the sketch after this paragraph). For instance, rather than including age as one of your variables, you might randomly select people from the population so that you can ignore age as a variable and generalize across it. Sample size and results are discussed as well, as are types of experimental validity. Internal validity is a measure of how confidently you can say that the independent variable in your experiment caused the observed value of the dependent variable. Construct validity asks how well thought out the theory behind your experiment is. External validity concerns how well your results generalize beyond the experimental setting, which is crucial for making real, meaningful scientific discoveries. Threats to validity are reviewed as well. The paper goes on to describe ways to measure variables: observation, self-reports, trace measures (basically looking at evidence left behind), and archival records. The paper finishes by describing techniques for manipulating variables in an experiment.
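(A side note from me, not from McGrath's chapter: a minimal Python sketch of what random assignment plus a "difference question" test might look like in practice. The participant pool, group sizes, and completion times below are all fabricated for illustration, and the t-test is just one of several tests that could be used.)

 import random
 from scipy.stats import ttest_ind  # two-sample t-test

 # Randomly assign participants to two conditions so that uncontrolled
 # variables (age, skill, ...) wash out on average across the groups.
 participants = [f"p{i}" for i in range(40)]
 random.shuffle(participants)
 control, treatment = participants[:20], participants[20:]

 # Fabricated task-completion times (seconds), purely illustrative.
 control_times = [random.gauss(60, 10) for _ in control]
 treatment_times = [random.gauss(52, 10) for _ in treatment]

 # The "difference question": do the groups differ beyond chance?
 t_stat, p_value = ttest_ind(treatment_times, control_times, equal_var=False)
 print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # small p: unlikely by chance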
Evaluating User Interface Systems Research: This paper focuses on finding ways to evaluate tools used to develop new user interface designs. An interesting portion of the paper focuses on evaluation errors. One of these is that most studies assume users have minimal training with a device and can just walk up and use it. That is fine in some cases, but for some domain-specific pieces of software this test falls apart, or it prevents someone from developing a UI tailored to the expert users who would actually be using the system. Another downfall pointed out is focusing on the question “what does this system not do?”; the author argues that focusing on that question can prevent actual advancement of ideas. I particularly like the section about the legacy code requirement: new systems should not be forced to implement versions of old legacy code if doing so compromises the abilities of the new system. In the section describing the actual evaluation, the paper talks about taking into consideration the STU context: the situations, tasks, and users for a given piece of software. To evaluate the interface, you need to understand those three items. The paper discusses other things to consider in any type of software development, including the importance of the project; creating something that lets people do something they don't want to do really isn't useful. Also, does the solution solve a problem that nothing else solved before? Interestingly, the author doesn't just mean a single task: if you create something that solves only a single task, you have created an application, not necessarily a new type of interface. Ideally a strong system would be general purpose as well. Would it be used by multiple populations, like a phone OS or desktop OS, or is it domain specific, like an interface only doctors would use? If we're developing tools for people, we want those tools to support quick iteration over solutions; we don't want a toolset that can only address issues in one way. You also need to evaluate the expressiveness of a solution: how many actions and decisions does a user need to make to accomplish what they intend? The fewer the better. The example given in the paper is dragging and dropping components like buttons onto a window rather than writing the window design by hand in C++; this is how Visual Studio currently lets users build simple Windows applications (a small sketch after this paragraph illustrates the idea). A bonus that can make a system better is being easy enough to use that new populations of people can take part in interface design without necessarily being software developers. Other important things to evaluate in UI design systems are how many different things it can express: can you combine the components of the system to build almost anything, or is it severely constrained? We must also look at how difficult it is to add an additional component to the system. If you add a button, you wouldn't want to have to make every other component of the system aware of it; that would require too much overhead, and as systems became more complex the cost of adding features would grow considerably with each addition. This relates to the question of whether the solution can scale up: will it allow the design of systems that can solve complex problems, or will we be limited to simple one-question, button-click applications?
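(Again a sketch of my own, not the paper's example: one way to see "expressive match" and power in combination concretely is a declarative UI description interpreted by one generic loop, so each new widget costs a line of data rather than new construction code. The names UI_SPEC, WIDGETS, and build are invented for illustration, and tkinter merely stands in for any toolkit.)

 import tkinter as tk

 # Declarative description of a window: close to the designer's intent.
 UI_SPEC = [
     ("label", {"text": "Name:"}),
     ("entry", {}),
     ("button", {"text": "OK"}),
     ("button", {"text": "Cancel"}),
 ]

 WIDGETS = {"label": tk.Label, "entry": tk.Entry, "button": tk.Button}

 def build(root, spec):
     # One generic loop replaces hand-written per-widget construction code.
     for kind, options in spec:
         WIDGETS[kind](root, **options).pack(side=tk.TOP, fill=tk.X)

 root = tk.Tk()
 build(root, UI_SPEC)
 root.mainloop()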

Alireza Samadian Zakaria 21:52:42 10/10/2016

The first paper talks about some of the strategies and techniques with which researchers in the social and behavioral sciences do research. According to this paper, there are three important domains in any research effort: the substantive domain, the conceptual domain, and the methodological domain, and each domain has elements and relations between elements. This paper mostly focuses on methodology. In the methodological domain, the elements are the modes of treatment, also called methods, and the relations have to do with the application of comparison techniques involving different variables, such as independent and dependent ones. Methods are regarded as bounded opportunities to gain knowledge about phenomena. They are called bounded because they have limitations; for instance, questionnaire respondents may try to appear competent. However, we can overcome these limitations by using multiple methods in such a way that the strengths of some methods offset the weaknesses of others. Thus, it is important to know about strategies, comparison techniques, designs, and methods. Regarding strategies, when gathering research evidence we should try to maximize generalizability, precision, and realism. However, it is not possible to maximize all three criteria simultaneously; each method maximizes one of them while decreasing the others. The paper describes four groups of strategies: the field strategies, the experimental strategies, the respondent strategies, and the theoretical strategies. Regarding comparison techniques, we need to answer relational questions in the form of base rates, correlations, and differences. A base rate is how often a phenomenon (Y) occurs in the general case; by knowing this, we can decide whether the rate of that phenomenon in a particular case is high or low. Correlational questions ask whether there is systematic covariation in the values of two or more variables. Another important aspect of any research is validity, which comes in different types: internal, construct, and external. Yet another important part of research is making a record of behavior; there are six classes of data collection methods for this purpose: self-reports, observations by a visible or hidden observer, public and private archival records, and trace measures. Each of these classes has strengths and weaknesses that should be considered, and the author surveys them in detail. ----- The second paper is about how we should evaluate new user interface systems so that true progress is made. According to this paper, simple usability testing is not adequate for evaluating complex systems, so it is not enough for evaluation. The author first notes the many values that UI systems architectures bring to the table, such as reduced development time or the ability to design a UI by drawing rather than by code; these values can then be used to identify the claims made for a system. To evaluate such a system we should know the evaluation errors so that we can avoid them; three are discussed in the paper: the usability trap, the fatal flaw fallacy, and legacy code. Furthermore, we need to situate the claims: we should know the situations, tasks, and users. Knowing these three concepts lets us talk about the importance of the system in its context. The claim for a new solution is stronger if the problem has not previously been solved or if the new application can perform many tasks. Regarding UI tools, it is also good to know whether a tool reduces solution viscosity through flexibility, expressive leverage, and expressive match. In addition to these claims, which mostly concern the speed or ease with which a UI can be designed, tools can also introduce new populations to the UI design process. At the end, the author says that some tools can be effective by supporting combinations of more basic building blocks or by simplifying the interconnections between components.

Xiaozhong Zhang 1:29:07 10/11/2016

Methodology Matters: Doing Research in the behavioral and social sciences: The paper talks about the nature and main features of the research process; about strategies by which research can be carried out and some of the strategic issues they imply; about study designs, comparison techniques, various forms of validity, and ways of dealing with various threats to them; and about types of measures and techniques for manipulating and controlling variables, with their various strengths and weaknesses. It concludes that any body of evidence is to be interpreted in light of the strengths and weaknesses of the methodological and conceptual choices that it encompasses: the strategies, the designs, and the techniques for measuring, manipulating, and controlling variables and for analyzing relations among them. Evidence is always contingent on all of those methodological choices and constraints. It is only by accumulating evidence over studies that involve different methodological strengths and weaknesses that we can begin to consider the evidence credible, probably true, a body of empirically based knowledge. The chapter also mentions that these strategies, designs, and methods together constitute a powerful technology for gaining information about phenomena and the relations among them. It is true that each piece of information gained through these techniques is not certain, but only probabilistic. It is also true that each piece of information is not totally general; each piece is contingent on the means by which and the conditions under which it was obtained. It is therefore true that each set of results, to be meaningful and credible, must be viewed in the context of the accumulated body of information on the same topic. Evaluating User Interface Systems Research: This paper addresses the question "How should we evaluate new user interface systems so that true progress is being made?". The author claims that user interface technology, like any other science, moves forward based on the ability to evaluate new improvements to ensure that progress is being made. However, simple metrics can produce simplistic progress that is not necessarily meaningful, and complex systems generally do not yield to simple controlled experimentation, mostly because good systems deal in complexity and complexity confounds controlled experimentation. This paper shows a variety of alternative standards by which complex systems can be compared and evaluated. These criteria are not novel but have recently been out of favor. The author advocates that we must avoid the trap of only creating what a usability test can measure, and also the trap of requiring new systems to meet all of the evaluations above, because this would recreate the fatal flaw fallacy. Finally, the author writes that we must look to our evaluation strategy to answer the fundamental question "Has important progress been made?" If the answer is yes, then we happily take our share of the new knowledge and move forward to fill in the gaps.

Debarun Das 4:13:12 10/11/2016

“Methodology Matters: Doing Research in the behavioral and social sciences”: This chapter discusses doing research in the behavioral and social sciences. It starts with the three basic features of doing research: the substantive domain, the conceptual domain, and the methodological domain. It then discusses the research strategies: the field strategies, the experimental strategies, the respondent strategies, and the theoretical strategies. One of the main strengths of this chapter is that it successfully discusses in detail the procedures and techniques needed for doing research in this field. It further discusses classes of measures and manipulation techniques. Finally, it concludes with the three main points of the paper: any set of results is limited; trade-offs are always involved; and a set of results must be interpreted with respect to other evidence and results on the same problem. This is thus an interesting read for learning the basics of the topic. ------------------------------------------------------------------------------------------------------------------------------------------------------------ “Evaluating User Interface Systems Research”: This paper mainly discusses the problems with the general evaluation techniques for systems work and introduces a set of criteria for evaluating new UI systems work. It starts by discussing the reasons for focusing on UI systems research, then describes the errors in current evaluation techniques: the usability trap, the fatal flaw fallacy, and the legacy code requirement. Finally, it describes the different criteria to be considered when evaluating new UI systems work. This is again an interesting read, as it highlights the deficiencies in the old evaluation system and points out the areas of improvement, thus providing a benchmark for progress in this field.

Zuha Agha 8:59:45 10/11/2016

Methodology matters: In this article, the author describes different methodologies for running experiments in the social and behavioral sciences, along with the strengths and weaknesses of each. In this context, the article describes four design quadrants and explains the trade-offs among maximizing generalizability, precision, and realism in each quadrant. The crux of the discussion is that no single methodology is optimal, as there is a cost associated with each one, so different types of problems will call for different strategies. For example, conducting a field study may be natural and generalizable, but the results may not be precise if the population is not representative enough. Random variation is another factor that impacts experimental results, and it may be mitigated to an extent by repeating experiments across multiple random population samples (see the small sketch at the end of this critique). Probabilities and significance tests are another way to validate a hypothesis and analyze results. Overall, the conclusion is that the choice of experimental strategy depends on the problem at hand and the goal of the experiment, but a sound experimental design takes into account the variability of all factors that may affect the results and validates them using all feasible methodologies. --------------------------------------------------------------------------------------------------------------------------------------- Evaluating User Interface Systems Research: This paper describes evaluation techniques for user interface systems. The author first discusses the importance of and motivation for developing UI systems, followed by an analysis of the challenges of evaluating them. One reason it is difficult to evaluate the usability of such systems is that they cannot be fairly compared with existing systems, since users are already experienced with the existing systems, which biases usability tests. Thus the author proposes a Situations, Tasks, Users (STU) context approach to testing. Making measurements across different STU contexts is one way to overcome some of the limitations involved in testing usability and to make the results more generalizable. Overall, the paper presents interesting ideas, but the applicability of these ideas to modern-day interfaces is questionable.
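(A small aside from me on the repeated-samples point in the methodology critique above, not from the article itself: drawing many random samples from a synthetic population and looking at the spread of the per-sample results shows how repetition tames sampling luck. All numbers here are made up for illustration.)

 import random
 import statistics

 # Synthetic population; in a real study this is what we cannot observe fully.
 population = [random.gauss(50, 15) for _ in range(10_000)]

 # Repeat the "experiment" over many independent random samples of 30 people.
 sample_means = [statistics.mean(random.sample(population, 30))
                 for _ in range(200)]

 # A stable mean with a small spread suggests the samples are trustworthy.
 print(f"mean of sample means: {statistics.mean(sample_means):.1f}")
 print(f"spread across samples: {statistics.stdev(sample_means):.1f}")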