Evaluation 1

From CS2610 Fall 2014

Slides



Readings

Reading Critiques

Qiao Zhang 15:02:07 10/27/2014

Evaluating User Interface Systems Research: When evaluating complex systems, simple usability testing is not adequate. In this paper, a set of criteria for evaluating new UI systems work is presented, and problems with evaluating systems work are explored. There are three main problems related to the decline in new systems ideas, among which the author addresses the question "How should we evaluate new user interface systems so that true progress is being made?". Simple metrics can produce simplistic progress that is not necessarily meaningful, hence the author brings up several alternative standards by which complex systems can be compared and evaluated. One reason to study UI systems work is that some old assumptions, such as the need to save a byte of memory, no longer hold. Trying to fit newer input technologies into old models will result in information loss, which is undesirable. UI systems can bring a lot of good things into development: they can (1) reduce development viscosity, (2) provide least resistance to good solutions, (3) lower skill barriers, (4) empower new techniques in common infrastructure, and (5) enable scale. Some evaluation methods are misapplied, damaging the field. The author discusses three kinds: (1) the usability trap, (2) the fatal flaw fallacy, and (3) legacy code. For (1) the usability trap, researchers should not assume that all potential users have minimal training. Neither should they make the standardized task assumption, which requires that a task be inherently less variable among different users with different expertise. The third faulty assumption is that the scale of the problem should be relatively low. When testing the usability of interactive tools and architectures, the population should be equally ignorant of the new and the old systems. The standardized task and scale-of-problem assumptions also affect testing UI toolkits. For (2) the fatal flaw fallacy, the existence of a fatal flaw should be taken as a given; no research system will ever pass an evaluation that focuses on "what does it not do". For (3) legacy code, old architectures should not be barriers to new systems. A dozen evaluation metrics are given in the paper. Some are quite similar to previous ideas, e.g. "Expressive Leverage" is similar to "Expressiveness"/"Gulf of Execution", and "Expressive Match" is similar to "Effectiveness"/"Gulf of Evaluation". Some metrics are quite important but often overlooked, such as "Simplifying Interconnection" and "Ease of Combination". As researchers, we need to keep such fallacies and evaluation metrics in mind when we are developing UI toolkits.
===================================
Methodology Matters: Doing Research in the Behavioral and Social Sciences: A distinct difference between HCI research and other computer science fields is that HCI studies not only the computer but also the human subjects. It involves certain parts of social and behavioral science, hence this book chapter is quite important for HCI researchers. This chapter presents some of the tools with which researchers in the social and behavioral sciences go about "doing" research, and talks about strategy, tactics and operations issues, as well as inherent limits and potential strengths. Contents, ideas and techniques are always involved in behavioral and social sciences. More formally, they are three distinct domains: the substantive, conceptual, and methodological domains. The substantive domain consists of phenomena, which are patterns of human systems.
The conceptual domain consists of properties of a state/action, such as "attitude", "cohesiveness", etc. The methodological domain consists of methods, which include techniques for measuring, manipulating, and controlling the impact of some feature. Methods as tools have their own opportunities and limitations. To summarize, methods enable but also limit evidence. All methods are valuable but come with weaknesses/limitations. You can offset the different weaknesses of various methods by using multiple methods, and you can choose those methods so that they have patterned diversity ("best for something, worse for something else"). Who, what, and where - formally, actor, behavior and context - are the three facets researchers care about. When gathering a batch of research evidence, maximizing generalizability, precision and realism is desirable. One interesting thing about this part is that the author used a diagram, the strategy circumplex, which gives a clear overview of the balance among the different criteria. The author then explains each quadrant in detail. Each strategy has certain inherent weaknesses, although each also has certain potential strengths. Since all strategies are flawed in different ways, gaining knowledge with confidence requires that more than one strategy - carefully selected so as to complement each other in their strengths and weaknesses - be used in relation to any given problem. The author also talks about statistical inference in this chapter. In most cases, it requires the cases in the study to be a "random sample" of the population to which the results apply. If the samples are not random, one cannot run statistical inferences on them because the results will not be correct. A biased sampling method such as convenience sampling does not truly reflect the value of the study. There are several validities of findings: internal validity, construct validity, and external validity. The author also suggests some potential measures and manipulation techniques such as self-reports, observations, archival records and trace measures, and discusses their strengths and weaknesses accordingly. To manipulate variables, techniques such as selection, direct intervention and induction can be applied to control the variables in experiments. All in all, we need to keep these points in mind: (1) Results depend on methods. All methods have limitations. Hence, any set of results is limited. (2) It is not possible to maximize all desirable features of method in any one study; tradeoffs and dilemmas are involved. (3) Each study must be interpreted in relation to other evidence bearing on the same questions.
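To make the random-sample point in the critique above concrete, here is a minimal sketch, not from either paper: the population, sample size, and the way the convenience sample is formed are all invented purely for illustration.

```python
# Illustrative sketch only: why statistical inference assumes a random sample.
# The "population" of task-completion times and the biased ordering are made up.
import random
import statistics

random.seed(42)
population = [random.gauss(100, 15) for _ in range(10_000)]  # hypothetical completion times (s)

# Random sample: every member has the same chance of selection, so the sample
# mean is an unbiased estimate of the population mean.
random_sample = random.sample(population, 50)

# Convenience sample: e.g., whoever is easiest to reach. If that ordering is
# correlated with the measured property, the estimate is systematically off.
convenience_sample = sorted(population)[:50]

print(f"population mean:    {statistics.mean(population):.1f}")
print(f"random sample mean: {statistics.mean(random_sample):.1f}")       # close to the population mean
print(f"convenience sample: {statistics.mean(convenience_sample):.1f}")  # far too low -> biased
```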

Wei Guo 15:30:52 10/27/2014

Reading Critique for Evaluating User Interface Systems Research: Due to the stability of the desktop interface, the development of user interface systems is now focusing on off-the-desktop or natural systems. This paper explores the problems with evaluating systems work, and presents a set of criteria for evaluating new UI systems work. The reasons for doing UI systems research are that the assumptions about hardware and operating systems are no longer correct, the assumptions about users and their expertise are no longer correct, and interactive techniques have been proven. The evaluation includes evaluation of errors and evaluation of the effectiveness of systems and tools. There is one sentence in this paper: “Any new UI system must either show that it can scale up to the size of realistic problems or that such scaling is irrelevant because there is an important class of smaller problems that the new system addresses.” UI systems are designed to solve real-world problems, and every small improvement in technique should make some contribution to people's daily lives.
Reading Critique for Methodology Matters: Doing Research in the Behavioral and Social Sciences: This paper is about some of the tools with which researchers in the social and behavioral sciences go about doing research. Doing research always involves bringing together the substantive domain (contents), the conceptual domain (ideas), and the methodological domain (techniques). Methods, as the tools for gathering and analyzing evidence, offer both opportunities not available with other methods and limitations inherent in their use. There are several strategies we can choose from for a study: experimental strategies, field strategies, respondent strategies, and theoretical strategies.

Eric Gratta 16:23:49 10/27/2014

Methodology Matters: Doing Research in the Behavioral and Social Sciences (1994), Joseph E. McGrath. This paper is a portion of a book dedicated to surveying methods for conducting research in behavioral and social sciences, as described by the title. The remainder of the book assesses HCI research based on this survey. It defines three domains of things – Substantive (object of a study; its evidentiary basis), Conceptual (ideas about or properties of the substantive domain that are being explored), and Methodological (techniques for doing the research) – and then goes on to explore those in detail. These distinctions should be useful for HCI research not just because HCI user studies are often behavioral in nature, but because these domains may apply broadly to all areas of research. The confidence that we have in the knowledge obtained by research is contingent on the methods that were used to obtain that information; thus, it is critical to use reliable methods and operate with an understanding of the limitations of those methods (because there will be limitations). Using multiple methods can improve confidence, especially when the different methods compensate for each other’s limitations. When gathering evidence for a study, three desirable features that the author defines are Generalizability (over the relevant populations), Precision (of measurements), and Realism (a contrived research setting may not translate convincingly into a real-world conclusion). To explain how no methodology can achieve all three of these qualities simultaneously, the author used an interesting diagram that displayed the features as opposed dimensions. Then, related work was explored to demonstrate how the tradeoffs between each of the three features might occur in a real situation. Later, three important comparison techniques are surveyed: baserates (notable observations can only be made when context is available; specifically, knowledge of what occurs in the general case), correlations (frequently misunderstood as causation), and differences (studying the “interaction effects” of different variables on each other). Additionally, all three of these methods should be augmented with some element of sample randomization that reduces the likelihood of external factors inadvertently biasing the results. This chapter goes on to discuss four different types of validity, six different types of measures, and ways to manipulate variables, but the main message that the author tries to convey when discussing these is that all of the various research techniques explored are essentially “best for something and worst for something else.” By analyzing research at such a fine granularity, this chapter prepares a new HCI researcher for the key moments when they will have to challenge their decision-making process in conducting studies and analyzing their results.
-------------------------------------------------------------------------------
Evaluating User Interface Systems Research (2007), Dan Olsen. An extremely brief POTS-style abstract was used. P: Robust user interface systems do not exist for non-desktop systems that are mobile or physical. O: These future systems are complex, and so the UI systems developed for them cannot rely on simple usability testing alone. TS: The space of evaluations of (“a set of criteria for evaluating”) new UI systems is explored. The author goes into some detail describing the benefits of user interface systems/toolkits.
More interesting, though, is that the author devotes an entire section of the paper to warning the reader that evaluations of UI systems can go wrong, discussed in terms of the usability trap, the fatal flaw fallacy and legacy code. Issues arise when testing the usability of UI systems because using the systems often requires special expertise, it is too expensive to pay programmers to use new systems for a long period of time, and comparing the new system with an existing system poses many confounding variables. The “fatal flaw” topic refers to how, in the research of other systems, a fatal flaw or absent feature might make the research invalid, but in UI systems research the existence of fatal flaws is a given because there is so much functionality to support and too many cases for the researchers to consider them all fully. The “legacy code” topic just refers to an excuse that is used to denounce the creation of new UI systems, especially those that do not make use of existing (legacy) code. The paper then explores how to evaluate UI systems. Systems need to be evaluated in their STU (situation/task/user) context, with the users usually being developers and the task being the creation of applications. They should also be evaluated by their importance, especially in relation to this STU context, which might be proved by demonstrating utility in diverse scenarios. Systems should be evaluated on the extent to which they are easy to use (the author’s ridiculous term was “reduce solution viscosity”). Some tools are good because they make some task easier or more accessible for new populations, while other tools are good because they allow some amount of extensibility, such that there are infinite possibilities for users to design. As a last note, I felt that the main portion of the paper, under the heading “EVALUATING EFFECTIVENESS OF SYSTEMS AND TOOLS”, was fairly incohesive. The various subsections had different intentions, which seemed confusing given that they were all under the same section.

nik37@pitt.edu 18:24:32 10/27/2014

Katsipoulakis. Methodology Matters: Doing research in the behavioral and social sciences: This book chapter analyzes different research methods and their characteristics. The author succeeds in presenting a thorough description of each research method, where and how each should be used, and what the proper way of applying each one is. In the beginning, fundamental principles of the research process are presented, followed by a design-space categorization of research methods. Each one is measured on three important properties: generalizability, precision, and realism. The author uses two axes for positioning research methods: abstraction level and obtrusiveness. One conclusion drawn from the text is that each method has its own strengths and weaknesses and that it is the scientist’s responsibility to choose the appropriate one. In addition, in order for a research study to be valid, a researcher needs to define: a) base rates (the baseline with which new approaches are going to be compared), b) correlation (whether one phenomenon is related to another in a specific context), and c) difference (the method with which two approaches are compared). Furthermore, the author makes concrete points on randomness and statistical validity in experimentation, and states that problems with either should not be allowed to pollute the validity of an experiment. As far as experimental results are concerned, the validity of a result needs attention during interpretation, and several types of measures can be used for that purpose. In conclusion, I enjoyed reading this chapter. Even though I have read a considerable amount of research papers, I had never realized that all research methods can be abstracted in this elegant way.
///----------------------------END OF FIRST CRITIQUE --------------------------------///
Evaluating User Interface Systems Research: This paper surveys different approaches for comparing UI toolkits with each other. Even though several different points of evaluation are presented, this paper leaves the reader confused and without any actual contributions. The author restates the problems of UI toolkits and reviews crucial aspects for future evaluation. In the beginning, the reasons behind UI toolkit research are presented. These include lowering the difficulty of developing user-friendly and portable applications, making tools with a low threshold available to unskilled users, and producing scalable applications. A number of evaluation errors on UI toolkits are presented that usually hinder the development of novel UI toolkits. The author claims that systems should be evaluated based on users’ goals in terms of importance, unsolved problems, and generality. My critique of this paper is that it contributes nothing significant to the problem stated. In reality, the author restates the shortcomings of UI toolkit evaluation and how some errors can be avoided in the future.

nro5 (Nathan Ong) 19:27:45 10/27/2014

Review of “Methodology Matters: Doing Research in the Behavioral and Social Sciences” by Joseph McGrath: The author expresses in this chapter the general idea of how research can be conducted in the behavioral and social sciences. He also cautions that all methods of research have their strengths and limitations, and that utilizing combinations of methods can create stronger arguments. For many, research tends to be a learned process of finding out useful methods and correct procedures. Rarely are people taught the discrete process of research in the non-mature sciences (mature sciences have the Scientific Method). This paper provides a good overview of the research procedure for the behavioral and social sciences, which is highly relevant for HCI and other fields of computer science where user studies relating to the usability of a system need to be analyzed. Using a design space of physical referents (abstract vs. concrete) and obtrusiveness, the author shows that the 8 methods of research can be categorized accordingly, as well as where the three criteria for good research (Generalizability, Precision, and Realism) would lie in the design space. This paper was helpful for finally combining all of the ideas about research that I have encountered into one readable chapter. As an undergraduate, I was immediately thrust into doing research without much understanding of the process or why I had to take certain steps. It is much clearer to me after reading this chapter why certain procedures are done. Most of the research I have previously done falls under experimental simulations in the attempt to provide realism while also providing some level of precision in the data. Generalizability tends to be difficult in intelligent tutoring systems because the learning style tends to be dependent on the domain that the system will teach. Without this chapter, I would not have been able to reason about the choice of using experimental simulations.
Review of “Evaluating User Interface Systems Research” by Dan Olsen Jr.: The author compiles a paper that supports continuing user interface systems research. However, he notes that in order for research to continue to be productive in the future, the previous assumptions made of users need to change and the methods of evaluation need to change. I found the first point the author makes quite controversial. He mentions that “[W]indowing systems are designed to deal with a populace who had never used a [GUI]. That assumption is no longer valid.” While it is true that a generation of children and young adults who were exposed to technology almost all of their lives is growing up, that still does not hold for those who are above 50. Even though many seniors have had computational experience with GUIs, many will continue to be unable to deal with newer systems. Admittedly, it makes little sense to continue to research GUIs for a population that will eventually no longer exist, but at this point in time, it seems shortsighted and immodest to disregard that population entirely, especially since those of the younger generation may benefit from the research that applies to an older generation in a few decades. The remainder of the paper is fairly straightforward and intuitive. There seems to be a lack of easy-to-use and easy-to-express user interface toolkits, especially since the common complaint about user interface programming is the difficulty of getting things right, even though the level of expressiveness is quite high.
I wish the author had given more thought to this particular gripe, but technically speaking it is not a big issue. The author seemed to put a lot of time into explaining the criteria of good user interface toolkits and the fallacies that have resulted from longstanding practices. This list of criteria is important and greatly appreciated, especially since the fallacies that he presents in the paper are indicative of the wrong mindset that researchers have when developing new toolkits. However, I wish that he had given examples of user interface toolkits and what improvements they need to make them better. This would not only allow readers to visualize what types of changes are needed, but also allow readers to concretize the abstract criteria that the author presents for good user interface toolkits.

phuongpham 21:05:29 10/27/2014

Methodology Matters: Doing Research in the Behavioral and Social Sciences: the chapter introduces and gives insightful discussion of the research study process. A design space of 8 different research strategies is given. After that, we can figure out which type of research question we are dealing with by comparing against the 3 types of research questions. Many measurements, as well as ways to manipulate variables, are also mentioned. From the chapter, it is really hard to conduct a "true experiment" where all factors are considered. However, the chapter provides some suggestions on what we can do to make the study more objective and credible, from conducting it to interpreting the results, so that we do not come to an invalid conclusion about the current research questions.
***Evaluating User Interface Systems Research: this is an interesting paper. I like and dislike the paper at the same time. The paper raises the question of how to address new requirements from new technology, i.e. touch-based interaction. The author points out that usability testing has flaws and is not the only way to evaluate an interactive UI system. What is interesting about the paper is that the author not only shows us how to evaluate a UI system, he also shows us how to conduct a UI system study and how to write a paper about a new UI system. All the main points mentioned in this paper are the questions to be addressed when writing a paper or conducting a study about UI systems. Moreover, many points can be generalized to other research areas. I found the STU (Situations, Tasks and Users) framework very cool; these points need to be addressed in almost every paper. Another interesting point about the paper is that there are only 3 citations; the author mentions almost no previous work in the paper. On the other side, I also dislike the paper. The author raises an interesting question about the need for a new evaluation system for new, off-the-desktop machines. However, he does not answer the question directly or completely in the paper. All the mentioned arguments are also true for a desktop, old pointing GUI system. Last but not least, according to the Importance section, the author may want to argue why we need a new evaluation system for new technology, what has not been done correctly with the current evaluation system, and what we will gain if we have new interaction systems based on new technologies. Otherwise, "people will not discard a familiar tool and its associated expertise for a 1% improvement".

Bhavin Modi 0:24:07 10/28/2014

Reading Critique on Methodology Matters: Doing Research in the Behavioural and Social Sciences: The paper discusses the research methods available for research in behavioural and social sciences and their pros and cons, giving a detailed analysis of research techniques and the things to keep in mind while gathering statistical data. Research in this field involves bringing together 3 things: content, ideas, and techniques. Content refers to the behaviour you want to study that is worth your attention, the substantive domain. Ideas refers to the attitudes and behaviours that give meaning to our results, the conceptual domain. Techniques are the empirical or practical procedures for assessment, the methodological domain. The main focus here is on the methodological domain. To start with, the various research strategies are discussed for maximizing three important criteria: generalizability, precision and realism. The discussion continues with how the three criteria cannot be maximized simultaneously; maximizing any one reduces one or both of the others. The diagram (Fig. 2, the strategy circumplex) in the paper clearly illustrates this problem and also gives us an overview of the four quadrants (the techniques for research). These techniques are the discussion for the remainder of the paper; they are the Field, Experimental, Respondent, and Theoretical strategies, each having two sub-parts. The main idea is that none of these strategies alone can prove the behaviour we want to study; they all have certain weaknesses and strengths. A good combination of them should be figured out for evaluation purposes so that the results are robust and valid; the weaknesses of one are masked by the strengths of another. Moving on, the comparison techniques are taken into account: the correlation of variables in an experiment is very important to know, and the baserates are another important factor needed to prove a point, as shown by the child birth defect rate example. The validities used to judge the techniques for finding such correlations are internal, construct, and external validity, along with the threats to validity. The use of randomness in selection and allocation is shown to be necessary for maintaining the generalizability of the experiment and also to account for unknown variable correlations that may confound the result. Finally, the types of measures are taken into account and explored: self-reports, trace measures and archival records. To conclude, there is much to learn from the paper, in terms of which research techniques to use and where they lie in the design space for such techniques. Accumulation of evidence is an important factor and should not be viewed as a limitation by researchers.
--------------------------------------------------------------------------------------------------------
Reading Critique on Evaluating User Interface Systems Research: This paper presents approaches for evaluating the complex user interface systems of the future that move away from traditional GUI systems. The approaches are not novel but have not recently been in favour due to the stability of current windowing systems. Research in UI was continuous throughout the nineties until the advent of the GUI-based window system, brought about by the Macintosh, Windows and Linux. These evolved from the command line interface, making usage more natural even for non-programmers. The window-mouse-keyboard approach is the standard today and has led to many innovations too, like the pen-based approach inspired by the mouse.
They have become the standard interfaces of today, and a considerable force of change will be required to move onto newer interfaces, as people have become comfortable with the existing systems and try to improve them. UI systems architecture is important because it leads to reduced development viscosity (the ability to iteratively develop), least resistance to good solutions, lower skill barriers, and power in common infrastructure. The author discusses the innovation of new user interface techniques that evolve from the current systems into something even better. Such systems are complex and the existing techniques are not enough to evaluate them; the errors due to misapplied evaluation are the usability trap, the fatal flaw fallacy and legacy code. As such, approaches for evaluating new tools are detailed. The claims made for such systems should be set against STU: Situations, Tasks and Users, for claims such as importance, a problem not previously solved, generality, reducing solution viscosity, empowering new design participants, power in combination, and whether it can scale up. For reducing viscosity, multiple approaches are presented: flexibility, expressive leverage (reducing the total number of choices a designer must make for expressing the solution) and expressive match (e.g. a hexadecimal colour representation versus a colour picker). The current paper, together with the previous one today, is responsible for creating an awareness and understanding of the various research methodologies and evaluation techniques, and of the questions we should ask ourselves before delving into research and designing the framework to show the worth and viability of our work.

changsheng liu 0:36:07 10/28/2014

<Methodology Matters: Doing Research in the behavioral and social sciences> This paper is about the tools used for psychology research. It discusses the limits and strengths of various research techniques. As the instructor Jingtao always emphasizes: everything is good at something and bad at something else. Doing research involves three domains: the Substantive domain, from which we draw contents that seem worthy of our study and attention; the Conceptual domain, from which we draw ideas that seem likely to give meaning to our results; and the Methodological domain, from which we draw techniques that seem useful in conducting that research. The conclusion is useful not only for study in HCI, but also in a variety of other areas. The conclusion includes: (1) Results depend on methods. All methods have limitations. Hence, any set of results is limited. (2) It’s not possible to maximize all desirable features of method in any one study; tradeoffs and dilemmas are involved. (3) Each study must be interpreted in relation to other evidence bearing on the same questions. Hence any evidence is to be interpreted in the light of the strengths and weaknesses of the methodological and conceptual choices that it encompasses.
<Evaluating User Interface Systems Research> describes the methods we can use to evaluate new user interfaces. Simple usability testing is not adequate for evaluating complex systems. The problems with evaluating systems work are explored and a set of criteria for evaluating new UI systems work is presented in this paper. The paper first describes evaluation errors that we should avoid in testing. The first one is the usability trap. Many usability experiments are built on three key assumptions. The first is “walk up and use.” This assumes that all potential users have minimal training. This is a great goal for home appliances and for software tools used by many people, but the “walk up and use” assumption does not work well for problem domains that require substantial specialized expertise, such as user interface programming or design. The second is the standardized task assumption. To make valid comparisons between systems one must have a task that is reasonably similar between the two systems and does not have many confounding complexities. The third assumption is the scale of the problem. The economics of usability testing are such that it must be possible to complete any test in 1-2 hours. For a small team of researchers, finding flaws is very difficult. It is hard to anticipate all of the code paths a user will take. If research focuses on what a system cannot do, flaw analysis will be a barrier to systems research. Legacy code can also be a barrier to new systems research, since most UI research still uses it. This paper presents STU: situations, tasks, and users. It forms a framework for evaluating the quality of a system innovation. STU is very interesting and it contains several components. For example, importance means that before all other claims a system, toolkit or interactive technique must demonstrate importance. Generality means that the more general the tool, the less likely one can demonstrate all of the possible solutions for which the tool is useful. In general, this paper brings up some good points about evaluating user interface systems.

Longhao Li 0:50:20 10/28/2014

Critique for Methodology Matters: Doing Research in the Behavioral and Social Science: This paper talks about the methodologies for doing research in behavioral and social science, following the timeline of conducting research. It tells us the specific knowledge that needs to be understood in each step of doing research. This is an important paper because it teaches readers in detail what they can do at different stages of doing research in behavioral and social science. In general, people need to choose what they want to research: contents come from the substantive domain, ideas from the conceptual domain, and techniques from the methodological domain. People also need to think about the strategies to use, like the setting for the study, which involves a lot of different categories of strategy, like field study, field experiment, formal theory, etc. This paper also talks about the points that need care when conducting studies and experiments, like how to do sampling, how to make sure the result reflects reality, and how to use the data to draw conclusions. All of this guidance is very useful for researchers. New researchers like us, new PhD students, can benefit a lot from it. From my own experience of being involved in research, I do think the techniques for conducting studies are important. When I did a user study for a research project, I gained a lot of knowledge about how to design a user study and how to conduct it. Controlling the variables is important; designers need to carefully control the variables to make sure the experiment can produce strong results that support the hypothesis. But to my understanding, even a well-designed user study can lead to some useless results. Researchers need to be patient and not be afraid of failures. Try to believe that there must be one day of success waiting for you.
Critique for Evaluating User Interface System Research: This paper basically talks about how to evaluate complex systems and new UI systems. The paper also points out some evaluation traps that people may fall into. Evaluating complex systems is hard, and so is evaluating new UI systems. As the author discusses, computers are changing due to technological development. New input methods show up, and current technology has been improved. These changes lead to the motivation to improve the evaluation system. The author first points out three errors that people make when doing evaluation: the usability trap, the fatal flaw fallacy and legacy code. The usability trap is about misevaluating the usability of a system. Fatal flaw analysis can help to detect errors, but for complex systems it means no research system would ever pass, which means that this evaluation will prevent some good systems from being accepted. The legacy code standard is not suitable for some brand new systems since it places limitations on innovation. Then the author talks about how to evaluate effectiveness, in which the author introduces STU: situations, tasks and users. By using this model, we can determine the importance, generality, etc. of the system, which is important for the evaluation. I think this is a great paper, because it points out one aspect that we need to think about when facing the changes in computers, which is evaluation. It is important for the development of computer science. The evaluation result can determine whether a new idea can come out to make computer technology more advanced.
The methods that the paper introduces contribute a lot to the evolution of system evaluation, so I think it is an important paper for the development of computers.

Qihang Chen 1:00:53 10/28/2014

The paper Evaluating User Interface Systems Research focuses on identifying the problems with evaluating UI systems and presenting the correct criteria. First, the author talks about the importance of UI systems research and the existing barriers. Then the paper presents the values added by UI systems architecture, including reducing development viscosity, lowering skill barriers, enabling scale, etc. Before exploring better ways to evaluate interactive software techniques, the paper presents some ways in which misapplied evaluation methods damage the field, including the usability trap, the fatal flaw fallacy, and legacy code. Next, the author discusses the metrics for measuring the success of a toolkit: solving a problem not previously solved, generality, and empowering new design participants. The main contribution of the paper lies in clarifying the barriers facing today's UI systems research and showing a variety of alternative standards by which complex systems can be evaluated. This paper is of great importance for later research as it provides efficient ways to evaluate meaningful progress and thus guides researchers to perform deeper study. The flaw I can find is about legacy code, which the author calls a 'barrier to progress'. I do think legacy code is useful for later development.

SenhuaChang 1:24:30 10/28/2014

Methodology Matters: Doing Research in the Behavioral and Social Sciences: The article presents basic methods for carefully carrying out studies. The author provides several strategies and compares them; each has its own pros and cons. The author also presents several comparison techniques that let us draw conclusions reliably, which I think is the most useful part. By using multiple methods and carefully applying them so that the weaknesses of one are offset by the advantages of another, we can add credibility to the resulting evidence if the results are consistent. How to validate our study by analyzing the results scientifically (comparing correlation and difference), which this article also discusses, is another part I find interesting. We often forget, or do not know, how to make inferences from what we got as results back to the goal/hypothesis of our study.
Evaluating User Interface Systems Research: This article basically describes the criteria we should use in evaluating new user interface systems research. The most impressive argument the author made, to me, is that we should not evaluate new systems based on their usability; the argument basically says that it is too hard to do so. Instead, the author points out other criteria that we should note in designing such systems. This paper also tells us important aspects that a valid evaluation of a UI should consider, and provides guidance for us to examine our thoughts at the design stage. For example, we should have a clear targeted user population; select representative (important) tasks for our task-oriented design; consider the generalization issue within an acceptable scale; make our design effective and efficient (this is the most important part of design in my opinion); and decompose the tasks into basic and shared actions as modules that provide intuitive and easy communication for implementation concerns.

Yingjie Tang 1:46:01 10/28/2014

The article “Evaluating User Interface Systems Research” gives a variety of alternative standards by which complex systems can be compared and evaluated. After reading this paper, I realize that the challenges of evaluating a user interface system are really there, and computer scientists have come up with some practical solutions to them. Before discussing the challenges in evaluating user interface systems, the paper first addresses some principles of designing a successful UI system. What impresses me most is the principle proposed by Bill Buxton that good interfaces should have lower skill barriers. It means that user interfaces should be easy to use by those who are not computer scientists, like artists and designers. I cannot agree more with this. Even for those of us majoring in computer science, a sophisticated user interface brings a lot more workload than a low-skill-barrier one. For GUI design, I favor C# much more than Java: it saves me a lot of time compared to learning the interface in code, because all I have to do is drag the widget onto the design board. The author also mentions the fatal flaw fallacy, in which people tend to carefully examine a new interactive technique or small behavior for all of the possible ways in which the validation might be wrong, and he raises great concern about it. Before I started research, I always thought that research should be strictly correct and should be a serious thing. Yes, science itself means correctness, but if we always test a new technique from every aspect, it would prevent some novel ideas from blooming. The hot topic of "mobile gestures" draws a lot of attention these days, and there are countless scientists working on this topic. If every scientist pushes gesture recognition accuracy a little bit further, then the aggregate result will definitely be wonderful.
———————————————————————————————
When I first saw the title “Methodology Matters: Doing Research in The Behavioral and Social Science” I was quite confused about why Prof. Wang put this paper in this section, because it is an article in social science and seems to have nothing to do with human-computer interaction. However, when I finished the first 6 pages, I realized that the methods for measuring, comparing and validating in the social sciences are comparable to those for the various user interface systems. I remember that Prof. Wang once said that the bottom line of modern computers is no longer the computer, it is the human. In this case, user interfaces nowadays are no longer just the keyboard and mouse; they include a wide range of inputs and outputs. From robotic gloves to computational helmets, from single-person interfaces to multi-person interfaces, the evaluation of user interfaces faces a big challenge. When we evaluate traditional user interfaces, we normally have three assumptions: “walk up and use”, standardized tasks, and a small scale of problem (these perspectives are from the prior article). However, with modern user interfaces, all three assumptions are invalid. Thus, drawing on the social sciences for validation methods is viable. Another reason is that humans play a much more important role in modern user interfaces. When comparing user interfaces, the baserate problem really exists; sometimes we have no way to obtain user feedback for the user interface.
For example, whether users are willing to use a fingerprint to unlock the smartphone or tend to use other methods to unlock it. Because only a few people in the world have come into contact with this technique, a questionnaire from the very few people who have an iPhone 5S or a Galaxy S5 cannot represent the whole 6 billion people in the world. The correlational question can also be adopted in the evaluation of user interface systems. Consider the same fingerprint problem: we can set X as the time a user takes to unlock with a fingerprint and set Y as the utilization of the smartphone. The password approach takes more time than the fingerprint approach, which may prevent users from using the smartphone for some simple tasks. Randomization can also be adopted in user interface system evaluation. Since humans play an important role in modern interfaces, the number of factors increases tremendously, so it is really important to randomly assign the trivial factors.
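As a toy illustration of the correlational question above (not from either paper: the unlock-time and usage numbers are invented, and a correlation by itself would not establish causation), Pearson's r for such an X and Y could be computed like this:

```python
# Hypothetical data for the X/Y example above: X = unlock time in seconds,
# Y = number of phone unlocks per day. All values are invented for illustration.
import statistics

unlock_time_sec = [0.4, 0.5, 0.6, 1.8, 2.0, 2.3, 2.5]  # fingerprint users first, password users last
unlocks_per_day = [110, 95, 102, 60, 55, 48, 52]

# Pearson correlation coefficient (statistics.correlation needs Python 3.10+).
r = statistics.correlation(unlock_time_sec, unlocks_per_day)
print(f"r = {r:.2f}")  # strongly negative here, but only because the toy data were built that way
```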

Yanbing Xue 2:16:09 10/28/2014

The first paper examines the way that data is obtained and its influence on the data itself. Methodology is the way in which data is collected through experiments, and the setup of these experiments shapes the data itself. There are a few techniques used to manipulate the experiment, such as giving instructions, imposing constraints, selecting materials, and giving feedback. By altering these conditions the results can be compared to see their effects. Looking at these methods, it is clear that they present both opportunities and limitations. In these experiments we refer to the actors, behavior, and context as the who, what, and where. There are also the three looming features of generalizability, precision, and realism; these three are always being traded for each other. Research strategies can be divided into four categories: field strategies, experimental strategies, respondent strategies, and theoretical strategies. Each type is flawed in some way, and the best research combines multiple types to cover these flaws. By using randomization, we can assign certain actors to certain groups and in this way factor out any sampling bias that may occur. The ways in which data can be obtained include self-reporting, trace measurements, observations, and archival data. The author discusses the strengths and weaknesses of these different data sources. Overall, I felt that the paper touched on a number of important issues regarding research.
==========
The second paper claims that interactive innovations in UI must have a STU context; that is, they must clearly have a set of (S)ituations, (T)asks, and (U)sers that they are trying to appeal to. Normally, new systems are evaluated on how usable they are. The author explains that there are pitfalls to this approach, particularly because users will always be biased towards the systems with which they are already comfortable. Instead, rather than evaluating based on usability, we should evaluate systems based on (i) how general they are, (ii) how viscous they are, (iii) how well they empower new design participants, and (iv) how well they can be extended to form new and better solutions. The author admits that the techniques that he has suggested are not novel, but that they have been lost on the UI systems community for some time. Unfortunately, I tend to disagree that abiding by these techniques will make UI systems easier to evaluate; rather, they will suffer from the same pitfalls. Clearly, this is just my opinion. But that is the problem with this paper: no novel techniques are proposed, and there is no scientific justification that any of the existing techniques the author has suggested will improve the state of UI evaluation. The entire paper reads like an experience report rather than a true research paper.

zhong zhuang 2:21:24 10/28/2014

This book chapter is about the methodology of the behavioral and social sciences. The author gives an introductory explanation of the three basic components of behavioral and social science, defining them as three legs of a stool, or three domains: the substantive domain, the conceptual domain and the methodological domain. The substantive domain contains the basic elements of behavioral and social science; they are called phenomena and patterns of phenomena, and they are the objects of the study. The conceptual domain contains the properties of these phenomena. These might include some familiar ideas such as “attitude”, “cohesiveness”, “power”, “social pressure”, and “status”. There are also relations in the conceptual domain, such as “causal” relations. The methodological domain is about the methods for studying these elements; they are called modes of treatment. The book emphasizes the third domain – methods. Methods are used to gain knowledge about some set of phenomena, but methods also limit such knowledge. All methods inherently have flaws. There is no way to completely avoid these flaws, but you can choose multiple methods to try to offset the different flaws. When using a method, one always wants to maximize three properties: generalizability, precision and realism. But the truth is that these properties conflict with each other; for example, to maximize generalizability, one will lose precision, and to maximize precision, one will also lose realism. So methods are categorized into four quadrants: experimental strategies, field strategies, respondent strategies and theoretical strategies. Each strategy is specialized in one or more respects. In sum, methods are very important, but all methods have limitations. It is not possible to maximize all desirable features of method; there are tradeoffs and dilemmas.

zhong zhuang 3:13:13 10/28/2014

This paper introduces a new way to evaluate large and complex UI systems and toolkits. In most publications about UI design, the author will provide a usability test: some standardized tasks are defined, a group of users is invited to use the new system on these tasks, completion time is measured and compared with an old or similar system, and the result is based on these facts. This paper claims that this is not a good way to measure complex UI systems and UI toolkits. First, because these complex systems are aimed at a small group of experienced people, the “walk up and use” assumption does not hold. Second, there is no standardized task for such a complex system; for a UI toolkit, the task is to create a UI solution, and this cannot be standardized. The third problem is the scale of the problem: testing these complex systems can easily take 1-2 hours to complete, so if a traditional usability test is used, it can easily cost hundreds of thousands of dollars. In this paper, the author presents a new way to evaluate these complex UI systems. First, the author introduces a new term, STU – situations, tasks and users. This is the basic metric for a system: who are the users of the system, what are the tasks of the users who use it, and in what situations will the users use it. Then importance is measured based on STU: importance in U means how large or how critical the user population is, importance in T means how important the tasks are or how many tasks the system can solve, and importance in S means how often the system is used. Besides importance, another critical consideration is whether the system can solve problems that were not previously solved. The third measurement is generality: the claim for a new solution is much stronger if there are multiple populations, each with multiple tasks, that want to use the new system. The last but most important one is reducing solution viscosity; the general idea of this criterion is whether the new system eliminates many of the design choices. The author presents three ways to reduce solution viscosity: flexibility, expressive leverage and expressive match. Flexibility is about making rapid design changes; although it does not reduce the choices directly, by reducing the time and effort to try out a choice, it reduces them indirectly. Expressive leverage is about accomplishing more by expressing less: if an expression of choice Y is general across the entire design, then a tool should encapsulate Y. Expressive match is an estimate of how close the means for expressing design choices are to the problem being solved; for example, expressing a color as a hex code is a poorer match than expressing it with a color picker.
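A minimal sketch of that last point about expressive match, not from the paper: the color value and the small helper below are made up for illustration, contrasting a low-match hex string with a somewhat higher-match (r, g, b) form.

```python
# Illustrative only: the same color expressed two ways. The hex string forces the
# designer to decode base-16 digits mentally; the explicit triple reads closer to
# how designers reason about color. A graphical color picker would match better still.
def hex_to_rgb(hex_color: str) -> tuple:
    """Decode '#RRGGBB' into an (r, g, b) tuple; this decoding step is the mismatch."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

button_color_hex = "#1E90FF"       # low expressive match: is this blue? how bright?
button_color_rgb = (30, 144, 255)  # higher match: red, green, blue on a 0-255 scale

assert hex_to_rgb(button_color_hex) == button_color_rgb
```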

yubo feng 3:33:41 10/28/2014

In today's reading material, the first paper, "Methodology Matters: Doing Research in the behavioral and social sciences", is about methodology. The author discusses the principles of studying a certain interest and what methodology is. More specifically, there are several strategies to focus on, like the field strategy, the experimental strategy, the respondent strategy and the theoretical strategy; these strategies describe what we should do at each step. I think besides these strategies, more things should be focused on, and one important one is feedback: the researcher should get quick feedback at each step on whether he is wrong or right, or whether the way he is working is the optimal choice, because if researchers cannot find this out, it is a kind of disaster for the study. That is why we need prototypes: by quickly making a prototype, we can evaluate our idea and method. Then, in the second paper, "Evaluating User Interface Systems Research", the author evaluates several UI systems, finding what their flaws are and what their shining points are, and even some ways to improve. This paper is an implementation of that methodological feedback: by evaluating the methods now in use, we can find new ways to explore unknown fields and determine which way is best in which situation.

Mengsi Lou 4:45:50 10/28/2014

Evaluating User Interface Systems Research: This paper discusses how to evaluate new UI systems that involve new devices and new software systems for creating interactive applications. -----------The values added by UI systems architecture are reduced development viscosity, least resistance to good solutions, lower skill barriers, power in common infrastructure, and enabling scale. A good UI toolkit will reduce the time it takes to create a new solution, and UI designers tend to follow the path of least resistance. A related concept is that toolkits can encapsulate and simplify expertise. As Bill Buxton’s Menulay showed, large portions of the UI design problem can be handled by drawing rather than code, so the right toolkit design meant that artists and designers rather than programmers were dictating the visual appearance of user interfaces. Laying stable foundations makes possible larger, more powerful solutions than ever before. ------------I think the most interesting part is the evaluation errors. As for the usability trap, many usability experiments are built on three key assumptions. The first is “walk up and use”, which assumes all potential users have minimal training. The second is the standardized task assumption: to make valid comparisons between systems one must have a task that is reasonably similar between the two systems and does not have many confounding complexities. The third assumption is the scale of the problem. Usability testing is attractive because it can produce a statistically valid, clearly explained, easily compared result. There are other evaluation errors as well, such as the fatal flaw fallacy and legacy code. -------------The paper also cares about STU, which refers to Situations, Tasks and Users. It is critical that interactive innovation be clearly set in a context of situations, tasks and users; the STU context forms a framework for evaluating the quality of a system innovation. The ways to reduce solution viscosity are flexibility, expressive leverage, and expressive match. -------------- To sum up, we should avoid the trap of only creating what a usability test can measure. We should also avoid the trap of requiring new systems to meet all of the evaluations listed above.
///////////////////////////////////////////
Methodology Matters: Doing Research in the Behavioral and Social Sciences: This paper tells about the methods of doing research. Some basic features of the research process are some interesting content, some ideas that give meaning to that content, and some techniques or procedures by means of which those ideas and contents can be studied. The methodological domain is shown to be especially important.

yeq1 7:50:23 10/28/2014

Yechen Qiao, Review for 10/28/2014. Methodology Matters: Doing Research in the behavioral and social sciences: In this paper, the author gives a comprehensive overview of how to do research in social and behavioral science. The paper begins by categorizing the different domains of such research, and provides a summary of the pros and cons of each research method, the different types of research questions, different study designs, different evaluations of a study’s validity, and different ways to make measurements. I find this paper to be extremely useful in any area that involves human subject research, which includes social science, HCI, privacy, and medicine. First of all, the author provides a categorization of different research methods, which indicates how the research methods compare with each other in terms of control, generalizability, realism, obtrusiveness, and concreteness. The paper clearly argues why each strategy may be useful in its own right, yet with limitations. For example, one cannot always use formal theory and computer simulations to argue questions related to privacy, due to the fact that a) the parameters for this area are too numerous to encapsulate in computer simulations, and b) no new behavioral parameters may be entered into the study and all parameters should be justifiable by the existing literature. This is why it’s a bit difficult to find papers for my comprehensive exam: lots of papers just keep making definitions and theorems, and make a bunch of theoretical evaluations without having any kind of user study. (I have also observed that the inverse is true in HCI: lots of papers did experiments without providing theoretical grounding for why something works.) While these papers are useful in their own right, a researcher must keep a balance when selecting research papers using different techniques to avoid accumulating too many of the disadvantages of doing research in one particular way. I also think that the description of what a research question is may be helpful to the class, since in my experience many may not know what it means. In terms of measures, the most commonly used are self-reports, observations, and archival records. I think it may be particularly helpful for many young researchers to see the shortcomings of each one, especially archival records. Often, people are taught to believe in published studies and existing research without taking into consideration the shortcomings of relying only on information of this type. The paper only vaguely describes the shortcomings: low versatility and high dross rates. So I feel like explaining them a bit more: the studies and the census data are collected and coded by people and resources the researcher has no control over. This leads to the problem of having potentially way too many unknowns for them to be used as a primary source of research. For example, the sampling procedure may be inadequate for the purpose of the study, as very rarely are records created using a simple random sample (almost impossible and cost-prohibitive to do in a population census) and how the sampling was done may be unknown; different people code the answers differently (“adults”, “awareness”, “citizen”, “quality of life”, and “unemployment” may have very different meanings across different states and countries, and even among the coders themselves within one data set). As a result, some parts of the data may be coded similarly to what the researcher wants, but some may not, even if it’s about the same concept.
In addition, because the researcher generally does not have systematic access to the sample, it is impossible to validate the result without redoing the work, and it is often difficult to spot any red flags in the data. Lastly, relying only on such studies limits the set of parameters for research. Innovative research may come from an idea that has not yet been conceptualized. Various parameters of cognitive ease of use were understudied in engineering until only recently, as engineers often focused solely on functionality, reliability, cost efficiency, and physical ease of use instead. Using archival data alone would never have allowed people such as Don Norman to derive all the concepts necessary to formulate their ideas. Personal experience and observation may be crucial in leading to new studies that help answer many such questions. (A small sketch of one archival-coding issue, inter-coder agreement, follows this critique.) Evaluating User Interface Systems Research In this paper, the author identifies some of the problems with evaluations of UI systems research. Some of the glaring problems noted by the author are too much focus on usability, even though usability may be insignificant compared to the cost of running the study, and the difficulty of recreating systems in order to do evaluations. The author provides a list of evaluation metrics and demonstrates how they should be studied. In general, I think the paper is interesting, especially on how to conduct evaluations in an STU context. Even by carefully following the author's suggestions, I think it is still quite possible to run into the same problems the author describes. For example, if our goal is to reduce the time to execute one common task, it may be impossible for us to make the task completely mechanical and achieve complete isolation of the system's performance. This ties nicely to the previous paper: multiple techniques may have to be used before we can be confident in the answer to a research question.
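To make the coder-disagreement point concrete, here is a small sketch of my own (the data and labels are hypothetical, not from the chapter) that computes Cohen's kappa, a chance-corrected agreement measure, for two coders labeling the same archival records; a low value signals that a category does not mean the same thing to both coders:

 # Hypothetical sketch: chance-corrected agreement between two coders who
 # labeled the same archival records. Low kappa means the coding is unreliable.
 from collections import Counter
 
 def cohens_kappa(coder_a, coder_b):
     n = len(coder_a)
     observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n       # raw agreement
     freq_a, freq_b = Counter(coder_a), Counter(coder_b)
     # agreement expected by chance, from each coder's label frequencies
     expected = sum(freq_a[lbl] * freq_b[lbl] for lbl in freq_a) / (n * n)
     return (observed - expected) / (1 - expected)
 
 # Two hypothetical coders categorizing the same ten records
 a = ["emp", "emp", "unemp", "emp", "unemp", "emp", "emp", "unemp", "emp", "emp"]
 b = ["emp", "unemp", "unemp", "emp", "emp", "emp", "emp", "unemp", "unemp", "emp"]
 print(round(cohens_kappa(a, b), 2))   # well below 1.0 despite 7/10 raw agreement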

Xiyao Yin 8:29:31 10/28/2014

'Methodology Matters: Doing Research in the Behavioral and Social Sciences' presents some of the tools with which researchers in the social and behavioral sciences go about 'doing' research and raises issues of strategy, tactics and operations. Content, ideas, and techniques or procedures are the sets of things involved in doing research in the behavioral and social sciences. We can see different levels of elements, relations and embedding systems corresponding to three domains: the substantive domain, the conceptual domain and the methodological domain. The main idea in research strategies is choosing a setting for a study. It is good that this paper shows the four strategies (field strategies, experimental strategies, respondent strategies and theoretical strategies) in a circular figure and uses different arrows to show the relationships and transitions between them. After careful discussion, we find that all methods have limitations, that it is impossible to maximize all features of method in any one study, and that we should interpret each study in relation to other evidence bearing on the same questions. These conclusions are convincing and should be kept in mind in future research. 'Evaluating User Interface Systems Research' addresses the lack of appropriate criteria for evaluating systems architectures. Misapplied evaluation methods can damage the field; the paper discusses the usability trap, the fatal flaw fallacy and legacy code. The author considers evaluation in terms of situations, tasks and users, and provides three ways in which a tool can reduce solution viscosity: flexibility, expressive leverage and expressive match. We can demonstrate the effectiveness of tools that support combinations of more basic building blocks using an inductive claim or the N-to-1 reduction. In this part, I find simplifying interconnection to be quite a good method because it addresses the relationships among N components, and the paper provides compelling examples of this simplification (a small sketch of the idea follows this critique). This paper offers a variety of alternative standards by which complex systems can be compared and evaluated. These criteria are not novel, but they have recently been out of favor, and they avoid many of the previous problems. In my opinion, these criteria cover the fundamental points in evaluating UI systems, and it is useful to consider them when creating new UI systems.
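As an illustration of the simplifying-interconnection / N-to-1 idea (my own sketch; the class and method names are hypothetical, not taken from the paper), routing every input device through one shared event type means each new device or widget needs only one connection to the common interface, instead of one per peer:

 # Hypothetical sketch of "simplifying interconnection": every input device
 # reduces to one common event type, so any widget can consume any device
 # without pairwise device-to-widget glue code (N + M pieces instead of N * M).
 from dataclasses import dataclass
 
 @dataclass
 class PointerEvent:            # the single shared interface
     x: int
     y: int
     pressed: bool
 
 class Mouse:
     def poll(self) -> PointerEvent:
         return PointerEvent(x=120, y=45, pressed=True)     # stub values
 
 class TouchScreen:
     def poll(self) -> PointerEvent:
         return PointerEvent(x=300, y=200, pressed=False)   # stub values
 
 class Button:
     def handle(self, event: PointerEvent) -> None:
         print("press" if event.pressed else "hover", "at", (event.x, event.y))
 
 # Any device works with any widget through the one event type.
 button = Button()
 for device in (Mouse(), TouchScreen()):
     button.handle(device.poll())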

Christopher Thomas 8:56:12 10/28/2014

2-3 Sentence Summary of Methodology Matters: Doing Research in the Behavioral and Social Sciences: This essay discusses techniques and features of the research process, exploring different "domains" relevant to the research process. The author first lists three domains and then three levels of concepts for those domains. Next, the author discusses various methods for analyzing research results, designing studies, etc. This essay is an informative read for PhD students because it explores many questions that we often take for granted, such as: what questions should we be exploring in a particular domain? What research strategies are the right ones to deploy for our purpose? One of the things that interested me in the paper was the author's suggestion that no single set of results about a topic is enough and that other evidence must be taken into consideration. For instance, if I were to develop an algorithm and run it on a test set, getting nearly 100% accuracy on a problem that was previously getting 40% accuracy in the literature, it is very possible that my experiment is in some way flawed, or rather, that my results are not testing the same measure that those in the literature are. After all, the easiest person to fool is yourself. Thus, when considering research findings and conclusions, it is critical for us not to simply take what a paper says as gospel truth, but rather to evaluate its findings in the context of the larger body of knowledge, because, after all, it could be a fluke. Only after the theory or experiment is retested and verified does science progress; otherwise it is just an interesting conclusion observed by one group of observers. Something else that I found interesting in the article was the concept of internal and external validity. These are two distinctions I had never thought about before. The author describes internal validity as essentially how well the experiment was designed and run (the research protocol, measurement techniques, etc.). In other words, in experiments we often have independent and dependent variables. The question internal validity tries to get at is whether the change in the dependent variable depended entirely on the independent variable in question, or whether other factors, such as poor experimental design or a bad choice of methodology, are causing the changes in results. This is always very important when designing an experiment. We must always think about what our evaluation strategy is actually measuring and whether or not it is what we think it is measuring. External validity is another concept discussed in the paper, which examines how generalizable the research is. Returning to my previous example, if my research results turned out not to be caused by bad experimental design, they could simply have been a fluke of the data I chose. On other data, the technique may yield horrible performance. Thus, the technique is not generalizable to other contexts and domains. We must remember that as we evaluate results from our own research or from a paper we are reading, we must ask ourselves how generalizable the technique is. If we find a paper which claims to have optimal performance but requires knowledge of future events, we should be skeptical of the generalizability of the technique. Further, if we are conducting an HCI user study and our sample size is 2 people from our office, then we must remember that the conclusions we draw from this "sample" are probably not representative of people in general (a small numerical sketch of this point follows this critique).
Thus, we must always remember that not only are conclusions important but generalizability is as well. Finally, I like the "pie chart" that the author provides for thinking about different research strategies. The author divides research strategies into 'quadrants' and then discusses the benefits and downsides of each type (such as field strategies, experimental strategies, respondent strategies, etc.). I found reading this section to be very informative, because simply by looking up the type of experiment you were running, you could find the types of questions to ask during evaluation and guidelines for designing and running that type of study effectively. 2-3 Sentence Summary of Evaluating User Interface Systems Research: This essay explores user interface research and discusses the motivations behind why more research is needed. After motivating the discussion that user interface improvements are possible and that many old assumptions no longer hold, the author discusses some common errors that happen in user interface research, such as people designing UIs just to do well on evaluation strategies. I think the author makes a very good point in this paper, namely that many experimentalists limit themselves and their research to areas that will do well on most evaluation strategies. The author calls this the "fatal flaw" fallacy. By this he means that researchers may be considering or brainstorming some technique which is promising, but when they think about how it will be evaluated, they realize that the technique will do horribly in some area of evaluation, and thus they abandon it because it will not do well in evaluation. However, we must ask ourselves: do the best user interface techniques need to do well on every area of evaluation? Is it possible that some technique is "best for something, worst for another"? Thus, even though our technique may do poorly on some evaluation area, perhaps we aren't asking the right questions in the evaluation strategy? It could be that the technique is better for some specialized domain. For instance, the author discusses how many UI designs get evaluated on the concept of "walk-up and use", whether people with no experience with the UI can use it without much learning. However, do all interfaces really need to be usable in this way? In fact, by trying to make our designs perform well on "walk-up and use" tests, we may actually be doing a disservice to those who actually use our interface, because they may be experienced, and the design decisions we make may create extra work for them when using the interface. Furthermore, do all user interface toolkits really need to support legacy applications? Is this a realistic constraint given how much technology has changed and given that within the last ten years we have had a sea change from desktop to mobile computing architectures? I think it is an unreasonable and limiting assumption. The author also discusses a variety of factors to take into consideration when evaluating proposed systems. One thing I learned about was to think in the "STU" context: situations, tasks, and users. We must always remember who the intended audience for our product or design is. When designing a user interface toolkit, the target audience is very often programmers and UI designers, but when we design an Android interface, for instance, our target may be the population at large. Each user subset poses unique challenges and opportunities for us to explore in our research.
Similarly, we must remember what situations the users will be using the technology in. Can we take advantage of their situation in some way to improve the interface? How can we bring the context of the user into the user interface, optimizing it for the situation they find themselves in? Thus, a generalizable solution is good, but can we add improvements for a specific situation that make it even better? However, as the author points out, it is critical that we are not too specific. If we design a method to be optimal only in one small, highly specialized domain, it will probably not be used. In those cases, it is probably easier for users to stick with what they are used to instead of learning a new design for one small, focused area. Convincing UI designers to adopt a toolkit for only one specific problem is also probably unrealistic. Finally, I want to point out that one of the things I found most interesting about this paper was its claim that most users will not discard familiar tools unless there is a large improvement: roughly a 100% improvement is required to make someone change tools. I think this is a remarkable observation and evidence that people become stuck in their ways (e.g., people still using Windows XP 15 years later). However, I would like to see a citation for this assertion, which is uncited as it stands.
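Picking up the earlier point about a two-person "sample": a quick back-of-the-envelope calculation (my own sketch, not from either paper, with an assumed population standard deviation) shows how slowly uncertainty shrinks with sample size, which is why tiny samples generalize so poorly:

 # Hypothetical sketch: rough width of a 95% interval around a sample mean,
 # using a normal approximation and an assumed population standard deviation.
 # (For very small n the true t-based interval is even wider.)
 import math
 
 sigma = 10.0                                   # assumed population standard deviation
 for n in (2, 10, 50, 200):
     margin = 1.96 * sigma / math.sqrt(n)       # margin of error shrinks only as sqrt(n)
     print(f"n = {n:>3}: mean +/- {margin:.1f}")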

Jose Michael Joseph 8:57:00 10/28/2014

Evaluating User Interface Systems Research This paper is about the various techniques we can use to evaluate system interfaces and the various parameters we should consider when doing so. The author states that in recent years there has been a decline in research on interactive systems. The main reason for this is that stabilized platforms have emerged with fixed user interfaces that people have become used to. This leaves very little room for new exploration and is in sharp contrast to the early days of the field, when there were many toolkits, each with a different design. This has also led to the rise of a generation of researchers who lack skills in toolkit or windowing system architecture and design. Another big question is understanding exactly how a user interface can be evaluated, and this is the topic of this paper. The author states that in the earlier days people stressed making maximum use of certain resources, which seems trivial now given the growth of system capabilities. What originally were constraints in a 250K pixel space pose no problems in a 10M pixel space. Yet these assumptions are built into the operating systems that we use today, as they are derived from earlier operating systems for which such constraints were important. Most of the current windowing systems were designed to be used by people who had no experience with GUIs. This is not true for the current generation, which has grown up with GUIs and thus has a more intuitive understanding of them. The author then states the various ways a UI system adds value. UI systems reduce development viscosity, which means they reduce the time it takes to create a good solution. The more solutions are available to a designer, the more effective the design process will be. UI designers also tend to follow the path of least resistance and design the UI in such a way that minimal work needs to be done to accomplish a task. Lowering skill barriers means that even people without specialized UI skills should be able to make effective UI designs. A common infrastructure further empowers people by ensuring that capabilities are delivered in a similar way to all users. The author states that there are three common evaluation errors: the usability trap, the fatal flaw fallacy and legacy code. The usability trap occurs when designers get caught up in demonstrating good usability, especially when such usability is not clearly defined; this can lead to a long process without significant gains. The fatal flaw fallacy occurs when a small group of researchers tries to address every possible scenario in which their design would not work; since this list is exhaustive, the researchers often get caught up in it and end up with very little gain. Legacy code is another trap, where designers try to make their systems compatible with older code; this is a problem because many new features may not be portable to older code, resulting in them being left out of the design. One of the drawbacks of this paper is that it assumes unsupported information. In one section the author states that new users will only adopt a system if it is a whole 100% better than their existing system. Such assumptions are unfounded and thus dilute the research the authors have put into this paper. Another drawback is that there is no clear way to evaluate interfaces.
Some interfaces might lack appeal but be very efficient, whereas others might look great but run slowly. Thus the choice of interface is very application specific, and this is one aspect the author has not considered. The user interface that is designed ultimately depends on the users of the system and what they expect from it. The author does state, though, that the ideal way to limit the drawbacks of various tools is to use a combination of many tools. This is an idea that has been discussed in another paper as well, and it is quite a sound approach.

Jose Michael Joseph 8:57:38 10/28/2014

Methodology Matters: Doing Research in the Behavioral and Social Sciences This paper primarily talks about various research techniques and the methods used to collect data, as well as the strengths and limitations of each phase of the process and the various difficulties that need to be overcome. It states that the basic features of research are content, ideas and techniques. Content is the quality that is being measured, ideas are the various factors that we think are affecting the content, and techniques are the various ways we can study and analyze this information from different perspectives. Formally, they correspond to three domains: the substantive, the conceptual and the methodological, which deal with content, ideas and techniques respectively. We can manipulate the features of the system we are studying through various techniques, such as giving instructions, imposing constraints, selecting materials, giving feedback and using experimental confederates. Each method we choose to work with has its own inherent advantages and disadvantages. For example, the advantage of a self-report style of study is that it is relatively easy and we can acquire information quickly, but the drawback is that people can consciously choose to portray themselves differently. We can, however, offset the weakness of one method by combining it with another method that has strengths in that particular direction. The author fails to mention, though, that a bad combination could leave us with a method that has the disadvantages of both. When gathering research data we want to maximize the following features: generalizability, precision and realism. Although we may want to maximize all three, we cannot, because these features trade off against one another, so we must choose depending on the context of our research. The research strategies of quadrant 1 are the field study and the field experiment. The difference between them is that a field study involves the researcher collecting data from the "natural" flow of events, whereas in a field experiment one or more variables of the system are controlled in order to understand the resulting response of the system. Quadrant 2 contains laboratory experiments, which involve the participants entering a setting that is entirely controlled. The point of such an exercise is to see the reaction that can be induced in the participants under some predefined conditions. Such an experiment allows us to understand how a user would behave in a particular condition; its drawback is that the realism of the situation can be quite low. To bring more realism into the study we use another form of this technique called the experimental simulation. Formal theory is the strategy that does not involve gathering any empirical data; it focuses on finding relations between the various points of interest. Another non-empirical strategy is computer simulation, in which the system models the operation of an actual system but without any behavior by system participants. Comparisons always have to be made to make sense of the data. The comparisons we make are often determined by the level of inclusion of the study, the system being worked on, the relations of interest and the comparison techniques available.
The author states that we should generally randomize our experiments because, while performing them, we might encounter various external factors that are not under our control. To ensure that the results of our experiments are not biased by these external factors, we must conduct properly randomized experiments (a small sketch of random assignment follows this critique). But we must note that conducting randomized experiments will exhaust far more resources than traditional experimental settings, because we need new subjects to put through the various tests. Putting the same subject through multiple randomized tests is of little use, as after the first experiment the subject is already biased toward one particular outcome.
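As a small illustration of this point (my own sketch; the subject names and group sizes are hypothetical), random assignment simply shuffles the pool of fresh subjects before splitting it into conditions, which is what spreads the uncontrolled external factors evenly across groups:

 # Hypothetical sketch: randomly assign fresh subjects to two conditions so that
 # uncontrolled factors are spread across groups instead of confounding one of them.
 import random
 
 subjects = [f"subject_{i}" for i in range(1, 21)]    # 20 hypothetical participants
 random.shuffle(subjects)                             # the randomization step
 
 control, treatment = subjects[:10], subjects[10:]
 print("control:  ", control)
 print("treatment:", treatment)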

Vivek Punjabi 9:26:03 10/28/2014

Methodology Matters: Doing Research in the Behavioral and Social Sciences: This chapter describes some basic features and methods of doing research in the behavioral and social sciences. There are three basic features that the author says are part of every research process, viz. some content that is of interest, some ideas that give meaning to that content, and some techniques or procedures by means of which those ideas and contents can be studied. These features are formally referred to as three distinct but inter-related domains, viz. the substantive domain, the conceptual domain and the methodological domain, respectively. The author describes the possible methodologies that can be considered while doing research, such as the various research strategies, comparison techniques, validity, statistical inference, classes of measures, randomized experiments, etc. A thorough analysis of each aspect of research is given in this chapter along with good motivation. The topic I liked most was the strategy circumplex, which lays out the various research strategies with respect to their criteria. These explanations can guide any researcher onto the right path and even provide motivation. Evaluating User Interface Systems Research: In this paper, the author provides criteria and methods for evaluating user interface systems, especially complex ones. The author gives the need, importance and motivation for evaluating user interface systems. There are three basic ways an evaluation can go wrong: the usability trap, the fatal flaw fallacy and legacy code. The usability trap covers usability experiments that lead to incorrect conclusions due to certain assumptions, such as assuming all users can walk up and use the system with minimal training, assuming standardized tasks, and assuming a small scale of problem. The fatal flaw fallacy stems from the inability of a small team of researchers to recreate all the capabilities of an existing system or examine all of its eventualities. Legacy code is a standard that many people still invoke for UI systems research even though much of it has become irrelevant. The author then provides other ideas for evaluating such systems, which include: addressing every goal in terms of STU (Situations, Tasks and Users), establishing the importance of the problem and its proposed solution, generalizing new solutions, making the tool as expressive and flexible as possible (i.e. reducing solution viscosity), considering new design participants, combining and integrating several variations of a claim to create a more powerful whole, and ensuring the scalability of the system. Thus, the author provides some brief ideas about techniques for evaluating UI systems and tries to focus more on progress than on results alone. The approach and topics covered provide a good range of concepts to consider while evaluating papers and tools. The idea of using progress as a measure of performance is interesting and motivational. A thorough analysis of every topic in the paper would have been ideal.