Multimodal Interfaces

From CS2610 Fall 2014


Slides

slides

HW1 Demonstration Order

  • Senhua Chang, Qihang Chen
  • Xiaoyu Ge, Longhao Li
  • Eric Gratta, Nick Katsipoulakis
  • Andrew Menzies, Vivek Punjabi
  • Nathan Ong, Jose Michael Joseph
  • Phuong Pham, Wenchen Wang
  • Yingjie Tang, Modi Bhavin
  • Chris Thomas, Brandon Jennings
  • Xiyao Yin, Wei Guo
  • Qiao Zhang, Yechen Qiao
  • Zhong Zhuang, Yanbing Xue, Changsheng Liu

Readings

Additional Readings (Reading critiques not required)

Reading Critiques

Eric Gratta 18:19:40 9/28/2014

Designing SpeechActs: Issues in Speech User Interfaces (1995) Nicole Yankelovich, Gina-Anne Levow, Matt Marx The goal of this paper was to expose some of the difficulties that come with designing a speech interface by describing what the authors experienced while testing their own, called SpeechActs. They use the abbreviation SUI to refer to Speech User Interfaces, as opposed to GUI/Graphical User Interfaces. The authors' motivation is their belief that conversational speech is a better alternative to menu-based telephone systems that are tedious for users. The application's functionality included that of many general office applications, targeted specifically at traveling professionals who would need information on-the-go. They used a series of user studies and redesigns to assess the interface as well as the speech recognition system, all of which was detailed in the paper. This paper contributed a detailed exploration of the challenges that come with SUIs, an exploration which probably influenced the development of SUIs that followed its publication. Simulating conversation seemed to be the key to a usable speech interface. This includes transitional prompts, sharing a common context between speakers, and more sound-oriented details like inflection and intonation. The telephone systems at the time could not reproduce human-sounding speech very well. The limitations of phone systems also prevented users from interrupting the system with their own voice, since the system could not receive audio while it was playing speech into the phone. Ironically, the authors proposed that keypad shortcuts be available so that advanced users may skip over familiar prompts, although this called into question the extent to which the authors were striving for a speech-only interface versus a speech-optional interface. A significant observation was that GUI interfaces do not translate well into SUI interfaces. Since SpeechActs was trying to give speech access to existing GUI applications, it was made clear via the design cycle that the GUI workflows did not transfer to conversation successfully. Each design iteration tended to push the interface toward a conversational style. I thought it was helpful and responsible that the authors addressed some issues regarding how plausible or useful a speech-based interface might be. Importantly, speech interfaces do not provide the same level of freedom to users; users feel compelled to fill silence and take action, whereas in the use of GUI applications, users are free to pause and think as well as explore uninterrupted. It was left unexplained why the women in the study were less successful at using SpeechActs. I would not have expected that result, and it seems unfair to have left that statistic unexplained. Was the third-party speech recognizer more adept at interpreting men's voices? ------------------------------------------------------------------ Multimodal Interfaces (2003) Sharon Oviatt This paper is a survey of past, present, and future research in multimodal interfaces. Multimodal systems are defined as systems that are capable of accepting more than one mode of input in synchronization. These systems have become possible because of a wide array of new input devices. The author predicts that these developments will eventually lead to systems that have near-human sense perception. I've pulled out many of the novel and core topics addressed by this paper for this review.
A distinction is made between active and passive input modes; in passive input, there may be sensors monitoring users’ behaviors to make decisions without explicit user commands to the computer. This lends itself to the discussion of “blended” multimodal interfaces, which blend the use of both passive and active modes and may temporally cascade the modes such that each modal interpretation influences the interpretation of the others. A unique benefit of multimodal interfaces, blended or not, is the concept of “mutual disambiguation.” In a unimodal system, a single stream of input is being interpreted and thus there is no context for checking against recognition errors. However, if two or more input modes are operating and the system receives data from both, an error in one mode may be detected by comparing it against the processing of the input from other modes, either at the feature level or the semantic level (post-processing). This improved error recognition improves the stability of the interface and can make the user’s interactions more efficient. The author points out the need for multimodal user interface toolkits to alleviate the complexity of designing multimodal prototypes in the future. Multimodal interface designers need to pay heed to the fact that users are only likely to act multimodally in certain situations and also may switch between unimodal and multimodal acting depending on the cognitive load that they are experiencing. An important note by the author is that research needs to explore further the temporal relationships between natural modes of expression (gaze, gesture, speech, and within those) such that advanced multimodal systems can take advantage of those relationships by anticipating them. Most of these human expressions are not simultaneous (but are synchronous). Further, cooperation will need to be made with researchers in the cognitive science field because of the complexity and non-intuitiveness of these relationships. Natural language processing needs to adapt to be more suitable to the way people speak in multimodal systems.
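The mutual disambiguation idea above is easiest to picture at the semantic level: each recognizer returns a ranked n-best list, and the fusion step prefers the joint interpretation that both modes can agree on. The following Python fragment is only a minimal sketch of that idea, not the architecture from the paper; the recognizers, confidence scores, and command names are invented for illustration.

    # Minimal sketch of semantic-level mutual disambiguation (illustrative only).
    # Each recognizer returns an n-best list of (interpretation, confidence).
    # The fusion step keeps only jointly consistent pairs, so an error that is
    # top-ranked in one mode can be overridden by evidence from the other mode.

    speech_nbest = [("delete file", 0.48), ("select file", 0.45)]    # hypothetical output
    gesture_nbest = [("point at icon", 0.70), ("cross-out stroke", 0.25)]

    # Hypothetical table of which speech/gesture interpretations make sense together.
    CONSISTENT = {
        ("select file", "point at icon"),
        ("delete file", "cross-out stroke"),
    }

    def fuse(speech, gesture):
        """Return the best jointly consistent (speech, gesture) interpretation."""
        best, best_score = None, 0.0
        for s_label, s_conf in speech:
            for g_label, g_conf in gesture:
                if (s_label, g_label) not in CONSISTENT:
                    continue                      # semantically incompatible pair
                score = s_conf * g_conf           # naive joint score
                if score > best_score:
                    best, best_score = (s_label, g_label), score
        return best, best_score

    if __name__ == "__main__":
        print(fuse(speech_nbest, gesture_nbest))
        # -> (('select file', 'point at icon'), 0.315): the second-ranked speech
        # hypothesis wins because the gesture evidence disambiguates it.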

Yanbing Xue 14:26:45 9/29/2014

The first paper is an investigation of the design of speech user interfaces. In this paper, the authors perform experiments to see people's reactions to speech user interfaces, and for the problems they found, they propose ideas to solve them. This paper is based on SpeechActs, a research prototype of a speech user interface. According to the authors' user study, most users did not like the excessive feedback provided by the system, and the authors also found that recognition rate is not strongly related to user satisfaction. Thus, a SUI is not just a graphical user interface translated into voice. Obviously, new input methods other than the keyboard and mouse have been gaining much interest because of the benefits they can bring to users. A user interface with multiple inputs will cover a broader range of users, accommodate differences in ability and preference, adapt to continuously changing computing conditions, and so on. All of this makes the paper very interesting in terms of studying new input methods, speech interfaces in particular. Besides, speech is a natural medium in human communication, so incorporating speech into user interfaces has been studied for a long time and has wide application in reality, from the military to health care, telephony, and many everyday programs. SpeechActs wanted to make email, as well as other services, available to its users. It did so using a touch-tone phone. The user would call in and the system would let them converse their way through the interface. If the user wanted to reply, they would just speak the message they wanted and the system would send the message to the recipient. Two other important elements in conversation are intonation and pacing. These are features where the system couldn't accurately simulate what people do in real life. The lack of good prosody made the voice sound electronic and choppy. The noticeably long pauses in conversation caused users to slow down their reactions and feel unnatural. The second challenge is that translating a graphical interface into speech is not likely to produce an effective interface. The reason lies in the problem that vocabulary, information organization, and information flow cannot be transferred well to a SUI. The third challenge is speech recognition errors, which include rejection errors, substitution errors, and insertion errors. In order to avoid the "brick wall" effect in the case of rejection errors, the authors use progressive assistance (a small sketch of this escalation appears after this critique). Explicit verification is employed to limit the damage from misunderstood user commands. And to prevent insertion errors, users can press a keypad command to turn off the speech recognizer. The fourth challenge is the nature of speech: the lack of visual feedback leads users to feel less in control and to need to respond quickly without enough time to think. ---------- The second paper is mainly about the state of the art in user input that encompasses more than one method of input. The first part of the paper introduces the concept of multimodality and motivates the development of such interfaces. By using more than one type of input, such as a combination of speech, pen, and vision, the input can be more robust. For example, a multimodal interface may employ both speech recognition and eye tracking to provide a satisfactory user experience. Research on multimodal interfaces began as far back as the early eighties, and combined speech recognition and touch sensitivity to allow a user to move blocks in a virtual world.
For me, I believe multimodal interaction is the way of the future. For example, look at the ways iPhone users can interact with their phone. The iPhone uses touchscreen keyboard input, speech recognition with Siri, and apps have the ability to use optical recognition to scan pictures as a user input method. After reading this paper, I have a much clearer view of the structure of the article, as well as an understanding of the focus of each part. This article actually introduces everything involved in multimodal interfaces, and makes itself a manual for future researchers.
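The progressive assistance and error handling described earlier in this critique can be sketched as a small prompt-escalation loop: each consecutive rejection produces a more directive prompt instead of repeating the same error message. The sketch below is hypothetical Python, not the SpeechActs implementation; the prompts and the recognize stub are made up for illustration.

    # Sketch of "progressive assistance" for rejection errors (illustrative only):
    # each consecutive rejection escalates to a more directive prompt rather than
    # hitting the user with the same "I didn't understand" brick wall.

    PROMPTS = [
        "What now?",                                    # open-ended initial prompt
        "Sorry, I didn't catch that. Please repeat.",   # simple re-prompt
        "Still no luck. Try a short command, such as 'read'.",
        "You can say 'read', 'next', 'reply', or 'help'.",  # fully directive
    ]

    def recognize(utterance):
        """Stand-in for a speech recognizer: returns a command or None on rejection."""
        known = {"read", "next", "reply", "help"}
        word = utterance.strip().lower()
        return word if word in known else None

    def prompt_user(utterances):
        """Walk through simulated user turns, escalating help after each rejection."""
        print(PROMPTS[0])                               # open-ended initial prompt
        rejections = 0
        for utterance in utterances:
            command = recognize(utterance)
            if command is not None:
                return f"OK, executing '{command}'."
            rejections += 1
            level = min(rejections, len(PROMPTS) - 1)
            print(PROMPTS[level])                       # more directive each time
        return "Giving up after repeated rejections."

    if __name__ == "__main__":
        print(prompt_user(["um...", "could you...", "read"]))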

Xiaoyu Ge 22:35:07 9/29/2014

Multimodal Interfaces The paper introduces a multimodal interface as a system that combines different user input modes along with multimedia output, with the goal of supporting a more transparent, flexible, efficient, and human-like sensory interface. One of the most significant benefits of the multimodal interface is its advanced error handling ability. However, it requires great architectural changes from normal graphical interfaces, since the system should be able to process more than one recognition-based input stream. As introduced in the paper, multimodal interface concepts are still in research-level development, since merging a variety of different classification algorithms makes development even more challenging. The concepts the author introduced in this paper seem to be quite advanced. Some current technologies have already made use of similar concepts, combining several recognition algorithms in order to achieve high accuracy and reduce the error rate. I am working on a project which makes use of a multimodal device, the Kinect. The Kinect is a multimedia device developed by Microsoft that combines voice, gesture, and motion detection at the same time to recognize the user's activity. Since it makes use of multiple input sources, combining recognition algorithms for the different input sources, it surely achieves a higher level of accuracy than most other recognition devices on the market. Designing SpeechActs: Issues in Speech User Interfaces The authors introduced a research prototype, SpeechActs, in order to present speech interface design challenges and addressed approaches to solve these problems. The challenges are based on the results of user tests. Firstly, there are compatibility problems between a regular GUI and a SUI: vocabulary and information organization do not transfer well, so the authors tried to support natural sentence structures to solve the vocabulary problem, reorganized the information based on the user's needs, and used directive prompts to redirect the information flow after errors; moreover, the information flow itself is very confusing. Secondly, speech recognition errors are classified as insertion errors, rejection errors, and substitution errors, and the authors managed to reduce the effect of these errors. Thirdly, there is the challenge of lacking visual feedback, which results in reliance on the user's memory; the paper introduced filters to narrow down the information for the user, and as for the slow speech output and lack of persistence, the paper focuses on making the conversation as brief and orderly as possible. The authors also added audio cues to solve the ambiguous silence problem. The problems introduced by the authors are common problems in user interface design for speech. Siri, for example, as the most widely used conversational interface, has an interface different from a normal graphical interface and uses similar solutions, such as directive prompts, to reduce its recognition error rate. The problems and solutions in this paper are based on third-party recognition software, which does not have very high recognition accuracy. Some of the problem-solving techniques require the user to perform extra actions to correct the system's behavior. Better recognition algorithms and support for multimodal input would greatly reduce the system's errors and redundant user actions.

Nick Katsipoulakis 22:50:04 9/29/2014

Multimodal Interfaces : In this document, the idea of multimodal interfaces is presented. This kind of UI allows the user to interact through multiple channels of communication. In the beginning of the article, the advantages of multimodal interfaces are enumerated, and some of them involve flexibility of input, accommodation of a broader range of users, prevention of physical overuse, adaptability, satisfaction of user preferences, improved efficiency, and better error handling. Multimodal interfaces owe their vast improvement to advances in cognitive science and the use of high-fidelity simulations for testing. Since multimodal interfaces have to do with humans' cognitive abilities, a number of foundations need to be set. The most important among them is the way users integrate modes during use (simultaneous and sequential integrators). Research has shown the phenomenon of multimodal hypertiming, in which users accentuate their dominant integration pattern as interaction becomes more demanding. Turning to designing multimodal interfaces, they differ in the number of streams affecting the event recognition process, in the way that actions are interpreted, and in whether they are built as part of an end application. Finally, different types of multimodal system architectures are presented. Mainly, two different architectures dominate the design space: systems that integrate signals either at the feature level or at the semantic level (a schematic contrast of the two appears after this critique). ///////////// --------------------------END OF FIRST CRITIQUE ----------------------------------/////// Designing SpeechActs: Issues in Speech User Interfaces : In this paper, SpeechActs is presented, which is a system for managing calendar and mail through voice commands. Several issues in designing such speech user interfaces are also presented thoroughly, along with the authors' design choices. SpeechActs was evaluated by a number of users and produced a number of interesting results. During its design process, several challenges had to be overcome, like simulating conversation. Most of the authors' contributions involve how to turn casual dialogue into meaningful commands for a system. In addition, transforming a graphical UI into a speech UI appeared to be a challenge, since previous assumptions did not apply (i.e., visual information, command interpretation, reliable answers, etc.). Recognition errors posed many difficulties in designing the whole system, and producing natural speech from a machine is crucial, since it is important to leave users with a good usage experience.
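The two integration architectures mentioned in the first critique can be contrasted in a few lines: feature-level (early) fusion concatenates raw feature vectors before classification, while semantic-level (late) fusion combines each recognizer's own interpretation afterwards. The Python sketch below is only a schematic comparison with invented feature vectors and a dummy classifier; it does not describe any particular system from the paper.

    # Schematic contrast between the two fusion architectures (illustrative only).

    def feature_level_fusion(audio_feats, lip_feats, classifier):
        """Early fusion: concatenate raw feature vectors and classify the joint vector."""
        joint = list(audio_feats) + list(lip_feats)
        return classifier(joint)

    def semantic_level_fusion(audio_hypotheses, lip_hypotheses):
        """Late fusion: each recognizer produces its own label distribution;
        combine them afterwards (here by multiplying per-label confidences)."""
        labels = set(audio_hypotheses) & set(lip_hypotheses)
        scores = {label: audio_hypotheses[label] * lip_hypotheses[label] for label in labels}
        return max(scores, key=scores.get)

    if __name__ == "__main__":
        # Invented toy inputs.
        audio = [0.2, 0.7, 0.1]
        lips = [0.5, 0.4]
        dummy_classifier = lambda v: "yes" if sum(v) > 1.0 else "no"
        print(feature_level_fusion(audio, lips, dummy_classifier))      # early fusion
        print(semantic_level_fusion({"yes": 0.6, "no": 0.4},
                                    {"yes": 0.7, "no": 0.3}))           # late fusion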

Qihang Chen 23:30:14 9/29/2014

The paper 'Designing SpeechActs: Issues in Speech User Interfaces' examines a set of challenging issues facing speech interface designers and describes approaches to address some of these challenges. In the paper, the authors first introduce the functionality of the SpeechActs system, which includes mail, calendar, weather, and stock quotes. Then the paper explains the methodology, including usability testing and iterative redesign. What follows is the main part, the design challenges and corresponding approaches: (1) simulating conversation, which involves response, prosody, and pacing, addressed by explicit cue phrases and the barge-in technique, among other things; (2) transforming GUIs into SUIs: GUI conventions would not transfer successfully to a speech-only environment, and the solution lies in vocabulary, information organization, and information flow; (3) recognition errors, which can be further divided into rejection, substitution, and insertion. For rejection, progressive assistance can be used; for substitution, implicit or explicit verification is adopted based on the danger of the command; (4) the nature of speech. The problems are lack of visual feedback, speed and persistence, and ambiguous silence. Corresponding solutions for each were discussed in detail in the paper. The most valuable result provided by the paper is that speech-only interfaces should be designed from scratch rather than directly translated from their graphical counterparts. The methodology of usability testing and iterative redesign is also creative and instructive. Additional contributions of the paper lie in the identification of design challenges and the provision of main approaches. Though the applications tested by users are typical and fairly comprehensive, I do think it would be better to cover a map application like Google Maps, which is used much more frequently than stock quotes. ---------------------------------------------------------------------------------------------- "Multimodal Interfaces" by Oviatt is a survey of multimodal interfaces. Multimodal interfaces combine two or more input modes used in a coordinated manner. A good application of this is in spatial tasks like working with map drawing programs. For instance, a user might speak what they want done to the two objects they circled on the screen, like drawing a line between them. One problem with multimodal systems is that both inputs need to be synchronized, or a threshold needs to be dynamically defined based on adaptive temporal threshold techniques, because the user may be a simultaneous integrator or a sequential integrator (see the sketch after this critique). Multimodal interfaces can help leverage the advantages of an input method and hide its disadvantages. By having two input modes, the user will use the method that they feel is less error-prone, thus avoiding errors and frustration. Also, it is believed that humans have separate short-term memory for different modes, such as audio and visual, so by using diverse input modes a system can maximize a user's short-term memory, thus allowing more complex tasks to be accomplished faster. This reading makes a strong case for multimodal interfaces, but also warns that such systems can be difficult to design to take full advantage of the input methods' combined power. It also explains that multimodal interfaces rely heavily on cognitive science.
Many design principles being used come from linguistics and cognitive psychology, so multidisciplinary cooperation is needed for multimodal interfaces to be successful in the future.
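The adaptive temporal threshold mentioned in the critique above is essentially a rule for deciding whether a speech signal and a pen signal belong to the same multimodal command: a simultaneous integrator overlaps the two modes, while a sequential integrator leaves a gap, so the lag window has to be adapted per user. The following is a small hypothetical Python sketch of that grouping; the timestamps and threshold values are invented.

    # Sketch of grouping speech and pen events into one multimodal command using
    # a per-user temporal threshold (illustrative only; numbers are invented).

    def same_command(speech_event, pen_event, lag_threshold_s):
        """Treat the two signals as one command if they overlap in time or the gap
        between them is within the user's lag threshold (in seconds)."""
        s_start, s_end = speech_event
        p_start, p_end = pen_event
        overlap = s_start <= p_end and p_start <= s_end
        gap = max(p_start - s_end, s_start - p_end, 0.0)
        return overlap or gap <= lag_threshold_s

    # A "simultaneous integrator" overlaps the modes; a tight threshold suffices.
    print(same_command(speech_event=(0.0, 1.2), pen_event=(0.5, 0.9), lag_threshold_s=0.5))  # True

    # A "sequential integrator" finishes speaking, then draws; the same events are
    # only grouped if the threshold has been adapted (widened) for that user.
    print(same_command(speech_event=(0.0, 1.2), pen_event=(2.5, 2.9), lag_threshold_s=0.5))  # False
    print(same_command(speech_event=(0.0, 1.2), pen_event=(2.5, 2.9), lag_threshold_s=1.5))  # True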

Wenchen Wang 23:57:28 9/29/2014

<Designing SpeechActs: Issues in Speech User Interfaces> Summary: This paper proposes some principles and challenges of conversational interface design based on user studies of the SpeechActs project. The SpeechActs project is a research prototype for creating speech applications. Paper Review: In analyzing the data from the SpeechActs user studies, the authors identified four substantial user interface design challenges for speech-only applications. The first is to simulate the roles of speaker and listener in terms of prosody and pacing. The second is that GUI conventions do not transfer successfully to a speech-only environment. In terms of vocabulary, users are not used to applying the words from the GUI in work-related conversation. In terms of information organization and information flow, in the SUI environment all information comes through conversation instead of text; information cannot pop up as text, and people may get confused if no proper feedback is given. The third challenge is that recognition errors may happen when identifying people's sentences. Recognition errors can be classified as rejection, substitution, and insertion errors. For example, when I use Siri and speak before the system is ready, it cannot understand my speech and says "I don't get it"; I think that is a rejection error. The fourth challenge is the limitations of the nature of speech, namely the lack of visual feedback and the speed and persistence of speech. For example, people may forget what they were saying just a moment ago. <Multimodal Interfaces> Summary: This paper introduces multimodal interfaces, including the types of multimodal systems, their history, their advantages, and their differences from GUIs. Paper Review: A multimodal system is a system that processes two or more combined user input modes in a coordinated manner with multimedia system output. Although I haven't used a multimodal interface yet, I know that currently most multimodal interfaces enhance the traditional graphical user interface (GUI) with speech capabilities. Widespread standards have not yet developed around the design and implementation of multimodal interfaces, but there are a few common approaches in evidence today. One advantage of multiple input modalities is increased usability, which has the potential to accommodate a broader range of users than traditional interfaces. Maybe by using a multimodal interface, two or more people could work together at one workstation. Another advantage of multimodal interfaces is the adaptability to accommodate the continuously changing conditions of mobile use. I believe that future mobile devices will have multimodal interfaces, such as combining Siri and touchscreen input. Multimodal interfaces also have the advantage of error handling, for both user-centered and system-centered reasons. On the system-centered side, for example, a speech recognizer and a pen recognizer can mutually disambiguate each other's inputs, so that errors are avoided.

changsheng liu 0:21:12 9/30/2014

<Designing SpeechActs: Issues in Speech User Interfaces> introduced SpeechActs, which is an experimental conversational speech system. The paper used case studies to argue that we should design a SUI from scratch, which means we cannot translate ideas and principles from a GUI directly. The paper first gave a short explanation of the functionality of the SpeechActs system, and then introduced the usability testing and iterative redesign. It summarized some problems of SpeechActs. For example, each of the users of SpeechActs bemoaned the slow pace of the interaction. This is because the system provided too much feedback. Techniques that worked well in the graphical interface turned out to be confusing and disorienting in the speech interface. The crucial part of the paper is that it listed four main challenges for SUIs. First, it is very hard to simulate conversation; we need to avoid explicitly prompting the user for input. Second, GUI conventions would not transfer successfully to a speech-only environment. The third challenge is recognition errors; speech recognition and text analysis are difficult issues. The final challenge is the nature of speech: a SUI suffers from a lack of visual feedback and persistence. The paper <Multimodal Interfaces> introduced the main types of multimodal interfaces and their advantages, the cognitive science behind them, and their features. The paper answers the question of why multimodal interfaces are an improvement over unimodal interfaces. The first reason is that they reduce the ambiguity in user input and the error rate of the system. The second reason is that they reduce the cognitive load on the user's end. A multimodal system can spread the task across different input streams and thus reduce what users need to keep in mind to finish the task. As the paper mentioned, if we want machines to move into the multimodal realm, work still needs to be done in natural language processing.

Wei Guo 1:04:22 9/30/2014

Reading Critique for Designing SpeechActs: Issues in Speech User Interfaces This paper introduces the SpeechActs system, then talks about the speech user interface user study and iterative design, and finally discusses the design challenges and strategies. The SpeechActs system is an experimental conversational speech system. The main finding of the formative evaluation was the "slow pace of the interaction." To improve a speech user interface, there are a few things we can think about: build dialog that is more like human-human dialog, make the dialog brief and informative, give brief and correct feedback to the user, and do not just copy the graphical interface. Since people are becoming busier and busier, a SUI is more and more important. We can imagine a person lying in bed, telling the phone how to create presentation slides for his meeting tomorrow. One hour later, he has completed slides to show. In the meantime, the phone helps him order his dinner by obeying his voice commands, and after he finishes the slides, the food is right in front of his door. Nowadays, the most popular SUI must be Siri, the intelligent personal assistant for Apple iOS. This app uses a natural language user interface to form a dialog with the user. However, Siri is still far from meeting all human needs. I am really interested in this area, and I hope to have the opportunity to do a project in it. Reading Critique for Multimodal Interfaces Multimodal systems allow users to interact with computers using multiple different modes or channels of communication, such as speech, pen, touch, and manual gestures. Multimodal systems are widely used today: from the ATM machine, which includes touch-screen and button-press modes, to phones, which include speech, button presses, and touch, to the Xbox, which includes gestures and buttons. Multimodal systems better fit humans' increasing needs. What are the future directions of multimodal systems? The author of this paper points out that human senses such as haptics, smell, and taste could also be included. I don't really agree that smell can be implemented in a multimodal system. The subject of multimodal interaction is the human. As we all know, a human needs to interact with the machine system to get an output. Let's suppose we really want to implement smell in a multimodal system. Since we humans cannot release different smells on demand, the system must be the one that releases the smell. To release a smell, the system must include a source for the smell. Such a system cannot be very small if it has to hold the source, and it would have to be refilled frequently since the smell source can run out. It does not seem very efficient compared to touch, speech, and gesture. This is just my opinion.

Longhao Li 1:50:37 9/30/2014

Critique for Designing SpeechActs This paper mainly talked about the development of a speech-based user interface. The authors include the design, the experiments, and the refinement process in the paper. I think this paper is important for the exploration of new styles of user interface on computers, not just because of the speech-based interface they developed, but because of the problems of speech-based interfaces they point out. A speech-based interface seems natural for users to use, but in practice there are a lot of problems when designing the interface. Some of them are technical, and some are due to the nature of speech itself: for example, it is hard to keep the pace of a conversation with the user natural, since the analysis of speech takes time and the machine does not know how to keep the conversation in the rhythm of a natural conversation between human beings. Also, it is hard to make a conversation as fast as a mouse-and-keyboard interface. That may be because the user lacks visual feedback: they have to finish listening to get all the information, and once the information is complete, the user may be confused, so the pace will be slow. Even though this interface is not fast, it is still very useful when it is hard for people to look at a screen. For example, when people are driving, it is the best way to help the driver operate the system. Let's imagine a situation where a user is driving and they get a message from an important person, and they have to read it now and reply to it. A speech interface will be the best choice. It does not need a very fast speed of operation, but it needs to be safe. The device can read out the message for the user, and the user can tell the device what the reply should be; the computer will recognize the message and send it. That is a very nice interface for this situation. Critique for Multimodal Interfaces This paper talked about multimodal interfaces, which means interfaces that have multiple user inputs working together. The author used the history of their development and examples to clearly explain the study of multimodal interfaces. This paper is important in my opinion. Nowadays, due to the development of sensor technology and computer technology, it is possible for new ways of operating computers to show up. We have been stuck with the mouse-based interface for a very long time, and our speed of operating computers has not become faster for a long time; it is time to improve it. This paper gives us an idea of how to do that. Combining different user inputs that work together in a natural way seems to be a good solution for making the operation of computers easier and faster. A speech and point-based multimodal interface looks like a natural way for users to express what is on their minds. The invention of this kind of interface may lead to some improvement in the experience of operating computers. Thus, I think this paper is important for the development of user interfaces. The paper also points out some problems of multimodal interfaces. One of the problems is that people always make mistakes while they are operating computers. People may not intend to make the mistakes; they just happen naturally, for example it is hard for people to draw a perfectly straight line using a mouse. Tolerance of this kind of error is necessary for multimodal interfaces. Learning users' habits is a good way to know what kinds of mistakes users may make when doing some operation. Then, when people are doing the operation, self-correction of those mistakes will speed up users' actions on the computer.

SenhuaChang 2:43:08 9/30/2014

Designing SpeechActs: Issues in Speech User Interfaces In this article, the authors introduce what SpeechActs is: SpeechActs is an experimental conversational speech system. Experience with redesigning the system based on user feedback indicates the importance of adhering to conversational conventions when designing speech interfaces, particularly in the face of speech recognition errors. This article examines some challenging issues facing speech interface designers and describes approaches to address some of the challenges; in the section on the SpeechActs system, the examples illustrated there are quite interesting and representative. This article gives speech interfaces a very good example; a lot of work still needs to be done to improve performance. Siri on the iPhone is one of the most popular and powerful speech user interfaces. However, there are still lots of issues to overcome in this research, and therefore more studies are needed to improve performance and satisfy users. Multimodal Interfaces: This article gives us a very thorough overview, including a detailed introduction to the recently popular multimodal interface. There are lots of aspects of the multimodal interface mentioned in the paper, such as the advantages, the underlying techniques, and the ways they differ from traditional graphical user interfaces. Frankly speaking, I am not familiar with multimodal interfaces, which are not very common in daily life; most of the time we still interact with machines in a unimodal way. The multimodal interface is very helpful for those people who are not familiar with operating systems, like children or disabled people. By using a multimodal interface, they can do what they might not have been capable of doing in the past. I am now taking a multimedia software engineering course, which is based on SIS and lots of sensors. If SIS matures, I think it will help millions of people to communicate with each other in a new way.

Bhavin Modi 3:07:46 9/30/2014

Reading Critique on Designing SpeechActs: Issues in Speech User Interfaces SpeechActs is a SUI (Speech User Interface) created to interact via conversational language to perform tasks such as reading emails, maintaining calendar entries, stock inquiries, and weather forecasts. The problems faced by the system and their solutions are discussed. The major point discussed is that one cannot convert a GUI to an SUI; these interfaces are inherently different, and attempts at direct translation lead to poor interfaces. The authors have created a speech interface to work instead of the traditional telephone key-press system. The motive behind this is to reduce the load on the user's short-term memory and to make the interaction quicker and more natural. All the major problems with the system have been described, some of which even exist today. Accent recognition plays a huge part; the user studies are still incomplete without consideration of people from different ethnicities. Each problem correlates with the others: the accent problem, for example, results in frustration, due to the inability of the system to recognize input and the repeated issuance of tasks. The challenges are simulating conversation, transforming a GUI to an SUI, recognition errors, and the nature of speech. But such applications today, like Google Now and Apple's Siri, are very popular. There is also a problem of discoverability of operations for a new user. The advantage is using a ubiquitous system like the telephone for access; conversation is more natural for humans. It frees up the hands and eyes as well, leaving us open to multitask and not lose focus while driving, for example. The unpredictability of both humans and the system can lead to chaos, most probably due to a mismatch of pace and the use of improper grammar. Improvements can be achieved with continued research to create better speech synthesizers and a recognition mechanism to deal with speech ambiguities. Keyword identification can be a possible solution to estimate what the user wants, and using intelligent agents to learn user traits and idiosyncrasies will make the system more robust. The importance of feedback is reiterated, both visual and auditory, mostly auditory in this case. It logically follows the mixed-initiative interface design paradigm. We have discussed input devices and their design spaces previously, mostly physical interaction with the system. We now move into a new area of speech recognition and natural language processing to interact with the system. This is a new design paradigm and a brand new direction. The paper opens up more avenues of research in other areas, maybe using natural senses, like tracking eye movement. --------------------------------------------------------------------------------------------------------- Reading Critique on Multimodal Interfaces Multimodal interfaces are ones with two or more user input modes, such as speech, gestures, etc. The paper describes multimodal interfaces: their definition, advantages, the research done, and the future scope. The points worth noting in the paper are that we should not think only of unimodal interfaces, and should expand the collaboration between multiple input modes. Multimodal systems are more robust in that sense; even if one mode fails, you still have the other. Their difference from a typical GUI is the control of multiple simultaneous input streams and probabilistic approaches to combining input so as to ascertain user intentions.
The input is not evaluated independently; rather, the modes are interpreted jointly, where, for example, lip movements affect the speech recognition and help reduce errors. One important design consideration is that having a multimodal interface does not necessarily make the user multimodal; they may still prefer to give the input through two or more separate unimodal channels. These systems have many advantages. First, such interfaces have huge possibilities for helping the disabled. Secondly, advancement in technology and the integration of multiple inputs will eventually make systems as natural to use as our daily objects for all types of people. Thirdly, the human cognitive process and the workings of short-term memory are also explained, and how the system better utilizes our memory through spatial reference and different memory allocations for the different senses. Work is still being done in this field, and many research opportunities present themselves in terms of different methods of giving input to the system, like head movements, eye tracking, even breathing. The applications for robotics and the use of AI and machine learning are vast. This takes us back to our initial readings on Fitts' Law, where experiments were conducted to get input via a head-mouse gear. The paper in its entirety was poorly written in my opinion. Multiple and repetitive inline references were a real hindrance and kept breaking the flow of reading. The content too was redundant; I believe Bolt's experiment has been mentioned twenty times with the same two lines to define it. No new research was discussed, and mostly the paper was a summary of existing works and experiments. The paper does provide a guide to multimodal interfaces, a good introduction to get acquainted with the area. Both papers today concentrate a lot on speech-based interfaces, moving away from the traditional physical-interaction-based interface.

Yubo Feng 3:45:27 9/30/2014

There are two papers in this reading, and I think the first paper is more interesting, so I read it carefully and summarize it as follows: In the first paper, "Designing SpeechActs: Issues in Speech User Interfaces," the authors talk about the issues in designing speech-based human-computer interfaces and approaches to address them. Four challenges are indicated in the paper: simulating conversation, with its lack of prosodics and pacing; transforming GUIs to SUIs, covering vocabulary, information organization, and information flow; detecting and checking recognition errors; and the nature of speech, whose lack of visual feedback, delay, lack of persistence, and ambiguous silence make recognition harder than we thought. In order to address these problems, the authors propose three guidelines. First, adhering to the principles of conversation does make for a more usable speech-only interface; just as in human-human dialog, grounding the conversation, avoiding repetition, and handling interruptions are all factors that lead to successful communication. Second, due to the nature of speech itself, the computer's portion of the dialog must be both as brief and as informative as possible. Finally, as with all other interface design efforts, immediate and informative feedback is essential.

Yingjie 6:46:35 9/30/2014

The paper "Designing SpeechActs: Issues in Speech User Interfaces" is an easy-to-read paper. It did deep research on the speech-only interface by analyzing the statistics gathered from the SpeechActs prototype. The design of this prototype included gathering user feedback and iterative redesign across several user studies. The study reveals a lot of challenges as well as some basic rules for designing a speech-only interface. For example, we cannot just directly translate a graphical user interface into a speech user interface, because some of the principles of the GUI are not applicable to the SUI; we are not going to speak some of the words which frequently appear in the GUI. I learned a lot from this paper: it did a lot of work on the user studies, and it gives me hints on developing systems, like the iterative feedback and redesign of the system. I think that the speech-only user interface has great prospects because our hands are often occupied while our mouth is free. For example, I have to type words when I want to reply to a message just received from my mom; the speech user interface will help a lot. When we are driving, it will be great if the SUI can help us read or send messages. This will greatly enhance the quality of our lives. It is a hot topic that people are watching their phones for too much time, even when they are walking; it is very dangerous if we look at our phones while we walk on the road. However, with a SUI, this problem may be solved by transferring the task from hand and eye to mouth and ear. The main difficulty in building the system, I think, is that natural language processing techniques are not mature enough; if we can recognize the intonation, then the commands and the input will not be a problem.———————————————————— "Multimodal Interfaces" is an article which analyzes the history and the future of multimodal interfaces. In the past, the mainstream multimodal interface designs added speech and gesture, or speech and keyboard input. In the future, multimodal interfaces will be more innovative, better integrated, and more robust. The goal of multimodal interfaces is to make them usable for different kinds of people no matter what background they have. That requires the designers to take cultural differences into account. The typical information processing flow of a speech and gesture interface is that speech and gesture control the context concurrently; the gesture is interpreted by the gesture understanding system while the speech is recognized by the natural language processing system, and the input information is then all integrated by a multimodal integration processor (a tiny sketch of this integration step follows below). Multimodal interfaces are not new to us; many large-scale games take advantage of multimodal interfaces. As far as I am concerned, education is the best area in which to implement multimodal interfaces. For example, for learning Chinese, we could build a game which requires the player to pronounce a character with the correct intonation while demonstrating gestures for it, which would be of great help in learning Chinese.
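The speech-plus-gesture flow described above (two recognizers feeding a multimodal integration processor) can be made concrete with a tiny frame-merging example in the style of "put that there": the spoken command leaves slots that the pointing gestures fill in. Everything in the Python sketch below (the frames, slot names, and commands) is hypothetical and only meant to illustrate the shape of the pipeline.

    # Tiny sketch of a multimodal integration processor (illustrative only).
    # The NLP side produces a command frame with unresolved deictic slots ("that",
    # "there"); the gesture side produces pointing events; integration fills slots.

    def interpret_speech(utterance):
        """Hypothetical natural-language step: 'put that there' -> frame with holes."""
        return {"action": "move", "object": "<deictic:that>", "target": "<deictic:there>"}

    def interpret_gestures(points):
        """Hypothetical gesture-understanding step: ordered pointing targets."""
        return list(points)

    def integrate(speech_frame, gesture_targets):
        """Fill each unresolved deictic slot with the next pointing target, in order."""
        frame = dict(speech_frame)
        targets = iter(gesture_targets)
        for slot, value in frame.items():
            if isinstance(value, str) and value.startswith("<deictic:"):
                frame[slot] = next(targets, None)   # None if the user didn't point enough
        return frame

    if __name__ == "__main__":
        speech = interpret_speech("put that there")
        gestures = interpret_gestures(["red block", "top shelf"])
        print(integrate(speech, gestures))
        # -> {'action': 'move', 'object': 'red block', 'target': 'top shelf'}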

Xiyao Yin 7:29:43 9/30/2014

'Designing SpeechActs: Issues in Speech User Interfaces' shows different experiments and results on speech-only interfaces. The benefits of speech-only interfaces are that conversational speech offers an attractive alternative to keypad input for telephone-based interaction, and that it leaves the user's hands and eyes free because it requires minimal physical effort. What's more, the number of commands is virtually unlimited. However, there are still substantial obstacles to overcome in speech-only interfaces. Repairing errors can be tiring, and transferring design principles is also hard. In fact, I have experience using some speech-only interfaces. It is not easy to get the right command through speech, and many times the error-prone speech recognizer just wastes time. The SpeechActs system includes speech-only interfaces to a number of applications, including electronic mail, calendar, weather, and stock quotes. After analyzing data from user studies, the authors identified four substantial user interface design challenges for speech-only applications: simulating conversation, transforming GUIs into SUIs, recognition errors, and the nature of speech. The most interesting part for me is the recognition errors. It is a good method to allow the application developer to convert phrases meaning the same thing into a canonical form, so that some substitution errors will still result in the correct action (a small sketch of this idea follows below). As a result, the paper shows that adhering to the principles of conversation makes for a more usable speech-only interface, and that we should focus on conversation to achieve successful communication. 'Multimodal Interfaces' describes the research history and results of multimodal systems, which combine two or more user input modes. This paper is great in its organization because it uses questions as the titles of its parts. Instead of using some confusing words, questions make it easier for us to understand the main idea of each part. Multimodal systems have developed rapidly during the past decade, with steady progress toward building more general and robust systems. Unlike a traditional keyboard-and-mouse interface, multimodal interfaces permit flexible use of input modes, and this will bring many benefits to us. In recent years, many systems have developed well beyond the prototype stage. However, they still need to be improved to become more powerful and to gain general methods of natural language and dialogue processing, and this will be a future direction. In this paper, I find the figure of the multimodal architecture very useful because it describes the differences and shows us other types of combinations between inputs. It also effectively shows how systems work with those inputs. In conclusion, multimodal interfaces can be playful and self-reflective interfaces that suggest new forms of human identity as we interact face to face with animated personas representing our own kind.
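The canonical-form technique mentioned in the critique above (letting the developer map phrases that mean the same thing onto one internal command, so that some substitution errors are harmless) can be shown in a few lines. This is only a toy Python sketch with made-up phrases, not the grammar mechanism used in SpeechActs.

    # Toy sketch of mapping synonymous phrases to one canonical command
    # (illustrative only). If the recognizer substitutes one synonym for another,
    # the action the user intended is still executed.

    CANONICAL = {
        "read it": "READ_MESSAGE",
        "read that message": "READ_MESSAGE",
        "play the message": "READ_MESSAGE",
        "next": "NEXT_MESSAGE",
        "go to the next one": "NEXT_MESSAGE",
    }

    def to_command(recognized_phrase):
        """Map whatever phrase the recognizer returned onto a canonical command."""
        return CANONICAL.get(recognized_phrase.strip().lower(), "UNKNOWN")

    # The user said "read it" but the recognizer heard "play the message":
    # a substitution error, yet the canonical command is still correct.
    print(to_command("play the message"))   # -> READ_MESSAGE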

Jose Michael Joseph 7:31:36 9/30/2014

MultiModal Interfaces This paper talks about the various multimodal technologies and their goals in general. A multimodal technology is one that uses two or more modes of input combined, such as text and speech. One of the key points that struck me is that this paper clearly indicates the reason why some input devices such as the pen and speech recognition were commercialized early on, whereas gesture recognition and complex speech recognition took much longer. The paper says that this is because of the relative ease of using x/y co-ordinates over interpreting and segmenting manual movements. Multimodal input systems are essential as they enable users to convey diverse information through different input media. They also help users adapt to changing environments and use the input method that is most suitable to the situation the user is in. They also enable users to be more efficient and quick in their work, as they now have the ability to select the input device that is best suited to their task. This paper also points out that although in natural communication we use more than one mode of communication (such as speech and gestures), it does not imply that they are simultaneous. One could precede the other by a varying interval, and it is very important to check for this while developing multimodal systems. In conclusion, the paper states that since each multimodal technique has its own drawbacks, the best approach would be one that uses a hybrid symbolic/statistical model.

Jose Michael Joseph 7:32:42 9/30/2014

Designing SpeechActs: Issues in speech user interfaces This paper talks about a technology that uses conversational input and output by means of a speech recognizer. It is early research behind current technologies such as Siri and Google Now. The primary difference between this early research work and its modern-day counterparts is that the earlier one did not have any visual feedback. One of the primary drawbacks I noticed in this research paper was that, to try to simulate a conversation, the researchers decided not to offer conversational prompts in some of their features. Thus users were sometimes clueless as to how to proceed, as they had no fixed set of "rules" for how to behave in a particular situation. This poses a problem of discoverability that can only be overcome with a tutorial or experience. The second pitfall of this paper was that it assigned numbers to the various mails, which confused users as to how to proceed. It did not factor into its calculations that users are already acclimated to a way of viewing messages, which is unread messages first, and thus were taken aback by a new way of representing these messages. Although this problem was corrected later, it shows the emphasis that we must place on users' familiarity with current systems. Thirdly, the system did not provide the users with a solid conceptual model because of its non-deterministic speech recognition model. Since users observed that the same input did not always produce the same output, it created some confusion amongst the users about what they should be doing to get the system to work. This is a serious flaw, as eventually most users will get tired and give up. Last but not least, the primary drawback of this system was the lack of visual feedback. Humans are used to receiving a lot of visual feedback even while having a regular conversation. Thus a lack of it can throw users off guard, leaving them confused about how to proceed. We can see clearly that this is something that the modern counterparts of this system have realized. Siri has a rich visual feedback system which also prompts the user when their voice is being recorded and processed. Although this paper has a few flaws, I believe it was path-breaking research and is surely one of the forerunners of the technology that we have today.

Christopher Thomas 7:48:11 9/30/2014

2-3 Sentence Summary of Designing SpeechActs: The authors presented a spoken-dialogue system called "SpeechActs" which allows users to interact with it via voice commands, similar to many automated spoken dialogue systems today. The system allowed users to read mail, check the weather, etc. The authors walk the reader through the process of transforming an existing GUI system into a spoken-dialogue system, warning of common pitfalls that often occur when designing one. Everyone is familiar with spoken dialogue systems today. Banks, car insurance companies, everyone is using them, and we know how frustrating they are when they don't understand us. I think one of the huge takeaway messages I got from this paper was that it isn't all about recognition accuracy. One of the surprising findings and contributions in this paper was that the authors showed that recognition accuracy was not a good indicator of satisfaction with the system. This goes against what most people would expect – that the worse the recognition accuracy is, the worse the satisfaction would be. The important takeaway message here was that the users brought lots of preconceived notions and past experiences with spoken dialogue systems in with them when they interacted with the system, which colored their overall experience with the system. This is critical as we think about user interface design. We must always remember that users have lots of past experiences with other systems – they may be accustomed to other UIs and the way they do things; even though our UI may be more efficient, users may not rate it highly simply because of their preconceived notions about how a UI should work. Something else that we might want to take away from this paper is that users were more comfortable with the system when it didn't use language that was associated with the GUI interface – they liked it to be more natural and flowing. The challenge is how we can apply this principle of a natural, flowing interface which allows users to reach their destination as quickly as possible and with the minimum number of interactions. GUI interfaces are typically very methodical and usually run the interaction with the user. One of the principles in this paper that I see we could apply to GUI interfaces is the idea of allowing the user to have some initiative (instead of the GUI prompting the user along the steps, maybe let the user supply some of his information at the same time), which decreases the number of interactions. Obviously this paper has a lot of connections with the last paper we read, which was about mixed-initiative agents, where the agent takes some control depending on the circumstances of the situation. We can see the same thing here – the system tries to give the user freedom, but when the system fails to understand, it is forced to "box" the user in to more specific and narrow types of questions, in the hope of improving recognition accuracy. Another important takeaway message from this piece is the concept of translating one type of user interface into another – the idea is that what works on one platform may not be best for another platform. Thus, while prompting the user for information in a methodical way makes sense and does well in a GUI, doing the same thing in a speech user interface results in lowered satisfaction and makes the system difficult to use. Thus, interface designers must always remember that what works in one domain may not be appropriate for another.
2-3 Sentence Summary of MultiModal Interfaces: The author of this piece provides a comprehensive survey of multimodal interfaces. The author offers a cognitive science perspective on why multimodal interfaces are more natural and provides a state-of-the-art overview and ideas for future research. One of the most interesting things I found in this paper was the idea of feature-level and semantic-level fusion techniques, which is the idea that information from various interface modalities (such as speech, pen, pointing, etc.) can be combined to provide a richer experience than each could offer on its own. For instance, systems with cameras can use lip tracking technology to track the user's lips and then combine that information to produce better speech recognition. One thing I found interesting in this paper was that users tended to prefer different input modalities for different tasks. For instance, some users prefer to use pen input over speech for digits and graphic content. One interesting research direction I see from that is exploring how interfaces can be redesigned for "the path of least resistance," meaning allowing users to enter input of different types as comfortably as possible. For instance, if I am entering a long number into a tablet, a good interface could give me the option to write it with the pen as well as type it. Another interesting result was the notion that users using different modalities tend to perform better as task complexity increases compared to those users who used the same interface throughout. In my opinion, this lines up with our previous paper on the gulf of evaluation and gulf of execution, even though it is not explicitly stated in this paper. For instance, for some tasks the gulf of execution could be smaller via a certain interface. An example of this would be, "Siri, set a reminder for tomorrow for 3:00 to have lunch with Jim." The alternative is to go into the system and manually enter the reminder and save it. Most would agree that the first has a lower gulf of execution than the latter. So, thinking in this way, we can even see that different types of user interfaces are more appropriate for different tasks. Something interesting in this regard is that this distinction could change based on context. For example, users driving may actually normally prefer to type into the phone, but may be unable to look at the phone. Still, the user may be able to communicate with the phone through gestures or through a speech interface. In this situation, even though it isn't the user's preference, the user's context needs to be taken into account. I see this part of the paper connecting very much with the paper we read about toolkits, which argued that in the future user interfaces will need to be dynamic, changing based on context and device. Here, I can see that same concept re-emerging. One of the things notably absent from the paper was the disadvantages of multimodal interfaces. I noticed that the author talked a lot about the various advantages of multimodal interfaces, but didn't provide any arguments against them (e.g., cost, complexity, privacy issues, etc.). The author made a good attempt to provide sound statistical arguments, but I think the paper would have been stronger if both sides of the coin were considered.

nro5 (Nathan Ong) 8:20:06 9/30/2014

Review of "Designing SpeechActs: Issues in Speech User Interfaces" by Nicole Yankelovich, Gina-Anne Levow, and Matt Marx While HCI tends to be visually-oriented, since vision is our most heavily-relied on sense, this paper instead presents research in the area of Speech User Interface (SUI) design. Aside from the SpeechActs system that they developed as an experimental design using the state-of-the-art at the time, they also presented four different design challenges for future SUIs that need to be addressed to create an effective speech-based interface for users. Speech is an interesting sub-field of HCI, but generally they are paired with GUIs of some form in order to provide visual feedback. SpeechActs is an interesting approach in that it works with different applications of a system (electronic-mail, weather, and calendar) by attempting to be a personal computer assistant to the user by accepting spoken commands. It seems very similar to mobile device assistants like Siri or Cortana, but even these SUIs provide visual feedback on when to speak. SpeechActs, while also containing buttons to prompt the computer to listen, does not have a visual cue to show users that the system is listening. Instead, the system relies on spoken sentences or tones to act as cues for users to speak. Relying only on sound presents many challenges for a person, especially since real personal assistants usually provide visual feedback through facial features, body language, and other subtle cues, while a computer system that relies primarily on sound will not. While designing the system, the authors lay out four design challenges for future SUIs. Three of them deal directly with the user and attempting to bring a computer speech system to a human level. It seems that in the future, the direction of SUIs will deal with humanizing the computer element. This also indicates that humans are too used to human-human interaction when it comes to speaking and listening, and many societal cues and conventions for conversations are not yet adequately incorporated into SUIs. People seem to have less patience with speaking-listening systems because conversations have already been established in society, whereas interacting with a keyboard and mouse for GUIs is relatively new and unfamiliar, meaning interaction with computers through a keyboard and mouse is more open-minded and optimistic. It seems SUIs suffer from a natural precedent and will forever "play catch-up" to natural conversations. Review of "Multimodal Interfaces" by Sharon Oviatt This paper mentions a short history of research systems and research on multimodal (multiple input) interfaces. Aside from the usual touch-inclusive systems (e.g. pen-based input for desktop computers), the author also mentions spoken-based interfaces and other combinations of input, and provides some cognitive science motivation behind this new research direction. As the author mentions, it seems these systems are still in their infancy, and still require a bit of thought when determining which features are most useful. When looking at the previous paper, we see that there are still many issues when it comes to developing a purely speech-based input method for a GUI-based application. It only seems natural to understand that any sort of combination of input methods will complicate matters further. It still remains to be seen which type of multimodal interface will be widely adopted, since currently most systems are riddled with user fluency issues. 
The author brings up a good point in analyzing the cognitive science behind multimodal interfaces. Researchers can often derive inspiration for technologies from human capabilities, and this is the case for multimodal interfaces. For example, when holding a conversation, people not only listen to the sentence being spoken but also see how the other person's mouth moves and what facial expression he or she has. Cognitive science suggests that, in addition to the spoken sentence, all of these features are important for determining the semantics of the sentence. It is only natural to believe that computers, too, can benefit from mouth-movement or facial-expression data when processing speech-based input. Finally, to bring multimodal interfaces to maturity, there is definitely a need for data mining, since all of the sensor input, including video, sound, pen input, hand gestures, etc., will need to be parsed accurately and quickly. Unintentional gestures also need to be filtered out, and subtle features may need to be included to make more accurate predictions of user intent (a rough sketch of one such filtering idea follows below). Hopefully multimodal interfaces won't suffer from the moving-target problem, since their development may be slower than the pace at which new input technologies appear. The biggest threat to current multimodal interfaces, in my opinion, is a direct brain interface, where a user's brain waves can be read to infer intent, rather than relying on possibly faulty and variable gestures or speech.
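To make the gesture-filtering point concrete, here is a minimal sketch, entirely my own illustration and not from Oviatt's article: gestures are kept only if they overlap in time with a speech event, on the (assumed) grounds that deliberate multimodal input tends to be temporally coupled with speech. The Event type, the slack window, and the function names are all hypothetical.

```python
# Hypothetical sketch (not from the paper): drop gestures that do not
# co-occur with any speech event within a small slack window.
from dataclasses import dataclass

@dataclass
class Event:
    modality: str   # "speech" or "gesture"
    label: str      # recognized command or gesture type
    start: float    # seconds
    end: float      # seconds

def overlaps(a: Event, b: Event, slack: float = 0.5) -> bool:
    """True if two events overlap in time, allowing a small slack window."""
    return a.start <= b.end + slack and b.start <= a.end + slack

def filter_gestures(events: list[Event]) -> list[Event]:
    """Keep all non-gesture events, and only gestures that co-occur with speech."""
    speech = [e for e in events if e.modality == "speech"]
    return [e for e in events
            if e.modality != "gesture" or any(overlaps(e, s) for s in speech)]

if __name__ == "__main__":
    stream = [
        Event("speech", "put that there", 1.0, 2.2),
        Event("gesture", "point", 1.8, 2.0),         # overlaps speech: kept
        Event("gesture", "scratch head", 5.0, 5.4),  # no nearby speech: dropped
    ]
    for e in filter_gestures(stream):
        print(e.modality, e.label)
```

A real system would of course weigh many more cues than temporal overlap, but even this crude rule illustrates why accurate timestamps across modalities matter.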

yeq1 8:43:55 9/30/2014

Yechen Qiao Review for 9/30/2014

Designing SpeechActs: Issues in Speech User Interfaces. In this paper, the authors experimented with a prototype speech-based user interface for remote human-computer interaction: SpeechActs. SpeechActs incorporates Mail, Calendar, Weather, and Stock Quotes applications, and uses synthesized speech to simulate conversation with a user over a voice communication medium. (The paper focuses on voice, but I don't think this is the actual limitation…) At the time, speech recognition technology was not accurate enough to recognize a user's voice with low error rates, so the paper instead focused on which interaction techniques could potentially increase users' satisfaction and reduce their frustration. A cross-sectional study was performed with N=12. Users generally responded that the interaction was too slow, and they saw the need to be able to interrupt the speech synthesis and make selections immediately. They also complained that the computer gave too much feedback and wasted their time needlessly. Many of the other findings, such as insertion errors and the speed of speech, may already be out of date at the time of this review. Instead of listing more of their findings, I think it is more important to see how this paper is still relevant today:

1. Speed of Articulation
   a. Speed of articulation depends somewhat on the communication medium chosen and on the encoding of a user's thought into the interface's input language.
   b. Some of the problems described may be inherent to the nature of speech. For example, I can type faster (102 WPM with a surface keyboard, and 110+ with a mechanical keyboard) than I can speak if I want to achieve 100% accuracy.
   c. However, I think speech recognition technology still has room for improvement. For example, systems should be able to recognize directives and directly infer the previous nodes in the information flow. If I were to say "Send an email to Ken about the new stock prices he invested in", the system should be able to combine multiple actions and figure out the correct sequence of actions, without me having to explicitly tell it each individual step (see the sketch after this review). The authors addressed this somewhat in the secretary example, but I think this goes beyond that. Multiple action sequences and multiple actors may be implicitly referred to in a compound sentence, and if the computer is ever to become a true assistant, it should be able to figure this out.
   d. I do not believe it is necessary to implement another set of keypad codes if the previous point is fully addressed. That just feels like a lousy patch for a broken implementation, not a fix for a design problem.
2. Speed-Accuracy Tradeoff
   a. Another point the authors expressed is that too much feedback may slow down the interaction and cause annoyance. This applies not just to voice interfaces but also to others, such as security and mobile applications: how many times do I have to see useless feedback like "a popup has just been blocked", "Lookout has just verified … is safe", and "your location has just been accessed: svchost.exe", or hear a beep when something happens on my smartphone in HCI class? Providing feedback only when users need it is still a challenge we have not fully addressed today. Interruptibility is a measure that is still difficult to capture, even with cameras that track context, facial expressions, and voice.
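To illustrate point 1c above, here is a rough, hypothetical sketch of how a compound spoken request might be decomposed into an ordered sequence of actions. The command pattern, the action names, and the intermediate lookup steps are my own assumptions for illustration and are not part of SpeechActs.

```python
# Hypothetical sketch: turn one compound spoken request into an ordered plan
# of individual actions, so the user need not dictate each step separately.
import re

def plan_actions(utterance: str):
    """Decompose a compound request into a list of (action, arguments) steps."""
    plan = []
    m = re.match(r"send an email to (\w+) about (.+)", utterance, re.IGNORECASE)
    if m:
        recipient, topic = m.group(1), m.group(2)
        if "stock" in topic.lower():
            # The request implicitly needs current prices, so fetch them first.
            plan.append(("lookup_portfolio", {"person": recipient}))
            plan.append(("fetch_stock_quotes", {"source": "portfolio"}))
        plan.append(("compose_email", {"to": recipient, "subject": topic}))
        plan.append(("send_email", {}))
    return plan

if __name__ == "__main__":
    request = "Send an email to Ken about the new stock prices he invested in"
    for step in plan_actions(request):
        print(step)
```

A pattern-matching rule like this obviously does not scale; the point is only that the system, not the user, should be responsible for ordering the implied sub-actions.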
MULTIMODAL INTERFACES. In this text, the author gives an overview of multimodal interfaces: what they are, what kinds exist, why we build them, how we build them, and some future directions. Multimodal interfaces are those that use multiple sources of input to allow simpler, faster, more accurate, and more expressive interactions. The paper argues that multimodal interaction also suits users because they naturally interact in this way. I think this can contribute greatly to new interaction paradigms such as AR, where users can interact with the environment and the virtual world in more expressive ways. It can also potentially enable more accurate speech recognition in new vehicle consoles, letting the navigation system determine what the user said from both audio and lip movements (a rough sketch of such fusion appears below).
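As an illustration of the audio-plus-lip-movement idea, here is a minimal sketch, my own assumption rather than anything from the article, of weighted late fusion: word hypotheses from an audio recognizer and a lip-movement recognizer are combined by weighting each modality's confidence, so noisy cabin audio can be compensated by visual evidence.

```python
# Hypothetical late-fusion sketch: combine per-word confidences from two
# recognizers (audio and lip movement) with fixed modality weights.
def fuse_hypotheses(audio_scores: dict[str, float],
                    visual_scores: dict[str, float],
                    audio_weight: float = 0.6) -> str:
    """Return the word whose weighted combined confidence is highest."""
    visual_weight = 1.0 - audio_weight
    candidates = set(audio_scores) | set(visual_scores)

    def combined(word: str) -> float:
        return (audio_weight * audio_scores.get(word, 0.0)
                + visual_weight * visual_scores.get(word, 0.0))

    return max(candidates, key=combined)

if __name__ == "__main__":
    audio = {"navigate home": 0.40, "never mind": 0.45}   # noisy cabin audio
    visual = {"navigate home": 0.70, "never mind": 0.20}  # lip-movement reader
    print(fuse_hypotheses(audio, visual))  # -> "navigate home"
```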

Mengsi Lou 8:46:15 9/30/2014

Reading critique 2014/9/30 This paper is an introduction to the SpeechActs speech user interface system, with an emphasis on the user study. The speech user interface consists of three parts: speech recognition and synthesis with telephony, natural language processing capabilities, and other tools for creating speech applications. I would like to focus on the user study in this paper. First, the researchers set up a formative evaluation study design. After the pattern is designed, they design the tasks, then collect the results and summarize them. There are several challenges in the user study of this research. The first is simulating conversation, which the researchers address by establishing and maintaining what Clark calls a common ground or shared context. The second is transforming GUIs into SUIs, which they approach from the perspective of vocabulary, information organization, and information flow. The third is recognition errors, which are divided into three categories: rejection, substitution, and insertion. The nature of speech is another challenge, whose main sub-problems are the lack of visual feedback, and speed and persistence. //////////////////// multimodal interfaces This paper is about multimodal interfaces, that is, interfaces that process two or more combined user input modes in a coordinated manner with multimedia system output. The author introduces the types of multimodal interfaces, such as speech and pen input, speech and lip movements, speech and manual gesturing, and gaze tracking and manual input. Then the author introduces the goals and advantages of multimodal interface design, notably the flexible use of input modes: a multimodal interface should allow diverse user groups to exercise selection and control over how they interact with the computer. Next, the author describes the methods and information used in designing multimodal interfaces, namely the cognitive science literature and high-fidelity automatic simulations. The author then discusses the cognitive science related to multimodal interfaces, as well as the basic differences, architectures, and processing techniques used in multimodal interfaces.

Vivek Punjabi 9:00:33 9/30/2014

Designing SpeechActs: Issues in Speech User Interfaces The paper describes the issues a speech interface designer faces, through a speech system called SpeechActs. SpeechActs is a research prototype that integrates third-party speech recognition and synthesis with telephony, NLP capabilities, and other tools for creating speech applications. The authors provide some concrete examples of speech interfaces, such as the mail application, calendar interface, weather application, and stock quotes application. They then describe the user study that was conducted before SpeechActs was built, followed by the design challenges faced during its development. The challenges mentioned include simulating conversation, recognition errors, the nature of speech, and others. The authors conclude that when creating a speech interface, it is essential to start from scratch rather than trying to convert an existing GUI into an SUI, to adhere to the principles of conversation, and to provide immediate and informative feedback. The paper gives good inspiration for creating well-defined speech systems that will withstand the technological era. The challenges described are precise, along with their possible solutions. There is scope to give a couple more examples and user studies of systems such as SpeechActs, which would form a more compelling argument and help researchers understand the issues and concerns from different perspectives. Multimodal Interfaces: The author gives an introduction to multimodal interfaces and their need in today's life. She then describes the types and history of multimodal interfaces in detail, which shows that these interfaces have grown quite fast in the last decade even though the research started in the late 80's. The author then gives some of their goals and advantages, followed by the methods and information that have been used to design some novel interfaces. Later, the author draws on the cognitive science literature to inform the design of next-generation multimodal interfaces, posing some basic questions to compare the two fields and take the best of both. Finally, the paper presents the architectures and processing techniques already being used to build these interfaces and concludes that there is still a lot of scope to modify and improve these designs. The paper provides information on a very interesting and new topic. The author's approach is well structured, making it easier to understand the importance of these interfaces and the design issues that are faced. The analogy with cognitive science, and the attempt to achieve the best of both worlds, suggests many different approaches to be explored in the future. It is a good inspiration for students who would like to integrate HCI with cognitive science.