Multimodal Interfaces

From CS2610 Fall 2016

Slides

slides

Readings

Additional Readings (Reading critiques not required)

Reading Critiques

Tazin Afrin 16:38:14 9/18/2016

Critique of “Designing SpeechActs: Issues in Speech User Interfaces”: In this paper the authors, Yankelovich, Levow, and Marx, study a conversational speech system, SpeechActs, partly translated from its graphical counterpart and refined through user feedback. They set out to identify the challenges of speech user interface (SUI) design and conclude that such systems should be designed from scratch rather than translated from a GUI. The SpeechActs prototype combines third-party speech recognition and synthesis, natural language processing capabilities, and several applications built on speech recognition and conversation: hearing and sending e-mail, listening to calendar entries, hearing weather reports, and a dynamic data feed for stock quotes, all through a speech interface. The authors ran a usability study and found that techniques that work well in graphical user interfaces (GUIs) do not necessarily work well in a speech user interface; instead they can be confusing and disorienting. One main challenge in a speech-only application is simulating conversation, because one has to consider the two main parts of conversation, speaking and listening. The main challenge in transforming a GUI into a SUI is vocabulary, because when speaking a user tends to pick up conversational language that is very different from what is used in a GUI. The speech recognizer must also be accurate, and we can never be sure of that. The speed and persistence of speech create another challenge for such speech-only systems. I strongly agree with the authors' conclusion that translating from a GUI is not an effective way to create a SUI; rather, we should treat the SUI as a separate, standalone system designed from scratch. ------------------------------------------------------------------------------------- Critique of “Multimodal Interfaces”: A multimodal system is one that combines two or more simultaneous, coordinated input modes, which provides a higher degree of expressiveness in human-computer interaction. In this article the author, Sharon Oviatt, discusses the importance of multimodal interface design, existing multimodal interfaces, and the history of this kind of interface. She also emphasizes how robust and flexible future multimodal interfaces should be and how much more intelligent adaptation they should support. The history of multimodal interfaces is rich, starting from the ‘put that there’ system and extending to the touchpad and trackpad. Cognitive science played a large role in the design of such systems, for example in combining speech input and output with visual perception, facial movement, and so on. In contrast to a GUI, a multimodal interface is recognition based, which demands larger computational and memory resources. Speech recognition, for instance, is not only a matter of recognizing sound; accurate recognition can also involve lip reading and matching, which requires visual recognition. At the same time, a multimodal system may need to recognize emotion, for which facial expression is an important aspect of future multimodal interface design. Incorporating multiple channels of communication between a machine and a human increases complexity, but according to the author the real power of a multimodal system is its flexibility. I believe multimodal interfaces have the potential to become as robust as communicating with a human, an interface in the spirit of the Turing test.

Haoran Zhang 19:34:49 9/18/2016

Designing SpeechActs: Issues in Speech User Interfaces: The authors examine a set of problems that anyone designing a speech interface will face. Because speech user interfaces are invisible, the design principles developed for graphical user interfaces are not suitable for them, and we have to design from scratch. The main design challenges are simulating conversation, transforming GUIs into SUIs, recognition errors, and the nature of speech. For simulating conversation, designers need to make the interaction feel conversational; for example, a prompt tone lets users know what they should say and that it is their turn to speak. To improve this further, users' voices can be allowed to interrupt the speech synthesizer, or the speed of the synthesizer can be changed. In addition, adding keypad shortcuts and replacing spoken prompts with auditory icons that evoke the meaning of the prompt also help. For transforming GUIs into SUIs, there is a trend toward an interpersonal, conversational style. Vocabulary is one example: it is sometimes hard to convert text into speech without ambiguity. Beyond vocabulary, the organization and presentation of information often do not transfer well from GUI to SUI; just as one way of organizing information can be clear on the screen and confusing when spoken, so it is with information flow. For recognition errors, there are two major cases to handle: rejection errors and substitution errors. For rejection errors, the system replies “I didn't understand” to avoid the brick-wall effect. For substitution errors, the approach is to verify utterances to avoid misunderstanding; for example, if the system misinterprets the user's word “Kuai” as “Good-bye,” it will terminate the connection with the user. Due to the lack of visual feedback, the speed and persistence of speech, and ambiguous silence, the nature of speech itself is a problem too. To sum up, there is a long way to go on the path of SUI design; since we cannot translate a GUI to a SUI directly, the design of a SUI must be a separate effort. Multimodal Interfaces: The author gives an introduction to multimodal systems and also discusses the differences between multimodal interfaces and graphical user interfaces, the basic architectures and processing techniques that have been used to design multimodal systems, and their future. There are more and more multimodal interfaces, such as speech systems and gesture systems. Since they differ in major ways from graphical user interfaces, we cannot simply translate GUIs into multimodal interfaces. As for the future of multimodal interface design, most existing systems were designed for research purposes, but they have already moved beyond the prototype stage. This is a young area with a long way to go; it is a new art form and a socio-political statement about our collective desire to humanize the technology we create.
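
To make the rejection/substitution handling described above concrete, here is a minimal Python sketch. It assumes a hypothetical recognize() that returns a command and a confidence score, varies the reprompt on repeated rejections, and, as a simplification of verifying every utterance, confirms only destructive commands such as "good-bye" before acting; all names and thresholds are made up for illustration.

```python
# Hypothetical sketch of SUI error handling: rejection errors get varied
# reprompts instead of the same "brick wall" reply, and destructive commands
# (e.g. "good-bye", which ends the call) require confirmation.

REJECT_PROMPTS = [
    "I didn't understand.",
    "Sorry, I still didn't catch that. You can say 'read message' or 'next'.",
    "You can also press 1 for mail or 2 for calendar.",  # keypad fallback
]
DESTRUCTIVE = {"good-bye"}          # commands worth verifying before acting
CONFIDENCE_THRESHOLD = 0.6          # assumed recognizer confidence cut-off

def handle_utterance(recognize, speak, listen_yes_no):
    """recognize() -> (command, confidence); speak(text); listen_yes_no() -> bool."""
    for attempt in range(len(REJECT_PROMPTS)):
        command, confidence = recognize()
        if command is None or confidence < CONFIDENCE_THRESHOLD:
            # Rejection error: vary the prompt rather than repeat it verbatim.
            speak(REJECT_PROMPTS[attempt])
            continue
        if command in DESTRUCTIVE:
            # Guard against substitution errors before an irreversible action.
            speak(f"Did you say '{command}'?")
            if not listen_yes_no():
                continue
        return command
    return None  # give up and fall back to a keypad menu or human operator
```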

Steven Faurie 13:36:55 9/19/2016

Steve Faurie Designing SpeechActs: Issues in Speech User Interfaces: This article was about the development of an application users would access through a telephone. The intended audience was traveling business people who didn’t always have access to a computer. The paper was written in 1995 and must have pre-dated the era of cheap consumer laptops. The entire system is now irrelevant; however, some interesting observations regarding the development of a speech user interface were noted and are still valid to this day. One of the most relevant points is that speech systems almost always tend to be slower than other interface types. Listening to prompts is time-consuming, and replying via speech is also much more imprecise than other input methods. The users noted several other challenges about simulating conversation, including the ability of a computer to actually sound somewhat natural and human, and the stilted pacing of conversation users often have with these types of systems. While these issues persist in many speech interface systems I have used, like those at the cable company or pharmacy, some systems have improved considerably. Microsoft’s Cortana voice recognition system is much more natural than older systems. The differences in vocabulary between GUIs and everyday speech were interesting, along with ways to succinctly convey information via speech and receive quick responses. For example, pop-up dialogs did not transfer well to the speech realm. Many of the issues they encountered seemed to have to do with poor speech recognition and an inability to process natural language rather than discrete predefined commands. Significant strides have been made in these areas in the last 20 years. Perhaps the best point they made, and one I wish many speech recognition interface designers would take to heart, is to keep the computer’s dialogue short and to the point. Also, allowing users to interrupt if they already know how the system works is very important. My favorite, and also one of the simplest voice recognition systems I use, is surprisingly provided by Comcast. You hold down a button on the television remote and say a channel name, and it turns the television to that channel. One of the reasons it is so effective is that it is much faster than browsing the guide and saves users from having to remember the channel name to number mappings. Multimodal Interfaces: The paper begins by describing several different types of multimodal interfaces, including mouse, keyboard and speech; speech and pen; and speech and gestures. It’s interesting to see that many of these multimodal interfaces have been used in successful commercial products. For instance, the Kinect is a speech and gesture multimodal input system. Windows 10 allows you to use a mouse, keyboard and speech, although it is much more limited than what is described in the article. I think one of the more successful uses of multimodal communication with a computer is switching the input used depending upon environmental circumstances. For instance, I have used voice to make a call while driving instead of searching through the contact list on my phone. Another interesting point in the article was how users could distribute the cognitive load of a task by mixing interaction types. It makes intuitive sense to anyone who has ever tried to listen to someone speak while typing a letter. However, it is much easier to talk about one thing while pointing at something else.
Additionally, pointing and circling with a touch pen, as described in the article, can be much more efficient than trying to describe spatial information verbally. I was surprised that users did not tend to combine signals simultaneously all that often; the paper said approximately 25% of input had co-temporal signals, for example saying something about an object while interacting with it using either a mouse or touch pen. One of the interesting design problems that would need to be considered with multimodal systems is that rather than having one input stream, the system would need to process several streams simultaneously.
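
To make that last point concrete, here is a minimal sketch of treating each modality as a time-stamped event stream and merging the streams into one time-ordered sequence before fusion; the InputEvent fields and the example speech/pen events are assumptions for illustration, not anything from the article.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class InputEvent:
    timestamp: float                      # seconds since session start
    modality: str = field(compare=False)  # e.g. "speech", "pen", "gesture"
    payload: str = field(compare=False)   # recognizer output for this event

def merge_streams(*streams):
    """Merge several per-modality event streams into one time-ordered stream.

    Each stream is an iterable of InputEvent already sorted by timestamp,
    which is what a per-modality recognizer would naturally produce.
    """
    return heapq.merge(*streams)

# Example: a spoken command and pen points arriving in parallel.
speech = [InputEvent(0.4, "speech", "put that"), InputEvent(1.6, "speech", "there")]
pen    = [InputEvent(0.9, "pen", "point:(120,80)"), InputEvent(1.8, "pen", "point:(300,210)")]

for event in merge_streams(speech, pen):
    print(f"{event.timestamp:4.1f}s  {event.modality:6s}  {event.payload}")
```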

Xiaozhong Zhang 20:29:33 9/19/2016

Designing SpeechActs: Issues in Speech User Interfaces The paper discusses some of the problems facing speech user interface designers. The research is based on user testing and an iterative design process for the application SpeechActs, which was translated from an existing Sun GUI application. The paper starts with an introduction to the system and mentions that four tasks were used in the study, namely mail, calendar, weather, and stock quotes. Through the user study, four challenges emerged in the application's design. The first is the simulation problem, i.e. making the interaction feel conversational; the authors state that the prosody and pacing of the conversation should be improved. The second challenge comes from porting a GUI application to a SUI counterpart. Since people tend to use different vocabulary and also expect the information to be organized differently in a SUI, the authors state that the application should change accordingly. The third problem is recognition: recognition errors can erroneously insert, reject, or substitute some portion of the user's speech input. Among these, missing a word causes the least confusion about the user's intention, while adding or substituting a word can cause larger misunderstandings. Finally, the authors conclude that through the user study and analysis they have found several useful directions for improving current speech-based application development. Multimodal Interfaces The paper is an introductory piece on multimodal interfaces and also a survey of existing multimodal interface research and applications. The paper is divided into three parts. The first part starts with the definition of the interface and then gives a detailed history of the field. It then discusses the goals and advantages of multimodal interfaces, with the main advantage being error tolerance and disambiguation. Following that, the methods and information previously used in interface design are described. The second part introduces the cognitive science underpinnings of multimodal interface design. Some of the topics covered in this part are multimodal usage cases, feature integration, usage characteristics, differences between interaction styles, design guidance, primary features, and so on. The third part discusses the implementation of multimodal interfaces. It begins with the differences between multimodal interfaces and common GUIs; the main idea is that a multimodal interface may need server-side recognizers and time-synchronized data streams. It then covers the basic architectures and processing techniques used in interface design. There are mainly two types of architecture for input fusion, namely feature-level and semantic-level. For semantic-level fusion, some difficulties of input parsing and recognition are described. Finally, the paper states some future directions for interface design. Personally, I agree that multimodal interfaces have a promising future, given the growing popularity of wearable devices, ubiquitous computing, and deep learning methods.
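
The semantic-level ("late") fusion mentioned here can be sketched roughly as follows: each recognizer emits a partial meaning frame, and frames that overlap in time are unified into one command. The frame format, the one-second window, and the "move that there"-style example are assumptions for illustration, not the architecture from the article.

```python
# Hypothetical semantic-level ("late") fusion: each modality is recognized
# separately into a partial meaning frame, and frames that overlap in time
# are unified into a single command.

def overlaps(frame_a, frame_b, window=1.0):
    """True if the two frames are close enough in time to belong together."""
    return abs(frame_a["time"] - frame_b["time"]) <= window

def fuse(speech_frame, gesture_frame):
    """Fill the unresolved slots of the spoken command from the gesture."""
    if not overlaps(speech_frame, gesture_frame):
        return None
    command = dict(speech_frame["slots"])
    for slot, value in gesture_frame["slots"].items():
        if command.get(slot) in (None, "<deictic>"):   # e.g. "that", "there"
            command[slot] = value
    return command

# "Move that there" plus two pen points -> a fully specified move command.
speech  = {"time": 1.2, "slots": {"action": "move", "object": "<deictic>", "target": "<deictic>"}}
gesture = {"time": 1.4, "slots": {"object": "ship_7", "target": "(300, 210)"}}

print(fuse(speech, gesture))
# {'action': 'move', 'object': 'ship_7', 'target': '(300, 210)'}
```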

Keren Ye 22:56:37 9/19/2016

Designing SpeechActs: Issues in Speech User Interfaces This paper focuses on SpeechActs, an experimental conversational speech system. Based on their experience redesigning the system, the authors find that both adhering to conversational conventions and designing from scratch are very important. Moreover, the authors identify several challenging issues and describe ways to solve some of them. In the introduction chapter, the authors describe the background. They note that connecting a portable computer to the network is more complicated than using a telephone, while touch-tone interfaces also suffer from several problems. Therefore the paper proposes conversational speech interfaces; although there are substantial obstacles, the authors still try to resolve them in SpeechActs. The authors explain the functionality of the SpeechActs system in detail in the next chapter, in which they describe and give examples to show how the applications work, including electronic mail, calendar, weather, and stock quotes. Then the paper states the methodology for designing SpeechActs, which includes usability testing and iterative redesign. More specifically, they first conduct a survey and a field study to guide the design of the prototype. They then conduct a usability study based on the working prototype. Finally, the formative evaluation study and redesign proceed iteratively. The next chapters describe the details of the tasks designed to help evaluate each of the four SpeechActs applications. They conclude that participants liked the concept behind SpeechActs and eagerly awaited improvements. Several challenges are presented in the following chapters: 1) Simulating conversation is not easy. 2) It is hard to directly transform GUIs into SUIs. 3) The recognition error rate is hard to optimize. 4) The nature of speech itself poses problems, such as demands on short-term memory, lack of visual feedback, speed and persistence, ambiguous silence, and so on. Though it is a challenge to design a conversational speech system, the authors are still optimistic about its future. Multimodal Interfaces Multimodal systems process two or more combined user input modes in a coordinated manner with multimedia system output. Looking back at the history of multimodal interfaces, there are several main types, and the author explains them in detail in the first few pages. Regarding the advantages and goals of multimodal interface design, the author states that: 1) the interfaces provide flexibility, diversity, and adaptability to end users; 2) they also provide efficiency gains in use; 3) superior error handling is another advantage, in that a well-designed multimodal architecture can support mutual disambiguation of input signals; and 4) recent research shows that the design minimizes users’ cognitive load. To design novel multimodal interfaces, two things are very important: 1) the cognitive science literature, and 2) high-fidelity automatic simulations. To further explain these ideas, the author describes them in detail and gives the reasoning, especially why the simulations are worth trying. In the next chapter, the author discusses the cognitive science underpinnings of multimodal interface design, aiming to provide a more accurate foundation for guiding the design of next-generation multimodal systems.
They answer questions such as when users interact multimodally, what the integration and synchronization characteristics of users’ multimodal input are, and so forth. A comparison to graphical user interfaces is made in the next chapter. Multimodal interfaces 1) assume continuous and simultaneous input from parallel incoming streams, 2) process input modes using recognition-based technologies to resolve ambiguity, 3) have larger computational and memory requirements, which often makes it desirable to distribute the interfaces over networks, and 4) require time stamping of input. Next, the paper covers the basic architectures and processing techniques. Generally, there are two main subtypes of multimodal architecture: 1) feature-level integration and 2) semantic-level integration. Detailed examples are explained later in the section. Finally, in the last chapter, the author expresses an optimistic view of multimodal interfaces.
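
A small sketch of the mutual disambiguation idea mentioned above, with made-up n-best lists and scores: instead of trusting each recognizer's top guess, the fusion step picks the joint hypothesis that both modalities support. The compatibility rule and scores are assumptions for illustration.

```python
from itertools import product

# Hypothetical n-best lists with recognizer scores (higher is better).
speech_nbest  = [("delete file", 0.52), ("delete mail", 0.48)]
gesture_nbest = [("select:mail_3", 0.55), ("select:file_9", 0.45)]

def compatible(command, selection):
    # Assumed rule: the spoken object type must match the selected object.
    return command.split()[-1] in selection   # "mail" in "select:mail_3", etc.

best = max(
    ((c, s, cs * ss) for (c, cs), (s, ss) in product(speech_nbest, gesture_nbest)
     if compatible(c, s)),
    key=lambda triple: triple[2],
)
# The joint winner is ("delete mail", "select:mail_3"), even though speech
# alone preferred "delete file": each modality disambiguates the other.
print(best)
```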

nannan wen 0:34:34 9/20/2016

Review of “Designing SpeechActs: Issues in Speech User Interfaces” by Nicole Yankelovich et al. The goal of this paper was to expose some of the difficulties that come with designing a speech interface by describing what the authors experienced while testing their own, called SpeechActs. They use the abbreviation SUI to refer to speech user interfaces, as opposed to GUIs, graphical user interfaces. The authors’ motivation is their belief that conversational speech is a better alternative to menu-based telephone systems, which are tedious for users. By testing the speech user interface of the SpeechActs prototype with professional travelers and developers, the paper exposes drawbacks of human-machine speech interaction and, through that, points out four main challenges in SUI design: simulating conversation, transforming a GUI into a SUI, recognition errors, and the nature of speech. This paper separates SUI design from GUI design, which I believe is a big change. -----------------------------------Multimodal Interfaces by Sharon Oviatt: This paper is a survey of past, present, and future research on multimodal interfaces. The first part of the paper introduces the concept of multimodality and motivates the development of such interfaces. By using more than one type of input, such as a combination of speech, pen, and vision, the input can be made more robust. For example, a multimodal interface may employ both speech recognition and eye tracking to provide a satisfactory user experience. Multimodal systems have advanced greatly along with cognitive science, and the article gives some examples of systems that interact with humans multimodally. The main contribution of this paper lies in the part explaining the basic architectures and processing techniques of these systems.

Alireza Samadian Zakaria 0:46:16 9/20/2016

The first paper is about SpeechActs, a research prototype that integrates several third-party tools for creating speech applications. It uses a speech-only interface for electronic mail, calendar, weather, and stock quotes. In this paper, the authors talk about their experiences while designing this interface. At each step of the design, they evaluated their work by giving tasks to the users participating in the study. It seems that male users performed better with this system; however, the authors do not discuss possible reasons, and the fact that male users performed better does not contribute anything to the paper. There are several design challenges for speech-only interfaces; one of the main challenges is the difficulty of simulating a conversation, for four reasons: repetitive and similar sub-dialogues, lack of good intonation, pacing, and the uninterruptible nature of the designed interface. The authors suggest solutions to some of these problems, such as pacing; they tried to solve this by allowing users to interrupt the speech synthesizer with their voice and by allowing users to speed up familiar prompts. The paper also mentions some of the challenges of transforming GUIs into SUIs. One challenge is vocabulary, because such a system needs a different set of vocabulary to be appropriate for speaking and conversation. Another difficulty is that it is hard to organize information in these kinds of systems; for example, when a user reads the first unread email, should the second one become the first, or should it remain unchanged? Another challenge they experienced while designing this system is recognition errors. They discuss three categories of errors: rejection, substitution, and insertion. The most dangerous recognition error is substitution, since it can destroy information or do the wrong thing. The authors propose some solutions to decrease the effect of these errors. At the end they mention another challenge: it is easy to speak, but it is hard to take in information by listening. The second paper is about multimodal interfaces. A multimodal interface is a type of interface that supports more than one kind of input, at least one of which is recognition based. These kinds of interfaces are more flexible, and humans can convey their intentions better through them. The author first talks about past generations of multimodal interfaces; the first generation was about talking and pointing. After this generation, there were newer ones, called blended multimodal interfaces, that focus on lip movements and other gestures to improve speech recognition. The author also lists some advantages of multimodal systems, one of the main ones being superior error handling. Multimodality also improves robustness and minimizes users’ cognitive load. Furthermore, the author provides some information about methods for designing multimodal interfaces. The design of these systems is guided by two things: cognitive science and high-fidelity automatic simulations, a kind of simulation in which one of the designers responds to the user’s actions instead of the system. This helps the designers alter a planned system’s characteristics in major ways.
The author also talks about some cognitive science myths of multimodal interaction that are refuted by contrary empirical evidence. For example, one of these myths that I used to think was true is that speech and pointing is the dominant multimodal integration pattern. Furthermore, she mentions some differences between multimodal language, formal language, and spoken language: people speak differently in a multimodal environment than in a speech-only environment. She also talks about three main differences between multimodal interfaces and GUIs. At the end, two subtypes of multimodal integration are surveyed in detail: feature-level integration and semantic-level integration.
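
The interrupt-the-synthesizer ("barge-in") behavior mentioned in this critique could be sketched roughly as follows, assuming a hypothetical text-to-speech chunk player and a voice-activity signal from the recognizer; the function names are placeholders, not a real API.

```python
import threading

# Hypothetical barge-in: prompt playback runs in a worker thread and is cut
# short the moment the recognizer reports that the user has started talking.

def play_prompt(text, tts_speak_chunk, stop_event):
    """Speak the prompt phrase by phrase, checking for interruption between chunks."""
    for phrase in text.split(", "):
        if stop_event.is_set():
            return                          # user barged in; stop talking immediately
        tts_speak_chunk(phrase)

def prompt_with_barge_in(text, tts_speak_chunk, wait_for_speech_start):
    stop_event = threading.Event()
    player = threading.Thread(target=play_prompt, args=(text, tts_speak_chunk, stop_event))
    player.start()
    if wait_for_speech_start(timeout=5.0):  # recognizer's voice-activity detection
        stop_event.set()                    # cut the synthesizer off mid-prompt
    player.join()
```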

Zhenjiang Fan 2:27:42 9/20/2016

Designing SpeechActs Issues in Speech User Interfaces::: Even though the SpeechActs system is not a prevalent voice recognition system in the market right now, I do think it can represent most voice recognition systems, since they all have similar functions and make the same mistakes. But I do have a problem with the way they conducted the tests of their applications, which used just a few participants and only one application. I do think the first challenge stated by the paper is a very important one: because the subject you are talking to is basically a machine, simulating conversation in a natural way is quite tricky and complicated. The solutions the paper provides for this seem rigid and not good solutions for the problem. For the second challenge, transforming GUIs into SUIs, there should be a systematic theory behind the problem. The paper does come up with a few tips on this issue, but I think we could break it into several phases and then analyze them one by one, such as a syntactic phase, a semantic phase, and so on. How to address recognition errors is not a trivial thing; it demands a lot of the representation that GUIs provide. The paper does categorize the different error types, and they are well defined. The paper also mentions the challenge of the nature of speech, which is very similar to the first challenge the paper presents. The design of SUIs and the transformation between SUIs and GUIs are new to many of us and have real research value, given that we are going to use a lot of voice recognition applications in the future. MULTIMODAL INTERFACES::: Multimodal systems are going to take a great share of the market in the near future. As we can see, pretty much every smart device already has multiple sensing inputs. This material provides everything we need to know about multimodal systems and multimodal interfaces. It not only provides the basic information about the subject but also digs deep into how multimodal interfaces work and what we need to do to improve the multimodal experience, both in technical terms and in human cognitive terms. Given that the multimodal interface field is not as popular as GUIs at the moment, the material lists the differences between them, but I think it should go much further in discussing those differences. Having an architecture or system tool for designing multimodal interfaces can be very complicated, as we can see from the example the material provides: a typical information processing flow in a multimodal architecture designed for speech and gesture. As we can imagine, designing multimodal interfaces for all popular sensor inputs will be a very complicated procedure. The material does come up with one approach, hybrid architectures, which represent one major new direction for multimodal system development. Multimodal architectures can also be hybrids in the sense of combining Hidden Markov Models and Neural Networks. This proposal could have practical implementation value, because HMMs and NNs are relatively similar in terms of their information processing methods.
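
The HMM-plus-neural-network hybrid mentioned at the end can be pictured with a minimal sketch in which a stand-in network scores each frame against a few states and a Viterbi pass over an HMM supplies the temporal structure; the random scores and uniform transitions are placeholders, not a real recognizer.

```python
import numpy as np

# Minimal sketch of the "hybrid" idea: a neural network scores each input
# frame against the states, and an HMM (via Viterbi) supplies the temporal
# structure. The network here is a stand-in returning arbitrary scores.

def nn_emission_log_probs(frames, n_states):
    """Stand-in for a trained network: log P(state | frame) for each frame."""
    rng = np.random.default_rng(0)
    scores = rng.random((len(frames), n_states))
    return np.log(scores / scores.sum(axis=1, keepdims=True))

def viterbi(emission_logp, transition_logp, initial_logp):
    """Most likely state sequence given per-frame emission log-probabilities."""
    n_frames, n_states = emission_logp.shape
    delta = initial_logp + emission_logp[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + transition_logp      # previous state x next state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + emission_logp[t]
    path = [int(delta.argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return list(reversed(path))

n_states = 3                                   # e.g. three phone states
frames = [f"frame_{i}" for i in range(6)]      # placeholder acoustic frames
emissions = nn_emission_log_probs(frames, n_states)
transitions = np.log(np.full((n_states, n_states), 1.0 / n_states))
initial = np.log(np.full(n_states, 1.0 / n_states))
print(viterbi(emissions, transitions, initial))   # most likely state sequence
```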

Debarun Das 2:52:31 9/20/2016

“Designing SpeechActs: Issues in Speech User Interfaces”: This paper mainly addresses the challenges faced in designing speech user interfaces, with the help of an ‘experimental conversational speech system’ called “SpeechActs”. The speech interface is discussed with respect to four applications, namely mail, calendar, weather, and stock quotes. Also, for the design of this speech user interface, user feedback was gathered at different stages; a total of 14 users took part in the evaluation. By studying and analyzing the data from these user studies, the authors identified a series of design challenges. Some of the prominent design challenges include simulating proper conversation by maintaining a common ground of interaction, transforming GUIs into SUIs, speech recognition errors, and the nature of speech. This paper strongly (and correctly) argues that the design of SUIs should be separate from that of GUIs: design decisions for SUIs should be independent of the design decisions for GUIs. This paper was published in 1995 and thus provides a baseline for areas of SUI design to be explored in the future. It is one of the important papers that gives background for the design of modern SUIs like Siri and Google Now. ……………..................................................................................................... “Multimodal Interfaces”: This is an article that discusses the nuances of multimodal interfaces. As described in the article, ‘multimodal interfaces combine two or more user input modes’. It discusses in detail the different types of multimodal interfaces, the design of existing and past interfaces of this kind, their overall advantages over normal GUIs, the basic architecture and processing of messages in such systems, and the future of such interfaces. One advantage of such interfaces is that they facilitate error recovery and error avoidance. An example is an interface where a user can use both speech and pen input: speech for quick access, and the pen where accuracy is needed. It also lets the user switch modes without causing many errors. The above are user-centered advantages. A system-centered advantage is the “mutual disambiguation” of input signals, which leads to lower error rates. Also, since it supports parallelism by using multiple input modes at the same time, tasks are completed faster with fewer errors. This article stands as a good background for research in multimodal interfaces.

Zuha Agha 3:07:27 9/20/2016

1. MultiModal Interfaces This paper highlights the key features of multimodal interfaces, their advantages, design principles, and the psychological aspects of user interaction with such interfaces. Multimodal interfaces support different sensory modalities, such as touch and speech, in interface interaction, which allows the user to carry out a task using one or more modalities together. The use of multiple modalities in interfaces provides several benefits to users. These include increased adaptability, letting users switch to their choice of modality based on the type of task, its difficulty, their comfort level, and their workload. It also allows increased accessibility by making devices usable for users with some sensory impairment. As a result, multimodal interfaces are being widely adopted. But at the same time, the author points out some interesting psychological myths associated with the design of multimodal interfaces. One of the most interesting myths, in my opinion, is the assumption that a device supporting a multimodal interface implies that the user actually interacts with the device multimodally for all commands; in reality, some commands may be executed more efficiently using unimodal interaction. On another note, in my opinion one reason for users sticking to unimodal interaction could be the gulf of execution between the device’s capabilities and the user’s perception of them. Another interesting idea that relates to this myth is how important it is for the modalities to be complementary, so that it is natural for the user to combine them. Moreover, users’ interaction and coordination patterns are an equally important factor, as some users interact sequentially, switching from one mode to another, while others interact simultaneously, finding it easier to use multiple modalities together. Lastly, the paper discusses the characteristics of the architectures that support multimodal interfaces, including memory and synchronization requirements. In my opinion, the paper provided a reasonable overview of multimodal interfaces, but I thought that it had little contribution of its own in analyzing multimodal interfaces and mostly cited prior work. ===================================================================================== 2. Designing Speech Acts The paper presents the challenges of conversational speech systems by designing a prototype that supports speech synthesis and recognition for a limited set of actions, followed by a user study with that prototype. Results showed that the key challenges involved in designing the system included intonation, appropriate pacing, choice of vocabulary, mapping GUI terminology to a SUI, recognition errors, and the lack of visual feedback in speech-only interfaces. Overall, the takeaway from the paper is that it is important to keep the principles of human-human dialogue in mind when designing a speech-only interface and to design effective mechanisms for verification and feedback. Speech recognition systems and interfaces have advanced a lot since this paper was published; they are now deployed widely in commercial applications and used on a daily basis, for example Siri and Google Voice on smartphones. Though such systems have still not attained the level of human-human interaction, they have advanced significantly over the past decade and a half.

Anuradha Kulkarni 7:52:05 9/20/2016

Designing SpeechActs: Issues in Speech User Interfaces: This paper introduces the user interface SpeechActs and presents the challenges and the feedback obtained from the tested users. SpeechActs is a prototype of a conversational interface system that allows users to navigate through different applications such as weather, email, stock quotes, and schedules. Since all commands are given through speech recognition software, it presents very different challenges from a GUI. The main challenges faced by the speech user interface (SUI) are simulating conversation, transforming a GUI into a SUI, recognition errors, and the nature of speech. Other challenges are the lack of visual feedback and speed and persistence. The paper explains each challenge with the help of examples and even provides guidelines to prevent them or to take a few things into account, such as considering the pacing of the user. Obviously, the most crucial bottleneck is speech recognition errors themselves: background noise, mistimed speech, and misunderstood words all contribute to improper understanding. The paper proposes some ideas to improve the SUI, such as building dialogue that is more like human-human dialogue, keeping the dialogue brief and informative, giving brief and accurate feedback to the user, and separating SUI design from GUI design. Speech interfaces are becoming more popular with speech-to-text messaging systems and personal assistants such as Siri and Cortana, and recognition software has improved greatly since this time period. Multimodal Interfaces: This paper discusses multimodal interfaces, interfaces that use more than one mode of user input in harmony with one another, such as a speech and stylus interface. The paper introduces existing multimodal interface systems and then presents the goals and advantages of multimodal interface design. One of the major benefits of multimodal interfaces is the ability of one input modality to help recover from errors in another input modality, for example resolving speech recognition errors by using a stylus for textual input. Also, they allow a single device to be used efficiently in varying environments. The main contribution of this paper is presenting the basic architecture and the techniques incorporated into these systems, which acts as a good reference point for modern systems. The drawback of the paper is that it doesn’t quite give ideas about future prospects.