Multimodal Interfaces

From CS2610 Fall 2015




Additional Readings (Reading critiques not required)

Demonstration plan

  • Adriano Maron, Matt Barren
  • Darshan Balakrishna Shetty, Ankita Mohapatra
  • Shijia Liu, Zinan Zhang
  • Vineet Raghu, Mahbaneh Torbati, Long Nguyen
  • Samanvoy Panati, Sudeepthi Manukonda
  • Manali Shimpi and Ameya Daphalapurkar
  • Chi Zhang, Lei Zhao, Zihao Zhao
  • Xinyue Huang, Mingda Zhang

Reading Critique

Adriano Maron 15:45:21 9/27/2015

Designing SpeechActs: Issues in Speech User Interfaces: This paper presents SpeechActs, a telephone-based speech recognition system for interacting with email, calendar, weather, and stock quotes. The authors present a usability study that provided important feedback for iteratively improving SpeechActs. The study also suggested that speech-only interfaces have particular requirements and should not be modeled on their graphical counterparts. The main challenges faced by the authors included: how to create a fluid conversation; how to organize and present information without visual cues; how to handle recognition errors, feedback, and verification; and how to deal with the user's limited ability to maintain a mental model of the system's state. Nowadays, the technology has evolved to the point where speech recognition is fairly accurate. The great challenge is what information to present to the user at any given time, and how. Because of the lack of visual cues, and the user's tendency to get easily distracted, simple and fast tasks are the best candidates for speech-based interaction. Less intrusive visualization technologies, such as Google Glass, can now be coupled with speech-recognition systems to overcome the lack of visual cues, allowing the completion of more complex tasks. ==================================================== Multimodal Interfaces: This chapter discusses the research and prototypes for multimodal interfaces, in which users control the system through multiple simultaneous input methods. Such control gives a higher degree of expressiveness to the commands the user sends to the computer. Most of the research in multimodal interfaces is related to speech and pen recognition, though other options could be studied (gaze interpretation, facial expressions, ...). In multimodal interfaces, the modality choice, i.e., which input source is desired for a given task, is an important design issue.
Not all tasks are meant to be performed using multiple sources of input. A good example of a task that benefits from multimodal interfaces is the digital whiteboard. With digital pen and speech recognition support, the user can write something (an equation, perhaps) and say a sentence that explains what is being written. Such a sentence could be automatically added to the board at the same time as the pen is being used. Different inputs can also be used sequentially: the user asks the system to search the web for some content, then uses the pen to select a portion of the text and voice commands to copy and paste the contents to another region of the board. Given the complexity of such systems, high-fidelity simulation testing provides an easy platform for testing early designs of multimodal interfaces without the hassle of building the entire system. In this approach, the user interacts with the front-end while the back-end is controlled by a programmer who emulates the expected behavior of the system. When designing such interfaces, the know-how from building traditional GUIs does not contribute significantly. Multimodal interfaces differ from traditional GUIs in that they must handle simultaneous input events, their actions can be ambiguous and context-dependent, and multiple input sources must be synchronized in order to preserve their semantic meaning.
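The synchronization requirement mentioned above can be sketched as a simple timestamp-based fusion step. This is only a minimal illustration, not the chapter's actual architecture: the event fields and the one-second fusion window are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str   # e.g. "pen" or "speech"
    content: str    # recognized text or gesture label
    start: float    # seconds since session start
    end: float

def fuse(events, window=1.0):
    """Group events from different modalities whose time spans overlap,
    or fall within `window` seconds of each other, into a single
    multimodal command; everything else stays a unimodal command."""
    events = sorted(events, key=lambda e: e.start)
    commands, group = [], []
    for ev in events:
        if group and ev.start - max(g.end for g in group) <= window:
            group.append(ev)   # close enough in time: same command
        else:
            if group:
                commands.append(group)
            group = [ev]       # start a new command
    if group:
        commands.append(group)
    return commands

# A pen gesture and an overlapping utterance fuse into one command;
# a later utterance becomes a separate command.
evs = [InputEvent("pen", "circle-region", 0.2, 0.8),
       InputEvent("speech", "delete this", 0.5, 1.4),
       InputEvent("speech", "open mail", 5.0, 5.9)]
print([[e.content for e in g] for g in fuse(evs)])
```

The whiteboard scenario above maps directly onto this: the pen stroke and the spoken explanation arrive as two events whose time spans overlap, so they end up in one fused command.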

Kent W. Nixon 23:00:46 9/28/2015

Designing SpeechActs: Issues in Speech User Interfaces: In this paper, some Sun researchers discussed their HCI findings from implementing a speech user interface (SUI) for their internal email, weather, stock, and calendar applications. They described how the system was configured to work via voice transmitted from a mobile phone, and how limitations of the technology at the time required them to sometimes rely on key tones. A lot of the features they discuss, such as users being able to interrupt the speech synthesizer and universal key-tone commands, are now common in phone-operated automated interfaces. I found it very interesting that one of the main concerns users raised about the SUI, namely that a touch-tone or graphical interface, where available, was much more efficient, is still one of the main things holding back SUIs today. I also found it interesting that just by varying the feedback so it was different every time speech recognition failed, users found the system more organic and intuitive. That's not something you are ever taught in an engineering course! Multimodal Interfaces: This paper is a survey of the known information regarding multimodal interfaces. It covers a number of misconceptions regarding them, and the empirical evidence which debunks them. One of the more interesting facts in this paper was that when it comes to multimodal interfaces, there are two distinct groups of users: one group will use multiple forms of input simultaneously, such as writing and talking, while the other does so sequentially, performing the entirety of the task with one input method before switching to the other. Unsurprisingly, the latter group usually provides more accurate input, as they focus on only one thing at a time.
It is also discussed how having multiple input forms allows users to tackle more complex tasks with a decreased cognitive load, as they are able to discretize sections of the task and map them onto multiple input methods in parallel, allowing the input methods themselves to act as a form of working memory. The paper concludes by discussing the underlying architecture required for multimodal input systems to work well: the input methods need to be explicitly decoupled from the underlying task, in some cases with multiple different systems and OSs handling individual input forms and then coordinating the input later down the line.

Manali Shimpi 23:32:02 9/28/2015

Multimodal Interfaces: Multimodal systems are systems that can process two or more combined user input modes. Development in multimodal systems is driven by the myriad input and output technologies becoming available. Many multimodal systems have been developed, and more are rapidly emerging. A few earlier multimodal systems supported speech input along with keyboard and mouse input; the multimedia map was one such system, where a user could speak, type, or point with a mouse to extract tourist information. Multimodal systems that process speech and continuous 3D manual gesturing are now emerging rapidly. Multimodal interfaces allow flexible use of input modes. They can accommodate a broader range of users with different skills, cognitive styles, ages, native languages, sensory impairments, and other temporary illnesses or permanent handicaps. Multimodal interfaces also provide the adaptability to handle the continuously changing conditions of mobile use. There are both user-centered and system-centered reasons why multimodal systems can avoid errors and recover from them easily. A user-centered example is that users tend to simplify their language when interacting multimodally; a system-centered example is a system that supports mutual disambiguation, where one input mode helps resolve ambiguity in another. The design of new multimodal systems is inspired and organized by the cognitive science literature on intersensory perception and intermodal coordination during production, which is beginning to provide a foundation of information for user modeling, along with information on what the system must recognize and how a multimodal architecture should be organized. Multimodal systems are constrained by the number and type of input modes they can process, and there are many individual and cultural differences in the use of multimodal interfaces. Multimodal interfaces are moving toward recognizing human language and actions.
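Mutual disambiguation, mentioned above, can be sketched as jointly scoring the n-best hypothesis lists of two recognizers and keeping only semantically compatible pairs. This is a hedged illustration of the general idea; the function name, the confidence values, and the compatibility table are assumptions, not details from the chapter.

```python
def mutually_disambiguate(speech_nbest, gesture_nbest, compatible):
    """Pick the jointly most confident (speech, gesture) pair whose
    interpretations make sense together. Each n-best list is a list of
    (hypothesis, confidence) tuples; `compatible` is the set of
    pairings that are semantically valid."""
    best, best_score = None, -1.0
    for s_hyp, s_conf in speech_nbest:
        for g_hyp, g_conf in gesture_nbest:
            if (s_hyp, g_hyp) in compatible and s_conf * g_conf > best_score:
                best, best_score = (s_hyp, g_hyp), s_conf * g_conf
    return best

# The top speech hypothesis ("delete") is wrong, but only "select"
# pairs sensibly with the recognized lasso gesture, so the gesture
# pulls the speech interpretation back to the correct command.
speech = [("delete", 0.6), ("select", 0.4)]
gesture = [("lasso", 0.9), ("tap", 0.1)]
ok = {("select", "lasso"), ("delete", "tap")}
print(mutually_disambiguate(speech, gesture, ok))  # → ('select', 'lasso')
```

The point of the example is that an error in one recognizer can be corrected by evidence from the other, which is exactly the system-centered error-avoidance benefit described above.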
Designing SpeechActs: Issues in Speech User Interfaces: The paper describes the functionality of the SpeechActs system. SpeechActs is a research prototype that integrates third-party speech recognition and synthesis with telephony, natural language processing capabilities, and other tools for creating speech applications. The authors conducted a formative evaluation study in which 14 users participated, tested in groups of three; after each group's test, the interface was revised. The tests found that every user bemoaned the slow pace of interaction, and most thought the computer gave too much feedback. Current speech technologies pose many design challenges, and the nature of speech itself is also problematic, with two main problems: the lack of visual feedback, and speed and persistence. It is not very effective to translate a graphical interface directly into speech.

Ameya Daphalapurkar 23:39:16 9/28/2015

The paper titled ‘Multimodal Interfaces’ covers the various aspects of these interfaces, from the reasons for building them, their types, history, and current status, their advantages, and user interaction, to the basic ways in which they differ from graphical user interfaces. Oviatt describes multimodal interfaces as systems that combine two or more kinds of input, such as pen and speech, and process them in a coordinated manner. Multimodal interfaces have seen huge development with respect to both software and hardware components. The author depicts the most preliminary interfaces as those having only a keyboard and mouse; systems have since moved beyond this and now integrate two input streams in parallel. The paper also notes the advent of emerging 3D manual gesturing. One advantage of these interfaces is the capacity to accommodate a wide range of users; another is the adaptability to acclimate to changing conditions. Another important aspect is error avoidance and recovery from errors. The easiest and least expensive way to design multimodal interfaces is high-fidelity simulation, which helps in evaluating trade-offs and deciding between alternate designs. Studies have also shown that multimodal interface users respond spontaneously to dynamic changes. There are two kinds of integrators: simultaneous integrators overlap their input modes, while sequential integrators finish them one by one. Linguistic differences also occur between the two. Multimodal interfaces require the time-stamping of inputs, and architectures are developed so that modalities are processed individually. Thus, cognitive science will continue to help give a future direction. *************** The paper titled ‘Designing SpeechActs: Issues in Speech User Interfaces’ talks about the system named SpeechActs.
It specifically focuses on the problems that the designers of SpeechActs faced and ways to overcome these challenges. SpeechActs makes use of telephony, NLP, and other speech recognition systems. An example is a mail application that recognizes spoken input and performs basic functions. The paper then demonstrates the interaction between the user and the system, and describes the user study and its details. A summary of results is explained in detail to illustrate the users and their views. Various challenges are mentioned: the first is simulating conversation, and another is the transformation from GUI to SUI designs. Recognition errors are a major concern, and they come in several subtypes. A rejection error occurs when the system has no hypothesis for the user's words. A substitution error occurs when the system misinterprets the user's words as a different utterance that is legal but incorrect. An insertion error occurs when the system inserts random words due to noise. The nature of speech is another challenge, which includes problems due to lack of feedback and to speed and persistence. The paper concludes that SUI design should be a separate effort that involves learning from and studying human conversation.

Zihao Zhao 15:09:21 9/29/2015

The paper “Designing SpeechActs: Issues in Speech User Interfaces” is an accessible paper. It investigates speech-only interfaces in depth by analyzing data gathered from the SpeechActs prototype. The design of this prototype included gathering user feedback and iterative redesign through several user studies. The study reveals many challenges, as well as some basic rules, in designing a speech-only interface. For example, we cannot simply translate a graphical user interface into a speech user interface, because some GUI principles are not applicable to SUIs: we would never speak some of the words that frequently appear in a GUI. I learned a lot from this paper; its extensive user studies give me hints for developing systems, such as iterative feedback and redesign. I think speech-only user interfaces have great prospects, because our hands are often occupied while our mouth is free, such as when I have to type while wanting to read a message just received from my mom. A speech user interface would help a lot. When we are driving, it would be great if the SUI could read or send messages for us. This would greatly enhance our quality of life. It is a hot topic that people watch their phones too much, even while walking, which is very dangerous on the road. With an SUI, this problem might be solved by transferring the task from hand and eye to mouth and ear. The main difficulty in building such a system, I think, is that natural language processing is not mature enough; if we could recognize intonation, then commands and input would not be a problem. ———————————————————— “Multimodal Interfaces” is an article which analyzes the history and future of multimodal interfaces.
In the past, the mainstream multimodal interface designs added speech and gesture, or speech and keyboard input; in the future, multimodal interfaces will be more innovative, better integrated, and more robust. The goal of multimodal interfaces is to be usable by different kinds of people, no matter what their background is, which requires designers to take cultural differences into account. The typical information processing flow of a speech-and-gesture interface is that both speech and gesture control the context concurrently: the gesture is interpreted by a gesture understanding system while the speech is recognized by a natural language processing system, and the resulting information is integrated by a multimodal integration processor. Multimodal interfaces are not new to us; many large-scale games take advantage of them. As far as I am concerned, education is the best area in which to implement multimodal interfaces. For example, for learning Chinese, we could build a game which requires the player to pronounce a character with the right intonation while demonstrating gestures on it, which would be of great help in learning the language.

Matthew Barren 20:22:51 9/29/2015

Summary of Designing SpeechActs: Issues in Speech User Interfaces: Yankelovich, Levow, and Marx lay out their process of iteratively designing SpeechActs and the conclusions they drew from each iteration of user testing. The authors then extend the paper to general challenges that exist in producing a successful Speech User Interface (SUI). The authors establish a goal of creating an SUI focused on meeting the needs of professional users. The application of the SUI extends to reading emails, calendaring, and typical queries. Throughout the design process, Yankelovich, Levow, and Marx focus on delivering an SUI that moves toward a conversational approach. They recognize that there are many axioms of conversation that are difficult for an SUI to mimic. A particularly interesting conversational dynamic that is difficult to reproduce is the variance in the way users speak to an SUI. As the authors note, a user is likely to abandon the vocabulary used in a GUI and instead bring their own lexicon into the conversation. People generally do not have a uniform means of speech, and therefore the SUI may need to map multiple inputs to one output. There is an opportunity for machine learning to provide this mapping: if users are able to bind a word or phrase to a set of actions known to the SUI, the application will become more flexible in the way it treats user speech. An additional difficulty in conversation dynamics is speed and pacing. People communicate differently in terms of their speed of delivery and rate of comprehension. The authors' solution was to let users manually control the pacing of conversation; more advanced analysis of user speech could allow the SUI to pace itself more appropriately. Feedback through speech is difficult even between two individuals communicating over the phone. The removal of any and all social cues makes it difficult for the user to trust the actions that are occurring.
The authors supplement this feedback through verification prompts. Additionally, the authors correctly note that SUIs and GUIs have different domains for the actions that can occur. One way of increasing feedback is providing the user with the ability to switch between interfaces to do a visual verification in the same state as the SUI. At first, this seems to defeat the purpose of developing an SUI. Instead, it can have the opposite effect: the ability to do a visual verification allows the user to develop trust in the SUI, and eventually, switching between the GUI and SUI (or using both in succession) will allow the user to develop a sense of trust in the SUI's results. Summary of Multimodal Interfaces: In Multimodal Interfaces, Sharon Oviatt discusses the history, characteristics, and implications of multimodal interfaces. She emphasizes the importance of interfaces being able to receive multiple inputs to deliver high-fidelity results. Multimodal interfaces allow users to interact with an application that receives multiple inputs, which can then be examined in parallel to provide high-quality machine outputs. As discussed in previous papers, a common goal in interfaces is to map the user's intention correctly to the desired goal. In this sense, the pathways of communication between a machine and a human compose a shared language between the two. Multimodal interfaces aggregate multiple domains of communication. Although this will most likely increase complexity because of the increase in possible outcomes, the machine has more input features with which to differentiate the user's desires. As Sharon Oviatt discusses, from Bolt to many contemporary multimodal interfaces there has been an emphasis on two-dimensional pointing and speech. In addition, she notes that speech and lip movements are the most “mature” multimodal inputs. These two inputs can be very useful for recognizing speech, but how can a machine recognize context or emotion?
One area that multimodal interfaces can extend to is facial mapping. The technology exists to examine how faces contort to express different emotions. With this type of feature, a multimodal interface can examine the user's emotion and respond accordingly. Additionally, if multimodal interfaces extend from two-dimensional inputs to three-dimensional ones, they can examine the posture and condition of an individual. As Oviatt describes, current and new features can extend to a wide variety of industries. The real power of a multimodal interface is its ability to support a flexible form of communication. Humans are not consistent communicators like computers. Instead of having a predetermined vocabulary that maps uniquely to actions, humans' forms of expression grow over time and take on various meanings depending on context. Multimodal interfaces are an opportunity to let computers sort through the context and deliver results that match the user's desires, while allowing individuals to communicate in a more human manner.

Long Nguyen 21:47:22 9/29/2015

Read on Designing SpeechActs: Issues in Speech User Interfaces: Through experiments with the SpeechActs speech user interface prototype, conducted with professional travelers and developers, the paper exposes the drawbacks of human-machine speech interaction and points out four main challenges in SUIs: simulating conversation, transforming a GUI into an SUI, recognition errors, and the nature of speech. There are also some sub-problems, including the lack of visual feedback and speed and persistence. To improve the SUI, the paper proposes some ideas: build dialog that is more like human-human conversation, make the dialog brief and informative, give brief and correct feedback to the user, and treat SUI design as separate from GUI design. I believe this paper is a big contribution to the future SUI design of modern applications like Siri and Google Talk. ---------------------------------------------------- Read on Multimodal Interfaces: The document presents multimodal systems, which combine multiple user inputs and are expected to be easier to learn and use, and more effective, compared to unimodal recognition systems. First it introduces existing multimodal interface systems, mostly from the past 15 years, with comparisons across six characteristics. Then the document analyzes the goals and advantages of multimodal interface design, featuring error handling on both the user-centered and system-centered sides. Multimodal systems show great advances along with cognitive science, as the document gives some examples of systems that interact multimodally with humans. The main contribution of this paper, I think, is the part explaining the basic architectures and processing techniques of these systems, which is a good reference for modern systems. However, I would prefer more ideas on future directions, which are only sparsely described.

Chi Zhang 21:52:29 9/29/2015

Critique of "Designing SpeechActs: Issues in Speech User Interfaces" by Chi Zhang. This paper introduces SpeechActs. In it, the authors express their ideas on how people should design speech UI systems. They argue that designers should not directly adapt a graphical UI into a speech UI, but should instead start the speech UI design from scratch. The SpeechActs system has some significant problems, which the authors address. The large amount of feedback generated by the system sometimes slows down the interaction, as users need time to process it all; in a GUI environment, the same amount of feedback would not be considered overwhelming. After addressing the problems of the system, the authors also discuss the challenges for speech UIs. Conversations are very hard to emulate. A GUI cannot be successfully translated into a speech-only environment. Recognition is the real issue, as it is very hard to implement perfectly. This is quite an illustrative paper, as it introduces many aspects of the current system and identifies many problems and future concerns, which are constructively helpful for future research. --------------------------------------------- Critique of “Multimodal Interfaces” by Chi Zhang. This paper is mainly about multimodal interfaces, explained as interfaces that process two or more combined user input modes in a coordinated manner with multimedia system output. The author talks about the main categories of this kind of interface, its advantages, and its main features. Multimodal interfaces reduce ambiguity in user input and system errors, and a multimodal system can help distribute a task across different input streams, reducing the amount of work needed to finish it. This is a really well-organized paper that does good illustrative work, and it produces many constructive and significant insights into this set of interfaces.

Lei Zhao 22:55:06 9/29/2015

Paper 1: This paper introduces a user interface called SpeechActs, with a focus on the challenges faced and the feedback from test users. The authors developed four applications: a mail system, a calendar, a weather application, and a stock quotes application. First, they tested the system on different groups of people with different occupations; then they collected feedback and improved the applications accordingly. The conclusion from the users' feedback is that it is not a good strategy to adopt design methods from GUI applications directly; instead, a speech interface should be designed from scratch. Since this paper was published in the 1990s, many speech interface applications have been developed, yet the traditional GUI remains unchallenged, so there is still great research potential in this area. ================================= Paper 2: The major topic of this paper is an introduction to multimodal interfaces. First the paper describes what unimodal and multimodal interfaces are. Then the authors discuss the advantages of multimodal interfaces over unimodal ones. There are mainly two: 1) a multimodal interface can provide a clearer interpretation of user input; 2) a multimodal interface relieves cognitive load on the user's end. However, more work needs to be done in the area of natural language processing to make this technique practical.

Xinyue Huang 2:48:25 9/30/2015

Designing SpeechActs: Issues in Speech User Interfaces: The paper introduces the conversational speech system SpeechActs, presents a set of challenging issues facing speech interface designers, and addresses some of those challenges. SpeechActs is a research prototype which integrates third-party speech recognition and synthesis with telephony and natural language processing for creating speech applications. The system includes speech-only interfaces to a number of applications, including electronic mail, calendar, weather, and stock quotes. The paper describes the formative evaluation study design and tasks. Problems appeared after testing, including the slow pace of the interaction, the computer giving too much feedback, and inappropriate translation of the message organization into speech. The paper shows a comparison of results and the design challenges. The first challenge is simulating conversation: the major design challenge is to create a speech application that simulates the role of speaker/listener convincingly enough to produce successful communication with the human collaborator, which involves prosody and pacing. The second challenge is transforming GUIs into SUIs. The main problem areas include vocabulary, because the vocabulary used in the GUI does not transfer well to the SUI; information organization, because it often does not transfer well from the graphical to the conversational domain; and information flow. Another challenge is recognition errors, which fall into three categories: a rejection error occurs when the recognizer has no hypothesis about what the user said; a substitution error involves the recognizer mistaking the user's utterance for a different legal utterance; and an insertion error means that the recognizer interprets noise as a legal utterance. The last challenge is the nature of speech, which includes the lack of visual feedback, and speed and persistence.
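The three-way error taxonomy described above can be made concrete with a small labeling function. This is only an illustrative sketch: the function, the command grammar, and the `None`-for-noise convention are assumptions for the example, not part of the SpeechActs system.

```python
def classify_recognition(user_said, recognized, grammar):
    """Label a recognition outcome using the paper's taxonomy:
    rejection (no hypothesis), insertion (noise recognized as a legal
    utterance), substitution (wrong but legal utterance), or correct.
    `user_said` is None when there was no real utterance (just noise);
    `recognized` is None when the recognizer produced no hypothesis."""
    if recognized is None:
        return "rejection"      # recognizer had no hypothesis
    if user_said is None and recognized in grammar:
        return "insertion"      # noise interpreted as a command
    if recognized in grammar and recognized != user_said:
        return "substitution"   # legal but wrong utterance
    return "correct"

# A toy command grammar, assumed for illustration.
grammar = {"read message", "next message", "delete message"}
print(classify_recognition("read message", None, grammar))             # rejection
print(classify_recognition(None, "next message", grammar))             # insertion
print(classify_recognition("read message", "delete message", grammar)) # substitution
print(classify_recognition("read message", "read message", grammar))   # correct
```

Substitution and insertion are the dangerous cases here, since the system confidently acts on the wrong command, which is why the paper pairs recognition with feedback and verification.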
Multimodal Interfaces: The paper introduces multimodal systems, which can process two or more combined user input modes, such as speech, pen, touch, and head and body movements, in a coordinated manner with multimedia system output. There is growing interest in multimodal interface design, with the goal of supporting more transparent and efficient means of human-computer interaction. The author gives a concrete history of multimodal human interfaces and their current status. There are multiple goals for multimodal interface design; the first is that a multimodal interface permits a diverse user group to exercise selection and control over how they interact with the computer. There are also many advantages of multimodal systems. The first is the efficiency gain that derives from the ability to process input modes in parallel. The second is superior error handling, both in terms of error avoidance and graceful recovery from errors. The author also introduces the cognitive science underpinnings of multimodal interface design, which focus on when users interact multimodally, what the integration and synchronization characteristics of multimodal input are, what individual differences exist in multimodal interaction, and what the implications are for designing systems for universal access; this part also covers the primary features of multimodal language. In the latter part of the paper, the author introduces the basic ways in which multimodal interfaces differ from graphical user interfaces, the basic architectures and processing techniques used to design multimodal systems, and the main future directions for multimodal interface design.

Zinan Zhang 3:23:15 9/30/2015

1. For Designing SpeechActs: Issues in Speech User Interfaces ----------- This paper first takes the SpeechActs system as an example to illustrate that speech interfaces are not as easy to design as we might think. Then it states some of the challenges faced in designing speech interfaces and tries to address some of them. Among all the challenges mentioned in the paper, the most difficult one, I think, is simulating conversation. It is really hard to deal with because a computer is not a real human, so it cannot have a real conversation with a human. What designers want is to set up phrases organized the way a human would organize them, so that when a human talks with the machine, it feels like they are really talking with a human. However, the machine cannot be intelligent enough to understand everything the human does. For example, as mentioned in the paper, if a user wants to skip the current content and interrupt with a word, the machine cannot understand that the user does not want to hear the current content any more; it just goes straight on until it reaches a stop at the end of a paragraph. -------------------------------------------------------------------------------------------------------------- 2. For Multimodal Interfaces ------------ The paper mainly talks about multimodal interfaces. It introduces what a multimodal system is and some relevant information about it, giving a brief introduction to multimodal systems from all directions. After learning about multimodal systems, I think they are really complex. A multimodal system processes two or more combined user input modes in a coordinated manner with multimedia system output. Dealing with a single input mode, such as speech, is complex enough, but a multimodal system needs two or more inputs to be combined, which requires different kinds of techniques to work together so that the whole process works well and the output is what the human wants.
In my opinion, the development of this kind of interface depends on the development of all the individual input modes (speech, pen, touch, ...). Although multimodal systems sound very smart, it is unnecessary to spend too much effort on this field until the individual input modes are further developed.

Samanvoy Panati 3:26:51 9/30/2015

Critique 1: Designing SpeechActs: Issues in Speech User Interfaces SpeechActs is a conversational speech system developed at Sun Microsystems Laboratories. This paper illustrates a set of challenges faced during the design and development of the software and the approaches taken to solve these problems. With the booming development of mobile interfaces, it would be better to have hands-free operation for accessing different applications, but developing speech interfaces involves overcoming many obstacles. To develop the SpeechActs system, the developers first identified the principles and challenges of conversational interface design. They followed an approach of iterative redesign, conducting many user studies. The results showed that the main drawback was the slow pace of execution. Users were also dissatisfied with some features, such as getting too much feedback and the inability to interrupt the speech output with their voice, which could be rectified in the next iteration. The researchers also found that error rates and user satisfaction are only loosely correlated. The researchers identified four main design challenges. The major one is enabling the speech interface to produce successful communication with a human, which requires understanding the patterns used in normal conversation; the pace of the speech should also adapt to the user's pace. The second challenge is transforming GUIs into SUIs. This task is not trivial: in a GUI the user has visual interaction and therefore full control, whereas in an SUI the user has less control, so information must be organized and presented in the right way. The third design challenge is recognition errors, which are of three types: a rejection error is failing to understand the speech input given by the user; a substitution error is interpreting the wrong input as the intended input; and an insertion error is interpreting noise as input.
The final design challenge is the nature of speech. Due to the lack of visual feedback, it takes time for the user to visualize the situation and then follow through. Sometimes silence from the interface is ambiguously interpreted as the system working, when in fact the system recognized nothing and is waiting for user input. This can be solved by giving timely feedback. The paper concludes that adhering to conversational principles helps in making a usable speech-only interface and that immediate feedback is essential to solving many issues. ------------------------------------------------------------------------------------------ Critique 2: MULTIMODAL INTERFACES Multimodal interfaces process multiple inputs at a time. Their goal is to support more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction. The most rudimentary multimodal interface uses the standard keyboard and mouse together; now there are many combinations using speech, pen, touch, manual gestures, gaze, and head and body movements. These are categorized based on functionality, architectural features, and a general classification of the different speech-and-gesture multimodal applications. Multimodal interfaces permit flexible use of input modes, which cannot be seen in a traditional keyboard-and-mouse or unimodal interface. A multimodal interface gives the user the choice of modality for conveying the type of information he wants, and he can switch between modes at any time. The main design issue in multimodal interfaces is the modality choice. Systems are becoming more and more complex, and it is better to give users flexibility of input: if the driver of a vehicle wants to use his mobile, it is more efficient if he can operate it eyes-free.
A speech-processing interface can be used for this. Multimodal interfaces provide the great adaptability needed to accommodate the continuously changing conditions of mobile use. Their main advantageous feature is error handling, both in terms of error avoidance and graceful recovery from errors: users will select the input mode they feel is less error prone for a given input. There is also a system-centered reason for superior error handling: a well-designed multimodal interface can perform mutual disambiguation of its input signals. The design of these interfaces depends on accurately predicting when users are likely to interact multimodally, much as people anticipate one another in interpersonal communication. Most multimodal systems developed to date are bimodal, but there is excellent scope for systems with more than two modes in the future.
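The three error types above can be stated precisely as a small decision procedure. The sketch below is purely illustrative (the threshold, field names, and examples are my own assumptions, not taken from SpeechActs): it labels a single recognition attempt as rejection, substitution, or insertion based on what was said versus what was recognized.

```python
# Hypothetical sketch of the paper's recognition-error taxonomy.
# The 0.4 threshold and all example phrases are invented for illustration.

REJECT_THRESHOLD = 0.4  # below this confidence, the recognizer gives up

def classify_outcome(user_said, recognized, confidence):
    """Label one recognition attempt.

    user_said  -- what the user actually uttered ("" if only silence/noise)
    recognized -- the recognizer's best hypothesis ("" if none)
    confidence -- the recognizer's score in [0, 1]
    """
    if user_said and (not recognized or confidence < REJECT_THRESHOLD):
        return "rejection"      # speech was given but not understood
    if user_said and recognized != user_said:
        return "substitution"   # a wrong command taken as the intended one
    if not user_said and recognized:
        return "insertion"      # background noise interpreted as input
    return "correct"

print(classify_outcome("read message", "", 0.1))                # rejection
print(classify_outcome("read message", "delete message", 0.8))  # substitution
print(classify_outcome("", "next", 0.7))                        # insertion
```

Framing the taxonomy this way makes clear why the three errors need different feedback strategies: a rejection can honestly say "I didn't catch that," while a substitution looks like success to the system and can only be caught through verification.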

Priyanka Walke 3:45:23 9/30/2015

Reading Critique on Designing SpeechActs: Issues in Speech User Interfaces This paper discusses the design of SpeechActs, a speech user interface. The interface uses the conventions of conversational language to perform daily tasks like reading emails, checking weather forecasts, and answering telephones. It first describes SpeechActs' functionality, proceeds with its iterative redesign, and concludes with the challenges encountered and the design strategies used to meet them. The author states that attempting to convert a graphical user interface directly into a speech user interface leads to poor-quality interfaces, as the two adhere to different characteristics; a direct conversion is not possible, and an alternative format should be used to meet the requirements instead of direct translation. He illustrates this by using a speech interface in place of a telephone key-press system. The ultimate aim of this design is to reduce the user's load and make the interaction more natural and convenient. As mentioned above, the paper describes the challenges faced while designing the system and the strategies used to tackle them. A few problems still need to be handled, one of them being accent recognition, which causes the system to fail to recognize the received input and forces users into repetitive retries. Other challenges for this system include simulating conversation, GUI-to-SUI conversion, recognition errors, and the nature of speech. In systems intended to replace a human, especially where responses vary widely, it is difficult to cover every scenario, which may lead to total confusion. The paper stresses providing audible feedback along with a mixed-initiative interface design. This being a young area of research, there is definitely huge scope here.
Reading Critique on Multimodal Interfaces Multimodal interfaces are those that can process multiple input modes, like speech, pen, touch, and gesture. This paper mainly describes multimodal interfaces: their design, need, history, goals, advantages, research, and requirements. The author states that it is necessary to move beyond unimodal interfaces and explore the collaboration between the available input modes. Since multimodal systems accept multiple input modes, they are definitely more powerful than unimodal interfaces: a failure in one input mode does not stop a multimodal system. They differ from conventional graphical user interfaces in having multiple input streams and many possible ways to combine the input to meet the user's requirements. The inputs can also be interdependent, as in the use of lip movements to reduce speech recognition errors. It is not compulsory for the user to provide input in all the forms a multimodal interface supports; input can be provided in any of the accepted forms. Designing such interfaces greatly helps users with special needs and makes daily activities easier. This field is vast, and a huge amount of research is expected in order to cover as many of its areas as possible. The paper, however, presents multimodal interfaces with quite a number of redundant statements and inferences: it is a plain collection of information with no exciting research discussion. More light could be shed on the different upcoming research areas.

Vineet Raghu 3:47:57 9/30/2015

Designing SpeechActs: Issues in Speech User Interfaces SpeechActs is a prototype of a conversational interface system that allows users to navigate through different mobile applications such as weather, email, stock quotes, and schedules. All commands are given via speech recognition software, which the authors state presents very different challenges from a conventional graphical user interface. Thus, these interfaces cannot be directly translated from a GUI, as this will not produce the most user-friendly model. Pacing appeared to be one of the most difficult challenges for an SUI, as users want normal conversational speed, which is difficult since the agent must process information concurrently. Another difficulty in this domain is that, at the time of writing, interrupting speech agents was hard, whereas users can typically predict what the computer is trying to say and want to move on to the next command quickly. In translating GUIs into SUIs, designers must take care to avoid certain issues. For example, the vocabulary used in a GUI can be very unnatural in a speech-based interface. The authors demonstrate this with the example of relative dates in speech, such as "next Monday," whereas in a GUI something like this would be unnecessary since users could simply click on the date corresponding to next Monday. This can be a difficult challenge to overcome. Obviously, the most crucial bottleneck is speech recognition errors themselves, as background noise, mistimed speech, and misunderstood words can all contribute to improper understanding. The particular design struggle here is mitigating user frustration in these scenarios by encouraging users to repeat what they said in a constructive manner, or alternatively assuring them that commands were understood properly, without slowing the system to a halt.
The prototype must have been an interesting development at the time, and though speech interfaces are becoming more popular with speech-to-text messaging systems and personal assistants such as Siri, GUIs are still the predominant interface of choice. Overall, recognition software has improved greatly since this time period, but some of the issues noted by the authors still remain. Particularly, the pacing problem in making conversational interfaces natural persists, and nowadays the understanding of complex proper nouns, such as names or places, is a challenge. ---------------------------------------------------------------------------------------------------------Multimodal Interfaces This paper describes multimodal interfaces, which are interfaces that use more than one mode of user input in harmony with one another, like a combined speech and stylus interface. The author further classifies these interfaces into blended interfaces, which combine a passive input mode with a separate active input mode. One of the major benefits of multimodal interfaces is the ability of one input modality to help recover from errors in another, such as solving speech recognition errors by using a stylus for textual input. They also allow a single device to be used efficiently in varying environments: for example, our mobile phones can be used as speech devices while driving or as touch-screen devices while relaxing at home. It appears that for these types of interfaces to continue to grow and progress, the focus needs to be on AI techniques. These can enhance the accuracy of input modalities such as speech and vision, but they can also improve the integration of multiple input modalities if integration is framed properly as a learning problem. With experience, the system can integrate these signals to build a full context of what the user is trying to accomplish at a given time.
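The cross-modality error-recovery idea above can be sketched in a few lines: when the speech recognizer's confidence is low, fall back to a more reliable modality instead of forcing the user to repeat themselves. Everything here is a toy stand-in under my own assumptions (the function names, the canned results, and the 0.6 threshold are invented, not from any real recognizer API):

```python
# Minimal sketch of modality fallback for error recovery.
# recognize_speech and stylus_input are hypothetical stand-ins.

def recognize_speech(audio):
    # Stand-in for a real recognizer: returns (hypothesis, confidence).
    return ("call bob", 0.35)

def stylus_input(prompt):
    # Stand-in for a pen/keyboard text-entry widget.
    return "call rob"

def get_command(audio, threshold=0.6):
    """Return (command, modality), preferring speech when it is confident."""
    text, conf = recognize_speech(audio)
    if conf >= threshold:
        return text, "speech"
    # Fall back to a less error-prone modality rather than re-prompting speech.
    return stylus_input("Please write your command:"), "stylus"

command, mode = get_command(b"...")
print(command, mode)  # the low-confidence speech guess is replaced via stylus
```

The design choice worth noticing is that the fallback changes modality rather than retrying the failed one, which sidesteps the frustration loop the SpeechActs authors observed when users had to repeat misrecognized utterances.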

Shijia Liu 4:16:57 9/30/2015

Section 1: Designing SpeechActs: Issues in Speech User Interfaces Nowadays people have become used to speech during their work, especially while traveling. As a result, SpeechActs was invented as an experimental conversational speech system aimed at helping office workers. SpeechActs has many kinds of functions, such as error recognition, but so far it has not been widely adopted by professionals. SpeechActs has been tested in many respects, and the current results suggest that this invention did benefit some users in certain ways: it could easily detect errors and some grammar problems. However, it also faces challenges. The nature of conversational speech cannot be realized by this application, due to its lack of human intelligence. Besides, some details, such as silence or emotional elements, could not be fully reflected. In addition, the system itself needs constant enhancement before being put into practice; the present system is not stable enough to support repeated visits or regular feedback. In conclusion, SpeechActs has positive meaning in bringing people convenience and efficiency in daily work. The system needs further improvement, with humanized elements added, to better match the current demands of professionals. Section 2: Multimodal Interfaces Multimodal systems process two or more combined user input modes, which can be speech, gestures, movement, or even body language. Modern individuals need more flexibility and the freedom to engage with more than one task at a time, so it is necessary to increase their working efficiency through the computer system. Multimodal systems satisfy this demand by offering combined user input and output. The system is designed with multifunctional parts, allowing more tasks to be handled simultaneously. Overall efficiency is not decreased in this situation, while the operator can engage with different jobs without extra concentration.
The future trend is to let more scientific technology replace traditional labor in some fields. The advantages of multimodal systems outweigh those of traditional labor, with less probability of making errors. Self-correcting mechanisms within multimodal systems will be created and enhanced a step further in a short period, stimulating the development of human-computer interfaces. Different individuals will be able to conduct many activities, for fun or for work, in varying situations via multimodal systems, without one task impeding the progress of another. More and more signals and marks will be recorded by human-computer interfaces to gradually build a better system.

Jesse Davis 4:48:44 9/30/2015

Designing SpeechActs: Issues in Speech User Interfaces This paper is important for HCI and the computer speech field in general because the entire paper is based on identifying the major issues, design problems, and limitations of the speech user interfaces of its time with the help of the SpeechActs research prototype speech recognizer/synthesizer. It seems like this would have been important for the creation/design of Siri, and I enjoyed how they went about most of their studies/experiments (i.e., using phases and reorganizing the experiment for each group). This paper identified a lot of the potential problems with early speech UIs, such as the conversational cues that are important for the user to know when to speak, what to say, navigation methods, etc. The error handling included a lot of robustness implementation and refining, which impressed me. The end of the paper brings up a lot of important issues by touching on the subtopics: speed and persistence (speech is easy to produce, hard to consume), ambiguous silences, and the lack of visual feedback. Multimodal Interfaces This excerpt covers what the authors explain is an extremely important domain of HCI: multimodal interfaces. The idea that new input/output methods and their combinations are important for making breakthroughs is an easily supportable statement given this excerpt. The beginning goes over some pioneer multimodal systems (Figure 21.1) of several different breeds for speech/gesture applications and details their multimodal characteristics. The next important topic the excerpt dives into is the goals and advantages of multimodal systems, where it details several definitions of multimodal interface types and how/why they are used, which leads into the section discussing what currently exists multimodal-wise and how people commonly perceive it.
I was able to relate to this because one of the projects I worked on while in R&D used various combinations of hand gestures, voice capture, and EEG readings to operate/navigate learning environments and Google Maps. The excerpt goes on to give in-depth details about how multimodal systems work and their typical specifications and requirements.

Ankita Mohapatra 5:46:06 9/30/2015

Reading Critique on Designing SpeechActs: Issues in Speech User Interfaces: The goal of this paper was to expose some of the difficulties that come with designing a speech interface by describing what the authors experienced while testing their own, called SpeechActs. The authors' motivation is their belief that conversational speech is a better alternative to menu-based telephone systems that are tedious for users. The application's functionality included that of many general office applications, targeted specifically at traveling professionals who would need information on the go. They used a series of user studies and redesigns to assess the interface as well as the speech recognition system, all of which is detailed in the paper. This paper contributed a detailed exploration of the challenges that come with SUIs, or Speech User Interfaces, an exploration which probably influenced the development of SUIs that followed its publication. Simulating conversation seemed to be the key to a usable speech interface. This includes transitional prompts, sharing a common context between speakers, and more sound-oriented details like inflection and intonation. The telephone systems at the time could not reproduce human-sounding speech very well. The limitations of phone systems also prevented users from interrupting the system with their own voice, since the system would not receive audio while playing into the phone. Ironically, the authors proposed that keypad shortcuts be available so that advanced users may skip over familiar prompts, although this called into question the extent to which the authors were striving for a speech-only interface versus a speech-optional interface. A significant observation was that GUI interfaces do not translate well into SUI interfaces. Since SpeechActs was trying to give speech access to existing GUI applications, it was made clear via the design cycle that the GUI workflows did not transfer to conversation successfully.
Each design iteration tended to push the interface toward a conversational style. I thought it was helpful and responsible that the authors addressed some issues regarding how plausible or useful a speech-based interface might be. Importantly, speech interfaces do not provide the same level of freedom to users; users feel compelled to fill silence and take action, whereas in the use of GUI applications, users are free to pause and think as well as explore uninterrupted.================================================================================================================================================== Reading critique on Multimodal Interfaces: This paper is a survey of past, present, and future research in multimodal interfaces. Multimodal systems are defined as systems that are capable of accepting more than one mode of input in synchronization. These systems have become possible because of a wide array of new input devices. The author predicts that these developments, eventually, will lead to systems that have near-human sense perception. I’ve pulled out many of the novel and core topics addressed by this paper for this review. A distinction is made between active and passive input modes; in passive input, there may be sensors monitoring users’ behaviors to make decisions without explicit user commands to the computer. This lends itself to the discussion of “blended” multimodal interfaces, which blend the use of both passive and active modes and may temporally cascade the modes such that each modal interpretation influences the interpretation of the others. A unique benefit of multimodal interfaces, blended or not, is the concept of “mutual disambiguation.” In a unimodal system, a single stream of input is being interpreted and thus there is no context for checking against recognition errors. 
However, if two or more input modes are operating and the system receives data from both, an error in one mode may be detected by comparing it against the processing of the input from other modes, either at the feature level or the semantic level (post-processing). This improved error recognition improves the stability of the interface and can make the user’s interactions more efficient. The author points out the need for multimodal user interface toolkits to alleviate the complexity of designing multimodal prototypes in the future. Multimodal interface designers need to pay heed to the fact that users are only likely to act multimodally in certain situations and also may switch between unimodal and multimodal acting depending on the cognitive load that they are experiencing. An important note by the author is that research needs to explore further the temporal relationships between natural modes of expression (gaze, gesture, speech, and within those) such that advanced multimodal systems can take advantage of those relationships by anticipating them. Most of these human expressions are not simultaneous (but are synchronous). Further, cooperation will need to be made with researchers in the cognitive science field because of the complexity and non-intuitiveness of these relationships. Natural language processing needs to adapt to be more suitable to the way people speak in multimodal systems.
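The semantic-level mutual disambiguation described above can be made concrete with a small fusion sketch. Under my own invented assumptions (the n-best lists, scores, and the toy "compatible" check are all illustrative, not from any real system), each recognizer emits ranked hypotheses, and fusion keeps only cross-modal pairs whose meanings agree, letting the gesture channel overrule a speech recognition error:

```python
# Hedged sketch of semantic-level fusion for mutual disambiguation.
# All data and the compatibility rule are invented for illustration.

speech_nbest = [("delete lake", 0.50), ("delete line", 0.45)]
gesture_nbest = [("select:line", 0.70), ("select:point", 0.20)]

def compatible(speech, gesture):
    # Toy semantic check: the spoken object must match the gestured object.
    obj = speech.split()[-1]
    return gesture.endswith(":" + obj)

def fuse(speech_list, gesture_list):
    """Return the best semantically compatible (speech, gesture) pair."""
    best, best_score = None, -1.0
    for s_text, s_score in speech_list:
        for g_text, g_score in gesture_list:
            if not compatible(s_text, g_text):
                continue  # prune joint hypotheses whose semantics disagree
            score = s_score * g_score
            if score > best_score:
                best, best_score = (s_text, g_text), score
    return best

print(fuse(speech_nbest, gesture_nbest))
# the top speech guess "delete lake" is overruled: no gesture hypothesis
# is compatible with "lake", so the second-ranked "delete line" wins
```

This is exactly the stability argument in the critique: neither recognizer alone would have corrected the error, but the joint constraint between modes does.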

Mingda Zhang 7:50:09 9/30/2015

Designing SpeechActs: Issues in Speech User Interfaces The authors of this paper developed early-stage speech user interfaces and summarize their progress, experience, and lessons here. Their system, called SpeechActs, serves as an experimental conversational speech interface so users can finish their tasks while talking to the system. The motivation is good, because traditional interfaces have multiple drawbacks on certain occasions. However, in their experiments some technical limitations emerged. For example, they realized that talking seems slow for interaction, and since it is error-prone and lacking feedback, repeated verification became unavoidable. The authors explain in detail the challenges they faced during development and their corresponding approaches, such as simulating conversation, transforming a GUI into an SUI, and tackling recognition errors. According to the authors, experience in developing traditional GUIs did not help much in exploring speech-only interfaces, and they even suggested starting the design from scratch. This is worth thinking about, because typically we believe that reinventing the wheel is unnecessary and tend to translate existing knowledge into new concepts. However, considering the giant distinctions between these two types of interfaces, sometimes a highly customized approach is preferable. Multimodal Interfaces This paper illustrates the idea of systems with multiple communicative modes (channels). In fact, many current systems are armed with keyboard, mouse, speech recognition, touch pad, or even manual gestures. Personally speaking, I believe that incorporating multiple interfaces can be an option for increasing the bandwidth of human communication with computers. As we all know, the mouse is the most successful input device and its performance has almost reached the upper limit of the human itself. In other words, the limiting factor has become the human hand. However, is that the best we can do?
Computers and machines have helped people with many previously impossible tasks, and I believe a great tool should help human users push their limits. Therefore, multimodal interfaces can be a chance. As a comprehensive overview, this paper demonstrates the history as well as the future perspective of multimodal interfaces, along with the many other tasks that can be accomplished with these techniques. Advantages and possible drawbacks are also analyzed in detail. Even though we have many well-developed interfaces as weapons in our arsenal, coordinating them to work together most effectively still requires much effort.

Mahbaneh Eshaghzadeh Torbati 8:45:17 9/30/2015

(Designing SpeechActs): This writing concerns the development of a speech-based user interface. The writers discuss the design, experiment, and refinement process. I think this paper is important for exploring new styles of user interface on computers, not just for the speech-based interface they developed but for the problems of speech-based interfaces they point out. A speech-based interface seems convenient to use, but in practice there are many problems in designing one. Some are technical, and some stem from the nature of speech itself: it is hard to keep the pace of a conversation with the user natural, since analyzing speech takes time and the machine does not know how to keep the rhythm of a natural conversation between human beings. It is also hard to make a conversation as fast as a mouse-and-keyboard interface, perhaps because the user lacks visual feedback: they have to finish listening to get all the information, and once the information is complete the user may be confused, so the pace is slow. Even though this interface is not fast, it is still very useful when people cannot easily see a screen. For example, when people are driving, it is the best way to help the driver operate a device. Imagine a user who is driving and receives a message from an important person that must be read and answered right away. A speech interface is the best choice: it does not need a very fast operation speed, but it does need to be safe. The device can read the message aloud, the user can dictate the reply, and the computer recognizes the message and sends it. A very nice interface for this situation. (Multimodal Interfaces): This paper talks about multimodal interfaces, meaning interfaces in which multiple user inputs work together.
The author uses the history of their development and examples to clearly explain the study of multimodal interfaces. This paper is important, in my opinion. Nowadays, thanks to the development of sensor and computer technology, new ways of operating computers are possible. We have been stuck with the mouse interface for a very long time, and our speed of operating computers has not become faster in ages; it is time to improve it. This paper gives us an idea for doing that: combining different user inputs that work together in a natural way seems a good solution for making computer operation easier and faster. A speech and point-based multimodal interface looks like a natural way for users to express their minds, and the invention of this kind of interface may improve the experience of computer operation. Thus, I think this paper is important for the development of user interfaces. The paper also points out some problems of multimodal interfaces. One is that people always make mistakes while operating computers. They may not intend to make them; it just happens naturally, for example it is hard to draw a perfectly straight line with a mouse. Tolerance of this kind of error is necessary for multimodal interfaces. Learning users' habits is a good way to know what kinds of mistakes users are likely to make during an operation; then self-correction of those mistakes will speed up users' actions on the computer.

Sudeepthi Manukonda 8:50:01 9/30/2015

Designing SpeechActs: Issues in Speech User Interfaces is a paper that talks about the usage and importance of speech systems in human-computer interaction. For any communication, feedback is very necessary: even when two people are talking, the other person must respond to confirm what he has understood. It is the same with SpeechActs. This paper talks about the challenges the designers underwent, which are even greater when the interface is speech-only. Human-computer interaction research is all about making humans' lives a lot easier without their even noticing. SpeechActs frees the user from the burden of remembering long telephone numbers and from the possibility of miscommunication. Communicating with a person while he is traveling is difficult; mobiles have helped a little but not completely, and SpeechActs aims to provide a more complete solution to this issue. SpeechActs is a research prototype that integrates third-party speech recognition and synthesis with telephony, natural language processing capabilities, and other tools for creating speech applications. SpeechActs is completely speech based: the communication is like talking to another person, the information provided comes as voice, and our instructions are given as audio. The writer gives examples of accessing messages, the calendar, the weather, etc. using speech recognition, along with the possible queries a user might have for these tasks. Implementing tasks by speech alone is not only highly error prone but also very difficult to design. Experiments were conducted on a group, recording the number of tasks completed and the accuracy with which they were completed; these provide the statistics needed to analyze the design procedure.
Several design issues also arose during the SpeechActs design: simulating conversation, transforming a graphical user interface into a speech user interface, recognition errors, the nature of speech, and the continuity of the conversation. Some important conversational behaviors must be incorporated to make it a natural speech technology: avoiding repetition, handling interruptions, and grounding the conversation. Converting a GUI directly into an SUI is not effective, and an SUI should have a separate interface design to achieve successful communication. —— We have talked about the design of different input systems and the problems faced during their design. But what if we have not one but two or more modes of input? How are we going to design the system? How will the user learn of the multiple functionalities? How do we get maximum functionality out of each? These questions are answered in this book chapter. Several types of multimodal systems have been created over the past couple of decades, and a lot of research is going on in this field. The main objective of creating a multimodal system is a robust system that works across many types of networks and systems. Multimodal systems are preferred for the variety of inputs they offer and for their many advantages. The design process must keep several aspects in mind: multimodal interfaces, active input modes, integrators, fusion, and other recognition patterns. Multimodal systems also take a cognitive approach to designing, understanding, and using the system. Multimodal user interfaces differ quite a lot from graphical user interfaces: a GUI assumes there is only one input stream, and most of the time a key press is ignored while the mouse is still clicked, while a multimodal interface takes in all the inputs even if they are given simultaneously.
Multimodal interfaces, as we already discussed, have two or more modes of input. The early ones took into account speech and gesture, and this has diversified to other input methods too. A lot of research is going on to improve the technology, and hopefully we will see a revolution in this area in the near future. The future calls for innovative, well-integrated, and robust multimodal systems.