Multimodal Interfaces

From CS2610 Fall 2017

Slides

slides

Readings


Reading Critiques

Xingtian Dong 17:59:53 9/22/2017

1. Reading critique for 'Designing SpeechActs: Issues in Speech User Interfaces': I do not know much about today's speech user interfaces, so I cannot tell whether the challenges the authors raise still apply to today's technology. But the paper is a good example of how to explore a new area. It is worthwhile to identify the challenges and decompose them into smaller problems; that helps researchers know what the problems are and how to attack them. Another point worth mentioning is how the authors designed tests to examine whether the problems had been overcome, and how they evaluated the results: recognition rates turned out to be a poor indicator of satisfaction. Users are complex; when we evaluate tools or interfaces, it is not enough to test only how efficient they are and what success rate they achieve. We should also learn more about the users themselves.

2. Reading critique for 'Multimodal Interfaces': I think the most important contribution of this paper is that the author points out the problems multimodal interfaces face. Solving them requires more advanced machine learning and, I think, distributed systems. Now that these techniques are much more mature, it is the right time to tackle those problems. The paper is also useful today, not only because the problems the author mentions are solvable with today's techniques, but also because the author synthesizes many papers in the multimodal interface area and classifies them along several dimensions, which makes it easy for a newcomer to get an overview of the field.

Ahmed Magooda 23:01:47 9/23/2017

Designing SpeechActs: Issues in Speech User Interfaces: In this paper the authors introduce SpeechActs, a speech user interface (SUI) built on top of an automatic speech recognition (ASR) system that supports tasks such as calendar and email management and other dialogue-based tasks such as weather queries. The authors performed experiments to see how well controls transfer from a GUI to an SUI. The results showed that the transfer is not easy, for multiple reasons (human short-term memory, the absence of visual aids, and so on). The authors then analyzed the types of errors they encountered and the feedback they received from experimental users, and they end by presenting several challenges: simulating conversation (prosody and pacing), vocabulary differences between GUI and SUI, and recognition errors (insertion, rejection, or substitution). I think the paper is reasonable and provides solid reasoning about the difference between a pure GUI and a pure SUI and why it can be hard to translate one into the other. While the paper offers some experimental analysis, I think it should have included more samples of the comments given by users, since the number of users is not large anyway; the paper also does not clearly differentiate between the updates made to the system, so there is no way to correlate the types of errors with user satisfaction.

Multimodal Interfaces: In this paper the author provides an introduction to and survey of multimodal interfaces, which are interfaces that integrate more than one input mode. The input modes can be integrated in multiple ways (using one to further interpret the other, using one as active and the other as passive, using one as an error-correction method for the other, using one as an information source for the other, and so on). These combination strategies depend on the application and on the confidence of each mode. The paper starts by discussing the advantages of multimodal interfaces and their scientific underpinnings, and finally their implementation. It then discusses the difference between GUIs and multimodal interfaces: a GUI assumes that a single channel is responsible for making decisions, which is not the case for multimodal interfaces. One of the points on which I agree with the author is that multimodal interfaces are believed to be more efficient because humans are multimodal by nature. In human-human interaction we tend to be multimodal: we convey messages through multiple channels (speech and body movement, speech and facial expressions, speech and gaze, and so on), which has proved to be more resilient to errors and more efficient.
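A minimal sketch of the confidence-based combination this critique describes, assuming a semantic-level (late) fusion scheme: each recognizer scores its own hypotheses, and the weighted combined evidence picks the interpretation. The class, weights, and example scores below are invented for illustration and are not from either paper.

from dataclasses import dataclass

@dataclass
class Hypothesis:
    meaning: str       # candidate interpretation, e.g. "delete message"
    confidence: float  # recognizer score in [0, 1]

def fuse(speech_hyps, gesture_hyps, speech_weight=0.6, gesture_weight=0.4):
    # Semantic-level ("late") fusion: each mode's recognizer runs on its own,
    # and their scored interpretations are combined afterwards, so one mode
    # can reinforce or correct the other.
    scores = {}
    for h in speech_hyps:
        scores[h.meaning] = scores.get(h.meaning, 0.0) + speech_weight * h.confidence
    for h in gesture_hyps:
        scores[h.meaning] = scores.get(h.meaning, 0.0) + gesture_weight * h.confidence
    return max(scores, key=scores.get)

# Gesture evidence tips the balance toward "delete message".
speech = [Hypothesis("delete message", 0.45), Hypothesis("read message", 0.50)]
gesture = [Hypothesis("delete message", 0.90)]
print(fuse(speech, gesture))  # -> delete message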

Jonathan Albert 15:07:53 9/24/2017

SpeechActs: This paper reviews the functionality of a speech-recognition-based interface for an email application. It details many challenges associated with such systems and offers guidelines for improvements. Though the SUI described by the authors proved difficult to use, its difficulties are important for designers of similar systems. Too many modern systems fall into the same traps; they could benefit from these considerations. Making the interface conversational but not verbose, minimizing silence without pestering the user, and accounting for a user's limited attention and patience are crucial. Otherwise, the system will frustrate and confuse its users. Notable examples of this are the "electronic receptionists" that greet those calling in to various support lines. Instead of providing an issue detector that leads to FAQ answers, they seem merely to delay people in a maze of options until the next representative is available, or until the caller hangs up. In their current form, electronic receptionists tend to provide an abysmal user experience due to their inability to understand the nuances of language or correctly categorize the user's issue so that callers can be directed where they want to go. If made less tedious, properly designed systems might relieve helpdesk agents without chasing users away from the phone lines.

Multimodal: This document explains and elaborates on various facets of multimodal interfaces, which use more than one type of user input "mode", such as speech and bodily movement. It focuses on the influence of cognitive science on such studies and mentions users' performance when interacting with a map-based multimodal system. The breadth of topical coverage in this document is immense, and understandably so, as it seems to be a dedicated summary of the field. However, the denseness of the author's wording works to dissuade those seeking a bird's-eye overview. Added to this is the overhead of sifting through lengthy parenthetical citations which, as on page 415, can appear interleaved within a single sentence. Their length and frequency turn them into line noise, making attempts at skimming for information difficult. Further, this document seems to be geared toward cognitive scientists only. If that was the intended audience, then the document is understandably focused on theoretical principles. Without that framing, however, I wonder whether the spirit of the document aligns with the call to action in its closing section. The recognition that more thinkers should think about this type of system is what I interpret to be the focus of the final section, but it seems to give cross-discipline designers in the "computer science community" only a cursory glance. With clearer formatting and (if intended) tighter topical focus, this document would be improved.

Krithika Ganesh 17:04:16 9/24/2017

Designing SpeechActs: Issues in Speech User Interfaces: This paper describes the research prototype SpeechActs and its initial methodology, which included gathering frequent feedback from users to identify the challenges they faced and iteratively redesigning the prototype to meet those challenges in a speech-only environment aimed at traveling professionals. To test the usability of SpeechActs, which consists of mail, calendar, weather, and stock-quote features, the authors recruited 14 users. Are 14 users' opinions enough to inform the design of a research prototype like SpeechActs? Something to ponder. The usability test shows that many design challenges still need to be tackled. The recognition rates for women turned out to be lower than for men, and the authors give no justification for this result; nor is the system's poorer performance for women counted as one of the challenges. The authors tackle the challenges one by one: challenge 1, simulating conversation, is addressed by maintaining 'common ground', handling pacing with the barge-in technique, and allowing users to adjust the speed of the synthesized speech; challenge 2, transforming a GUI into an SUI, is addressed by following 'conversational conventions' to improve vocabulary and by improving information flow, removing the prompt dialog boxes; challenge 3, recognition errors, is addressed by countering rejection errors with 'progressive assistance', substitution errors by echoing back part of the input command in the answer, and insertion errors by allowing the user to press a key on the keypad to turn off speech recognition; challenge 4, the nature of speech, is addressed by compensating for the lack of visual feedback with scanning and filtering techniques and by resolving ambiguous silence with an audio cue indicating that the computer is working, while silence indicates that the system is waiting for input. What is not clear to me is whether the authors aim to design an interface with speech as the only input or an optimal interface that also takes speech as input, since they include a keypad for experienced users, which makes it no longer a speech-only interface. Today we have voice assistants such as Bixby, Siri, and Google Assistant that help us check the weather and time, set alarms and reminders, or turn on Bluetooth, but voice assistants have traditionally struggled to understand more than simple requests; much research is going into improving this, and this paper is a good starting point for understanding which challenges need to be considered.

Multimodal Interfaces: This paper explains the meaning, types, history, advantages, disadvantages, design methods, cognitive underpinnings, architectures, and future of multimodal interfaces. Multimodal systems process two or more combined input modes such as speech, touch, and manual gestures. Types of multimodal interfaces include systems that recognize speech and pen-based gestures, speech and continuous 3D manual gestures, speech and lip movements, and vision-based input. The advantages of multimodal interfaces are that they permit flexible use, accommodate a broader range of users, allow natural alternation between modes, offer superior error handling through mutual disambiguation, and minimize users' cognitive load. Principles from cognitive science and high-fidelity automation simulations are the methods that have been used to design multimodal interfaces. What is interesting is that there are two types of users with respect to integration patterns: 'simultaneous integrators', who overlap input modes temporally, and 'sequential integrators', who finish one input mode before beginning the next. The author notes that one needs to explore the temporal relationships among natural expressions such as gaze, gesture, and speech; most of today's smartphones are bimodal, and there is excellent scope in interface design research for designing multimodal devices. Multimodal languages are linguistically simpler than spoken language. There are two types of multimodal architectures: feature-level architectures (one mode influences the course of recognition of the other) and semantic-level architectures (modes are recognized separately and integrated at the level of meaning). This paper is a good starting point for getting acquainted with multimodal systems, but it gives no clue about how to design one. It also cites too many references in the main text, which hinders the flow of information.
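A small illustrative sketch, not from the chapter, of how the two integration patterns might be told apart from input timing: overlapping speech and pen intervals suggest a simultaneous integrator, a gap suggests a sequential one. The interval representation and the strict-overlap test are assumptions.

def integration_pattern(speech_interval, pen_interval):
    # Each interval is a (start, end) pair in seconds.
    # Overlapping intervals suggest a simultaneous integrator;
    # a gap between them suggests a sequential integrator.
    s_start, s_end = speech_interval
    p_start, p_end = pen_interval
    overlap = min(s_end, p_end) - max(s_start, p_start)
    return "simultaneous" if overlap > 0 else "sequential"

print(integration_pattern((0.0, 1.8), (1.2, 2.5)))  # overlap -> simultaneous
print(integration_pattern((0.0, 1.0), (1.4, 2.0)))  # gap -> sequential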

MuneebAlvi 21:56:00 9/24/2017

Critique of Designing SpeechActs: Issues in Speech User Interfaces. Summary: This paper describes the development of SpeechActs, which is used to translate many of Sun's applications into programs that users can interact with using their voices. The paper also discusses the many challenges of developing a speech user interface. This paper shows me that speech user interaction is still evolving today, and we still have some of the same issues. For example, many speech user interfaces (such as the ones we use when we call help desks) allow the user to interrupt the voice to give a command. Another related issue is trying to let the speech system perform as many tasks with as fluent a conversation as possible. This is similar to Siri and Google voice recognition on iOS and Android. Both attempt to understand a variety of user inputs and to figure out what the user is trying to achieve. Both also attempt to hold a natural conversation with the user by suggesting further actions, such as scheduling an appointment found in an email. One issue from the paper that we no longer face is users being unable to understand the voice output. Most systems today are clear enough that most native speakers of the language understand them, and for non-native speakers, devices with screens show the output and input of the voice. This issue is resolved in part because the internet no longer requires a phone-line connection as it did at the time of this article.

Critique of Multimodal Interfaces. Summary: This reading is about interfaces that accept a variety of input types to be used effectively. These inputs include speech, typing, writing, and looking. This reading reminds me a lot of the Xbox Kinect. The Kinect enhances the Xbox by providing two additional input channels beyond the controller: speech and gestures. Through these three channels, users can carry out a variety of tasks through different methods. For example, if users do not purchase the multimedia remote, they can use their voice to control media playback, and if the controller's batteries die, they can use gestures for activities like navigating the menu. These inputs can also be seen as complementary. For example, a user can navigate to the movie library by voice or gesture but then use the controller to enter a movie title, since the controller allows greater precision for typing. Another example of the paper's topics is car entertainment and information centers. Many cars today accept multiple kinds of user input: speech commands, button presses, and letters and gestures drawn on small touch pads. This lets the user focus on the road and minimizes the time a driver must look away from it. A user can ask the system by speech to call someone, or bring up the navigation system by voice and then specify the address on the touch pad for greater accuracy. The multimodal inputs in modern cars can be used individually or in a complementary fashion.

Sanchayan Sarkar 0:59:47 9/25/2017

CRITIQUE 1 (Designing SpeechActs: Issues in Speech User Interfaces): In this paper, the authors demonstrate the design challenges of a speech user interface through a series of experiments in designing a system called SpeechActs, and they provide solutions for overcoming those challenges. One interesting point the paper makes is that quantitative results are often not the best indicators of user perception; for example, the authors show experimentally that the poor error rates in Table 2 did not correlate with user satisfaction. Another important challenge is translation from one mode to another: the paper demonstrates that direct translation from GUI to SUI is not recommended, so what works for one mode will not work for another. The vocabulary of speech has to differ from that of the GUI, and the organization and flow of information cannot be the same as in the GUI. For example, dialog boxes in a GUI are very efficient at controlling information flow, but an equivalent prompt in an SUI frustrates users. The paper further discusses key challenges such as the nature of speech, where fast response is essential; faster feedback in an SUI is needed to compensate for the slowness of the medium. The paper also categorizes errors into rejection, substitution, and insertion errors. Another interesting idea is the explicit-response versus implicit-response tradeoff that the interface designer must make to improve users' perception of the system. Simulating an actual conversation is also a major challenge in SUI design: correct intonation, good pacing, and few interruptions are necessary for the interface to feel natural. One thing I dislike about the paper is the lack of diagrams. The authors mention plenty of design challenges and their solutions but do not group them in a chart or diagram for better visual understanding; instead, the reader has to compile the list of challenges, which can be confusing given the wide array of terminology. Nevertheless, the paper is valuable not just for learning the critical challenges of speech user interfaces but also for understanding the underlying cognitive models that must be taken into account when moving from one mode to another.

CRITIQUE 2 (Multimodal Interfaces): This chapter narrates the evolution, applicability, and merits and demerits of multimodal interfaces. It also argues for a cognitive understanding of human interaction across different modes and its implications for designing multimodal interfaces. One of the main merits of the chapter is that it introduces a strong vocabulary for multimodal interaction and its implications for design, with terms like mutual disambiguation, simultaneous versus sequential integration, hyper-timing, and feature-level versus semantic-level fusion. These are core terms grounded in a cognitive understanding of human interaction across multiple modes. For example, contrasting feature-level and semantic-level fusion architectures clarifies the difference between tight coupling of input modes (feature level) and loose coupling of input modes (semantic level). Another interesting aspect of the chapter is its analysis of myths about multimodal interaction and its explanation, backed by extensive studies, of how they are often counterintuitive. For example, a common myth is that multimodal input involves simultaneous signals; the author shows, however, that the more effective interfaces are those that take temporal cascading into account, and that contrasting signals help achieve better synchronization across multiple modes. The author further asserts that not just the nature of the input signals but also the actions to be performed, the age groups performing them, disabilities, and so on are all factors that go into designing an interface on a cognitive foundation. The chapter also contrasts the attributes of multimodal language with those of natural language: even though the data stream of multimodal interaction is richer than natural language, its semantics are quite terse compared with natural language, which again is counterintuitive. This work also relates to my field of computer vision, where vision-based techniques are often used for multimodal interaction. For example, robots can understand people's behavioral patterns by recognizing body-language gestures; such passive tracking often drives the response. New deep-learning approaches such as CNNs are also emerging that fuse image attributes with natural-language semantics for better annotation. This chapter therefore finds a lot of relevance in my field. Finally, the chapter is essential for understanding the underlying principles of multimodal interaction, and it points to future research on automatically learning human integration patterns and temporal sequencing to enhance the wide range of possibilities that multiple modes of interaction can offer.
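A hypothetical sketch of the mutual-disambiguation idea highlighted above: the n-best list from one recognizer is re-ranked using evidence from the other mode, so a lower-ranked but cross-mode-consistent hypothesis can win. The interpretations, scores, and product-of-scores ranking are illustrative choices, not the chapter's algorithm.

def mutually_disambiguate(speech_nbest, gesture_nbest):
    # Each n-best list holds (interpretation, score) pairs.
    # Keep interpretations that both modes can support, rank them by the
    # product of the two scores, and fall back to speech alone if the
    # modes share no hypothesis.
    gesture_scores = dict(gesture_nbest)
    joint = [(meaning, s_score * gesture_scores[meaning])
             for meaning, s_score in speech_nbest
             if meaning in gesture_scores]
    if not joint:
        return max(speech_nbest, key=lambda pair: pair[1])[0]
    return max(joint, key=lambda pair: pair[1])[0]

speech = [("move sheep", 0.60), ("move ship", 0.55)]   # speech alone prefers the wrong one
gesture = [("move ship", 0.80)]                        # pointing at a ship disambiguates
print(mutually_disambiguate(speech, gesture))          # -> move ship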

Mingzhi Yu 16:04:36 9/25/2017

Multimodal Interfaces, by Sharon Oviatt: This 30-page paper gives a summary of, and predictions about, the multimodal interface. Although it was published back in 2012, five years ago, it pointed out the great advantages that multimodality could bring to the field. Looking at today's IT manufacturers, the multimodal interface seems to be a baseline for most products on the market; companies seem to agree that a single-interface product will not attract customers enough. For example, Apple's touch technology detects the touch point as well as the pressure of a press, and facial recognition systems recognize a user's facial features as well as detect their emotion. Although I mostly agree with the points of view in this paper, I also want to point out that a single-interface product is not always a worse design than a multimodal one. For example, I love to use a Kindle to read; it has a simple graphical interface and nothing fancier to manipulate. This is because the task of reading requires users to concentrate; in other words, it should stay away from possible distractions, and a multimodal interface seems unnecessary here. Too much freedom to manipulate the system can sometimes be distracting. I do not expect a system to understand what I am thinking about while reading and take some action, or to turn many pages according to how hard I swipe.

Designing SpeechActs: Issues in Speech User Interfaces, by Nicole Yankelovich: This is a very interesting paper, published in 1995. It points out how useful speech recognition could be in the field of user interface design and also what the challenges would be, and I have to say it is completely right on this point. Even after more than 20 years, people are still trying to develop more intelligent speech recognizers for many different purposes. The most successful cases, I would say, are Amazon Alexa and Google Home. Amazon connected its products to its speech recognition so that it can help users shop on its website, and Google Home connects home devices and allows users to control them without even touching them. The challenges discussed in the paper are generally still true, though some are probably out of date. For example, regarding the short-term memory issue, some speech recognition systems nowadays record the user's information and preferences in case the user has the same request next time. Also, some intelligent systems no longer need to give a prompt cue all the time; they simply listen continuously and exhaustively try to work out the user's purpose from a huge query database. In general, this paper gives us some surprisingly accurate predictions and a good summary of the issues speech recognition faces. I am surprised at how insightful the authors are: some of the issues they mentioned over 20 years ago are exactly the issues speech recognition has today, and speech recognition is exactly one of the trendy UI design techniques today.

Tahereh Arabghalizi 19:28:04 9/25/2017

Designing SpeechActs: Issues in Speech User Interfaces: In this paper, the authors introduce SpeechActs, a system for managing calendar and mail via voice commands. They also present the problems they encountered, such as simulating conversation, and the design choices they made. Several users evaluated SpeechActs, and the outcomes were interesting. One of the authors' most important concerns was how to convert a casual dialogue into meaningful commands for a system. Another challenge was transforming a GUI into a speech UI. Furthermore, recognition errors caused many issues while designing the system, because producing natural speech from a system is very important for users who expect an easy and friendly user experience. The only negative point I can mention is that the authors did not explain why women were less successful at using SpeechActs.

Multimodal Interfaces: This paper is a survey of past, present, and future research on multimodal interfaces, which are defined as systems capable of accepting more than one mode of input. In the first part of the paper the author introduces the concept of multimodality and motivates the development of these interfaces. Using more than one type of input, such as a combination of speech, vision, and pen input, can be more robust; for example, a multimodal interface can use both speech recognition and eye tracking to provide a satisfactory user experience. As we know, Apple today employs a touch keyboard, optical-recognition apps, and Siri for speech recognition on the iPhone. The author also notes that researchers in cognitive science and NLP should collaborate more because of the complexity of the temporal relationships between natural modes of expression.

Xiaoting Li 20:54:43 9/25/2017

1. Designing SpeechActs: Issues in Speech User Interfaces: In this paper, the authors point out the design challenges of speech interfaces based on a user study of an experimental conversational speech system called SpeechActs. The challenges include simulating conversation, transforming GUIs into SUIs, recognition errors, and the nature of speech, and the authors give solutions to some of them. This paper was published in 1995, and as we can see, some of the challenges still exist today. When we use Siri, we still face the recognition-error challenge: if we speak too fast or with an accent, the system can hardly recognize the input correctly. In the paper, the authors point out that some verification can help reduce the error rate. Nowadays, besides improving user interface design, computer scientists also keep collecting users' input to train better models and reduce the recognition error rate. Another important point in the paper is that designers cannot simply transform GUIs into SUIs. So is there a way to combine the two types of interfaces to take advantage of both? For example, could we show the user's voice input as text while the user speaks to the system, so that users get feedback and can verify the input in time? (A small sketch of this idea follows this critique.)

2. Multimodal Interfaces: In this chapter, the author gives us a detailed introduction to multimodal interfaces, including their history and status, the advantages of using them, the "myths" related to this area and their clarification, and the techniques and architectures in use. The chapter shows that, compared with unimodal interfaces, multimodal interfaces enjoy advantages including flexible use of input modes, improved efficiency, and the ability to handle uncertainty. The examples given by the author are mostly bimodal interfaces; the author does not say much about trimodal or richer multimodal interfaces, since at that time multimodal interfaces were only beginning to model human-like sensory perception, and there is still a lot of work to do to develop them. The good takeaway for me is the author's detailed clarification of the ten myths of multimodal interaction. These myths may still be common in user interface design today, and the author's detailed discussion helps me gain a better understanding of user interface design.
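A minimal sketch of the combination asked about above, assuming the recognizer's text hypothesis is already available: echo it back to the user and execute the command only after confirmation. The recognizer output and the executor below are stand-ins, not a real speech API.

def confirm_and_run(recognized_text, execute):
    # Show the speech hypothesis as text (visual feedback for a speech input)
    # and only execute it once the user verifies it.
    print(f'I heard: "{recognized_text}"')
    answer = input("Run this command? [y/n] ").strip().lower()
    if answer == "y":
        execute(recognized_text)
    else:
        print("Cancelled; please repeat or rephrase the command.")

# Usage with a placeholder executor standing in for the real command handler:
confirm_and_run("send the draft to Alice", lambda cmd: print(f"Executing: {cmd}"))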

Spencer Gray 22:24:30 9/25/2017

In the first paper, Designing SpeechActs: Issues in Speech User Interfaces, the authors described the process of designing a speech interface and the challenges that accompanied that process. The motivation was to pinpoint the limits of then-current speech interface technology and to discover the differences between speech-interface and graphical-interface systems. This paper is significant in the HCI literature because the authors highlighted the fact that speech interfaces must be designed very differently from graphical interfaces. They stressed that the interface must resemble a conversation in order to be easy for a human to interact with. While it is tempting simply to transform an existing GUI application into a speech interface, the authors showed why this approach would not be successful. The most interesting part of this paper to me was that they used some developers as a control group in their experiments. I found this to be an interesting approach because the developers were more than just expert users of the system; this allowed the researchers to truly evaluate the performance of the speech recognition portion of the project.

In the second paper, Multimodal Interfaces, the author describes the motivations, history, designs, and future directions for multimodal interfaces. A multimodal interface is one that combines two or more user input modes; the most typical multimodal systems combine some form of touch and speech. This is an important paper in the HCI literature because multimodal interfaces were not as common at the time it was written. By now, almost all of our computer interfaces are multimodal. Understanding the history and motivations is important in determining the future of multimodal devices: we can compare where we are now with multimodal technology with where we were then, and see that many of the same cognitive science underpinnings and desires to humanize our inventions still apply. What I found most interesting in this paper were the myths that the author debunked throughout its second half; most of them seemed like valid assumptions that I would have made myself.

Kadie Clancy 0:03:44 9/26/2017

Designing SpeechActs: Issues in Speech User Interfaces: This article discusses the principles and challenges of conversational speech interface design through the results of usability testing and the iterative design of the SpeechActs system. SpeechActs is a research prototype that provides a speech-only interface to a number of applications including email, calendar, weather, and stock quotes. The authors identify several challenges with speech interfaces, as well as solutions they experimented with to overcome them. For example, recognition errors make an SUI unpredictable and frustrating; many interactions with such a system can produce these errors, such as speaking before the system is ready to listen, uttering words outside the system's vocabulary, or background noise. SpeechActs attempts to address these issues by giving implicit or explicit feedback, using a natural-language component to handle the variety of queries that all map to the same task, and letting users turn off the recognizer. Another major challenge with an SUI is the nature of speech itself, since users need a different mentality than with GUIs; for example, listening to speech is more demanding for humans than reading text. The authors address this problem by making the computer's dialogue brief but informative. They conclude that adhering to the principles of human-to-human conversation makes for a more usable SUI, rather than translating a GUI into speech. This article is important in a number of ways: first, it presents challenges and possible solutions for designers interested in creating speech interfaces; second, it illustrates the merit of iterative design and user feedback in the creation of a usable and successful interface.

Multimodal Interfaces: Multimodal systems process more than one combined user input mode together with multimedia output. These input modes can include speech, pen, touch, gaze tracking, and gesture tracking. Interfaces of this type model human-like sensory perception, which allows users to have more robust interactions than with conventional interfaces. Multimodal interfaces have several advantages over traditional interfaces. For example, they allow for flexibility in more complex interactions, such as those presented by constantly changing environmental conditions, and they accommodate a broader range of users, including non-native speakers and those with sensory impairments. An example of an early multimodal system is Bolt's "Put That There" interface, which allowed users to combine speech and pointing on a touchpad to move objects on a screen. Cognitive science has provided a foundation for user modeling, which is crucial to developing multimodal systems for users with varying integration patterns. User integration patterns include sequential (users finish one mode before beginning the other) and simultaneous (users temporally overlap multimodal commands). Cognitive science also gives insight into whether users are interacting unimodally or multimodally, and it provides the finding that users respond to increases in their cognitive load by shifting to multimodal interaction. Building on this foundation, multimodal interfaces will move past point-and-speak interfaces like Bolt's, which make only limited use of new input modes; future multimodal interfaces will integrate complementary modalities and accept more than the typical bimodal inputs.
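The "progressive assistance" behaviour mentioned in the critique, escalating the prompt after each consecutive rejection error instead of repeating the same one, can be sketched in a few lines. This is only an illustrative sketch with invented prompt wording, not the SpeechActs implementation.

PROMPTS = [
    "Sorry?",                                                                # first rejection: brief
    "Sorry, I didn't catch that. Please repeat.",                            # second: more explicit
    "Still no luck. Try a short command such as 'read the first message'.",  # third: give an example
    "You can also press 1 on the keypad to switch off speech input.",        # last resort: offer an escape
]

def progressive_assistance(rejection_count):
    # Return an increasingly helpful prompt as consecutive rejection errors pile up.
    index = min(rejection_count, len(PROMPTS)) - 1
    return PROMPTS[max(index, 0)]

for n in range(1, 6):
    print(n, progressive_assistance(n))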

Yuhuan Jiang 0:13:57 9/26/2017

Paper Critiques for 09/25/2017

== Designing SpeechActs: Issues in Speech User Interfaces ==

This paper discusses the challenges and issues in speech user interfaces (SUIs) by walking through the issues the authors encountered in designing a system with an SUI named SpeechActs. Simulating conversation is the first challenge the authors discuss; in SpeechActs it is addressed by avoiding explicit prompts for input as much as possible, which creates a more natural dialogue feel. However, there were still issues, such as it being unclear whose turn it was to speak. The next challenge is transforming a GUI into an SUI. Some interesting user responses are used to support the argument that we cannot create an SUI out of a GUI. For example, the direct analog of a dialog box (e.g., "Are you sure you want to send this?" with YES and NO buttons) is a question, "Are you sure you want to send it?", which expects a "yes" or "no" answer. However, users are not compliant: they are often confused by such questions, or they simply ignore the question or repeat the sending request. To relate this to today's technology, take iOS, the operating system for Apple's iPhone, as an example. When you turn on the accessibility feature named VoiceOver, the text and controls on the screen are read aloud; this is exactly what the paper argues against, since an SUI should not be a direct translation of the GUI. Also on iOS, however, the speech-based virtual assistant is an excellent example of an SUI: it carries out tasks upon the user's requests made in the form of speech.

== Multimodal Interfaces ==

This paper focuses on multimodal systems, in which two or more combined user input modes are processed in the same system. The author begins by discussing the need for multimodal systems, namely the need for more transparent, flexible, efficient, and powerfully expressive systems. The author also gives an account of multimodal user interfaces from a cognitive science perspective, answering questions such as when the user will use an interface multimodally, what individual differences exist in using a multimodal interface, and whether there is redundancy in multimodal inputs. The paper goes on to discuss the basic architectures and techniques used to design multimodal systems, including feature-level and semantic-level approaches, and mentions multi-agent architectures such as the Open Agent Architecture and the Adaptive Agent Architecture as infrastructures for building multimodal systems. The downside of the paper is its structure: many sections have a large number of lengthy paragraphs, which makes the paper harder to read.
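As a rough illustration of the multi-agent style mentioned above, here is a simplified, hypothetical publish/subscribe facilitator through which independent recognizer agents could deliver their results to an integration agent. It shows only the general pattern; it is not the Open Agent Architecture's actual API.

from collections import defaultdict

class Facilitator:
    # Routes messages between loosely coupled agents by topic.
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self.subscribers[topic]:
            handler(payload)

facilitator = Facilitator()

# An integration agent listens for the outputs of both recognizers.
def integration_agent(payload):
    print("integration agent received:", payload)

facilitator.subscribe("speech.result", integration_agent)
facilitator.subscribe("gesture.result", integration_agent)

# Recognizer agents publish their hypotheses independently of one another.
facilitator.publish("speech.result", {"text": "put that there", "conf": 0.7})
facilitator.publish("gesture.result", {"target": "map location (42, 17)", "conf": 0.9})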

Charles Smith 0:52:38 9/26/2017

On: Designing SpeechActs. The authors of this paper describe the implementation of Sun's speech-based interface, along with some of the challenges the team came across and their solutions. Right from the beginning the authors make a great point: don't design a speech interface the same way you'd design a GUI. These two kinds of interfaces are drastically different, even if they serve the same purpose. I believe we can extend this idea beyond screens and voices; for example, don't design a touch-screen interface like a mouse interface. Unfortunately, this paper tries to draw conclusions, such as about user satisfaction and speech recognition rate, from extremely small sample sizes. I believe the authors drew good ideas from this work, but with such a small pool of participants (only three per iteration) those conclusions should not have been drawn.

On: Multimodal Interfaces. This paper looks at removing the restriction of using only one input device at a time and combining devices to make a better interface. The author describes the past, present, and possible future for such devices (or the future as imagined when it was written). This idea at first seemed really novel, until I realized I already own a device that does this and carry it around every day: my smartphone, while not multimodal in every function, can interpret both speech and typing, and does so frequently. Through the included virtual assistant, I make use of these features every day, which shows the importance of the ideas laid out in this paper. The paper also draws a conclusion similar to the first one: what works in a GUI won't always work in other situations. The interface should be designed around the tools at hand, not around other commonly used tools.

Mehrnoosh Raoufi 2:16:43 9/26/2017

Designing SpeechActs: Issues in Speech User Interfaces: This paper presents SpeechActs, a prototype that implements a conversational system. The authors explain the procedure for developing and evaluating the system and how they tried to improve it based on the results of their experiments. In their user study of the first version of SpeechActs, they found that the system was too slow and gave so much feedback that users felt uncomfortable with it. Their study also reveals that error rates do not correlate directly with satisfaction; it is users' expectations of the conversation that influence their satisfaction. The authors then describe the design challenges of simulating conversation. One challenge is transforming a GUI into an SUI, which cannot be done by simply translating the GUI options into voice; the study shows that such direct translation does not work. Another challenge is vocabulary, which again cannot simply be carried over from the GUI. Information organization is another issue: for example, users are used to seeing their newest messages numbered first rather than last, so blindly copying the GUI's numbering of emails in order of arrival does not work. Controlling information flow is a further challenge; in a GUI it is handled by pop-up dialogs, while in conversational speech it is difficult to manage. Another challenging issue is recognition errors, which happen when the recognizer fails to detect the command or to respond consistently, for example when the user starts speaking before the machine is ready to listen or when accent gets in the way. There are also some intrinsic limitations, such as the lack of visual feedback and persistence in a speech-only interface. The paper presents no single clear solution for these issues; however, the authors suggest not translating directly from the GUI, trying to meet users' conversational expectations, and keeping the speech interface as brief and informative as possible.

Multimodal Interfaces: This paper introduces multimodal systems, i.e., systems that process two or more combined user inputs simultaneously as their interface. These inputs include speech, pen, touch, gaze, manual gestures, and so on. The paper covers the history of multimodal systems as well as their advantages, their path of evolution, and their current status. The author states that these systems are more robust and allow easier fault detection and correction because they have different sources of user input data; on the other hand, multimodal systems require more memory and computational power. Such powerful systems allow a single device to be used in various environments, which is a significant advantage. The author also provides information about methods for designing multimodal systems; their design is based on cognitive science and high-fidelity automatic simulations. Later, the author discusses the underpinnings that cognitive science provides for the next generation of these systems, arguing that cognitive science has dispelled the myths about multimodal systems and rectified previous misconceptions with empirical evidence. The topic discussed in this paper is a growing trend in today's technology: as we see, the number of system inputs and the variety of interfaces are increasing, and this is becoming popular especially in gaming consoles, with the advent of the Kinect as a fitting example.

Ronian Zhang 7:22:19 9/26/2017

Designing SpeechActs: Issues in Speech User Interfaces: This paper explores the challenges of speech-only interfaces by designing the SpeechActs prototype and testing the software with potential users. The conclusion is that an SUI is completely different from a GUI and should not simply copy the design of the latter. The software has many interesting features, and it is obvious that those small optimizations could improve performance hugely. Even though it is beyond the scope of the paper, I believe that by incorporating visual data (say, using a camera to capture facial expressions or gestures), the performance and usability could improve a lot. The best and most efficient way to solve the challenges mentioned in the paper might lie in combining other input information or providing more information (perhaps adding more hints on the input keypad). The testing is somewhat inaccurate, because the participants had a "quick reference card" (a cheat sheet); it is bad design if one has to look up guidance, and it is impractical for most of today's scenarios (automated phone services). As discussed in the paper, humans have their own limitations with audio assistants (they demand more cognitive capacity and skill, and written information is absorbed much better), so I doubt whether further study is worth the effort (Siri is smart enough, with good feedback and recognition results, but people still barely use it, perhaps only to add events to the calendar).

Multimodal Interfaces: This paper focuses on multimodal interfaces, which can be defined as systems that combine two or more user input methods. The paper is innovative, and it addresses exactly the problem in the former paper: nowadays, focusing on improving the performance of a single input is a dead end, while by combining multiple inputs the performance can be improved more easily. Multimodality accommodates a broader range of users, helps users shift quickly among different environmental conditions, improves efficiency, and handles errors better. Pens have only recently entered our everyday life: they are used as input for tablets and are more suitable for note-taking, but I also feel that combining pens and gestures would be much more useful and convenient (on the iPad Pro, if gestures were combined with the pen, switching between pen functions could be more efficient and traditional non-pen functions would remain ready to use). Speech plus pointing is the most natural way to express the user's intent, and it also accommodates more conditions. The paper also highlights an easy way to develop multimodal systems: high-fidelity simulation. Even though there are serious problems in dealing with simultaneous events, synchronization, and cultural differences, I still think it is promising. Future work should be done (even if it is not the top priority) on learning individual differences (both differences in the pattern of the same input method and differences in how input methods are chosen) if these interfaces are to be more widely accepted and applied.

Ruochen Liu 8:31:41 9/26/2017

1. Designing SpeechActs: Issues in Speech User Interfaces: In this paper, the authors introduce the SpeechActs system and compare speech user interfaces with graphical user interfaces. Several challenges in speech user interfaces are identified, and strategies for meeting those challenges in a speech-only environment are presented. One of the biggest challenges is transforming GUIs into SUIs. At the beginning, the SpeechActs project tried to transform existing graphical interfaces into SUI designs, but the user studies clearly showed that GUI conventions do not transfer successfully to a speech-only environment. There were challenges in vocabulary, information organization, and information flow. For example, the study showed that users are not in the habit of using the vocabulary of the graphical interface in their work-related conversations, so the vocabulary used in the GUI did not transfer well to the SUI. To fix this problem, the researchers tried to support vocabulary and sentence structures in keeping with the users' conversational conventions rather than with the words and sentence structures used in the corresponding graphical interface. Almost every lock has its own key. In conclusion, the main point of this paper is that the design of a speech user interface must be a separate effort that involves studying human-human conversations in the application area.

2. Multimodal Interfaces: This paper is a thorough introduction to multimodal interfaces, from what a multimodal interface is to the meaning, history, current status, methods, basic architectures, processing techniques, and main future directions of the field. Readers can get all kinds of useful information and a good understanding of multimodal interfaces from this paper. When studying an emerging technology, it is always important to trace its origin; by doing so, we can learn useful lessons and perhaps think more freely about how to solve the problems we face today. At the origin of multimodal interfaces, the goal of supporting more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction made the study of how to design them interesting and promising, and multimodal interfaces had the potential to expand computing to more challenging applications. Compared with unimodal recognition systems involving a single recognition-based technology, multimodal interfaces can also be more robust and stable. The differences between multimodal interfaces and graphical user interfaces are also worth attention: unlike a graphical user interface, a multimodal interface can process continuous and simultaneous input from parallel incoming streams, and because multimodal systems process input modes using recognition-based technologies, they must handle uncertainty and entail probabilistic processing methods. Recognition-based processing also means larger computational and memory requirements.

Amanda Crawford 8:38:58 9/26/2017

• Designing SpeechActs: Issues in Speech User Interfaces, Nicole Yankelovich, Gina-Anne Levow, Matt Marx, ACM CHI 1995, pp. 369-376. Designing SpeechActs: Issues in Speech User Interfaces gives us a case study that prefigures today's virtual assistants such as Windows Cortana. Yankelovich, Levow, and Marx take an iterative design approach to prototyping and testing their tool. Through these phases they were able to capitalize on quick feedback to identify improvements and limitations. One of their limitations was trying to create a standard system able to recognize human speech that is highly diverse and sometimes nondeterministic. Although this limitation stemmed from their use of a third-party recognizer that they could not control, they chose to focus on identifying ways to maximize error avoidance and tolerance.

• Multimodal Interfaces, Sharon Oviatt, in The Human-Computer Interaction Handbook, A. Sears and J. Jacko, eds., Lawrence Erlbaum, 2003, pp. 286-304. In Sharon Oviatt's survey chapter on multimodal interfaces, she gives us a brief and organized overview of what multimodal systems are, their evolution and history, and the underlying goals of further research initiatives. The benefit of multimodal systems is that humans can communicate more powerfully and naturally with systems that go beyond unimodal, mouse-and-keyboard-restricted input. By allowing the user to communicate and clarify concepts that may be inexpressible via a single input device, a multimodal system may promote a friendlier error environment through error avoidance and tolerance. A common aim in building such systems is to simulate the design with a high-fidelity prototype, and Oviatt also provides a generic development cycle in which a designer may prototype and test the system. She evaluates the different fusion processes that allow distinct input modes to interact with each other. This chapter is important if one wants to understand the foundations of building a multimodal system; it also helps one identify further improvements in thinking about how to design an interface that reflects how humans communicate and perceive.

Akhil Yendluri 8:57:40 9/26/2017

Designing SpeechActs: Issues in Speech User Interfaces. The authors explain the design of, and the problems related to, the development of a speech user interface. The authors and their team developed a speech user interface called the SpeechActs system. Their aim was to identify principles and challenges in conversational interface design and to find new avenues of research. The SpeechActs system combines mail, calendar, weather, and stock quotes. Some of the main challenges users faced were the slow pace of interaction and low recognition rates, especially for women. Simulating the role of speaker/listener convincingly and creating meaningful sub-dialogues were among the design challenges, along with prosody, pacing, vocabulary, information organization, information flow, recognition errors, substitution errors, insertion errors, lack of visual feedback, speed and persistence, and ambiguous silence. Some of these challenges have been mitigated to some extent by today's technology; recognition rates have become better, and new products such as Amazon's Alexa have implemented speech interfaces well. But making such products scale across multiple languages is still hard, and that is one reason a GUI can be better than an SUI. Moreover, in today's world, accessing the internet from mobile devices even in remote locations has become a breeze, so there is less need for an SUI.

Multimodal Interfaces. A multimodal interface is a system that can process two or more user input modes simultaneously, combining user input with recognition-based processing. The paper also explains the advantages of this model over the usual interfaces. Using multiple modes of input provides ease of use: the user can type on a keyboard and then switch to a stylus. Multimodal interfaces have improved substantially; take the case of the Microsoft Surface. Error recovery is also one of the major factors in multimodal interfaces. This paper helps us understand past and existing technologies but says little about future multimodal interfaces.