Intelligent User Interfaces

From CS2610 Fall 2016

slides

Readings

Reading Critiques

Alireza Samadian Zakaria 17:36:06 10/19/2016

According to the first reading, computers have undergone advances more rapid than any other technology; however, the physical shape of the computer has not changed much. The paper first discusses ubiquitous computing. The vision of ubiquitous computing was originally proposed by Weiser and consisted of computationally enhanced walls and floors. His group launched a research program whose strategy followed three tracks: "computation by the inch" focused on small devices like active badges and palm-size computers; "computation by the foot" was concerned with computationally enhanced pads of paper; and "computation by the yard" covered larger devices such as the LiveBoard. At the same time, another Xerox lab was conducting research on human-computer interaction; they designed the DigitalDesk, a smart desk with abilities such as displaying electronic content and converting real paper into electronic content with its camera. Furthermore, they took some steps toward virtual reality, such as the DataGlove, a glove augmented with sensors that report the position of the hands; this field is more popular today, since we have more computational and graphical power than was available then. The author also describes the Reactive Room, a meeting room in which the context of the environment was used in the interaction between computer and human, with the computers designed for that purpose. Tangible interfaces are another concept the author discusses; they were used in the ambientROOM, a small office augmented with many ambient displays that provide background information through means such as changing light patterns. Finally, the paper focuses on interacting with tangible computing and illuminates some of its differences from traditional computing, such as the fact that traditional interactive systems had a single center of interaction, unlike tangible computing, where there are many points of control.

The second paper says we are living between two worlds: the physical world and the cyber world. The traditional interface between these two worlds is the common rectangular screen that displays information. However, the aim of the authors' research is to move beyond the currently dominant GUI model, and they call this new type of HCI "Tangible User Interfaces". They want to achieve this by having many interfaces in the real world, consisting of ambient media and interactive surfaces, instead of a single screen; prototypes include the transBOARD, the metaDESK, and the ambientROOM. The first two prototypes sit at the center of the user's attention, while the last one focuses on background information. These systems are demonstrated thoroughly in the paper; however, some of these ideas exist in today's smart homes, and in my opinion, taken to excess they can distract the user and waste the user's time.

Haoran Zhang 14:01:19 10/20/2016

Sikuli: Using GUI Screenshots for Search and Automation: In this paper, the authors present a tool called Sikuli, which helps GUI users search and automate using screenshots. For example, if the user wants to delete a file, the user can point to the file icon and the recycle-bin icon on the screen, and Sikuli will infer that the user wants to delete the file. Beyond that, the system can use screenshots for searching: if you want to search for something, just take a screenshot, and the system will use OCR technology to recognize the words in the picture and run the search for you. I think this technology is much like Google Goggles: with Goggles, you just take a picture with your cellphone, and Google extracts the keywords and searches automatically. Of course, Sikuli can do automation, which Goggles cannot. But I think Sikuli may have a limitation compared with Google Goggles. Goggles can search a pure picture, for example, a picture of an artwork; Sikuli's system cannot extract any useful information about the artwork with OCR technology alone, so for the searching part, Google Goggles may beat Sikuli. But as I said, Sikuli can help the user do automation. This is a good feature, but I still doubt whether it is easy to use, because it seems that the user needs to do simple programming, which most users do not know how to do. In other words, if Sikuli is really useful, why is it not popular today?
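For a sense of how low the programming barrier actually is, here is a minimal sketch of the kind of Sikuli script the file-deletion example implies. This is my own illustration, not code from the paper; the .png file names are hypothetical placeholders for screenshots the user would capture (in the Sikuli editor they appear as inline thumbnails).

 # Illustrative Sikuli Script sketch (Sikuli scripts are Python-based).
 # "file_icon.png" and "trash_icon.png" are hypothetical screenshots
 # of the target file's icon and the recycle bin.

 # Delete a file by dragging its icon onto the recycle bin.
 dragDrop("file_icon.png", "trash_icon.png")

 # Equivalently, select the icon and press the Delete key.
 click("file_icon.png")
 type(Key.DELETE)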

Zhenjiang Fan 15:55:14 10/20/2016

Sikuli: Using GUI Screenshots for Search and Automation ::: Using screenshots of graphical user interfaces as an input source for searching or automation could sometimes bring unnecessary trouble: what if the image-recognition process cannot recognize the input image, or misrecognizes it? In that case, the recognizer may repeatedly take the same wrong action again and again. So I think the first thing we want to ensure is that the recognition procedure is nearly perfect; given current recognition error rates, this technique has not been utilized much so far. Despite what I mentioned above, the paper's proposal is a great idea with huge potential to become a general tool for future computing devices. The paper presents Sikuli, a visual approach to searching and automating GUI elements (Figure 1). Sikuli allows users or programmers to make direct visual reference to GUI elements. To search a documentation database for a GUI element, a user can draw a rectangle around it and take a screenshot as a query. Similarly, to automate interactions with a GUI element, a programmer can insert a screenshot directly into a script statement and specify what keyboard or mouse actions to invoke when this element is seen on the screen. Going by the paper's introduction of the tool, Sikuli is very powerful and covers almost every aspect that this kind of tool should have; in other words, it is a great prototype to follow. As the paper mentions, it has two major shortcomings: theme variations and visibility constraints. Many users prefer a personalized appearance theme with different colors, fonts, and desktop backgrounds, which may pose challenges to a screenshot search engine. And Sikuli Script operates only in the visible screen space, so it is not applicable to invisible GUI elements, such as those hidden underneath other windows, in another tab, or scrolled out of view.
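The misrecognition concern can be partly mitigated inside Sikuli Script itself, since the API lets a script check whether a pattern is visible and tighten the match threshold before acting. The sketch below is a hypothetical illustration under that assumption, not code from the paper; the image name and the 0.9 threshold are my own choices.

 # Hypothetical Sikuli Script sketch: guard an action against
 # recognition failure rather than blindly repeating it.

 # Raising the similarity threshold reduces the chance of acting
 # on a visually similar but wrong element.
 target = Pattern("save_button.png").similar(0.9)

 if exists(target, 5):   # wait up to 5 seconds for the element
     click(target)
 else:
     popup("Could not find the Save button; aborting.")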

Alireza Samadian Zakaria 12:09:29 10/26/2016

The paper is about a visual approach to search and scripting for graphical user interfaces using screenshots. Sikuli is the name of this approach, and it allows users to search a large collection of online documentation by submitting a screenshot. The system consists of three components: a screenshot search engine, a query interface for that search engine, and a user interface for annotation. The authors used SIFT to extract visual descriptors, and used OCR to extract text from the screenshot and convert it into 3-grams; these 3-grams then served as another kind of descriptor for searching. They used a within-subjects design to compare this method of searching with keyword-based querying, and they also used a dataset to compare the two search methods by drawing precision/recall curves. Furthermore, the authors describe their API for visual scripting, and some of its classes and methods are thoroughly documented. Unlike other scripting languages, this scripting approach uses a visual representation of an element instead of its name or position. There are also pairings between words and images; for example, they paired "PowerPoint" with its icon. Actions include clicking, dragging, and typing. At the end, after discussing related work on help systems and image-based interaction, the authors mention two limitations of the approach. First, different operating-system themes can lead to different screenshots, so the results might differ. Second, some elements are sometimes not visible, because of overlapping windows or the need to scroll, and Sikuli cannot find them.
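To make the 3-gram step concrete, here is a minimal sketch of how OCR output might be broken into overlapping character 3-grams for indexing; this is my own illustration of the general technique, not the paper's implementation.

 def char_3grams(text):
     """Split OCR-extracted text into overlapping character 3-grams.

     Indexing 3-grams rather than whole words makes search tolerant
     of the character-level errors OCR typically introduces.
     """
     text = text.lower()
     return [text[i:i + 3] for i in range(len(text) - 2)]

 print(char_3grams("printer"))   # ['pri', 'rin', 'int', 'nte', 'ter']
 print(char_3grams("pr1nter"))   # ['pr1', 'r1n', '1nt', 'nte', 'ter']
 # The two lists still share 'nte' and 'ter', so a 3-gram index can
 # partially match the query despite the OCR error.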

Tazin Afrin 20:57:20 10/26/2016

Critique of "Sikuli: Using GUI Screenshots for Search and Automation": In this paper, the authors present Sikuli, a visual approach to searching and automating GUIs using screenshots. With this interesting software, the user can take a screenshot of a graphical interface element and search using that screenshot; the user does not have to know the element's name. For automation of GUI interaction, Sikuli also provides a visual scripting API, demonstrated on tasks such as map navigation and bus tracking. The authors conducted a web-based user study and showed that screenshot-based searching is faster than keyword-based searching. The lack of efficient and intuitive ways to search GUI documentation motivated the authors to build Sikuli; this matters when a user has trouble using an element. Previous approaches let the user enter keywords, but it may not be obvious how to choose a suitable keyword, whereas taking a screenshot is obvious in those situations. Moreover, screenshots are accessible and universal across all kinds of applications. The authors contribute the Sikuli Search system, which enables users to search a huge collection of online documents using screenshots; it can retrieve a wide variety of dialog boxes, which was demonstrated empirically in a user study also showing that screenshots are faster than keywords. The authors further present a scripting language called Sikuli Script, which enables programmers to control GUI elements through screenshots programmatically, along with an editor interface for writing screenshot-based automation scripts.

Xiaozhong Zhang 21:43:26 10/26/2016

Sikuli: Using GUI Screenshots for Search and Automation ::: The paper presented Sikuli, a visual approach to search and automation of graphical user interfaces using screenshots. The author claimed that Sikuli allows users to take a screenshot of a GUI element (such as a toolbar button, icon, or dialog box) and query a help system using the screenshot instead of the element's name. The author further mentioned that Sikuli also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events. The author then reported a web-based user study showing that searching by screenshot is easy to learn and faster to specify than keywords. The paper also demonstrated several automation tasks suitable for visual scripting, such as map navigation and bus tracking, and showed how visual scripting can improve interactive help systems previously proposed in the literature. The paper concluded by mentioning two limitations of the approach and offering possible solutions for future work.

Keren Ye 22:01:29 10/26/2016

Sikuli: Using GUI Screenshots for Search and Automation ::: The paper presents Sikuli, a visual approach to search and automation of graphical user interfaces using screenshots. The innovation is that it allows the user to search using a screenshot, and results show that searching by screenshot is easy to learn and faster to specify than keywords. The main contributions are: 1) Sikuli Search, a screenshot-based search system; 2) a user study showing the benefits of screenshots; 3) Sikuli Script, a scripting system that enables programmers to use the API; and 4) two detailed examples of applying the technique. In the main body, the authors first introduce the design of Sikuli Search, including the system architecture, the screenshot search engine, the user interface for searching with screenshots, and the user interface for annotating screenshots; some implementation details are also covered. The authors then discuss their user study and its method, and give a performance evaluation. In the next part, Sikuli Script is introduced. Detailed algorithms and strategies are proposed, especially those handling computer-vision problems; the approach is based on the scale-invariant feature transform (SIFT). On top of SIFT features, the authors describe in detail how they construct the visual dictionary and inverted index. In the last part, the paper presents some scenarios where the technique could be applied: the first example is creating stencil-based tutorials, and the other is automating minimal graphical help. In the conclusion, the authors mention some limitations of the system, yet they remain optimistic about the idea.
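As a rough illustration of the SIFT-plus-inverted-index pipeline described above, the sketch below quantizes SIFT descriptors into a visual dictionary with k-means and indexes screenshots by the visual words they contain. This is my own sketch assuming OpenCV (opencv-python 4.4+ for SIFT) and NumPy, not the paper's implementation; the function names and the choice of k are mine.

 import cv2
 import numpy as np
 from collections import defaultdict

 sift = cv2.SIFT_create()

 def descriptors(path):
     """Extract SIFT descriptors (128-dim each) from one screenshot."""
     img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
     _, desc = sift.detectAndCompute(img, None)
     return desc if desc is not None else np.empty((0, 128), np.float32)

 def build_index(paths, k=100):
     """Cluster all descriptors into k visual words (the dictionary),
     then map each visual word to the screenshots containing it."""
     all_desc = np.vstack([descriptors(p) for p in paths]).astype(np.float32)
     criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
     _, _, centers = cv2.kmeans(all_desc, k, None, criteria, 3,
                                cv2.KMEANS_PP_CENTERS)
     index = defaultdict(set)
     for p in paths:
         d = descriptors(p)
         # Assign each descriptor to its nearest visual word.
         words = np.argmin(np.linalg.norm(
             d[:, None, :] - centers[None, :, :], axis=2), axis=1)
         for w in set(words.tolist()):
             index[w].add(p)
     return centers, index

A query screenshot would then be mapped to visual words against the same centers, and the inverted index would return every stored screenshot sharing those words, ranked by overlap.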

Debarun Das 2:36:04 10/27/2016

"Sikuli: Using GUI Screenshots for Search and Automation" ::: This paper discusses a visual approach called "Sikuli", which supports search using screenshots of GUI elements instead of their names. It initially describes the system architecture and the prototype implementation. It then describes the user study that was conducted, in which users were asked to give their subjective opinions of such an interface. The study aimed to establish two main claims: i) it is faster to specify screenshot queries than to type in keywords, and ii) the results of screenshot queries have almost the same relevance as keyword search results. The results supported both claims. However, I believe that using only 12 participants was a limitation of this user study; a higher number of participants would have made the study more convincing. The paper then discusses in detail the motivation and algorithms behind the implementation of the system. Finally, it discusses the limitations of this work and possible solutions. This paper was quite interesting because it was published toward the end of 2009, which is relatively recent (ordinary text search had already become very popular by then). The work seems somewhat similar to Google's search-by-image. However, even if it makes things simpler, it may create ambiguity when the same image can be interpreted as more than one idea. Also, I believe this approach is good only for general searches (like searching by the icon of a particular piece of software), not for more specific searches (like searching for when a particular company was founded). In those cases, I believe searching by text would be a better choice.

Anuradha Kulkarni 7:54:03 10/27/2016

Sikuli: using GUI screenshots for search and automation ---- This paper presents Sikuli, a visual approach to searching and automating GUIs using screenshots. It is a two-fold application: it allows users to search for help and documentation on GUI elements (such as a toolbar button, icon, or dialog box) through a screenshot query, and it allows users to write scripts that embed screenshots directly in order to perform automated UI interactions. Sikuli Search uses three visual features (surrounding text, visual words, and embedded text) to search across many UI manuals, websites, etc. to find help for UI elements, and it lets screenshots of UI elements serve as thumbnails for annotation. The search system was evaluated through a within-subjects user study comparing standard keyword text search for UI elements with the screenshot search provided by Sikuli. The experiments presented were sound, and the evaluation was well supported with graphs. There are two drawbacks with respect to the evaluation. First, I feel more users should have participated instead of 12 subjects, as that would give a better and more diverse analysis. Second, the evaluation of the scripting API is unclear. Overall, this is a new approach compared to traditional search, with the advantage of allowing the user to search without keywords.