Week 15 – Post-Mortem

What went well

[Scope Pivot]

"Out of scope" is almost a keyword for ETC client projects, especially proof-of-concept projects, and Real Empathy is no exception. Given an ambitious vision from the clients, the biggest challenge in the early development stage is to pivot the scope while still meeting the clients' needs.

In our case, the clients wanted a solution for their virtual standardized patient program and came with clear requirements: the system should run in Virtual Reality (VR) and use Artificial Intelligence (AI) to respond to students and evaluate the trainees' empathy level. No multiple choice, no drop-down menus, and no video.

Basically, our clients were right. The mission they brought to us is the best form of a standardized patient program in the far future, but not now. We needed to pivot the goal to fit our scope while still serving their demand. After many meetings and discussions with the clients, we understood that they want to use this project as a demonstration to raise awareness and funds from the board of Pitt Med School's Dean's Office, so that they can hire a team to develop and operate the project in the long term. We then realized that "extensibility" is the first priority, since the clients prefer to show the potential for iteration and replication.

[Early Prototype]

Early prototypes help the team demonstrate ideas and examine the design. We built several prototypes in the early stage of development.

We built a first rapid prototype to prove the idea of using 360 video for interactive storytelling. It is a game of "Paper, Scissors, Stone": according to the player's choice, it plays different clips to continue the story. As a golden spike, it helped us test the working pipeline and show the clients what a VR interactive video looks like.
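
To make the branching concrete, here is a toy sketch of that prototype's logic in Python rather than our actual Unity project; the clip names and node labels are hypothetical.

```python
# Toy sketch of the prototype's branching: each player choice maps to the next clip.
STORY = {
    "start": {
        "clip": "intro_360.mp4",
        "choices": {"paper": "paper_end", "scissors": "scissors_end", "stone": "stone_end"},
    },
    "paper_end": {"clip": "patient_reacts_paper.mp4", "choices": {}},
    "scissors_end": {"clip": "patient_reacts_scissors.mp4", "choices": {}},
    "stone_end": {"clip": "patient_reacts_stone.mp4", "choices": {}},
}

def next_node(current: str, choice: str) -> str:
    """Return the node whose clip should play next, given the player's choice."""
    return STORY[current]["choices"].get(choice, current)
```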

The second was a paper prototype. After we transformed the scripts into the first version of the conversation tree, we set up playtest sessions via Zoom and arranged a few interviews with medical students. A team member played the standardized patient and followed the conversation tree to react to the playtesters. From these inspiring playtests we learned that a well-trained medical student is more predictable than we expected. As our target users, medical students generally follow the interview structure, which a pre-set conversation tree can manage.

[Filming Rundown and Breakdown]

It is always hard to schedule time with the actors/actresses and the whole filming crew, so every filming day is valuable and tight. Because an interactive video experience demands high consistency between clips, a single mistake can render earlier footage unusable and force everyone to re-film it.

Taking the advice of faculty, we built a rundown document to make sure we covered the whole script and missed no details. To develop the rundown, we broke the filming schedule into several stages. First, we did test filming in our base (the project room), where we could review the clips immediately after recording. Then we scouted the filming environment (a clinic room) in advance, so we could rehearse the setup and filming process. Finally, we filmed every clip with our official actress, who is also one of our clients. Thanks to the two stages of testing, the final filming was efficient and effective.

[Cross-fading]

The original idea for switching clips was a jump cut. It is easy and effective, but a jump cut cannot hide the shifts between clips, and the transitions visibly tremble. We tried different ways to solve this problem, such as directing the performer's pose or stabilizing the camera, but none of them worked well, so we had to find a way out with a more advanced editing technique: cross-fading.

Because we are making an interactive video, we cannot rely on existing editing tools, so we had to build the solution in Unity ourselves. Cross-fading means the system fades out the finishing clip while fading in the following one. Applying a quick cross-fade at every switch makes the discontinuity subtle or even seamless, especially in VR. This might be related to how the human brain works: people's visual sensitivity inside a VR world is lower than on a flat screen. We assume the brain has to process so much information in VR that the cross-fade feels seamless to users in general.
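
The idea is simple enough to show in a few lines. The sketch below is in Python for readability, not our actual Unity code; it assumes hypothetical clip objects that expose an `opacity` value (0.0 to 1.0) and a `play()` method.

```python
import time

FADE_DURATION = 0.5  # seconds; a short fade keeps the switch subtle

def cross_fade(outgoing, incoming, duration=FADE_DURATION):
    """Fade the finishing clip out while fading the next clip in."""
    incoming.opacity = 0.0
    incoming.play()                          # start the next clip underneath
    start = time.monotonic()
    while True:
        t = (time.monotonic() - start) / duration
        if t >= 1.0:
            break
        outgoing.opacity = 1.0 - t           # old clip fades out...
        incoming.opacity = t                 # ...while the new clip fades in
        time.sleep(0.01)                     # stand-in for "wait for next frame"
    outgoing.opacity = 0.0
    incoming.opacity = 1.0
```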

[Playtest and Training Conversation Tree]

We are comfortable proudly claiming success here because we kept close contact with our clients and concentrated on their needs. As mentioned earlier, extensibility is the top priority. We decided to use Voiceflow as our AI chatbot tool because its graphical user interface is friendly to people with limited coding knowledge. Basically, under our system structure and by following our instructions, training the conversation trees on the website is entirely manageable for anyone.

Training the model requires data input, so playtests play an important role in the development process. Generally speaking, each iteration needs around ten playtests to examine the coverage of possible responses and to tune the conversation flow. Observation, questionnaires, and interviews are all equally important for gathering feedback. Recording the playtest sessions is also helpful: by comparing the recordings with the speech-to-text output files, we can review whether a wrong detection was caused by pronunciation, word choice, or phrasing.
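
A minimal sketch of that comparison, assuming the playtest recordings have been transcribed by hand, might look like the following; the example utterance is hypothetical.

```python
import difflib

def compare_utterance(spoken: str, detected: str) -> None:
    """Show where the speech-to-text output diverges from what was actually said."""
    spoken_words = spoken.lower().split()
    detected_words = detected.lower().split()
    matcher = difflib.SequenceMatcher(None, spoken_words, detected_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            print(f"{tag}: said {spoken_words[i1:i2]} -> detected {detected_words[j1:j2]}")

# Hypothetical playtest utterance and its (mis)detected text:
compare_utterance(
    "how long have you had the pain",
    "how long have you had the pen",
)
```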

Overall, this tuning process increases the effectiveness and efficiency of building a new story or scenario. No matter how rough the script is at the beginning, it gets back on track after several iterations of playtesting. We estimate that three to four iterations of the development cycle, probably four to six weeks, can produce a comprehensive conversation tree and interactive video experience in the Real Empathy system. Those numbers are strongly supported by our experience this semester, so the clients should be confident enough to lay out a clear development schedule for budgeting and fund-raising.

[Product Positioning]

The last pivot we made concerns product positioning. When the semester started, we thought we were providing a total solution for standardized patient programs, including training, evaluation, and behaviour change. However, we realized that we do not know how to evaluate empathy automatically with machines. If we cannot evaluate a student's performance, we cannot prove that they have changed after our training game/experience. To answer this question, we reached out to the Standardized Patient Program at Pitt Med School and learned how they do their work: instructors accompany students during the training session and take notes on a checklist. Inspired by this discovery, we realized that we should leave the empathy evaluation to instructors rather than handle it in the system.

With this understanding, we re-positioned Real Empathy as an initiator in the training process. We provide a test/exam experience in which students can show empathy and meet the baseline requirements; displaying the data we collect during the session then gives students and instructors a foundation for further discussion about performance and improvement. The clients did not expect this pivot, but they were surprised and happy with the change. Since the data platform idea was not part of their vision when they came to us, this significant lesson learned feels like a bonus from our side. As a bridge, Real Empathy connects medical students and instructors on the path from lecture learning to sandbox practice. We show a clear picture of our product positioning so the clients can persuade their financial team to invest in the idea.

What could have been better

[Directing]

A lack of directing experience can hurt the actors'/actresses' performance. At the beginning, we tried to recreate a real conversation between doctor and patient: during filming, the director acted as the doctor and had a conversation with the patient. The director (doctor) could hold the script and simply read it, but the patient could not. However, making an interactive video requires precision, and sometimes the actors or actresses had to repeat their lines again and again because of one missed detail. That wastes not only time but also energy and focus.

A better guideline in our scenario is for the director to say exactly what the actors/actresses are about to say, which guarantees nothing is missed or improvised. We tried this at the very end of filming and it worked quite well, especially when the patient is saying similar lines with only slight differences. The downside of this solution is that the delivery may become less emotional, but we thought it was a fair trade-off for more effective recording.

[AIChat Performance and Function]

Machine learning is a powerful technique for problem solving. However, tailoring a solution for a specific use requires more knowledge and time than we have within the scope of a semester-long project. Therefore, we decided to build the Real Empathy system on existing AI chatbot services. The combination of Voiceflow and IBM Watson is the minimum viable solution, not the best-performing one.

First, detection latency and overall performance are critical to maintaining immersion in an interactive video experience. The delay can come from anywhere along the pipeline, including speech detection (IBM Watson), the conversation tree (Voiceflow), and video loading (Unity). Because the pipeline spans these different platforms, debugging the latency and optimizing performance is harder. For future development, testing each stage in isolation is necessary to locate the delay and improve the pipeline.
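
One cheap way to start isolating the delay is to time each stage of a turn separately. The sketch below is in Python with placeholder functions standing in for the real Watson, Voiceflow, and Unity calls; the function names are hypothetical.

```python
import time

# Placeholder stand-ins for the real stages (hypothetical names):
def speech_to_text(audio):        # would call IBM Watson speech detection
    return "how long have you had the pain"

def query_conversation(text):     # would query the Voiceflow conversation tree
    return "clip_pain_duration"

def load_clip(clip_id):           # would load the 360 video clip in Unity
    return None

def timed(label, fn, *args):
    """Run one pipeline stage and report how long it took."""
    start = time.monotonic()
    result = fn(*args)
    print(f"{label}: {(time.monotonic() - start) * 1000:.1f} ms")
    return result

def handle_turn(audio):
    text = timed("speech detection", speech_to_text, audio)
    clip_id = timed("conversation tree", query_conversation, text)
    timed("load video", load_clip, clip_id)
```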

Second, tone and stress detection is missing. We researched and compared existing commercial-level AI chatbots and learned that none of them can handle the tone or emotion of speech input. Some academic papers try to address this issue; however, we did not have a chance to test how they would perform with our use case, due to scope and time limitations. To be honest, tone detection is still a challenging task, and none of the big tech companies has a stable solution, neither Alexa nor Siri. A long wait for the related techniques to mature should be expected. To prepare for future research, we strongly recommend adding audio recording and storage features to the Real Empathy system.

Machine learning (ML) models require large amounts of data, preferably data that is already labeled. Therefore, storing the students' audio recordings from the training process is valuable; future ML research can access them as a resource.

Third, there should be a better way to handle small talk. The current Real Empathy solution is straightforward and simple: we ran playtests, collected data through observation, and added new possible responses to the conversation tree. It is easy to extend and to understand; in other words, it is the most viable and effective solution we could have right now. With more playtests and more response handlers added in Voiceflow, the conversation tree tool, the system imitates a real conversation better and better. However, there is a sweet spot: beyond some point, adding new response handlers increases the complexity of the tree exponentially, because each new branch can multiply the paths beneath it. Higher complexity makes the conversation tree harder to manage. Pursuing a fully complete conversation tree is endless, so future designers and developers should be very careful about the trade-off between complexity and completeness.
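
Conceptually, the current approach boils down to nodes with keyword-matched response handlers and a generic fallback; this plain-Python sketch (not Voiceflow itself) shows why every new handler adds another sub-tree to maintain.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    clip: str                                        # video clip the patient plays
    handlers: dict = field(default_factory=dict)     # keyword -> next Node
    fallback: Optional["Node"] = None                # generic "could you say that again?" branch

    def respond(self, utterance: str) -> Optional["Node"]:
        """Pick the next node whose keyword appears in the student's utterance."""
        lowered = utterance.lower()
        for keyword, nxt in self.handlers.items():
            if keyword in lowered:
                return nxt
        return self.fallback                         # unmatched small talk lands here
```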

Is there a better solution? There might be. As we learned, there are existing approaches called contextual chatbots. The basic idea is that the chatbot takes a profile or article as its source material, and it can then answer almost any related question. That should be a smarter way to handle small talk while keeping the complexity under control. However, even if such a solution made the conversation tree more manageable, the complexity of the video clips would still be a challenge.
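
To illustrate the retrieval idea behind a contextual chatbot, here is a deliberately tiny sketch that matches a question against the sentences of a plain-text patient profile; real contextual chatbot services use far stronger language models, and the profile text here is made up.

```python
import re

def tokens(text: str) -> set:
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))

def best_answer(question: str, profile: str) -> str:
    """Return the profile sentence with the largest word overlap with the question."""
    sentences = [s.strip() for s in profile.split(".") if s.strip()]
    q = tokens(question)
    return max(sentences, key=lambda s: len(q & tokens(s)))

profile = (
    "I am 45 years old. I work as a bus driver. "
    "I live with my wife and two kids. The pain started three days ago."
)
print(best_answer("When did the pain start?", profile))   # -> "The pain started three days ago"
```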

[Data Collection]

As one of our main features, the data platform was added to the Real Empathy system late in the semester. It is a pivot driven by the limitations we discovered: without tone detection, we are not able to evaluate empathy automatically with machines. We changed our objective to building a platform that provides data to advisors and students for review. But we made this pivot too late, when we had already finished most of the design for the training process. Although we already quantify students' responses, such as word counts and average reaction time, we have several suggestions for the future development team on collecting monitoring data.

Store the student's training footage as audio or video files. Right now we only store it as a text file and show it on the final dashboard, but audio or video is much easier to understand when replayed. Another advantage of the virtual patient program, which definitely deserves attention, is access to micro-gestures. For example, tracking the movement of the VR headset can tell us many things, such as estimated eye contact or the frequency of nodding in acknowledgement of the patient's responses. Monitoring this otherwise invisible data is a big plus over the traditional standardized patient program. Data, as part of the value we deliver, should be considered early in the design, so that we can design interactions and content that generate data during the process.
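
As a rough illustration of the micro-gesture idea, the sketch below counts head nods from a stream of headset pitch angles; it is hypothetical (not part of the current system) and assumes pitch is recorded in degrees at a fixed rate, with the head tilting down reducing the pitch value.

```python
def count_nods(pitch_samples, threshold_deg=10.0):
    """Count downward head dips larger than `threshold_deg` as nods."""
    nods = 0
    nodding = False
    baseline = pitch_samples[0]             # assume the session starts with the head level
    for pitch in pitch_samples:
        dip = baseline - pitch              # positive when the head tilts down (assumed convention)
        if not nodding and dip > threshold_deg:
            nodding = True                  # entered a nod
            nods += 1
        elif nodding and dip < threshold_deg / 2:
            nodding = False                 # head came back up
    return nods

def nods_per_minute(pitch_samples, sample_rate_hz=30):
    """Normalize the nod count by the length of the recording."""
    minutes = len(pitch_samples) / sample_rate_hz / 60
    return count_nods(pitch_samples) / max(minutes, 1e-9)
```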