- Agile teams need quantifiable measures of application features and user experience, in addition to qualitative user feedback, to determine feature effectiveness.
- Google Lighthouse profiling and the Google Core Web Vitals measures are great at capturing user experience and application performance at a point in time. Still, developer assumptions of user behaviour can limit their effectiveness.
- Synthetic Monitoring is an excellent tool for examining the user experience at regular intervals, but development team assumptions still skew it.
- Real User Monitoring, or RUM, can be used to determine feature effectiveness and track errors based on real user behaviour, while maintaining anonymity, for the most part.
- Combining application monitoring for backend services with RUM gives a full indication of the user experience, including tracing features through the layers of your application.
One of the most satisfying aspects of software development is the joy of knowing that the software we use is helping people. It drives many of us in our daily pursuits of working software and rewards the hours of frustrating debugging and error chasing that can leave us laying our heads on the keyboard in defeat.
Conversely, discovering that the feature you poured your heart and soul into is not used, or is no longer used, can be a deafening blow that knocks away the prior triumph. Sadly, it is an experience that I have had in my career.
Determining application user experience and feature effectiveness can be a challenging task. Adding Real User Monitoring, or RUM, and Application Performance Monitoring, or APM, to my personal projects has given me great insight into the power that these techniques would have given me in my prior developer life.
Here I reflect on the challenges of determining user experience and feature effectiveness, and how, if I could turn back the clock, I would use modern techniques such as RUM and APM to determine the true effectiveness of features.
Specifically, I will refer to stories from my time in banking to show which measures can help agile teams not only determine if features are being used, but also diagnose other common issues.
Why Is Usage Capture So Hard?
Before joining Elastic as a developer advocate, I spent over 10 years working for a large global investment bank, producing various dashboards and applications that often dealt with high volumes of historical or live data. I fondly remember moving into a new team and adopting Scrum, after years of waterfall development and tight implementation deadlines pushed out by late-running analysis and design phases. I was part of a team with a prominent director at the helm who embraced Scrum and taught us the key practices and ceremonies. It was exhilarating to deliver features every few weeks and receive positive feedback on features from happy users in our regular demos.
This joyful time, coupled with a lack of measures, is often where many teams start to struggle. The feature factory mindset began to creep in, and the delivery rate started to resemble a continuous sprint more than the marathon you truly need to maintain regular delivery momentum. We found that with a lack of metrics it was often difficult to tell objectively if the new features were indeed being used after being showcased to users, and what their experience was like. Grassroots agility often involves kicking metrics definition and capture into the long grass, as the world of KPIs, OKRs and general metrics can seem scary and confusing. Indeed, it was only later, when I connected with the firm’s enterprise agile transformation and learned more about metrics and KPIs from our agile coach, that I realised that metrics would have helped us identify user experience woes much earlier.
The Limitations of Application Profiling
After around one year, we found that some of our features were not being used, and that performance issues were starting to deter some analysts from using other vital features. The news came through the grapevine via a prominent leader, prompting us to look for the cause.
A colleague used the built-in profiling tools within Google Chrome to identify the root cause of our poor performance: a high number of repaint events delaying the loading of the content. An example of the view he would have been presented with is depicted in the right-hand panel below.
This discovery also led us to look for other measures of user experience that we could capture more regularly. Google Lighthouse is another tool present in the browser. It can also be invoked via scripts, allowing it to be integrated into your Continuous Integration, or CI, pipeline. It captures web performance measures for your application, with results similar to the right-hand panel in the following screenshot.
Experimenting with Google Lighthouse was my introduction to Google Core Web Vitals, and the potential to measure the user experience and win back the hearts and minds of our users.
Google Core Web Vitals
Google Core Web Vitals are a series of metrics announced by Google in May 2020 that can be used to evaluate the user experience of websites. The core web vitals, a subset of their metrics being put forward as a best practice for determining user experience, were included as criteria for search rankings in June 2021. Although their inclusion in rankings will be less of a concern when building applications for internal audiences, as I was in the bank, for external audiences this will be an important consideration.
These measures are not a silver bullet for the performance problems I outlined. However, they would have given us an early warning of the repaint issues and the delays experienced by users. Although several metrics are of interest, three main measures relate to key themes of user experience:
- Initial loading time for users can be evaluated using Largest Contentful Paint, or LCP. This denotes the time taken for the largest portion of the page to load, normally the main image or text block. In the case of my records application profiled above, this would be the large image on the home screen. Ideally, your pages should have an LCP of 2.5 seconds or less.
- Responsiveness to initial user activity such as a button or link click is measured by FID, or First Input Delay. We have all experienced the frustration of receiving no response to our click events on slow websites. The time from when a user performs an action to when the browser processes the corresponding event should ideally be under 100 milliseconds.
- The stability of the elements on the page is denoted by Cumulative Layout Shift, or CLS, which is based on the largest burst of layout movements. This measure might seem less obvious, but it is important to minimise the movement of elements to prevent user error when interacting with the screen. No one wants to click the wrong item and then frantically try to undo the action! For a visually stable web application, the CLS score should be under 0.1.
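The targets above can be turned into a simple rating check. The following is a minimal sketch rather than an official implementation: the "good" limits are the ones quoted in the list, while the "poor" boundaries (4 seconds, 300 milliseconds and 0.25) come from Google's published guidance rather than from my own experience. The function and object names are purely illustrative.

```javascript
// Thresholds for the three Core Web Vitals discussed above.
// A value at or below `good` rates as good; above `poor` rates as poor.
const THRESHOLDS = {
  lcp: { good: 2500, poor: 4000 }, // milliseconds
  fid: { good: 100, poor: 300 },   // milliseconds
  cls: { good: 0.1, poor: 0.25 },  // unitless score
};

// Rate a single measurement as "good", "needs improvement" or "poor".
function rateMetric(name, value) {
  const limits = THRESHOLDS[name];
  if (!limits) {
    throw new Error(`Unknown metric: ${name}`);
  }
  if (value <= limits.good) return "good";
  if (value <= limits.poor) return "needs improvement";
  return "poor";
}

// Rate a set of measurements, e.g. { lcp: 3100, fid: 80, cls: 0.3 }.
function rateVitals(measurements) {
  const ratings = {};
  for (const [name, value] of Object.entries(measurements)) {
    ratings[name] = rateMetric(name, value);
  }
  return ratings;
}

// Illustrative values only, not measurements from a real application.
console.log(rateVitals({ lcp: 3100, fid: 80, cls: 0.3 }));
```

A check like this is only as useful as the measurements fed into it, which is where the tooling discussed in the rest of this article comes in.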
Sadly, using these metrics in my prior life was limited to adding meaningful error handling in my code and occasionally running the Lighthouse tools to see how the metrics were looking. Now I would look at how to integrate Lighthouse into our CI pipeline instead, via the command line tool or even a plugin.
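As a rough sketch of what such a CI check might look like, the function below compares a parsed Lighthouse JSON report against a set of budgets. It assumes the report was produced by the Lighthouse command line tool with JSON output, and reads the numericValue field of each audit. The budget values and function name are my own illustrative choices. Total Blocking Time stands in for FID here, since Lighthouse runs without real user input.

```javascript
// Budgets for lab metrics in a Lighthouse JSON report. The values are
// illustrative assumptions; tune them for your own application.
const BUDGETS = {
  "largest-contentful-paint": 2500, // milliseconds
  "cumulative-layout-shift": 0.1,   // unitless score
  "total-blocking-time": 200,       // milliseconds
};

// Return a list of human-readable budget violations for a parsed report.
function findBudgetViolations(report, budgets = BUDGETS) {
  const violations = [];
  for (const [auditId, limit] of Object.entries(budgets)) {
    const audit = report.audits && report.audits[auditId];
    if (!audit || typeof audit.numericValue !== "number") {
      violations.push(`${auditId}: missing from report`);
    } else if (audit.numericValue > limit) {
      violations.push(`${auditId}: ${audit.numericValue} exceeds budget ${limit}`);
    }
  }
  return violations;
}

// Trimmed-down stand-in for a parsed Lighthouse JSON report.
const sampleReport = {
  audits: {
    "largest-contentful-paint": { numericValue: 3200 },
    "cumulative-layout-shift": { numericValue: 0.02 },
    "total-blocking-time": { numericValue: 150 },
  },
};

const violations = findBudgetViolations(sampleReport);
violations.forEach((violation) => console.log(violation));
// In a CI step, a non-empty list would typically fail the build,
// for example by setting process.exitCode = 1.
```

Keeping the budgets in one object makes it easy to tighten them gradually as performance improves, rather than failing every build on day one.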
We decided to focus on two other areas. The first was using end to end testing, or e2e, to execute the user workflow prior to production and identify potential issues earlier. Initially, we used Protractor for the automation as we were building our web frontends using Angular and another engineer had experience using it. Later I used Cypress as an alternative when I became frustrated with the flakiness of Protractor’s retry logic and concerned that Protractor was nearing end of life.
The second was capturing basic view statistics in our application to determine if new screens were being used. At this time in the bank, there was one product in use, but it incurred a licence cost that we weren’t able to justify for an internal application.
Instead, we ended up building our own solution, where basic events triggered by code were sent from our application to a log via an intermediary service. The events were then visualised in a simple dashboard using an available data visualisation tool. An example architecture showing how this could be achieved is presented below.
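To give a flavour of this kind of home-grown tracking, here is a minimal sketch. It is not the code we wrote at the bank: the event fields, function names and screen paths are all hypothetical, and the transport is injected so that in a browser it could wrap a call to whatever intermediary service you have available.

```javascript
// Build a basic view event. All field names here are hypothetical.
function buildViewEvent(screen, now = new Date()) {
  return {
    type: "view",
    screen,
    timestamp: now.toISOString(),
  };
}

// Create a tracker that forwards events through an injected `send` function.
// In a browser, `send` could wrap fetch() or navigator.sendBeacon() pointed
// at the intermediary service; the endpoint itself is an assumption.
function createTracker(send) {
  return {
    trackView(screen) {
      const event = buildViewEvent(screen);
      send(event);
      return event;
    },
  };
}

// Count views per screen, as a simple dashboard might.
function countViewsByScreen(events) {
  const counts = {};
  for (const event of events) {
    counts[event.screen] = (counts[event.screen] || 0) + 1;
  }
  return counts;
}

// Usage with an in-memory transport standing in for the intermediary service.
const sentEvents = [];
const tracker = createTracker((event) => sentEvents.push(event));
tracker.trackView("/trades/summary");
tracker.trackView("/reports/quarter-end");
tracker.trackView("/trades/summary");

console.log(countViewsByScreen(sentEvents));
```

Note that the event deliberately carries no user identifier, and that nothing here captures errors or device information, which is exactly the limitation described next.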
I can’t disagree with the sentiment that the wheel was reinvented in this case. Building your own solution means also maintaining your own solution, and definitely means you carry responsibility for hosting it as well. This solution was also limited as errors and device information were not captured, giving us a limited view of the user experience.
It also meant the solution didn’t integrate with monitoring tools used by application support teams. In this case, application support was handled by another team who used other tools such as AppDynamics to identify key spikes and traces. Not only are those groups interested in general availability, but usage behaviour patterns can also be a good indication of when to scale for regular busy periods.
Limitations of Synthetic Monitoring
We typically think of monitoring as something limited to the field of Observability, embraced by support teams, DevOps practitioners and SREs to assess the availability of our applications. However, as my story outlines, the developer and production sides often remain fractured.
Those automated tests we wrote should be executable at all stages, from development, through verification in CI pipelines and testing environments, to final production deployment. Periodic execution of such scripts within Observability tooling is known as Synthetic Monitoring. The challenge is ensuring you use a common automation framework that is compatible with your synthetic monitoring tool. So with my new experience, using Playwright rather than Cypress would allow me to run the same suite periodically against production using Elastic Synthetic Monitor, as well as against my local changes and through my CI verification steps.
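One way to keep a suite portable is to write each user journey as a plain function against the Playwright page object, so that whichever runner supplies the page can reuse it. The sketch below assumes only the goto, click and textContent methods of the Playwright Page API. The URL path, selectors and expected heading are hypothetical, and a small fake page stands in for a real browser so the idea can be shown without any dependencies.

```javascript
// A user journey written against the subset of the Playwright Page API it
// needs: goto(), click() and textContent(). The URL path and selectors are
// hypothetical placeholders.
async function viewReportJourney(page, baseUrl) {
  await page.goto(`${baseUrl}/reports`);
  await page.click("#load-report");
  const heading = await page.textContent("h1");
  if (!heading || !heading.includes("Report")) {
    throw new Error(`Unexpected heading: ${heading}`);
  }
  return heading;
}

// Minimal fake page so the journey can be exercised without a browser.
function createFakePage() {
  const calls = [];
  return {
    calls,
    async goto(url) { calls.push(["goto", url]); },
    async click(selector) { calls.push(["click", selector]); },
    async textContent(selector) {
      calls.push(["textContent", selector]);
      return "Quarterly Report";
    },
  };
}

const fakePage = createFakePage();
const journeyResult = viewReportJourney(fakePage, "http://localhost:3000");
journeyResult.then((heading) => console.log(`Journey passed: ${heading}`));
```

In a real setup the page would come from the test runner or synthetic monitoring tool, and the base URL would differ between local, CI and production runs.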
I am a great proponent of e2e testing, and have always desired to run these on production in my time building software. To be able to see breakages early in production is a gift that would have helped us detect such issues prior to being called by users.
While there are great advantages to seeing whether the steps performed against your application succeed, users may not always follow the path you expect. We also require insights into how, and if, they are using our application, not just confirmation that the way we expect them to use it is functioning as expected.
Getting Started With Front to Back Instrumentation
Real User Monitoring, or RUM, is a subset of Application Performance Monitoring, or APM, that allows us to gather useful information about our live running applications without the need to trigger custom API endpoints. RUM specifically refers to the ability to monitor the behaviour of users as they use your web application. Engineers can easily gather anonymised usage statistics, as well as errors thrown in the layers under monitoring. These can also be combined with traces from your backend services using either an APM agent or OpenTelemetry.
Many products are able to provide these capabilities, and open source options such as BoomerangJS also exist. They all work in very similar ways. To illustrate the setup we shall explain the concepts of instrumentation using Elastic RUM and APM agents feeding into Elastic APM via the architecture depicted below.
UI RUM Instrumentation
To get started, install the @elastic/apm-rum package via a package manager such as npm. You will need to provide a name for your service, its version and the URL of your APM server endpoint to start sending messages, as you can see from the below code snippet.
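A minimal sketch of such a configuration follows. The option names (serviceName, serviceVersion, serverUrl and distributedTracingOrigins) are those of the Elastic RUM agent, but every value is a placeholder, and the buildRumConfig helper is my own illustrative wrapper rather than part of the agent. The call that hands the options to the agent is shown in a comment so the sketch runs without the package installed.

```javascript
// Build the options for the Elastic RUM agent. Every value passed in below
// is a placeholder assumption, and this helper is illustrative only.
function buildRumConfig({ serviceName, serviceVersion, serverUrl, backendOrigins = [] }) {
  // This sketch insists on a service name and server URL, following the text above.
  if (!serviceName || !serverUrl) {
    throw new Error("serviceName and serverUrl are required");
  }
  return {
    serviceName,
    serviceVersion,
    serverUrl,
    // Origins of backend services that should receive trace headers.
    distributedTracingOrigins: backendOrigins,
  };
}

const rumConfig = buildRumConfig({
  serviceName: "my-react-ui",                // hypothetical service name
  serviceVersion: "1.0.0",                   // hypothetical version
  serverUrl: "https://apm.example.com",      // hypothetical APM server URL
  backendOrigins: ["http://localhost:8080"], // hypothetical backend origin
});

console.log(rumConfig);

// In the application entry point the options would be passed to the agent:
//   import { init as initApm } from "@elastic/apm-rum";
//   const apm = initApm(rumConfig);
```

The agent should be initialised as early as possible in the page lifecycle so that the initial page load is captured.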
Backend Service Instrumentation
Providing instrumentation of your backend services as well as frontend with RUM gives you full front to back tracing of the behaviour of your application. For the backend service instrumentation, there are two approaches that can be taken, which are discussed in the subsequent sections.
Either option allows you to capture front to back tracing across your services and get a more holistic idea of the duration of activities across related services, as shown in the below trace screenshot from my sample application. In my case, the React UI traces are highlighted in green, and the backend controller duration is in blue.
Option 1: Elastic APM Agent
Services in Java and other languages can be instrumented with a backend Application Performance Monitoring, or APM, agent. For this to work, as per the above code example, you need to include the backend server origin in the distributedTracingOrigins option to prevent blockage by CORS.
The agent can be instrumented either via automatic attachment to the running JVM process, programmatic setup using the apm-agent-attach dependency, or manual setup.
The agent attachment step is included in your main application code, as illustrated in the below sample Spring Boot application. Ensure you have the apm-agent-attach dependency installed in your Java application.
Option 2: OpenTelemetry
Using the OpenTelemetry Java client instead of the APM agent gives you the advantage of portable tracing across many Observability providers. To use this option you need to provide a similar configuration.
The below example shows how to integrate the same APM settings using the automatic OpenTelemetry instrumentation approach for Java, via the -javaagent flag and the OpenTelemetry Java agent. Just like with the APM option, you still need to include the backend server origin in the distributedTracingOrigins option in the UI RUM configuration to prevent blockage by CORS. Ensure you pick either the APM or OpenTelemetry approach for instrumenting your backend services to prevent conflict.
Once the traces are set up and coming through from your services via either Elastic APM or OpenTelemetry, it’s possible to leverage the data to gain insights into how your users are actually using your application, as well as any errors that they encounter along the way.
Diagnosing Common User Experience Pitfalls
Now that I have been using both RUM and APM within my own prototype application, I see how they could have been used to gather ongoing metrics on the experience of my users, supplementing the qualitative feedback we would receive in sprint reviews or through unexpected performance complaints, just like in my earlier story.
RUM dashboards, such as the Elastic User Experience dashboard shown below, showcase a series of key measures based on user interactions with your monitored application. These metrics can be evaluated over time to answer not only the simple question of whether users are using it at all, but also whether they are having a positive experience.
There are four elements that could help me with assessing the challenges in not only my prior story but other instances in my career within banking technology, denoted by the circled numbers:
- Google Core Web Vitals measures to determine whether the overall user experience is within the recommended range. In addition to the three main metrics discussed in the prior section, other related measures can be used to scrutinise the user experience and responsiveness of our application.
- Total page views over time to answer the simple question of whether users are actually using these features. Being able to see the number of accesses of distinct URLs over time, and related breakdowns by information such as browser or operating system, can help identify patterns of use. Thinking with my former finance hat on, advance knowledge that particular screens are used at key reporting points such as quarter-end or year-end, or even during bank holidays, can help you anticipate the need to scale at key peaks, as well as adjust the frequency of metric reporting to consider this usage.
- Average page load duration by region is great for identifying where your application is being used. Although many internal banking applications may need to account for regional nuances such as calendars and regulatory and legal requirements, as software developers we want to identify any areas of the world where users are experiencing poor load times. In those markets, we can then make strategic decisions on hosting or scaling if that slowness is leading to operational challenges or user dissatisfaction. However, this information does need to be used carefully when considering anonymity. In my prior banking life, I knew the individuals performing processes on regional desks, and I could potentially have tied usage back to them. Such information must be used to monitor the application rather than the user.
- Browser and OS visitor breakdown. For me, working on internal applications, the operating system was less of a concern given all my users were on Windows PCs. However, browser choice has given me a few headaches in my time as a frontend engineer, particularly our old favourite Internet Explorer. I have encountered situations of errors and slowness being directly attributable to our lack of support for Internet Explorer. Gathering this information by hand can be challenging, so monitoring allows for that traffic to be focused on, and errors traced through with the corresponding user agent information, as highlighted in the below error screenshot.
Relating my newfound knowledge of Real User Monitoring to my prior working life as a technologist in banking has been a fulfilling experience. We have all had that experience looking over aging application code and designs written by others and sighing with derision at the choices made. Without a time machine or good documentation, it is difficult to understand the reason those choices were made. But as engineers, or indeed humans, we should know that people were doing their best at the time with the information they had, or the tools at their disposal. For myself and former colleagues, qualitative feedback from users and attentive listening to their challenges using our software was what we could do at the time. Now as a developer advocate, I would advocate for a mixture of metrics gathered via monitoring and qualitative feedback from users to identify issues and improve problematic features.
It is easy to become frustrated as teams when we find that a feature is not being used, and gathering information via monitoring may expose unpleasant information about the applications we build that we would prefer stayed buried. It is my view that not knowing at all is far more disappointing. Use these tools to eliminate that disappointment. But use them wisely, and remember that users may generate the results, but it is the application itself that is under scrutiny.