
ESG & Sustainability Training
Upscend Team
February 22, 2026
9 min read
This article presents a four-part, evidence-focused framework for measuring empathy outcomes from DEI branching scenarios in technical teams: pre/post surveys, in-scenario analytics, behavioral proxies, and longitudinal tracking. It provides event schemas, sample SQL, dashboards, and case studies so learning teams can implement repeatable measurement and tie empathy signals to operational metrics.
Measuring empathy outcomes is a practical challenge: empathy is internal, soft, and context-dependent, yet engineering teams need reliable signals to justify DEI investments and reduce risk. This article provides a usable, evidence-focused framework for measuring empathy outcomes from branching scenario training in technical environments, combining surveys, behavior proxies, in-scenario analytics, and longitudinal tracking.
We present templates, event schemas, sample SQL, and dashboards so learning teams and engineering managers can implement a repeatable system for measuring empathy outcomes and tie those signals to operational metrics.
DEI scenario-based training aims to shift choices, not just knowledge. In technical environments, outcomes matter when they reduce incidents, improve team collaboration, and influence product decisions. Measuring empathy outcomes gives leaders evidence to prioritize programs and refine content.
In our experience, a balanced approach that mixes self-report, in-scenario decision analytics, and behavioral proxies yields the most actionable insights. Relying on one data type (especially only self-report) creates blind spots and risks overestimating impact.
Outcomes include immediate decision changes in scenarios, short-term behavioral shifts (e.g., language in PR reviews), and long-term cultural indicators (e.g., incident resolution tone). Each requires different metrics and collection cadences for valid measurement of empathy outcomes.
Use a four-part framework: pre/post surveys, in-scenario decision analytics, behavioral proxies, and longitudinal tracking. Each part answers different validity threats and together provide convergent evidence for measuring empathy outcomes.
Start with a baseline and a clear mapping from training objectives to metrics. For example, if the objective is "increase active listening in code reviews," map that to survey items, phrase-analysis proxies, and follow-up review audits.
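As an illustration, the objective-to-metric mapping can live in a small lookup structure that the whole team can review. The names below are hypothetical, not a fixed schema:

```python
# Hypothetical mapping from one training objective to the metrics that
# evidence it across all three data sources. Illustrative names only.
OBJECTIVE_METRIC_MAP = {
    "increase active listening in code reviews": {
        "survey_items": ["item_03", "item_04"],           # self-report
        "scenario_events": ["review.comment.submitted"],  # in-scenario analytics
        "behavioral_proxies": ["pr_comment_tone_score"],  # follow-up audits
    },
}


def metrics_for(objective: str) -> dict:
    """Look up the measurement plan for a training objective."""
    return OBJECTIVE_METRIC_MAP.get(objective, {})
```

Keeping the mapping explicit makes it easy to spot objectives that have no behavioral proxy, which is where self-report blind spots hide.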
Design short, psychometrically sensible instruments. Use Likert items, scenario-based judgment questions, and forced-choice items to reduce bias. Illustrative items for measuring empathy outcomes:

- "When a teammate pushes back on my review comments, I try to understand their reasoning before responding." (Likert, 1–7)
- "A colleague's PR breaks the build the day before a release. What do you do first?" (scenario-based judgment)
Include attention checks and counterbalanced items to limit social desirability. Calculate change scores and effect sizes (Cohen's d) for measuring empathy outcomes at the cohort level.
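A minimal sketch of the cohort-level effect-size calculation, assuming matched pre/post score lists and using only the standard library:

```python
from statistics import mean, stdev


def cohens_d(pre: list[float], post: list[float]) -> float:
    """Cohen's d: mean pre-to-post change divided by the pooled standard deviation."""
    pooled_sd = ((stdev(pre) ** 2 + stdev(post) ** 2) / 2) ** 0.5
    return (mean(post) - mean(pre)) / pooled_sd
```

By the usual convention, d around 0.2 is a small effect, 0.5 medium, and 0.8 large; report the cohort n alongside it.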
Instrumentation is critical for reliable measurement of empathy outcomes. Capture in-scenario decisions, time-to-decision, and chosen rationale. Instrument event names and properties consistently so you can aggregate and join with behavioral data.
Below are example event names, properties, and a simple SQL schema to store them for analysis.
| Table | Columns |
|---|---|
| scenario_events | event_id, user_id, scenario_id, branch_id, empathy_score, response_time_ms, event_ts |
| surveys | survey_id, user_id, cohort_id, instrument_version, item_01...item_10, completed_ts, pre_post_flag |
| behavioral_logs | log_id, user_id, action_type, text_blob, repo, pr_id, timestamp |
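A `scenario_events` row might be emitted by a small helper like the one below; this is a sketch against the schema above, with the id and timestamp generated at call time:

```python
import time
import uuid


def scenario_event(user_id: str, scenario_id: str, branch_id: str,
                   empathy_score: float, response_time_ms: int) -> dict:
    """Build one scenario_events row matching the schema table above."""
    return {
        "event_id": str(uuid.uuid4()),
        "user_id": user_id,
        "scenario_id": scenario_id,
        "branch_id": branch_id,
        "empathy_score": empathy_score,
        "response_time_ms": response_time_ms,
        "event_ts": int(time.time() * 1000),  # epoch millis
    }
```

Because every row carries `user_id`, events join cleanly to `surveys` and `behavioral_logs` without any per-source mapping tables.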
Use simple SQL to compute cohort-level change. The queries below assume the schema above and show how to produce effect estimates for measuring empathy outcomes.
Example: cohort pre/post average empathy change

```sql
SELECT
  cohort_id,
  AVG(post_mean - pre_mean)  AS mean_change,
  COUNT(DISTINCT user_id)    AS n
FROM (
  SELECT
    user_id,
    cohort_id,
    MAX(CASE WHEN pre_post_flag = 'pre'
             THEN (item_01 + item_02 + item_03 + item_04 + item_05) / 5.0 END) AS pre_mean,
    MAX(CASE WHEN pre_post_flag = 'post'
             THEN (item_01 + item_02 + item_03 + item_04 + item_05) / 5.0 END) AS post_mean
  FROM surveys
  GROUP BY user_id, cohort_id
) t
WHERE pre_mean IS NOT NULL AND post_mean IS NOT NULL
GROUP BY cohort_id;
```

Restricting to users with both waves avoids a common mistake: defaulting a missing wave to zero, which deflates the cohort averages.
Another: proportion choosing high-empathy branches in scenarios

```sql
SELECT
  scenario_id,
  SUM(CASE WHEN empathy_score >= 0.75 THEN 1 ELSE 0 END)::float
    / COUNT(*) AS pct_high_empathy
FROM scenario_events
GROUP BY scenario_id;
```
Behavioral proxies give external validity to self-report measures. In engineering teams, proxies often live in code-review systems, incident tooling, and collaboration platforms. Carefully selected proxies strengthen your case when measuring empathy outcomes.
Key proxies to instrument:

- Tone of code-review comments (constructive suggestions vs. dismissive language)
- Blameless language and tags in incident retrospectives
- Clarifying questions and acknowledgments in PR discussions
- Response patterns in collaboration platforms (e.g., restating before disagreeing)
Design events to map directly to behaviors. For measuring empathy outcomes, instrument events like `review.comment.submitted` with properties: `tone_score`, `contains_suggestion`, `references_psychological_safety`.
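As a starting point, `tone_score` could be a simple keyword heuristic before graduating to a calibrated text model. The phrase lists below are invented for the sketch and would need tuning against your own review data:

```python
# Deliberately simple keyword-based tone scorer for review comments.
# A production system would use a calibrated, audited text model.
CONSTRUCTIVE = ("suggest", "consider", "what if", "could we", "thanks")
DISMISSIVE = ("obviously", "wrong", "why would you", "makes no sense")


def tone_score(comment: str) -> float:
    """Return a 0.0-1.0 score; 0.5 means no signal either way."""
    text = comment.lower()
    pos = sum(1 for phrase in CONSTRUCTIVE if phrase in text)
    neg = sum(1 for phrase in DISMISSIVE if phrase in text)
    if pos + neg == 0:
        return 0.5  # neutral default when no phrases match
    return pos / (pos + neg)
```

Whatever scorer you use, version it in the event properties so you can separate drift in behavior from drift in the model.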
Compute change rates and run difference-in-differences where possible to isolate training effects from time trends.
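The difference-in-differences estimate itself is simple arithmetic: the pre-to-post change in the trained cohort minus the change in a comparison cohort over the same window. A minimal sketch:

```python
def diff_in_diff(treat_pre: list[float], treat_post: list[float],
                 ctrl_pre: list[float], ctrl_post: list[float]) -> float:
    """DiD estimate: (treated change) minus (control change).

    The control cohort's change absorbs shared time trends, so what
    remains is attributable to the intervention (given parallel trends).
    """
    avg = lambda xs: sum(xs) / len(xs)
    return (avg(treat_post) - avg(treat_pre)) - (avg(ctrl_post) - avg(ctrl_pre))
```

The estimate is only as good as the parallel-trends assumption, so check that both cohorts moved together before the training window.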
Case Study A — Platform Engineering: We deployed branching scenarios to 120 engineers and used the full framework. Pre/post surveys showed a mean increase of 0.6 points on a 7-point empathy scale (Cohen’s d = 0.45). Behavioral proxies showed a 22% increase in constructive review comments and a 15% rise in blameless incident tags. This convergent pattern supported claims about measuring empathy outcomes.
Case Study B — Product Team: A 90-person product org ran an abbreviated intervention. Self-reports improved strongly, but code-review proxies did not change. Root cause analysis revealed low scenario fidelity to product decision contexts and high social desirability bias. The lesson: don't rely only on self-report when measuring empathy outcomes.
Self-report bias is the most common pitfall: respondents overstate change. Another is weak instrumentation: missing event properties prevents linkage across data sources. Finally, lack of baseline or comparison groups makes attribution difficult when measuring empathy outcomes.
We've found that automation and integrated analytics dramatically reduce manual administration and improve measurement quality. Organizations we've worked with have cut admin time by over 60% with integrated systems, and in several implementations Upscend enabled faster linkage between LMS and behavioral data, freeing learning teams to focus on interpretation rather than data wrangling.
Translate signals into dashboards that answer three questions: Are learners choosing higher-empathy branches? Are behaviors changing where it matters? Are changes sustained? Use cohort views, retention charts, and correlation panels to demonstrate impact.
Suggested dashboard widgets for measuring empathy outcomes:

- Cohort view: pre/post empathy change by team
- Retention chart: average empathy score by months since training
- Correlation panel: empathy signals vs. operational metrics (review tone, incident tags)

The retention chart can be driven by a query like:

```sql
SELECT cohort, month_offset, AVG(empathy_score) AS avg_empathy
FROM user_monthly_scores
GROUP BY cohort, month_offset
ORDER BY cohort, month_offset;
```
Use control groups, staggered rollouts, and difference-in-differences to strengthen causal claims. Regularly audit models and keywords used to score text to avoid drift and false positives. When operational decisions (hiring, promotions, incident evaluations) rely on these metrics, ensure governance and privacy protections are in place.
Measuring empathy outcomes from DEI branching scenarios in technical environments is feasible when you combine pre/post surveys, in-scenario analytics, behavioral proxies, and longitudinal tracking. This convergent-evidence approach reduces bias and gives engineering leaders credible, actionable metrics.
Start with a pilot: instrument scenario events, run a short pre/post survey, and select 2–3 behavioral proxies to track for 3–6 months. Use the sample event schema and SQL templates above to speed implementation, and build dashboard widgets that answer the key questions of choice, behavior, and sustainability.
For teams ready to operationalize measurement, focus on these next steps:

- Instrument scenario events and surveys with a consistent, joinable schema
- Establish a baseline and, where feasible, a comparison group before rollout
- Automate linkage between LMS, code-review, and incident data
- Put governance and privacy protections in place before metrics inform operational decisions
Call to action: If you want a pragmatic checklist and downloadable survey template adapted for engineering teams, request the pilot pack from your learning analytics team and run a 90-day pilot to validate your metrics for measuring empathy outcomes.