
Collect User Feedback

Collecting and logging user feedback is essential for understanding the real-world quality of your GenAI application. User feedback provides ground truth about your application's performance and helps identify areas for improvement.

Why Collect User Feedback?

User feedback provides critical insights that automated metrics cannot capture:

  1. Real-world quality signals - Understand how actual users perceive your application's outputs
  2. Continuous improvement - Identify patterns in user satisfaction to guide development priorities
  3. Training data creation - Use feedback to build high-quality evaluation datasets
  4. Quality monitoring - Track satisfaction trends over time and across different user segments
  5. Model optimization - Leverage feedback data to improve your underlying models

Types of Feedback

Different feedback mechanisms serve different purposes in understanding user satisfaction:

Feedback Type        | Description                                 | Common Use Cases
---------------------|---------------------------------------------|--------------------------------------
Binary feedback      | Simple thumbs up/down or correct/incorrect  | Quick user satisfaction signals
Numeric scores       | Ratings on a scale (e.g., 1-5 stars)        | Detailed quality assessment
Categorical feedback | Multiple choice options                     | Classifying issues or response types
Text feedback        | Free-form comments                          | Detailed user explanations

Feedback Collection Patterns

Linking Feedback to Traces

The key to effective feedback collection is connecting user feedback back to the specific interaction that generated it. This enables you to:

  • Understand which types of requests lead to positive or negative feedback
  • Analyze the relationship between application performance and user satisfaction
  • Build evaluation datasets from real user interactions

Request Tracking Approach

Use unique request identifiers to link feedback to specific traces, as sketched after the steps below:

  1. Generate unique IDs for each user interaction
  2. Associate IDs with traces using tags or metadata
  3. Collect feedback referencing the same ID
  4. Store feedback alongside trace data for analysis
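
The following is a minimal sketch of these steps. It assumes the @mlflow.trace decorator, mlflow.update_current_trace, and the MlflowClient tracing methods are available in your MLflow version; generate_answer is a hypothetical placeholder for your application logic.

import uuid

import mlflow
from mlflow import MlflowClient


@mlflow.trace
def handle_request(user_input: str) -> dict:
    # Steps 1-2: generate a unique request ID and attach it to the active trace.
    # Assumes mlflow.update_current_trace is available; adjust to your MLflow version.
    request_id = str(uuid.uuid4())
    mlflow.update_current_trace(tags={"request_id": request_id})
    answer = generate_answer(user_input)  # placeholder for your application logic
    return {"request_id": request_id, "answer": answer}


def record_feedback(request_id: str, positive: bool) -> None:
    # Steps 3-4: locate the trace for this request ID and store feedback as tags.
    client = MlflowClient()
    traces = client.search_traces(
        experiment_ids=["1"],
        filter_string=f"tags.request_id = '{request_id}'",
        max_results=1,
    )
    if traces:
        trace_id = traces[0].info.request_id
        client.set_trace_tag(trace_id, "feedback_received", "true")
        client.set_trace_tag(trace_id, "feedback_positive", str(positive).lower())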

Implementation Considerations

When implementing feedback collection:

Frontend Integration:

  • Make feedback collection frictionless (simple thumbs up/down)
  • Provide optional detailed feedback for users who want to elaborate
  • Consider contextual timing for feedback requests

Backend Storage:

  • Link feedback to trace identifiers
  • Store feedback metadata (user ID, timestamp, context)
  • Consider privacy requirements and data anonymization

Analysis Integration:

  • Query feedback alongside trace performance data
  • Track feedback trends over time
  • Identify correlations between trace characteristics and user satisfaction

Using Trace Tags for Feedback

While dedicated feedback APIs are in development, you can implement feedback collection using MLflow's trace tagging system:

Tagging Strategy

Use consistent tag naming conventions to organize feedback data; a sketch of applying them follows the list:

  • feedback_received: true/false - Whether feedback was provided
  • feedback_positive: true/false - Binary satisfaction indicator
  • feedback_rating: 1-5 - Numeric satisfaction score
  • feedback_category: helpful/unhelpful/incorrect - Categorical feedback
  • feedback_user: user_123 - User identifier (consider anonymization)
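
A short sketch of applying these conventions to a known trace, assuming MlflowClient.set_trace_tag is available in your MLflow version; all tag values are stored as strings.

from mlflow import MlflowClient


def tag_feedback(trace_id: str, rating: int, category: str, user_id: str) -> None:
    # Write the conventional feedback tags onto the trace for later querying.
    client = MlflowClient()
    client.set_trace_tag(trace_id, "feedback_received", "true")
    client.set_trace_tag(trace_id, "feedback_positive", "true" if rating >= 4 else "false")
    client.set_trace_tag(trace_id, "feedback_rating", str(rating))
    client.set_trace_tag(trace_id, "feedback_category", category)
    client.set_trace_tag(trace_id, "feedback_user", user_id)  # pass an anonymized or hashed ID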

Feedback Analysis

Query traces with feedback tags to analyze patterns:

import mlflow

# Find traces with positive feedback
positive_feedback_traces = mlflow.search_traces(
    experiment_ids=["1"],
    filter_string="tags.feedback_positive = 'true'",
    max_results=100,
)

# Find traces with negative feedback
negative_feedback_traces = mlflow.search_traces(
    experiment_ids=["1"],
    filter_string="tags.feedback_positive = 'false'",
    max_results=100,
)

# Compare characteristics
print(f"Positive feedback traces: {len(positive_feedback_traces)}")
print(f"Negative feedback traces: {len(negative_feedback_traces)}")

Feedback Collection Best Practices

1. Minimize User Friction

  • Simple interfaces: Start with binary thumbs up/down
  • Progressive detail: Allow optional detailed feedback
  • Contextual timing: Ask for feedback at natural stopping points

2. Preserve User Privacy

  • Anonymize identifiers: Use hashed or pseudonymous user IDs
  • Limit data collection: Only collect necessary feedback information
  • Data retention: Implement appropriate retention policies

3. Design for Analysis

  • Consistent tagging: Use standardized tag names across your application
  • Structured data: Store feedback in queryable formats
  • Temporal tracking: Include timestamps for trend analysis

4. Handle Edge Cases

  • Rate limiting: Prevent feedback spam or abuse
  • Validation: Ensure feedback data quality
  • Error handling: Gracefully handle feedback collection failures (see the sketch below)
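
A sketch of the validation and error-handling points above, reusing the hypothetical record_feedback helper from the request-tracking sketch earlier on this page.

import logging

logger = logging.getLogger(__name__)

VALID_CATEGORIES = {"helpful", "unhelpful", "incorrect"}


def safe_record_feedback(request_id: str, rating: int, category: str) -> bool:
    # Validation: reject out-of-range ratings and unknown categories.
    if not 1 <= rating <= 5 or category not in VALID_CATEGORIES:
        logger.warning("Dropping invalid feedback for request %s", request_id)
        return False
    try:
        record_feedback(request_id, positive=rating >= 4)  # helper sketched earlier
        return True
    except Exception:
        # Never let a feedback failure surface to the user-facing request path.
        logger.exception("Failed to record feedback for request %s", request_id)
        return False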

Analyzing Feedback Data

Performance Correlation

Analyze the relationship between trace performance and user satisfaction:

# Compare execution times for positive vs negative feedback
positive_times = [trace.info.execution_time_ms for trace in positive_feedback_traces]
negative_times = [trace.info.execution_time_ms for trace in negative_feedback_traces]

if positive_times and negative_times:
    avg_positive_time = sum(positive_times) / len(positive_times)
    avg_negative_time = sum(negative_times) / len(negative_times)

    print(f"Average time for positive feedback: {avg_positive_time:.2f}ms")
    print(f"Average time for negative feedback: {avg_negative_time:.2f}ms")

Trend Analysis

Track feedback trends over time to monitor application quality:

from datetime import datetime, timedelta

# Get feedback from the last week
week_ago = datetime.now() - timedelta(days=7)
timestamp_ms = int(week_ago.timestamp() * 1000)

recent_feedback = mlflow.search_traces(
    experiment_ids=["1"],
    filter_string=f"timestamp_ms > {timestamp_ms} AND tags.feedback_received = 'true'",
    max_results=500,
)

# Calculate satisfaction rate
positive_count = len(
    [t for t in recent_feedback if t.data.tags.get("feedback_positive") == "true"]
)
satisfaction_rate = positive_count / len(recent_feedback) if recent_feedback else 0

print(f"Weekly satisfaction rate: {satisfaction_rate:.2%}")

Building Evaluation Datasets

Use feedback data to create evaluation datasets:

# Extract high-quality interactions for evaluation datasets
high_quality_traces = mlflow.search_traces(
    experiment_ids=["1"],
    filter_string="tags.feedback_positive = 'true' AND tags.feedback_rating >= '4'",
    max_results=200,
)

print(f"Found {len(high_quality_traces)} high-quality interactions")
print("These can be used as positive examples in evaluation datasets")

Integration Examples

Web Application Feedback

Implement feedback collection in web applications (a sketch follows this list) by:

  1. Generating request IDs when processing user requests
  2. Returning IDs in API responses for frontend reference
  3. Collecting feedback via separate endpoints that reference the ID
  4. Tagging traces with received feedback data
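
A minimal sketch of this flow using FastAPI (an assumption; any web framework works), reusing the hypothetical handle_request and record_feedback helpers sketched earlier.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class FeedbackPayload(BaseModel):
    request_id: str
    positive: bool


@app.post("/chat")
def chat(payload: dict):
    # Steps 1-2: process the request and return the request ID for frontend reference.
    return handle_request(payload["message"])


@app.post("/feedback")
def feedback(payload: FeedbackPayload):
    # Steps 3-4: the frontend posts feedback referencing the earlier request ID,
    # and the trace is tagged with the received feedback.
    record_feedback(payload.request_id, payload.positive)
    return {"status": "recorded"}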

Chat Interface Feedback

For conversational interfaces (see the sketch after this list):

  1. Track conversation context with session identifiers
  2. Enable turn-level feedback for individual responses
  3. Support conversation-level feedback for overall satisfaction
  4. Link feedback to specific turns or entire conversations
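
A sketch of turn-level and conversation-level feedback, assuming each turn produces its own trace that was tagged with session_id and turn at request time.

from mlflow import MlflowClient

client = MlflowClient()


def record_turn_feedback(session_id: str, turn: int, positive: bool) -> None:
    # Turn-level feedback: find the trace for this session and turn.
    traces = client.search_traces(
        experiment_ids=["1"],
        filter_string=f"tags.session_id = '{session_id}' AND tags.turn = '{turn}'",
        max_results=1,
    )
    if traces:
        trace_id = traces[0].info.request_id
        client.set_trace_tag(trace_id, "feedback_positive", str(positive).lower())


def record_conversation_feedback(session_id: str, rating: int) -> None:
    # Conversation-level feedback: tag every trace in the session.
    for trace in client.search_traces(
        experiment_ids=["1"],
        filter_string=f"tags.session_id = '{session_id}'",
        max_results=100,
    ):
        client.set_trace_tag(trace.info.request_id, "conversation_rating", str(rating))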

API Integration Feedback

For API-based applications (see the schema sketch after this list):

  1. Include feedback endpoints in your API design
  2. Support batch feedback for multiple interactions
  3. Provide feedback schemas for consistent data collection
  4. Document feedback patterns for API consumers
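
A sketch of a documented feedback schema with batch support, using Pydantic models as an assumption; publishing such a schema helps API consumers submit consistent data.

from typing import Optional

from pydantic import BaseModel, Field


class FeedbackItem(BaseModel):
    request_id: str
    positive: bool
    rating: Optional[int] = Field(default=None, ge=1, le=5)
    category: Optional[str] = None
    comment: Optional[str] = None


class BatchFeedback(BaseModel):
    items: list[FeedbackItem]


def submit_batch(batch: BatchFeedback) -> None:
    # Apply each item to its trace via the record_feedback helper sketched earlier.
    for item in batch.items:
        record_feedback(item.request_id, item.positive)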

Production Considerations

Monitoring Feedback Health

Track key metrics to ensure effective feedback collection:

  • Feedback collection rate: Percentage of interactions that receive feedback
  • Response time impact: Ensure feedback collection doesn't slow down responses
  • Data quality: Monitor for spam or invalid feedback
  • User engagement: Track which feedback mechanisms are most effective

Scalability

Design feedback systems that scale with your application (an asynchronous-processing sketch follows this list):

  • Asynchronous processing: Handle feedback collection without blocking responses
  • Batch operations: Process feedback in batches for efficiency
  • Storage optimization: Use appropriate storage systems for feedback volume
  • Query performance: Ensure feedback queries remain fast as data grows
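
A sketch of asynchronous, non-blocking feedback processing using an in-process queue and a background worker; a production deployment would more likely use a message broker, but the shape is the same.

import queue
import threading

feedback_queue: "queue.Queue[tuple[str, bool]]" = queue.Queue()


def enqueue_feedback(request_id: str, positive: bool) -> None:
    # Called from the request path; returns immediately without blocking the response.
    feedback_queue.put((request_id, positive))


def feedback_worker() -> None:
    # Background thread drains the queue and writes trace tags off the hot path.
    while True:
        request_id, positive = feedback_queue.get()
        record_feedback(request_id, positive)  # helper sketched earlier
        feedback_queue.task_done()


threading.Thread(target=feedback_worker, daemon=True).start()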

Next Steps

Tip: Start simple with binary feedback (thumbs up/down) and expand to more detailed feedback mechanisms as you understand your users' preferences and your application's needs.

Summary

Effective user feedback collection enables continuous improvement of GenAI applications:

  • Strategic value: Feedback provides ground truth for application quality
  • Implementation patterns: Use trace tags and consistent identifiers to link feedback to interactions
  • Analysis opportunities: Correlate feedback with performance metrics and trace characteristics
  • Production readiness: Design for scale, privacy, and operational monitoring

User feedback, combined with comprehensive tracing, provides the observability foundation needed to build and maintain high-quality GenAI applications that truly serve user needs.