Analyze customer reviews using Amazon Bedrock

4 weeks ago

Customer reviews can reveal customer experiences with a product and serve as an invaluable source of information to the product teams. By continually monitoring these reviews over time, businesses can recognize changes in customer perceptions and uncover areas of improvement. Analyzing these reviews to extract actionable insights enables data-driven decisions that can enhance customer experience and reduce churn. However, with the growing number of reviews across multiple channels, quickly synthesizing the essence of these reviews presents a major challenge. The process is often resource intensive, requiring a significant amount of time and human effort while still being prone to human errors and delays in identifying key insights, recurring themes, and improvement opportunities. As a result, customer pain points can go unnoticed and problems can escalate. The latest advances in generative artificial intelligence (AI) allow for new automated approaches to effectively analyze large volumes of customer feedback and distill the key themes and highlights.

This post explores an innovative application of large language models (LLMs) to automate the process of customer review analysis. LLMs are a type of foundation model (FM) that have been pre-trained on vast amounts of text data. This post discusses how LLMs can be accessed through Amazon Bedrock to build a generative AI solution that automatically summarizes key information, recognizes the customer sentiment, and generates actionable insights from customer reviews. This method shows significant promise in saving human analysts time while producing high-quality results. We examine the approach in detail, provide examples, highlight key benefits and limitations, and discuss future opportunities for more advanced product review summarization through generative AI.

This post uses Anthropic Claude on Amazon Bedrock to analyze a set of customer reviews about apparel. Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Table of Contents

Toggle

Potential outcomes

This post describes how you can achieve the following outcomes using a generative AI-powered analysis of customer reviews:

Review summarization – Analyze sizeable quantities of reviews from both internal and external sources by identifying and condensing pertinent information into concise summaries.
Sentiment analysis – Assess whether the reviews have a positive, negative, or neutral tone, and assign confidence scores for the given sentiment.
Action item extraction – Automatically extract a list of action items that suggest possible product improvements based on trends and recurring themes in the reviews.
Visualization – Generate business intelligence (BI) dashboards that display key metrics and graphs.

Business value

Businesses can see the following benefits by using generative AI to analyze their reviews:

Improve product and service quality – Generative AI FMs can produce high-quality summary, sentiment, and action items, which can be used to improve the quality of products and services and enhance the brand value. These metrics can be tracked over time, allowing for continuous monitoring and performance to maintain or improve the customer experience.
Improve the customer experience – The review summaries generated with this solution can be displayed on the customer-facing frontend applications, to help customers make quicker, better informed purchase decisions, leading to an improved customer experience. Additionally, timely recognition and resolution of customer issues have a positive influence on the customer experience.
Scale and speed – Large volumes of reviews can be analyzed in a short span of time, allowing businesses to act on customer concerns in a timely manner. Regular application of this solution can augment internal workforce efficiency, resulting in cost savings.
Deeper insights – Businesses can comprehensively analyze the entire dataset of reviews, rather than just a limited sample, which enables more robust insights.
Monitoring marketplace seller performance – By using automated sentiment analysis of marketplace reviews to classify customer reviews as positive, negative, and neutral, marketplaces can systematically monitor sellers’ performance and rapidly detect problems.

Solution overview

Before we dive into the technical implementation details, let’s look at an example of a customer review analysis done on a set of reviews for an apparel product. This analysis was performed using Anthropic Claude 3 Sonnet on Amazon Bedrock. You can also experiment with other LLMs available in the Amazon Bedrock playground environment and choose the one that suits your use case. Make sure you have access to the model being used for inference.

We provide a list of reviews as context and create a prompt to generate an output with a concise summary, overall sentiment, confidence score of the sentiment, and action items from the input reviews. Our example prompt requests the FM to generate the response in JSON format. You can apply robust prompt engineering techniques to instruct the model to perform your specified actions to minimize any bias or hallucinations in the response, and have the output in the specific format required.

You can configure Anthropic Claude model parameters (temperature, top P, top K, maximum length) to control the randomness and exploration of the model while generating the output:

Temperature – The amount of randomness injected into the response. Defaults to 1. Ranges from 0-1.
Top P – Use nucleus sampling. In nucleus sampling, Anthropic Claude computes the cumulative distribution over all the options for each subsequent token in decreasing probability and cuts it off after it reaches a particular probability specified by top P. You should alter either temperature or top P, but not both.
Top K – Only sample from the top K options for each subsequent token. Use top K to remove long tail low probability responses.
Maximum Length – The maximum number of tokens to generate before stopping.

The following screenshot shows an example request prompt taken from the Amazon Bedrock playground on the AWS Management Console.

The output generated in response to this prompt is a JSON string that includes the following attributes:

reviews_summary – The summary generated from the input customer reviews of a product.
overall_sentiment – Overall sentiment based on the input customer reviews.
sentiment_confidence – Confidence score of the overall_sentiment on the scale of 0–1 (as indicated in the prompt).
reviews_positive, reviews_negative, and reviews_neutral – Percentage of positive, negative, and neutral reviews, respectively.
action_items – List of action items identified from the input reviews.

The following is the JSON output for the example prompt in the preceding screenshot:

{
  “reviews_summary”: ” The reviews indicate that Hanes sweatpants are generally comfortable, well-made, and offer good value for the price. However, sizing inconsistencies seem to be a major issue, with many customers finding the pants either too large or too small. The lack of pockets and fading issues were also mentioned. Overall, the sentiment leans positive, but improvements in sizing accuracy and product features could enhance customer satisfaction.”,
  “overall_sentiment”: “positive”,
  “sentiment_confidence”: 0.8,
  “reviews_positive”: 60,
  “reviews_neutral”: 20,
  “reviews_negative”: 20,
  “action_items”: [
   “Provide a detailed size chart for better sizing accuracy”,
   “Consider adding pockets to the sweatpants design”,
   “Investigate and address fading issues with the fabric”
  ]
}

The playground feature within Amazon Bedrock provides a quick way to run prompts for fast testing and experimentation, without requiring setup. However, when building a scalable review analysis solution, businesses can achieve the most value by automating the review analysis workflow. The following reference architecture illustrates what an automated review analysis solution could look like.

The architecture carries out the following steps:

Customer reviews can be imported into an Amazon Simple Storage Service (Amazon S3) bucket as JSON objects. This bucket will have event notifications enabled to invoke an AWS Lambda function to process the objects created or updated.
The Lambda function runs the business logic to process the customer reviews within the input JSON file. These reviews are then included as context in the predefined prompt template used as input to the FM. The prompt has detailed instructions to be followed by the FM to generate a JSON output with summary, sentiment, and action items from the reviews. The function then invokes an FM of choice on Amazon Bedrock.
Amazon Bedrock invokes the FM and responds with the generated output based on the input prompt.
The Lambda function parses the output from Amazon Bedrock and persists the necessary data (summary of reviews, overall sentiment, and action items) in Amazon DynamoDB. The review summary stored in DynamoDB can optionally be displayed on the website to help customers make purchase decisions, without needing to navigate through a long list of customer reviews.
Amazon EventBridge Scheduler invokes a Lambda function one time a day that generates a report of the products whose summary and sentiment were updated in DynamoDB in the past 24 hours.
The Lambda function generates a CSV file with the changes (product, review_summary, sentiment_score, and action_item), and persists the CSV to Amazon S3.
The Amazon S3 event notification invokes Amazon Simple Notification Service (Amazon SNS) as soon as the CSV report is uploaded.
Amazon SNS sends an email to merchandizing and other relevant teams, who can then review the report and resolve any action items.
Optionally, data stored on DynamoDB can be used to build business dashboards to monitor the customer sentiment about products or services over time. The reference architecture uses the AWS BI service Amazon QuickSight to visualize the data insights from DynamoDB.

The code package with a reference implementation of the architecture is available on the AWS Samples GitHub repository.

Key considerations

Some important considerations when implementing this solution:

Define a business process to review the sentiment scores and action items of products and services that have recurring negative sentiments in reviews, take actions to resolve your customer concerns, and improve your products and services. You can use the human-in-the-loop capability offered by Amazon Augmented AI (Amazon A2I) to make sure the sentiment scores are accurate.
Define a mechanism to measure the sentiment for products and services for which the FM recommended action items were resolved.
Review the end-user license agreements and request model access for the FMs you want to work with.
Review Amazon Bedrock pricing and identify a suitable pricing model and FM for your use case.
The following suggestions should be considered when choosing an FM:

Experiment with different text generation models supported by Amazon Bedrock.
Use the Amazon Bedrock model evaluation feature to evaluate the supported models. You can use Amazon SageMaker Ground Truth to label the sample dataset that you want to use for model evaluation on Amazon Bedrock.
Review the model pricing by provider and Amazon Bedrock service quotas.

Identify the insights you want to derive from the customer reviews and refine the model prompts and parameters to suit your needs.
Optimize the prompt template and apply suitable prompt engineering techniques to generate the model output and required format based on your business needs.
Consider the model throughput and context window size limits to scale the solution to meet your data volume and frequency needs.
Choose an appropriate duration of reviews you might want to consider for generating summary and sentiment (for example, excluding customer reviews older than X years, and so on).
Choose between analyzing all reviews of a product or just the new reviews (that is, use new reviews and the existing review summary from DynamoDB) each time there’s an update to reviews of that product.
Analyze the customer reviews of a product or service only when there are new reviews added for the day:

Import the customer review JSON files to an S3 bucket only when there are new reviews for the product.
Each time customer reviews of a product are analyzed, maintain metadata in DynamoDB to identify any incremental reviews in the latest feed.

Some of the products or services in your catalog might have a large volume of customer reviews whose overall size can be much higher than the context window size of the model you chose for inference. Apply alternate techniques to analyze such reviews:

For example, split the customer reviews of the product or service into multiple groups, analyze each group separately in the first iteration, then use the results of the first iteration as input context and generate the final output (that is, the final output review summary will be a summary of all review summaries from the first iteration). It might need multiple iterations depending on the volume of reviews.

Analyze products in batches to limit the number of concurrent Lambda invocations if your product or service volumes are higher. You might need an event scheduler to invoke the Lambda functions instead of the current Amazon S3 event notifications, which invoke one Lambda function per product JSON. Review Lambda quotas and function timeout to create batches. You can also consider alternate services such as AWS Step Functions or AWS Batch.
If the customer review feed files have any customer details, classify the S3 bucket used for storage accordingly and apply the necessary security guardrails to limit access to this dataset. Also, make sure you don’t include any customer information in the prompt to the FM. Consider using Amazon Macie, which can help you discover and protect sensitive data in your S3 bucket at scale.

Conclusion

Using generative AI FMs opens new possibilities for businesses to derive value from customer reviews. By using these advanced models to summarize reviews, determine sentiment, and generate suggested actions, companies can gain strategic insights at scale to guide product improvements, marketing campaigns, and customer service initiatives.

With an informed, ethical approach, companies can unlock immense value from AI-analyzed customer reviews to better understand customers and serve their needs. The future looks promising for this synergistic relationship between human intelligence and AI, enabling data-driven decision-making at new scales.

Resources

For further reading, refer to the following:

How Technology Leaders Can Prepare for Generative AI
The positive impact Generative AI could have for Retail
Generative AI: The Catalyst for Revolutionizing Physical Retail
Building with Amazon Bedrock Workshop
Automating product description generation with Amazon Bedrock

About the Authors

Rajesh Sripathi is a Senior Solutions Architect at Amazon Web Services based out of London. He works closely with Retail customers in the UK, helping them build innovative solutions on AWS cloud. Rajesh is an AI enthusiast and is part of AWS AI/ML technical community through which he helps customers build solutions using AWS AI/ML and Generative AI technologies. Outside of work, he is passionate about travel and driving.

Huma Zafar is an Associate Solutions Architect in the AWS UK FSI team. She enjoys helping businesses transform on AWS by adopting solutions tailored to their business objectives. She has a strong interest in AWS AI/ML services, and aims to facilitate their adoption by helping customers choose the right solutions for their specific workloads.

Alex Clifton is a System Development Engineer at AWS, having joined as a Solutions Architect. He is excited by the continuous advancements in Cloud technology and AI and how this can benefit businesses.