Abstract: |
User-generated reviews serve as valuable resources for analyzing user sentiment across various product aspects, enhancing personalized product recommendations. However, due to a scarcity of user feedback, many e-commerce products receive an insufficient number of reviews, resulting in less effective recommendations. To address this limitation, many works adopt a data augmentation approach by generating synthetic reviews, which simulate textual preferences toward unseen products and provide additional information for user preference modeling. With the advancement of large language models (LLMs), generating human-like reviews has become more accessible through prompting strategies that leverage the LLMs’ capabilities in reasoning and deep understanding of natural language.
Generally, there are two main approaches to generating reviews with prompting techniques: zero-shot and few-shot prompting. Zero-shot prompting involves designing a suitable prompt that asks the LLM to generate a synthetic review for a target product, without providing any examples. In contrast, few-shot prompting includes historical review samples from users or products in the prompt, enabling the LLM to better capture keywords and patterns in reviews and yielding more reliable synthetic reviews. Despite these advancements, neither approach incorporates users’ personal preferences into LLMs, leading to generic synthetic reviews that are not tailored to each individual user’s unique taste. For instance, using few-shot prompting to generate reviews for the movie “Titanic” often yields common keywords related to this movie (such as “romance”, “tragic”, or “James Cameron”), but lacks the user’s unique opinions and sentiments. Utilizing such synthetic reviews may produce indistinguishable user representations, leading to less personalized product recommendations.
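To make the distinction concrete, the minimal Python sketch below builds a zero-shot and a few-shot prompt for a target movie; the function names, prompt wording, and sample reviews are illustrative assumptions rather than prompts taken from any prior work.

```python
def zero_shot_prompt(product: str) -> str:
    # Zero-shot: the LLM sees only the target product, no example reviews.
    return f'Write a short user review for the movie "{product}".'

def few_shot_prompt(product: str, history) -> str:
    # Few-shot: historical (product, review) pairs expose keywords and patterns,
    # but nothing in the prompt identifies the individual user.
    examples = "\n".join(f'Review of "{p}": {r}' for p, r in history)
    return f'{examples}\nWrite a short user review for the movie "{product}".'

if __name__ == "__main__":
    history = [
        ("Avatar", "Stunning visuals, but the plot felt thin."),
        ("The Notebook", "A touching love story that made me cry."),
    ]
    print(zero_shot_prompt("Titanic"))
    print(few_shot_prompt("Titanic", history))
```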
To address this, our objective is to propose a personalized review generation approach that incorporates users’ personal preferences to produce user-specific synthetic reviews with LLMs. We aim to fine-tune LLMs on a review-generation task, providing three types of input: review texts, their corresponding ratings, and unique user identifiers (IDs). For instance, to generate a synthetic review of the movie “Titanic”, the review texts are sourced from the target user’s historical reviews of other movies, preferably those similar to “Titanic”, and/or historical reviews of “Titanic” from neighboring users who share similar interests. In addition to the review texts, their corresponding ratings are also incorporated as input. Including both review texts and ratings enhances the LLM’s reasoning capability by associating keywords and patterns in reviews with degrees of preference toward a target product. Finally, to enable LLMs to generate personalized reviews, unique user IDs, specific to each individual user, are included as input. These IDs play a crucial role in capturing individual users’ sentiments and preferences from the historical reviews and ratings during the fine-tuning process. All types of input are mapped to corresponding embeddings and fine-tuned with the review-generation objective, using the actual user review of the target product as the desired output. By integrating user IDs and a fine-tuning strategy, we expect LLMs to generate high-quality, personalized reviews that are valuable for downstream tasks such as data augmentation.
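The sketch below illustrates one way the three input types could be mapped to embeddings of a shared dimensionality before fine-tuning, assuming a PyTorch-style setup; the class name, dimensions, rating discretization, and token IDs are illustrative assumptions rather than the proposed implementation.

```python
import torch
import torch.nn as nn

class PersonalizedReviewInputs(nn.Module):
    """Illustrative sketch: maps the three input types (user IDs, ratings,
    and review-text tokens) to embeddings of a shared size so they can be
    prepended to the LLM's input sequence."""

    def __init__(self, n_users: int, n_ratings: int, vocab_size: int, d_model: int):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, d_model)      # one learned vector per user ID
        self.rating_emb = nn.Embedding(n_ratings, d_model)  # e.g. ratings 1-5 mapped to indices 0-4
        self.token_emb = nn.Embedding(vocab_size, d_model)  # tokens of historical review texts

    def forward(self, user_id, ratings, review_tokens):
        # Sequence layout: [user] [rating_1 .. rating_k] [review tokens ...]
        parts = [self.user_emb(user_id), self.rating_emb(ratings), self.token_emb(review_tokens)]
        return torch.cat(parts, dim=0)  # shape: (1 + k + num_tokens, d_model)

# Toy usage: user 42 with two historical ratings and a few review-token IDs.
# During fine-tuning, the target output would be the user's actual review of
# the target product, optimized with a standard next-token prediction loss.
inputs = PersonalizedReviewInputs(n_users=1000, n_ratings=5, vocab_size=32000, d_model=64)
seq = inputs(torch.tensor([42]), torch.tensor([3, 1]), torch.tensor([101, 2054, 318, 102]))
print(seq.shape)  # torch.Size([7, 64])
```

One design choice worth noting in this sketch is that the user-ID embedding table is learned jointly with the review-generation objective, which is how individual preferences could be absorbed from historical reviews and ratings.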