Consumer choices in fashion are influenced by a complex interplay of psychological, economic, and social factors, including style preferences and relational dependencies among products. While convolutional neural networks (CNNs) like ResNet50 effectively extract visual features from fashion images, they often overlook the semantic relationships that drive consumer behavior. This study proposes an integrated ResNet50-GCN model that combines spatial feature extraction with graph-based relational modeling to better analyze consumer preferences. Evaluated on the DeepFashion dataset, the approach yields a 10% improvement in classification accuracy, offering strategic insights for personalized marketing, trend prediction, and sustainable fashion practices. These results underscore the value of AI in enhancing understanding of consumer psychology and economic decision-making in fashion retail.
Consumer behavior in the fashion industry is shaped by psychological factors such as perceived authenticity, emotional attachments to styles, and cognitive biases in purchasing decisions, alongside economic models that weigh value against price [20, 21]. Traditional economic models, like the utility maximization framework, explain how consumers allocate budgets based on perceived benefits, but they often fail to capture the relational dynamics, such as trend influences and social comparisons, that characterize fashion choices [22]. Advances in AI, particularly convolutional neural networks (CNNs) like ResNet50, have enabled detailed analysis of visual data from consumer-generated content, yet these tools underexploit the interconnected nature of preferences [2].
This study introduces ResNet50-GCN, a hybrid AI architecture that integrates ResNet50 for visual feature extraction with graph convolutional networks (GCNs) to model relational dependencies in consumer fashion data. Applied to the DeepFashion dataset [1], which reflects real-world consumer choices, the model constructs dynamic graphs based on similarity in preferences, outperforming standalone CNNs. The primary contributions are:
The paper is structured as follows: Section 2 reviews related work, Section 3 describes the dataset, Section 4 details the proposed method, Section 5 presents experimental results, Section 6 discusses implications, and Section 7 concludes with future directions.
AI and CNNs in Consumer Fashion Analysis
Convolutional neural networks (CNNs) are pivotal in processing visual data to infer consumer preferences, with ResNet50 serving as a benchmark for extracting features from fashion images [2]. Likewise, VGG16, known for its uniform architecture of convolutional layers, has been applied to classify fashion items, providing detailed feature maps that help in discerning consumer preferences based on visual attributes such as textures and patterns that influence purchasing decisions. In consumer research, CNNs analyze purchasing patterns and style affinities, as seen in applications for personalized recommendations on platforms like Zara [23]. For instance, Liu et al. [1] used CNNs on DeepFashion to predict categories, providing insights into how visual cues influence consumer choices. Emerging models like Vision Transformers (ViT) [7] incorporate attention mechanisms to capture global preferences, but they require vast data and overlook economic constraints in consumer decision-making [24].
Graph convolutional networks (GCNs) [6] excel in modeling non-Euclidean relationships, such as social influences on fashion trends. In consumer psychology, GCNs simulate networks of preferences, akin to how peers and influencers shape choices [25]. Variants like Graph Attention Networks (GAT) [9] have been applied to recommendation systems, enhancing understanding of relational dependencies. However, their use in fashion consumer analysis is limited by challenges in graph construction from image data, particularly in integrating psychological factors like perceived authenticity [26].
Combining CNNs and GCNs bridges visual analysis with relational modeling, offering strategic value in fashion marketing. For example, Parisot et al. [12] used CNN-GCN hybrids for relational inference in medical contexts, inspiring applications in consumer segmentation. In fashion, Wan et al. [13] adapted this for hyperspectral data, improving classification for market trend prediction. FashionGraph [14] models product relationships for recommendations, aligning with economic models of consumer choice where similarity drives purchases [27]. Li et al. [15] employed cosine similarity in GCNs for scene analysis, outperforming traditional methods and informing personalization strategies that boost brand loyalty [28]. This study builds on these by proposing ResNet50-GCN to analyze consumer preferences, with implications for economic utility models and marketing personalization.
Fashion choices are driven by psychological factors like emotional gratification from trends and cognitive evaluations of value [29]. Economic models, such as the economic consumer behavior model, emphasize rational choices based on price-value trade-offs, while incorporating sustainability preferences in circular economy frameworks [30]. Datasets like DeepFashion [1] enable AI to uncover these dynamics, but models like FashionBERT [17] integrate text without fully addressing image-based relational psychology. Our approach uses ResNet50-GCN to model these interdependencies, supporting marketing strategies that enhance consumer engagement and reduce returns through better personalization [31].
This study utilizes the DeepFashion dataset [1], a rich repository of over 800,000 fashion images reflecting consumer preferences from e-commerce and social media. The Category and Attribute Prediction Benchmark subset captures diverse consumer choices grouped into tops (Type 1, e.g., Blazer), bottoms/skirts (Type 2, e.g., Jeans), and full-body garments (Type 3, e.g., Dress). Annotations include attributes like color and style, enabling analysis of psychological and economic drivers in purchases.
Split into training (209,222 images), validation (40,000), and test (40,000) sets, DeepFashion mirrors real-world variability in consumer behavior, such as pose and lighting influences on perceived value. Compared to simpler datasets like Fashion-MNIST [18], it offers deeper insights into economic choices and psychological affinities.
Analysis of the training set reveals imbalances in category popularity, with dominant styles (e.g., labels 1, 15) exceeding 10,000 images, reflecting market-driven consumer biases [32]. Underrepresented categories (e.g., labels 10, 30) highlight niche preferences, potentially impacting economic models by favoring mass-market items (Figure 1). Our method addresses this through relational graphs, improving insights into minority preferences for targeted marketing.
Figure 1. Distribution of image counts across categories in the DeepFashion training set, highlighting imbalances in consumer preferences.
Using pre-trained ResNet50, we computed cosine similarity among category features, yielding a matrix (Figure 2) with values from 0.60 to 0.90. High similarities (e.g., 0.81 between categories 21 and 22) indicate psychological clustering in choices, informing economic bundling strategies. A 0.75 threshold for graph edges ensures focus on strong relations, aiding marketing in predicting cross-category purchases [33].
Figure 2. Similarity matrix between categories in DeepFashion, with cosine similarity ranging from 0.60 to 0.90.
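The similarity analysis above can be illustrated with a minimal sketch. It assumes each category is summarized by the mean of its image features, which the text does not specify, and it uses a small toy array in place of real ResNet50 embeddings. The function names `category_similarity` and `strong_pairs` are hypothetical.

```python
import numpy as np

def category_similarity(features, labels, num_categories):
    """Cosine similarity between per-category mean feature vectors.

    Assumption: a category is represented by the mean of its image features.
    """
    dim = features.shape[1]
    means = np.zeros((num_categories, dim))
    for c in range(num_categories):
        mask = labels == c
        if mask.any():
            means[c] = features[mask].mean(axis=0)
    norms = np.linalg.norm(means, axis=1, keepdims=True)
    unit = means / np.clip(norms, 1e-12, None)  # guard against empty categories
    return unit @ unit.T

def strong_pairs(sim, threshold=0.75):
    """Category pairs (i < j) whose similarity meets the edge threshold."""
    i, j = np.where(np.triu(sim >= threshold, k=1))
    return list(zip(i.tolist(), j.tolist()))

# Toy stand-in for ResNet50 features: 6 images, 3 categories, 4 dimensions.
features = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.2, 0.0, 0.0],
    [0.9, 0.3, 0.0, 0.0],
    [1.0, 0.4, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.1],
])
labels = np.array([0, 0, 1, 1, 2, 2])

sim = category_similarity(features, labels, num_categories=3)
pairs = strong_pairs(sim)  # categories 0 and 1 are close; category 2 is not
print(np.round(sim, 2))
print(pairs)
```

With real data, `features` would hold the pooled ResNet50 outputs for the training images, and only pairs at or above the 0.75 threshold would be kept as graph edges.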
ResNet50-GCN is a hybrid model designed to analyze consumer fashion preferences by merging visual feature extraction with graph-based modeling of relational psychology and economic choices. On the technology side, this approach aims to elevate fashion image classification by integrating local feature learning with relational modeling, addressing the shortcomings of traditional CNNs in capturing semantic relationships among fashion items. On the managerial side, it reflects the real-life consumer journey where emotional connections and value judgments interact, providing a tool for marketers to forecast trends and adjust campaigns more effectively.
The ResNet50-GCN architecture fuses ResNet50’s spatial feature extraction with GCN’s relational processing. Its key components are mathematically defined as follows:
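$$H^{(l+1)} = \sigma\left(A\, H^{(l)}\, W^{(l)}\right)$$
for $l = 0, 1, \dots$, where $A$ is the adjacency matrix, $H^{(l)}$ is the node feature matrix at layer $l$, $\sigma$ is the activation function, and $W^{(l)}$ is the weight matrix for layer $l$.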
Figure 3. ResNet50-GCN architecture. ResNet50 extracts features, GCN models relations, and a classifier predicts preferences.
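The relational step of the architecture can be sketched as a single graph convolution over node features. This is a minimal illustration rather than the authors' implementation: it assumes the symmetric normalization with self-loops that is commonly used for GCNs, a ReLU activation, and random arrays standing in for ResNet50 features and learned weights. The names `normalize_adjacency` and `gcn_layer` are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} (assumed normalization, with self-loops)."""
    a_tilde = adj + np.eye(adj.shape[0])
    deg = a_tilde.sum(axis=1)          # at least 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat, h, w):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    return np.maximum(a_hat @ h @ w, 0.0)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)   # toy 3-node graph
h0 = rng.normal(size=(3, 8))               # stand-in for ResNet50 features
w0 = rng.normal(size=(8, 4))               # stand-in for learned weights

a_hat = normalize_adjacency(adj)
h1 = gcn_layer(a_hat, h0, w0)
print(h1.shape)
```

Each output row mixes a node's own features with those of its neighbours, which is how relational information between images would reach the classifier.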
A graph $G = (V, E)$ is constructed to represent relationships among image samples in each batch, where $V$ denotes nodes (images) and $E$ denotes edges. Each node $v_i \in V$ corresponds to an image with label $y_i$ and feature vector $x_i$. The graph is built using two criteria:
The total edge set is:
where:
with . The resulting graph, represented as an edge index tensor, serves as input to the GCN layers, enabling adaptive modeling of batch-specific relationships.
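A batch-level graph of this kind could be built as in the following sketch. The two edge criteria used here, shared labels and cosine similarity of features at or above 0.75, are assumptions chosen for illustration; the exact criteria are not recoverable from the text. The function name `build_edge_index` is hypothetical, and the output follows the common two-row edge index layout (source nodes, target nodes).

```python
import numpy as np

def build_edge_index(features, labels, threshold=0.75):
    """Return a (2, num_edges) array of directed edges for one batch."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    same_label = labels[:, None] == labels[None, :]  # assumed criterion 1
    similar = sim >= threshold                       # assumed criterion 2
    adjacency = same_label | similar                 # union of both edge sets
    np.fill_diagonal(adjacency, False)               # drop self-loops

    src, dst = np.nonzero(adjacency)
    return np.stack([src, dst])

# Toy batch: 4 images with 2-dimensional stand-in features.
batch_features = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.1],
    [-1.0, 0.0],
])
batch_labels = np.array([0, 0, 1, 2])

edge_index = build_edge_index(batch_features, batch_labels)
print(edge_index)
```

In the toy batch, nodes 0 and 1 are linked by their shared label, nodes 0 and 2 by feature similarity, and node 3 stays isolated. Because the graph is rebuilt per batch, the edge index changes with the sampled images.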
The training process is outlined as follows:
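The model is trained by minimizing the cross-entropy loss over each batch:
$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log p_{i,c}$$
where $p_{i,c}$ is the predicted probability for class $c$ of sample $i$, $y_{i,c}$ is 1 if sample $i$ belongs to class $c$ and 0 otherwise, $C$ is the number of classes, and $N$ is the batch size.
Parameters are updated with the Adam optimizer:
$$\theta_{t+1} = \theta_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
where $\hat{m}_t$ and $\hat{v}_t$ are first and second moment estimates, $\eta$ is the learning rate, and $\epsilon$ is a small constant for numerical stability.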
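The two ingredients of the training process described above, a batch cross-entropy loss and an Adam-style update driven by first and second moment estimates, can be sketched as follows. The hyperparameters (`lr`, `beta1`, `beta2`, `eps`) are the usual Adam defaults and are assumptions, since the text does not report them. The function names are hypothetical.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-probability of the true class over a batch."""
    n = probs.shape[0]
    true_class_probs = probs[np.arange(n), labels]
    return -np.mean(np.log(true_class_probs + 1e-12))

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are the first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction for step t (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy batch of 2 samples and 2 classes.
probs = np.array([[0.5, 0.5],
                  [0.25, 0.75]])
labels = np.array([0, 1])
loss = cross_entropy(probs, labels)

# One update on a toy parameter vector with a made-up gradient.
theta = np.array([1.0, -2.0])
grad = np.array([0.5, -0.5])
theta, m, v = adam_step(theta, grad, m=np.zeros(2), v=np.zeros(2), t=1)
print(loss, theta)
```

On the first step the bias-corrected moments reduce to the gradient and its square, so each parameter moves by roughly the learning rate in the direction opposite to its gradient.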
The performance of ResNet50-GCN is benchmarked against traditional CNN models, including ResNet50, VGG16, and FashionNet [1]. ResNet50, pre-trained on ImageNet and fine-tuned on DeepFashion, provides a robust baseline. VGG16, a standard CNN architecture, offers a common reference for image classification. This selection enables a thorough comparison between conventional CNN methods and our relational approach.
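Performance is assessed using Accuracy, Precision, Recall, and F1-score, derived from the confusion matrix:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad \text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
where $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives.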
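For a multi-class problem, these metrics can be computed per class from the confusion matrix and then averaged. The sketch below assumes macro-averaging, meaning an unweighted mean over classes; the averaging scheme is not stated in the text. The function name `metrics_from_confusion` is hypothetical.

```python
import numpy as np

def metrics_from_confusion(cm):
    """cm[i, j] counts samples with true class i that were predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp   # belonging to the class but predicted elsewhere

    zeros = np.zeros_like(tp)
    precision = np.divide(tp, tp + fp, out=zeros.copy(), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=zeros.copy(), where=(tp + fn) > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)

    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),   # macro average (assumption)
        "recall": recall.mean(),
        "f1": f1.mean(),
    }

toy_cm = [[8, 2],
          [1, 9]]
toy_scores = metrics_from_confusion(toy_cm)
print(toy_scores)
```

Under macro-averaging, rare categories weigh as much as frequent ones, so averaged precision and recall can sit well below overall accuracy on an imbalanced dataset.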
Table 1. Overall performance on the DeepFashion test set

| Model | Top-1 Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| ResNet50 | 65.90% | 42.57% | 35.28% | 36.87% |
| VGG16 | 67.63% | 46.70% | 33.18% | 35.95% |
| ResNet50-GCN | 88.96% | 57.15% | 46.54% | 48.41% |
Table 2. Performance on specific category types

| Category type | Model | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| Type 1 (Tops) | ResNet50 | 75.18% | 100% | 85.83% |
| Type 1 (Tops) | VGG16 | 76.40% | 100% | 86.95% |
| Type 1 (Tops) | ResNet50-GCN | 81.54% | 100% | 89.83% |
| Type 2 (Bottoms and Skirts) | ResNet50 | 79.87% | 100% | 88.81% |
| Type 2 (Bottoms and Skirts) | VGG16 | 84.73% | 100% | 91.73% |
| Type 2 (Bottoms and Skirts) | ResNet50-GCN | 98.26% | 100% | 99.12% |
| Type 3 (Full-body Garments) | ResNet50 | 55.21% | 100% | 71.14% |
| Type 3 (Full-body Garments) | VGG16 | 62.67% | 100% | 77.05% |
| Type 3 (Full-body Garments) | ResNet50-GCN | 73.65% | 100% | 84.82% |
Table 3. Top-K accuracy on the test set

| Model | Top-1 | Top-3 | Top-5 |
| --- | --- | --- | --- |
| FashionNet [1] | - | 82.58% | 90.17% |
| ResNet50 | 65.90% | 85.27% | 91.71% |
| VGG16 | 67.63% | 86.93% | 92.88% |
| ResNet50-GCN | 88.96% | 97.01% | 98.45% |
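Top-K accuracy counts a prediction as correct when the true label appears among the K highest-scoring classes. A minimal sketch of the computation, using a toy score matrix rather than model outputs (the function name `top_k_accuracy` is hypothetical):

```python
import numpy as np

def top_k_accuracy(class_scores, true_labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    top_k = np.argsort(-class_scores, axis=1)[:, :k]   # indices of the k best classes
    hits = (top_k == true_labels[:, None]).any(axis=1)
    return hits.mean()

# Toy scores for 4 samples over 3 classes.
class_scores = np.array([
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.10, 0.40, 0.50],
    [0.30, 0.45, 0.25],
])
true_labels = np.array([0, 2, 1, 2])

top1 = top_k_accuracy(class_scores, true_labels, k=1)
top2 = top_k_accuracy(class_scores, true_labels, k=2)
top3 = top_k_accuracy(class_scores, true_labels, k=3)
print(top1, top2, top3)
```

Top-K accuracy can only stay equal or rise as K grows, which matches the pattern across the Top-1, Top-3, and Top-5 columns.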
The integration of ResNet50 with Graph Convolutional Networks (GCNs) not only delivers an improvement of well over 10% in predictive accuracy compared to baseline CNNs (Top-1 accuracy of 88.96% versus 65.90% for ResNet50 and 67.63% for VGG16; Table 1) but also offers actionable benefits for marketing and retail management in the fashion sector. By combining spatial feature extraction with relational modeling, the approach uncovers latent product connections, such as style similarities, that mirror consumer perception and comparison processes. These insights can directly inform personalized marketing strategies, enabling more relevant product recommendations, targeted promotional campaigns, and data-driven assortment planning. Furthermore, the ability to capture nuanced relational patterns allows retailers to anticipate shifts in consumer preferences, respond dynamically to emerging trends, and strengthen brand differentiation in competitive markets. While the method requires higher computational resources and depends on high-quality relational data, its strategic value lies in translating advanced AI capabilities into tangible business outcomes. Future adaptations for real-time deployment could further enhance customer engagement and operational efficiency in dynamic retail environments.
The demonstrated effectiveness of the ResNet50-GCN architecture has direct implications for marketing and retail management in the fashion industry. By capturing both visual attributes and relational style patterns, the model enables more precise segmentation and personalized recommendations, leading to improved customer engagement and conversion rates. Retailers can leverage these insights to optimize product assortment, design targeted promotional campaigns, and enhance inventory planning based on data-driven predictions of consumer preferences. Moreover, integrating such models into real-time systems can help brands respond dynamically to shifting consumer trends, strengthening competitiveness in fast-paced markets.