{"id":403,"date":"2026-01-07T17:15:20","date_gmt":"2026-01-07T10:15:20","guid":{"rendered":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/?p=403"},"modified":"2026-01-26T21:50:32","modified_gmt":"2026-01-26T14:50:32","slug":"building-a-vegetable-image-classifier-from-scratch-to-state-of-the-art","status":"publish","type":"post","link":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/?p=403","title":{"rendered":"Building a Vegetable Image Classifier: From Scratch to State-of-the-Art"},"content":{"rendered":"<h1>Abstract<\/h1>\n<p class=\"font-claude-response-body whitespace-normal break-words\">Deep learning has revolutionized computer vision, making it possible to build highly accurate image classification systems with relatively little effort. But how much difference does it really make to use pre-trained models versus building your own from scratch? In this project, I set out to answer that question by building a vegetable image classifier using three different approaches: a custom Convolutional Neural Network (CNN) built from the ground up, and two state-of-the-art architectures\u2014VGG19 and ResNet50.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">The goal was simple: classify images of broccoli, cabbage, and cauliflower with the highest possible accuracy, while understanding the trade-offs between building custom models and leveraging transfer learning.<\/p>\n<h2>The Dataset: Vegetable Image Classification<\/h2>\n<p class=\"font-claude-response-body whitespace-normal break-words\">I used the <strong>Vegetable Image Dataset<\/strong> by misrakahmed, available on Kaggle. This dataset contains 21,000 images spanning 15 different vegetable categories. 
For this project, I focused on just three classes: <strong>broccoli<\/strong>, <strong>cabbage<\/strong>, and <strong>cauliflower<\/strong>.<\/p>\n<h3 class=\"font-claude-response-subheading text-text-100 mt-1 -mb-1.5\">Dataset Breakdown<\/h3>\n<p class=\"font-claude-response-body whitespace-normal break-words\">The dataset comes pre-split into three segments:<\/p>\n<ul class=\"[&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc space-y-2.5 pl-7\">\n<li class=\"whitespace-normal break-words\"><strong>Training set<\/strong>: 1,000 images per class \u00d7 3 classes = 3,000 images<\/li>\n<li class=\"whitespace-normal break-words\"><strong>Validation set<\/strong>: 200 images per class \u00d7 3 classes = 600 images<\/li>\n<li class=\"whitespace-normal break-words\"><strong>Test set<\/strong>: 200 images per class \u00d7 3 classes = 600 images<\/li>\n<\/ul>\n<p class=\"font-claude-response-body whitespace-normal break-words\">One major advantage of this dataset is that all images are already standardized to <strong>224 \u00d7 224 pixels<\/strong>. This meant I could skip complex resizing operations and focus directly on building and training models.<\/p>\n<h3 class=\"font-claude-response-subheading text-text-100 mt-1 -mb-1.5\">Data Exploration &amp; Pre-Processing<\/h3>\n<p class=\"font-claude-response-body whitespace-normal break-words\">Before diving into model building, I performed some basic data exploration:<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Understanding the Structure<\/strong>: The dataset is organized into folders, where each folder name represents a vegetable class. 
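<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">In code, that folder-based approach looked roughly like this (a simplified sketch: the paths and helper name are hypothetical, and it folds in the normalization and one-hot encoding steps detailed below):<\/p>

```python
# Hedged sketch of the loading script described in this post. The
# directory layout and helper name are hypothetical; the normalization
# (divide by 255.0) and one-hot labels follow the steps listed below.
import os
import numpy as np
from PIL import Image

CLASSES = ['Broccoli', 'Cabbage', 'Cauliflower']

def load_split(split_dir):
    """Load one dataset split, labeling images by their parent folder."""
    images, labels = [], []
    for idx, name in enumerate(CLASSES):
        class_dir = os.path.join(split_dir, name)
        for fname in sorted(os.listdir(class_dir)):
            img = Image.open(os.path.join(class_dir, fname)).convert('RGB')
            images.append(np.asarray(img, dtype=np.float32) / 255.0)  # [0, 255] -> [0, 1]
            one_hot = np.zeros(len(CLASSES), dtype=np.float32)
            one_hot[idx] = 1.0  # e.g. broccoli -> [1, 0, 0]
            labels.append(one_hot)
    return np.stack(images), np.stack(labels)

# X_train, y_train = load_split('Vegetable Images/train')  # hypothetical path
```

<p class=\"font-claude-response-body whitespace-normal break-words\">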
This made it easy to iterate through the directory structure and automatically label images based on their parent folder.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Minimal Pre-Processing Required<\/strong>: Since images were already uniform in size, my pre-processing pipeline was straightforward:<\/p>\n<ol class=\"[&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-decimal space-y-2.5 pl-7\">\n<li class=\"whitespace-normal break-words\"><strong>Loading Images<\/strong>: I used the Pillow library to open image files and convert them into NumPy arrays<\/li>\n<li class=\"whitespace-normal break-words\"><strong>Normalization<\/strong>: Pixel values were scaled from [0, 255] to [0, 1] by dividing by 255.0<\/li>\n<li class=\"whitespace-normal break-words\"><strong>Label Encoding<\/strong>: I applied one-hot encoding to convert class labels into a format suitable for neural network training (e.g., broccoli = [1,0,0], cabbage = [0,1,0], cauliflower = [0,0,1])<\/li>\n<li class=\"whitespace-normal break-words\"><strong>Tensor Conversion<\/strong>: Arrays were converted to TensorFlow tensors for efficient computation<\/li>\n<\/ol>\n<p class=\"font-claude-response-body whitespace-normal break-words\">The folder-based structure made automation possible. I wrote a Python script that navigated through the train, validation, and test directories, filtered only the three vegetables I needed, and assembled the complete dataset ready for training.<\/p>\n<h2 class=\"font-claude-response-heading text-text-100 mt-1 -mb-0.5\">Building the Models<\/h2>\n<p class=\"font-claude-response-body whitespace-normal break-words\">With data preparation complete, I moved on to the exciting part: building and training models. 
I took three different approaches to see which would yield the best results with the least effort.<\/p>\n<h3 class=\"font-claude-response-subheading text-text-100 mt-1 -mb-1.5\">Approach 1: Vanilla CNN (Custom Architecture)<\/h3>\n<p class=\"font-claude-response-body whitespace-normal break-words\">The first challenge was to build a CNN from scratch\u2014no pre-trained weights, no transfer learning, just raw neural network design. This &#8220;Vanilla CNN&#8221; would serve as my baseline.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Why Start From Scratch?<\/strong><\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">Building a custom model helps you understand the fundamental building blocks of CNNs: convolution layers, pooling operations, activation functions, and how they work together to extract features and make predictions.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>The Design Process<\/strong><\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">I went through <strong>four iterations<\/strong> before arriving at a satisfactory architecture. Each iteration taught me something valuable:<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Iteration 1<\/strong>: I started simple with a basic convolution-pooling-dense pattern. This achieved <strong>93.3% accuracy<\/strong> but showed signs of overfitting after epoch 16. I used L2 regularization (1e-4) to combat this, and implemented early stopping to prevent the model from degrading further.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Iteration 2<\/strong>: I replaced L2 regularization with dropout (50%) on the dense layers. 
This improved accuracy slightly to <strong>93.6%<\/strong>, but overfitting was still present in the validation loss curve.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Iteration 3<\/strong>: I combined both techniques\u2014adding back L2 regularization alongside dropout\u2014and increased the dense layer neurons from 64 to 128. I also expanded training to 100 epochs with early stopping patience of 15. This configuration achieved <strong>95.8% accuracy<\/strong> and showed much better stability.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Iteration 4<\/strong> (Final): Inspired by research showing that deeper networks can capture more complex features, I built a true deep learning architecture:<\/p>\n<ul class=\"[&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc space-y-2.5 pl-7\">\n<li class=\"whitespace-normal break-words\"><strong>4 convolutional layers<\/strong> with varying filter counts (64, 128, 64, 32)<\/li>\n<li class=\"whitespace-normal break-words\">Strategic use of different kernel sizes (3\u00d73, 5\u00d75, 1\u00d71) to capture features at multiple scales<\/li>\n<li class=\"whitespace-normal break-words\"><strong>Leaky ReLU<\/strong> activation instead of standard ReLU, so negative inputs keep a small gradient rather than going dead<\/li>\n<li class=\"whitespace-normal break-words\">Removed regularization from dense layers to allow the model to train more freely<\/li>\n<li class=\"whitespace-normal break-words\">Trained for 300 epochs<\/li>\n<\/ul>\n<p class=\"font-claude-response-body whitespace-normal break-words\">This final architecture achieved <strong>99% accuracy<\/strong> on the test set! The loss curves showed smooth, consistent learning without the explosive gradients I&#8217;d seen in earlier iterations.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Key Takeaway<\/strong>: Building from scratch requires patience and experimentation. 
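<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">For reference, the iteration-4 stack can be sketched in Keras along these lines (reconstructed from the bullet points above; pooling placement, the dense size, and the optimizer are my assumptions rather than the exact code):<\/p>

```python
# Hedged Keras sketch of the final (iteration 4) custom CNN: four conv
# layers with 64/128/64/32 filters, mixed 3x3 / 5x5 / 1x1 kernels, and
# Leaky ReLU. Pooling placement, the 128-unit dense layer, and the
# optimizer are assumptions, not the original code.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(64, 3), layers.LeakyReLU(), layers.MaxPooling2D(),
    layers.Conv2D(128, 5), layers.LeakyReLU(), layers.MaxPooling2D(),
    layers.Conv2D(64, 3), layers.LeakyReLU(), layers.MaxPooling2D(),
    layers.Conv2D(32, 1), layers.LeakyReLU(), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128), layers.LeakyReLU(),  # no L2 / dropout in this iteration
    layers.Dense(3, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=300)
```

<p class=\"font-claude-response-body whitespace-normal break-words\">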
It took 300 epochs and multiple architectural revisions to reach 99% accuracy. But the learning experience was invaluable.<\/p>\n<h3 class=\"font-claude-response-subheading text-text-100 mt-1 -mb-1.5\">Approach 2: Transfer Learning with VGG19<\/h3>\n<p class=\"font-claude-response-body whitespace-normal break-words\">After spending considerable effort on the custom model, I wanted to see how much easier it would be to use a pre-trained architecture. Enter <strong>VGG19<\/strong>.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>What is VGG19?<\/strong><\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">VGG19 is a convolutional neural network developed by the Visual Geometry Group at Oxford. It consists of <strong>16 convolutional layers<\/strong> and <strong>3 fully connected layers<\/strong>, hence the name &#8220;19&#8221;. The model was originally trained on ImageNet, a massive dataset containing millions of images across thousands of categories.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Why Use Transfer Learning?<\/strong><\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">Transfer learning leverages knowledge from models trained on large datasets. Instead of learning features from scratch, you&#8217;re starting with a model that already understands edges, textures, shapes, and complex patterns. You simply adapt the final layers to your specific classification task.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>My VGG19 Experiments<\/strong><\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">I ran <strong>four experiments<\/strong> with different configurations:<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Experiment 1<\/strong>: I added a hidden dense layer (64 neurons) after the VGG19 base and applied L2 regularization. 
This achieved <strong>99.8% accuracy<\/strong> but experienced exploding gradients around epoch 14, triggering early stopping at epoch 18.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Experiment 2<\/strong>: I tried replacing L2 regularization with dropout (50%) to avoid gradient explosion. Unfortunately, this caused <strong>vanishing gradients<\/strong>\u2014the model stopped learning entirely and got stuck at 33% accuracy.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Experiment 3<\/strong>: Suspecting ReLU might be causing vanishing gradients, I switched to Leaky ReLU while keeping dropout. The problem persisted. This confirmed that dropout itself was the culprit when combined with transfer learning.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Experiment 4<\/strong> (Final): I removed both regularization and dropout, trusting that VGG19&#8217;s pre-trained weights would naturally resist overfitting. I trained for 100 epochs with early stopping patience of 20. The results were spectacular: <strong>100% accuracy<\/strong> with a loss of just 0.0121. All 100 epochs were used productively without any gradient issues.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Key Insight<\/strong>: Transfer learning dramatically reduced the trial-and-error process. 
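<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">That regularization-free setup can be sketched like this (the frozen VGG19 base and plain dense head follow the description above; the flatten step, head size, and optimizer are assumptions):<\/p>

```python
# Hedged sketch of the winning VGG19 configuration: frozen ImageNet
# base, a small dense head, and no dropout or L2. The flatten step and
# optimizer are assumptions; pass weights='imagenet' for real transfer learning.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_vgg19_classifier(num_classes=3, weights='imagenet'):
    base = tf.keras.applications.VGG19(include_top=False, weights=weights,
                                       input_shape=(224, 224, 3))
    base.trainable = False  # keep the pre-trained features fixed
    return models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),
    ])

early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=20,
                                              restore_best_weights=True)
# model = build_vgg19_classifier()
# model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=[early_stop])
```

<p class=\"font-claude-response-body whitespace-normal break-words\">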
VGG19 reached 100% accuracy in just 100 epochs, compared to the 300 epochs needed for my custom model.<\/p>\n<h3 class=\"font-claude-response-subheading text-text-100 mt-1 -mb-1.5\">Approach 3: Transfer Learning with ResNet50<\/h3>\n<p class=\"font-claude-response-body whitespace-normal break-words\">For my final experiment, I used <strong>ResNet50<\/strong>, another powerful architecture but with a fundamentally different design philosophy.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>What Makes ResNet Different?<\/strong><\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">ResNet (Residual Network) introduced the concept of &#8220;skip connections&#8221; or &#8220;residual connections.&#8221; These connections allow the network to learn residual functions rather than direct mappings, making it possible to train very deep networks (50+ layers) without vanishing gradients. ResNet50 has\u2014you guessed it\u201450 layers.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>My ResNet50 Experiments<\/strong><\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Experiment 1<\/strong>: I started with a hidden dense layer (64 neurons), L2 regularization (1e-2), and dropout (30%). This achieved <strong>99.83% accuracy<\/strong> with a loss of 0.1922 after 100 epochs\u2014solid results but not perfect.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Experiment 2<\/strong> (Final): I increased the hidden layer to 128 neurons, removed L2 regularization, and increased dropout to 50%. Training was set for 100 epochs with early stopping patience of 10.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">The model hit early stopping at epoch 61, meaning the best weights were found at epoch 51. 
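<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">Sketched in code, the configuration might look like this (the 128-neuron head, 50% dropout, and patience of 10 come from the description above; the pooling choice and optimizer are assumptions):<\/p>

```python
# Hedged sketch of the final ResNet50 setup: frozen ImageNet base,
# 128-neuron head, 50% dropout, no L2, early stopping with patience 10.
# The global-average-pooling step and optimizer are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_resnet50_classifier(num_classes=3, weights='imagenet'):
    base = tf.keras.applications.ResNet50(include_top=False, weights=weights,
                                          input_shape=(224, 224, 3))
    base.trainable = False
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),  # 50% dropout, per experiment 2
        layers.Dense(num_classes, activation='softmax'),
    ])

early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10,
                                              restore_best_weights=True)
# model = build_resnet50_classifier()
# model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=[early_stop])
```

<p class=\"font-claude-response-body whitespace-normal break-words\">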
The final results: <strong>100% accuracy<\/strong> with an incredibly low loss of <strong>0.000875<\/strong>\u2014the best performance across all models!<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Key Observation<\/strong>: ResNet50 not only achieved perfect accuracy but did so with the lowest loss value, suggesting the most confident predictions. The skip connections likely helped the model converge faster and more reliably.<\/p>\n<h2 class=\"font-claude-response-heading text-text-100 mt-1 -mb-0.5\">Results Comparison: The Moment of Truth<\/h2>\n<p class=\"font-claude-response-body whitespace-normal break-words\">Let me lay out the final results side by side:<\/p>\n<table class=\"bg-bg-100 min-w-full border-separate border-spacing-0 text-sm leading-[1.88888] whitespace-normal\">\n<thead class=\"border-b-border-100\/50 border-b-[0.5px] text-left\">\n<tr class=\"[tbody&gt;&amp;]:odd:bg-bg-500\/10\">\n<th class=\"text-text-000 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">Model<\/th>\n<th class=\"text-text-000 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">Training Epochs Used<\/th>\n<th class=\"text-text-000 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">Test Accuracy<\/th>\n<th class=\"text-text-000 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">Final Loss<\/th>\n<th class=\"text-text-000 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">Epochs Until Loss &lt; 0.1<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr class=\"[tbody&gt;&amp;]:odd:bg-bg-500\/10\">\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 
[&amp;:not(:first-child)]:border-l-[0.5px]\"><strong>Vanilla CNN<\/strong><\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">300<\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">99.0%<\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">0.0325<\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">~100<\/td>\n<\/tr>\n<tr class=\"[tbody&gt;&amp;]:odd:bg-bg-500\/10\">\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\"><strong>VGG19<\/strong><\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">100<\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">100%<\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">0.0121<\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">~18<\/td>\n<\/tr>\n<tr class=\"[tbody&gt;&amp;]:odd:bg-bg-500\/10\">\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 
[&amp;:not(:first-child)]:border-l-[0.5px]\"><strong>ResNet50<\/strong><\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">51<\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">100%<\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">0.000875<\/td>\n<td class=\"border-t-border-100\/50 [&amp;:not(:first-child)]:-x-[hsla(var(--border-100) \/ 0.5)] border-t-[0.5px] px-2 [&amp;:not(:first-child)]:border-l-[0.5px]\">~21<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3 class=\"font-claude-response-subheading text-text-100 mt-1 -mb-1.5\">What Do These Numbers Tell Us?<\/h3>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Efficiency<\/strong>: Transfer learning models (VGG19 and ResNet50) reached peak performance much faster than the custom CNN. VGG19 needed roughly 18 epochs to get loss below 0.1, while my Vanilla CNN needed about 100 epochs to reach the same point.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Effort vs. Results<\/strong>: I had to make <strong>12 different layer modifications<\/strong> while building the Vanilla CNN to reach 99%. In contrast, VGG19 only required <strong>1 additional layer<\/strong> on top of the pre-trained base, and ResNet50 needed <strong>3 custom layers<\/strong>. This dramatically reduced development time.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Final Performance<\/strong>: While the Vanilla CNN achieved impressive 99% accuracy, both transfer learning models hit perfect 100% accuracy. 
More importantly, ResNet50&#8217;s extremely low loss (0.000875) indicates highly confident predictions, which is crucial for real-world deployment.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>Resource Investment<\/strong>: If I had standardized all models to train for just 25 epochs, the Vanilla CNN would have significantly underperformed compared to the transfer learning approaches, likely stuck somewhere around 94-95% accuracy.<\/p>\n<h2 class=\"font-claude-response-heading text-text-100 mt-1 -mb-0.5\">Lessons Learned &amp; Insights<\/h2>\n<p class=\"font-claude-response-body whitespace-normal break-words\">This project taught me several valuable lessons:<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>1. Transfer Learning Is Powerful, But Not Magic<\/strong><\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">Pre-trained models gave me a massive head start, but I still needed to understand how to properly adapt them. My failed experiments with dropout and regularization on VGG19 showed that you can&#8217;t just blindly add layers\u2014you need to understand how they interact with pre-trained weights.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>2. Architecture Matters<\/strong><\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">ResNet50&#8217;s skip connections proved superior for this task, achieving both perfect accuracy and the lowest loss. The architectural innovation of residual connections isn&#8217;t just theoretical\u2014it translates to real performance gains.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>3. Building From Scratch Has Value<\/strong><\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">While more time-consuming, building the Vanilla CNN taught me fundamentals that made working with VGG19 and ResNet50 much easier. 
Understanding why certain layer combinations cause vanishing or exploding gradients helped me debug issues faster.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><strong>4. Regularization Requires Careful Tuning<\/strong><\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">One of the most surprising findings was that removing regularization entirely from the transfer learning models actually improved performance. This suggests that ImageNet pre-training already provided strong regularization through learned feature representations.<\/p>\n<h2 class=\"font-claude-response-heading text-text-100 mt-1 -mb-0.5\">Conclusion &amp; Future Directions<\/h2>\n<p class=\"font-claude-response-body whitespace-normal break-words\">This project demonstrated that for image classification tasks, transfer learning offers a compelling advantage: <strong>faster training, higher accuracy, and less manual tuning<\/strong>. However, understanding how to build CNNs from scratch remains valuable for developing intuition about deep learning.<\/p>\n<h3 class=\"font-claude-response-subheading text-text-100 mt-1 -mb-1.5\">Potential Improvements<\/h3>\n<p class=\"font-claude-response-body whitespace-normal break-words\">If I were to extend this project, I would:<\/p>\n<ul class=\"[&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc space-y-2.5 pl-7\">\n<li class=\"whitespace-normal break-words\"><strong>Expand to all 15 vegetable classes<\/strong> to test model robustness<\/li>\n<li class=\"whitespace-normal break-words\"><strong>Implement data augmentation<\/strong> (rotations, flips, color jittering) to improve generalization<\/li>\n<li class=\"whitespace-normal break-words\"><strong>Try newer architectures<\/strong> like EfficientNet or Vision Transformers<\/li>\n<li class=\"whitespace-normal break-words\"><strong>Deploy the best model<\/strong> as a web application for real-time classification<\/li>\n<li 
class=\"whitespace-normal break-words\"><strong>Analyze misclassified images<\/strong> to understand model limitations<\/li>\n<li class=\"whitespace-normal break-words\"><strong>Experiment with ensemble methods<\/strong> combining multiple models<\/li>\n<\/ul>\n<h3 class=\"font-claude-response-subheading text-text-100 mt-1 -mb-1.5\">Final Thoughts<\/h3>\n<p class=\"font-claude-response-body whitespace-normal break-words\">Whether you&#8217;re building a custom model or using transfer learning, the key is understanding your data and iterating based on results. This project reinforced that machine learning is as much art as science\u2014requiring experimentation, patience, and continuous learning.<\/p>\n<p class=\"font-claude-response-body whitespace-normal break-words\">If you&#8217;re starting with image classification, my recommendation is clear: <strong>start with transfer learning<\/strong> to get quick wins and build momentum, but don&#8217;t skip learning the fundamentals of CNNs. The combination of both approaches will make you a more effective practitioner.<\/p>\n<hr class=\"border-border-300 my-2\" \/>\n<p class=\"font-claude-response-body whitespace-normal break-words\"><em>All experiments were conducted using TensorFlow Keras on Google Colab with L4 GPU acceleration. The complete code and detailed results are available in my research report.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Abstract Deep learning has revolutionized computer vision, making it possible to build highly accurate image classification systems with relatively little effort. But how much difference does it really make to use pre-trained models versus building your own from scratch? 
In this project, I set out to answer that question by building a vegetable image classifier [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,17,31,18,28,8,29,30],"tags":[],"class_list":["post-403","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","category-computer-vision","category-convolutional-neural-network-cnn","category-deep-learning","category-resnet","category-self-project","category-tensorflow","category-vgg"],"_links":{"self":[{"href":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/index.php?rest_route=\/wp\/v2\/posts\/403","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=403"}],"version-history":[{"count":2,"href":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/index.php?rest_route=\/wp\/v2\/posts\/403\/revisions"}],"predecessor-version":[{"id":405,"href":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/index.php?rest_route=\/wp\/v2\/posts\/403\/revisions\/405"}],"wp:attachment":[{"href":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=403"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=403"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/yosua-kristianto.devcraftlabs.my.id\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=403"}],"curies":[
{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}