Building an Emotion Aware Response Model through Sentiment Analysis and Information Crowding Architecture

Abstract

On the verge of building my own mobile-based virtual assistant, I present McAstr, which is designed to respond with emotion-aware facial expressions based on the sentiment of its replies. Large language models (LLMs) have trended in recent years, and the go-to neural network architectures and transformer-based models like BERT (Bidirectional Encoder Representations from Transformers) are known for their resource-intensive infrastructure. This is a problem when building a natural language processing model in a resource-limited environment. To address this issue, I trained a comparatively lightweight Support Vector Classifier (SVC) model to perform emotion analysis. To address its limitations in generalization and adaptability, I designed a feedback-driven architecture that enables continuous model improvement through user interaction. This blog documents the engineering journey, from infrastructure design to deployment and feedback integration, offering insights into building adaptive AI systems.

1. Introduction

The idea behind McAstr was simple: what if a virtual assistant could not only respond with words, but also express emotions visually? As I began building McAstr, I realized that sentiment analysis would be the key to making this assistant emotionally intelligent. But building a model that works well in resource-limited environments, while still generalizing to unseen input, is no small feat. This completely ruled out transformer-based models and deep neural networks, pushing me toward more traditional machine learning approaches.

Figure 1.1 McAstr Virtual Assistant Concept

However, the real challenge emerged during model testing. While the initial emotion analysis model generalized well to data similar to its original dataset, it struggled with the messy, unpredictable nature of real-world user input. The model was static, frozen in time with the knowledge it had gained during training, and I needed a way to let it learn from its interactions without requiring manual retraining over time.

Traditional machine learning deployments follow a linear path: train a model, deploy it, and hope it works. When it does not, the data scientist manually collects failure cases, retrains, and redeploys the model. The model also sometimes makes surprising mistakes: by nature, a machine learning model recognizes patterns it has seen before, which makes it only as good as its training data, no matter how large or diverse the dataset. This traditional deployment process is slow, labor-intensive, and hard to scale. I wanted something different: a system that could learn continuously from its mistakes without relying heavily on human intervention. This led me to design what I call the information crowding architecture, a feedback loop where user corrections automatically flow back into the training pipeline, creating a self-improving system. Achieving this requires an automated Machine Learning Operations (MLOps) architecture.

2. What is MLOps?

MLOps can be defined as automated machine learning operations [1]. It is comparable to the continuous integration / continuous deployment (CI/CD) systems of software engineering, where the application deployment process is automated and customized to business needs and team or technical capabilities. It automates a deployment pipeline standardized around version control, testing, compiling/building, and finally deployment of the application.

Figure 2.1 Machine Learning Workflow

Machine learning deployment also has a workflow [2]. As pictured in Figure 2.1, the standard case starts with dataset pre-processing/normalization to put the data into a model-readable format, followed by data segmentation to determine which data is used for training, testing, and validation. The process then continues with hyperparameter tuning and model training on the training and validation segments, which takes the longest time within the workflow but is the most crucial step in building a machine learning model. The trained model is then evaluated on the testing segment to verify its generalization. If the evaluation criteria are satisfied, the model is deployed to the server. MLOps is meant to automate these processes of building and deploying a machine learning model.

3. Implementation

The information crowding architecture refers to the method I implemented to update the knowledge of the emotion analysis model. This architecture is best suited to server-side model deployment, where the model is deployed and used centrally. Information crowding is done by building a web application capable of running a feedback loop on the model's prediction output. To build the architecture, a simple web infrastructure with a RESTful API technology stack is employed, as shown in Table 3.1 below:

Table 3.1  Technology Stack

No. Stack Technology
1 Web Front-End Vue.js (TypeScript)
2 Backend API FastAPI (Python)
3 Database MySQL 8
4 Machine Learning Operations (MLOps) Python

The business process is pretty simple. The user navigates to the site, fills in the prompt with the phrase whose emotion they want to express, and submits it. Through FastAPI, the model runs in the background and produces, as output, an expression image pre-mapped to the predicted label. The returned image is then loaded in the web interface, which also reveals a hidden feedback button that only appears after the output is produced. If the user clicks the feedback button, they may provide the correct output by selecting one among the images representing each label. This business process is drawn in Figure 3.1, and the application process is shown in Figure 3.2 below:

Figure 3.1 Feedback Loop Business Process

Figure 3.2 McAstr PA Frontend Application Feedback Loop Process

The implementation of this project employs a high-availability web platform to run the feedback loop business process, with a RESTful API architecture to embed the emotion analysis model. Leveraging MLOps and the business process discussed earlier, I built an automatic knowledge update for the emotion analysis model. As mentioned in the MLOps section, building a machine learning model requires a workflow. The premise of this blog post is not to discuss the specific choices made in building the model, but rather how we can automate the model-building workflow, which is achieved through MLOps.

Figure 3.2 The Environment Architecture Concept

3.2 Data Design

Figure 3.3 Entity Relationship Diagram

To support continuous learning and maintain a clear separation of concerns within the McAstr MLOps pipeline, I designed a lightweight yet structured relational schema using MySQL. The goal of the data design is to make the model's lifecycle transparent, auditable, and reproducible while keeping the implementation simple enough to maintain.

The database consists of five tables, each responsible for a specific part of the machine learning workflow, ranging from knowledge storage to configuration management.

1. mlops_tbl_version

This table is used as a master reference for the currently active machine learning model. The primary responsibility of this table is to track which model version is currently deployed.

2. mlops_tbl_model_log

All transactional events related to the model's inference in providing emotion predictions, as well as the feedback loops, are stored in this table. This table acts as the bridge between user interaction and model improvement: it logs each sentiment prediction made by the model and stores user feedback as label-correction records when users select a more accurate emotion.

3. mlops_tbl_knowledge

This table is used as central knowledge storage, holding all labelled text samples used for training, testing, and validation of the model. The data combines the original dataset with newly acquired samples from user feedback drawn from the model log. This provides a reliable foundation for retraining the model through ETL pipelines while preserving historical knowledge as the model evolves. Over time, this table becomes a hybrid dataset of formal benchmark data and real user-generated data, which allows the model to specialize in the audience's domain.

4. ml_tbl_emotion_pic_repo

To make the system more engaging, instead of returning the emotion label as text or raw label data, I chose images that represent each emotion visually and hand-mapped them to the correct labels. This table stores the CDN URL of each mapped image according to its emotion label, allowing the frontend to retrieve expression images consistently based on predicted labels.

5. gco_tbl_config

This table holds miscellaneous configuration parameters needed by either the backend or the MLOps engine.
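As a rough illustration of this data design, the five tables might be declared as below. This is a sketch only: SQLite stands in for MySQL 8, and the column names are my guesses since the post does not list the exact schema.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL database; column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mlops_tbl_version (
    version_id INTEGER PRIMARY KEY,
    model_path TEXT NOT NULL,
    accuracy   REAL,
    is_active  INTEGER DEFAULT 0,   -- which model version is currently deployed
    created_at TEXT
);
CREATE TABLE mlops_tbl_model_log (
    feedback_id     TEXT PRIMARY KEY,  -- UUID returned to the frontend
    prompt          TEXT NOT NULL,
    predicted_label TEXT,
    corrected_label TEXT,              -- filled in when the user submits feedback
    created_at      TEXT
);
CREATE TABLE mlops_tbl_knowledge (
    prompt_id TEXT PRIMARY KEY,        -- upsert key for the ETL pipeline
    text      TEXT NOT NULL,
    label     TEXT NOT NULL
);
CREATE TABLE ml_tbl_emotion_pic_repo (
    label     TEXT PRIMARY KEY,
    image_uri TEXT NOT NULL            -- CDN URL of the mapped expression image
);
CREATE TABLE gco_tbl_config (
    config_key   TEXT PRIMARY KEY,
    config_value TEXT
);
""")
```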

3.3 Information Crowding Code Implementation

As a Web-based application with REST API architecture, most of the code implementation lies in integration. There are two core business functions to be automated with this application:

1. Emotion Analysis

Figure 3.4 Emotion Analysis Screen

The user navigates to the web application, enters the prompt to be analyzed, and clicks the “Generate Expression” button. In the background, this action sends the prompt directly to the FastAPI backend with a request as follows:

POST /api/v1/text-analyzer
Request:

{
    "text": "I CANT BELIEVE THIS HAPPENED TO ME!!!!!"
}

Response:

{
    "status": true,
    "code": "200",
    "message": "Text analysis completed!",
    "data": {
        "image_uri": "https://somecdnuri.com/some-end-point/emotion.jpg",
        "feedback_id": "53c453d4-2fdd-40d8-97e4-136f8bd8f437"
    }
}
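A minimal, framework-agnostic sketch of what the backend handler behind this endpoint might do. The real implementation uses FastAPI with the trained SVC loaded from a .joblib file; here `predict_emotion` and the image lookup are hypothetical stand-ins to show the response shape only.

```python
import uuid

# Hypothetical stand-in for the image mapping in ml_tbl_emotion_pic_repo.
EMOTION_IMAGES = {"anger": "https://somecdnuri.com/some-end-point/emotion.jpg"}

def predict_emotion(text: str) -> str:
    """Placeholder for the real SVC inference call."""
    return "anger" if text.isupper() or "!" in text else "neutral"

def analyze_text(text: str) -> dict:
    """Mirror the /api/v1/text-analyzer response: an expression image URI plus
    a feedback_id that lets the user correct the prediction later."""
    label = predict_emotion(text)
    feedback_id = str(uuid.uuid4())
    # In the real system the prediction is also written to mlops_tbl_model_log here.
    return {
        "status": True,
        "code": "200",
        "message": "Text analysis completed!",
        "data": {
            "image_uri": EMOTION_IMAGES.get(label, EMOTION_IMAGES["anger"]),
            "feedback_id": feedback_id,
        },
    }
```

The feedback_id is the thread that ties the prediction to any later correction, so generating and persisting it at prediction time is what makes the feedback loop possible.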

2. Feedback System

Figure 3.5 The Feedback System Modal

Once the frontend receives the representative image, the user can choose to give feedback on the result. In the background, the frontend takes the feedback_id returned in the earlier emotion analysis API response and requests the feedback prompts, which contain a list of randomized images taken from the picture repository table. Each picture represents a unique label that can be chosen and submitted to revise the prediction.

RETRIEVE PICTURE REVIEWS

POST /api/v1/text-analyzer/feedback/review
Request:
{
    "feedback_id": "53c453d4-2fdd-40d8-97e4-136f8bd8f437"
}

Response:
{
    "status": true,
    "code": "200",
    "message": "Feedback provided",
    "data": {
        "data_0": {
            "image_uri": "https://somecdnuri.com/some-end-point/emotion-0.jpg",
            "choice_id": 0
        },
        "data_1": {
            "image_uri": "https://somecdnuri.com/some-end-point/emotion-1.jpg",
            "choice_id": 1
        },
        "data_2": {
            "image_uri": "https://somecdnuri.com/some-end-point/emotion-2.jpg",
            "choice_id": 2
        },
        "data_3": {
            "image_uri": "https://somecdnuri.com/some-end-point/emotion-3.jpg",
            "choice_id": 3
        },
        "data_4": {
            "image_uri": "https://somecdnuri.com/some-end-point/emotion-4.jpg",
            "choice_id": 4
        },
        "data_5": {
            "image_uri": "https://somecdnuri.com/some-end-point/emotion-5.jpg",
            "choice_id": 5
        }
    }
}
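The review endpoint essentially shuffles one image per label into the data_0 … data_5 choice list. A hedged sketch of that logic follows; the label names and file names are hypothetical, since the post does not enumerate the emotion classes.

```python
import random

# Hypothetical label set; real URIs come from ml_tbl_emotion_pic_repo.
LABEL_IMAGES = {
    "joy": "emotion-joy.jpg",
    "sadness": "emotion-sadness.jpg",
    "anger": "emotion-anger.jpg",
    "fear": "emotion-fear.jpg",
    "love": "emotion-love.jpg",
    "surprise": "emotion-surprise.jpg",
}

def build_review_choices(seed=None):
    """Return (choices, label_order): choices mirrors the data_0..data_5
    response shape, and label_order must be persisted per feedback_id so the
    submitted review_id can be mapped back to a label server-side."""
    labels = list(LABEL_IMAGES)
    random.Random(seed).shuffle(labels)  # randomized so users judge the image, not its position
    choices = {
        f"data_{i}": {"image_uri": LABEL_IMAGES[label], "choice_id": i}
        for i, label in enumerate(labels)
    }
    return choices, labels
```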

FEEDBACK SUBMISSION
POST /api/v1/text-analyzer/feedback/submit

Request:
{
    "feedback_id": "53c453d4-2fdd-40d8-97e4-136f8bd8f437",
    "review_id": 3
}

Response:
{
    "status": true,
    "code": "200",
    "message": "Feedback successfully submitted!",
    "data": {
        "feedback_id": "53c453d4-2fdd-40d8-97e4-136f8bd8f437"
    }
}  
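Submitting feedback then amounts to writing the corrected label against the logged prediction. A sketch, again with SQLite standing in for MySQL and illustrative column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mlops_tbl_model_log (
    feedback_id     TEXT PRIMARY KEY,
    prompt          TEXT,
    predicted_label TEXT,
    corrected_label TEXT
)""")
# A logged prediction awaiting possible correction.
conn.execute(
    "INSERT INTO mlops_tbl_model_log VALUES (?, ?, ?, NULL)",
    ("53c453d4-2fdd-40d8-97e4-136f8bd8f437",
     "I CANT BELIEVE THIS HAPPENED TO ME!!!!!", "joy"),
)

def submit_feedback(feedback_id, corrected_label):
    """Record the user's correction; the ETL later promotes it into knowledge."""
    conn.execute(
        "UPDATE mlops_tbl_model_log SET corrected_label = ? WHERE feedback_id = ?",
        (corrected_label, feedback_id),
    )
    conn.commit()

submit_feedback("53c453d4-2fdd-40d8-97e4-136f8bd8f437", "anger")
```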


All model predictions and user revisions are saved in the model log table, as shown in Figure 3.6 below:

Figure 3.6 Transactional Data within Model Log Table

3.4 MLOps Code Implementation

The MLOps worker layer is the component that closes the feedback loop by automatically running the knowledge ETL (Extract, Transform, Load) process and retraining the model. The ETL pipeline begins by extracting newly collected knowledge from the mlops_tbl_model_log table, then inserting or updating it into the mlops_tbl_knowledge table. To avoid duplication, each entry uses an upsert mechanism based on the prompt’s unique ID, ensuring that repeated prompts are updated rather than redundantly reinserted. The system also uses a daily data cut-off by checking the created_at attribute in the model log, so only fresh entries are processed during each ETL cycle. Although the Python script itself does not include an internal scheduler, it is easily automated at the server level. For example, on a Linux-based server, a simple cron job can run python main.py inside the deployed project directory at any chosen schedule.
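The upsert-with-cutoff ETL described above can be sketched as follows. SQLite stands in for MySQL (MySQL would use INSERT ... ON DUPLICATE KEY UPDATE instead of ON CONFLICT), and the column names are illustrative; user corrections take precedence over the model's original guess.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mlops_tbl_model_log (
    prompt_id TEXT, prompt TEXT, predicted_label TEXT,
    corrected_label TEXT, created_at TEXT
);
CREATE TABLE mlops_tbl_knowledge (
    prompt_id TEXT PRIMARY KEY, text TEXT, label TEXT
);
""")

def run_etl(cutoff):
    """Extract fresh log entries past the daily cut-off, prefer the user's
    corrected label over the model's prediction, and upsert into knowledge
    keyed on the prompt's unique ID so repeats are updated, not duplicated."""
    rows = conn.execute(
        """SELECT prompt_id, prompt, COALESCE(corrected_label, predicted_label)
           FROM mlops_tbl_model_log WHERE created_at >= ?""",
        (cutoff,),
    ).fetchall()
    conn.executemany(
        """INSERT INTO mlops_tbl_knowledge (prompt_id, text, label)
           VALUES (?, ?, ?)
           ON CONFLICT(prompt_id) DO UPDATE SET
               text = excluded.text, label = excluded.label""",
        rows,
    )
    conn.commit()

# A stale entry before the cut-off and a corrected entry after it.
conn.execute("INSERT INTO mlops_tbl_model_log VALUES ('p1','hello','joy',NULL,'2025-01-01')")
conn.execute("INSERT INTO mlops_tbl_model_log VALUES ('p1','hello','joy','sadness','2025-01-02')")
run_etl("2025-01-02")
```

A cron entry such as `0 2 * * * cd /path/to/mlops && python main.py` (path hypothetical) would then run one cycle nightly, matching the daily cut-off.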

Once the knowledge update is completed, the MLOps worker moves into the Machine Learning workflow, which follows the standard Natural Language Processing pipeline. The process starts with loading all data from mlops_tbl_knowledge, followed by a full normalization sequence: lowercasing, stop word removal, punctuation cleaning, stemming, and vectorization. After preprocessing, the data is segmented into train, test, and validation sets. The model is then trained, evaluated, and its performance is compared against the currently deployed model version recorded in mlops_tbl_version.
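The first steps of that normalization sequence can be shown in pure Python. This sketch covers only lowercasing, punctuation cleaning, and stop-word removal with a toy stop-word list; in the real pipeline, stemming and vectorization would typically come from libraries such as NLTK and scikit-learn.

```python
import re
import string

# Toy stop-word list for illustration; a real pipeline uses a full list.
STOP_WORDS = {"i", "the", "a", "an", "to", "is", "this", "me"}

def normalize(text):
    """Lowercase, strip punctuation, and drop stop words -- the first steps of
    the preprocessing sequence before stemming and vectorization."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = re.split(r"\s+", text.strip())
    return [t for t in tokens if t and t not in STOP_WORDS]
```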

If the newly trained model performs better than the active one, a new entry is inserted into the version table to register this model as the latest version. Its learned parameters are exported as a .joblib file, ready to be served by FastAPI, along with an automatically generated Excel report containing evaluation metrics such as accuracy and the confusion matrix. Importantly, regardless of whether the new model outperforms the previous one, both the model weights and the accompanying report are always backed up in their designated storage directory. This ensures complete traceability and historical preservation of every training cycle.

Source Code

The code for all implementations in this blog can be viewed and used via the GitHub links below:

No. Stack Repository
1 Web Front-End McAstr PA Frontend
2 Backend API McAstr PA Backend
3 Machine Learning Operations (MLOps) McAstr PA MLOps

References

[1] Wazir, S., Kashyap, G. S., & Saxena, P. (2023). MLOps: A Review. https://arxiv.org/pdf/2308.10908

[2] Zhengxin, F., Yi, Y., Jingyu, Z., Yue, L., Yuechen, M., Qinghua, L., Xiwei, X., Jeff, W., Chen, W., Shuai, Z., & Shiping, C. (2023). MLOps Spanning Whole Machine Learning Life Cycle: A Survey. https://arxiv.org/abs/2308.10908

Contribution

Interested in contributing to this repository? Any questions or feedback?

Feel free to email me at yosua_kristianto144@outlook.com
