[7] Nginx Server Setup - Building a Text Detection API using Drogon Server with OpenCV

This post sets up text detection on RHEL and integrates it with a Drogon server, exposing an API that extracts text-location information from an image and returns it to clients.

Note: The purpose is Text Detection, not OCR (Optical Character Recognition). OCR will be integrated later.


0. Search for Text Detection Frameworks

Framework | Language | Pros | Cons
Tesseract OCR | C++ (Python bindings, etc.) | Open source, supports many languages, customizable | Performance can degrade without preprocessing; requires OpenCV integration for detection
OpenCV + EAST model | C++, Python | Fast text detection, supports both CPU and GPU, easy integration with OpenCV | Detection only, so OCR is needed separately; performance limits with high-resolution images
EasyOCR | Python | PyTorch-based, supports many languages and styles, easy to install and use | Can be slow on CPU; requires a GPU for high performance
Amazon Textract (API) | Multi-language | Serverless deployment on AWS, advanced features (table and form recognition), stable performance | Potentially high cost for large datasets, cloud-only, security considerations
Google Vision API (API) | Multi-language | Comprehensive text detection and image analysis, serverless deployment on GCP | High cost for large datasets, cloud-only, data security concerns
PaddleOCR | Python | Excellent performance on Asian languages, lightweight model with fast CPU processing | Based on PaddlePaddle, a barrier for PyTorch/TensorFlow users; requires Python for integration
  • External API calls are excluded.
  • Since the objective is detection, proceed with OpenCV + EAST model.
  • The API will return the coordinates of detected text within the image and generate a new image with frames drawn around the detected text ($filename_detection.jpg); a sample response is sketched below.
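For illustration only, a response could look like the following. The "detections", "x", "y", "width", and "height" field names match the controller code further below; the values here are made up:

{
  "detections": [
    { "x": 152, "y": 88, "width": 210, "height": 42 },
    { "x": 40, "y": 310, "width": 96, "height": 38 }
  ]
}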

1. OpenCV Setup and Required Library Installation

Proceed with the following steps (initially implemented via shell):

#!/bin/sh
sudo yum update -y
sudo yum groupinstall -y "Development Tools"
sudo yum install -y epel-release
sudo yum install -y cmake git gtk2-devel boost-devel
sudo yum install -y libjpeg-turbo-devel libpng-devel libtiff-devel
sudo yum install -y libdc1394-devel gstreamer1-devel gstreamer1-plugins-base-devel
sudo yum install -y tbb-devel

# OpenCV 4.10.0
wget https://github.com/opencv/opencv/archive/refs/tags/4.10.0.tar.gz
mv 4.10.0.tar.gz opencv.tar
tar -xvf opencv.tar
mv opencv-4.10.0 opencv

# OpenCV Contrib 4.10.0
wget https://github.com/opencv/opencv_contrib/archive/refs/tags/4.10.0.tar.gz
mv 4.10.0.tar.gz opencv_contrib.tar
tar -xvf opencv_contrib.tar
mv opencv_contrib-4.10.0 opencv_contrib

# OpenCV Build
cd ./opencv
mkdir build
cd build

# CMake
cmake -DOPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
      -DCMAKE_BUILD_TYPE=Release \
      -DBUILD_SHARED_LIBS=ON \
      -DWITH_IPP=ON \
      -DWITH_TBB=ON \
      -DWITH_OPENMP=ON \
      -DENABLE_FAST_MATH=ON \
      -DCMAKE_INSTALL_PREFIX=/usr/local \
      -DBUILD_EXAMPLES=OFF \
      -DBUILD_TESTS=OFF \
      -DBUILD_PERF_TESTS=OFF \
          ..

# Build and Install
make -j$(nproc)  # Builds using all CPU cores
sudo make install
sudo ldconfig  # Updates the library cache

With CMAKE_INSTALL_PREFIX set to /usr/local, this installs OpenCV into the system path by default.
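As an optional sanity check (not part of the original write-up), a minimal program such as the one below can confirm that the installed headers and libraries are reachable. check_opencv.cpp and the compile flags are illustrative; with the prefix above, headers normally land under /usr/local/include/opencv4 and libraries under /usr/local/lib or /usr/local/lib64 depending on the platform.

// check_opencv.cpp -- minimal sketch to verify the OpenCV install
// Example build (adjust the library directory to lib or lib64 as needed):
//   g++ check_opencv.cpp -o check_opencv -I/usr/local/include/opencv4 -L/usr/local/lib64 -lopencv_core
#include <opencv2/core.hpp>
#include <iostream>

int main() {
    // CV_VERSION is defined by the OpenCV headers; it should report 4.10.0 for this build
    std::cout << "OpenCV version: " << CV_VERSION << std::endl;
    return 0;
}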


2. API Development

Continue using the previously set up server.

2-1. Create the Controller with drogon_ctl

cd /root/drogon2/drogon/build/drogon_ctl 
drogon_ctl create controller TextDetectionController

2-2. Implement TextDetection (source location: /root/drogon2/drogon/build/drogon_ctl/testAPI/controllers)

TextDetectionController.h

#pragma once
#include <drogon/HttpController.h>
#include <opencv2/opencv.hpp>
#include <functional>
#include <string>
#include <vector>

using namespace drogon;

void initializeModel();

class TextDetectionController : public HttpController<TextDetectionController>
{
public:
    std::string _storagePath = "/root/storage/";
    METHOD_LIST_BEGIN
    // Process text detection for images located in src and save to dst
    ADD_METHOD_TO(TextDetectionController::handleTextDetection, "/text-detection", Get);
    METHOD_LIST_END

    // Method declarations
    void handleTextDetection(const HttpRequestPtr& req, std::function<void(const HttpResponsePtr&)>&& callback);
    std::vector<cv::RotatedRect> decodeBoundingBoxes(const cv::Mat& scores, const cv::Mat& geometry, float scoreThresh);
};

TextDetectionController.cc

#include "TextDetectionController.h"
#include <opencv2/opencv.hpp>
#include <filesystem>

using namespace cv;
using namespace cv::dnn;
namespace {
    cv::dnn::Net eastNet;
    const std::string eastModelPath = "./frozen_east_text_detection.pb";
}

// OpenCV Initialization
// Since model initialization takes time, set globally for efficiency
// Note: this is thread-unsafe and would need improvement for production server logic
void initializeModel() {
    if (eastNet.empty()) {
        eastNet = cv::dnn::readNet(eastModelPath);
        if (eastNet.empty()) {
            throw std::runtime_error("Failed to load EAST model");
        }
    }
}

std::vector<cv::RotatedRect> TextDetectionController::decodeBoundingBoxes(const cv::Mat& scores, const cv::Mat& geometry, float scoreThresh)
{
    std::vector<cv::RotatedRect> detections;
    const int numRows = scores.size[2];
    const int numCols = scores.size[3];

    for (int y = 0; y < numRows; y++) {
        const float* scoresData = scores.ptr<float>(0, 0, y);
        const float* x0_data = geometry.ptr<float>(0, 0, y);
        const float* x1_data = geometry.ptr<float>(0, 1, y);
        const float* x2_data = geometry.ptr<float>(0, 2, y);
        const float* x3_data = geometry.ptr<float>(0, 3, y);
        const float* anglesData = geometry.ptr<float>(0, 4, y);

        for (int x = 0; x < numCols; x++) {
            float score = scoresData[x];
            if (score < scoreThresh)
                continue;

            float offsetX = x * 4.0;
            float offsetY = y * 4.0;
            float angle = anglesData[x];
            float cosA = cos(angle);
            float sinA = sin(angle);
            float h = x0_data[x] + x2_data[x];
            float w = x1_data[x] + x3_data[x];

            Point2f offset(offsetX + cosA * x1_data[x] + sinA * x2_data[x],
                           offsetY - sinA * x1_data[x] + cosA * x2_data[x]);
            Point2f p1 = Point2f(-sinA * h, -cosA * h) + offset;
            Point2f p3 = Point2f(-cosA * w, sinA * w) + offset;
            RotatedRect rrect(0.5f * (p1 + p3), Size2f(w, h), -angle * 180.0f / CV_PI);
            detections.push_back(rrect);
        }
    }

    return detections;
}

void TextDetectionController::handleTextDetection(const HttpRequestPtr& req, std::function<void(const HttpResponsePtr&)>&& callback)
{
    HttpStatusCode code = k200OK;

    do
    {
        auto filename = req->getParameter("filename");
        if (filename.empty())
        {
            code = k404NotFound;
            break;
        }

        std::string path = _storagePath + filename;
        // Read the image from the server's local storage
        Mat image = imread(path);
        if (image.empty())
        {
            code = k404NotFound;
            break;
        }

        // Set image dimensions and resize ratio
        int origH = image.rows;
        int origW = image.cols;
        int newW = 320;
        int newH = 320;
        float rW = static_cast<float>(origW) / newW;
        float rH = static_cast<float>(origH) / newH;

        // Create blob and set input for the network
        Mat blob = blobFromImage(image, 1.0, Size(newW, newH), Scalar(123.68, 116.78, 103.94), true, false);
        eastNet.setInput(blob);

        // Set output layers for the EAST model
        std::vector<String> outputLayers = {"feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"};
        std::vector<Mat> outs;
        eastNet.forward(outs, outputLayers);

        // Perform text detection
        std::vector<RotatedRect> detections = decodeBoundingBoxes(outs[0], outs[1], 0.7);
        if (detections.empty())
        {
            // Return 404 if no text is detected
            code = k404NotFound;
            break;
        }

        // Return results in JSON format
        Json::Value jsonResponse;
        for (const auto& detection : detections) {
            Rect boundingBox = detection.boundingRect();
            boundingBox.x *= rW;
            boundingBox.y *= rH;
            boundingBox.width *= rW;
            boundingBox.height *= rH;

            Json::Value box;
            box["x"] = boundingBox.x;
            box["y"] = boundingBox.y;
            box["width"] = boundingBox.width;
            box["height"] = boundingBox.height;
            jsonResponse["detections"].append(box);
        }

        // Generate the response
        auto response = HttpResponse::newHttpJsonResponse(jsonResponse);

        // Process Text Area Rectangle
        // Draw rectangles around detected text areas
        for (const auto& detection : detections)
        {
            Rect boundingBox = detection.boundingRect();
            boundingBox.x *= rW;
            boundingBox.y *= rH;
            boundingBox.width *= rW;
            boundingBox.height *= rH;

            rectangle(image, boundingBox, Scalar(0, 255, 0), 2);  // Draw green rectangle
        }
        // Save the image
        std::filesystem::path outputPath = _storagePath + (std::filesystem::path(path).stem().string() + "_detection.jpg");
        imwrite(outputPath.string(), image);

        callback(response);
        return ;
    }
    while(false);

    auto resp = HttpResponse::newHttpResponse();
    resp->setStatusCode(code);
    callback(resp);
}
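One optional refinement, not included in the controller above: decodeBoundingBoxes returns every box above the score threshold, so overlapping detections of the same word are common. OpenCV's dnn module provides cv::dnn::NMSBoxes (the overload taking RotatedRect boxes, as used by the official EAST sample) to suppress such overlaps. The sketch below assumes decodeBoundingBoxes is extended to also collect a confidence value per box; suppressOverlaps and the confidences parameter are illustrative names, not part of the original code.

// Sketch only: merge overlapping EAST detections with non-maximum suppression.
// Assumes a parallel vector of confidences gathered while decoding (hypothetical extension).
#include <opencv2/dnn.hpp>
#include <vector>

static std::vector<cv::RotatedRect> suppressOverlaps(const std::vector<cv::RotatedRect>& boxes,
                                                     const std::vector<float>& confidences,
                                                     float scoreThresh = 0.7f,
                                                     float nmsThresh = 0.4f)
{
    std::vector<int> keptIndices;
    // NMSBoxes keeps the highest-scoring box among heavily overlapping candidates
    cv::dnn::NMSBoxes(boxes, confidences, scoreThresh, nmsThresh, keptIndices);

    std::vector<cv::RotatedRect> kept;
    kept.reserve(keptIndices.size());
    for (int idx : keptIndices)
        kept.push_back(boxes[idx]);
    return kept;
}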

main.cc

#include <drogon/drogon.h>
#include <iostream>
#include "controllers/TextDetectionController.h"

int main() {
    try
    {
        // Load model during server initialization
        initializeModel();
    }
    catch (const std::exception &e)
    {
        std::cerr << "Error initializing model: "
                << e.what() << std::endl;
        return 1;  // Stop server if model loading fails
    }
    drogon::app().loadConfigFile("../config.json");

    LOG_INFO << "Server RUN";
    drogon::app().run();

    return 0;
}
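As noted in the comment inside initializeModel(), the lazy-loading pattern there is not thread-safe. In this setup it is harmless because main() loads the model once before drogon::app().run() starts the worker threads, but if the model were ever loaded lazily from a request handler, a std::call_once guard would be a minimal fix. The sketch below is an illustration only (initializeModelSafely and the names inside it are not part of the original code); note also that cv::dnn::Net is not guaranteed to be safe for concurrent forward() calls, so a production setup might keep one Net per worker thread or serialize inference with a mutex.

// Sketch: thread-safe, one-time model loading with std::call_once (illustrative only)
#include <mutex>
#include <stdexcept>
#include <string>
#include <opencv2/dnn.hpp>

namespace {
    std::once_flag eastOnceFlag;
    cv::dnn::Net eastNetShared;   // stands in for the file-scope eastNet used above
}

void initializeModelSafely(const std::string& modelPath) {
    std::call_once(eastOnceFlag, [&modelPath]() {
        eastNetShared = cv::dnn::readNet(modelPath);
        if (eastNetShared.empty())
            throw std::runtime_error("Failed to load EAST model: " + modelPath);
    });
}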

2-3. Edit drogon CMake

cd /root/drogon2/drogon/build/drogon_ctl/testAPI
vi CMakeLists.txt
  • Modify as shown below to add TextDetectionController.cc after FileController.cc:
# Necessary changes:
# Add find_package(OpenCV CONFIG REQUIRED)
# Add ${OpenCV_LIBS} to target_link_libraries

# If OpenCV is not in the default path, specify the path (not needed if previous steps are followed)
# set(OpenCV_DIR "/path/to/opencv")  # Example: /usr/local/opencv
---

add_executable(${PROJECT_NAME} main.cc controllers/FileController.cc controllers/TextDetectionController.cc)

find_package(Drogon CONFIG REQUIRED)
find_package(OpenCV CONFIG REQUIRED) # Added
target_link_libraries(${PROJECT_NAME} PRIVATE Drogon::Drogon ${OpenCV_LIBS}) # Added

2-4. Build drogon

  • Run cmake followed by make
cd /root/drogon2/drogon/build/drogon_ctl/testAPI/build
cmake .. 
make

3. Execution

3-1. Download Model Before Running

cd /root/drogon2/drogon/build/drogon_ctl/testAPI/build

# Detection model (EAST)
wget -O frozen_east_text_detection.tar.gz "https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1"
tar -xzf frozen_east_text_detection.tar.gz  # the controller expects ./frozen_east_text_detection.pb in this directory

./testAPI

3-2. Execution and Upload

  • Use a sample text image as shown below.
(screenshot: sample text image)

  • Upload the file to the server via the upload endpoint (http://127.0.0.1:10099/doUpload/). (Refer to the previous page for the upload implementation.)
(screenshot: upload request)

  • After the upload, check the file list at http://127.0.0.1:10099/list/
(screenshot: file list)

  • Call the detection API: http://127.0.0.1:10099/text-detection?filename=image_in_text.PNG
(screenshot: text-detection API call)

  • After detection, check the updated file list at http://127.0.0.1:10099/list/
(screenshot: file list after detection)

  • Download and verify the detected image: http://127.0.0.1:10099/download/image_in_text_detection.jpg
(screenshot: OpenCV detection result)


