peanut 917b0e5848 docs: add analytics and tts implementation plans

2026-05-17 10:02:46 +08:00


Script Text-to-Speech Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add CPU-friendly open-source text-to-speech generation for user scripts, with cached backend tasks and mini program playback.

Architecture: backend-single creates TTS tasks, validates source ownership, cleans script text, checks audio cache, and calls a private Python FastAPI TTS service. The Python service runs on 101.200.208.45 and writes audio files under /data/uploads/emotion-museum/tts. The mini program requests generation from ScriptDetailView.vue, polls status, and plays the returned audio URL with uni.createInnerAudioContext().

Tech Stack: Spring Boot 2.7, MyBatis Plus, Java 17, MySQL, Python FastAPI, MeloTTS or equivalent CPU-friendly Chinese TTS engine, systemd, uni-app/Vue.

File Structure

Create sql/2026-05-17-tts-task.sql: t_tts_task migration.
Create backend-single/src/main/java/com/emotion/entity/TtsTask.java: task entity.
Create backend-single/src/main/java/com/emotion/mapper/TtsTaskMapper.java: task mapper.
Create backend-single/src/main/java/com/emotion/dto/request/tts/TtsTaskCreateRequest.java: create request.
Create backend-single/src/main/java/com/emotion/dto/response/tts/TtsTaskResponse.java: task response.
Create backend-single/src/main/java/com/emotion/service/TtsTaskService.java: contract.
Create backend-single/src/main/java/com/emotion/service/impl/TtsTaskServiceImpl.java: ownership, cache, async processing.
Create backend-single/src/main/java/com/emotion/service/TtsEngineClient.java: internal engine client contract.
Create backend-single/src/main/java/com/emotion/service/impl/HttpTtsEngineClient.java: calls Python service.
Create backend-single/src/main/java/com/emotion/controller/TtsController.java: user-facing TTS endpoints.
Modify backend-single/src/main/resources/application.yml: TTS config defaults.
Modify backend-single/src/main/resources/application-prod.yml: production paths and internal URL.
Create backend-single/src/test/java/com/emotion/service/TtsTaskServiceTest.java: service tests.
Create backend-single/tts-service/requirements.txt: FastAPI deps; MeloTTS is installed from the official repository because the official docs use editable install plus python -m unidic download.
Create backend-single/tts-service/app.py: FastAPI service.
Create backend-single/tts-service/README.md: server setup.
Create backend-single/tts-service/emotion-museum-tts.service: systemd unit.
Create mini-program/src/services/tts.js: TTS API client.
Create mini-program/src/components/ScriptAudioPlayer.vue: player component.
Modify mini-program/src/pages/main/ScriptDetailView.vue: add player.
Modify mini-program/src/services/analytics.js only if the analytics plan has not already exported track.

Defaults

sourceType: epic_script
voice: default_zh_female
Text limit: 5000 cleaned characters
Python internal URL: http://127.0.0.1:19110
Audio output directory: /data/uploads/emotion-museum/tts
Public URL prefix: /uploads/emotion-museum/tts or the existing static upload prefix configured by backend/Nginx
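
A minimal sketch of how these defaults could combine into a cache key and file locations. It assumes the key is an MD5 hex digest of the voice, a newline, and the cleaned text, which is what the Task 3 service implementation computes. The helper name `derive_audio_locations` is hypothetical and not part of the plan.

```python
import hashlib

# Values copied from the Defaults list above.
OUTPUT_DIR = "/data/uploads/emotion-museum/tts"
PUBLIC_URL_PREFIX = "/uploads/emotion-museum/tts"
DEFAULT_VOICE = "default_zh_female"
MAX_TEXT_LENGTH = 5000


def derive_audio_locations(cleaned_text, voice=DEFAULT_VOICE):
    """Hypothetical helper: map (voice, text) to a hash, a server path, and a public URL."""
    # Enforce the text limit before hashing so the hash matches the synthesized text.
    text = cleaned_text[:MAX_TEXT_LENGTH]
    digest = hashlib.md5((voice + "\n" + text).encode("utf-8")).hexdigest()
    filename = digest + ".mp3"
    return {
        "hash": digest,
        "audio_path": OUTPUT_DIR + "/" + filename,
        "audio_url": PUBLIC_URL_PREFIX + "/" + filename,
    }
```

Because the voice is part of the hashed input, the same script read by two voices maps to two different files, while repeated requests for the same text and voice map to one.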

Task 1: Add TTS SQL Migration

Files:

Create: sql/2026-05-17-tts-task.sql
Step 1: Create migration

CREATE TABLE IF NOT EXISTS t_tts_task (
    id VARCHAR(64) PRIMARY KEY COMMENT 'Primary key',
    user_id VARCHAR(64) NOT NULL COMMENT 'Owner user id',
    source_type VARCHAR(50) NOT NULL COMMENT 'Source type, for example epic_script',
    source_id VARCHAR(64) NOT NULL COMMENT 'Source content id',
    text_hash VARCHAR(128) NOT NULL COMMENT 'Hash of cleaned text and voice',
    text_length INT NOT NULL COMMENT 'Cleaned text length',
    voice VARCHAR(64) NOT NULL DEFAULT 'default_zh_female' COMMENT 'Voice id',
    status VARCHAR(20) NOT NULL DEFAULT 'pending' COMMENT 'pending, processing, success, failed',
    audio_url VARCHAR(500) NULL COMMENT 'Public audio URL',
    audio_path VARCHAR(500) NULL COMMENT 'Server audio path',
    duration_ms BIGINT NULL COMMENT 'Audio duration',
    error_message VARCHAR(1000) NULL COMMENT 'Failure message',
    request_count INT NOT NULL DEFAULT 1 COMMENT 'Cache hit request count',
    create_by VARCHAR(64) NULL COMMENT 'Creator',
    create_time DATETIME DEFAULT CURRENT_TIMESTAMP COMMENT 'Create time',
    update_by VARCHAR(64) NULL COMMENT 'Updater',
    update_time DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
    is_deleted TINYINT DEFAULT 0 COMMENT 'Logic delete flag',
    remarks VARCHAR(500) NULL COMMENT 'Remarks'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='Text-to-speech task table';

CREATE INDEX idx_tts_task_user_source ON t_tts_task (user_id, source_type, source_id);
CREATE INDEX idx_tts_task_text_hash ON t_tts_task (text_hash);
CREATE INDEX idx_tts_task_status ON t_tts_task (status);
CREATE INDEX idx_tts_task_create_time ON t_tts_task (create_time);
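
The status column above takes four values. The sketch below shows one way to reason about their lifecycle, assuming tasks only move from pending to processing and then to success or failed, which matches how the Task 3 service updates the row. The names `ALLOWED`, `can_transition`, and `is_terminal` are illustrative only.

```python
# Allowed transitions, inferred from the status column comment.
# "success" and "failed" have no outgoing transitions, so they are terminal.
ALLOWED = {
    "pending": {"processing"},
    "processing": {"success", "failed"},
    "success": set(),
    "failed": set(),
}


def can_transition(current, target):
    """Return True when moving from current to target is an allowed step."""
    return target in ALLOWED.get(current, set())


def is_terminal(status):
    """Return True for known statuses that have no outgoing transitions."""
    return status in ALLOWED and not ALLOWED[status]
```

A client that polls the task can stop as soon as `is_terminal` returns True for the reported status.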

Step 2: Commit

git add sql/2026-05-17-tts-task.sql
git commit -m "feat: add tts task table"

Task 2: Add Backend TTS Entity, DTOs, and Mapper

Files:

Create: backend-single/src/main/java/com/emotion/entity/TtsTask.java
Create: backend-single/src/main/java/com/emotion/mapper/TtsTaskMapper.java
Create: backend-single/src/main/java/com/emotion/dto/request/tts/TtsTaskCreateRequest.java
Create: backend-single/src/main/java/com/emotion/dto/response/tts/TtsTaskResponse.java
Step 1: Add entity

package com.emotion.entity;

import com.baomidou.mybatisplus.annotation.TableField;
import com.baomidou.mybatisplus.annotation.TableName;
import com.emotion.common.BaseEntity;
import lombok.*;
import lombok.experimental.SuperBuilder;

@Data
@EqualsAndHashCode(callSuper = true)
@SuperBuilder
@NoArgsConstructor
@AllArgsConstructor
@TableName("t_tts_task")
public class TtsTask extends BaseEntity {
    @TableField("user_id")
    private String userId;
    @TableField("source_type")
    private String sourceType;
    @TableField("source_id")
    private String sourceId;
    @TableField("text_hash")
    private String textHash;
    @TableField("text_length")
    private Integer textLength;
    @TableField("voice")
    private String voice;
    @TableField("status")
    private String status;
    @TableField("audio_url")
    private String audioUrl;
    @TableField("audio_path")
    private String audioPath;
    @TableField("duration_ms")
    private Long durationMs;
    @TableField("error_message")
    private String errorMessage;
    @TableField("request_count")
    private Integer requestCount;
}

Step 2: Add mapper

package com.emotion.mapper;

import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.emotion.entity.TtsTask;
import org.apache.ibatis.annotations.Mapper;

@Mapper
public interface TtsTaskMapper extends BaseMapper<TtsTask> {
}

Step 3: Add DTOs

package com.emotion.dto.request.tts;

import lombok.Data;

import javax.validation.constraints.NotBlank;
import javax.validation.constraints.Size;

@Data
public class TtsTaskCreateRequest {
    @NotBlank
    @Size(max = 50)
    private String sourceType;

    @NotBlank
    @Size(max = 64)
    private String sourceId;

    @Size(max = 64)
    private String voice;
}

package com.emotion.dto.response.tts;

import lombok.Builder;
import lombok.Data;

@Data
@Builder
public class TtsTaskResponse {
    private String id;
    private String sourceType;
    private String sourceId;
    private String status;
    private String voice;
    private String audioUrl;
    private Long durationMs;
    private String errorMessage;
}
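
For quick checks outside the JVM, the request constraints above can be mirrored in a few lines. This is a sketch only: it assumes `@NotBlank` means a non-null string with at least one non-whitespace character and `@Size(max = n)` means at most n characters. The function name `validate_create_request` is hypothetical.

```python
def validate_create_request(payload):
    """Return a list of violation messages; an empty list means the payload is acceptable."""
    errors = []
    # sourceType and sourceId are required and length-limited.
    for field, max_len in (("sourceType", 50), ("sourceId", 64)):
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(field + " must not be blank")
        elif len(value) > max_len:
            errors.append(field + " must be at most %d characters" % max_len)
    # voice is optional; the backend falls back to its default voice when it is missing.
    voice = payload.get("voice")
    if voice is not None and (not isinstance(voice, str) or len(voice) > 64):
        errors.append("voice must be a string of at most 64 characters")
    return errors
```

For example, `validate_create_request({"sourceType": "epic_script", "sourceId": "abc"})` returns an empty list because voice may be omitted.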

Step 4: Compile

cd backend-single
mvn -DskipTests compile

Expected: BUILD SUCCESS.

Step 5: Commit

git add backend-single/src/main/java/com/emotion/entity/TtsTask.java backend-single/src/main/java/com/emotion/mapper/TtsTaskMapper.java backend-single/src/main/java/com/emotion/dto/request/tts backend-single/src/main/java/com/emotion/dto/response/tts
git commit -m "feat: add tts task model"

Task 3: Add TTS Service and Engine Client

Files:

Create: backend-single/src/main/java/com/emotion/service/TtsTaskService.java
Create: backend-single/src/main/java/com/emotion/service/TtsEngineClient.java
Create: backend-single/src/main/java/com/emotion/service/impl/HttpTtsEngineClient.java
Create: backend-single/src/main/java/com/emotion/service/impl/TtsTaskServiceImpl.java
Modify: backend-single/src/main/resources/application.yml
Modify: backend-single/src/main/resources/application-prod.yml
Test: backend-single/src/test/java/com/emotion/service/TtsTaskServiceTest.java
Step 1: Add service contracts

package com.emotion.service;

import com.baomidou.mybatisplus.extension.service.IService;
import com.emotion.dto.request.tts.TtsTaskCreateRequest;
import com.emotion.dto.response.tts.TtsTaskResponse;
import com.emotion.entity.TtsTask;

public interface TtsTaskService extends IService<TtsTask> {
    TtsTaskResponse createOrReuse(TtsTaskCreateRequest request);
    TtsTaskResponse getTask(String id);
    TtsTaskResponse getBySource(String sourceType, String sourceId, String voice);
}

package com.emotion.service;

public interface TtsEngineClient {
    TtsEngineResult synthesize(String text, String voice, String outputPath);

    class TtsEngineResult {
        private final boolean success;
        private final String audioPath;
        private final Long durationMs;
        private final String errorMessage;

        public TtsEngineResult(boolean success, String audioPath, Long durationMs, String errorMessage) {
            this.success = success;
            this.audioPath = audioPath;
            this.durationMs = durationMs;
            this.errorMessage = errorMessage;
        }

        public boolean isSuccess() { return success; }
        public String getAudioPath() { return audioPath; }
        public Long getDurationMs() { return durationMs; }
        public String getErrorMessage() { return errorMessage; }
    }
}

Step 2: Add application config

In application.yml:

emotion:
  tts:
    enabled: true
    engine-url: http://127.0.0.1:19110
    output-dir: /data/uploads/emotion-museum/tts
    public-url-prefix: /uploads/emotion-museum/tts
    max-text-length: 5000
    default-voice: default_zh_female

In application-prod.yml, add the same emotion.tts block with production paths if not inherited.

Step 3: Add HTTP engine client

package com.emotion.service.impl;

import com.emotion.service.TtsEngineClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

import java.util.Map;

@Service
public class HttpTtsEngineClient implements TtsEngineClient {
    private final RestTemplate restTemplate;

    @Value("${emotion.tts.engine-url:http://127.0.0.1:19110}")
    private String engineUrl;

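    // Requires a RestTemplate bean in the application context; Spring Boot auto-configures
    // only RestTemplateBuilder, so define the bean in a @Configuration class if none exists.
    public HttpTtsEngineClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }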

    @Override
    public TtsEngineResult synthesize(String text, String voice, String outputPath) {
        try {
            Map<String, Object> body = Map.of("text", text, "voice", voice, "outputPath", outputPath);
            ResponseEntity<Map> response = restTemplate.postForEntity(engineUrl + "/synthesize", body, Map.class);
            Map<?, ?> data = response.getBody();
            boolean success = data != null && Boolean.TRUE.equals(data.get("success"));
            if (!success) {
                return new TtsEngineResult(false, null, null, String.valueOf(data != null ? data.get("errorMessage") : "empty response"));
            }
            Long durationMs = data.get("durationMs") instanceof Number ? ((Number) data.get("durationMs")).longValue() : null;
            return new TtsEngineResult(true, String.valueOf(data.get("audioPath")), durationMs, null);
        } catch (Exception e) {
            return new TtsEngineResult(false, null, null, e.getMessage());
        }
    }
}
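
The response handling in the client above can be summarized in a short sketch. It assumes the engine replies with a JSON object carrying success, audioPath, durationMs, and errorMessage, as the Python service in Task 5 does. The function name `parse_engine_response` is hypothetical.

```python
def parse_engine_response(data):
    """Return (success, audio_path, duration_ms, error_message) from a decoded JSON body."""
    # Anything other than an explicit success flag is treated as a failure.
    if not isinstance(data, dict) or data.get("success") is not True:
        error = data.get("errorMessage") if isinstance(data, dict) else "empty response"
        return (False, None, None, str(error))
    duration = data.get("durationMs")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
    duration_ms = int(duration) if is_number else None
    return (True, str(data.get("audioPath")), duration_ms, None)
```

Treating a missing or non-numeric durationMs as None keeps a successful synthesis from being reported as a failure just because the engine did not measure the audio length.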

Step 4: Add task service implementation

package com.emotion.service.impl;

import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;
import com.emotion.dto.request.tts.TtsTaskCreateRequest;
import com.emotion.dto.response.tts.TtsTaskResponse;
import com.emotion.entity.EpicScript;
import com.emotion.entity.TtsTask;
import com.emotion.mapper.EpicScriptMapper;
import com.emotion.mapper.TtsTaskMapper;
import com.emotion.service.TtsEngineClient;
import com.emotion.service.TtsTaskService;
import com.emotion.util.UserContextHolder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import org.springframework.util.DigestUtils;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

@Service
public class TtsTaskServiceImpl extends ServiceImpl<TtsTaskMapper, TtsTask> implements TtsTaskService {
    private final EpicScriptMapper epicScriptMapper;
    private final TtsEngineClient ttsEngineClient;

    @Value("${emotion.tts.output-dir:/data/uploads/emotion-museum/tts}")
    private String outputDir;
    @Value("${emotion.tts.public-url-prefix:/uploads/emotion-museum/tts}")
    private String publicUrlPrefix;
    @Value("${emotion.tts.max-text-length:5000}")
    private int maxTextLength;
    @Value("${emotion.tts.default-voice:default_zh_female}")
    private String defaultVoice;

    public TtsTaskServiceImpl(EpicScriptMapper epicScriptMapper, TtsEngineClient ttsEngineClient) {
        this.epicScriptMapper = epicScriptMapper;
        this.ttsEngineClient = ttsEngineClient;
    }

    @Override
    public TtsTaskResponse createOrReuse(TtsTaskCreateRequest request) {
        String userId = UserContextHolder.getCurrentUserId();
        String voice = request.getVoice() == null || request.getVoice().isBlank() ? defaultVoice : request.getVoice();
        String text = loadSourceText(userId, request.getSourceType(), request.getSourceId());
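        // Truncate in a single assignment so "cleaned" stays effectively final;
        // the async lambda below captures it and would not compile if it were reassigned.
        String fullText = cleanText(text);
        String cleaned = fullText.length() > maxTextLength ? fullText.substring(0, maxTextLength) : fullText;
        String hash = DigestUtils.md5DigestAsHex((voice + "\n" + cleaned).getBytes(StandardCharsets.UTF_8));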

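        // Reuse any non-failed task with the same hash; failed tasks are skipped so a retry creates a fresh task.
        // Note: this lookup is not scoped to userId, so a hit may belong to another user,
        // and getTask() will refuse to return that task id to the current user.
        TtsTask existing = getOne(new LambdaQueryWrapper<TtsTask>()
                .eq(TtsTask::getTextHash, hash)
                .eq(TtsTask::getVoice, voice)
                .ne(TtsTask::getStatus, "failed")
                .eq(TtsTask::getIsDeleted, 0)
                .last("LIMIT 1"));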
        if (existing != null) {
            existing.setRequestCount((existing.getRequestCount() == null ? 0 : existing.getRequestCount()) + 1);
            updateById(existing);
            return toResponse(existing);
        }

        String filename = hash + ".mp3";
        TtsTask task = TtsTask.builder()
                .userId(userId)
                .sourceType(request.getSourceType())
                .sourceId(request.getSourceId())
                .textHash(hash)
                .textLength(cleaned.length())
                .voice(voice)
                .status("pending")
                .audioPath(outputDir + "/" + filename)
                .audioUrl(publicUrlPrefix + "/" + filename)
                .requestCount(1)
                .build();
        save(task);
        CompletableFuture.runAsync(() -> process(task.getId(), cleaned, voice, task.getAudioPath()));
        return toResponse(task);
    }

    @Override
    public TtsTaskResponse getTask(String id) {
        TtsTask task = getById(id);
        String userId = UserContextHolder.getCurrentUserId();
        if (task == null || !userId.equals(task.getUserId())) {
            return null;
        }
        return toResponse(task);
    }

    @Override
    public TtsTaskResponse getBySource(String sourceType, String sourceId, String voice) {
        String userId = UserContextHolder.getCurrentUserId();
        TtsTask task = getOne(new LambdaQueryWrapper<TtsTask>()
                .eq(TtsTask::getUserId, userId)
                .eq(TtsTask::getSourceType, sourceType)
                .eq(TtsTask::getSourceId, sourceId)
                .eq(TtsTask::getVoice, voice == null || voice.isBlank() ? defaultVoice : voice)
                .eq(TtsTask::getIsDeleted, 0)
                .orderByDesc(TtsTask::getCreateTime)
                .last("LIMIT 1"));
        return task == null ? null : toResponse(task);
    }

    private void process(String taskId, String text, String voice, String outputPath) {
        TtsTask task = getById(taskId);
        if (task == null) return;
        task.setStatus("processing");
        updateById(task);
        TtsEngineClient.TtsEngineResult result = ttsEngineClient.synthesize(text, voice, outputPath);
        task = getById(taskId);
        if (result.isSuccess()) {
            task.setStatus("success");
            task.setDurationMs(result.getDurationMs());
        } else {
            task.setStatus("failed");
            task.setErrorMessage(result.getErrorMessage());
        }
        updateById(task);
    }

    private String loadSourceText(String userId, String sourceType, String sourceId) {
        if (!"epic_script".equals(sourceType)) {
            throw new IllegalArgumentException("Unsupported sourceType");
        }
        EpicScript script = epicScriptMapper.selectById(sourceId);
        if (script == null || !userId.equals(script.getUserId())) {
            throw new IllegalArgumentException("Script not found");
        }
        StringBuilder text = new StringBuilder();
        if (script.getTitle() != null) text.append(script.getTitle()).append("\n\n");
        if (script.getPlotIntro() != null) text.append(script.getPlotIntro()).append("\n\n");
        if (script.getPlotTurning() != null) text.append(script.getPlotTurning()).append("\n\n");
        if (script.getPlotClimax() != null) text.append(script.getPlotClimax()).append("\n\n");
        if (script.getPlotEnding() != null) text.append(script.getPlotEnding()).append("\n\n");
        if (script.getPlotJson() != null && script.getPlotJson().get("fullContent") != null) {
            text.append(script.getPlotJson().get("fullContent"));
        }
        return text.toString();
    }

    public static String cleanText(String text) {
        if (text == null) return "";
        return text.replaceAll("[#>*_`\\-]", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    private TtsTaskResponse toResponse(TtsTask task) {
        return TtsTaskResponse.builder()
                .id(task.getId())
                .sourceType(task.getSourceType())
                .sourceId(task.getSourceId())
                .status(task.getStatus())
                .voice(task.getVoice())
                .audioUrl("success".equals(task.getStatus()) ? task.getAudioUrl() : null)
                .durationMs(task.getDurationMs())
                .errorMessage(task.getErrorMessage())
                .build();
    }
}
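
To see what the cleaning step does to typical script text, here is an approximate Python equivalent of cleanText followed by the length cap. It is a sketch, not a byte-for-byte port: Python's `\s` and `strip()` treat more Unicode whitespace than Java's defaults, and Python counts code points where Java counts UTF-16 units.

```python
import re

# Same character class as the Java regex: Markdown markers and hyphens are dropped.
_MARKDOWN_CHARS = re.compile(r"[#>*_`\-]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text, max_length=5000):
    """Strip Markdown markers, collapse whitespace, trim, and cap the length."""
    if text is None:
        return ""
    stripped = _MARKDOWN_CHARS.sub("", text)
    collapsed = _WHITESPACE.sub(" ", stripped).strip()
    return collapsed[:max_length]
```

Note that hyphens inside words are removed too, so "well-known" becomes "wellknown". That is harmless for Chinese scripts but may affect how embedded English terms are read aloud.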

Step 5: Run compile

cd backend-single
mvn -DskipTests compile

Expected: BUILD SUCCESS.

Step 6: Commit

git add backend-single/src/main/java/com/emotion/service/TtsTaskService.java backend-single/src/main/java/com/emotion/service/TtsEngineClient.java backend-single/src/main/java/com/emotion/service/impl/HttpTtsEngineClient.java backend-single/src/main/java/com/emotion/service/impl/TtsTaskServiceImpl.java backend-single/src/main/resources/application.yml backend-single/src/main/resources/application-prod.yml
git commit -m "feat: add tts backend service"

Task 4: Add TTS Controller

Files:

Create: backend-single/src/main/java/com/emotion/controller/TtsController.java
Step 1: Add controller

package com.emotion.controller;

import com.emotion.common.Result;
import com.emotion.dto.request.tts.TtsTaskCreateRequest;
import com.emotion.dto.response.tts.TtsTaskResponse;
import com.emotion.service.TtsTaskService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

import javax.validation.Valid;

@RestController
@RequestMapping("/tts")
public class TtsController {
    @Autowired
    private TtsTaskService ttsTaskService;

    @PostMapping("/tasks")
    public Result<TtsTaskResponse> create(@Valid @RequestBody TtsTaskCreateRequest request) {
        try {
            return Result.success(ttsTaskService.createOrReuse(request));
        } catch (IllegalArgumentException e) {
            return Result.badRequest(e.getMessage());
        }
    }

    @GetMapping("/tasks/{id}")
    public Result<TtsTaskResponse> detail(@PathVariable String id) {
        TtsTaskResponse response = ttsTaskService.getTask(id);
        return response == null ? Result.notFound("TTS task not found") : Result.success(response);
    }

    @GetMapping("/tasks/by-source")
    public Result<TtsTaskResponse> bySource(@RequestParam String sourceType, @RequestParam String sourceId, @RequestParam(required = false) String voice) {
        TtsTaskResponse response = ttsTaskService.getBySource(sourceType, sourceId, voice);
        return Result.success(response);
    }
}

Step 2: Run compile

cd backend-single
mvn -DskipTests compile

Expected: BUILD SUCCESS.

Step 3: Commit

git add backend-single/src/main/java/com/emotion/controller/TtsController.java
git commit -m "feat: expose tts task APIs"

Task 5: Add Python TTS Service Skeleton

Files:

Create: backend-single/tts-service/requirements.txt
Create: backend-single/tts-service/app.py
Create: backend-single/tts-service/README.md
Create: backend-single/tts-service/emotion-museum-tts.service
Step 1: Add requirements

fastapi==0.111.0
uvicorn[standard]==0.30.1
pydantic==2.7.4

MeloTTS itself is installed with:

git clone https://github.com/myshell-ai/MeloTTS.git /data/programs/MeloTTS
cd /data/programs/MeloTTS
/data/programs/emotion-museum/tts-service/.venv/bin/pip install -e .
/data/programs/emotion-museum/tts-service/.venv/bin/python -m unidic download

Step 2: Add FastAPI app

from pathlib import Path
from threading import Lock
from pydantic import BaseModel, Field
from fastapi import FastAPI

app = FastAPI(title="Emotion Museum TTS")

_model = None
_speaker_ids = None
_model_lock = Lock()


class SynthesizeRequest(BaseModel):
    text: str = Field(min_length=1, max_length=5000)
    voice: str = "default_zh_female"
    outputPath: str


def get_model():
    global _model, _speaker_ids
    with _model_lock:
        if _model is None:
            from melo.api import TTS

            _model = TTS(language="ZH", device="cpu")
            _speaker_ids = _model.hps.data.spk2id
        return _model, _speaker_ids


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/synthesize")
def synthesize(request: SynthesizeRequest):
    output = Path(request.outputPath)
    output.parent.mkdir(parents=True, exist_ok=True)

    try:
        model, speaker_ids = get_model()
        # Index by key as in the MeloTTS README; request.voice is not mapped to a speaker yet.
        speaker_id = speaker_ids["ZH"]
        model.tts_to_file(request.text, speaker_id, str(output), speed=1.0)
    except Exception as exc:
        return {
            "success": False,
            "audioPath": None,
            "durationMs": None,
            "engine": "melotts",
            "errorMessage": str(exc),
        }

    return {
        "success": True,
        "audioPath": str(output),
        "durationMs": None,
        "engine": "melotts",
    }
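
The service above writes to whatever outputPath the caller sends. Since it listens only on 127.0.0.1 that may be acceptable, but a small guard could restrict writes to the audio directory. The sketch below assumes the allowed root is the output directory from the Defaults section. The names `ALLOWED_ROOT` and `is_inside_allowed_root` are hypothetical and not part of the plan.

```python
from pathlib import Path

# Assumed allowed root, taken from the plan's audio output directory.
ALLOWED_ROOT = Path("/data/uploads/emotion-museum/tts")


def is_inside_allowed_root(output_path, root=ALLOWED_ROOT):
    """Return True when output_path resolves to a location under root."""
    resolved_root = Path(root).resolve()
    # resolve() normalizes ".." segments, so traversal attempts are caught here.
    resolved = Path(output_path).resolve()
    try:
        resolved.relative_to(resolved_root)
    except ValueError:
        return False
    return True
```

If adopted, the check would run at the top of the synthesize handler, returning a failure payload when it is False.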

Step 3: Add systemd unit

[Unit]
Description=Emotion Museum TTS Service
After=network.target

[Service]
Type=simple
WorkingDirectory=/data/programs/emotion-museum/tts-service
ExecStart=/data/programs/emotion-museum/tts-service/.venv/bin/uvicorn app:app --host 127.0.0.1 --port 19110
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Step 4: Add README commands

# Emotion Museum TTS Service

Install on `101.200.208.45`:

```bash
cd /data/programs/emotion-museum/tts-service
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt

git clone https://github.com/myshell-ai/MeloTTS.git /data/programs/MeloTTS
cd /data/programs/MeloTTS
/data/programs/emotion-museum/tts-service/.venv/bin/pip install -e .
/data/programs/emotion-museum/tts-service/.venv/bin/python -m unidic download

cd /data/programs/emotion-museum/tts-service
uvicorn app:app --host 127.0.0.1 --port 19110
# uvicorn stays in the foreground; run the health check from a second shell
curl http://127.0.0.1:19110/health
```

Step 5: Run local syntax check

python -m py_compile backend-single/tts-service/app.py

Expected: no output and exit code 0.

Step 6: Commit

git add backend-single/tts-service
git commit -m "feat: add private tts service scaffold"

Task 6: Add Mini Program TTS Client and Player

Files:

Create: mini-program/src/services/tts.js
Create: mini-program/src/components/ScriptAudioPlayer.vue
Step 1: Add TTS client

import { get, post } from './request.js'

export const createTtsTask = ({ sourceType = 'epic_script', sourceId, voice = 'default_zh_female' }) => {
  return post('/tts/tasks', { sourceType, sourceId, voice })
}

export const getTtsTask = (id) => {
  return get(`/tts/tasks/${id}`)
}

export const getTtsTaskBySource = ({ sourceType = 'epic_script', sourceId, voice = 'default_zh_female' }) => {
  return get('/tts/tasks/by-source', { sourceType, sourceId, voice })
}

export default {
  createTtsTask,
  getTtsTask,
  getTtsTaskBySource
}

Step 2: Add player component

<template>
  <view class="script-audio-player">
    <button class="audio-button" :disabled="loading" @click="handleClick">
      {{ buttonText }}
    </button>
  </view>
</template>

<script setup>
import { computed, onUnmounted, ref, watch } from 'vue'
import { createTtsTask, getTtsTask, getTtsTaskBySource } from '../services/tts.js'
import analytics from '../services/analytics.js'

const props = defineProps({
  scriptId: { type: String, required: true }
})

const task = ref(null)
const loading = ref(false)
const playing = ref(false)
let audio = null
let timer = null

const buttonText = computed(() => {
  if (loading.value) return '正在生成'
  if (task.value?.status === 'success') return playing.value ? '暂停朗读' : '播放朗读'
  if (task.value?.status === 'failed') return '重试朗读'
  return '生成朗读'
})

const clearTimer = () => {
  if (timer) clearInterval(timer)
  timer = null
}

const pollTask = (id) => {
  clearTimer()
  timer = setInterval(async () => {
    try {
      const res = await getTtsTask(id)
      task.value = res.data
    } catch (error) {
      // Stop polling on a request failure so the button does not stay stuck in the loading state.
      loading.value = false
      clearTimer()
      return
    }
    if (task.value?.status === 'success' || task.value?.status === 'failed') {
      loading.value = false
      clearTimer()
      analytics.track(task.value.status === 'success' ? 'script_tts_success' : 'script_tts_error', {
        script_id: props.scriptId,
        task_id: id,
        error: task.value?.errorMessage || ''
      }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
    }
  }, 2500)
}

const generate = async () => {
  loading.value = true
  analytics.track('script_tts_request', { script_id: props.scriptId }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
  try {
    const res = await createTtsTask({ sourceId: props.scriptId })
    task.value = res.data
  } catch (error) {
    // Reset the loading state if the create request itself fails.
    loading.value = false
    return
  }
  if (task.value?.status === 'success') {
    loading.value = false
    return
  }
  if (task.value?.id) {
    pollTask(task.value.id)
  } else {
    // Nothing to poll without a task id.
    loading.value = false
  }
}

const play = () => {
  if (!task.value?.audioUrl) return
  if (!audio) {
    audio = uni.createInnerAudioContext()
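    // The mini program likely needs a full URL here; prepend the static host if the backend returns a relative path.
    audio.src = task.value.audioUrl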
    audio.onPlay(() => {
      playing.value = true
      analytics.track('script_tts_play', { script_id: props.scriptId, task_id: task.value.id }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
    })
    audio.onPause(() => {
      playing.value = false
      analytics.track('script_tts_pause', { script_id: props.scriptId, task_id: task.value.id }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
    })
    audio.onEnded(() => {
      playing.value = false
      analytics.track('script_tts_complete', { script_id: props.scriptId, task_id: task.value.id }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
    })
    audio.onError((error) => {
      playing.value = false
      analytics.track('script_tts_error', { script_id: props.scriptId, task_id: task.value.id, error: error.errMsg || 'play failed' }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
    })
  }
  if (playing.value) audio.pause()
  else audio.play()
}

const handleClick = async () => {
  if (task.value?.status === 'success') {
    play()
    return
  }
  await generate()
}

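const loadExisting = async () => {
  if (!props.scriptId) return
  try {
    const res = await getTtsTaskBySource({ sourceId: props.scriptId })
    task.value = res.data
  } catch (error) {
    // A failed lookup only means no cached task is shown; the user can still tap to generate.
    return
  }
  // Resume polling when an earlier request is still in progress.
  if (task.value?.id && (task.value.status === 'pending' || task.value.status === 'processing')) {
    loading.value = true
    pollTask(task.value.id)
  }
}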

watch(() => props.scriptId, loadExisting, { immediate: true })

onUnmounted(() => {
  clearTimer()
  if (audio) {
    audio.stop()
    audio.destroy()
    audio = null
  }
})
</script>

<style scoped>
.script-audio-player {
  margin-top: 24rpx;
}
.audio-button {
  height: 72rpx;
  border-radius: 999rpx;
  color: #fff;
  font-size: 25rpx;
  font-weight: 800;
  background: linear-gradient(135deg, #24c6dc, #7f5af0);
}
</style>

Step 3: Build mini program

cd mini-program
npm run build:mp-weixin:test

Expected: build succeeds.

Step 4: Commit

git add mini-program/src/services/tts.js mini-program/src/components/ScriptAudioPlayer.vue
git commit -m "feat: add script audio player"

Task 7: Wire Player into Script Detail

Files:

Modify: mini-program/src/pages/main/ScriptDetailView.vue
Step 1: Import component

import ScriptAudioPlayer from '../../components/ScriptAudioPlayer.vue'

Step 2: Add component below the hero stats

In the .hero-card template, after the stats block:

<ScriptAudioPlayer v-if="script?.id" :script-id="script.id" />

Step 3: Build mini program

cd mini-program
npm run build:mp-weixin:test

Expected: build succeeds.

Step 4: Commit

git add mini-program/src/pages/main/ScriptDetailView.vue
git commit -m "feat: add tts control to script detail"

Task 8: Deploy and Verify TTS Service on Server

Files:

Modify deployment scripts only if manual deployment proves repetitive.
Step 1: Upload service directory

scp -r backend-single/tts-service root@101.200.208.45:/data/programs/emotion-museum/tts-service

Expected: files are copied.

Step 2: Install and start manually

ssh root@101.200.208.45 "cd /data/programs/emotion-museum/tts-service && python3 -m venv .venv && . .venv/bin/activate && pip install -r requirements.txt && if [ ! -d /data/programs/MeloTTS ]; then git clone https://github.com/myshell-ai/MeloTTS.git /data/programs/MeloTTS; fi && cd /data/programs/MeloTTS && /data/programs/emotion-museum/tts-service/.venv/bin/pip install -e . && /data/programs/emotion-museum/tts-service/.venv/bin/python -m unidic download && cd /data/programs/emotion-museum/tts-service && nohup .venv/bin/uvicorn app:app --host 127.0.0.1 --port 19110 > tts.log 2>&1 &"

Expected: command exits.

Step 3: Check health

ssh root@101.200.208.45 "curl -s http://127.0.0.1:19110/health"

Expected:

{"status":"ok"}

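Step 4: Install systemd unit after health passes

Stop the uvicorn process started manually in Step 2 before running these commands; otherwise the systemd service cannot bind port 19110.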

scp backend-single/tts-service/emotion-museum-tts.service root@101.200.208.45:/etc/systemd/system/emotion-museum-tts.service
ssh root@101.200.208.45 "systemctl daemon-reload && systemctl enable emotion-museum-tts && systemctl restart emotion-museum-tts && systemctl status emotion-museum-tts --no-pager"

Expected: service status is active.

Step 5: Commit deployment doc tweaks if needed

git add backend-single/tts-service/README.md
git commit -m "docs: update tts deployment notes"

Only run this commit if README changed.

Task 9: Final TTS Verification

Files:

No code changes unless bugs are found.
Step 1: Run backend tests

cd backend-single
mvn test

Expected: BUILD SUCCESS.

Step 2: Run mini program build

cd mini-program
npm run build:mp-weixin:test

Expected: build succeeds.

Step 3: Smoke-test backend task API

With a logged-in mini program token and an owned script id (the ^ line continuations are Windows cmd syntax; use \ in bash):

curl -X POST http://localhost:19089/tts/tasks ^
  -H "Content-Type: application/json" ^
  -H "Authorization: Bearer <token>" ^
  -d "{\"sourceType\":\"epic_script\",\"sourceId\":\"<script-id>\",\"voice\":\"default_zh_female\"}"

Expected: response contains a task with pending, processing, or success. A failed status means the backend integration works but the Python engine failed and server logs must be checked.
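
When the first response is pending or processing, the smoke test needs to keep checking until the task settles. The loop below is a minimal sketch of that polling, using the same 2.5 second interval as the mini program. `poll_until_done` and its `fetch_task` parameter are hypothetical: the caller supplies a function that performs the GET /tts/tasks/{id} request and returns the decoded task as a dict.

```python
import time

TERMINAL_STATUSES = {"success", "failed"}


def poll_until_done(fetch_task, interval_seconds=2.5, max_attempts=60, sleep=time.sleep):
    """Call fetch_task() until the task reaches a terminal status or attempts run out.

    fetch_task is a caller-supplied function returning a dict with a "status" key.
    """
    task = None
    for attempt in range(max_attempts):
        task = fetch_task()
        if task and task.get("status") in TERMINAL_STATUSES:
            return task
        # Sleep between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("task did not finish; last seen: %r" % (task,))
```

Bounding the number of attempts keeps the check from hanging forever when the Python engine never reports back.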

Step 4: Manual mini program test

Open script detail in WeChat DevTools:

Tap 生成朗读.
Observe generating state.
If engine is installed, wait for 播放朗读.
Tap play and pause.
Leave page and verify no background playback continues.
Step 5: Commit fixes if needed

git add <changed-files>
git commit -m "fix: stabilize tts playback"

Only run this commit if verification required code changes.

Self-Review

Spec coverage: task table, backend task API, Python service, CPU/no-GPU assumption, cache, mini program playback, and analytics events are covered.
Placeholder scan: no task leaves the TTS engine as an unspecified later step; the plan uses the official MeloTTS local install and Python API.
Type consistency: request and response use sourceType, sourceId, voice, status, and audioUrl consistently across backend and mini program.
