# Script Text-to-Speech Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add CPU-friendly open-source text-to-speech generation for user scripts, with cached backend tasks and mini program playback.

**Architecture:** `backend-single` creates TTS tasks, validates source ownership, cleans script text, checks audio cache, and calls a private Python FastAPI TTS service. The Python service runs on `101.200.208.45` and writes audio files under `/data/uploads/emotion-museum/tts`. The mini program requests generation from `ScriptDetailView.vue`, polls status, and plays the returned audio URL with `uni.createInnerAudioContext()`.

**Tech Stack:** Spring Boot 2.7, MyBatis Plus, Java 17, MySQL, Python FastAPI, MeloTTS or equivalent CPU-friendly Chinese TTS engine, systemd, uni-app/Vue.

---

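The request, poll, and play flow described in the architecture can be sketched as a small client-side loop. This is only an illustration, not the mini program's code: `poll_until_done`, `fetch_task`, and the fake response sequence are hypothetical names, and only the four status values and the audio URL prefix come from this plan.

```python
import time
from typing import Callable, Dict, Optional

# Statuses after which polling stops (taken from the plan's status values).
TERMINAL_STATUSES = {"success", "failed"}


def poll_until_done(
    fetch_task: Callable[[], Dict],
    interval_s: float = 2.5,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict]:
    """Call fetch_task until the task reaches a terminal status.

    fetch_task is a hypothetical stand-in for GET /tts/tasks/{id}.
    Returns the final task dict, or None if max_attempts is exhausted.
    """
    for _ in range(max_attempts):
        task = fetch_task()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        sleep(interval_s)
    return None


# Usage with a fake backend that succeeds on the third poll.
_responses = iter([
    {"status": "pending"},
    {"status": "processing"},
    {"status": "success", "audioUrl": "/uploads/emotion-museum/tts/demo.mp3"},
])
final = poll_until_done(lambda: next(_responses), sleep=lambda _s: None)
print(final["audioUrl"])
```

Bounding the number of attempts is a design choice of this sketch; the plan itself does not specify a polling limit.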
## File Structure

- Create `sql/2026-05-17-tts-task.sql`: `t_tts_task` migration.
- Create `backend-single/src/main/java/com/emotion/entity/TtsTask.java`: task entity.
- Create `backend-single/src/main/java/com/emotion/mapper/TtsTaskMapper.java`: task mapper.
- Create `backend-single/src/main/java/com/emotion/dto/request/tts/TtsTaskCreateRequest.java`: create request.
- Create `backend-single/src/main/java/com/emotion/dto/response/tts/TtsTaskResponse.java`: task response.
- Create `backend-single/src/main/java/com/emotion/service/TtsTaskService.java`: contract.
- Create `backend-single/src/main/java/com/emotion/service/impl/TtsTaskServiceImpl.java`: ownership, cache, async processing.
- Create `backend-single/src/main/java/com/emotion/service/TtsEngineClient.java`: internal engine client contract.
- Create `backend-single/src/main/java/com/emotion/service/impl/HttpTtsEngineClient.java`: calls Python service.
- Create `backend-single/src/main/java/com/emotion/controller/TtsController.java`: user-facing TTS endpoints.
- Modify `backend-single/src/main/resources/application.yml`: TTS config defaults.
- Modify `backend-single/src/main/resources/application-prod.yml`: production paths and internal URL.
- Create `backend-single/src/test/java/com/emotion/service/TtsTaskServiceTest.java`: service tests.
- Create `backend-single/tts-service/requirements.txt`: FastAPI deps; MeloTTS is installed from the official repository because the official docs use an editable install plus `python -m unidic download`.
- Create `backend-single/tts-service/app.py`: FastAPI service.
- Create `backend-single/tts-service/README.md`: server setup.
- Create `backend-single/tts-service/emotion-museum-tts.service`: systemd unit.
- Create `mini-program/src/services/tts.js`: TTS API client.
- Create `mini-program/src/components/ScriptAudioPlayer.vue`: player component.
- Modify `mini-program/src/pages/main/ScriptDetailView.vue`: add player.
- Modify `mini-program/src/services/analytics.js` only if the analytics plan has not already exported `track`.

## Defaults

- `sourceType`: `epic_script`
- `voice`: `default_zh_female`
- Text limit: 5000 cleaned characters
- Python internal URL: `http://127.0.0.1:19110`
- Audio output directory: `/data/uploads/emotion-museum/tts`
- Public URL prefix: `/uploads/emotion-museum/tts` or the existing static upload prefix configured by backend/Nginx

---

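Under the defaults listed above, the server path and the public URL of one audio file differ only in their prefix. A minimal sketch of that mapping follows; `audio_locations` is a hypothetical helper, and the hash-based file name with an `.mp3` extension is assumed from the backend service in Task 3.

```python
from pathlib import PurePosixPath
from typing import Tuple

OUTPUT_DIR = "/data/uploads/emotion-museum/tts"    # audio output directory default
PUBLIC_URL_PREFIX = "/uploads/emotion-museum/tts"  # public URL prefix default


def audio_locations(text_hash: str, extension: str = "mp3") -> Tuple[str, str]:
    """Return (server_path, public_url) for one cached audio file."""
    filename = f"{text_hash}.{extension}"
    server_path = str(PurePosixPath(OUTPUT_DIR) / filename)
    public_url = f"{PUBLIC_URL_PREFIX.rstrip('/')}/{filename}"
    return server_path, public_url


path, url = audio_locations("0123abcd")
print(path)
print(url)
```

If Nginx serves uploads under a different static prefix, only `PUBLIC_URL_PREFIX` would need to change in this sketch.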
### Task 1: Add TTS SQL Migration

**Files:**
- Create: `sql/2026-05-17-tts-task.sql`

- [ ] **Step 1: Create migration**

```sql
CREATE TABLE IF NOT EXISTS t_tts_task (
  id VARCHAR(64) PRIMARY KEY COMMENT 'Primary key',
  user_id VARCHAR(64) NOT NULL COMMENT 'Owner user id',
  source_type VARCHAR(50) NOT NULL COMMENT 'Source type, for example epic_script',
  source_id VARCHAR(64) NOT NULL COMMENT 'Source content id',
  text_hash VARCHAR(128) NOT NULL COMMENT 'Hash of cleaned text and voice',
  text_length INT NOT NULL COMMENT 'Cleaned text length',
  voice VARCHAR(64) NOT NULL DEFAULT 'default_zh_female' COMMENT 'Voice id',
  status VARCHAR(20) NOT NULL DEFAULT 'pending' COMMENT 'pending, processing, success, failed',
  audio_url VARCHAR(500) NULL COMMENT 'Public audio URL',
  audio_path VARCHAR(500) NULL COMMENT 'Server audio path',
  duration_ms BIGINT NULL COMMENT 'Audio duration',
  error_message VARCHAR(1000) NULL COMMENT 'Failure message',
  request_count INT NOT NULL DEFAULT 1 COMMENT 'Cache hit request count',
  create_by VARCHAR(64) NULL COMMENT 'Creator',
  create_time DATETIME DEFAULT CURRENT_TIMESTAMP COMMENT 'Create time',
  update_by VARCHAR(64) NULL COMMENT 'Updater',
  update_time DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
  is_deleted TINYINT DEFAULT 0 COMMENT 'Logic delete flag',
  remarks VARCHAR(500) NULL COMMENT 'Remarks'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='Text-to-speech task table';

CREATE INDEX idx_tts_task_user_source ON t_tts_task (user_id, source_type, source_id);
CREATE INDEX idx_tts_task_text_hash ON t_tts_task (text_hash);
CREATE INDEX idx_tts_task_status ON t_tts_task (status);
CREATE INDEX idx_tts_task_create_time ON t_tts_task (create_time);
```

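The `status` column carries four values. How a task moves between them can be sketched as a transition table. The table below is an assumption inferred from the column comment and the processing flow in Task 3; the plan does not enforce it anywhere, and `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names.

```python
# Assumed lifecycle: pending -> processing -> success | failed.
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"success", "failed"},
    "success": set(),  # terminal
    "failed": set(),   # terminal; a retry is assumed to create a new task
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving from current to target fits the assumed lifecycle."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("pending", "processing"))
print(can_transition("success", "pending"))
```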
- [ ] **Step 2: Commit**

```bash
git add sql/2026-05-17-tts-task.sql
git commit -m "feat: add tts task table"
```

---

### Task 2: Add Backend TTS Entity, DTOs, and Mapper

**Files:**
- Create: `backend-single/src/main/java/com/emotion/entity/TtsTask.java`
- Create: `backend-single/src/main/java/com/emotion/mapper/TtsTaskMapper.java`
- Create: `backend-single/src/main/java/com/emotion/dto/request/tts/TtsTaskCreateRequest.java`
- Create: `backend-single/src/main/java/com/emotion/dto/response/tts/TtsTaskResponse.java`

- [ ] **Step 1: Add entity**

```java
package com.emotion.entity;

import com.baomidou.mybatisplus.annotation.TableField;
import com.baomidou.mybatisplus.annotation.TableName;
import com.emotion.common.BaseEntity;
import lombok.*;
import lombok.experimental.SuperBuilder;

@Data
@EqualsAndHashCode(callSuper = true)
@SuperBuilder
@NoArgsConstructor
@AllArgsConstructor
@TableName("t_tts_task")
public class TtsTask extends BaseEntity {
    @TableField("user_id")
    private String userId;
    @TableField("source_type")
    private String sourceType;
    @TableField("source_id")
    private String sourceId;
    @TableField("text_hash")
    private String textHash;
    @TableField("text_length")
    private Integer textLength;
    @TableField("voice")
    private String voice;
    @TableField("status")
    private String status;
    @TableField("audio_url")
    private String audioUrl;
    @TableField("audio_path")
    private String audioPath;
    @TableField("duration_ms")
    private Long durationMs;
    @TableField("error_message")
    private String errorMessage;
    @TableField("request_count")
    private Integer requestCount;
}
```

- [ ] **Step 2: Add mapper**

```java
package com.emotion.mapper;

import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.emotion.entity.TtsTask;
import org.apache.ibatis.annotations.Mapper;

@Mapper
public interface TtsTaskMapper extends BaseMapper<TtsTask> {
}
```

- [ ] **Step 3: Add DTOs**

```java
package com.emotion.dto.request.tts;

import lombok.Data;

import javax.validation.constraints.NotBlank;
import javax.validation.constraints.Size;

@Data
public class TtsTaskCreateRequest {
    @NotBlank
    @Size(max = 50)
    private String sourceType;

    @NotBlank
    @Size(max = 64)
    private String sourceId;

    @Size(max = 64)
    private String voice;
}
```

```java
package com.emotion.dto.response.tts;

import lombok.Builder;
import lombok.Data;

@Data
@Builder
public class TtsTaskResponse {
    private String id;
    private String sourceType;
    private String sourceId;
    private String status;
    private String voice;
    private String audioUrl;
    private Long durationMs;
    private String errorMessage;
}
```

- [ ] **Step 4: Compile**

```bash
cd backend-single
mvn -DskipTests compile
```

Expected: `BUILD SUCCESS`.

- [ ] **Step 5: Commit**

```bash
git add backend-single/src/main/java/com/emotion/entity/TtsTask.java backend-single/src/main/java/com/emotion/mapper/TtsTaskMapper.java backend-single/src/main/java/com/emotion/dto/request/tts backend-single/src/main/java/com/emotion/dto/response/tts
git commit -m "feat: add tts task model"
```

---

### Task 3: Add TTS Service and Engine Client

**Files:**
- Create: `backend-single/src/main/java/com/emotion/service/TtsTaskService.java`
- Create: `backend-single/src/main/java/com/emotion/service/TtsEngineClient.java`
- Create: `backend-single/src/main/java/com/emotion/service/impl/HttpTtsEngineClient.java`
- Create: `backend-single/src/main/java/com/emotion/service/impl/TtsTaskServiceImpl.java`
- Modify: `backend-single/src/main/resources/application.yml`
- Modify: `backend-single/src/main/resources/application-prod.yml`
- Test: `backend-single/src/test/java/com/emotion/service/TtsTaskServiceTest.java`

- [ ] **Step 1: Add service contracts**

```java
package com.emotion.service;

import com.baomidou.mybatisplus.extension.service.IService;
import com.emotion.dto.request.tts.TtsTaskCreateRequest;
import com.emotion.dto.response.tts.TtsTaskResponse;
import com.emotion.entity.TtsTask;

public interface TtsTaskService extends IService<TtsTask> {
    TtsTaskResponse createOrReuse(TtsTaskCreateRequest request);
    TtsTaskResponse getTask(String id);
    TtsTaskResponse getBySource(String sourceType, String sourceId, String voice);
}
```

```java
package com.emotion.service;

public interface TtsEngineClient {
    TtsEngineResult synthesize(String text, String voice, String outputPath);

    class TtsEngineResult {
        private final boolean success;
        private final String audioPath;
        private final Long durationMs;
        private final String errorMessage;

        public TtsEngineResult(boolean success, String audioPath, Long durationMs, String errorMessage) {
            this.success = success;
            this.audioPath = audioPath;
            this.durationMs = durationMs;
            this.errorMessage = errorMessage;
        }

        public boolean isSuccess() { return success; }
        public String getAudioPath() { return audioPath; }
        public Long getDurationMs() { return durationMs; }
        public String getErrorMessage() { return errorMessage; }
    }
}
```

- [ ] **Step 2: Add application config**

In `application.yml`:

```yaml
emotion:
  tts:
    enabled: true
    engine-url: http://127.0.0.1:19110
    output-dir: /data/uploads/emotion-museum/tts
    public-url-prefix: /uploads/emotion-museum/tts
    max-text-length: 5000
    default-voice: default_zh_female
```

In `application-prod.yml`, add the same `emotion.tts` block with production paths if not inherited.

- [ ] **Step 3: Add HTTP engine client**

```java
package com.emotion.service.impl;

import com.emotion.service.TtsEngineClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

import java.util.Map;

@Service
public class HttpTtsEngineClient implements TtsEngineClient {
    private final RestTemplate restTemplate;

    @Value("${emotion.tts.engine-url:http://127.0.0.1:19110}")
    private String engineUrl;

    public HttpTtsEngineClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @Override
    public TtsEngineResult synthesize(String text, String voice, String outputPath) {
        try {
            Map<String, Object> body = Map.of("text", text, "voice", voice, "outputPath", outputPath);
            ResponseEntity<Map> response = restTemplate.postForEntity(engineUrl + "/synthesize", body, Map.class);
            Map<?, ?> data = response.getBody();
            boolean success = data != null && Boolean.TRUE.equals(data.get("success"));
            if (!success) {
                return new TtsEngineResult(false, null, null, String.valueOf(data != null ? data.get("errorMessage") : "empty response"));
            }
            Long durationMs = data.get("durationMs") instanceof Number ? ((Number) data.get("durationMs")).longValue() : null;
            return new TtsEngineResult(true, String.valueOf(data.get("audioPath")), durationMs, null);
        } catch (Exception e) {
            return new TtsEngineResult(false, null, null, e.getMessage());
        }
    }
}
```

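The response handling in the client above amounts to a small contract: a body counts as a success only when `success` is exactly `true`, and `durationMs` is optional. The sketch below restates that contract in Python so it can be checked against the FastAPI service; `EngineResult` and `parse_engine_response` are hypothetical names, and the field names are taken from the client code.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class EngineResult:
    success: bool
    audio_path: Optional[str]
    duration_ms: Optional[int]
    error_message: Optional[str]


def parse_engine_response(data: Optional[Mapping[str, Any]]) -> EngineResult:
    """Interpret a /synthesize response body the way the Java client does."""
    if data is None:
        return EngineResult(False, None, None, "empty response")
    if data.get("success") is not True:
        return EngineResult(False, None, None, str(data.get("errorMessage")))
    duration = data.get("durationMs")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
    duration_ms = int(duration) if is_number else None
    return EngineResult(True, str(data.get("audioPath")), duration_ms, None)


result = parse_engine_response(
    {"success": True, "audioPath": "/tmp/demo.mp3", "durationMs": 1200}
)
print(result)
```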
- [ ] **Step 4: Add task service implementation**

```java
package com.emotion.service.impl;

import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;
import com.emotion.dto.request.tts.TtsTaskCreateRequest;
import com.emotion.dto.response.tts.TtsTaskResponse;
import com.emotion.entity.EpicScript;
import com.emotion.entity.TtsTask;
import com.emotion.mapper.EpicScriptMapper;
import com.emotion.mapper.TtsTaskMapper;
import com.emotion.service.TtsEngineClient;
import com.emotion.service.TtsTaskService;
import com.emotion.util.UserContextHolder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import org.springframework.util.DigestUtils;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

@Service
public class TtsTaskServiceImpl extends ServiceImpl<TtsTaskMapper, TtsTask> implements TtsTaskService {
    private final EpicScriptMapper epicScriptMapper;
    private final TtsEngineClient ttsEngineClient;

    @Value("${emotion.tts.output-dir:/data/uploads/emotion-museum/tts}")
    private String outputDir;
    @Value("${emotion.tts.public-url-prefix:/uploads/emotion-museum/tts}")
    private String publicUrlPrefix;
    @Value("${emotion.tts.max-text-length:5000}")
    private int maxTextLength;
    @Value("${emotion.tts.default-voice:default_zh_female}")
    private String defaultVoice;

    public TtsTaskServiceImpl(EpicScriptMapper epicScriptMapper, TtsEngineClient ttsEngineClient) {
        this.epicScriptMapper = epicScriptMapper;
        this.ttsEngineClient = ttsEngineClient;
    }

    @Override
    public TtsTaskResponse createOrReuse(TtsTaskCreateRequest request) {
        String userId = UserContextHolder.getCurrentUserId();
        String voice = request.getVoice() == null || request.getVoice().isBlank() ? defaultVoice : request.getVoice();
        String text = loadSourceText(userId, request.getSourceType(), request.getSourceId());
        String rawCleaned = cleanText(text);
        // Assigned exactly once so the lambda passed to runAsync below can capture it.
        String cleaned = rawCleaned.length() > maxTextLength ? rawCleaned.substring(0, maxTextLength) : rawCleaned;
        // The cache key covers both the voice and the cleaned, truncated text.
        String hash = DigestUtils.md5DigestAsHex((voice + "\n" + cleaned).getBytes(StandardCharsets.UTF_8));

        // Failed tasks are skipped so that a retry creates a fresh task instead of returning the old failure.
        TtsTask existing = getOne(new LambdaQueryWrapper<TtsTask>()
                .eq(TtsTask::getTextHash, hash)
                .eq(TtsTask::getVoice, voice)
                .ne(TtsTask::getStatus, "failed")
                .eq(TtsTask::getIsDeleted, 0)
                .last("LIMIT 1"));
        if (existing != null) {
            existing.setRequestCount((existing.getRequestCount() == null ? 0 : existing.getRequestCount()) + 1);
            updateById(existing);
            return toResponse(existing);
        }

        String filename = hash + ".mp3";
        TtsTask task = TtsTask.builder()
                .userId(userId)
                .sourceType(request.getSourceType())
                .sourceId(request.getSourceId())
                .textHash(hash)
                .textLength(cleaned.length())
                .voice(voice)
                .status("pending")
                .audioPath(outputDir + "/" + filename)
                .audioUrl(publicUrlPrefix + "/" + filename)
                .requestCount(1)
                .build();
        save(task);
        // Synthesis runs off the request thread; the client polls the task for its status.
        CompletableFuture.runAsync(() -> process(task.getId(), cleaned, voice, task.getAudioPath()));
        return toResponse(task);
    }

    @Override
    public TtsTaskResponse getTask(String id) {
        TtsTask task = getById(id);
        String userId = UserContextHolder.getCurrentUserId();
        if (task == null || !userId.equals(task.getUserId())) {
            return null;
        }
        return toResponse(task);
    }

    @Override
    public TtsTaskResponse getBySource(String sourceType, String sourceId, String voice) {
        String userId = UserContextHolder.getCurrentUserId();
        TtsTask task = getOne(new LambdaQueryWrapper<TtsTask>()
                .eq(TtsTask::getUserId, userId)
                .eq(TtsTask::getSourceType, sourceType)
                .eq(TtsTask::getSourceId, sourceId)
                .eq(TtsTask::getVoice, voice == null || voice.isBlank() ? defaultVoice : voice)
                .eq(TtsTask::getIsDeleted, 0)
                .orderByDesc(TtsTask::getCreateTime)
                .last("LIMIT 1"));
        return task == null ? null : toResponse(task);
    }

    private void process(String taskId, String text, String voice, String outputPath) {
        TtsTask task = getById(taskId);
        if (task == null) return;
        task.setStatus("processing");
        updateById(task);
        TtsEngineClient.TtsEngineResult result = ttsEngineClient.synthesize(text, voice, outputPath);
        // Reload the row, since synthesis can take a while and the task may have been removed meanwhile.
        task = getById(taskId);
        if (task == null) return;
        if (result.isSuccess()) {
            task.setStatus("success");
            task.setDurationMs(result.getDurationMs());
        } else {
            task.setStatus("failed");
            task.setErrorMessage(result.getErrorMessage());
        }
        updateById(task);
    }

    private String loadSourceText(String userId, String sourceType, String sourceId) {
        if (!"epic_script".equals(sourceType)) {
            throw new IllegalArgumentException("Unsupported sourceType");
        }
        EpicScript script = epicScriptMapper.selectById(sourceId);
        // A script owned by another user is reported the same way as a missing one.
        if (script == null || !userId.equals(script.getUserId())) {
            throw new IllegalArgumentException("Script not found");
        }
        StringBuilder text = new StringBuilder();
        if (script.getTitle() != null) text.append(script.getTitle()).append("\n\n");
        if (script.getPlotIntro() != null) text.append(script.getPlotIntro()).append("\n\n");
        if (script.getPlotTurning() != null) text.append(script.getPlotTurning()).append("\n\n");
        if (script.getPlotClimax() != null) text.append(script.getPlotClimax()).append("\n\n");
        if (script.getPlotEnding() != null) text.append(script.getPlotEnding()).append("\n\n");
        if (script.getPlotJson() != null && script.getPlotJson().get("fullContent") != null) {
            text.append(script.getPlotJson().get("fullContent"));
        }
        return text.toString();
    }

    // Strips common Markdown markers and collapses whitespace before synthesis.
    public static String cleanText(String text) {
        if (text == null) return "";
        return text.replaceAll("[#>*_`\\-]", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    private TtsTaskResponse toResponse(TtsTask task) {
        return TtsTaskResponse.builder()
                .id(task.getId())
                .sourceType(task.getSourceType())
                .sourceId(task.getSourceId())
                .status(task.getStatus())
                .voice(task.getVoice())
                // The URL is only exposed once the audio file actually exists.
                .audioUrl("success".equals(task.getStatus()) ? task.getAudioUrl() : null)
                .durationMs(task.getDurationMs())
                .errorMessage(task.getErrorMessage())
                .build();
    }
}
```

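The cache key used above is the MD5 of the voice, a newline, and the cleaned, truncated text. The sketch below recomputes that key in Python, which may be handy when checking which audio file a given script maps to. It is an approximation under stated assumptions: `clean_text` and `cache_key` are hypothetical names, and Python counts code points where Java counts UTF-16 units, so truncation can differ for text containing characters outside the Basic Multilingual Plane.

```python
import hashlib
import re

MAX_TEXT_LENGTH = 5000  # emotion.tts.max-text-length default

_MARKDOWN_CHARS = re.compile(r"[#>*_`\-]")
# re.ASCII keeps \s close to Java's default, ASCII-only \s.
_WHITESPACE = re.compile(r"\s+", re.ASCII)


def clean_text(text):
    """Approximate the Java cleanText: drop Markdown markers, collapse whitespace."""
    if text is None:
        return ""
    without_markers = _MARKDOWN_CHARS.sub("", text)
    return _WHITESPACE.sub(" ", without_markers).strip()


def cache_key(text, voice="default_zh_female"):
    """MD5 hex digest of voice + newline + cleaned text, truncated to the limit."""
    cleaned = clean_text(text)[:MAX_TEXT_LENGTH]
    return hashlib.md5(f"{voice}\n{cleaned}".encode("utf-8")).hexdigest()


print(clean_text("# Title\n\n**bold** text"))
print(cache_key("# Title\n\n**bold** text"))
```

Because the voice is part of the hashed input, the same script read in two voices yields two distinct cache entries.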
- [ ] **Step 5: Run compile**

```bash
cd backend-single
mvn -DskipTests compile
```

Expected: `BUILD SUCCESS`.

- [ ] **Step 6: Commit**

```bash
git add backend-single/src/main/java/com/emotion/service/TtsTaskService.java backend-single/src/main/java/com/emotion/service/TtsEngineClient.java backend-single/src/main/java/com/emotion/service/impl/HttpTtsEngineClient.java backend-single/src/main/java/com/emotion/service/impl/TtsTaskServiceImpl.java backend-single/src/main/resources/application.yml backend-single/src/main/resources/application-prod.yml
git commit -m "feat: add tts backend service"
```

---

### Task 4: Add TTS Controller

**Files:**
- Create: `backend-single/src/main/java/com/emotion/controller/TtsController.java`

- [ ] **Step 1: Add controller**

```java
package com.emotion.controller;

import com.emotion.common.Result;
import com.emotion.dto.request.tts.TtsTaskCreateRequest;
import com.emotion.dto.response.tts.TtsTaskResponse;
import com.emotion.service.TtsTaskService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

import javax.validation.Valid;

@RestController
@RequestMapping("/tts")
public class TtsController {
    @Autowired
    private TtsTaskService ttsTaskService;

    @PostMapping("/tasks")
    public Result<TtsTaskResponse> create(@Valid @RequestBody TtsTaskCreateRequest request) {
        try {
            return Result.success(ttsTaskService.createOrReuse(request));
        } catch (IllegalArgumentException e) {
            return Result.badRequest(e.getMessage());
        }
    }

    @GetMapping("/tasks/{id}")
    public Result<TtsTaskResponse> detail(@PathVariable String id) {
        TtsTaskResponse response = ttsTaskService.getTask(id);
        return response == null ? Result.notFound("TTS task not found") : Result.success(response);
    }

    @GetMapping("/tasks/by-source")
    public Result<TtsTaskResponse> bySource(@RequestParam String sourceType, @RequestParam String sourceId, @RequestParam(required = false) String voice) {
        TtsTaskResponse response = ttsTaskService.getBySource(sourceType, sourceId, voice);
        return Result.success(response);
    }
}
```

- [ ] **Step 2: Run compile**

```bash
cd backend-single
mvn -DskipTests compile
```

Expected: `BUILD SUCCESS`.

- [ ] **Step 3: Commit**

```bash
git add backend-single/src/main/java/com/emotion/controller/TtsController.java
git commit -m "feat: expose tts task APIs"
```

---

### Task 5: Add Python TTS Service Skeleton

**Files:**
- Create: `backend-single/tts-service/requirements.txt`
- Create: `backend-single/tts-service/app.py`
- Create: `backend-single/tts-service/README.md`
- Create: `backend-single/tts-service/emotion-museum-tts.service`

- [ ] **Step 1: Add requirements**

```text
fastapi==0.111.0
uvicorn[standard]==0.30.1
pydantic==2.7.4
```

MeloTTS itself is installed with:

```bash
git clone https://github.com/myshell-ai/MeloTTS.git /data/programs/MeloTTS
cd /data/programs/MeloTTS
/data/programs/emotion-museum/tts-service/.venv/bin/pip install -e .
/data/programs/emotion-museum/tts-service/.venv/bin/python -m unidic download
```

- [ ] **Step 2: Add FastAPI app**

```python
from pathlib import Path
from threading import Lock

from pydantic import BaseModel, Field
from fastapi import FastAPI

app = FastAPI(title="Emotion Museum TTS")

# The model is loaded lazily on first use and shared across requests.
_model = None
_speaker_ids = None
_model_lock = Lock()


class SynthesizeRequest(BaseModel):
    text: str = Field(min_length=1, max_length=5000)
    voice: str = "default_zh_female"
    outputPath: str


def get_model():
    global _model, _speaker_ids
    with _model_lock:
        if _model is None:
            # Imported here so /health responds even before MeloTTS is loaded.
            from melo.api import TTS

            _model = TTS(language="ZH", device="cpu")
            _speaker_ids = _model.hps.data.spk2id
    return _model, _speaker_ids


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/synthesize")
def synthesize(request: SynthesizeRequest):
    output = Path(request.outputPath)
    output.parent.mkdir(parents=True, exist_ok=True)

    try:
        model, speaker_ids = get_model()
        # spk2id is indexed by key, as in the MeloTTS README; it is not a plain dict.
        speaker_id = speaker_ids["ZH"]
        model.tts_to_file(request.text, speaker_id, str(output), speed=1.0)
    except Exception as exc:
        return {
            "success": False,
            "audioPath": None,
            "durationMs": None,
            "engine": "melotts",
            "errorMessage": str(exc),
        }

    return {
        "success": True,
        "audioPath": str(output),
        "durationMs": None,
        "engine": "melotts",
    }
```

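The service above writes to whatever `outputPath` the caller sends. Since the plan fixes the output directory, one possible guard is to check that the resolved path stays under that directory before synthesizing. This is a sketch of an optional addition, not part of the plan: `ALLOWED_ROOT` and `resolve_output_path` are hypothetical names.

```python
from pathlib import Path

# Assumed to match emotion.tts.output-dir from the backend configuration.
ALLOWED_ROOT = Path("/data/uploads/emotion-museum/tts")


def resolve_output_path(raw_path: str, root: Path = ALLOWED_ROOT) -> Path:
    """Return the resolved path, or raise ValueError if it escapes root."""
    root_resolved = root.resolve()
    candidate = Path(raw_path).resolve()
    if root_resolved not in candidate.parents:
        raise ValueError(f"outputPath must be inside {root_resolved}")
    return candidate


print(resolve_output_path("/data/uploads/emotion-museum/tts/demo.mp3").name)
```

In `synthesize`, such a check would run before `output.parent.mkdir(...)`, with the `ValueError` turned into the same failure response shape as other errors.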
- [ ] **Step 3: Add systemd unit**

```ini
[Unit]
Description=Emotion Museum TTS Service
After=network.target

[Service]
Type=simple
WorkingDirectory=/data/programs/emotion-museum/tts-service
ExecStart=/data/programs/emotion-museum/tts-service/.venv/bin/uvicorn app:app --host 127.0.0.1 --port 19110
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

- [ ] **Step 4: Add README commands**

````markdown
# Emotion Museum TTS Service

Install on `101.200.208.45`:

```bash
cd /data/programs/emotion-museum/tts-service
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt

git clone https://github.com/myshell-ai/MeloTTS.git /data/programs/MeloTTS
cd /data/programs/MeloTTS
/data/programs/emotion-museum/tts-service/.venv/bin/pip install -e .
/data/programs/emotion-museum/tts-service/.venv/bin/python -m unidic download

cd /data/programs/emotion-museum/tts-service
uvicorn app:app --host 127.0.0.1 --port 19110
# uvicorn runs in the foreground; run the health check from a second terminal.
curl http://127.0.0.1:19110/health
```
````

- [ ] **Step 5: Run local syntax check**

```bash
python -m py_compile backend-single/tts-service/app.py
```

Expected: no output and exit code 0.

- [ ] **Step 6: Commit**

```bash
git add backend-single/tts-service
git commit -m "feat: add private tts service scaffold"
```

---

### Task 6: Add Mini Program TTS Client and Player

**Files:**
- Create: `mini-program/src/services/tts.js`
- Create: `mini-program/src/components/ScriptAudioPlayer.vue`

- [ ] **Step 1: Add TTS client**

```javascript
import { get, post } from './request.js'

export const createTtsTask = ({ sourceType = 'epic_script', sourceId, voice = 'default_zh_female' }) => {
  return post('/tts/tasks', { sourceType, sourceId, voice })
}

export const getTtsTask = (id) => {
  return get(`/tts/tasks/${id}`)
}

export const getTtsTaskBySource = ({ sourceType = 'epic_script', sourceId, voice = 'default_zh_female' }) => {
  return get('/tts/tasks/by-source', { sourceType, sourceId, voice })
}

export default {
  createTtsTask,
  getTtsTask,
  getTtsTaskBySource
}
```

- [ ] **Step 2: Add player component**

```vue
<template>
  <view class="script-audio-player">
    <button class="audio-button" :disabled="loading" @click="handleClick">
      {{ buttonText }}
    </button>
  </view>
</template>

<script setup>
import { computed, onUnmounted, ref, watch } from 'vue'
import { createTtsTask, getTtsTask, getTtsTaskBySource } from '../services/tts.js'
import analytics from '../services/analytics.js'

const props = defineProps({
  scriptId: { type: String, required: true }
})

const task = ref(null)
const loading = ref(false)
const playing = ref(false)
let audio = null
let timer = null

const buttonText = computed(() => {
  if (loading.value) return '正在生成'
  if (task.value?.status === 'success') return playing.value ? '暂停朗读' : '播放朗读'
  if (task.value?.status === 'failed') return '重试朗读'
  return '生成朗读'
})

const clearTimer = () => {
  if (timer) clearInterval(timer)
  timer = null
}

const pollTask = (id) => {
  clearTimer()
  timer = setInterval(async () => {
    try {
      const res = await getTtsTask(id)
      task.value = res.data
    } catch (error) {
      // Stop polling on request errors so the button does not stay stuck in the loading state.
      loading.value = false
      clearTimer()
      return
    }
    if (task.value?.status === 'success' || task.value?.status === 'failed') {
      loading.value = false
      clearTimer()
      analytics.track(task.value.status === 'success' ? 'script_tts_success' : 'script_tts_error', {
        script_id: props.scriptId,
        task_id: id,
        error: task.value?.errorMessage || ''
      }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
    }
  }, 2500)
}

const generate = async () => {
  loading.value = true
  analytics.track('script_tts_request', { script_id: props.scriptId }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
  try {
    const res = await createTtsTask({ sourceId: props.scriptId })
    task.value = res.data
  } catch (error) {
    // Creation failed, so leave the button usable for another attempt.
    loading.value = false
    return
  }
  if (task.value?.status === 'success') {
    loading.value = false
    return
  }
  if (task.value?.id) {
    pollTask(task.value.id)
  } else {
    // No task came back, so there is nothing to poll.
    loading.value = false
  }
}

const play = () => {
  if (!task.value?.audioUrl) return
  if (!audio) {
    audio = uni.createInnerAudioContext()
    audio.src = task.value.audioUrl
    audio.onPlay(() => {
      playing.value = true
      analytics.track('script_tts_play', { script_id: props.scriptId, task_id: task.value.id }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
    })
    audio.onPause(() => {
      playing.value = false
      analytics.track('script_tts_pause', { script_id: props.scriptId, task_id: task.value.id }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
    })
    audio.onEnded(() => {
      playing.value = false
      analytics.track('script_tts_complete', { script_id: props.scriptId, task_id: task.value.id }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
    })
    audio.onError((error) => {
      playing.value = false
      analytics.track('script_tts_error', { script_id: props.scriptId, task_id: task.value.id, error: error.errMsg || 'play failed' }, { eventType: 'tts', pagePath: '/pages/main/ScriptDetailView' })
    })
  }
  if (playing.value) audio.pause()
  else audio.play()
}

const handleClick = async () => {
  if (task.value?.status === 'success') {
    play()
    return
  }
  await generate()
}

const loadExisting = async () => {
  if (!props.scriptId) return
  try {
    const res = await getTtsTaskBySource({ sourceId: props.scriptId })
    task.value = res.data
  } catch (error) {
    // A failed lookup leaves the button in its default "generate" state.
    task.value = null
  }
}

watch(() => props.scriptId, loadExisting, { immediate: true })

onUnmounted(() => {
  clearTimer()
  if (audio) {
    audio.stop()
    audio.destroy()
    audio = null
  }
})
</script>

<style scoped>
.script-audio-player {
  margin-top: 24rpx;
}
.audio-button {
  height: 72rpx;
  border-radius: 999rpx;
  color: #fff;
  font-size: 25rpx;
  font-weight: 800;
  background: linear-gradient(135deg, #24c6dc, #7f5af0);
}
</style>
```

- [ ] **Step 3: Build mini program**

```bash
cd mini-program
npm run build:mp-weixin:test
```

Expected: build succeeds.

- [ ] **Step 4: Commit**

```bash
git add mini-program/src/services/tts.js mini-program/src/components/ScriptAudioPlayer.vue
git commit -m "feat: add script audio player"
```

---

### Task 7: Wire Player into Script Detail

**Files:**
- Modify: `mini-program/src/pages/main/ScriptDetailView.vue`

- [ ] **Step 1: Import component**

```javascript
import ScriptAudioPlayer from '../../components/ScriptAudioPlayer.vue'
```

- [ ] **Step 2: Add component below the hero stats**

In the `.hero-card` template, after the stats block:

```vue
<ScriptAudioPlayer v-if="script?.id" :script-id="script.id" />
```

- [ ] **Step 3: Build mini program**

```bash
cd mini-program
npm run build:mp-weixin:test
```

Expected: build succeeds.

- [ ] **Step 4: Commit**

```bash
git add mini-program/src/pages/main/ScriptDetailView.vue
git commit -m "feat: add tts control to script detail"
```

---

### Task 8: Deploy and Verify TTS Service on Server
|
|
|
|
**Files:**
- Modify deployment scripts only if manual deployment proves repetitive.

- [ ] **Step 1: Upload service directory**

```bash
scp -r backend-single/tts-service root@101.200.208.45:/data/programs/emotion-museum/tts-service
```

Expected: files are copied.

- [ ] **Step 2: Install and start manually**

```bash
ssh root@101.200.208.45 "cd /data/programs/emotion-museum/tts-service && python3 -m venv .venv && . .venv/bin/activate && pip install -r requirements.txt && if [ ! -d /data/programs/MeloTTS ]; then git clone https://github.com/myshell-ai/MeloTTS.git /data/programs/MeloTTS; fi && cd /data/programs/MeloTTS && /data/programs/emotion-museum/tts-service/.venv/bin/pip install -e . && /data/programs/emotion-museum/tts-service/.venv/bin/python -m unidic download && cd /data/programs/emotion-museum/tts-service && (nohup .venv/bin/uvicorn app:app --host 127.0.0.1 --port 19110 > tts.log 2>&1 < /dev/null &)"
```

Expected: the command returns once installation finishes, and uvicorn keeps running in the background with its output in `tts.log`.

- [ ] **Step 3: Check health**

```bash
ssh root@101.200.208.45 "curl -s http://127.0.0.1:19110/health"
```

Expected:

```json
{"status":"ok"}
```
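
If the service needs a few seconds after startup before it accepts connections, a single `curl` can fail even though the deployment is fine. The following is a minimal sketch of a retrying health check, assuming only the `/health` endpoint and the `{"status":"ok"}` body shown above. The function names, retry count, and delay are hypothetical and not part of the plan's code.

```python
import json
import time
import urllib.request
from typing import Callable

# Endpoint from Step 3; reachable only on the server itself (127.0.0.1).
HEALTH_URL = "http://127.0.0.1:19110/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 3.0) -> dict:
    """GET the health endpoint and parse its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,          # hypothetical default, tune as needed
    delay_seconds: float = 2.0,  # hypothetical default, tune as needed
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as the service answers {"status": "ok"}."""
    for attempt in range(attempts):
        try:
            body = fetch()
            if isinstance(body, dict) and body.get("status") == "ok":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts;
            # ValueError covers a body that is not valid JSON.
            pass
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Calling `wait_until_healthy()` on the server and treating a `False` result as a failed deployment is one way to gate the systemd step on a passing health check.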
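
- [ ] **Step 4: Install systemd unit after health passes**

Stop the uvicorn process started manually in Step 2 first. If the unit listens on the same port 19110, it cannot bind while the manual process is still running.
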
```bash
scp backend-single/tts-service/emotion-museum-tts.service root@101.200.208.45:/etc/systemd/system/emotion-museum-tts.service
ssh root@101.200.208.45 "systemctl daemon-reload && systemctl enable emotion-museum-tts && systemctl restart emotion-museum-tts && systemctl status emotion-museum-tts --no-pager"
```

Expected: service status is active.

- [ ] **Step 5: Commit deployment doc tweaks if needed**

```bash
git add backend-single/tts-service/README.md
git commit -m "docs: update tts deployment notes"
```

Only run this commit if README changed.

---

### Task 9: Final TTS Verification

**Files:**
- No code changes unless bugs are found.

- [ ] **Step 1: Run backend tests**

```bash
cd backend-single
mvn test
```

Expected: `BUILD SUCCESS`.

- [ ] **Step 2: Run mini program build**

```bash
cd mini-program
npm run build:mp-weixin:test
```

Expected: build succeeds.

- [ ] **Step 3: Smoke-test backend task API**

With a logged-in mini program token and an owned script id:

```bash
curl -X POST http://localhost:19089/tts/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d "{\"sourceType\":\"epic_script\",\"sourceId\":\"<script-id>\",\"voice\":\"default_zh_female\"}"
```

Expected: the response contains a task whose status is `pending`, `processing`, or `success`. A `failed` status means the backend integration works but the Python engine failed; check the server logs.
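
The status rules above can be sketched as a small helper for scripting the smoke test. This is an illustration only: it assumes the task is returned under a `data` key (the mini program reads `res.data`, but the exact envelope is not confirmed here), and the function names and return messages are hypothetical.

```python
import json


def build_create_payload(source_id: str, voice: str = "default_zh_female") -> bytes:
    """Build the JSON body used by the curl command in this step."""
    body = {"sourceType": "epic_script", "sourceId": source_id, "voice": voice}
    return json.dumps(body).encode("utf-8")


def interpret_task(response: dict) -> str:
    """Map a create-task response to a smoke-test verdict."""
    # Assumption: the task object sits under "data" in the response envelope.
    task = response.get("data") or {}
    status = task.get("status")
    if status in ("pending", "processing"):
        return "ok: task accepted, generation still running"
    if status == "success":
        # A finished task is only useful if it carries a playable URL.
        if task.get("audioUrl"):
            return "ok: audio is ready"
        return "error: success status without audioUrl"
    if status == "failed":
        return "engine failure: backend path works, check the Python TTS service logs"
    return f"unexpected status: {status!r}"
```

For example, `interpret_task({"data": {"status": "failed"}})` yields the engine-failure verdict, which points at the Python service rather than the Spring Boot API.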

- [ ] **Step 4: Manual mini program test**

Open the script detail page in WeChat DevTools:

- Tap `生成朗读` ("Generate narration").
- Observe the generating state.
- If the engine is installed, wait for the button to change to `播放朗读` ("Play narration").
- Tap to play, then tap again to pause.
- Leave the page and verify that no background playback continues.

- [ ] **Step 5: Commit fixes if needed**

```bash
git add <changed-files>
git commit -m "fix: stabilize tts playback"
```

Only run this commit if verification required code changes.

## Self-Review

- Spec coverage: task table, backend task API, Python service, CPU/no-GPU assumption, cache, mini program playback, and analytics events are covered.
- Placeholder scan: no task leaves the TTS engine as an unspecified later step; the plan uses the official MeloTTS local install and Python API.
- Type consistency: request and response use `sourceType`, `sourceId`, `voice`, `status`, and `audioUrl` consistently across backend and mini program.