For this process we will use the OpenAI API with the GPT-3.5 Turbo model (gpt-3.5-turbo).
The steps are pretty simple:
- Obtain the subtitle/soft-subs.
- Take a batch of lines, roughly 100 words at a time, to stay safely within the token limit.
- Feed them to the model with a proper prompt.
- Parse the response and map each line back to its source line.
Obtain the Subtitle
The subtitle file (.ass/.ssa) can be extracted from the downloaded .mkv file with ffmpeg, for example:

```shell
$ ffmpeg -i "Movie.mkv" -vn -an "Movie.ass"
```
Take a Bunch of Lines
I am using Go to extract the subtitle lines with go-astisub. Each batch of lines should be formatted so that every line becomes "<number>. {<ass-codes>} <text>". Why? Because we need to preserve the text formatting (e.g. bold, coloring, etc.) and keep each translated line in sequence so it maps back to its source.
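A minimal sketch of the batching and formatting, assuming the lines have already been read (for example with go-astisub) into a hypothetical `SubLine` struct that holds the ASS codes and the text. It counts words with `strings.Fields` to stay near the ~100-word budget from the step list, and numbers lines from 1 within each batch. `batchLines` and `formatBatch` are illustrative names, not part of any library.

```go
package main

import (
	"fmt"
	"strings"
)

// SubLine is a hypothetical, simplified dialogue line:
// Codes holds the ASS style/override info to preserve, Text the spoken text.
type SubLine struct {
	Codes string
	Text  string
}

// batchLines groups consecutive lines so each batch holds at most maxWords
// words (a single longer line still gets its own batch).
func batchLines(lines []SubLine, maxWords int) [][]SubLine {
	var batches [][]SubLine
	var cur []SubLine
	words := 0
	for _, l := range lines {
		n := len(strings.Fields(l.Text))
		if len(cur) > 0 && words+n > maxWords {
			batches = append(batches, cur)
			cur, words = nil, 0
		}
		cur = append(cur, l)
		words += n
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

// formatBatch renders a batch as "<number>. {<ass-codes>} <text>" lines.
func formatBatch(batch []SubLine) string {
	var sb strings.Builder
	for i, l := range batch {
		fmt.Fprintf(&sb, "%d. {%s} %s\n", i+1, l.Codes, l.Text)
	}
	return sb.String()
}

func main() {
	lines := []SubLine{
		{"time=1, bold, red", "This story is only fiction."},
		{"time=10, italic", "Don't mind whatever it tells."},
		{"time=20, black", "It will make you suffer and hard to sleep."},
	}
	for _, b := range batchLines(lines, 100) {
		fmt.Print(formatBatch(b))
	}
}
```

Because numbering restarts in each batch, a number in the model's answer is simply the position of the source line within that batch.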
Prompting Properly
Here is an example prompt, which ends with a <Dialog> marker:
Translate the following dialog from English to Indonesian. Please keep the line numbers and keep everything in curly braces.
<Dialog>
Then we append the batch of lines in the format described above, e.g.:
1. {time=1, bold, red} This story is only fiction.
2. {time=10, italic} Don't mind whatever it tells.
3. {time=20, black} It will make you suffer and hard to sleep.
In Go we can use go-openai to call the API, and tiktoken-go to count tokens and estimate the cost. The request looks like this:
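```go
// Assumes ctx, prompts and an enclosing func returning error; imports errors, io, os, strings, go-openai.
llmclient := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
req := openai.ChatCompletionRequest{
	Model:       openai.GPT3Dot5Turbo,
	Temperature: 1,
	TopP:        1,
	// MaxTokens: 20,
	Messages: []openai.ChatCompletionMessage{
		{Role: openai.ChatMessageRoleUser, Content: prompts},
	},
	Stream: true,
}
stream, err := llmclient.CreateChatCompletionStream(ctx, req)
if err != nil {
	return err
}
defer stream.Close()
// Concatenate the streamed deltas into the full answer.
var answer strings.Builder
for {
	resp, err := stream.Recv()
	if errors.Is(err, io.EOF) {
		break
	}
	if err != nil {
		return err
	}
	answer.WriteString(resp.Choices[0].Delta.Content)
}
```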
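The cost calculation mentioned above is simple arithmetic once you have the token counts (for instance from tiktoken-go, applied to the prompt and to the collected answer). A sketch with the per-1K-token prices passed in as parameters, since rates change over time; `estimateCost` is a hypothetical helper and the values in `main` are placeholders, not real OpenAI prices.

```go
package main

import "fmt"

// estimateCost returns the cost of one request from its token counts and
// per-1K-token prices for input (prompt) and output (completion) tokens.
func estimateCost(promptTokens, completionTokens int, inPer1K, outPer1K float64) float64 {
	return float64(promptTokens)/1000*inPer1K + float64(completionTokens)/1000*outPer1K
}

func main() {
	// Placeholder prices for illustration only; check the current pricing page.
	const inPer1K, outPer1K = 0.001, 0.002
	fmt.Printf("estimated cost: $%.4f\n", estimateCost(1200, 1300, inPer1K, outPer1K))
}
```

Summing this over all batches gives a rough budget for a whole subtitle file before running it.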
Parse the Answer
Given an answer like this:
<Dialog>
1. {time=1, bold, red} Cerita ini hanya fiksi.
2. {time=10, italic} Jangan pedulikan apa pun yang diceritakan.
3. {time=20, black} Ini akan membuatmu menderita dan sulit tidur.
Just parse each line and map its line number back to the corresponding source line to create the translated output.
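A minimal parsing sketch, assuming the answer follows the "<number>. {<ass-codes>} <text>" format; the curly-brace part is treated as optional in case the model drops it. `parseAnswer` and `mapToSource` are hypothetical names. Lines the model skipped fall back to the source text and are reported so they can be retried.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// linePattern matches "<number>. {<ass-codes>} <text>" with optional braces.
var linePattern = regexp.MustCompile(`^\s*(\d+)\.\s*(?:\{([^}]*)\})?\s*(.*)$`)

// Translated is one parsed line of the model's answer.
type Translated struct {
	Codes string // content of the curly braces as echoed by the model
	Text  string // translated text
}

// parseAnswer indexes every numbered line by its number; other lines,
// such as the "<Dialog>" marker or blank lines, are ignored.
func parseAnswer(answer string) map[int]Translated {
	out := make(map[int]Translated)
	for _, line := range strings.Split(answer, "\n") {
		m := linePattern.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		out[n] = Translated{Codes: m[2], Text: strings.TrimSpace(m[3])}
	}
	return out
}

// mapToSource returns one line per source line (number i+1 in the batch),
// keeping the source text where no translation came back, plus the
// skipped numbers.
func mapToSource(src []string, parsed map[int]Translated) ([]string, []int) {
	out := make([]string, len(src))
	var missing []int
	for i, s := range src {
		if t, ok := parsed[i+1]; ok {
			out[i] = t.Text
		} else {
			out[i] = s
			missing = append(missing, i+1)
		}
	}
	return out, missing
}

func main() {
	answer := "<Dialog>\n" +
		"1. {time=1, bold, red} Cerita ini hanya fiksi.\n" +
		"2. {time=10, italic} Jangan pedulikan apa pun yang diceritakan.\n" +
		"3. {time=20, black} Ini akan membuatmu menderita dan sulit tidur."
	src := []string{
		"This story is only fiction.",
		"Don't mind whatever it tells.",
		"It will make you suffer and hard to sleep.",
	}
	out, missing := mapToSource(src, parseAnswer(answer))
	for i, t := range out {
		fmt.Printf("%d: %s\n", i+1, t)
	}
	fmt.Println("missing:", missing)
}
```

Matching by number rather than by position makes the mapping robust to extra lines (like the echoed <Dialog> marker) in the response.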
Happy coding!