Integrating FFmpeg with Java for Audio Transcoding

Written: 2023-10-24, afternoon
Location: at home, Wenzhou, Zhejiang Province
Weather: sunny

Background#

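A Java project required transcoding audio files of various formats to mp3.
The two options I found were [Alibaba Cloud Media Processing MPS] and [running FFmpeg myself]; I investigated and tested the former first.
After some configuration, MPS transcoded automatically without problems, but the transcoded files would never play on the project's hardware device.
The transcoding parameters the project requires are: 16 kHz sample rate, mono, 32k bitrate, and CBR (constant bitrate) mode.
I noticed that the automatic transcoding task configuration in MPS only supports [ABR, average bitrate mode], and suspected that this was the cause (at that point I had not yet found a suitable tool for inspecting the parameters, namely mediainfo).
When I asked customer support, the answer was that I could try calling the API manually and specifying the relevant parameters.
Calling the API manually seemed like a hassle to me, and the transcoding costs money as well, so I simply decided to transcode with FFmpeg myself.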

Trying FFmpeg#

I started by testing transcoding with the ffmpeg command line.
First I specified only the basic parameters:
ffmpeg -hide_banner -i input.wav -ar 16000 -ab 32k -ac 1 output.mp3
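The output file played fine in the macOS [Music] app, but its info always showed a bitrate of 40k, not the 32k I had specified.
I tried all sorts of parameters and combinations, and the result was never right.
The parameters I tried were roughly: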

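-vn: output the audio stream only (no video)
-b:a 32k: another way of specifying the bitrate
-c:a libmp3lame: explicitly select the mp3 encoder
-af "volume=1": keep the volume unchanged
-map_metadata -1 -fflags +bitexact -flags:a +bitexact: various parameters for stripping metadata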

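After all of these attempts failed, I stumbled on a blog post titled [解决ffmpeg生成mp3在ios上时长不对的问题] (on fixing the wrong duration reported on iOS for mp3 files generated by ffmpeg).
In that post the author mentions a parameter, -write_xing 0; I also found that this problem is a bug (see References).
After adding -write_xing 0, the transcode succeeded and the output file's bitrate was correctly reported as 32k (this can be checked with mediainfo).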

The final command:
ffmpeg -hide_banner -i input.wav -ar 16000 -ab 32k -ac 1 -write_xing 0 -map_metadata -1 -fflags +bitexact -flags:a +bitexact -c:a libmp3lame output.mp3

The parameters are explained below:

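`-hide_banner`: suppress the FFmpeg startup banner
`-i input.wav`: use "input.wav" as the input file
`-ar 16000`: set the audio sample rate to 16,000 Hz
`-ab 32k`: set the audio bitrate to 32 kbps (kilobits per second); `-ab` is the older spelling of `-b:a`
`-ac 1`: output mono audio (a single channel)
`-write_xing 0`: do not write the Xing header frame to the mp3 file
`-map_metadata -1`: do not copy metadata (such as tags) from the input file
`-fflags +bitexact`: format-level bitexact flag; placed before the output file, it applies to the output muxer
`-flags:a +bitexact`: codec-level bitexact flag for the output audio encoder
`-c:a libmp3lame`: encode to MP3 with the libmp3lame encoder
`output.mp3`: name of the output file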

Java integration#

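I found no ffmpeg library implemented directly in Java; the options are either invoking a local ffmpeg command or going through JNI.
I found a library called [javacv] that can call ffmpeg from Java and gave it a try, but it could not locate the ffmpeg command, so I dropped it and used Java's ProcessBuilder instead.
Core code example: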

    @Override
    public int transcoding(String input, String output) throws IOException, InterruptedException {
        String[] ffmpegCommand = {
                "/usr/bin/ffmpeg", // linux
                // "/opt/homebrew/bin/ffmpeg", // macos
                "-hide_banner",
                "-i", input,
                "-ar", "16000",
                "-ab", "32k",
                "-ac", "1",
                "-write_xing", "0",
                "-map_metadata", "-1",
                "-fflags", "+bitexact",
                "-flags:a", "+bitexact",
                "-c:a", "libmp3lame",
                output
        };

        ProcessBuilder pb = new ProcessBuilder(ffmpegCommand);
        int exitCode = pb.inheritIO().start().waitFor();
        log.info("transcoding.exitCode: {}", exitCode);
        return exitCode;
    }
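
The example above hard-codes the Linux path and keeps the macOS path in a comment. One possible way to avoid editing the code per machine is to probe a list of candidate paths at startup. The sketch below is only an illustration: the class name `FfmpegLocator`, the method `firstExecutable`, and the candidate list are hypothetical names introduced here, not part of the original project.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class FfmpegLocator {
    // Hypothetical candidate list: the two paths that appear in the example above.
    static final List<String> DEFAULT_CANDIDATES =
            Arrays.asList("/usr/bin/ffmpeg", "/opt/homebrew/bin/ffmpeg");

    /** Returns the first candidate that exists as a file and is executable, if any. */
    static Optional<String> firstExecutable(List<String> candidates) {
        for (String candidate : candidates) {
            Path path = Paths.get(candidate);
            if (Files.isRegularFile(path) && Files.isExecutable(path)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstExecutable(DEFAULT_CANDIDATES).orElse("ffmpeg not found"));
    }
}
```

The result could then replace the first element of `ffmpegCommand`; failing early when nothing is found gives a clearer error than a failed process start.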

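Another feature needed was getting the duration of an audio file.
The original idea was to have ffmpeg print the file's information, but the output could not be captured; after looking into it, I found that ffmpeg writes this output to stderr by default, not stdout.
Even after redirecting stderr together with stdout, the output still could not be captured; after some searching and experimenting, I traced the problem to how ProcessBuilder handles redirection: it passes every argument to the program as-is, so shell operators such as 2>&1, | and > are not interpreted.
If the command contains redirection, it has to be run through a shell (/bin/sh):
ProcessBuilder pb = new ProcessBuilder("/bin/sh", "-c", String.join(" ", ffmpegCommand));
Core code example: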

    private Double getAudioLength(String input) {
        String infoFile = input + ".info";
        // NOTE: these tokens are joined into a single shell command line below and `input`
        // is not quoted, so a path with spaces or shell metacharacters would break it.
        String[] ffmpegCommand = {
                "/usr/bin/ffmpeg", // linux
                // "/opt/homebrew/bin/ffmpeg", // macos
                "-hide_banner",
                "-i", input,
                "2>&1", "|", "cat",
                ">", infoFile
        };

        Double durationInSeconds = null; // stays null if no duration is found

        try {
            // ProcessBuilder pb = new ProcessBuilder(ffmpegCommand);
            // Redirection and pipes are shell syntax, so the command must go through /bin/sh -c!
            ProcessBuilder pb = new ProcessBuilder("/bin/sh", "-c", String.join(" ", ffmpegCommand));
            Process process = pb.inheritIO().start();
            int exitCode = process.waitFor();
            log.info("exitCode: {}", exitCode);

            // Read infoFile and extract the duration
            List<String> lines = FileUtil.readLines(infoFile, "UTF-8");
            for (String line : lines) {
                if (line.contains("Duration:")) {
                    // "Duration: 00:50:07.91, bitrate: ..." -> "00:50:07.91"
                    String durationPart = line.split("Duration:")[1].trim().split(",")[0].trim();

                    // HH:MM:SS.xx -> seconds as a Double
                    String[] timeParts = durationPart.split(":");
                    double hours = Double.parseDouble(timeParts[0]);
                    double minutes = Double.parseDouble(timeParts[1]);
                    double seconds = Double.parseDouble(timeParts[2]);
                    durationInSeconds = hours * 3600 + minutes * 60 + seconds;
                }
            }

            if (durationInSeconds != null) {
                log.info("Audio duration (seconds): {}", durationInSeconds);
            } else {
                log.info("No audio duration found.");
            }
        } catch (Exception e) {
            // Covers IOException, InterruptedException and parse errors alike
            e.printStackTrace();
        } finally {
            FileUtil.del(infoFile);
        }

        return durationInSeconds;
    }
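
As an alternative to the shell redirection and the temporary .info file, ProcessBuilder itself can merge stderr into stdout with `redirectErrorStream(true)`, and the output can then be read from the process directly. The following is a minimal sketch of that idea, not the code used in the project; the class name `DurationProbe` and its method names are hypothetical. Because no shell is involved, the input path needs no quoting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DurationProbe {
    // Matches the "Duration: HH:MM:SS.xx" part of a line such as
    // "Duration: 00:50:07.91, bitrate: 1411 kb/s".
    private static final Pattern DURATION =
            Pattern.compile("Duration:\\s*(\\d+):(\\d{2}):(\\d{2}(?:\\.\\d+)?)");

    /** Parses one line of ffmpeg output; returns seconds, or null if the line has no duration. */
    static Double parseDurationSeconds(String line) {
        Matcher m = DURATION.matcher(line);
        if (!m.find()) {
            return null; // also the case for "Duration: N/A"
        }
        return Integer.parseInt(m.group(1)) * 3600
                + Integer.parseInt(m.group(2)) * 60
                + Double.parseDouble(m.group(3));
    }

    /** Runs "ffmpeg -hide_banner -i input" and scans the merged output for the duration. */
    static Double probe(String ffmpegPath, String input) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(ffmpegPath, "-hide_banner", "-i", input);
        pb.redirectErrorStream(true); // ffmpeg logs to stderr; merge it into stdout
        Process process = pb.start();
        Double result = null;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) { // drain everything so ffmpeg never blocks
                if (result == null) {
                    result = parseDurationSeconds(line);
                }
            }
        }
        process.waitFor(); // the exit code is ignored: no output file is given, so ffmpeg reports an error
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseDurationSeconds("  Duration: 00:04:11.81, start: 0.023021, bitrate: 128 kb/s"));
    }
}
```

Keeping the parsing in its own method makes it easy to check against sample lines without running ffmpeg at all.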

Docker deployment#

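Outside the local environment the project is deployed with Docker, which brought up another problem:
where to install the ffmpeg command. Installing it on the host would, I worried, tie the deployment too tightly to that environment.
So I chose to install it inside the container through the Dockerfile.
However, ffmpeg could not be installed directly on the original image (pig4cloud/java:8-jre).
After some searching and testing, I ended up using the (openjdk:8-jdk-alpine) image with the package repositories configured, and ffmpeg installed successfully.
Dockerfile example: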

FROM openjdk:8-jdk-alpine

MAINTAINER yeshimin

ENV TZ=Asia/Shanghai
ENV JAVA_OPTS="-Xms512m -Xmx1024m -Djava.security.egd=file:/dev/./urandom"

RUN ln -sf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

RUN mkdir -p /xxx-module

RUN mkdir -p /tmp/xxx-audio

RUN echo "http://mirrors.aliyun.com/alpine/v3.6/main" > /etc/apk/repositories \
&& echo "http://mirrors.aliyun.com/alpine/v3.6/community" >> /etc/apk/repositories \
&& apk update upgrade \
&& apk add --no-cache procps unzip curl bash tzdata \
&& apk add yasm && apk add ffmpeg \
&& ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
&& echo "Asia/Shanghai" > /etc/timezone

WORKDIR /xxx-module

EXPOSE 10003

ADD ./target/xxx-module-biz.jar ./

CMD java $JAVA_OPTS -jar xxx-module-biz.jar

Limiting the output file size#

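Requirement: with the sample rate, bitrate and channel count fixed, keep the output file size at or below 60K.
Searching showed that the -fs parameter can be used to limit the output file size.
However, actually running it produced an error: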

# ffmpeg -hide_banner -i caiqin.wav -ar 16000 -ab 32k -ac 1 \
-write_xing 0 -map_metadata -1 -fflags +bitexact -flags:a +bitexact \
-c:a libmp3lame -fs 60k caiqin.mp3
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'caiqin.wav':
Duration: 00:50:07.91, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
File 'caiqin.mp3' already exists. Overwrite? [y/N] y
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'caiqin.mp3':
Stream #0:0: Audio: mp3, 16000 Hz, mono, s16p, 32 kb/s
Metadata:
encoder : Lavc libmp3lame
[out#0/mp3 @ 0x6000016b80c0] Error muxing a packet.0kbits/s speed=N/A
size= 59kB time=00:00:15.15 bitrate= 31.7kbits/s speed= 307x
video:0kB audio:59kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.033307%
Conversion failed!

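Judging by the 14th line of the output (size= 59kB time=00:00:15.15 ...), the cause appears to be that, at this particular sample rate, bitrate and channel count, the transcoded file would need more than the specified 60K (displayed as 59kB).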
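
The point at which ffmpeg stopped can be cross-checked with a little arithmetic: at a constant bitrate, size in bytes equals bitrate times duration divided by 8. The sketch below assumes that ffmpeg reads `-fs 60k` as 60,000 bytes and ignores container overhead; the class and method names are hypothetical.

```java
final class SizeBudget {
    /** Longest duration in seconds that fits into sizeLimitBytes at a constant bitrate. */
    static double maxDurationSeconds(long sizeLimitBytes, long bitrateBitsPerSecond) {
        return sizeLimitBytes * 8.0 / bitrateBitsPerSecond;
    }

    public static void main(String[] args) {
        // 60,000 bytes at 32,000 bit/s
        System.out.println(maxDurationSeconds(60_000, 32_000)); // prints 15.0
    }
}
```

About 15 seconds is in line with the time=00:00:15.15 in the log above, and it suggests that any duration limit for this size budget has to stay below that value.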

After limiting the duration (with the -t parameter), the transcode completes normally:

# ffmpeg -hide_banner -i caiqin.wav -ar 16000 -ab 32k -ac 1 \
-write_xing 0 -map_metadata -1 -fflags +bitexact -flags:a +bitexact \
-c:a libmp3lame -fs 60k -t 14 caiqin.mp3
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'caiqin.wav':
Duration: 00:50:07.91, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
File 'caiqin.mp3' already exists. Overwrite? [y/N] y
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'caiqin.mp3':
Stream #0:0: Audio: mp3, 16000 Hz, mono, s16p, 32 kb/s
Metadata:
encoder : Lavc libmp3lame
size= 55kB time=00:00:13.97 bitrate= 32.3kbits/s speed= 230x
video:0kB audio:55kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.035521%

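After implementing the approach above, I happened to notice that when an output size is specified, ffmpeg still reports an error once the transcode exceeds that size, but it nevertheless truncates automatically and produces a file of the specified size.
So there is no need for the roundabout approach above: a single run already gives a usable result, and the only difference is that the exit code is non-zero.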
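
Since the exit code is non-zero even for a usable truncated file, the Java side cannot rely on it alone. One possible check, sketched below under the assumption that "usable" means the file exists, is not empty, and does not exceed a size bound, is to inspect the output file itself. `TranscodeResult` and `isUsable` are hypothetical names, and whether the truncated file lands exactly at or slightly past the -fs value has not been verified here, so the bound may need some slack.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TranscodeResult {
    /** True if the output exists as a regular file, is non-empty, and is at most maxBytes long. */
    static boolean isUsable(Path output, long maxBytes) throws IOException {
        if (!Files.isRegularFile(output)) {
            return false;
        }
        long size = Files.size(output);
        return size > 0 && size <= maxBytes;
    }

    public static void main(String[] args) throws IOException {
        // Example: pass the output path as the first argument
        Path output = Paths.get(args.length > 0 ? args[0] : "caiqin.mp3");
        System.out.println(isUsable(output, 60_000));
    }
}
```

A non-zero exit code can still be logged, since ffmpeg may also fail for reasons unrelated to the size limit.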

Troubleshooting: cover art embedded in an mp3 makes the size limit ineffective#

When transcoding one particular audio file, I found that the output was not truncated to the size limit. The run looked like this:

# ffmpeg -hide_banner -i xx.mp3 -ar 16000 -ab 32k -ac 1 \
-write_xing 0 -map_metadata -1 -fflags +bitexact -flags:a +bitexact \
-c:a libmp3lame -fs 60k yy60k.mp3
Input #0, mp3, from 'xx.mp3':
Metadata:
encoder : Lavf58.76.100
comment : 163 key(Don't modify):cEgtqHxwpdRKcBCRPBlPkg2DgcBlNoDaE7DLI/2UHqz2lBjNbotyjPpbetGfYEpSHze+evmlulEGcCplT1L5nqhLHeK3XTcyEgnHwW9Ow1JY2+gbzf04bwvVKdfiIWN3LL/zdLfLxvlVsPvobu6nWgvQC6Nw7MGw5umsTeRPyYHy4w5YVSD8VHyFEo0eQKeaubW3qofyzOlJ0gYXQlMHN4RCn1jpkefyBGBCOcjKE
album : 大风遇到了雨
title : 不找了
artist : 隔壁老樊
album_artist : 隔壁老樊
disc : 01
track : 1
Duration: 00:04:11.81, start: 0.023021, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 48000 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc58.13
Stream #0:1: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 1080x1920 [SAR 72:72 DAR 9:16], 90k tbr, 90k tbn (attached pic)
Metadata:
comment : Other
Stream mapping:
Stream #0:1 -> #0:0 (mjpeg (native) -> png (native))
Stream #0:0 -> #0:1 (mp3 (mp3float) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
[swscaler @ 0x128378000] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x128378000] [swscaler @ 0x118008000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x128378000] [swscaler @ 0x128388000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x128378000] [swscaler @ 0x128398000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x128378000] [swscaler @ 0x1283a8000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x128378000] [swscaler @ 0x1283b8000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x128378000] [swscaler @ 0x1283c8000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x128378000] [swscaler @ 0x1283d8000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x128378000] [swscaler @ 0x1283e8000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x128378000] [swscaler @ 0x1283f8000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x110008000] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x110008000] [swscaler @ 0x118008000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x110008000] [swscaler @ 0x118018000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x110008000] [swscaler @ 0x118028000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x110008000] [swscaler @ 0x118038000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x110008000] [swscaler @ 0x118048000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x110008000] [swscaler @ 0x118058000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x110008000] [swscaler @ 0x118068000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x110008000] [swscaler @ 0x118078000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x110008000] [swscaler @ 0x118088000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x130008000] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x130008000] [swscaler @ 0x118088000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x130008000] [swscaler @ 0x118008000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x130008000] [swscaler @ 0x118018000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x130008000] [swscaler @ 0x118028000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x130008000] [swscaler @ 0x118038000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x130008000] [swscaler @ 0x118048000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x130008000] [swscaler @ 0x118058000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x130008000] [swscaler @ 0x118068000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x130008000] [swscaler @ 0x118078000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x118078000] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x118078000] [swscaler @ 0x118088000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x118078000] [swscaler @ 0x118008000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x118078000] [swscaler @ 0x118018000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x118078000] [swscaler @ 0x118028000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x118078000] [swscaler @ 0x118038000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x118078000] [swscaler @ 0x118048000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x118078000] [swscaler @ 0x118058000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x118078000] [swscaler @ 0x118068000] No accelerated colorspace conversion found from yuv420p to rgb24.
[swscaler @ 0x118078000] [swscaler @ 0x118098000] No accelerated colorspace conversion found from yuv420p to rgb24.
[vost#0:0/png @ 0x126f0cef0] Frame rate very high for a muxer not efficiently supporting it.
Please consider specifying a lower framerate, a different muxer or setting vsync/fps_mode to vfr
Output #0, mp3, to 'yy60k.mp3':
Stream #0:0: Video: png, rgb24(pc, gbr/unknown/unknown, progressive), 1080x1920 [SAR 1:1 DAR 9:16], q=2-31, 200 kb/s, 90k fps, 90k tbn (attached pic)
Metadata:
comment : Other
encoder : Lavc png
Stream #0:1: Audio: mp3, 16000 Hz, mono, fltp, 32 kb/s
Metadata:
encoder : Lavc libmp3lame
[out#0/mp3 @ 0x6000017a80c0] Error muxing a packet02:27.60 bitrate= 0.0kbits/s speed= 288x
frame= 1 fps=0.0 q=-0.0 Lsize= 1061kB time=00:04:11.75 bitrate= 34.5kbits/s speed= 287x
video:78kB audio:983kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.003959%
Conversion failed!

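The output shows an image stream: Stream #0:1 -> #0:0 (mjpeg (native) -> png (native)); I suspected this difference was the cause.
At first I thought of the image as metadata too; the parameters already included -map_metadata -1 to strip metadata, so why did it have no effect?
Looking it up, I learned that the cover art in an audio file is carried as a stream of its own (the "attached pic" in the log) and does not count as ordinary metadata.
I then added the -map 0:a parameter so that only the audio streams are output, and that solved the problem.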
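
Putting the pieces together, the argument list passed to ProcessBuilder might end up looking like the sketch below. It combines the final command from earlier with -fs and -map 0:a; the class name `TranscodeCommand` and the method `build` are hypothetical, and the `-y` flag is an addition not present in the original commands, included on the assumption that a non-interactive run should overwrite an existing output file instead of stopping at the prompt seen in the logs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TranscodeCommand {
    /** Builds the argument list; each element reaches ffmpeg as-is, with no shell involved. */
    static List<String> build(String ffmpegPath, String input, String output, String sizeLimit) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                ffmpegPath, "-hide_banner",
                "-y",            // assumption, not in the original command: overwrite without prompting
                "-i", input,
                "-map", "0:a",   // audio streams only, so attached cover art is dropped
                "-ar", "16000",
                "-ab", "32k",
                "-ac", "1",
                "-write_xing", "0",
                "-map_metadata", "-1",
                "-fflags", "+bitexact",
                "-flags:a", "+bitexact",
                "-c:a", "libmp3lame"));
        if (sizeLimit != null) {
            cmd.add("-fs");      // optional output size limit, e.g. "60k"
            cmd.add(sizeLimit);
        }
        cmd.add(output);
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", build("/usr/bin/ffmpeg", "xx.mp3", "yy60k.mp3", "60k")));
    }
}
```

The returned list can be handed to `new ProcessBuilder(cmd)` directly.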

References#