ailia_voice package
Classes
- class ailia_voice.AiliaVoiceModel
Bases: object
- class ailia_voice.G2P(env_id=-1, num_thread=0, memory_mode=11, flags=0)
Bases: AiliaVoiceModel
Constructor of ailia Voice model instance.
- Parameters:
env_id (int, optional, default: ENVIRONMENT_AUTO(-1)) – Environment id of ailia execution. To retrieve an env_id value, use the ailia.get_environment_count() / ailia.get_environment() pair or ailia.get_gpu_environment_id().
num_thread (int, optional, default: MULTITHREAD_AUTO(0)) – Number of threads. Valid values: MULTITHREAD_AUTO=0 (the system's logical processor count) or 1 to 32.
memory_mode (int, optional, default: 11 (reuse interstage)) – Memory management mode of ailia execution. To retrieve a memory_mode value, use ailia.get_memory_mode().
flags (int, optional, default: AILIA_VOICE_FLAG_NONE) – Reserved.
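The constructor parameters above can be checked before an instance is created. The following is a minimal sketch, not part of ailia_voice: the helper name build_constructor_args is hypothetical, and it only enforces the num_thread range stated in the parameter list.

```python
def build_constructor_args(env_id=-1, num_thread=0, memory_mode=11, flags=0):
    """Collect keyword arguments for ailia_voice.G2P or ailia_voice.GPTSoVITS.

    Hypothetical helper (not part of ailia_voice). The defaults mirror the
    documented signature.
    """
    # num_thread: 0 means MULTITHREAD_AUTO, otherwise 1 to 32 is documented.
    if not 0 <= num_thread <= 32:
        raise ValueError("num_thread must be 0 (auto) or between 1 and 32")
    return {
        "env_id": env_id,
        "num_thread": num_thread,
        "memory_mode": memory_mode,
        "flags": flags,
    }


# With the package installed, the arguments would be passed on like this:
#   import ailia_voice
#   g2p = ailia_voice.G2P(**build_constructor_args(num_thread=4))
```

Keeping the validation in a plain function means a bad thread count is reported before any model is loaded.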
- __init__(env_id=-1, num_thread=0, memory_mode=11, flags=0)
- g2p(text, g2p_type)
Generates phonemes from text.
- Parameters:
text (string) – Input text
g2p_type (int) – Format of G2P. Specify with AILIA_VOICE_G2P_TYPE_GPT_SOVITS_*.
- initialize_model(model_path='./', user_dict_path=None, chinese=False, model_type=1)
Initialize and download the model.
- Parameters:
model_path (string, optional, default: "./") – Destination for saving the model file
user_dict_path (string, optional, default: None) – Specify the path of the user dictionary. The user dictionary is in mecab format.
chinese (bool, optional, default: False) – Enable Chinese language support by downloading and loading Chinese G2P dictionary.
model_type (int, optional, default: AILIA_VOICE_MODEL_TYPE_GPT_SOVITS) – Model type for G2P processing. Specify with AILIA_VOICE_MODEL_TYPE_*. This determines the G2P behavior (e.g., V1/V2 vs V3 phoneme format).
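Because model_path is the destination for the downloaded model files, it can help to create that directory first. The sketch below uses a hypothetical helper, prepare_model_dir. It keeps a trailing path separator only to mirror the documented default "./". Whether the library requires the separator is an assumption and is not stated here.

```python
import os


def prepare_model_dir(path="./"):
    """Create the model directory if needed and return it with a trailing separator.

    Hypothetical helper, not part of ailia_voice. The trailing separator
    mirrors the documented default "./" (assumption: the library may or may
    not need it).
    """
    os.makedirs(path, exist_ok=True)
    # os.path.join(path, "") appends a separator only when one is missing.
    return os.path.join(path, "")


# With the package installed, the result would be passed on like this:
#   g2p.initialize_model(model_path=prepare_model_dir("./models"))
```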
- class ailia_voice.GPTSoVITS(env_id=-1, num_thread=0, memory_mode=11, flags=0)
Bases: G2P
- initialize_model(model_path='./', user_dict_path=None, chinese=False)
Initialize and download the model.
- Parameters:
model_path (string, optional, default: "./") – Destination for saving the model file.
user_dict_path (string, optional, default: None) – Specify the path of the user dictionary. The user dictionary is in mecab format.
chinese (bool, optional, default: False) – Enable Chinese language support by downloading and loading Chinese G2P dictionary.
- set_reference_audio(ref_text, g2p_type, audio_waveform, sampling_rate)
Set the reference audio whose timbre is used for speech synthesis.
- Parameters:
ref_text (string) – Transcript of the speech contained in the reference audio.
g2p_type (int) – Format of G2P. Specify with AILIA_VOICE_G2P_TYPE_GPT_SOVITS_*.
audio_waveform (np.ndarray) – PCM data, shaped either (num_samples) or (channels, num_samples).
sampling_rate (int) – Sampling rate (Hz).
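The two accepted waveform layouts can be told apart from the array's shape alone. The following sketch uses hypothetical helpers (pcm_layout, reference_duration_seconds) that work on a shape tuple such as audio_waveform.shape, so the example runs without NumPy.

```python
def pcm_layout(shape):
    """Return (channels, num_samples) for a waveform shape tuple.

    Accepts the two layouts documented for audio_waveform:
    (num_samples,) and (channels, num_samples).
    """
    if len(shape) == 1:
        return 1, shape[0]
    if len(shape) == 2:
        return shape[0], shape[1]
    raise ValueError("expected (num_samples,) or (channels, num_samples)")


def reference_duration_seconds(shape, sampling_rate):
    """Length of the reference audio in seconds."""
    _, num_samples = pcm_layout(shape)
    return num_samples / sampling_rate


if __name__ == "__main__":
    # A stereo clip of 48000 samples per channel at 16 kHz lasts 3 seconds.
    print(pcm_layout((2, 48000)), reference_duration_seconds((2, 48000), 16000))
```

With a real array, pcm_layout(audio_waveform.shape) gives the same answer before the array is handed to set_reference_audio().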
- set_speed(speed)
Set the speech speed for synthesis.
- Parameters:
speed (float) – Speed value (default 1.0, must be greater than 0). Values < 1.0 produce slower speech, > 1.0 produce faster speech.
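Since speed must be greater than 0, a guard in front of set_speed() reports invalid values early. This is a minimal sketch with a hypothetical helper, apply_speed. How the library itself reacts to an invalid value is not described here.

```python
import math


def apply_speed(model, speed):
    """Validate speed and forward it to model.set_speed().

    model is any object exposing set_speed(speed), such as an
    ailia_voice.GPTSoVITS instance. Hypothetical helper, not part of
    ailia_voice.
    """
    speed = float(speed)
    # Reject zero, negative values, NaN, and infinity.
    if not (math.isfinite(speed) and speed > 0):
        raise ValueError("speed must be a finite value greater than 0")
    model.set_speed(speed)
    return speed


# Usage with an initialized model (values below 1.0 slow speech down):
#   apply_speed(model, 0.8)
```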
- synthesize_voice(text, g2p_type)
Synthesizes voice from input text.
- Parameters:
text (string) – Input text.
g2p_type (int) – Format of G2P. Specify with AILIA_VOICE_G2P_TYPE_GPT_SOVITS_*.
- Returns:
buf (np.ndarray) – PCM data with shape (num_samples).
sampling_rate (int) – Sampling rate (Hz).
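The returned buf and sampling_rate are enough to write a mono WAV file with the standard library. The sketch below makes one loud assumption: buf is taken to hold floating-point samples in the range [-1.0, 1.0], which this reference does not state. The helper name save_pcm_as_wav is hypothetical.

```python
import struct
import wave


def save_pcm_as_wav(path, samples, sampling_rate):
    """Write mono samples to a 16-bit WAV file and return the duration in seconds.

    ASSUMPTION: samples are floats in [-1.0, 1.0]; values outside that range
    are clipped. The dtype of the buffer returned by synthesize_voice() is
    not documented here.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # buf is documented as (num_samples)
        wav.setsampwidth(2)   # 16-bit PCM
        wav.setframerate(int(sampling_rate))
        wav.writeframes(bytes(frames))
    num_samples = len(frames) // 2
    return num_samples / sampling_rate


# Usage with an initialized model:
#   buf, sampling_rate = model.synthesize_voice(text, g2p_type)
#   save_pcm_as_wav("output.wav", buf, sampling_rate)
```

If the buffer turns out to use a different sample format, only the scaling line needs to change.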