ailia_voice package

Classes

class ailia_voice.G2P(env_id=-1, num_thread=0, memory_mode=11, flags=0)

Constructor of an ailia Voice model instance.

Parameters:

env_id (int, optional, default: ENVIRONMENT_AUTO(-1)) – Environment ID for ailia execution. To retrieve an env_id value, use the ailia.get_environment_count() / ailia.get_environment() pair or ailia.get_gpu_environment_id().
num_thread (int, optional, default: MULTITHREAD_AUTO(0)) – Number of threads. Valid values: MULTITHREAD_AUTO=0 (uses the system's logical processor count) or 1 to 32.
memory_mode (int, optional, default: 11 (reuse interstage)) – Memory management mode for ailia execution. To retrieve a memory_mode value, use ailia.get_memory_mode().
flags (int, optional, default: AILIA_VOICE_FLAG_NONE) – Reserved.
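The constructor parameters can be checked before an instance is created. The sketch below is illustrative only: check_num_thread and create_g2p are hypothetical helpers, not part of ailia_voice, and create_g2p assumes the ailia and ailia_voice packages are installed when it is called. It uses only the values and calls named above (ENVIRONMENT_AUTO=-1, MULTITHREAD_AUTO=0, the 1 to 32 thread range, and ailia.get_gpu_environment_id()).

```python
# Hypothetical helpers for preparing ailia_voice.G2P constructor arguments.
MULTITHREAD_AUTO = 0   # documented default for num_thread
ENVIRONMENT_AUTO = -1  # documented default for env_id


def check_num_thread(num_thread: int) -> int:
    """Return num_thread if it is MULTITHREAD_AUTO (0) or in the documented range 1..32."""
    if num_thread == MULTITHREAD_AUTO or 1 <= num_thread <= 32:
        return num_thread
    raise ValueError(f"num_thread must be 0 or 1..32, got {num_thread}")


def create_g2p(use_gpu: bool = False, num_thread: int = MULTITHREAD_AUTO):
    """Create a G2P instance, optionally on the GPU environment reported by ailia."""
    # Lazy imports: check_num_thread stays usable without ailia installed.
    import ailia
    import ailia_voice

    env_id = ailia.get_gpu_environment_id() if use_gpu else ENVIRONMENT_AUTO
    return ailia_voice.G2P(env_id=env_id, num_thread=check_num_thread(num_thread))
```

Validating num_thread in Python gives an immediate, readable error instead of relying on how the native library reports an out-of-range value.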

g2p(text, g2p_type)

Generates phonemes from text.

Parameters:

text (string) – Input text.
g2p_type (int) – Format of G2P. Specify with AILIA_VOICE_G2P_TYPE_GPT_SOVITS_*.
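A minimal sketch of text-to-phoneme conversion, under stated assumptions. The source gives only the AILIA_VOICE_G2P_TYPE_GPT_SOVITS_* prefix; the concrete names ..._EN and ..._JA used here are assumed, and so is the need to call initialize_model before g2p. The helpers g2p_type_name and text_to_phonemes are hypothetical.

```python
# Hypothetical helpers around ailia_voice.G2P.g2p.
# ASSUMPTION: constants named AILIA_VOICE_G2P_TYPE_GPT_SOVITS_EN / _JA exist.
G2P_TYPE_PREFIX = "AILIA_VOICE_G2P_TYPE_GPT_SOVITS_"


def g2p_type_name(lang: str) -> str:
    """Map a language code ("en" or "ja") to the assumed g2p_type constant name."""
    if lang.lower() not in ("en", "ja"):
        raise ValueError(f"unsupported language in this sketch: {lang}")
    return G2P_TYPE_PREFIX + lang.upper()


def text_to_phonemes(text: str, lang: str = "en", model_path: str = "./models/"):
    """Initialize (and download, if needed) the G2P model, then convert text to phonemes."""
    import ailia_voice  # lazy import: g2p_type_name works without the package

    g2p_type = getattr(ailia_voice, g2p_type_name(lang))
    g2p = ailia_voice.G2P()
    # ASSUMPTION: the model must be initialized before g2p() is called.
    g2p.initialize_model(model_path=model_path)
    return g2p.g2p(text, g2p_type)
```

Looking the constant up with getattr keeps the language choice a plain string in calling code; if the assumed name does not exist in the installed package, the lookup fails with an AttributeError rather than passing a wrong value.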

initialize_model(model_path='./', user_dict_path=None)

Downloads and initializes the model.

Parameters:

model_path (string, optional, default: "./") – Destination for saving the model files.
user_dict_path (string, optional, default: None) – Path to a user dictionary in MeCab format.
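The paths passed to initialize_model can be checked up front. This is a sketch: resolve_model_paths and init_model are hypothetical helpers, and pre-creating the model directory is an assumption (the source does not say whether the library creates it). The model argument is any G2P or GPTSoVITS instance; the dictionary contents themselves (MeCab format) are not inspected.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_model_paths(model_path: str = "./models/",
                        user_dict_path: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Create the model directory if needed and check that the user dictionary exists."""
    model_dir = Path(model_path)
    # ASSUMPTION: creating the directory up front is harmless even if the library would.
    model_dir.mkdir(parents=True, exist_ok=True)
    if user_dict_path is not None and not Path(user_dict_path).is_file():
        raise FileNotFoundError(f"user dictionary not found: {user_dict_path}")
    # Keep a trailing slash, matching the "./" style of the documented default.
    return model_dir.as_posix() + "/", user_dict_path


def init_model(model, model_path: str = "./models/", user_dict_path: Optional[str] = None):
    """Call initialize_model on a G2P (or GPTSoVITS) instance after checking the paths."""
    model_path, user_dict_path = resolve_model_paths(model_path, user_dict_path)
    model.initialize_model(model_path=model_path, user_dict_path=user_dict_path)
    return model
```

Usage, assuming ailia_voice is installed: `init_model(ailia_voice.G2P(), "./models/", "user.dic")`, where user.dic is a dictionary file you have prepared.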

class ailia_voice.GPTSoVITS(env_id=-1, num_thread=0, memory_mode=11, flags=0)

Bases: G2P

initialize_model(model_path='./', user_dict_path=None)

Downloads and initializes the model.

Parameters:

model_path (string, optional, default: "./") – Destination for saving the model files.
user_dict_path (string, optional, default: None) – Path to a user dictionary in MeCab format.

set_reference_audio(ref_text, g2p_type, audio_waveform, sampling_rate)

Sets the reference voice whose timbre is used for speech synthesis.

Parameters:

ref_text (string) – Transcript of the speech contained in the reference PCM audio.
g2p_type (int) – Format of G2P. Specify with AILIA_VOICE_G2P_TYPE_GPT_SOVITS_*.
audio_waveform (np.ndarray) – PCM data with shape (num_samples,) or (channels, num_samples).
sampling_rate (int) – Sampling rate (Hz).
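The reference audio must arrive as an np.ndarray in one of the two layouts listed above. The sketch below reads a 16-bit PCM WAV with the standard wave module and returns either shape. Scaling samples to float32 in [-1, 1) is an assumption (it matches what common audio loaders produce), not something the source specifies. speak_with_reference is a hypothetical end-to-end helper; it assumes ailia_voice is installed and that the reference and target text share one g2p_type, and it passes the synthesize_voice result through unchanged.

```python
import wave

import numpy as np


def load_wav_for_reference(source):
    """Read a 16-bit PCM WAV (path or binary file object) into (waveform, sampling_rate).

    Mono input gives shape (num_samples,); multi-channel input gives
    (channels, num_samples). Samples are scaled to float32 in [-1, 1).
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        sampling_rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    pcm = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    if channels == 1:
        return pcm, sampling_rate
    # WAV frames are interleaved: reshape to (num_samples, channels), then transpose.
    return pcm.reshape(-1, channels).T.copy(), sampling_rate


def speak_with_reference(text, ref_text, ref_wav_path,
                         g2p_type_name="AILIA_VOICE_G2P_TYPE_GPT_SOVITS_EN",
                         model_path="./models/"):
    """Clone the reference timbre and synthesize text (assumed constant name above)."""
    import ailia_voice  # lazy import: the loader works without the package

    g2p_type = getattr(ailia_voice, g2p_type_name)
    waveform, sampling_rate = load_wav_for_reference(ref_wav_path)
    voice = ailia_voice.GPTSoVITS()
    voice.initialize_model(model_path=model_path)
    voice.set_reference_audio(ref_text, g2p_type, waveform, sampling_rate)
    # Return value is whatever synthesize_voice documents under "Returns:".
    return voice.synthesize_voice(text, g2p_type)
```

Transposing interleaved frames is what turns the file layout (num_samples, channels) into the (channels, num_samples) layout accepted by set_reference_audio.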

synthesize_voice(text, g2p_type)

Synthesizes voice from input text.

Parameters:

text (string) – Input text.
g2p_type (int) – Format of G2P. Specify with AILIA_VOICE_G2P_TYPE_GPT_SOVITS_*.

Returns: