ailia_voice package

Classes

class ailia_voice.AiliaVoiceModel

Bases: object

class ailia_voice.G2P(env_id=-1, num_thread=0, memory_mode=11, flags=0)

Bases: AiliaVoiceModel

Constructor of ailia Voice model instance.

Parameters:
  • env_id (int, optional, default: ENVIRONMENT_AUTO(-1)) – Environment ID for ailia execution. To obtain an env_id value, use the ailia.get_environment_count() / ailia.get_environment() pair, or ailia.get_gpu_environment_id().

  • num_thread (int, optional, default: MULTITHREAD_AUTO(0)) – Number of threads. Valid values: MULTITHREAD_AUTO (0), which uses the system's logical processor count, or 1 to 32.

  • memory_mode (int, optional, default: 11 (reuse interstage)) – Memory management mode for ailia execution. To obtain a memory_mode value, use ailia.get_memory_mode().

  • flags (int, optional, default: AILIA_VOICE_FLAG_NONE) – Reserved
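The ranges listed above can be checked before a model is constructed. The following is a minimal sketch: checked_model_args is a hypothetical helper, not part of ailia_voice, and it enforces only the num_thread range stated in this section.

```python
# Hypothetical helper (not part of ailia_voice): validates constructor
# arguments against the ranges documented above before a model is built.
ENVIRONMENT_AUTO = -1   # documented default for env_id
MULTITHREAD_AUTO = 0    # documented default for num_thread


def checked_model_args(env_id=ENVIRONMENT_AUTO, num_thread=MULTITHREAD_AUTO,
                       memory_mode=11, flags=0):
    """Return constructor keyword arguments, rejecting an invalid num_thread."""
    if not 0 <= num_thread <= 32:
        raise ValueError("num_thread must be MULTITHREAD_AUTO (0) or 1 to 32")
    return {"env_id": env_id, "num_thread": num_thread,
            "memory_mode": memory_mode, "flags": flags}


kwargs = checked_model_args(num_thread=4)
# With the package installed, the result would be passed on as:
#   model = ailia_voice.G2P(**kwargs)
```

Keeping the check separate from construction means an out-of-range value fails with a readable message before any ailia resources are allocated.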

__init__(env_id=-1, num_thread=0, memory_mode=11, flags=0)

g2p(text, g2p_type)

Generates phonemes from text.

Parameters:
  • text (string) – Input text

  • g2p_type (int) – G2P format. Specify one of the AILIA_VOICE_G2P_TYPE_GPT_SOVITS_* constants.

initialize_model(model_path='./', user_dict_path=None)

Initialize and download the model.

Parameters:
  • model_path (string, optional, default: "./") – Destination for saving the model file.

  • user_dict_path (string, optional, default: None) – Path to the user dictionary. The user dictionary must be in MeCab format.
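The call order implied by the two methods above (initialize_model() once, then g2p() per input) can be sketched as follows. This is an illustration under assumptions: phonemize_all and StandInG2P are hypothetical names, the stand-in class only mimics the documented method names so the sketch runs without the package, and the real return format of g2p() is not described in this section.

```python
def phonemize_all(g2p_model, texts, g2p_type, model_path="./", user_dict_path=None):
    """Initialize the model once, then run g2p() on every text."""
    g2p_model.initialize_model(model_path=model_path, user_dict_path=user_dict_path)
    return [g2p_model.g2p(text, g2p_type) for text in texts]


class StandInG2P:
    """Stand-in with the documented method names; NOT the real ailia_voice.G2P."""

    def __init__(self):
        self.initialized_with = None

    def initialize_model(self, model_path="./", user_dict_path=None):
        self.initialized_with = (model_path, user_dict_path)

    def g2p(self, text, g2p_type):
        # The real return format is not documented here; this only echoes input.
        return "<phonemes for %r>" % text


# Placeholder for one of the AILIA_VOICE_G2P_TYPE_GPT_SOVITS_* values.
HYPOTHETICAL_G2P_TYPE = 0

stand_in = StandInG2P()
results = phonemize_all(stand_in, ["hello", "world"], HYPOTHETICAL_G2P_TYPE,
                        model_path="./models/")
```

With the package installed, an ailia_voice.G2P() instance would presumably take the place of stand_in, and g2p_type would be one of the library's own constants.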

class ailia_voice.GPTSoVITS(env_id=-1, num_thread=0, memory_mode=11, flags=0)

Bases: G2P

initialize_model(model_path='./', user_dict_path=None)

Initialize and download the model.

Parameters:
  • model_path (string, optional, default: "./") – Destination for saving the model file.

  • user_dict_path (string, optional, default: None) – Path to the user dictionary. The user dictionary must be in MeCab format.

set_reference_audio(ref_text, g2p_type, audio_waveform, sampling_rate)

Sets the reference audio whose timbre is used for speech synthesis.

Parameters:
  • ref_text (string) – Transcript of the speech contained in the reference audio.

  • g2p_type (int) – G2P format. Specify one of the AILIA_VOICE_G2P_TYPE_GPT_SOVITS_* constants.

  • audio_waveform (np.ndarray) – PCM data, formatted as either (num_samples) or (channels, num_samples).

  • sampling_rate (int) – Sampling rate (Hz).
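Some audio loaders return multi-channel data as (num_samples, channels), which does not match the two layouts listed for audio_waveform. The following sketch rearranges such data first. to_reference_layout and its channels_last flag are hypothetical names introduced for illustration, and the example assumes NumPy is installed.

```python
import numpy as np


def to_reference_layout(audio, channels_last=False):
    """Return audio shaped (num_samples) or (channels, num_samples)."""
    audio = np.asarray(audio)
    if audio.ndim == 1:
        return audio                      # already (num_samples)
    if audio.ndim != 2:
        raise ValueError("expected 1-D or 2-D audio, got %d-D" % audio.ndim)
    # Transpose (num_samples, channels) input to channels-first order.
    return np.ascontiguousarray(audio.T) if channels_last else audio


# One second of silent stereo audio laid out as (num_samples, channels).
stereo = np.zeros((48000, 2))
ref = to_reference_layout(stereo, channels_last=True)
# With a real model, the result would be passed on as:
#   model.set_reference_audio(ref_text, g2p_type, ref, 48000)
```

The flag is explicit rather than guessed from the shape, because a short clip with many channels cannot be told apart from a long clip with few.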

synthesize_voice(text, g2p_type)

Synthesizes voice from input text.

Parameters:
  • text (string) – Input text.

  • g2p_type (int) – G2P format. Specify one of the AILIA_VOICE_G2P_TYPE_GPT_SOVITS_* constants.

Returns:

  • buf (np.ndarray) – PCM data with shape (num_samples).

  • sampling_rate (int) – Sampling rate (Hz).
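The returned pair can be written to a WAV file with the Python standard library alone. The following is a minimal sketch, assuming the samples are mono floats in the range [-1.0, 1.0]; the sample format is not stated in this section, so that range is an assumption. save_pcm_as_wav is a hypothetical helper, not part of ailia_voice.

```python
import array
import os
import sys
import tempfile
import wave


def save_pcm_as_wav(path, buf, sampling_rate):
    """Write mono samples to a 16-bit PCM WAV file.

    Assumption: samples are floats in [-1.0, 1.0]; other values are clipped.
    """
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in buf))
    if sys.byteorder == "big":
        pcm.byteswap()                    # WAV data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)               # buf is documented as (num_samples)
        wav.setsampwidth(2)               # 16-bit samples
        wav.setframerate(int(sampling_rate))
        wav.writeframes(pcm.tobytes())


# Demo with synthetic samples instead of real synthesizer output.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
save_pcm_as_wav(demo_path, [0.0, 0.5, -1.0, 2.0], 32000)
with wave.open(demo_path, "rb") as wav:
    frames_written = wav.getnframes()
    rate_written = wav.getframerate()
```

With a real model, the call would presumably look like buf, sampling_rate = model.synthesize_voice(text, g2p_type), followed by save_pcm_as_wav("out.wav", buf, sampling_rate).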