ailia_speech package
Classes
- class ailia_speech.AiliaSpeechModel(env_id=-1, num_thread=0, memory_mode=11, task=0, flags=0, callback=None)
Bases: object
- set_silent_threshold(silent_threshold, speech_sec, no_speech_sec)
Set the silence threshold. Once the buffer holds at least speech_sec seconds of audio above the threshold, and a silent stretch then lasts no_speech_sec seconds or longer, the remaining buffer is processed immediately instead of waiting for the full 30 seconds.
- Parameters:
silent_threshold (float) – volume threshold, standard value 0.5
speech_sec (float) – speech duration (seconds), standard value 1.0
no_speech_sec (float) – silence duration (seconds), standard value 1.0
- transcribe(audio_waveform, sampling_rate, lang=None)
Perform speech recognition. Processes the entire audio at once.
- Parameters:
audio_waveform (np.ndarray) – PCM data, formatted as either (num_samples) or (channels, num_samples)
sampling_rate (int) – Sampling rate (Hz)
lang (str, optional, default: None) – Language code (ja, en, etc.) (automatic detection if None)
- Yields:
dict –
- text (str) – Recognized speech text
- time_stamp_begin (float) – Start time (seconds)
- time_stamp_end (float) – End time (seconds)
- speaker_id (int or None) – Speaker ID (when diarization is enabled)
- language (str) – Language code
- confidence (float) – Confidence level
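As an illustration of consuming the dictionaries yielded by transcribe, the following sketch renders each segment as a timestamped line. The helper names format_segment and transcribe_to_lines are hypothetical and not part of ailia_speech; model is assumed to be an already initialized AiliaSpeechModel subclass, and the example segment uses made-up values.

```python
def format_segment(segment):
    """Render one result dictionary from transcribe() as a single line."""
    speaker = segment.get("speaker_id")
    # speaker_id is documented as int or None; only show it when present.
    prefix = "" if speaker is None else f"[speaker {speaker}] "
    begin = segment["time_stamp_begin"]
    end = segment["time_stamp_end"]
    return f"{begin:.2f}s - {end:.2f}s {prefix}{segment['text']}"


def transcribe_to_lines(model, audio_waveform, sampling_rate, lang=None):
    """Run model.transcribe() and return one formatted line per segment.

    `model` is assumed to expose the documented transcribe() generator.
    """
    segments = model.transcribe(audio_waveform, sampling_rate, lang=lang)
    return [format_segment(segment) for segment in segments]


# Made-up segment with the documented keys, for illustration only.
example_segment = {
    "text": "hello",
    "time_stamp_begin": 0.0,
    "time_stamp_end": 1.5,
    "speaker_id": None,
    "language": "en",
    "confidence": 0.9,
}
print(format_segment(example_segment))
```

Because transcribe yields its results, the list comprehension consumes segments as they are produced.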
- transcribe_step(audio_waveform, sampling_rate, complete, lang=None)
Perform speech recognition. Processes the audio sequentially.
- Parameters:
audio_waveform (np.ndarray) – PCM data, formatted as either (num_samples) or (channels, num_samples)
sampling_rate (int) – Sampling rate (Hz)
complete (bool) – Set to True for the final audio input. Call transcribe_step each time new audio arrives (for example, from a microphone), and pass complete=True on the last call to flush the buffer.
lang (str, optional, default: None) – Language code (ja, en, etc.) (automatic detection if None)
- Yields:
dict –
- text (str) – Recognized speech text
- time_stamp_begin (float) – Start time (seconds)
- time_stamp_end (float) – End time (seconds)
- speaker_id (int or None) – Speaker ID (when diarization is enabled)
- language (str) – Language code
- confidence (float) – Confidence level
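The sequential pattern described for transcribe_step can be sketched as a loop that feeds fixed-size chunks and sets complete to True only on the last one. The names iter_chunks and transcribe_in_steps are hypothetical helpers, not part of ailia_speech. To keep the sketch dependency-free, it assumes a mono waveform of shape (num_samples), and the one-second chunk size is an arbitrary choice.

```python
def iter_chunks(audio_waveform, sampling_rate, chunk_sec=1.0):
    """Yield (chunk, complete) pairs; complete is True only for the last chunk.

    Assumes a mono waveform indexable by sample, i.e. shape (num_samples).
    """
    chunk_len = max(1, int(sampling_rate * chunk_sec))
    num_samples = len(audio_waveform)
    for start in range(0, num_samples, chunk_len):
        end = min(start + chunk_len, num_samples)
        yield audio_waveform[start:end], end >= num_samples


def transcribe_in_steps(model, audio_waveform, sampling_rate, chunk_sec=1.0, lang=None):
    """Feed the waveform to model.transcribe_step() chunk by chunk.

    `model` is assumed to expose the documented transcribe_step() generator.
    """
    results = []
    for chunk, complete in iter_chunks(audio_waveform, sampling_rate, chunk_sec):
        # Each step may yield zero or more result dictionaries.
        for segment in model.transcribe_step(chunk, sampling_rate, complete, lang=lang):
            results.append(segment)
    return results
```

In a live setting the chunks would come from the microphone rather than from a pre-recorded array, with complete=True sent once recording stops.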
- class ailia_speech.Whisper(env_id=-1, num_thread=0, memory_mode=11, task=0, flags=0, callback=None)
Bases: AiliaSpeechModel
- __init__(env_id=-1, num_thread=0, memory_mode=11, task=0, flags=0, callback=None)
Constructor of ailia Speech model instance.
- Parameters:
env_id (int, optional, default: ENVIRONMENT_AUTO(-1)) – Environment ID for ailia execution. To obtain an env_id value, use ailia.get_environment_count() together with ailia.get_environment(), or use ailia.get_gpu_environment_id().
num_thread (int, optional, default: MULTITHREAD_AUTO(0)) – Number of threads. Valid values: MULTITHREAD_AUTO=0 (the system's logical processor count), or 1 to 32.
memory_mode (int, optional, default: 11 (reuse interstage)) – Memory management mode for ailia execution. To obtain a memory_mode value, use ailia.get_memory_mode().
task (int, optional, default: AILIA_SPEECH_TASK_TRANSCRIBE) – AILIA_SPEECH_TASK_TRANSCRIBE or AILIA_SPEECH_TASK_TRANSLATE
flags (int, optional, default: AILIA_SPEECH_FLAG_NONE) – Reserved
callback (func or None, optional, default: None) – Callback for receiving intermediate result text.
Examples
>>> def f_callback(text):
...     print(text)
- initialize_model(model_path='./', model_type=0, vad_type=0, vad_version='4', diarization_type=None, is_fp16=False)
Initialize and download the model.
- Parameters:
model_path (string, optional, default: "./") – Destination for saving the model file
model_type (int, optional, default: AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_TINY) – Type of model. Can be set to AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_TINY, AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_BASE, AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_SMALL, AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_MEDIUM, AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE, AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3 or AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_LARGE_V3_TURBO.
vad_type (int, optional, default: AILIA_SPEECH_VAD_TYPE_SILERO) – Type of VAD. Can be set to None or AILIA_SPEECH_VAD_TYPE_SILERO.
vad_version (string, optional, default: "4") – Versions 4, 5, and 6.2 of SileroVAD can be specified.
diarization_type (int, optional, default: None) – Type of diarization. Can be set to None or AILIA_SPEECH_DIARIZATION_TYPE_PYANNOTE_AUDIO. By specifying AILIA_SPEECH_DIARIZATION_TYPE_PYANNOTE_AUDIO, speaker diarization can be enabled. The results of the speaker diarization are stored in speaker_id.
is_fp16 (bool, optional, default: False) – Whether to use an FP16 model.
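A minimal sketch of constructing a Whisper instance and initializing it with a non-default model type might look like the following. It assumes that the AILIA_SPEECH_MODEL_TYPE_* constants are exposed as attributes of the ailia_speech module, which this page does not state. load_whisper and its backend parameter are hypothetical and exist only so the helper can be exercised with a stand-in module.

```python
def load_whisper(model_dir="./models/", backend=None):
    """Create a Whisper instance and initialize its model files in model_dir.

    `backend` is a hypothetical hook for substituting a stand-in module;
    by default the real ailia_speech package is imported.
    """
    if backend is None:
        # Imported lazily so the helper can be defined without the package.
        import ailia_speech as backend

    model = backend.Whisper()
    model.initialize_model(
        model_path=model_dir,
        # Assumption: the constant is available at module level.
        model_type=backend.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_SMALL,
    )
    return model
```

The returned object can then be passed to transcribe or transcribe_step as documented for AiliaSpeechModel.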
- class ailia_speech.SenseVoice(env_id=-1, num_thread=0, memory_mode=11, task=0, flags=0, callback=None)
Bases: AiliaSpeechModel
- __init__(env_id=-1, num_thread=0, memory_mode=11, task=0, flags=0, callback=None)
Constructor of ailia Speech model instance.
- Parameters:
env_id (int, optional, default: ENVIRONMENT_AUTO(-1)) – Environment ID for ailia execution. To obtain an env_id value, use ailia.get_environment_count() together with ailia.get_environment(), or use ailia.get_gpu_environment_id().
num_thread (int, optional, default: MULTITHREAD_AUTO(0)) – Number of threads. Valid values: MULTITHREAD_AUTO=0 (the system's logical processor count), or 1 to 32.
memory_mode (int, optional, default: 11 (reuse interstage)) – Memory management mode for ailia execution. To obtain a memory_mode value, use ailia.get_memory_mode().
task (int, optional, default: AILIA_SPEECH_TASK_TRANSCRIBE) – AILIA_SPEECH_TASK_TRANSCRIBE or AILIA_SPEECH_TASK_TRANSLATE
flags (int, optional, default: AILIA_SPEECH_FLAG_NONE) – Reserved
callback (func or None, optional, default: None) – Callback for receiving intermediate result text.
Examples
>>> def f_callback(text):
...     print(text)
- initialize_model(model_path='./', model_type=10, vad_type=0, vad_version='4', diarization_type=None, is_fp16=False)
Initialize and download the model.
- Parameters:
model_path (string, optional, default: "./") – Destination for saving the model file
model_type (int, optional, default: AILIA_SPEECH_MODEL_TYPE_SENSEVOICE_SMALL) – Type of model. Can be set to AILIA_SPEECH_MODEL_TYPE_SENSEVOICE_SMALL.
vad_type (int, optional, default: AILIA_SPEECH_VAD_TYPE_SILERO) – Type of VAD. Can be set to None or AILIA_SPEECH_VAD_TYPE_SILERO.
vad_version (string, optional, default: "4") – Versions 4, 5, and 6.2 of SileroVAD can be specified.
diarization_type (int, optional, default: None) – Type of diarization. Can be set to None or AILIA_SPEECH_DIARIZATION_TYPE_PYANNOTE_AUDIO. By specifying AILIA_SPEECH_DIARIZATION_TYPE_PYANNOTE_AUDIO, speaker diarization can be enabled. The results of the speaker diarization are stored in speaker_id.
is_fp16 (bool, optional, default: False) – Whether to use an FP16 model.
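When diarization is enabled, each yielded dictionary carries a speaker_id, so the results can be grouped per speaker. The sketch below shows one way to do that on plain result dictionaries. group_by_speaker is a hypothetical helper, not part of ailia_speech, and segments without diarization fall under the None key.

```python
from collections import defaultdict


def group_by_speaker(segments):
    """Collect recognized text per speaker_id, preserving segment order."""
    grouped = defaultdict(list)
    for segment in segments:
        # speaker_id is documented as int or None (None without diarization).
        grouped[segment.get("speaker_id")].append(segment["text"])
    return dict(grouped)
```

The segments argument can be the generator returned by transcribe, since the helper only iterates over it once.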