AvailableRequest contains the license token acquired from the AppTek license server
to authenticate any request to list available engines.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |

Language describes a language in detail.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| lang_code | string |  | The language code. |
| english_name | string |  | The English name of the language. |
| native_name | string |  | The native name of the language. |
| direction | Direction |  | The writing direction. |

LineSegmentationAvailable describes one available language for ILS.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| language | Language |  | The language details. |
| domain | string |  | The domain. |

LineSegmentationAvailableResponse contains the list of all available languages for ILS.
It is returned by the LineSegmentationGetAvailable API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | LineSegmentationAvailable | repeated | List of available languages for ILS. |
LineSegmentationOptions contains optional parameters to tune the output
of a line segmentation request.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| character_limit | google.protobuf.UInt32Value |  | The maximum number of characters per subtitle line. |
| max_lines | google.protobuf.UInt32Value |  | The maximum number of lines in a subtitle block. |
| max_duration | google.protobuf.FloatValue |  | The maximum duration of a subtitle block in seconds. |
| min_duration | google.protobuf.FloatValue |  | The minimum duration of a subtitle block in seconds. |
| min_duration_between | google.protobuf.FloatValue |  | The minimum spacing between two subsequent subtitle blocks in seconds. |
| max_pause_within_sentence | google.protobuf.FloatValue |  | Split sentences at gaps larger than this duration (in seconds). |
| new_speaker_symbol | string |  | E.g. '>>' or '-'. Added in case of speaker changes when the input format is XML and speaker labels are available. Default is to not add a symbol. |
| max_close_gap_duration | google.protobuf.FloatValue |  | Gaps shorter than this duration (in seconds) will be closed to create back-to-back subtitles by increasing the first subtitle's end time. |
| max_reading_speed | google.protobuf.FloatValue |  | The maximum number of characters per second in a subtitle block (soft constraint). In the ASR use case this is achieved by increasing the end time of subtitle blocks. For translation, where timings are not changed, the constraint instead influences the segmentation itself: the number of allowed characters per block is accounted for (if possible). |
| use_multi_sentence_lines | google.protobuf.BoolValue |  | Enforces a specific behavior for when to put subsequent sentences onto the same line. If set to true, short sentences are put on the same line as the preceding or following sentence wherever possible (see also 'multi_sentence_lines_max_pause'). If set to false, all sentences are kept separate; this may however lead to short blocks that violate the minimum duration in some cases. If not set, no specific behavior is enforced and the algorithm tries to make optimal decisions per individual case, possibly depending on other configuration fields and language-specific defaults. |
| use_multi_sentence_blocks | google.protobuf.BoolValue |  | Enforces a specific behavior for when to put several sentences into one block. If set to true, multiple sentences are put into one block separated by line breaks wherever possible (see also 'multi_sentence_blocks_max_pause'), provided they are spoken by the same speaker. If set to false, each sentence starts a new block. If not set, no specific behavior is enforced and the algorithm tries to make optimal decisions per individual case, possibly depending on other configuration fields and language-specific defaults. If no speaker ids are available (via speaker diarization), all sentences are assumed to be spoken by the same speaker. |
| use_multi_speaker_blocks | google.protobuf.BoolValue |  | If set, implies 'use_multi_sentence_blocks' but also allows putting multiple sentences into the same block even if spoken by different speakers. See also the 'dialogue_dash' option. |
| multi_sentence_lines_max_pause | google.protobuf.FloatValue |  | Only allow putting multiple sentences into one line according to 'use_multi_sentence_lines' if the pause between the sentences (in seconds) is not longer than this value. |
| multi_sentence_blocks_max_pause | google.protobuf.FloatValue |  | Only allow putting multiple sentences into one block according to 'use_multi_sentence_blocks' if the pause between the sentences (in seconds) is not longer than this value. |
| dialogue_dash | google.protobuf.StringValue |  | If 'use_multi_speaker_blocks' is set, use this string to indicate speaker changes within multi-speaker blocks. Spacing sensitive, for example set to "- " to use a hyphen followed by a space. Use the empty string to not add a symbol at all. Differs from 'new_speaker_symbol' in that the symbol is not added at every speaker change, but only where necessary to distinguish speakers within a multi-speaker block. |
LineSegmentationRequest contains the request to perform a line segmentation from
an input audio transcription into an SRT document.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| source_format | LineSegmentationSourceFormat |  | The format of the source document. |
| source | string |  | The source document. |
| lang_code | string |  | The language code of the source document. |
| options | LineSegmentationOptions |  | The optional parameters to tune the output. |
| domain | string |  | The optional domain name (leave empty if in doubt). |
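As a rough illustration, the sketch below builds a LineSegmentationRequest in Python and calls the LineSegmentation API. The module names (`apptek_pb2`, `apptek_pb2_grpc`), the stub class `GatewayStub`, the channel address, the language code, and the option values are assumptions for illustration; only the message and field names are taken from this reference. Note that the wrapper-typed options (e.g. `google.protobuf.UInt32Value`) are set via their `value` sub-field, so that options you leave unset keep the server defaults.

```python
import grpc
# Hypothetical names for the generated AppTek modules and stub; adjust to your setup.
import apptek_pb2 as pb2
import apptek_pb2_grpc as pb2_grpc

license_token = "..."  # acquired from the AppTek license server
channel = grpc.secure_channel("gateway.example.com:443", grpc.ssl_channel_credentials())
stub = pb2_grpc.GatewayStub(channel)

options = pb2.LineSegmentationOptions()
options.character_limit.value = 42    # wrapper types: set .value; unset fields keep server defaults
options.max_lines.value = 2
options.max_reading_speed.value = 17.0

request = pb2.LineSegmentationRequest(
    license_token=license_token,
    source_format=0,  # LINE_SEGMENTATION_SOURCE_RASR_XML, the only documented format
    source=open("transcript.xml", encoding="utf-8").read(),  # RASR XML transcription
    lang_code="en",   # example language code
    options=options,
)
response = stub.LineSegmentation(request)
print(response.target)  # the resulting SRT document
```

The later sketches in this document reuse `pb2`, `pb2_grpc`, `stub`, and `license_token` as set up here.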
LineSegmentationResponse contains the target document of a line segmentation request.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| target | string |  | The target document as SRT. |
| subtitles | LineSegmentationSubtitles |  | A structured representation of the subtitles. |

LineSegmentationSubtitle describes one subtitle block of the structured response.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| index | uint32 |  | The index of the subtitle. |
| start_time | google.protobuf.Duration |  | The start time of the subtitle. |
| stop_time | google.protobuf.Duration |  | The stop time of the subtitle. |
| lines | LineSegmentationSubtitleLine | repeated | All lines of the subtitle. |

LineSegmentationSubtitleLine contains one line of a subtitle.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| line | string |  | The line of text. |
| speaker_id | string |  | The speaker id, if available. |

LineSegmentationSubtitles contains all subtitles of the structured response.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| subtitles | LineSegmentationSubtitle | repeated | All subtitles. |
Recognize2AudioConfig describes the details of the audio stream from the client.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| sample_type | Recognize2AudioConfig.SampleType |  | The sample type. |
| sample_rate_hz | uint32 |  | The sample rate in Hz. |
| channels | uint32 |  | The number of channels (only mono is supported at this time). |
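As a quick sanity check when pacing a real-time stream, the raw byte rate implied by this configuration is sample_rate_hz × channels × bytes per sample. A minimal helper (the 2-byte and 4-byte sample sizes follow from the INT16 and FLOAT32 sample types):

```python
def bytes_per_second(sample_rate_hz: int, channels: int, sample_type: str = "INT16") -> int:
    """Raw PCM byte rate implied by a Recognize2AudioConfig-style setup."""
    bytes_per_sample = 2 if sample_type == "INT16" else 4  # FLOAT32
    return sample_rate_hz * channels * bytes_per_sample

# 16 kHz, mono, 16-bit signed integer -> 32000 bytes of audio per second
assert bytes_per_second(16000, 1, "INT16") == 32000
```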
Recognize2Available describes one available recognition v2 engine.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name. |
| language | Language |  | The language details. |
| diarization | bool |  | True if this model supports speaker diarization. |

Recognize2AvailableResponse contains the list of all available recognition engines.
It is returned by the Recognize2GetAvailable API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | Recognize2Available | repeated | The list of all available recognition engines. |

Recognize2DiarizationInformation messages are generated if diarization was enabled.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| speaker_id | string |  | Each speaker in an audio is labeled with a unique id. |
| speaker_name | string | optional | Some diarization models provide the name of pre-defined speakers. |

Recognize2DiarizerConfig describes the optional diarizer configuration.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| enable | bool |  |  |
Recognize2PCTranscription returns the latest stable word after it has been processed by the capitalizer and sentence marker.
For a sentence marker, the stable_transcription field is not set.
If a text is supposed to be attached to the previous word without a space, the is_attached_left field is true.
If a text contains a sentence end punctuation mark (for long sentences this might happen on a comma), the is_sentence_end marker is true.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stable_transcription | Recognize2TimedTranscription |  | The stable transcription from the recognizer. |
| text | string |  | The stable word after punctuation and capitalization. |
| is_attached_left | bool |  | True if the text is supposed to be attached to the previous text without whitespace, false otherwise. |
| is_sentence_end | bool |  | True if the text is a sentence end marker, false otherwise. |

A Recognize2Progress response contains the position of the recognizer in the audio stream.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| timestamp | google.protobuf.Duration |  | The recognizer timestamp. |

Recognize2Segment contains the text of the latest segment (sentence). The segment text is post-processed by
inverse text normalization.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| start | google.protobuf.Duration |  | The start time of the segment (start of the first word). |
| stop | google.protobuf.Duration |  | The stop time of the segment (stop of the last word). |
| timed_transcriptions | Recognize2TimedTranscription | repeated | All stable timed transcription details of this segment. |
| text | string |  | The full text post-processed by ITN (inverse text normalization). |
| diarizer_info | Recognize2DiarizationInformation |  |  |
Recognize2SegmenterConfig describes the optional segmenter/punctuator configuration.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| sentence_end_token_threshold | uint32 | optional | Segmentation is done at a sentence end. For longer sentences a comma might trigger a sentence end as well to ensure lower latencies in a real-time subtitling scenario. This parameter sets the minimum number of tokens (words) in a new sentence that must be seen before a comma can trigger a sentence end. If not set, the model's default value is used. |

Recognize2SpeechConfig describes the ASR model to be used.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name (leave empty if not sure). |
| lang_code | string |  | The language code of the audio. |

Recognize2StreamConfig is the configuration message of a Recognize2Stream request. It must be the first and
only the first message in the stream.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| audio_configuration | Recognize2AudioConfig |  | The audio configuration. |
| speech_configuration | Recognize2SpeechConfig |  | The speech configuration. |
| diarizer_configuration | Recognize2DiarizerConfig |  | The diarizer configuration. |
| translate_configurations | Recognize2TranslateConfig | repeated | Translate configurations, one for each requested target language. |
| segmenter_configuration | Recognize2SegmenterConfig |  | The segmenter configuration. |

Recognize2StreamRequest messages are sent on the upstream from the client. The first and only the first message must
be the configuration message, followed by content messages.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| configuration | Recognize2StreamConfig |  | The configuration message. |
| content | bytes |  | PCM audio data matching sample_type/sample_rate/channels in Recognize2AudioConfig. |
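As a rough sketch of this upstream protocol, the generator below first yields the configuration message and then yields raw PCM chunks. It continues the assumed setup (`pb2`, `pb2_grpc`) from the LineSegmentation sketch above; the chunk size, file handling, and language code are illustrative assumptions, while the message and field names come from this reference.

```python
CHUNK_BYTES = 8192  # arbitrary chunk size for streaming

def recognize2_requests(license_token, pcm_path):
    """Yield the config message first, then raw PCM audio chunks."""
    config = pb2.Recognize2StreamConfig(
        license_token=license_token,
        audio_configuration=pb2.Recognize2AudioConfig(
            sample_type=pb2.Recognize2AudioConfig.SampleType.INT16,  # nested enum access per standard generated code
            sample_rate_hz=16000,
            channels=1,
        ),
        speech_configuration=pb2.Recognize2SpeechConfig(lang_code="en"),  # example language code
    )
    yield pb2.Recognize2StreamRequest(configuration=config)
    with open(pcm_path, "rb") as f:
        while chunk := f.read(CHUNK_BYTES):
            yield pb2.Recognize2StreamRequest(content=chunk)
```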
Recognize2StreamResponse messages are streamed back by the Recognize2Stream rpc.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| progress | Recognize2Progress |  | A progress status message. |
| transcription | Recognize2Transcription |  | A transcription result from the recognizer. |
| pc_transcription | Recognize2PCTranscription |  | A transcription result after being punctuated and capitalized. |
| segment | Recognize2Segment |  | The transcription of a segment (sentence) after being processed by ITN. |
| translation | Recognize2Translation |  | A translation of a segment. |
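Continuing the sketch above, each downstream message carries one of the result types, so a client might dispatch on whichever field is populated. `HasField` works for these message-typed fields; the stub and the request generator remain the assumed ones from the earlier sketches.

```python
# 'stub' as created in the LineSegmentation sketch; 'recognize2_requests' as defined above.
for response in stub.Recognize2Stream(recognize2_requests(license_token, "audio.pcm")):
    if response.HasField("segment"):
        # full sentence after inverse text normalization
        print(f"[{response.segment.start.ToTimedelta()}] {response.segment.text}")
    elif response.HasField("pc_transcription"):
        print("stable:", response.pc_transcription.text)
    elif response.HasField("translation"):
        print(f"{response.translation.lang_code}: {response.translation.text}")
    elif response.HasField("progress"):
        pass  # keep-alive style progress information
```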
Recognize2TimedTranscription contains the details of a recognized word.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| start | google.protobuf.Duration |  | The start time of a word. |
| stop | google.protobuf.Duration |  | The stop time of a word. |
| text | string |  | The recognized word. |
| confidence | float |  | The confidence score between 0.0 and 1.0. |
| lang_code | string |  | For multi-language models this contains the language code of the word. |

Recognize2Transcription contains the latest stable and/or unstable transcriptions.
The unstable list contains the currently recognized words which are still subject to change. Any change returns the full
list of unstable words minus the words that became stable.
The stable list contains the latest words that became stable. Stable words are only returned once.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| unstable_transcriptions | Recognize2TimedTranscription | repeated | List of still unstable transcriptions. |
| stable_transcriptions | Recognize2TimedTranscription | repeated | List of stable transcriptions. |
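Since stable words are delivered only once while the unstable tail is re-sent in full, a live caption display would typically append the stable words and redraw the unstable tail on every update. A minimal sketch, under the same assumptions as the earlier snippets:

```python
stable_words = []  # grows monotonically; stable words arrive exactly once

def render(transcription):
    """Append newly stable words and redraw the still-changing tail."""
    stable_words.extend(t.text for t in transcription.stable_transcriptions)
    unstable_tail = [t.text for t in transcription.unstable_transcriptions]
    return " ".join(stable_words + unstable_tail)
```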
Recognize2TranslateConfig describes the details of one of the requested target languages.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name (leave empty if not sure). |
| target_lang_code | string |  | The language code of the requested target language. |
| glossary_options | TranslateGlossaryOptions |  | Optional use of a glossary. |
| meta_options | TranslateTextMetaOptions |  | Optional meta option settings. |

Recognize2Translation is the translation of a segment into one of the requested target languages.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| start | google.protobuf.Duration |  | The start time of the translated text segment. |
| stop | google.protobuf.Duration |  | The stop time of the translated text segment. |
| text | string |  | The translated text. |
| domain | string |  | The domain name of the selected translation model. |
| lang_code | string |  | The language code of the translated text. |
RecognizeAudioConfig is the configuration message for the audio data in the RecognizeStream API.
It must be sent as the first and only the first message in the upstream.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| encoding | RecognizeAudioEncoding |  | The encoding of the audio data. |
| sample_rate_hz | uint32 |  | The sample rate of the audio data in Hz. |
| lang_code | string |  | The language code of the audio. |
| domain | string |  | The domain name (leave empty if not sure). |
| mutable_suffix_length | google.protobuf.UInt32Value |  | The maximum mutable suffix length (leave empty if not sure). |
| diarization | bool |  | Enable diarization. |
| custom_vocabulary_id | string |  | Load the custom vocabulary with this id. |

RecognizeAvailable describes one available recognition engine.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name. |
| sample_rate_hz | uint32 |  | The audio sample rate in Hz. |
| language | Language |  | The language details. |
| mt_target_languages | Language | repeated | The list of available direct translations. |
| diarization | bool |  | True if and only if diarization is available, false otherwise. |

RecognizeAvailableResponse contains the list of all available recognition engines.
It is returned by the RecognizeGetAvailable API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | RecognizeAvailable | repeated | The list of all available recognition engines. |
RecognizeDiarization messages are sent for each complete segment. They contain all speaker sections of the transcriptions and translations in this segment.
This requires diarization to be enabled in the RecognizeAudioConfig message.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| segment | RecognizeSegmentDetails |  | The segment details. |
| speaker_sections | RecognizeDiarizationSpeakerSection | repeated | List of all speaker sections. |

RecognizeDiarizationSpeakerSection messages are part of RecognizeDiarization messages.
They contain the postprocessed (and optionally translated) transcription of a speaker within a complete segment.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| speaker | string |  | The speaker name. |
| orth | string |  | The transcribed text. |
| postprocessed | string |  | The postprocessed text. |
| translation | string |  | The translated text. |
| start_time_ms | uint64 |  | The start time of the speaker section in milliseconds. |
| stop_time_ms | uint64 |  | The stop time of the speaker section in milliseconds. |
RecognizePostprocessing contains post-processed versions of recognized segments.
For most languages we provide automatic postprocessing of raw recognition results in order to add punctuation and capitalization and to convert spoken forms of numbers, dates, spelled words, etc. into an easy-to-read, standard written form.
If postprocessing is not available for the source language, the postprocessed field matches the orth field.
Not all RecognizeTranscription messages trigger a corresponding RecognizePostprocessing message, but it is guaranteed that you receive the postprocessing result of the complete segment transcription.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| segment | RecognizeSegmentDetails |  | The segment details. |
| orth | string |  | The source text. |
| postprocessed | string |  | The postprocessed version of the source text. |

RecognizeProgressStatus messages are sent from time to time to indicate the progress of the current stream.
This is particularly useful when segments in the audio do not contain speech, which might make the recognizer seem unresponsive since there is nothing to return.
Usually audio streams are limited in time. The remaining time value returns the remaining seconds of audio data the client is allowed to stream.
All values are in seconds.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| audio_decoder_time_sec | float |  | The audio decoder progress time in seconds. |
| segmenter_progress_time_sec | float |  | The segmenter progress time in seconds. |
| recognizer_progress_time_sec | float |  | The recognizer progress time in seconds. |
| remaining_time_sec | float |  | The remaining time in seconds. |

RecognizeSegmentDetails contains the segment id (a rolling number starting with 1) and a complete indicator. It is sent with most result messages.
If complete is set to true, the message is the last message of the particular response type for the given segment id.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| id | uint32 |  | The unique segment id. |
| complete | bool |  | Complete is true if and only if this is the last message for this response type with the given id, false otherwise. |
RecognizeSegmentEnd messages are sent when the segmenter detects a segment end.
These messages are helpful when the client intends to send a certain number of segments. Once the client receives the appropriate RecognizeSegmentEnd message, it can stop sending audio data.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| segment_id | uint32 |  | The unique segment id. |
| segmenter_progress_time_sec | float |  | The progress time since the last segment end, in seconds. |

RecognizeSpeakerChange messages are sent every time a speaker change is detected.
This requires diarization to be enabled in the RecognizeAudioConfig message.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| start_time_ms | uint64 |  | The start time of the speaker in milliseconds. |
| speaker | string |  | The speaker name. |

RecognizeStreamConfig is the configuration message in the RecognizeStream API.
It must be sent as the first and only the first message in the upstream.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| audio_configuration | RecognizeAudioConfig |  | The audio data configuration. |
| translate_configuration | RecognizeTranslateConfig |  | The optional translate configuration. |

RecognizeStreamRequest represents the upstream data in the RecognizeStream API.
The first message must contain the configuration; all subsequent messages must contain audio data.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| configuration | RecognizeStreamConfig |  | The configuration message. This must be the first message. |
| audio | bytes |  | The audio data. |
RecognizeStreamResponse represents the downstream data in the RecognizeStream API.
It always contains exactly one response type.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| transcription | RecognizeTranscription |  | A transcription response. |
| segment_end | RecognizeSegmentEnd |  | A segment end response. |
| progress_status | RecognizeProgressStatus |  | A progress status response. |
| postprocessing | RecognizePostprocessing |  | A postprocessing response. |
| translation | RecognizeTranslation |  | A translation response. |
| speaker_change | RecognizeSpeakerChange |  | A speaker change response. |
| diarization | RecognizeDiarization |  | A diarization response. |

RecognizeTranscription is a partial recognition result.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| segment | RecognizeSegmentDetails |  | The segment details. |
| orth | string |  | The transcribed segment in one string. |
| words | RecognizeWord | repeated | The list of all transcribed words with detailed information. |
RecognizeTranslateConfig is the optional translation configuration for the RecognizeStream API.
Please make sure the requested target language code is in the list of available MT target languages for your audio source language.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| translate_interval | RecognizeTranslateInterval |  | The interval in which translation results are produced. |
| target_lang_code | string |  | The language code of the requested target language. |
| glossary_options | TranslateGlossaryOptions |  | Enable use of a registered glossary. |

RecognizeTranslation contains translation results.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| segment | RecognizeSegmentDetails |  | The segment details. |
| source | string |  | The source text for this translation. |
| translation | string |  | The translated text. |

RecognizeWord describes all the details of a word in a RecognizeTranscription message.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| word | string |  | The recognized word. |
| start_time_ms | uint64 |  | The start time in milliseconds. |
| stop_time_ms | uint64 |  | The stop time in milliseconds. |
| confidence | float |  | The confidence score (between 0.0 and 1.0). |
SpeakAddLexiconRequest contains your lexicon to be registered.
It can be used later to produce speech with fine-tuned pronunciation of the words in the lexicon.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| lexicon | SpeakLexicon |  | The speak lexicon. |

SpeakAddLexiconResponse returns the unique id of the registered lexicon. Use this id in your
speak request to enable usage of the lexicon. The lexicon is automatically removed from the server at
the expiration time, or manually using the SpeakRemoveLexicon API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| id | string |  | The unique id of the lexicon. |
| exp | int64 |  | The expiration date of the lexicon in seconds since epoch. |

SpeakAddReferenceAudioRequest contains your reference audio to be registered.
It can be used later to produce speech with an adapted voice.
The reference audio must not be longer than 30 seconds.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| encoding | SpeakReferenceAudioEncoding |  | The encoding of the reference audio data. |
| audio | bytes |  | The reference audio data. |

SpeakAddReferenceAudioResponse returns the unique id of the registered reference
audio data. Use this id in your speak request to adapt the speaker's voice.
The audio is automatically removed from the server at the expiration time, or
manually using the SpeakRemoveReferenceAudio API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| id | string |  | The unique id of the reference audio data. |
| exp | int64 |  | The expiration time in seconds since epoch. |
SpeakAvailable describes one available TTS engine.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name. |
| language | Language |  | The language details. |
| speaker | string |  | The speaker name. |
| adaptable | bool |  | If true, the TTS allows adapting the speaker's voice with reference audio data. |
| gender | string |  | The gender of the TTS speaker (might be empty, e.g. for adaptable speakers). |

SpeakAvailableResponse contains the list of all available TTS engines.
It is returned by the SpeakGetAvailable API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | SpeakAvailable | repeated | List of all available TTS engines. |

SpeakGenerateLexiconSymbolsRequest is used to generate the phonetic transcription (symbols) for a given
word/phrase. To generate a phonetic description for foreign words, an optional pronunciation
language code can be set.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| text | string |  | The word or a small phrase. |
| lang_code | string |  | The TTS language code. |
| speaker | string |  | The speaker to generate the audio. |
| domain | string |  | The domain name. |
| pronunciation_lang_code | string | optional | The optional language code, e.g. for pronouncing foreign words. |

SpeakGenerateLexiconSymbolsResponse returns the phonetic transcription (symbols) and the audio data (as a WAV file).

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| symbols | string |  | The symbols. |
| data | bytes |  | The chunk of audio data as a WAV file. |
SpeakLexicon is the lexicon used to optimize the TTS output.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| lexicon | SpeakLexicon.LexiconEntry | repeated | Maps a source word/phrase to a replace rule. |

SpeakLexicon.LexiconEntry

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | string |  |  |
| value | SpeakLexicon.ReplaceRule |  |  |

SpeakLexicon.ReplaceRule

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| case_sensitive | bool |  | If true, the pattern matching is performed case-sensitively. |
| text | string |  | The desired TTS input as text. |
| symbols | string |  | The desired TTS input as symbols. |
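To illustrate how the map-style lexicon field fits together with SpeakAddLexicon and SpeakRequest, here is a rough sketch continuing the assumed setup from the earlier snippets. The source word and its replacement text are placeholders; phonetic symbols could instead be obtained via SpeakGenerateLexiconSymbols and set in the 'symbols' field.

```python
lexicon = pb2.SpeakLexicon()
# Map entry: replace the written form "AppTek" by an explicit pronunciation.
lexicon.lexicon["AppTek"].CopyFrom(
    pb2.SpeakLexicon.ReplaceRule(
        case_sensitive=True,
        text="app tech",   # plain-text replacement; alternatively set 'symbols'
    )
)

add_response = stub.SpeakAddLexicon(
    pb2.SpeakAddLexiconRequest(license_token=license_token, lexicon=lexicon)
)
lexicon_id = add_response.id   # reference this id in SpeakRequest.lexicon_id
```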
SpeakMeta returns the details of the TTS audio data.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| encoding | SpeakAudioEncoding |  | The encoding of the TTS audio data. |

SpeakRemoveLexiconRequest is used to remove a lexicon from the server.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| id | string |  | The unique id of the lexicon. |

SpeakRemoveReferenceAudioRequest is used to remove a reference audio from the server.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| id | string |  | The unique id of the reference audio data. |

SpeakRequest contains the request to convert an input text into audio data.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| encoding | SpeakAudioEncoding |  | The requested encoding of the output audio stream. |
| text | string |  | The input text. |
| lang_code | string |  | The language code of the input text. |
| speaker | string |  | The TTS speaker name. |
| domain | string |  | The domain name. |
| tempo | float |  | Tempo specifies the playback speed of the generated audio. If set to 0.0 it defaults internally to normal speed (1.0); valid values are between 0.5 and 1.5. |
| reference_audio_id | string |  | The unique id of a registered reference audio to adapt the speaker's voice. |
| lexicon_id | string |  | The unique id of a registered lexicon. |

SpeakResponse contains the generated audio data of a speak request.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| meta | SpeakMeta |  | The first and only the first message is a meta message describing the content of the following audio messages. |
| audio | bytes |  | The audio containing the generated speech. |
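The Speak rpc is server-streaming: the first message carries SpeakMeta, the following messages carry audio chunks. A rough sketch of collecting the audio, continuing the assumed setup from above (the text, language code, speaker, and output file name are placeholders):

```python
request = pb2.SpeakRequest(
    license_token=license_token,
    text="Hello world.",
    lang_code="en",          # example language code
    speaker="some_speaker",  # pick one from SpeakGetAvailable
)

pcm = bytearray()
for message in stub.Speak(request):
    if message.HasField("meta"):
        print("audio encoding:", message.meta.encoding)  # first message only
    else:
        pcm.extend(message.audio)  # subsequent messages carry raw audio data

with open("speech.raw", "wb") as out:
    out.write(pcm)
```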
TextLanguageIdRequest contains the request to determine the language of a given text.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  |  |
| text | string |  |  |
| num_results | uint32 | optional |  |
| probability_threshold | float | optional |  |

TextLanguageIdResponse contains the language identification results.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| results | TextLanguageIdResult | repeated |  |

TextLanguageIdResult describes one identified language and its probability.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| language | Language |  |  |
| probability | float |  |  |
TransformTextAvailable describes one available transformation model.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain. |
| language | Language |  | The language details. |
| mode | TransformTextMode |  | The transformation mode. |

TransformTextAvailableResponse lists all available transformation models.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | TransformTextAvailable | repeated | List of available ITN languages. |

TransformTextRequest contains the request to transform the source text according to the chosen mode.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| source | string |  | The source text. |
| lang_code | string |  | The source language code. |
| domain | string |  | The domain. |
| mode | TransformTextMode |  | The transformation mode. |
| text_layout | TextLayout |  | The layout of the text for plain source texts. |

TransformTextResponse contains the text after transformation.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| target | string |  | The target text. |
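For example, a request for inverse text normalization might look like the sketch below, continuing the assumed setup from above. The language code and the sample text are illustrative, and the mode is set via its numeric value from the TransformTextMode table rather than guessing the generated enum scoping; the shown output is only indicative of what ITN produces, not an actual server response.

```python
request = pb2.TransformTextRequest(
    license_token=license_token,
    source="i paid twenty five dollars on may third",
    lang_code="en",   # example language code
    mode=1,           # TRANSFORM_TEXT_MODE_ITN, per the TransformTextMode enum table
)
response = stub.TransformText(request)
print(response.target)  # e.g. a written form such as "I paid $25 on May 3rd" (illustrative)
```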
TranslateAddGlossaryRequest message is used to register a new glossary.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| glossary | TranslateGlossary |  | The glossary. |

TranslateAddGlossaryResponse message returns the id and expiration time of the
registered glossary.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| id | string |  | The unique id of the glossary. |
| exp | int64 |  | The minimum expiration time of the glossary. |

TranslateAvailable describes one available translation engine.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name. |
| source_language | Language |  | The language details for the source text. |
| target_language | Language |  | The language details for the target text. |
| formats | TranslateTextFormat | repeated | The available source text formats. |
| gender_codes | string | repeated | The available gender codes. |
| genre_codes | string | repeated | The available genre codes. |
| length_codes | string | repeated | The available length codes. |
| style_codes | string | repeated | The available style codes. |
| topic_codes | string | repeated | The available topic codes. |
| extended_context | bool |  | True iff the model supports translation with extended context. |
| compute_confidence | bool |  | True iff the model supports computation of confidence scores. |
| glossary | bool |  | True iff use of a glossary for the source language is supported. |
| glossary_override | bool |  | True iff annotations with a translation, but without a tag, are supported. |
| glossary_tags | bool |  | True iff annotations with both a translation and a tag are supported. |
| markup_transfer | bool |  | True iff annotations with a tag, but without a translation, are supported. |
| subtitle_condensation | bool |  | True iff subtitle condensation is available. |

TranslateAvailableResponse contains the list of all available translation engines.
It is returned by the TranslateGetAvailable API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | TranslateAvailable | repeated | List of all available translation engines. |
TranslateGlossary contains all suggested translations and the language codes for source and target.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| source_lang_code | string |  | The source language code. |
| target_lang_code | string |  | The target language code. |
| entries | TranslateGlossaryEntry | repeated | List of all suggested translations. |

TranslateGlossaryEntry contains one specific suggested translation of word(s).
It is also possible to define additional alternative translations.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| source | string |  | The source word. |
| target | string |  | The suggested target word. |
| target_alternatives | string | repeated | The optional alternative suggested translations. |

TranslateGlossaryOptions enables use of a glossary previously added to the backend via the TranslateAddGlossary API.
The glossary can be used to encourage the translation system to use specific translations.
Words which are found in the glossary will be given a tag so that they are easy to recognize in the translation.
These tags can be accessed by matching the tags in the source and target annotations.
Optionally, tagging of numbers can be enabled.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| glossary_id | string |  | The unique id of the registered glossary. |
| enable_number_tagging | bool |  | Numbers can be tagged in order to show which number on the source side has been used in which position on the target side, e.g.: Input: "Tom bought 6 eggs and 4 tomatoes." Annotation: "Tom bought <1>6</1> eggs and <2>4</2> tomatoes." Translation: "Tom hat <1>6</1> Eier und <2>4</2> Tomaten gekauft." |
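A rough end-to-end sketch of registering a glossary and then referencing it from TranslateGlossaryOptions, continuing the assumed setup from above (language codes and the example terms are illustrative):

```python
glossary = pb2.TranslateGlossary(
    source_lang_code="en",
    target_lang_code="de",
    entries=[
        pb2.TranslateGlossaryEntry(
            source="gateway",
            target="Gateway",
            target_alternatives=["Zugang"],   # optional alternatives
        ),
    ],
)

glossary_id = stub.TranslateAddGlossary(
    pb2.TranslateAddGlossaryRequest(license_token=license_token, glossary=glossary)
).id

glossary_options = pb2.TranslateGlossaryOptions(
    glossary_id=glossary_id,
    enable_number_tagging=True,
)
# glossary_options can now be attached to TranslateText, TranslateSubtitles,
# or streaming recognition translate configurations.
```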
TranslateRemoveGlossaryRequest message is used to remove a registered glossary.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| id | string |  | The unique id of the glossary. |

TranslateSubtitlesLineSegmentationOptions can be set to tune the line segmentation.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| character_limit | uint32 | optional | The maximum number of characters per subtitle line. |
| maximum_lines | uint32 | optional | The maximum number of lines in a subtitle block. |
| max_reading_speed | float | optional | The maximum number of characters per second in a subtitle block. If set, the translation will be condensed to better meet this limit. |
| max_pause_within_sentence | google.protobuf.Duration | optional | Split sentences at gaps larger than this duration. |

TranslateSubtitlesMetaOptions can be set to tune the translation.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| gender_code | string | optional | The gender_code controls the grammar/morphological forms of the translation (for example, female vs. male 1st person verb forms in Slavic languages), depending on the gender of the speaker/author of the sentence. |
| genre_code | string | optional | The genre_code provides information about the genre of the input text; the translation is then adjusted to fit a particular genre (for example, news, patents, etc.). |
| length_code | string | optional | The length_code lets you control the length of the translation output in relation to the length of the input sentence. |
| style_code | string | optional | The style_code controls the style/formality of the translation (for example, using formal "you" instead of informal "you"). |
| topic_code | string | optional | The topic_code provides information about the topic of the input text; the translation is then adjusted to fit a particular topic (for example, politics, sports, culture within the genre of "news"). |

TranslateSubtitlesRequest contains the request to translate formatted subtitles into the given target language.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| source_lang_code | string |  | The language code of your source (input) text. |
| target_lang_code | string |  | The language code of your desired target language. |
| domain | string |  | The domain name of the desired translation model. |
| subtitles | TranslateSubtitlesSourceSubtitle | repeated | The source text in the form of formatted subtitles. |
| meta_options | TranslateSubtitlesMetaOptions | optional | The global meta options to tune the translation. |
| line_segmentation_options | TranslateSubtitlesLineSegmentationOptions | optional | The line segmentation options for formatting the subtitles. |
| glossary_options | TranslateGlossaryOptions | optional | Enable use of a registered glossary. |
| compute_confidence | bool |  | Compute the confidence score per subtitle. |
| extended_context | bool |  | Use extended context during translation. |
| condense_to_maximum_lines | bool |  | Optimize the translation to better adhere to the maximum_lines line segmentation option. |
TranslateSubtitlesResponse contains the translated subtitles.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| subtitles | TranslateSubtitlesTargetSubtitle | repeated | The translated subtitles. |
| segments | TranslateTextSegment | repeated | The detailed results for each translated segment. |

TranslateSubtitlesSourceSubtitle contains one source subtitle.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| index | uint32 |  | The unique index of the subtitle. |
| start_time | google.protobuf.Duration |  | The start time of the subtitle. |
| stop_time | google.protobuf.Duration |  | The stop time of the subtitle. |
| lines | TranslateSubtitlesSourceSubtitleLine | repeated | The lines of text of the subtitle. |
| do_translate | bool |  | This flag can be used to update parts of an earlier translation. The output will contain translations only of the subtitles with this flag set, as well as possibly surrounding subtitles that are affected by the updated translation. The client can assemble the updated subtitle file via the subtitle indices. |

TranslateSubtitlesSourceSubtitleLine contains the content of one line in a subtitle; a short usage sketch follows below the table.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| line | string |  | One line of a subtitle block. May include formatting tags, e.g. '<i> ... </i>'. In addition, we support tags for glossary term translation of the form '<term>glossary term</term>' or '<term target="Glossareintrag">glossary term</term>'. The 'target' attribute defines the desired translation of the term between the tags, and a missing 'target' attribute means the term should be kept as is during translation. |
| extended_context | bool | optional | Use extended context during translation; overrides the global setting if set for this line. |
| meta_options | TranslateSubtitlesMetaOptions | optional | Overrides the global meta options if set for this line. |
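A minimal sketch of building a source subtitle with timing and an inline `<term>` annotation, then requesting a subtitle translation. It continues the assumed setup from above; the language codes, sample text, and timings are illustrative.

```python
from datetime import timedelta

subtitle = pb2.TranslateSubtitlesSourceSubtitle(
    index=1,
    lines=[
        pb2.TranslateSubtitlesSourceSubtitleLine(
            line='Welcome to the <term target="Gateway">gateway</term> demo.'
        ),
    ],
)
subtitle.start_time.FromTimedelta(timedelta(seconds=0.0))
subtitle.stop_time.FromTimedelta(timedelta(seconds=2.5))

request = pb2.TranslateSubtitlesRequest(
    license_token=license_token,
    source_lang_code="en",
    target_lang_code="de",
    subtitles=[subtitle],
)
response = stub.TranslateSubtitles(request)
for target in response.subtitles:
    print(target.index, [line.line for line in target.lines])
```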
TranslateSubtitlesTargetSubtitle contains one target subtitle.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| index | uint32 |  | The unique index of the subtitle. |
| start_time | google.protobuf.Duration |  | The start time of the subtitle. |
| stop_time | google.protobuf.Duration |  | The stop time of the subtitle. |
| lines | TranslateSubtitlesTargetSubtitleLine | repeated | The lines of text of the subtitle. |
| confidence | float | optional | The confidence score, if requested. |

TranslateSubtitlesTargetSubtitleLine contains one line of a translated subtitle.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| line | string |  | The line of text. |

TranslateTextAnnotation messages describe annotations for a (span of) word(s), which allow
aligning the source text with the target text via the tag when a glossary was used.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| first_word_index | uint32 |  | Index of the first word in the word_items list. |
| last_word_index | uint32 |  | Index of the last word (exclusive). |
| tag | string |  | The tag of the corresponding source annotation (if it was set) to identify the span in the translation. |
| translation_alternatives | string | repeated | Optional alternative translations. |
| start_tag | string |  | The start tag to be inserted before the span when converting to a plain text representation, e.g. <i>. |
| stop_tag | string |  | The stop tag to be inserted after the span when converting to a plain text representation, e.g. </i>. |
TranslateTextHypothesis contains the translation and details for one translation hypothesis.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| translation | string |  | The translated text. |
| confidence | float |  | The confidence score for this hypothesis. |
| word_items | TranslateTextWordItem | repeated | List of all word items. |
| target_annotations | TranslateTextAnnotation | repeated | List of all target annotations. |

TranslateTextMetaOptions are optional parameters to tune the translation.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| gender_code | string |  | The gender_code controls the grammar/morphological forms of the translation (for example, female vs. male 1st person verb forms in Slavic languages), depending on the gender of the speaker/author of the sentence. |
| genre_code | string |  | The genre_code provides information about the genre of the input text; the translation is then adjusted to fit a particular genre (for example, news, patents, etc.). |
| length_code | string |  | The length_code lets you control the length of the translation output in relation to the length of the input sentence. |
| style_code | string |  | The style_code controls the style/formality of the translation (for example, using formal "you" instead of informal "you"). |
| topic_code | string |  | The topic_code provides information about the topic of the input text; the translation is then adjusted to fit a particular topic (for example, politics, sports, culture within the genre of "news"). |
TranslateTextRequest contains the request to translate a source text into the
given target language.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| source_text | string |  | The plain source text. |
| source_lang_code | string |  | The language code of the provided source text. |
| target_lang_code | string |  | The language code of the requested target language. |
| domain | string |  | The domain name. |
| format | TranslateTextFormat |  | The source text format. |
| srt_options | TranslateTextSrtOptions |  | The optional parameters if the source text is an SRT. |
| meta_options | TranslateTextMetaOptions |  | The optional meta parameters to tune the translation. |
| extended_context | bool |  | Enable use of extended context. |
| compute_confidence | bool |  | Enable computation of confidence scores; this should only be enabled when required, since it slows down the translation. |
| num_hypotheses | uint32 |  | The maximum number of hypotheses to be returned (minimum, and default if not set, is 1). |
| glossary_options | TranslateGlossaryOptions |  | Enable use of a glossary. |
| parse_xml_tags | bool |  | Enable parsing tags of the form '<tag>tagged term</tag>', see above. |
| condense_to_maximum_lines | bool |  | Optimize the translation to better adhere to the maximum_lines line segmentation option. |
| text_layout | TextLayout |  | The layout of the text for plain source texts. |
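A minimal TranslateText sketch under the same assumptions as the earlier snippets (language codes and sample text are illustrative; the format field is left at its default, TRANSLATE_TEXT_FORMAT_PLAIN):

```python
request = pb2.TranslateTextRequest(
    license_token=license_token,
    source_text="The weather is nice today.",
    source_lang_code="en",
    target_lang_code="de",
    # format defaults to TRANSLATE_TEXT_FORMAT_PLAIN (0)
)
response = stub.TranslateText(request)
print(response.target_text)          # best hypothesis as one string
for segment in response.segments:    # per-segment detail, if needed
    for hypothesis in segment.hypotheses:
        # confidence is only meaningful if compute_confidence was requested
        print(hypothesis.confidence, hypothesis.translation)
```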
TranslateTextResponse contains the target text of a TranslateTextRequest.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| target_text | string |  | The target text of the best hypotheses in one string. |
| segments | TranslateTextSegment | repeated | Detailed results for each translated segment. |

TranslateTextSegment contains all hypotheses for one segment.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| hypotheses | TranslateTextHypothesis | repeated | List of all hypotheses. |
| source_annotations | TranslateTextSourceAnnotation | repeated | List of all source annotations. |
| source_words | TranslateTextSourceWordItem | repeated | List of all source words matching the indices in the annotations. |

TranslateTextSourceAnnotation messages describe the annotations on the source text.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| annotation | TranslateTextAnnotation |  | The annotation. |
| target | string |  | The generated target. |

TranslateTextSourceWordItem messages are returned when a glossary was used.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| word | string |  | The word. |
| is_attached_left | bool |  | True if the word is attached to the previous word on the left, false otherwise. |

TranslateTextSrtOptions are optional parameters to tune the generation of
translated SRT documents.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| character_limit | google.protobuf.UInt32Value |  | The maximum number of characters per line. |
| max_lines | google.protobuf.UInt32Value |  | The maximum number of lines per frame. |
| max_reading_speed | google.protobuf.FloatValue |  | The maximum number of characters per second in a subtitle block. If set, the translation will be condensed to better meet this limit. |
| max_pause_within_sentence | google.protobuf.Duration |  | Split sentences at gaps larger than this duration. |
TranslateTextWordItem contains details for a single word.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| word | string |  | The word. |
| confidence | float |  | The word's confidence score. |
| is_attached_left | bool |  | True if the word is attached to the previous word on the left, false otherwise. |
| tag | int32 |  | Deprecated. Replaced by tag in the annotation message. |

Fields with the deprecated option:

| Name | Option |
| ---- | ------ |
| tag | true |

Direction indicates the display direction of a language.

| Name | Number | Description |
| ---- | ------ | ----------- |
| Left2Right | 0 | Left to right, as in Romance or Germanic languages. |
| Right2Left | 1 | Right to left, as in Arabic. |
LineSegmentationSourceFormat lists all available source document formats for a
line segmentation request.

| Name | Number | Description |
| ---- | ------ | ----------- |
| LINE_SEGMENTATION_SOURCE_RASR_XML | 0 | RASR XML transcription file. |

Recognize2AudioConfig.SampleType lists the available sample types for the audio stream.

| Name | Number | Description |
| ---- | ------ | ----------- |
| INT16 | 0 | 16-bit signed integer. |
| FLOAT32 | 1 | 32-bit floating-point. |

RecognizeAudioEncoding lists all available audio encodings for the upstream.

| Name | Number | Description |
| ---- | ------ | ----------- |
| PCM_16bit_SI_MONO | 0 | PCM 16-bit signed integer mono. |

RecognizeTranslateInterval specifies the interval in which translation results are produced if direct translation of transcriptions is requested.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TRANSLATION_INTERVAL_SEGMENT_END | 0 | Produce one translation only at segment end. |
| TRANSLATION_INTERVAL_CONTINUOUS | 1 | Update the translation multiple times during a segment. |

SpeakAudioEncoding lists the available TTS audio data formats.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TTS_PCM_22050_16bit_SI_MONO | 0 | PCM 22050 Hz 16-bit signed integer mono. |
| TTS_PCM_16000_16bit_SI_MONO | 1 | PCM 16000 Hz 16-bit signed integer mono. |
| TTS_PCM_24000_16bit_SI_MONO | 2 | PCM 24000 Hz 16-bit signed integer mono. |
| TTS_PCM_48000_16bit_SI_MONO | 3 | PCM 48000 Hz 16-bit signed integer mono. |
| TTS_PCM_8000_16bit_SI_MONO | 4 | PCM 8000 Hz 16-bit signed integer mono. |

SpeakReferenceAudioEncoding lists all available audio encodings for the upstream.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TTS_REF_PCM_16000_16bit_SI_MONO | 0 | PCM 16000 Hz 16-bit signed integer mono. |

TextLayout lists the available layouts for plain source texts.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TEXT_LAYOUT_ONE_SENTENCE_PER_LINE | 0 | The default is a line-based layout: each line contains one sentence, and the translation will contain exactly one segment per input line. |
| TEXT_LAYOUT_PARAGRAPHS | 1 | The input is natural, continuous text. Line breaks, if used at all, are interpreted as paragraph boundaries. |

TransformTextMode lists the available text transformation modes.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TRANSFORM_TEXT_MODE_UNSPECIFIED | 0 |  |
| TRANSFORM_TEXT_MODE_ITN | 1 | Inverse text normalization. |
| TRANSFORM_TEXT_MODE_DIACRITIZATION | 2 |  |
| TRANSFORM_TEXT_MODE_CONDENSATION | 3 |  |

TranslateTextFormat lists all available input formats for the TranslateText API.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TRANSLATE_TEXT_FORMAT_PLAIN | 0 | Plain text. |
| TRANSLATE_TEXT_FORMAT_SRT | 1 | SRT subtitle file (SubRip). |
| TRANSLATE_TEXT_FORMAT_SUBTITLES | 2 | SUBTITLES format (only supported by the TranslateSubtitles API). |
The AppTek gRPC gateway provides access to our services for reduced-latency, stateful processing.
Each request to the gateway requires a license token. See https://license.apptek.com/doc/index.html for details
on how to acquire license tokens from the AppTek license server.

| Method Name | Request Type | Response Type | Description |
| ----------- | ------------ | ------------- | ----------- |
| RecognizeGetAvailable | AvailableRequest | RecognizeAvailableResponse | RecognizeStream is considered deprecated and will be removed in the near future. Please use Recognize2Stream instead. RecognizeGetAvailable returns all available languages for the RecognizeStream API. |
| RecognizeStream | RecognizeStreamRequest stream | RecognizeStreamResponse stream | RecognizeStream is considered deprecated and will be removed in the near future. Please use Recognize2Stream instead. RecognizeStream transcribes an incoming audio stream and returns the transcription in raw and postprocessed form and, if requested, a translation into a target language. The first and only the first message in the upstream must contain a RecognizeStreamConfig message, followed by messages containing audio data. The audio data is automatically partitioned into segments, where each result message contains a segment id (starting with 1). Segment end events trigger a separate notification message. During a segment, returned results are subject to change. The last message in a segment is marked as "complete" and the following messages belong to the next segment. The results for transcriptions, postprocessing and translations are independent of each other in the sense that, e.g., if the transcriptions have already moved on to the next segments, trailing postprocessing and translation results can still be returned for the previous segment. Not every transcription message triggers a corresponding postprocessing and translation message; however, it is guaranteed that you always receive postprocessing and translation results for the "complete" segment transcription. When diarization is enabled, a RecognizeSpeakerChange message is sent on every speaker change, containing the start time of the speaker and the speaker's name. Additionally, on a complete segment a RecognizeDiarization message is sent containing the postprocessed and optionally translated transcriptions for each speaker in that segment. During the audio stream the API periodically returns progress status messages. |
| Recognize2GetAvailable | AvailableRequest | Recognize2AvailableResponse | Recognize2GetAvailable returns all available languages for the Recognize2Stream API. |
| Recognize2Stream | Recognize2StreamRequest stream | Recognize2StreamResponse stream | Recognize2Stream transcribes an incoming audio stream and returns the transcription. The transcription is produced and returned in the following ways: - unstable: transcribed words that might change as the audio stream moves on, - stable: transcribed words in raw recognizer output that will not change anymore, - pc: punctuated and capitalized version of the stable words, - segment: once a sentence end is reached, the full sentence after inverse text normalization, - translation(s): if requested, the segment translated into the requested target language(s). |
| TranslateGetAvailable | AvailableRequest | TranslateAvailableResponse | TranslateGetAvailable returns all available languages for the TranslateText API. |
| TranslateText | TranslateTextRequest | TranslateTextResponse | TranslateText translates a source text into the provided target language. |
| TranslateSubtitles | TranslateSubtitlesRequest | TranslateSubtitlesResponse | TranslateSubtitles translates formatted subtitles. |
| TranslateAddGlossary | TranslateAddGlossaryRequest | TranslateAddGlossaryResponse | TranslateAddGlossary registers a glossary which can be used in TranslateText requests. Registered glossaries expire after a while but are guaranteed to be usable during the lifetime of the used license token. |
| TranslateRemoveGlossary | TranslateRemoveGlossaryRequest | .google.protobuf.Empty | TranslateRemoveGlossary removes a registered glossary. |
| TextLanguageId | TextLanguageIdRequest | TextLanguageIdResponse | TextLanguageId determines the language of a given text. |
| TransformTextGetAvailable | AvailableRequest | TransformTextAvailableResponse | TransformTextGetAvailable returns all available ITN languages. |
| TransformText | TransformTextRequest | TransformTextResponse | TransformText provides various text transformation models, e.g. for inverse text normalization or text condensation. |
| LineSegmentationGetAvailable | AvailableRequest | LineSegmentationAvailableResponse | LineSegmentationGetAvailable returns all available languages for the LineSegmentation API. |
| LineSegmentation | LineSegmentationRequest | LineSegmentationResponse | LineSegmentation performs ILS (intelligent line segmentation) on a transcription and returns an SRT document. |
| SpeakGetAvailable | AvailableRequest | SpeakAvailableResponse | SpeakGetAvailable returns all available languages for the Speak API. |
| Speak | SpeakRequest | SpeakResponse stream | Speak generates speech from a given input text. The audio is returned in a stream where the first message describes the content of the following audio data messages. |
| SpeakAddLexicon | SpeakAddLexiconRequest | SpeakAddLexiconResponse | SpeakAddLexicon registers a lexicon which can be used to tune the pronunciation of words. The number of lexicons that can be registered is limited. Registered lexicons automatically expire after a while but are guaranteed to be usable during the lifetime of the used license token. |
| SpeakRemoveLexicon | SpeakRemoveLexiconRequest | .google.protobuf.Empty | SpeakRemoveLexicon removes a registered lexicon. |
| SpeakGenerateLexiconSymbols | SpeakGenerateLexiconSymbolsRequest | SpeakGenerateLexiconSymbolsResponse stream | SpeakGenerateLexiconSymbols creates the symbols for a word/phrase that can be added to a SpeakLexicon and returns an example audio as a WAV file. The submitted text can be tagged with a language code different from the language code of the TTS to generate pronunciations of foreign words. |
| SpeakAddReferenceAudio | SpeakAddReferenceAudioRequest | SpeakAddReferenceAudioResponse | SpeakAddReferenceAudio registers reference audio data which can be used to adapt the speaker voice. The amount of reference audio data that can be registered is limited. Registered reference audios automatically expire after a while but are guaranteed to be usable during the lifetime of the used license token. Reference audio data can only be used with adaptable TTS engines. |
| SpeakRemoveReferenceAudio | SpeakRemoveReferenceAudioRequest | .google.protobuf.Empty | SpeakRemoveReferenceAudio removes a registered reference audio. |
| .proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
| ----------- | ----- | --- | ---- | ------ | -- | -- | --- | ---- |
| double |  | double | double | float | float64 | double | float | Float |
| float |  | float | float | float | float32 | float | float | Float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| bool |  | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT) |