Protocol Documentation


gateway_grpc.proto


AvailableRequest

AvailableRequest contains the license token acquired from the AppTek license server

to authenticate any request to list available engines.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

Language

Language description in detail.

FieldTypeLabelDescription
lang_code string

The language code.

english_name string

The English name of the language.

native_name string

The native name of the language.

direction Direction

The writing direction.

LineSegmentationAvailable

LineSegmentationAvailable describes one available language for ILS (intelligent line segmentation).

FieldTypeLabelDescription
language Language

The language details.

domain string

The domain.

LineSegmentationAvailableResponse

LineSegmentationAvailableResponse contains the list of all available languages for ILS.

It is returned by the LineSegmentationGetAvailable API.

FieldTypeLabelDescription
list LineSegmentationAvailable repeated

List of available languages for ILS.

LineSegmentationOptions

LineSegmentationOptions contains optional parameters in order to tune the output

of a line segmentation request.

FieldTypeLabelDescription
character_limit google.protobuf.UInt32Value

The maximum number of characters per subtitle line.

max_lines google.protobuf.UInt32Value

The maximum number of lines in a subtitle block.

max_duration google.protobuf.FloatValue

The maximum duration of a subtitle block in seconds.

min_duration google.protobuf.FloatValue

The minimum duration of a subtitle block in seconds.

min_duration_between google.protobuf.FloatValue

The minimum spacing between two subsequent subtitle blocks in seconds.

max_pause_within_sentence google.protobuf.FloatValue

Split sentences at gaps larger than this duration (in seconds).

new_speaker_symbol string

E.g. '>>' or '-'. Will be added in case of speaker changes when input format is XML and speaker labels are available. Default is to not add a symbol.

max_close_gap_duration google.protobuf.FloatValue

Gaps shorter than this duration (in seconds) will be closed to create back-to-back subtitles by increasing the first subtitle's end time.

max_reading_speed google.protobuf.FloatValue

The maximum number of characters per second in a subtitle block (soft constraint). In the ASR use case this is achieved by increasing the end time of subtitle blocks. For translation, where timings are not changed, the constraint instead influences the segmentation itself: the number of allowed characters per block is taken into account (where possible).

use_multi_sentence_lines google.protobuf.BoolValue

Enforces a specific behavior of when to put subsequent sentences onto the same line. If set to true, put short sentences into the same line as preceding or following sentence wherever possible (see also 'multi_sentence_lines_max_pause'). If set to false, keep all sentences separate, this may however lead to short blocks that violate the minimum duration in some cases. If not set, no specific behavior is enforced and the algorithm tries to make optimal decisions per individual case, possibly depending on other configuration fields and language-specific defaults.

use_multi_sentence_blocks google.protobuf.BoolValue

Enforces a specific behavior of when to put several sentences into one block. If set to true, put multiple sentences into one block separated by line breaks wherever possible (see also 'multi_sentence_blocks_max_pause'), if spoken by the same speaker. If set to false, start each sentence in a new block. If not set, no specific behavior is enforced and the algorithm tries to make optimal decisions per individual case, possibly depending on other configuration fields and language-specific defaults. If no speaker ids are available (via speaker diarization), all sentences are assumed to be spoken by the same speaker.

use_multi_speaker_blocks google.protobuf.BoolValue

If set, implies 'use_multi_sentence_blocks' but allows putting multiple sentences into the same block even if spoken by different speakers. See also the 'dialogue_dash' option.

multi_sentence_lines_max_pause google.protobuf.FloatValue

Only allow putting multiple sentences into one line according to 'use_multi_sentence_lines' if the pause between the sentences (in seconds) is not longer than this value.

multi_sentence_blocks_max_pause google.protobuf.FloatValue

Only allow putting multiple sentences into one block according to 'use_multi_sentence_blocks' if the pause between the sentences (in seconds) is not longer than this value.

dialogue_dash google.protobuf.StringValue

If 'use_multi_speaker_blocks' is set, use the following string to indicate speaker changes within multi-speaker blocks. Spacing sensitive, for example set to "- " to use hyphen with space. Use empty string to not add a symbol at all. Differs from 'new_speaker_symbol' in that the symbol is not added at every speaker change, but only where necessary to distinguish speakers within a multi-speaker block.

LineSegmentationRequest

LineSegmentationRequest contains the request to perform a line segmentation from

an input audio transcription into an SRT document.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

source_format LineSegmentationSourceFormat

The format of the source document.

source string

The source document.

lang_code string

The language code of the source document.

options LineSegmentationOptions

The optional parameters to tune the output.

domain string

The optional domain name (leave empty if in doubt).
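
The following sketch shows how such a request might be assembled and sent from Python. It assumes generated modules gateway_grpc_pb2 / gateway_grpc_pb2_grpc (the usual protoc output names for gateway_grpc.proto), the generated stub class GatewayStub, and a RASR XML transcription obtained elsewhere; treat it as an illustration rather than a definitive client.

    # Hedged sketch: send a LineSegmentationRequest (module and stub names are
    # assumptions based on the standard protoc Python naming conventions).
    import grpc
    import gateway_grpc_pb2 as pb
    import gateway_grpc_pb2_grpc as pb_grpc

    def segment_lines(channel: grpc.Channel, license_token: str, rasr_xml: str):
        stub = pb_grpc.GatewayStub(channel)
        request = pb.LineSegmentationRequest(
            license_token=license_token,
            source_format=pb.LINE_SEGMENTATION_SOURCE_RASR_XML,
            source=rasr_xml,   # RASR XML transcription content, produced elsewhere
            lang_code="en",    # example language code
            domain="",         # leave empty if in doubt
        )
        # Optional tuning parameters use wrapper types, so set them via .value.
        request.options.character_limit.value = 42
        request.options.max_lines.value = 2
        request.options.max_duration.value = 7.0
        return stub.LineSegmentation(request)  # LineSegmentationResponse with SRT target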

LineSegmentationResponse

LineSegmentationResponse contains the target document of a line segmentation request.

FieldTypeLabelDescription
target string

The target document as SRT.

subtitles LineSegmentationSubtitles

A structured response of the subtitles.

LineSegmentationSubtitle

FieldTypeLabelDescription
index uint32

The index of the subtitle.

start_time google.protobuf.Duration

The start time of the subtitle.

stop_time google.protobuf.Duration

The stop time of the subtitle.

lines LineSegmentationSubtitleLine repeated

All lines of the subtitle.

LineSegmentationSubtitleLine

FieldTypeLabelDescription
line string

The line of text.

speaker_id string

The speaker id if available.

LineSegmentationSubtitles

FieldTypeLabelDescription
subtitles LineSegmentationSubtitle repeated

All subtitles.

Recognize2AudioConfig

Recognize2AudioConfig describes the details of the audio stream from the client.

FieldTypeLabelDescription
sample_type Recognize2AudioConfig.SampleType

The sample type.

sample_rate_hz uint32

The sample rate in Hz.

channels uint32

The number of channels (only mono supported at this time).

Recognize2Available

Recognize2Available describes one available recognition v2 engine.

FieldTypeLabelDescription
domain string

The domain name.

language Language

The language details.

diarization bool

True if this model supports speaker diarization.

Recognize2AvailableResponse

Recognize2AvailableResponse contains the list of all available recognition engines.

It is returned by the RecognizeGetAvailable API.

FieldTypeLabelDescription
list Recognize2Available repeated

The list of all available recognition engines.

Recognize2DiarizationInformation

Recognize2DiarizationInformation is generated if diarization is enabled.

FieldTypeLabelDescription
speaker_id string

Each speaker in the audio is labeled with a unique id.

speaker_name string optional

Some diarization models provide the name of pre-defined speakers.

Recognize2DiarizerConfig

Recognize2DiarizerConfig describes the optional diarizer configuration.

FieldTypeLabelDescription
enable bool

Set to true to enable speaker diarization.

Recognize2PCTranscription

Recognize2PCTranscription returns the latest stable word after being processed by the capitalizer and sentence marker.

For sentence end markers the stable_transcription field is not set.

If a text is supposed to be attached to the previous word without a space, the is_attached_left field is true.

If a text contains sentence-end punctuation (for long sentences this might happen at a comma), the is_sentence_end marker is true. A small reassembly sketch follows the field list below.

FieldTypeLabelDescription
stable_transcription Recognize2TimedTranscription

The stable transcription from the recognizer.

text string

The stable word after punctuation and capitalization.

is_attached_left bool

True if the text is supposed to be attached to the previous text without whitespace, false otherwise.

is_sentence_end bool

True if the text is a sentence end marker, false otherwise.
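
As a reading aid, here is a hedged sketch (not part of the API) of how a client might stitch Recognize2PCTranscription messages back into sentences using the two flags; the helper name is made up.

    # Illustrative helper: rebuild sentences from pc_transcription results.
    def append_pc_text(words, pc):
        """Append one Recognize2PCTranscription; return a finished sentence
        when a sentence-end marker arrives, otherwise None."""
        if pc.is_attached_left and words:
            words[-1] += pc.text          # e.g. punctuation glued to the previous word
        else:
            words.append(pc.text)
        if pc.is_sentence_end:
            sentence = " ".join(words)
            words.clear()
            return sentence
        return None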

Recognize2Progress

Recognize2Progress response contains the position of the recognizer in the audio stream.

FieldTypeLabelDescription
timestamp google.protobuf.Duration

The recognizer timestamp.

Recognize2Segment

Recognize2Segment contains the text of the latest segment (sentence). The segment text is post-processed by

inverse text normalization.

FieldTypeLabelDescription
start google.protobuf.Duration

The start time of the segment (start of the first word).

stop google.protobuf.Duration

The stop time of the segment (stop of the last word).

timed_transcriptions Recognize2TimedTranscription repeated

All stable timed transcription details of this segment.

text string

The full text post-processed by ITN (inverse text normalization).

diarizer_info Recognize2DiarizationInformation

The diarization information, set if diarization is enabled.

Recognize2SegmenterConfig

Recognize2SegmenterConfig describes the optional segmenter/punctuator configuration.

FieldTypeLabelDescription
sentence_end_token_threshold uint32 optional

Segmentation is done at a sentence end. For longer sentences a comma might trigger a sentence end as well to ensure lower latencies in a real-time subtitling scenario. This parameter sets the minimum number of tokens (words) in a new sentence that must be seen before a comma can trigger a sentence end. If not set, the model's default value is used.

Recognize2SpeechConfig

Recognize2SpeechConfig describes the ASR model to be used.

FieldTypeLabelDescription
domain string

The domain name (leave empty if not sure).

lang_code string

The language code of the audio.

Recognize2StreamConfig

Recognize2StreamConfig is the configuration message of a Recognize2Stream request. It must be the first and

only the first message in the stream.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

audio_configuration Recognize2AudioConfig

The audio configuration.

speech_configuration Recognize2SpeechConfig

The speech configuration.

diarizer_configuration Recognize2DiarizerConfig

The diarizer configuration.

translate_configurations Recognize2TranslateConfig repeated

Translate configurations, one for each requested target language.

segmenter_configuration Recognize2SegmenterConfig

The segmenter configuration.

Recognize2StreamRequest

Recognize2StreamRequest are sent on the upstream from the client. The first and only the first message must

be the configuration message followed by content messages.

FieldTypeLabelDescription
configuration Recognize2StreamConfig

The configuration message.

content bytes

Raw PCM audio data matching the sample_type, sample_rate_hz and channels settings in Recognize2AudioConfig.
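
A hedged sketch of the upstream side follows: one configuration message first, then raw audio chunks. Module names, the chunk source and the language code are assumptions.

    # Sketch of the Recognize2Stream upstream: the first request carries the
    # configuration, every following request carries raw PCM audio bytes.
    import gateway_grpc_pb2 as pb  # assumed name of the generated module

    def recognize2_requests(license_token: str, pcm_chunks):
        config = pb.Recognize2StreamConfig(
            license_token=license_token,
            audio_configuration=pb.Recognize2AudioConfig(
                sample_type=pb.Recognize2AudioConfig.INT16,
                sample_rate_hz=16000,
                channels=1,                      # only mono is supported
            ),
            speech_configuration=pb.Recognize2SpeechConfig(lang_code="en", domain=""),
        )
        yield pb.Recognize2StreamRequest(configuration=config)
        for chunk in pcm_chunks:                 # raw samples matching the audio config
            yield pb.Recognize2StreamRequest(content=chunk)

Passing this generator to stub.Recognize2Stream then yields the Recognize2StreamResponse messages (progress, transcription, pc_transcription, segment, translation) described below.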

Recognize2StreamResponse

Recognize2StreamResponse messages are streamed back by the Recognize2Stream rpc.

FieldTypeLabelDescription
progress Recognize2Progress

A progress status message.

transcription Recognize2Transcription

A transcription result from the recognizer.

pc_transcription Recognize2PCTranscription

A transcription result after being punctuated and capitalized.

segment Recognize2Segment

The transcription of a segment (sentence) after being processed by ITN.

translation Recognize2Translation

A translation of a segment.

Recognize2TimedTranscription

Recognize2TimedTranscription contains the details of a recognized word.

FieldTypeLabelDescription
start google.protobuf.Duration

The start time of a word.

stop google.protobuf.Duration

The stop time of a word.

text string

The recognized word.

confidence float

The confidence score between 0.0 and 1.0.

lang_code string

For multi-language models this contains the language code of the word.

Recognize2Transcription

Recognize2Transcription contains the latest stable and/or unstable transcriptions.

The unstable transcriptions contain the currently recognized words, which are still subject to change. Each update returns the full

list of unstable words minus the words that became stable.

The stable transcriptions contain the latest words that became stable. Stable words are returned only once. A small bookkeeping sketch follows the field list below.

FieldTypeLabelDescription
unstable_transcriptions Recognize2TimedTranscription repeated

List of still unstable transcriptions.

stable_transcriptions Recognize2TimedTranscription repeated

List of stable transcriptions.
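
A hedged bookkeeping sketch, purely illustrative: commit stable words once and replace the unstable tail on every update.

    # Sketch: maintain the current hypothesis from Recognize2Transcription updates.
    class TranscriptTracker:
        def __init__(self):
            self.stable = []        # words appended exactly once
            self.unstable = []      # replaced by every update

        def update(self, transcription) -> str:
            self.stable.extend(t.text for t in transcription.stable_transcriptions)
            self.unstable = [t.text for t in transcription.unstable_transcriptions]
            return " ".join(self.stable + self.unstable)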

Recognize2TranslateConfig

Recognize2TranslateConfig describes the details of one of the requested target languages.

FieldTypeLabelDescription
domain string

The domain name (leave empty if not sure).

target_lang_code string

The language code of the requested target language.

glossary_options TranslateGlossaryOptions

Optional use of a glossary.

meta_options TranslateTextMetaOptions

Optional meta option settings.

Recognize2Translation

Recognize2Translation is the translation of a segment into one of the requested target languages.

FieldTypeLabelDescription
start google.protobuf.Duration

The start time of the translated text segment.

stop google.protobuf.Duration

The stop time of the translated text segment.

text string

The translated text.

domain string

The domain name of the selected translation model.

lang_code string

The language code of the translated text.

RecognizeAudioConfig

RecognizeAudioConfig is the configuration message for the audio data in the RecognizeStream API.

It must be sent as the first and only the first message in the upstream.

FieldTypeLabelDescription
encoding RecognizeAudioEncoding

The encoding of the audio data.

sample_rate_hz uint32

The sample rate of the audio data in Hz.

lang_code string

The language code of the audio.

domain string

The domain name (leave empty if not sure).

mutable_suffix_length google.protobuf.UInt32Value

The maximum mutable suffix length (leave empty if not sure).

diarization bool

Enable diarization.

custom_vocabulary_id string

Load custom vocabulary with this id.

RecognizeAvailable

RecognizeAvailable describes one available recognition engine.

FieldTypeLabelDescription
domain string

The domain name.

sample_rate_hz uint32

The audio sample rate in Hz.

language Language

The language details.

mt_target_languages Language repeated

The list of available direct translations.

diarization bool

The value is true if and only if diarization is available, false otherwise.

RecognizeAvailableResponse

RecognizeAvailableResponse contains the list of all available recognition engines.

It is returned by the RecognizeGetAvailable API.

FieldTypeLabelDescription
list RecognizeAvailable repeated

The list of all available recognition engines.

RecognizeDiarization

RecognizeDiarization messages are sent for each complete segment. They contain all speaker sections of the transcriptions and translations in this segment.

This requires diarization to be enabled in the RecognizeAudioConfig message.

FieldTypeLabelDescription
segment RecognizeSegmentDetails

The segment details.

speaker_sections RecognizeDiarizationSpeakerSection repeated

List of all speaker sections.

RecognizeDiarizationSpeakerSection

RecognizeDiarizationSpeakerSection are part of RecognizeDiarization messages.

They contain the postprocessed (and optionally translated) transcription of a speaker within a complete segment.

FieldTypeLabelDescription
speaker string

The speaker name.

orth string

The transcribed text.

postprocessed string

The postprocessed text.

translation string

The translated text.

start_time_ms uint64

The start time of the speaker section in milliseconds.

stop_time_ms uint64

The stop time of the speaker section in milliseconds.

RecognizePostprocessing

RecognizePostprocessing contains post-processed versions of recognized segments.

For most languages we provide automatic postprocessing of raw recognition results in order to add punctuation and capitalization and to convert spoken forms of numbers, dates, spelled words, etc. into an easy-to-read, standard written form.

If postprocessing is not available for the source language, the postprocessed field matches the orth field.

Not all RecognizeTranscription messages trigger a corresponding RecognizePostprocessing message, but the postprocessing result of the complete segment transcription is guaranteed to be returned.

FieldTypeLabelDescription
segment RecognizeSegmentDetails

The segment details.

orth string

The source text.

postprocessed string

The postprocessed version of the source text.

RecognizeProgressStatus

RecognizeProgressStatus messages are sent from time to time to indicate the progress of the current stream.

This is particularly useful when segments in the audio do not contain speech, which might make the recognizer seem unresponsive since there is nothing to return.

Usually audio streams are limited in time. The remaining time value returns the remaining seconds of audio data the client is allowed to stream.

All values are in seconds.

FieldTypeLabelDescription
audio_decoder_time_sec float

The audio decoder progress time in seconds.

segmenter_progress_time_sec float

The segmenter progress time in seconds.

recognizer_progress_time_sec float

The recognizer progress time in seconds.

remaining_time_sec float

The remaining time in seconds.

RecognizeSegmentDetails

RecognizeSegmentDetails contains the segment id (rolling number starting with 1) and a complete indicator. It is sent with most result messages.

If complete is set to true, the message will be the last message for the particular response type for the given segment id.

FieldTypeLabelDescription
id uint32

The unique segment id.

complete bool

Complete is true if and only if this is the last message for this response type with the given id, false otherwise.

RecognizeSegmentEnd

RecognizeSegmentEnd messages are sent when the segmenter detects a segment end.

These messages are helpful when the client intends to send a certain number of segments. Once the client receives the appropriate RecognizeSegmentEnd message, it can stop sending audio data.

FieldTypeLabelDescription
segment_id uint32

The unique segment id.

segmenter_progress_time_sec float

The progress time since the last segment end, in seconds.

RecognizeSpeakerChange

RecognizeSpeakerChange messages are sent every time a speaker change is detected.

This requires diarization to be enabled in the RecognizeAudioConfig message.

FieldTypeLabelDescription
start_time_ms uint64

The start time of the speaker in milliseconds.

speaker string

The speaker name.

RecognizeStreamConfig

RecognizeStreamConfig is the configuration message in the RecognizeStream API.

It must be sent as the first and only the first message in the upstream.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

audio_configuration RecognizeAudioConfig

The audio data configuration.

translate_configuration RecognizeTranslateConfig

The optional translate configuration.

RecognizeStreamRequest

RecognizeStreamRequest represents the upstream data in the RecognizeStream API.

The first message must contain the configuration, all subsequent messages must contain audio data.

FieldTypeLabelDescription
configuration RecognizeStreamConfig

The configuration message. This must be the first message.

audio bytes

The audio data.

RecognizeStreamResponse

RecognizeStreamResponse represents the downstream data in the RecognizeStream API.

It always contains exactly one response type.

FieldTypeLabelDescription
transcription RecognizeTranscription

A transcription response.

segment_end RecognizeSegmentEnd

A segment end response.

progress_status RecognizeProgressStatus

A progress status response.

postprocessing RecognizePostprocessing

A postprocessing response.

translation RecognizeTranslation

A translation response.

speaker_change RecognizeSpeakerChange

A speaker change response.

diarization RecognizeDiarization

A diarization response.

RecognizeTranscription

RecognizeTranscription is a partial recognition result.

FieldTypeLabelDescription
segment RecognizeSegmentDetails

The segment details.

orth string

The transcribed segment in one string.

words RecognizeWord repeated

The list of all transcribed words with detailed information.

RecognizeTranslateConfig

RecognizeTranslateConfig is the optional translation configuration for the RecognizeStream API.

Please make sure the requested target language code is in the list of available MT target languages for your audio source language.

FieldTypeLabelDescription
translate_interval RecognizeTranslateInterval

The interval in which translation results are produced.

target_lang_code string

The lang code of the requested target language.

glossary_options TranslateGlossaryOptions

Enable use of a registered glossary.

RecognizeTranslation

RecognizeTranslation contains translation results.

FieldTypeLabelDescription
segment RecognizeSegmentDetails

The segment details.

source string

The source text for this translation.

translation string

The translated text.

RecognizeWord

RecognizeWord describes the details of a word in a RecognizeTranscription message.

FieldTypeLabelDescription
word string

The recognized word.

start_time_ms uint64

The start time in milliseconds.

stop_time_ms uint64

The stop time in milliseconds.

confidence float

The confidence score (between 0.0 and 1.0).

SpeakAddLexiconRequest

SpeakAddLexiconRequest contains your lexicon to be registered.

It can be used later to produce speech with fine-tuned pronunciation for the words in the lexicon.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

lexicon SpeakLexicon

The speak lexicon.

SpeakAddLexiconResponse

SpeakAddLexiconResponse returns the unique id of the registered lexicon. Use this id in your

speak request to enable usage of the lexicon. The lexicon is automatically removed from the server at

the expiration time, or it can be removed manually using the SpeakRemoveLexicon API.

FieldTypeLabelDescription
id string

The unique id of the lexicon.

exp int64

The expiration date of the lexicon in seconds since epoch.

SpeakAddReferenceAudioRequest

SpeakAddReferenceAudioRequest contains your reference audio to be registered.

It can be used later to produce speech with an adapted voice.

The reference audio must not be longer than 30 seconds.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

encoding SpeakReferenceAudioEncoding

The encoding of the reference audio data.

audio bytes

The reference audio data.

SpeakAddReferenceAudioResponse

SpeakAddReferenceAudioResponse returns the unique id of the registered reference

audio data. Use this id in your speak request to adapt the speaker's voice.

The audio is automatically removed from the server at the expiration time or

manually by using the SpeakRemoveReferenceAudio API.

FieldTypeLabelDescription
id string

The unique id of the reference audio data.

exp int64

The expiration time in seconds since epoch.

SpeakAvailable

SpeakAvailable describes one available TTS engine.

FieldTypeLabelDescription
domain string

The domain name.

language Language

The language details.

speaker string

The speaker name.

adaptable bool

If true, the TTS allows adapting the speaker's voice with reference audio data.

gender string

The gender of the TTS speaker (may be empty, e.g. for adaptable speakers).

SpeakAvailableResponse

SpeakAvailableResponse contains the list of all available TTS engines.

It is returned by the SpeakGetAvailable API.

FieldTypeLabelDescription
list SpeakAvailable repeated

List of all available TTS engines.

SpeakGenerateLexiconSymbolsRequest

SpeakGenerateLexiconSymbolsRequest is used to generate the phonetic transcription (symbols) for a given

word/phrase. To generate a phonetic description for foreign words, an optional pronunciation

language code can be set.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

text string

The word or a small phrase.

lang_code string

The TTS language code.

speaker string

The speaker used to generate the audio.

domain string

The domain name.

pronunciation_lang_code string optional

The optional language code, e. g. for pronouncing foreign words.

SpeakGenerateLexiconSymbolsResponse

SpeakGenerateLexiconSymbolsResponse returns the phonetic transcription (symbols) and the audio data (as wav file).

FieldTypeLabelDescription
symbols string

The symbols.

data bytes

The chunk of audio data as a wave file.

SpeakLexicon

A SpeakLexicon is used to optimize the TTS output.

FieldTypeLabelDescription
lexicon SpeakLexicon.LexiconEntry repeated

Maps source word/phrase to replace rule.

SpeakLexicon.LexiconEntry

FieldTypeLabelDescription
key string

value SpeakLexicon.ReplaceRule

SpeakLexicon.ReplaceRule

FieldTypeLabelDescription
case_sensitive bool

If true, the pattern matching is performed case-sensitively.

text string

The desired TTS input as text.

symbols string

The desired TTS input as symbols.
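
The lexicon field is rendered above as repeated LexiconEntry key/value pairs, which is how a protobuf map is usually documented. Assuming it is a map from source word/phrase to ReplaceRule (and the usual generated module names), building and registering a lexicon could look like this:

    # Hedged sketch: build a SpeakLexicon and register it via SpeakAddLexicon.
    import gateway_grpc_pb2 as pb  # assumed name of the generated module

    def register_lexicon(stub, license_token: str) -> str:
        lexicon = pb.SpeakLexicon()
        rule = lexicon.lexicon["AppTek"]   # indexing a map-valued field creates the entry
        rule.case_sensitive = True
        rule.text = "app tech"             # replacement given as plain TTS input text
        response = stub.SpeakAddLexicon(
            pb.SpeakAddLexiconRequest(license_token=license_token, lexicon=lexicon)
        )
        return response.id                 # use as lexicon_id in later SpeakRequests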

SpeakMeta

SpeakMeta returns the details of the TTS audio data.

FieldTypeLabelDescription
encoding SpeakAudioEncoding

The encoding of the TTS audio data.

SpeakRemoveLexiconRequest

SpeakRemoveLexicon removes a lexicon from the server.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

id string

The unique id of the lexicon.

SpeakRemoveReferenceAudioRequest

SpeakRemoveReferenceAudio removes the reference audio from the server.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

id string

The unique id of the reference audio data.

SpeakRequest

SpeakRequest contains the request to convert an input text into audio data.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

encoding SpeakAudioEncoding

The requested encoding of the output audio stream.

text string

The input text.

lang_code string

The language code of the input text.

speaker string

The TTS speaker name.

domain string

The domain name.

tempo float

Tempo specifies the playback speed of the generated audio. If set to 0.0 it defaults internally to normal speed (1.0); values between 0.5 and 1.5 are allowed.

reference_audio_id string

The unique id of a registered reference audio to adapt the speaker's voice.

lexicon_id string

The unique id of a registered lexicon.

SpeakResponse

SpeakResponse contains the generated audio data from a speak request.

FieldTypeLabelDescription
meta SpeakMeta

The first and only the first message is a meta message describing the content of the following audio messages.

audio bytes

The audio containing the generated speech.
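
A hedged sketch of the meta-first response stream: the function, module and speaker names are placeholders; pick a real speaker from SpeakGetAvailable.

    # Sketch: synthesize speech and collect the audio from the response stream.
    import gateway_grpc_pb2 as pb  # assumed name of the generated module

    def speak(stub, license_token: str, text: str, speaker: str) -> bytes:
        request = pb.SpeakRequest(
            license_token=license_token,
            encoding=pb.TTS_PCM_22050_16bit_SI_MONO,
            text=text,
            lang_code="en",    # example language code
            speaker=speaker,   # a speaker name listed by SpeakGetAvailable
            tempo=0.0,         # 0.0 defaults to normal speed
        )
        audio = bytearray()
        for message in stub.Speak(request):
            if message.HasField("meta"):                 # first message only
                print("audio encoding:", message.meta.encoding)
            else:
                audio.extend(message.audio)
        return bytes(audio)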

TextLanguageIdRequest

FieldTypeLabelDescription
license_token string

text string

num_results uint32 optional

probability_threshold float optional

TextLanguageIdResponse

FieldTypeLabelDescription
results TextLanguageIdResult repeated

TextLanguageIdResult

FieldTypeLabelDescription
language Language

probability float

TransformTextAvailable

TransformTextAvailable describes one available transformation model.

FieldTypeLabelDescription
domain string

The domain.

language Language

The language details.

mode TransformTextMode

The transformation mode.

TransformTextAvailableResponse

TransformTextAvailableResponse lists all available transformation models.

FieldTypeLabelDescription
list TransformTextAvailable repeated

List of available transformation models.

TransformTextRequest

TransformTextRequest contains the source text to be transformed according to the chosen mode.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

source string

The source text.

lang_code string

The source language code.

domain string

The domain.

mode TransformTextMode

The transformation mode.

text_layout TextLayout

The layout of the text for plain source texts.

TransformTextResponse

TransformTextResponse contains the text after transformation.

FieldTypeLabelDescription
target string

The target text.

TranslateAddGlossaryRequest

TranslateAddGlossaryRequest message is used to register a new glossary.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

glossary TranslateGlossary

The glossary.

TranslateAddGlossaryResponse

TranslateAddGlossaryResponse message returns the id and expiration time of the

registered glossary.

FieldTypeLabelDescription
id string

The unique id of the glossary.

exp int64

The minimum expiration time of the glossary.

TranslateAvailable

TranslateAvailable describes one available translation engine.

FieldTypeLabelDescription
domain string

The domain name.

source_language Language

The language details for the source text.

target_language Language

The language details for the target text.

formats TranslateTextFormat repeated

The available source text formats.

gender_codes string repeated

The available gender codes.

genre_codes string repeated

The available genre codes.

length_codes string repeated

The available length codes.

style_codes string repeated

The available style codes.

topic_codes string repeated

The available topic codes.

extended_context bool

True iff model supports translation with extended context.

compute_confidence bool

True iff model supports computation of confidence scores.

glossary bool

True iff use of glossary for source language is supported.

glossary_override bool

True iff annotations with translation, but without tag are supported.

glossary_tags bool

True iff annotations with both translation and tag are supported.

markup_transfer bool

True iff annotations with tag, but without translation are supported.

subtitle_condensation bool

True iff subtitle condensation is available.

TranslateAvailableResponse

TranslateAvailableResponse contains the list of all available translation engines.

It is returned by the TranslateGetAvailable API.

FieldTypeLabelDescription
list TranslateAvailable repeated

List of all available translation engines.

TranslateGlossary

TranslateGlossary contains all suggested translations and the language codes for source and target.

FieldTypeLabelDescription
source_lang_code string

The source language code.

target_lang_code string

The target language code.

entries TranslateGlossaryEntry repeated

List of all suggested translations.

TranslateGlossaryEntry

TranslateGlossaryEntry contains one specific suggested translation of word(s).

It is also possible to define additional translations.

FieldTypeLabelDescription
source string

The source word.

target string

The suggested target word.

target_alternatives string repeated

The optional alternative suggested translations.

TranslateGlossaryOptions

TranslateGlossaryOptions enables use of a glossary previously added to the backend via the TranslateAddGlossary API.

The glossary can be used to encourage the translation system to use specific translations.

Words which are found in the glossary will be given a tag to easily recognize them in the translation.

These tags can be accessed by matching the tags in the source and target annotations.

Optionally tagging of numbers can be enabled.

FieldTypeLabelDescription
glossary_id string

The unique id of the registered glossary.

enable_number_tagging bool

Numbers can be tagged in order to show which number on the source side has been used in which position on the target side, e.g.: Input: “Tom bought 6 eggs and 4 tomatoes.” Annotation: “Tom bought <1>6</1> eggs and <2>4</2> tomatoes.” Translation: “Tom hat <1>6</1> Eier und <2>4</2> Tomaten gekauft.”
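
A hedged end-to-end sketch: register a glossary, then reference its id from a TranslateTextRequest. Module names, language codes and the glossary content are assumptions.

    # Sketch: TranslateAddGlossary followed by a TranslateText call that uses it.
    import gateway_grpc_pb2 as pb  # assumed name of the generated module

    def translate_with_glossary(stub, license_token: str, text: str) -> str:
        glossary = pb.TranslateGlossary(
            source_lang_code="en",
            target_lang_code="de",
            entries=[pb.TranslateGlossaryEntry(source="gateway", target="Gateway")],
        )
        added = stub.TranslateAddGlossary(
            pb.TranslateAddGlossaryRequest(license_token=license_token, glossary=glossary)
        )
        response = stub.TranslateText(pb.TranslateTextRequest(
            license_token=license_token,
            source_text=text,
            source_lang_code="en",
            target_lang_code="de",
            glossary_options=pb.TranslateGlossaryOptions(
                glossary_id=added.id,
                enable_number_tagging=False,
            ),
        ))
        return response.target_text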

TranslateRemoveGlossaryRequest

TranslateRemoveGlossaryRequest message is used to remove a registered glossary.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

id string

The unique id of the glossary.

TranslateSubtitlesLineSegmentationOptions

TranslateSubtitlesLineSegmentationOptions can be set to tune the line segmentation.

FieldTypeLabelDescription
character_limit uint32 optional

The maximum number of characters per subtitle line.

maximum_lines uint32 optional

The maximum number of lines in a subtitle block.

max_reading_speed float optional

The maximum number of characters per second in a subtitle block. If set, translation will be condensed to better meet this limit.

max_pause_within_sentence google.protobuf.Duration optional

Split sentences at gaps larger than this duration.

TranslateSubtitlesMetaOptions

TranslateSubtitlesMetaOptions can be set to tune the translation.

FieldTypeLabelDescription
gender_code string optional

The gender_code controls the grammar/morphological forms of the translation (for example, female vs. male first-person verb forms in Slavic languages), depending on the gender of the speaker/author of the sentence.

genre_code string optional

The genre_code provides information about the genre of the input text; the translation is then adjusted to fit a particular genre (for example, news, patents, etc.).

length_code string optional

The length_code lets you control the length of the translation output, in relation to the length of the input sentence.

style_code string optional

The style_code controls the style/formality of the translation (for example, using formal "you" instead of informal "you").

topic_code string optional

The topic_code provides information about the topic of the input text; the translation is then adjusted to fit a particular topic (for example, politics, sports, culture within the genre of "news").

TranslateSubtitlesRequest

TranslateSubtitlesRequest contains the request to translate formatted subtitles into the given target language.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

source_lang_code string

The language code of your source (input) text.

target_lang_code string

The language code of your desired target language.

domain string

The domain name of the desired translation model.

subtitles TranslateSubtitlesSourceSubtitle repeated

The source text in the form of formatted subtitles.

meta_options TranslateSubtitlesMetaOptions optional

The global meta options to tune the translation.

line_segmentation_options TranslateSubtitlesLineSegmentationOptions optional

The line segmentation options for formatting the subtitles.

glossary_options TranslateGlossaryOptions optional

Enable use of a registered glossary.

compute_confidence bool

Compute the confidence score per subtitle.

extended_context bool

Use extended context during translation.

condense_to_maximum_lines bool

Optimize translation to better adhere to maximum_lines line segmentation option.
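
A hedged sketch of assembling a TranslateSubtitlesRequest from two source subtitles; module names, timings and texts are illustrative.

    # Sketch: build and send a TranslateSubtitlesRequest.
    import gateway_grpc_pb2 as pb  # assumed name of the generated module

    def build_subtitle(index: int, start_s: int, stop_s: int, text: str):
        subtitle = pb.TranslateSubtitlesSourceSubtitle(index=index, do_translate=True)
        subtitle.start_time.seconds = start_s   # google.protobuf.Duration
        subtitle.stop_time.seconds = stop_s
        subtitle.lines.add(line=text)
        return subtitle

    def translate_subtitles(stub, license_token: str):
        request = pb.TranslateSubtitlesRequest(
            license_token=license_token,
            source_lang_code="en",
            target_lang_code="de",
            subtitles=[
                build_subtitle(1, 0, 2, "Welcome back."),
                build_subtitle(2, 2, 5, "Let's continue with <i>part two</i>."),
            ],
        )
        return stub.TranslateSubtitles(request)  # TranslateSubtitlesResponse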

TranslateSubtitlesResponse

TranslateSubtitlesResponse contains the translated subtitles.

FieldTypeLabelDescription
subtitles TranslateSubtitlesTargetSubtitle repeated

The translated subtitles.

segments TranslateTextSegment repeated

The detailed results for each translated segment.

TranslateSubtitlesSourceSubtitle

TranslateSubtitlesSourceSubtitle contains one source subtitle.

FieldTypeLabelDescription
index uint32

The unique index of the subtitle.

start_time google.protobuf.Duration

The start time of the subtitle.

stop_time google.protobuf.Duration

The stop time of the subtitle.

lines TranslateSubtitlesSourceSubtitleLine repeated

The lines of text of the subtitle.

do_translate bool

This flag can be used to update parts of an earlier translation. The output will contain translations only of the subtitles with this flag set, as well as possibly surrounding subtitles that are affected by the updated translation. The client can assemble the updated subtitle file via the subtitle indices.

TranslateSubtitlesSourceSubtitleLine

TranslateSubtitlesSourceSubtitleLine contains the content of one line in a subtitle.

FieldTypeLabelDescription
line string

One line of a subtitle block. May include formatting tags, e.g. '<i> ... </i>'. In addition, we support tags for glossary term translation of the form '<term>glossary term</term>' or '<term target="Glossareintrag">glossary term</term>'. The 'target' attribute defines the desired translation of the term between the tags, and a missing 'target' attribute means the term should be kept as is during translation.

extended_context bool optional

Use extended context during translation, overrides global setting if set for this line.

meta_options TranslateSubtitlesMetaOptions optional

Overrides global meta options if set for this line.

TranslateSubtitlesTargetSubtitle

TranslateSubtitlesTargetSubtitle contains one target subtitle.

FieldTypeLabelDescription
index uint32

The unique index of the subtitle.

start_time google.protobuf.Duration

The start time of the subtitle.

stop_time google.protobuf.Duration

The stop time of the subtitle.

lines TranslateSubtitlesTargetSubtitleLine repeated

The lines of text of the subtitle.

confidence float optional

The confidence score if requested.

TranslateSubtitlesTargetSubtitleLine

TranslateSubtitlesTargetSubtitleLine contains one line of a translated subtitle.

FieldTypeLabelDescription
line string

The line of text.

TranslateTextAnnotation

TranslateTextAnnotation messages describe annotations for a (span of) word(s), which allow

aligning the source text with the target text via the tag when a glossary was used.

FieldTypeLabelDescription
first_word_index uint32

Index of first word in the word_items list.

last_word_index uint32

Index of the last word (exclusive).

tag string

The tag of the corresponding source annotation (if it was set) to identify the span in the translation.

translation_alternatives string repeated

Optional alternative translations.

start_tag string

The start tag to be inserted before the span when converting to plain text representation, e.g. <i>.

stop_tag string

The stop tag to be inserted after the span when converting to plain text representation, e.g. </i>.

TranslateTextHypothesis

TranslateTextHypothesis contains the translation and details for one translation hypothesis.

FieldTypeLabelDescription
translation string

The translated text.

confidence float

The confidence score for this hypothesis.

word_items TranslateTextWordItem repeated

List of all word items.

target_annotations TranslateTextAnnotation repeated

List of all target annotations.

TranslateTextMetaOptions

TranslateTextMetaOptions are optional parameters to tune the translation.

FieldTypeLabelDescription
gender_code string

The gender_code controls the grammar/morphological forms of the translation (for example, female vs. male first-person verb forms in Slavic languages), depending on the gender of the speaker/author of the sentence.

genre_code string

The genre_code provides information about the genre of the input text; the translation is then adjusted to fit a particular genre (for example, news, patents, etc.).

length_code string

The length_code lets you control the length of the translation output, in relation to the length of the input sentence.

style_code string

The style_code controls the style/formality of the translation (for example, using formal "you" instead of informal "you").

topic_code string

The topic_code provides information about the topic of the input text; the translation is then adjusted to fit a particular topic (for example, politics, sports, culture within the genre of "news").

TranslateTextRequest

TranslateTextRequest contains the request to translate a source text into the

given target language.

FieldTypeLabelDescription
license_token string

The license token acquired from the license server.

source_text string

The plain source text.

source_lang_code string

The language code of the provided source text.

target_lang_code string

The language code of the requested target language.

domain string

The domain name.

format TranslateTextFormat

The source text format.

srt_options TranslateTextSrtOptions

The optional parameters if source text is a SRT.

meta_options TranslateTextMetaOptions

The optional meta parameters to tune the translation.

extended_context bool

Enable use of extended context.

compute_confidence bool

Enable computation of confidence scores. This should only be enabled when required, since it slows down the translation.

num_hypotheses uint32

The maximum number of hypotheses to be returned (minimum and default if not set is 1).

glossary_options TranslateGlossaryOptions

Enable use of a glossary.

parse_xml_tags bool

Enable parsing tags of the form '<tag>tagged term</tag>', see above.

condense_to_maximum_lines bool

Optimize translation to better adhere to maximum_lines line segmentation option.

text_layout TextLayout

The layout of the text for plain source texts.
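
A hedged sketch of a plain-text translation request using meta options and a paragraph layout; module names are assumptions, and valid meta codes must come from TranslateGetAvailable for the chosen engine.

    # Sketch: translate continuous plain text with a style meta option.
    import gateway_grpc_pb2 as pb  # assumed name of the generated module

    def translate_paragraphs(stub, license_token: str, text: str, style_code: str) -> str:
        request = pb.TranslateTextRequest(
            license_token=license_token,
            source_text=text,
            source_lang_code="en",
            target_lang_code="es",
            format=pb.TRANSLATE_TEXT_FORMAT_PLAIN,
            text_layout=pb.TEXT_LAYOUT_PARAGRAPHS,   # interpret line breaks as paragraphs
        )
        # Valid codes are listed per engine by TranslateGetAvailable (style_codes etc.).
        request.meta_options.style_code = style_code
        return stub.TranslateText(request).target_text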

TranslateTextResponse

TranslateTextResponse contains the target text of a TranslateTextRequest.

FieldTypeLabelDescription
target_text string

Target text of the best hypothesis in one string.

segments TranslateTextSegment repeated

Detailed results for each translated segment.

TranslateTextSegment

TranslateTextSegment contains all hypotheses for one segment.

FieldTypeLabelDescription
hypotheses TranslateTextHypothesis repeated

List of all hypotheses.

source_annotations TranslateTextSourceAnnotation repeated

List of all source annotations.

source_words TranslateTextSourceWordItem repeated

List of all source words matching the indices in the annotations.

TranslateTextSourceAnnotation

TranslateTextSourceAnnotation messages describe the annotations on the source text.

FieldTypeLabelDescription
annotation TranslateTextAnnotation

The annotation.

target string

The generated target.

TranslateTextSourceWordItem

TranslateTextSourceWordItem messages are returned when a glossary was used.

FieldTypeLabelDescription
word string

The word.

is_attached_left bool

True if word is attached to the previous word on the left, false otherwise.

TranslateTextSrtOptions

TranslateTextSrtOptions are optional parameters to tune the generation of

translated SRT documents.

FieldTypeLabelDescription
character_limit google.protobuf.UInt32Value

The maximum number of characters per line.

max_lines google.protobuf.UInt32Value

The maximum number of lines per frame.

max_reading_speed google.protobuf.FloatValue

The maximum number of characters per second in a subtitle block. If set, translation will be condensed to better meet this limit.

max_pause_within_sentence google.protobuf.Duration

Split sentences at gaps larger than this duration.

TranslateTextWordItem

TranslateTextWordItem contains details for a single word.

FieldTypeLabelDescription
word string

The word.

confidence float

The word's confidence score.

is_attached_left bool

True if word is attached to the previous word on the left, false otherwise.

tag int32

Deprecated. Replaced by tag in the annotation message.

Fields with deprecated option

Name: tag, Option: true

Direction

Direction indicates the display direction of a language.

NameNumberDescription
Left2Right 0

Left to right, as in Roman or Germanic languages.

Right2Left 1

Right to left, as in Arabic.

LineSegmentationSourceFormat

LineSegmentationSourceFormat lists all available source document formats for a

line segmentation request.

NameNumberDescription
LINE_SEGMENTATION_SOURCE_RASR_XML 0

RASR XML transcription file.

Recognize2AudioConfig.SampleType

NameNumberDescription
INT16 0

16-bit signed integer.

FLOAT32 1

32-bit floating-point.

RecognizeAudioEncoding

RecognizeAudioEncoding lists all available audio encodings for the upstream.

NameNumberDescription
PCM_16bit_SI_MONO 0

PCM 16bit signed integer mono.

RecognizeTranslateInterval

RecognizeTranslateInterval specifies the interval in which translation results are produced if direct translation of transcriptions is requested.

NameNumberDescription
TRANSLATION_INTERVAL_SEGMENT_END 0

Produce one translation only at segment end.

TRANSLATION_INTERVAL_CONTINUOUS 1

Update translation multiple times during a segment.

SpeakAudioEncoding

SpeakAudioEncoding are the available TTS audio data formats.

NameNumberDescription
TTS_PCM_22050_16bit_SI_MONO 0

PCM 22050 Hz 16bit signed integer mono.

TTS_PCM_16000_16bit_SI_MONO 1

PCM 16000 Hz 16bit signed integer mono.

TTS_PCM_24000_16bit_SI_MONO 2

PCM 24000 Hz 16bit signed integer mono.

TTS_PCM_48000_16bit_SI_MONO 3

PCM 48000 Hz 16bit signed integer mono.

TTS_PCM_8000_16bit_SI_MONO 4

PCM 8000 Hz 16bit signed integer mono.

SpeakReferenceAudioEncoding

SpeakReferenceAudioEncoding lists all available audio encodings for the upstream.

NameNumberDescription
TTS_REF_PCM_16000_16bit_SI_MONO 0

PCM 16000 Hz 16bit signed integer mono.

TextLayout

NameNumberDescription
TEXT_LAYOUT_ONE_SENTENCE_PER_LINE 0

Default. A line-based layout: each line contains one sentence, and the translation will contain exactly one segment per input line.

TEXT_LAYOUT_PARAGRAPHS 1

Input is a natural, continuous text. Line breaks, if used at all, are interpreted to mark paragraph boundaries.

TransformTextMode

NameNumberDescription
TRANSFORM_TEXT_MODE_UNSPECIFIED 0

TRANSFORM_TEXT_MODE_ITN 1

Inverse text normalization.

TRANSFORM_TEXT_MODE_DIACRITIZATION 2

Diacritization.

TRANSFORM_TEXT_MODE_CONDENSATION 3

Text condensation.

TranslateTextFormat

TranslateTextFormat lists all available input formats for the TranslateText API.

NameNumberDescription
TRANSLATE_TEXT_FORMAT_PLAIN 0

Plain text.

TRANSLATE_TEXT_FORMAT_SRT 1

SRT subtitle file (SubRip).

TRANSLATE_TEXT_FORMAT_SUBTITLES 2

SUBTITLES format (only supported by TranslateSubtitles API).

Gateway

The AppTek gRPC gateway provides access to our services for reduced latency stateful processing.

Each request to the gateway requires a license token. See https://license.apptek.com/doc/index.html for details

on how to acquire license tokens from the AppTek license server.
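
As a hedged starting point, the snippet below opens a secure channel and lists the available recognition v2 engines. The host, credentials and generated module/stub names (gateway_grpc_pb2, gateway_grpc_pb2_grpc, GatewayStub) are assumptions based on the usual protoc output for gateway_grpc.proto.

    # Sketch: connect to the gateway and call Recognize2GetAvailable.
    import grpc
    import gateway_grpc_pb2 as pb
    import gateway_grpc_pb2_grpc as pb_grpc

    def list_recognize2_engines(host: str, license_token: str):
        channel = grpc.secure_channel(host, grpc.ssl_channel_credentials())
        stub = pb_grpc.GatewayStub(channel)
        response = stub.Recognize2GetAvailable(pb.AvailableRequest(license_token=license_token))
        for engine in response.list:
            print(engine.language.lang_code, engine.domain, engine.diarization)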

Method Name, Request Type, Response Type, Description
RecognizeGetAvailable AvailableRequest RecognizeAvailableResponse

RecognizeStream is considered deprecated and will be removed in the near future. Please use Recognize2Stream instead. RecognizeGetAvailable returns all available languages for the RecognizeStream API.

RecognizeStream RecognizeStreamRequest stream RecognizeStreamResponse stream

RecognizeStream is considered deprecated and will be removed in the near future. Please use Recognize2Stream instead. RecognizeStream transcribes an incoming audio stream and returns the transcription in raw and postprocessed form and, if requested, a translation into a target language. The first and only the first message in the upstream must contain a RecognizeStreamConfig message, followed by messages containing audio data. The audio data is automatically partitioned into segments, where each result message contains a segment id (starting with 1). Segment end events trigger a separate notification message. During a segment the returned results are subject to change. The last message in a segment is marked as "complete" and the next messages will be for the next segment. The results for transcriptions, postprocessing and translations are independent of each other in the sense that if, e.g., the transcriptions have already moved on to the next segment, trailing postprocessing and translation results can still be returned for the previous segment. Not every transcription message triggers a corresponding postprocessing and translation message, however postprocessing and translation results are guaranteed to be received for the "complete" segment transcription. When diarization is enabled, a RecognizeSpeakerChange message is sent on every speaker change, containing the start time and the name of the speaker. Additionally, on a complete segment a RecognizeDiarization message is sent containing the postprocessed and optionally translated transcriptions for each speaker in that segment. During the audio stream the API periodically returns progress status messages.

Recognize2GetAvailable AvailableRequest Recognize2AvailableResponse

Recognize2GetAvailable returns all available languages for the Recognize2Stream API.

Recognize2Stream Recognize2StreamRequest stream Recognize2StreamResponse stream

Recognize2Stream transcribes an incoming audio stream and returns the transcription. The transcription is produced and returned in the following ways:
- unstable: transcribed words that might change as the audio stream moves on,
- stable: transcribed words in raw recognizer output that will not change anymore,
- pc: punctuated and capitalized version of the stable words,
- segment: once a sentence end is reached, the full sentence after inverse text normalization,
- translation(s): if requested, the segment translated into one or more other languages.

TranslateGetAvailable AvailableRequest TranslateAvailableResponse

TranslateGetAvailable returns all available languages for the TranslateText API.

TranslateText TranslateTextRequest TranslateTextResponse

TranslateText translates a source text into the provided target language.

TranslateSubtitles TranslateSubtitlesRequest TranslateSubtitlesResponse

TranslateSubtitles translates formatted subtitles.

TranslateAddGlossary TranslateAddGlossaryRequest TranslateAddGlossaryResponse

TranslateAddGlossary registers a glossary which can be used in TranslateText requests. Registered glossaries expire after a while but are guaranteed to be usable during the lifetime of the license token used.

TranslateRemoveGlossary TranslateRemoveGlossaryRequest .google.protobuf.Empty

TranslateRemoveGlossary removes a registered glossary.

TextLanguageId TextLanguageIdRequest TextLanguageIdResponse

TextLanguageId determines the language of a given text.

TransformTextGetAvailable AvailableRequest TransformTextAvailableResponse

TransformTextGetAvailable returns all available text transformation models.

TransformText TransformTextRequest TransformTextResponse

TransformText provides various text transformation models, e.g. for inverse text normalization or text condensation.

LineSegmentationGetAvailable AvailableRequest LineSegmentationAvailableResponse

LineSegmentationGetAvailable returns all available languages for the LineSegmentation API.

LineSegmentation LineSegmentationRequest LineSegmentationResponse

LineSegmentation performs ILS (intelligent line segmentation) on a transcription and returns an SRT document.

SpeakGetAvailable AvailableRequest SpeakAvailableResponse

SpeakGetAvailable returns all available languages for the Speak API.

Speak SpeakRequest SpeakResponse stream

Speak generates speech from a given input text. The audio is returned in a stream where the first message describes the content of the following audio data messages.

SpeakAddLexicon SpeakAddLexiconRequest SpeakAddLexiconResponse

SpeakAddLexicon registers a lexicon which can be used to tune the pronunciation of words. The number of lexicons that can be registered is limited. Registered lexicons automatically expire after a while but are guaranteed to be usable during the lifetime of the license token used.

SpeakRemoveLexicon SpeakRemoveLexiconRequest .google.protobuf.Empty

SpeakRemoveLexicon removes a registered lexicon.

SpeakGenerateLexiconSymbols SpeakGenerateLexiconSymbolsRequest SpeakGenerateLexiconSymbolsResponse stream

SpeakGenerateLexiconSymbols creates the symbols for a word/phrase that can be added to a SpeakLexicon and returns an example audio as a WAV file. The submitted text can be tagged with a language code different from the language code of the TTS to generate pronunciations of foreign words.

SpeakAddReferenceAudio SpeakAddReferenceAudioRequest SpeakAddReferenceAudioResponse

SpeakAddReferenceAudio registers reference audio data which can be used to adapt the speaker's voice. The amount of reference audio data that can be registered is limited. Registered reference audio automatically expires after a while but is guaranteed to be usable during the lifetime of the license token used. Reference audio data can only be used with adaptable TTS engines.

SpeakRemoveReferenceAudio SpeakRemoveReferenceAudioRequest .google.protobuf.Empty

SpeakRemoveReferenceAudio removes a registered reference audio.

Scalar Value Types

.proto Type, Notes, C++, Java, Python, Go, C#, PHP, Ruby
double double double float float64 double float Float
float float float float float32 float float Float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int int int32 int integer Bignum or Fixnum (as required)
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 long int/long int64 long integer/string Bignum
uint32 Uses variable-length encoding. uint32 int int/long uint32 uint integer Bignum or Fixnum (as required)
uint64 Uses variable-length encoding. uint64 long int/long uint64 ulong integer/string Bignum or Fixnum (as required)
sint32 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. int32 int int int32 int integer Bignum or Fixnum (as required)
sint64 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. int64 long int/long int64 long integer/string Bignum
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 2^28. uint32 int int uint32 uint integer Bignum or Fixnum (as required)
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 2^56. uint64 long int/long uint64 ulong integer/string Bignum
sfixed32 Always four bytes. int32 int int int32 int integer Bignum or Fixnum (as required)
sfixed64 Always eight bytes. int64 long int/long int64 long integer/string Bignum
bool bool boolean boolean bool bool boolean TrueClass/FalseClass
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string String str/unicode string string string String (UTF-8)
bytes May contain any arbitrary sequence of bytes. string ByteString str []byte ByteString string String (ASCII-8BIT)