AvailableRequest contains the license token acquired from the AppTek license server
to authenticate any request to list available engines.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |

Language describes a language in detail.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| lang_code | string |  | The language code. |
| english_name | string |  | The English name of the language. |
| native_name | string |  | The native name of the language. |
| direction | Direction |  | The writing direction. |

LineSegmentationAvailable describes one available language for ILS.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| language | Language |  | The language details. |
| domain | string |  | The domain. |

LineSegmentationAvailableResponse contains the list of all available languages for ILS.
It is returned by the LineSegmentationGetAvailable API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | LineSegmentationAvailable | repeated | List of available languages for ILS. |
LineSegmentationOptions contains optional parameters to tune the output
of a line segmentation request.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| character_limit | google.protobuf.UInt32Value |  | The maximum number of characters per subtitle line. |
| max_lines | google.protobuf.UInt32Value |  | The maximum number of lines in a subtitle block. |
| max_duration | google.protobuf.FloatValue |  | The maximum duration of a subtitle block in seconds. |
| min_duration | google.protobuf.FloatValue |  | The minimum duration of a subtitle block in seconds. |
| min_duration_between | google.protobuf.FloatValue |  | The minimum spacing between two subsequent subtitle blocks in seconds. |
| max_pause_within_sentence | google.protobuf.FloatValue |  | Split sentences at gaps larger than this duration (in seconds). |
| new_speaker_symbol | string |  | E.g. '>>' or '-'. Added in case of speaker changes when the input format is XML and speaker labels are available. Default is to not add a symbol. |
| max_close_gap_duration | google.protobuf.FloatValue |  | Gaps shorter than this duration (in seconds) will be closed to create back-to-back subtitles by increasing the first subtitle's end time. |
| max_reading_speed | google.protobuf.FloatValue |  | The maximum number of characters per second in a subtitle block (soft constraint). In the ASR use case this is achieved by increasing the end time of subtitle blocks. For translation, where timings are not changed, the constraint instead influences the segmentation itself: the number of allowed characters per block is accounted for (if possible). |
| use_multi_sentence_lines | google.protobuf.BoolValue |  | Enforces a specific behavior for when to put subsequent sentences onto the same line. If set to true, short sentences are put on the same line as the preceding or following sentence wherever possible (see also 'multi_sentence_lines_max_pause'). If set to false, all sentences are kept separate; this may however lead to short blocks that violate the minimum duration in some cases. If not set, no specific behavior is enforced and the algorithm tries to make optimal decisions per individual case, possibly depending on other configuration fields and language-specific defaults. |
| use_multi_sentence_blocks | google.protobuf.BoolValue |  | Enforces a specific behavior for when to put several sentences into one block. If set to true, multiple sentences are put into one block separated by line breaks wherever possible (see also 'multi_sentence_blocks_max_pause'), provided they are spoken by the same speaker. If set to false, each sentence starts a new block. If not set, no specific behavior is enforced and the algorithm tries to make optimal decisions per individual case, possibly depending on other configuration fields and language-specific defaults. If no speaker ids are available (via speaker diarization), all sentences are assumed to be spoken by the same speaker. |
| use_multi_speaker_blocks | google.protobuf.BoolValue |  | If set, implies 'use_multi_sentence_blocks' but also allows putting multiple sentences into the same block even if spoken by different speakers. See also the 'dialogue_dash' option. |
| multi_sentence_lines_max_pause | google.protobuf.FloatValue |  | Only allow putting multiple sentences into one line according to 'use_multi_sentence_lines' if the pause between the sentences (in seconds) is not longer than this value. |
| multi_sentence_blocks_max_pause | google.protobuf.FloatValue |  | Only allow putting multiple sentences into one block according to 'use_multi_sentence_blocks' if the pause between the sentences (in seconds) is not longer than this value. |
| dialogue_dash | google.protobuf.StringValue |  | If 'use_multi_speaker_blocks' is set, use this string to indicate speaker changes within multi-speaker blocks. Spacing sensitive, for example set to "- " to use a hyphen followed by a space. Use the empty string to not add a symbol at all. Differs from 'new_speaker_symbol' in that the symbol is not added at every speaker change, but only where necessary to distinguish speakers within a multi-speaker block. |
LineSegmentationRequest contains the request to perform a line segmentation from
an input audio transcription into an SRT document.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| source_format | LineSegmentationSourceFormat |  | The format of the source document. |
| source | string |  | The source document. |
| lang_code | string |  | The language code of the source document. |
| options | LineSegmentationOptions |  | The optional parameters to tune the output. |
| domain | string |  | The optional domain name (leave empty if in doubt). |
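As a rough illustration, the sketch below builds a LineSegmentationRequest in Python and calls the LineSegmentation API. The module names (`apptek_pb2`, `apptek_pb2_grpc`), the stub class `GatewayStub`, the channel address, the language code, and the option values are assumptions for illustration; only the message and field names are taken from this reference. Note that the wrapper-typed options (e.g. `google.protobuf.UInt32Value`) are set via their `value` sub-field, so that options you leave unset keep the server defaults.

```python
import grpc
# Hypothetical names for the generated AppTek modules and stub; adjust to your setup.
import apptek_pb2 as pb2
import apptek_pb2_grpc as pb2_grpc

license_token = "..."  # acquired from the AppTek license server
channel = grpc.secure_channel("gateway.example.com:443", grpc.ssl_channel_credentials())
stub = pb2_grpc.GatewayStub(channel)

options = pb2.LineSegmentationOptions()
options.character_limit.value = 42    # wrapper types: set .value; unset fields keep server defaults
options.max_lines.value = 2
options.max_reading_speed.value = 17.0

request = pb2.LineSegmentationRequest(
    license_token=license_token,
    source_format=0,  # LINE_SEGMENTATION_SOURCE_RASR_XML, the only documented format
    source=open("transcript.xml", encoding="utf-8").read(),  # RASR XML transcription
    lang_code="en",   # example language code
    options=options,
)
response = stub.LineSegmentation(request)
print(response.target)  # the resulting SRT document
```

The later sketches in this document reuse `pb2`, `pb2_grpc`, `stub`, and `license_token` as set up here.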
LineSegmentationResponse contains the target document of a line segmentation request.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| target | string |  | The target document as SRT. |
| subtitles | LineSegmentationSubtitles |  | A structured representation of the subtitles. |

LineSegmentationSubtitle describes one subtitle block of the structured response.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| index | uint32 |  | The index of the subtitle. |
| start_time | google.protobuf.Duration |  | The start time of the subtitle. |
| stop_time | google.protobuf.Duration |  | The stop time of the subtitle. |
| lines | LineSegmentationSubtitleLine | repeated | All lines of the subtitle. |

LineSegmentationSubtitleLine contains one line of a subtitle.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| line | string |  | The line of text. |
| speaker_id | string |  | The speaker id, if available. |

LineSegmentationSubtitles contains all subtitles of the structured response.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| subtitles | LineSegmentationSubtitle | repeated | All subtitles. |
Recognize2AudioConfig describes the details of the audio stream from the client.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| sample_type | Recognize2AudioConfig.SampleType |  | The sample type. |
| sample_rate_hz | uint32 |  | The sample rate in Hz. |
| channels | uint32 |  | The number of channels (only mono is supported at this time). |
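As a quick sanity check when pacing a real-time stream, the raw byte rate implied by this configuration is sample_rate_hz × channels × bytes per sample. A minimal helper (the 2-byte and 4-byte sample sizes follow from the INT16 and FLOAT32 sample types):

```python
def bytes_per_second(sample_rate_hz: int, channels: int, sample_type: str = "INT16") -> int:
    """Raw PCM byte rate implied by a Recognize2AudioConfig-style setup."""
    bytes_per_sample = 2 if sample_type == "INT16" else 4  # FLOAT32
    return sample_rate_hz * channels * bytes_per_sample

# 16 kHz, mono, 16-bit signed integer -> 32000 bytes of audio per second
assert bytes_per_second(16000, 1, "INT16") == 32000
```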
Recognize2Available describes one available recognition v2 engine.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name. |
| language | Language |  | The language details. |
| diarization | bool |  | True if this model supports speaker diarization. |

Recognize2AvailableResponse contains the list of all available recognition engines.
It is returned by the Recognize2GetAvailable API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | Recognize2Available | repeated | The list of all available recognition engines. |

Recognize2DiarizationInformation messages are generated if diarization was enabled.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| speaker_id | string |  | Each speaker in an audio is labeled with a unique id. |
| speaker_name | string | optional | Some diarization models provide the name of pre-defined speakers. |

Recognize2DiarizerConfig describes the optional diarizer configuration.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| enable | bool |  |  |
Recognize2PCTranscription returns the latest stable word after it has been processed by the capitalizer and sentence marker.
For a sentence marker, the stable_transcription field is not set.
If a text is supposed to be attached to the previous word without a space, the is_attached_left field is true.
If a text contains a sentence end punctuation mark (for long sentences this might happen on a comma), the is_sentence_end marker is true.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stable_transcription | Recognize2TimedTranscription |  | The stable transcription from the recognizer. |
| text | string |  | The stable word after punctuation and capitalization. |
| is_attached_left | bool |  | True if the text is supposed to be attached to the previous text without whitespace, false otherwise. |
| is_sentence_end | bool |  | True if the text is a sentence end marker, false otherwise. |

A Recognize2Progress response contains the position of the recognizer in the audio stream.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| timestamp | google.protobuf.Duration |  | The recognizer timestamp. |

Recognize2Segment contains the text of the latest segment (sentence). The segment text is post-processed by
inverse text normalization.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| start | google.protobuf.Duration |  | The start time of the segment (start of the first word). |
| stop | google.protobuf.Duration |  | The stop time of the segment (stop of the last word). |
| timed_transcriptions | Recognize2TimedTranscription | repeated | All stable timed transcription details of this segment. |
| text | string |  | The full text post-processed by ITN (inverse text normalization). |
| diarizer_info | Recognize2DiarizationInformation |  |  |
Recognize2SegmenterConfig describes the optional segmenter/punctuator configuration.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| sentence_end_token_threshold | uint32 | optional | Segmentation is done at a sentence end. For longer sentences a comma might trigger a sentence end as well to ensure lower latencies in a real-time subtitling scenario. This parameter sets the minimum number of tokens (words) in a new sentence that must be seen before a comma can trigger a sentence end. If not set, the model's default value is used. |

Recognize2SpeechConfig describes the ASR model to be used.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name (leave empty if not sure). |
| lang_code | string |  | The language code of the audio. |

Recognize2StreamConfig is the configuration message of a Recognize2Stream request. It must be the first and
only the first message in the stream.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| audio_configuration | Recognize2AudioConfig |  | The audio configuration. |
| speech_configuration | Recognize2SpeechConfig |  | The speech configuration. |
| diarizer_configuration | Recognize2DiarizerConfig |  | The diarizer configuration. |
| translate_configurations | Recognize2TranslateConfig | repeated | Translate configurations, one for each requested target language. |
| segmenter_configuration | Recognize2SegmenterConfig |  | The segmenter configuration. |

Recognize2StreamRequest messages are sent on the upstream from the client. The first and only the first message must
be the configuration message, followed by content messages.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| configuration | Recognize2StreamConfig |  | The configuration message. |
| content | bytes |  | PCM audio data matching sample_type/sample_rate/channels in Recognize2AudioConfig. |
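As a rough sketch of this upstream protocol, the generator below first yields the configuration message and then yields raw PCM chunks. It continues the assumed setup (`pb2`, `pb2_grpc`) from the LineSegmentation sketch above; the chunk size, file handling, and language code are illustrative assumptions, while the message and field names come from this reference.

```python
CHUNK_BYTES = 8192  # arbitrary chunk size for streaming

def recognize2_requests(license_token, pcm_path):
    """Yield the config message first, then raw PCM audio chunks."""
    config = pb2.Recognize2StreamConfig(
        license_token=license_token,
        audio_configuration=pb2.Recognize2AudioConfig(
            sample_type=pb2.Recognize2AudioConfig.SampleType.INT16,  # nested enum access per standard generated code
            sample_rate_hz=16000,
            channels=1,
        ),
        speech_configuration=pb2.Recognize2SpeechConfig(lang_code="en"),  # example language code
    )
    yield pb2.Recognize2StreamRequest(configuration=config)
    with open(pcm_path, "rb") as f:
        while chunk := f.read(CHUNK_BYTES):
            yield pb2.Recognize2StreamRequest(content=chunk)
```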
Recognize2StreamResponse messages are streamed back by the Recognize2Stream rpc.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| progress | Recognize2Progress |  | A progress status message. |
| transcription | Recognize2Transcription |  | A transcription result from the recognizer. |
| pc_transcription | Recognize2PCTranscription |  | A transcription result after being punctuated and capitalized. |
| segment | Recognize2Segment |  | The transcription of a segment (sentence) after being processed by ITN. |
| translation | Recognize2Translation |  | A translation of a segment. |
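Continuing the sketch above, each downstream message carries one of the result types, so a client might dispatch on whichever field is populated. `HasField` works for these message-typed fields; the stub and the request generator remain the assumed ones from the earlier sketches.

```python
# 'stub' as created in the LineSegmentation sketch; 'recognize2_requests' as defined above.
for response in stub.Recognize2Stream(recognize2_requests(license_token, "audio.pcm")):
    if response.HasField("segment"):
        # full sentence after inverse text normalization
        print(f"[{response.segment.start.ToTimedelta()}] {response.segment.text}")
    elif response.HasField("pc_transcription"):
        print("stable:", response.pc_transcription.text)
    elif response.HasField("translation"):
        print(f"{response.translation.lang_code}: {response.translation.text}")
    elif response.HasField("progress"):
        pass  # keep-alive style progress information
```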
Recognize2TimedTranscription contains the details of a recognized word.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| start | google.protobuf.Duration |  | The start time of a word. |
| stop | google.protobuf.Duration |  | The stop time of a word. |
| text | string |  | The recognized word. |
| confidence | float |  | The confidence score between 0.0 and 1.0. |
| lang_code | string |  | For multi-language models this contains the language code of the word. |

Recognize2Transcription contains the latest stable and/or unstable transcriptions.
The unstable list contains the currently recognized words which are still subject to change. Any change returns the full
list of unstable words minus the words that became stable.
The stable list contains the latest words that became stable. Stable words are only returned once.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| unstable_transcriptions | Recognize2TimedTranscription | repeated | List of still unstable transcriptions. |
| stable_transcriptions | Recognize2TimedTranscription | repeated | List of stable transcriptions. |
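Since stable words are delivered only once while the unstable tail is re-sent in full, a live caption display would typically append the stable words and redraw the unstable tail on every update. A minimal sketch, under the same assumptions as the earlier snippets:

```python
stable_words = []  # grows monotonically; stable words arrive exactly once

def render(transcription):
    """Append newly stable words and redraw the still-changing tail."""
    stable_words.extend(t.text for t in transcription.stable_transcriptions)
    unstable_tail = [t.text for t in transcription.unstable_transcriptions]
    return " ".join(stable_words + unstable_tail)
```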
Recognize2TranslateConfig describes the details of one of the requested target languages.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name (leave empty if not sure). |
| target_lang_code | string |  | The language code of the requested target language. |
| glossary_options | TranslateGlossaryOptions |  | Optional use of a glossary. |
| meta_options | TranslateTextMetaOptions |  | Optional meta option settings. |

Recognize2Translation is the translation of a segment into one of the requested target languages.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| start | google.protobuf.Duration |  | The start time of the translated text segment. |
| stop | google.protobuf.Duration |  | The stop time of the translated text segment. |
| text | string |  | The translated text. |
| domain | string |  | The domain name of the selected translation model. |
| lang_code | string |  | The language code of the translated text. |
RecognizeAudioConfig is the configuration message for the audio data in the RecognizeStream API.
It must be sent as the first and only the first message in the upstream.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| encoding | RecognizeAudioEncoding |  | The encoding of the audio data. |
| sample_rate_hz | uint32 |  | The sample rate of the audio data in Hz. |
| lang_code | string |  | The language code of the audio. |
| domain | string |  | The domain name (leave empty if not sure). |
| mutable_suffix_length | google.protobuf.UInt32Value |  | The maximum mutable suffix length (leave empty if not sure). |
| diarization | bool |  | Enable diarization. |
| custom_vocabulary_id | string |  | Load the custom vocabulary with this id. |

RecognizeAvailable describes one available recognition engine.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name. |
| sample_rate_hz | uint32 |  | The audio sample rate in Hz. |
| language | Language |  | The language details. |
| mt_target_languages | Language | repeated | The list of available direct translations. |
| diarization | bool |  | True if and only if diarization is available, false otherwise. |

RecognizeAvailableResponse contains the list of all available recognition engines.
It is returned by the RecognizeGetAvailable API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | RecognizeAvailable | repeated | The list of all available recognition engines. |
RecognizeDiarization messages are sent for each complete segment. They contain all speaker sections of the transcriptions and translations in this segment.
This requires diarization to be enabled in the RecognizeAudioConfig message.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| segment | RecognizeSegmentDetails |  | The segment details. |
| speaker_sections | RecognizeDiarizationSpeakerSection | repeated | List of all speaker sections. |

RecognizeDiarizationSpeakerSection messages are part of RecognizeDiarization messages.
They contain the postprocessed (and optionally translated) transcription of a speaker within a complete segment.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| speaker | string |  | The speaker name. |
| orth | string |  | The transcribed text. |
| postprocessed | string |  | The postprocessed text. |
| translation | string |  | The translated text. |
| start_time_ms | uint64 |  | The start time of the speaker section in milliseconds. |
| stop_time_ms | uint64 |  | The stop time of the speaker section in milliseconds. |
RecognizePostprocessing contains post-processed versions of recognized segments.
For most languages we provide automatic postprocessing of raw recognition results in order to add punctuation and capitalization and to convert spoken forms of numbers, dates, spelled words, etc. into an easy-to-read, standard written form.
If postprocessing is not available for the source language, the postprocessed field matches the orth field.
Not all RecognizeTranscription messages trigger a corresponding RecognizePostprocessing message, but it is guaranteed that you receive the postprocessing result of the complete segment transcription.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| segment | RecognizeSegmentDetails |  | The segment details. |
| orth | string |  | The source text. |
| postprocessed | string |  | The postprocessed version of the source text. |

RecognizeProgressStatus messages are sent from time to time to indicate the progress of the current stream.
This is particularly useful when segments in the audio do not contain speech, which might make the recognizer seem unresponsive since there is nothing to return.
Usually audio streams are limited in time. The remaining time value returns the remaining seconds of audio data the client is allowed to stream.
All values are in seconds.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| audio_decoder_time_sec | float |  | The audio decoder progress time in seconds. |
| segmenter_progress_time_sec | float |  | The segmenter progress time in seconds. |
| recognizer_progress_time_sec | float |  | The recognizer progress time in seconds. |
| remaining_time_sec | float |  | The remaining time in seconds. |

RecognizeSegmentDetails contains the segment id (a rolling number starting with 1) and a complete indicator. It is sent with most result messages.
If complete is set to true, the message is the last message of the particular response type for the given segment id.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| id | uint32 |  | The unique segment id. |
| complete | bool |  | Complete is true if and only if this is the last message for this response type with the given id, false otherwise. |
RecognizeSegmentEnd messages are sent when the segmenter detects a segment end.
These messages are helpful when the client intends to send a certain number of segments. Once the client receives the appropriate RecognizeSegmentEnd message, it can stop sending audio data.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| segment_id | uint32 |  | The unique segment id. |
| segmenter_progress_time_sec | float |  | The progress time since the last segment end, in seconds. |

RecognizeSpeakerChange messages are sent every time a speaker change is detected.
This requires diarization to be enabled in the RecognizeAudioConfig message.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| start_time_ms | uint64 |  | The start time of the speaker in milliseconds. |
| speaker | string |  | The speaker name. |

RecognizeStreamConfig is the configuration message in the RecognizeStream API.
It must be sent as the first and only the first message in the upstream.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| audio_configuration | RecognizeAudioConfig |  | The audio data configuration. |
| translate_configuration | RecognizeTranslateConfig |  | The optional translate configuration. |

RecognizeStreamRequest represents the upstream data in the RecognizeStream API.
The first message must contain the configuration; all subsequent messages must contain audio data.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| configuration | RecognizeStreamConfig |  | The configuration message. This must be the first message. |
| audio | bytes |  | The audio data. |
RecognizeStreamResponse represents the downstream data in the RecognizeStream API.
It always contains exactly one response type.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| transcription | RecognizeTranscription |  | A transcription response. |
| segment_end | RecognizeSegmentEnd |  | A segment end response. |
| progress_status | RecognizeProgressStatus |  | A progress status response. |
| postprocessing | RecognizePostprocessing |  | A postprocessing response. |
| translation | RecognizeTranslation |  | A translation response. |
| speaker_change | RecognizeSpeakerChange |  | A speaker change response. |
| diarization | RecognizeDiarization |  | A diarization response. |

RecognizeTranscription is a partial recognition result.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| segment | RecognizeSegmentDetails |  | The segment details. |
| orth | string |  | The transcribed segment in one string. |
| words | RecognizeWord | repeated | The list of all transcribed words with detailed information. |
RecognizeTranslateConfig is the optional translation configuration for the RecognizeStream API.
Please make sure the requested target language code is in the list of available MT target languages for your audio source language.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| translate_interval | RecognizeTranslateInterval |  | The interval in which translation results are produced. |
| target_lang_code | string |  | The language code of the requested target language. |
| glossary_options | TranslateGlossaryOptions |  | Enable use of a registered glossary. |

RecognizeTranslation contains translation results.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| segment | RecognizeSegmentDetails |  | The segment details. |
| source | string |  | The source text for this translation. |
| translation | string |  | The translated text. |

RecognizeWord describes all the details of a word in a RecognizeTranscription message.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| word | string |  | The recognized word. |
| start_time_ms | uint64 |  | The start time in milliseconds. |
| stop_time_ms | uint64 |  | The stop time in milliseconds. |
| confidence | float |  | The confidence score (between 0.0 and 1.0). |
SpeakAddLexiconRequest contains your lexicon to be registered.
It can be used later to produce speech with fine-tuned pronunciation of the words in the lexicon.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| lexicon | SpeakLexicon |  | The speak lexicon. |

SpeakAddLexiconResponse returns the unique id of the registered lexicon. Use this id in your
speak request to enable usage of the lexicon. The lexicon is automatically removed from the server at
the expiration time, or manually using the SpeakRemoveLexicon API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| id | string |  | The unique id of the lexicon. |
| exp | int64 |  | The expiration date of the lexicon in seconds since epoch. |

SpeakAddReferenceAudioRequest contains your reference audio to be registered.
It can be used later to produce speech with an adapted voice.
The reference audio must not be longer than 30 seconds.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| encoding | SpeakReferenceAudioEncoding |  | The encoding of the reference audio data. |
| audio | bytes |  | The reference audio data. |

SpeakAddReferenceAudioResponse returns the unique id of the registered reference
audio data. Use this id in your speak request to adapt the speaker's voice.
The audio is automatically removed from the server at the expiration time, or
manually using the SpeakRemoveReferenceAudio API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| id | string |  | The unique id of the reference audio data. |
| exp | int64 |  | The expiration time in seconds since epoch. |
SpeakAvailable describes one available TTS engine.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name. |
| language | Language |  | The language details. |
| speaker | string |  | The speaker name. |
| adaptable | bool |  | If true, the TTS allows adapting the speaker's voice with reference audio data. |
| gender | string |  | The gender of the TTS speaker (might be empty, e.g. for adaptable speakers). |

SpeakAvailableResponse contains the list of all available TTS engines.
It is returned by the SpeakGetAvailable API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | SpeakAvailable | repeated | List of all available TTS engines. |

SpeakGenerateLexiconSymbolsRequest is used to generate the phonetic transcription (symbols) for a given
word/phrase. To generate a phonetic description for foreign words, an optional pronunciation
language code can be set.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| text | string |  | The word or a small phrase. |
| lang_code | string |  | The TTS language code. |
| speaker | string |  | The speaker to generate the audio. |
| domain | string |  | The domain name. |
| pronunciation_lang_code | string | optional | The optional language code, e.g. for pronouncing foreign words. |

SpeakGenerateLexiconSymbolsResponse returns the phonetic transcription (symbols) and the audio data (as a WAV file).

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| symbols | string |  | The symbols. |
| data | bytes |  | The chunk of audio data as a WAV file. |
SpeakLexicon is the lexicon used to optimize the TTS output.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| lexicon | SpeakLexicon.LexiconEntry | repeated | Maps a source word/phrase to a replace rule. |

SpeakLexicon.LexiconEntry

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | string |  |  |
| value | SpeakLexicon.ReplaceRule |  |  |

SpeakLexicon.ReplaceRule

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| case_sensitive | bool |  | If true, the pattern matching is performed case-sensitively. |
| text | string |  | The desired TTS input as text. |
| symbols | string |  | The desired TTS input as symbols. |
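To illustrate how the map-style lexicon field fits together with SpeakAddLexicon and SpeakRequest, here is a rough sketch continuing the assumed setup from the earlier snippets. The source word and its replacement text are placeholders; phonetic symbols could instead be obtained via SpeakGenerateLexiconSymbols and set in the 'symbols' field.

```python
lexicon = pb2.SpeakLexicon()
# Map entry: replace the written form "AppTek" by an explicit pronunciation.
lexicon.lexicon["AppTek"].CopyFrom(
    pb2.SpeakLexicon.ReplaceRule(
        case_sensitive=True,
        text="app tech",   # plain-text replacement; alternatively set 'symbols'
    )
)

add_response = stub.SpeakAddLexicon(
    pb2.SpeakAddLexiconRequest(license_token=license_token, lexicon=lexicon)
)
lexicon_id = add_response.id   # reference this id in SpeakRequest.lexicon_id
```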
SpeakMeta returns the details of the TTS audio data.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| encoding | SpeakAudioEncoding |  | The encoding of the TTS audio data. |

SpeakRemoveLexiconRequest is used to remove a lexicon from the server.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| id | string |  | The unique id of the lexicon. |

SpeakRemoveReferenceAudioRequest is used to remove a reference audio from the server.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| id | string |  | The unique id of the reference audio data. |

SpeakRequest contains the request to convert an input text into audio data.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| encoding | SpeakAudioEncoding |  | The requested encoding of the output audio stream. |
| text | string |  | The input text. |
| lang_code | string |  | The language code of the input text. |
| speaker | string |  | The TTS speaker name. |
| domain | string |  | The domain name. |
| tempo | float |  | Tempo specifies the playback speed of the generated audio. If set to 0.0 it defaults internally to normal speed (1.0); valid values are between 0.5 and 1.5. |
| reference_audio_id | string |  | The unique id of a registered reference audio to adapt the speaker's voice. |
| lexicon_id | string |  | The unique id of a registered lexicon. |

SpeakResponse contains the generated audio data of a speak request.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| meta | SpeakMeta |  | The first and only the first message is a meta message describing the content of the following audio messages. |
| audio | bytes |  | The audio containing the generated speech. |
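The Speak rpc is server-streaming: the first message carries SpeakMeta, the following messages carry audio chunks. A rough sketch of collecting the audio, continuing the assumed setup from above (the text, language code, speaker, and output file name are placeholders):

```python
request = pb2.SpeakRequest(
    license_token=license_token,
    text="Hello world.",
    lang_code="en",          # example language code
    speaker="some_speaker",  # pick one from SpeakGetAvailable
)

pcm = bytearray()
for message in stub.Speak(request):
    if message.HasField("meta"):
        print("audio encoding:", message.meta.encoding)  # first message only
    else:
        pcm.extend(message.audio)  # subsequent messages carry raw audio data

with open("speech.raw", "wb") as out:
    out.write(pcm)
```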
TextLanguageIdRequest contains the request to determine the language of a given text.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  |  |
| text | string |  |  |
| num_results | uint32 | optional |  |
| probability_threshold | float | optional |  |

TextLanguageIdResponse contains the language identification results.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| results | TextLanguageIdResult | repeated |  |

TextLanguageIdResult describes one identified language and its probability.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| language | Language |  |  |
| probability | float |  |  |
TransformTextAvailable describes one available transformation model.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain. |
| language | Language |  | The language details. |
| mode | TransformTextMode |  | The transformation mode. |

TransformTextAvailableResponse lists all available transformation models.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | TransformTextAvailable | repeated | List of available ITN languages. |

TransformTextRequest contains the request to transform the source text according to the chosen mode.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| source | string |  | The source text. |
| lang_code | string |  | The source language code. |
| domain | string |  | The domain. |
| mode | TransformTextMode |  | The transformation mode. |
| text_layout | TextLayout |  | The layout of the text for plain source texts. |

TransformTextResponse contains the text after transformation.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| target | string |  | The target text. |
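For example, a request for inverse text normalization might look like the sketch below, continuing the assumed setup from above. The language code and the sample text are illustrative, and the mode is set via its numeric value from the TransformTextMode table rather than guessing the generated enum scoping; the shown output is only indicative of what ITN produces, not an actual server response.

```python
request = pb2.TransformTextRequest(
    license_token=license_token,
    source="i paid twenty five dollars on may third",
    lang_code="en",   # example language code
    mode=1,           # TRANSFORM_TEXT_MODE_ITN, per the TransformTextMode enum table
)
response = stub.TransformText(request)
print(response.target)  # e.g. a written form such as "I paid $25 on May 3rd" (illustrative)
```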
TranslateAddGlossaryRequest message is used to register a new glossary.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| glossary | TranslateGlossary |  | The glossary. |

TranslateAddGlossaryResponse message returns the id and expiration time of the
registered glossary.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| id | string |  | The unique id of the glossary. |
| exp | int64 |  | The minimum expiration time of the glossary. |

TranslateAvailable describes one available translation engine.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| domain | string |  | The domain name. |
| source_language | Language |  | The language details for the source text. |
| target_language | Language |  | The language details for the target text. |
| formats | TranslateTextFormat | repeated | The available source text formats. |
| gender_codes | string | repeated | The available gender codes. |
| genre_codes | string | repeated | The available genre codes. |
| length_codes | string | repeated | The available length codes. |
| style_codes | string | repeated | The available style codes. |
| topic_codes | string | repeated | The available topic codes. |
| extended_context | bool |  | True iff the model supports translation with extended context. |
| compute_confidence | bool |  | True iff the model supports computation of confidence scores. |
| glossary | bool |  | True iff use of a glossary for the source language is supported. |
| glossary_override | bool |  | True iff annotations with a translation, but without a tag, are supported. |
| glossary_tags | bool |  | True iff annotations with both a translation and a tag are supported. |
| markup_transfer | bool |  | True iff annotations with a tag, but without a translation, are supported. |
| subtitle_condensation | bool |  | True iff subtitle condensation is available. |

TranslateAvailableResponse contains the list of all available translation engines.
It is returned by the TranslateGetAvailable API.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| list | TranslateAvailable | repeated | List of all available translation engines. |
TranslateGlossary contains all suggested translations and the language codes for source and target.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| source_lang_code | string |  | The source language code. |
| target_lang_code | string |  | The target language code. |
| entries | TranslateGlossaryEntry | repeated | List of all suggested translations. |

TranslateGlossaryEntry contains one specific suggested translation of word(s).
It is also possible to define additional alternative translations.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| source | string |  | The source word. |
| target | string |  | The suggested target word. |
| target_alternatives | string | repeated | The optional alternative suggested translations. |

TranslateGlossaryOptions enables use of a glossary previously added to the backend via the TranslateAddGlossary API.
The glossary can be used to encourage the translation system to use specific translations.
Words which are found in the glossary will be given a tag so that they are easy to recognize in the translation.
These tags can be accessed by matching the tags in the source and target annotations.
Optionally, tagging of numbers can be enabled.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| glossary_id | string |  | The unique id of the registered glossary. |
| enable_number_tagging | bool |  | Numbers can be tagged in order to show which number on the source side has been used in which position on the target side, e.g.: Input: "Tom bought 6 eggs and 4 tomatoes." Annotation: "Tom bought <1>6</1> eggs and <2>4</2> tomatoes." Translation: "Tom hat <1>6</1> Eier und <2>4</2> Tomaten gekauft." |
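A rough end-to-end sketch of registering a glossary and then referencing it from TranslateGlossaryOptions, continuing the assumed setup from above (language codes and the example terms are illustrative):

```python
glossary = pb2.TranslateGlossary(
    source_lang_code="en",
    target_lang_code="de",
    entries=[
        pb2.TranslateGlossaryEntry(
            source="gateway",
            target="Gateway",
            target_alternatives=["Zugang"],   # optional alternatives
        ),
    ],
)

glossary_id = stub.TranslateAddGlossary(
    pb2.TranslateAddGlossaryRequest(license_token=license_token, glossary=glossary)
).id

glossary_options = pb2.TranslateGlossaryOptions(
    glossary_id=glossary_id,
    enable_number_tagging=True,
)
# glossary_options can now be attached to TranslateText, TranslateSubtitles,
# or streaming recognition translate configurations.
```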
TranslateRemoveGlossaryRequest message is used to remove a registered glossary.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| id | string |  | The unique id of the glossary. |

TranslateSubtitlesLineSegmentationOptions can be set to tune the line segmentation.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| character_limit | uint32 | optional | The maximum number of characters per subtitle line. |
| maximum_lines | uint32 | optional | The maximum number of lines in a subtitle block. |
| max_reading_speed | float | optional | The maximum number of characters per second in a subtitle block. If set, the translation will be condensed to better meet this limit. |
| max_pause_within_sentence | google.protobuf.Duration | optional | Split sentences at gaps larger than this duration. |

TranslateSubtitlesMetaOptions can be set to tune the translation.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| gender_code | string | optional | The gender_code controls the grammar/morphological forms of the translation (for example, female vs. male 1st person verb forms in Slavic languages), depending on the gender of the speaker/author of the sentence. |
| genre_code | string | optional | The genre_code provides information about the genre of the input text; the translation is then adjusted to fit a particular genre (for example, news, patents, etc.). |
| length_code | string | optional | The length_code lets you control the length of the translation output in relation to the length of the input sentence. |
| style_code | string | optional | The style_code controls the style/formality of the translation (for example, using formal "you" instead of informal "you"). |
| topic_code | string | optional | The topic_code provides information about the topic of the input text; the translation is then adjusted to fit a particular topic (for example, politics, sports, culture within the genre of "news"). |

TranslateSubtitlesRequest contains the request to translate formatted subtitles into the given target language.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| source_lang_code | string |  | The language code of your source (input) text. |
| target_lang_code | string |  | The language code of your desired target language. |
| domain | string |  | The domain name of the desired translation model. |
| subtitles | TranslateSubtitlesSourceSubtitle | repeated | The source text in the form of formatted subtitles. |
| meta_options | TranslateSubtitlesMetaOptions | optional | The global meta options to tune the translation. |
| line_segmentation_options | TranslateSubtitlesLineSegmentationOptions | optional | The line segmentation options for formatting the subtitles. |
| glossary_options | TranslateGlossaryOptions | optional | Enable use of a registered glossary. |
| compute_confidence | bool |  | Compute the confidence score per subtitle. |
| extended_context | bool |  | Use extended context during translation. |
| condense_to_maximum_lines | bool |  | Optimize the translation to better adhere to the maximum_lines line segmentation option. |
TranslateSubtitlesResponse contains the translated subtitles.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| subtitles | TranslateSubtitlesTargetSubtitle | repeated | The translated subtitles. |
| segments | TranslateTextSegment | repeated | The detailed results for each translated segment. |

TranslateSubtitlesSourceSubtitle contains one source subtitle.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| index | uint32 |  | The unique index of the subtitle. |
| start_time | google.protobuf.Duration |  | The start time of the subtitle. |
| stop_time | google.protobuf.Duration |  | The stop time of the subtitle. |
| lines | TranslateSubtitlesSourceSubtitleLine | repeated | The lines of text of the subtitle. |
| do_translate | bool |  | This flag can be used to update parts of an earlier translation. The output will contain translations only of the subtitles with this flag set, as well as possibly surrounding subtitles that are affected by the updated translation. The client can assemble the updated subtitle file via the subtitle indices. |

TranslateSubtitlesSourceSubtitleLine contains the content of one line in a subtitle; a short usage sketch follows below the table.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| line | string |  | One line of a subtitle block. May include formatting tags, e.g. '<i> ... </i>'. In addition, we support tags for glossary term translation of the form '<term>glossary term</term>' or '<term target="Glossareintrag">glossary term</term>'. The 'target' attribute defines the desired translation of the term between the tags, and a missing 'target' attribute means the term should be kept as is during translation. |
| extended_context | bool | optional | Use extended context during translation; overrides the global setting if set for this line. |
| meta_options | TranslateSubtitlesMetaOptions | optional | Overrides the global meta options if set for this line. |
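A minimal sketch of building a source subtitle with timing and an inline `<term>` annotation, then requesting a subtitle translation. It continues the assumed setup from above; the language codes, sample text, and timings are illustrative.

```python
from datetime import timedelta

subtitle = pb2.TranslateSubtitlesSourceSubtitle(
    index=1,
    lines=[
        pb2.TranslateSubtitlesSourceSubtitleLine(
            line='Welcome to the <term target="Gateway">gateway</term> demo.'
        ),
    ],
)
subtitle.start_time.FromTimedelta(timedelta(seconds=0.0))
subtitle.stop_time.FromTimedelta(timedelta(seconds=2.5))

request = pb2.TranslateSubtitlesRequest(
    license_token=license_token,
    source_lang_code="en",
    target_lang_code="de",
    subtitles=[subtitle],
)
response = stub.TranslateSubtitles(request)
for target in response.subtitles:
    print(target.index, [line.line for line in target.lines])
```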
TranslateSubtitlesTargetSubtitle contains one target subtitle.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| index | uint32 |  | The unique index of the subtitle. |
| start_time | google.protobuf.Duration |  | The start time of the subtitle. |
| stop_time | google.protobuf.Duration |  | The stop time of the subtitle. |
| lines | TranslateSubtitlesTargetSubtitleLine | repeated | The lines of text of the subtitle. |
| confidence | float | optional | The confidence score, if requested. |

TranslateSubtitlesTargetSubtitleLine contains one line of a translated subtitle.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| line | string |  | The line of text. |

TranslateTextAnnotation messages describe annotations for a (span of) word(s), which allow
aligning the source text with the target text via the tag when a glossary was used.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| first_word_index | uint32 |  | Index of the first word in the word_items list. |
| last_word_index | uint32 |  | Index of the last word (exclusive). |
| tag | string |  | The tag of the corresponding source annotation (if it was set) to identify the span in the translation. |
| translation_alternatives | string | repeated | Optional alternative translations. |
| start_tag | string |  | The start tag to be inserted before the span when converting to a plain text representation, e.g. <i>. |
| stop_tag | string |  | The stop tag to be inserted after the span when converting to a plain text representation, e.g. </i>. |
TranslateTextHypothesis contains the translation and details for one translation hypothesis.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| translation | string |  | The translated text. |
| confidence | float |  | The confidence score for this hypothesis. |
| word_items | TranslateTextWordItem | repeated | List of all word items. |
| target_annotations | TranslateTextAnnotation | repeated | List of all target annotations. |

TranslateTextMetaOptions are optional parameters to tune the translation.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| gender_code | string |  | The gender_code controls the grammar/morphological forms of the translation (for example, female vs. male 1st person verb forms in Slavic languages), depending on the gender of the speaker/author of the sentence. |
| genre_code | string |  | The genre_code provides information about the genre of the input text; the translation is then adjusted to fit a particular genre (for example, news, patents, etc.). |
| length_code | string |  | The length_code lets you control the length of the translation output in relation to the length of the input sentence. |
| style_code | string |  | The style_code controls the style/formality of the translation (for example, using formal "you" instead of informal "you"). |
| topic_code | string |  | The topic_code provides information about the topic of the input text; the translation is then adjusted to fit a particular topic (for example, politics, sports, culture within the genre of "news"). |
TranslateTextRequest contains the request to translate a source text into the
given target language.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| license_token | string |  | The license token acquired from the license server. |
| source_text | string |  | The plain source text. |
| source_lang_code | string |  | The language code of the provided source text. |
| target_lang_code | string |  | The language code of the requested target language. |
| domain | string |  | The domain name. |
| format | TranslateTextFormat |  | The source text format. |
| srt_options | TranslateTextSrtOptions |  | The optional parameters if the source text is an SRT. |
| meta_options | TranslateTextMetaOptions |  | The optional meta parameters to tune the translation. |
| extended_context | bool |  | Enable use of extended context. |
| compute_confidence | bool |  | Enable computation of confidence scores; this should only be enabled when required, since it slows down the translation. |
| num_hypotheses | uint32 |  | The maximum number of hypotheses to be returned (minimum, and default if not set, is 1). |
| glossary_options | TranslateGlossaryOptions |  | Enable use of a glossary. |
| parse_xml_tags | bool |  | Enable parsing tags of the form '<tag>tagged term</tag>', see above. |
| condense_to_maximum_lines | bool |  | Optimize the translation to better adhere to the maximum_lines line segmentation option. |
| text_layout | TextLayout |  | The layout of the text for plain source texts. |
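A minimal TranslateText sketch under the same assumptions as the earlier snippets (language codes and sample text are illustrative; the format field is left at its default, TRANSLATE_TEXT_FORMAT_PLAIN):

```python
request = pb2.TranslateTextRequest(
    license_token=license_token,
    source_text="The weather is nice today.",
    source_lang_code="en",
    target_lang_code="de",
    # format defaults to TRANSLATE_TEXT_FORMAT_PLAIN (0)
)
response = stub.TranslateText(request)
print(response.target_text)          # best hypothesis as one string
for segment in response.segments:    # per-segment detail, if needed
    for hypothesis in segment.hypotheses:
        # confidence is only meaningful if compute_confidence was requested
        print(hypothesis.confidence, hypothesis.translation)
```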
TranslateTextResponse contains the target text of a TranslateTextRequest.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| target_text | string |  | The target text of the best hypotheses in one string. |
| segments | TranslateTextSegment | repeated | Detailed results for each translated segment. |

TranslateTextSegment contains all hypotheses for one segment.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| hypotheses | TranslateTextHypothesis | repeated | List of all hypotheses. |
| source_annotations | TranslateTextSourceAnnotation | repeated | List of all source annotations. |
| source_words | TranslateTextSourceWordItem | repeated | List of all source words matching the indices in the annotations. |

TranslateTextSourceAnnotation messages describe the annotations on the source text.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| annotation | TranslateTextAnnotation |  | The annotation. |
| target | string |  | The generated target. |

TranslateTextSourceWordItem messages are returned when a glossary was used.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| word | string |  | The word. |
| is_attached_left | bool |  | True if the word is attached to the previous word on the left, false otherwise. |

TranslateTextSrtOptions are optional parameters to tune the generation of
translated SRT documents.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| character_limit | google.protobuf.UInt32Value |  | The maximum number of characters per line. |
| max_lines | google.protobuf.UInt32Value |  | The maximum number of lines per frame. |
| max_reading_speed | google.protobuf.FloatValue |  | The maximum number of characters per second in a subtitle block. If set, the translation will be condensed to better meet this limit. |
| max_pause_within_sentence | google.protobuf.Duration |  | Split sentences at gaps larger than this duration. |
TranslateTextWordItem contains details for a single word.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| word | string |  | The word. |
| confidence | float |  | The word's confidence score. |
| is_attached_left | bool |  | True if the word is attached to the previous word on the left, false otherwise. |
| tag | int32 |  | Deprecated. Replaced by tag in the annotation message. |

Fields with the deprecated option:

| Name | Option |
| ---- | ------ |
| tag | true |

Direction indicates the display direction of a language.

| Name | Number | Description |
| ---- | ------ | ----------- |
| Left2Right | 0 | Left to right, as in Romance or Germanic languages. |
| Right2Left | 1 | Right to left, as in Arabic. |
LineSegmentationSourceFormat lists all available source document formats for a
line segmentation request.

| Name | Number | Description |
| ---- | ------ | ----------- |
| LINE_SEGMENTATION_SOURCE_RASR_XML | 0 | RASR XML transcription file. |

Recognize2AudioConfig.SampleType lists the available sample types for the audio stream.

| Name | Number | Description |
| ---- | ------ | ----------- |
| INT16 | 0 | 16-bit signed integer. |
| FLOAT32 | 1 | 32-bit floating-point. |

RecognizeAudioEncoding lists all available audio encodings for the upstream.

| Name | Number | Description |
| ---- | ------ | ----------- |
| PCM_16bit_SI_MONO | 0 | PCM 16-bit signed integer mono. |

RecognizeTranslateInterval specifies the interval in which translation results are produced if direct translation of transcriptions is requested.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TRANSLATION_INTERVAL_SEGMENT_END | 0 | Produce one translation only at segment end. |
| TRANSLATION_INTERVAL_CONTINUOUS | 1 | Update the translation multiple times during a segment. |

SpeakAudioEncoding lists the available TTS audio data formats.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TTS_PCM_22050_16bit_SI_MONO | 0 | PCM 22050 Hz 16-bit signed integer mono. |
| TTS_PCM_16000_16bit_SI_MONO | 1 | PCM 16000 Hz 16-bit signed integer mono. |
| TTS_PCM_24000_16bit_SI_MONO | 2 | PCM 24000 Hz 16-bit signed integer mono. |
| TTS_PCM_48000_16bit_SI_MONO | 3 | PCM 48000 Hz 16-bit signed integer mono. |
| TTS_PCM_8000_16bit_SI_MONO | 4 | PCM 8000 Hz 16-bit signed integer mono. |

SpeakReferenceAudioEncoding lists all available audio encodings for the upstream.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TTS_REF_PCM_16000_16bit_SI_MONO | 0 | PCM 16000 Hz 16-bit signed integer mono. |

TextLayout lists the available layouts for plain source texts.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TEXT_LAYOUT_ONE_SENTENCE_PER_LINE | 0 | The default is a line-based layout: each line contains one sentence, and the translation will contain exactly one segment per input line. |
| TEXT_LAYOUT_PARAGRAPHS | 1 | The input is natural, continuous text. Line breaks, if used at all, are interpreted as paragraph boundaries. |

TransformTextMode lists the available text transformation modes.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TRANSFORM_TEXT_MODE_UNSPECIFIED | 0 |  |
| TRANSFORM_TEXT_MODE_ITN | 1 | Inverse text normalization. |
| TRANSFORM_TEXT_MODE_DIACRITIZATION | 2 |  |
| TRANSFORM_TEXT_MODE_CONDENSATION | 3 |  |

TranslateTextFormat lists all available input formats for the TranslateText API.

| Name | Number | Description |
| ---- | ------ | ----------- |
| TRANSLATE_TEXT_FORMAT_PLAIN | 0 | Plain text. |
| TRANSLATE_TEXT_FORMAT_SRT | 1 | SRT subtitle file (SubRip). |
| TRANSLATE_TEXT_FORMAT_SUBTITLES | 2 | SUBTITLES format (only supported by the TranslateSubtitles API). |
The AppTek gRPC gateway provides access to our services for reduced-latency, stateful processing.
Each request to the gateway requires a license token. See https://license.apptek.com/doc/index.html for details
on how to acquire license tokens from the AppTek license server.

| Method Name | Request Type | Response Type | Description |
| ----------- | ------------ | ------------- | ----------- |
| RecognizeGetAvailable | AvailableRequest | RecognizeAvailableResponse | RecognizeStream is considered deprecated and will be removed in the near future. Please use Recognize2Stream instead. RecognizeGetAvailable returns all available languages for the RecognizeStream API. |
| RecognizeStream | RecognizeStreamRequest stream | RecognizeStreamResponse stream | RecognizeStream is considered deprecated and will be removed in the near future. Please use Recognize2Stream instead. RecognizeStream transcribes an incoming audio stream and returns the transcription in raw and postprocessed form and, if requested, a translation into a target language. The first and only the first message in the upstream must contain a RecognizeStreamConfig message, followed by messages containing audio data. The audio data is automatically partitioned into segments, where each result message contains a segment id (starting with 1). Segment end events trigger a separate notification message. During a segment, returned results are subject to change. The last message in a segment is marked as "complete" and the following messages belong to the next segment. The results for transcriptions, postprocessing and translations are independent of each other in the sense that, e.g., if the transcriptions have already moved on to the next segments, trailing postprocessing and translation results can still be returned for the previous segment. Not every transcription message triggers a corresponding postprocessing and translation message; however, it is guaranteed that you always receive postprocessing and translation results for the "complete" segment transcription. When diarization is enabled, a RecognizeSpeakerChange message is sent on every speaker change, containing the start time of the speaker and the speaker's name. Additionally, on a complete segment a RecognizeDiarization message is sent containing the postprocessed and optionally translated transcriptions for each speaker in that segment. During the audio stream the API periodically returns progress status messages. |
| Recognize2GetAvailable | AvailableRequest | Recognize2AvailableResponse | Recognize2GetAvailable returns all available languages for the Recognize2Stream API. |
| Recognize2Stream | Recognize2StreamRequest stream | Recognize2StreamResponse stream | Recognize2Stream transcribes an incoming audio stream and returns the transcription. The transcription is produced and returned in the following ways: - unstable: transcribed words that might change as the audio stream moves on, - stable: transcribed words in raw recognizer output that will not change anymore, - pc: punctuated and capitalized version of the stable words, - segment: once a sentence end is reached, the full sentence after inverse text normalization, - translation(s): if requested, the segment translated into the requested target language(s). |
| TranslateGetAvailable | AvailableRequest | TranslateAvailableResponse | TranslateGetAvailable returns all available languages for the TranslateText API. |
| TranslateText | TranslateTextRequest | TranslateTextResponse | TranslateText translates a source text into the provided target language. |
| TranslateSubtitles | TranslateSubtitlesRequest | TranslateSubtitlesResponse | TranslateSubtitles translates formatted subtitles. |
| TranslateAddGlossary | TranslateAddGlossaryRequest | TranslateAddGlossaryResponse | TranslateAddGlossary registers a glossary which can be used in TranslateText requests. Registered glossaries expire after a while but are guaranteed to be usable during the lifetime of the used license token. |
| TranslateRemoveGlossary | TranslateRemoveGlossaryRequest | .google.protobuf.Empty | TranslateRemoveGlossary removes a registered glossary. |
| TextLanguageId | TextLanguageIdRequest | TextLanguageIdResponse | TextLanguageId determines the language of a given text. |
| TransformTextGetAvailable | AvailableRequest | TransformTextAvailableResponse | TransformTextGetAvailable returns all available ITN languages. |
| TransformText | TransformTextRequest | TransformTextResponse | TransformText provides various text transformation models, e.g. for inverse text normalization or text condensation. |
| LineSegmentationGetAvailable | AvailableRequest | LineSegmentationAvailableResponse | LineSegmentationGetAvailable returns all available languages for the LineSegmentation API. |
| LineSegmentation | LineSegmentationRequest | LineSegmentationResponse | LineSegmentation performs ILS (intelligent line segmentation) on a transcription and returns an SRT document. |
| SpeakGetAvailable | AvailableRequest | SpeakAvailableResponse | SpeakGetAvailable returns all available languages for the Speak API. |
| Speak | SpeakRequest | SpeakResponse stream | Speak generates speech from a given input text. The audio is returned in a stream where the first message describes the content of the following audio data messages. |
| SpeakAddLexicon | SpeakAddLexiconRequest | SpeakAddLexiconResponse | SpeakAddLexicon registers a lexicon which can be used to tune the pronunciation of words. The number of lexicons that can be registered is limited. Registered lexicons automatically expire after a while but are guaranteed to be usable during the lifetime of the used license token. |
| SpeakRemoveLexicon | SpeakRemoveLexiconRequest | .google.protobuf.Empty | SpeakRemoveLexicon removes a registered lexicon. |
| SpeakGenerateLexiconSymbols | SpeakGenerateLexiconSymbolsRequest | SpeakGenerateLexiconSymbolsResponse stream | SpeakGenerateLexiconSymbols creates the symbols for a word/phrase that can be added to a SpeakLexicon and returns an example audio as a WAV file. The submitted text can be tagged with a language code different from the language code of the TTS to generate pronunciations of foreign words. |
| SpeakAddReferenceAudio | SpeakAddReferenceAudioRequest | SpeakAddReferenceAudioResponse | SpeakAddReferenceAudio registers reference audio data which can be used to adapt the speaker voice. The amount of reference audio data that can be registered is limited. Registered reference audios automatically expire after a while but are guaranteed to be usable during the lifetime of the used license token. Reference audio data can only be used with adaptable TTS engines. |
| SpeakRemoveReferenceAudio | SpeakRemoveReferenceAudioRequest | .google.protobuf.Empty | SpeakRemoveReferenceAudio removes a registered reference audio. |
| .proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
| ----------- | ----- | --- | ---- | ------ | -- | -- | --- | ---- |
| double |  | double | double | float | float64 | double | float | Float |
| float |  | float | float | float | float32 | float | float | Float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| bool |  | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT) |