7.3.59. table_tokenize¶
7.3.59.1. Summary¶
table_tokenize command tokenizes text with the specified table's tokenizer, normalizer and token filters.
7.3.59.2. Syntax¶
This command takes four parameters. table and string are required
parameters; the others are optional:
table_tokenize table
               string
               [flags=NONE]
               [mode=GET]
7.3.59.3. Usage¶
Here is a simple example.
Execution example:
register token_filters/stop_word
# [[0,0.0,0.0],true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStopWord
# [[0,0.0,0.0],true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0,0.0,0.0],true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0,0.0,0.0],1]
table_tokenize Terms "Hello and Good-bye" --mode GET
# [
# [
# 0,
# 0.0,
# 0.0
# ],
# [
# {
# "value": "hello",
# "position": 0
# },
# {
# "value": "good",
# "position": 2
# },
# {
# "value": "-",
# "position": 3
# },
# {
# "value": "bye",
# "position": 4
# }
# ]
# ]
The Terms table has the TokenBigram tokenizer, the NormalizerAuto
normalizer and the TokenFilterStopWord token filter. table_tokenize
returns the tokens that are generated by tokenizing "Hello and Good-bye"
with the TokenBigram tokenizer. They are normalized by the NormalizerAuto
normalizer. The and token is removed by the TokenFilterStopWord token
filter, because "and" is loaded into Terms with is_stop_word set to true.
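TokenFilterStopWord removes stop words only when mode is GET; in ADD mode, the mode used while indexing, stop words are kept. Here is a sketch that reuses the Terms table above; the output shown is what we would expect, not captured output:
Execution example:
table_tokenize Terms "Hello and Good-bye" --mode ADD
# Expected output (illustrative): the and token is kept:
# [
#   [0,0.0,0.0],
#   [
#     {"value": "hello", "position": 0},
#     {"value": "and", "position": 1},
#     {"value": "good", "position": 2},
#     {"value": "-", "position": 3},
#     {"value": "bye", "position": 4}
#   ]
# ]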
7.3.59.4. Parameters¶
This section describes all parameters. Parameters are categorized.
7.3.59.4.1. Required parameters¶
There are required parameters, table and string.
7.3.59.4.1.1. table¶
Specifies the lexicon table. table_tokenize command uses the
tokenizer, the normalizer and the token filters that are set for the
lexicon table.
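Because these settings come from the lexicon table, tokenizing the same string with a different lexicon produces different tokens. Here is a sketch, assuming a hypothetical Words table that uses the TokenDelimit tokenizer (a real Groonga tokenizer that splits on spaces); the output shown is what we would expect, not captured output:
Execution example:
table_create Words TABLE_HASH_KEY ShortText \
  --default_tokenizer TokenDelimit \
  --normalizer NormalizerAuto
table_tokenize Words "Hello and Good-bye"
# Expected output (illustrative):
# [
#   [0,0.0,0.0],
#   [
#     {"value": "hello", "position": 0},
#     {"value": "and", "position": 1},
#     {"value": "good-bye", "position": 2}
#   ]
# ]
7.3.59.4.1.2. string¶
Specifies the string you want to tokenize.
See string option in tokenize about details.
7.3.59.4.2. Optional parameters¶
There are optional parameters, flags and mode.
7.3.59.4.2.1. flags¶
Specifies tokenization options. The default is NONE.
See flags option in tokenize about details.
7.3.59.4.2.2. mode¶
Specifies the tokenize mode. The default is GET.
See mode option in tokenize about details.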
7.3.59.5. Return value¶
table_tokenize command returns the tokenized tokens.
See Return value in tokenize about details.
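The shape of the return value, as seen in the usage example above, is the header followed by an array of token objects; each object has a value key (the token text) and a position key (the token's position in the token sequence). Schematically (HEADER stands for the header, [0,0.0,0.0] in the examples above; token texts are placeholders):
[
  HEADER,
  [
    {"value": "first token", "position": 0},
    {"value": "second token", "position": 1},
    ...
  ]
]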