RumbleDB parameters
The parameters that can be used on the command line as well as on the planned HTTP server are shown below.
RumbleDB runs in three modes. You select the mode by passing a verb as the first parameter. For example:
spark-submit rumbledb.org run file.jq -o output-dir -P 1
spark-submit rumbledb.org run -q '1+1'
spark-submit rumbledb.org serve -p 8001
spark-submit rumbledb.org repl -c 10
The previous parameters (--shell, --query-path, --server) still work for backward compatibility, but we recommend switching to the new verb-based format.
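To make the mapping between the two styles concrete, here is a minimal sketch that rewrites a legacy invocation into its verb-based form. The `to_verb_form` function is a hypothetical helper, not part of RumbleDB. It assumes that `--shell yes` corresponds to `repl`, `--server yes` to `serve`, and `--query-path <file>` to `run <file>`, as the Shortcut column of the parameter table suggests. It only prints the rewritten command and never launches anything.

```shell
# Hypothetical migration helper (not part of RumbleDB): prints the verb-based
# equivalent of a legacy command line. "rumbledb.org" is copied verbatim from
# the examples above.
to_verb_form() {
  # A legacy mode flag always comes with a value.
  [ "$#" -ge 2 ] || { echo "usage: to_verb_form FLAG VALUE [ARGS...]" >&2; return 1; }
  flag=$1
  value=$2
  shift 2
  case "$flag $value" in
    "--shell yes")    set -- repl "$@" ;;          # interactive shell
    "--server yes")   set -- serve "$@" ;;         # HTTP server
    "--query-path "*) set -- run "$value" "$@" ;;  # run a query file
    *) echo "not a legacy mode flag: $flag $value" >&2; return 1 ;;
  esac
  echo "spark-submit rumbledb.org $*"
}

to_verb_form --shell yes -c 10                    # spark-submit rumbledb.org repl -c 10
to_verb_form --server yes -p 8001                 # spark-submit rumbledb.org serve -p 8001
to_verb_form --query-path file.jq -o output-dir   # spark-submit rumbledb.org run file.jq -o output-dir
```

All remaining parameters are passed through unchanged, which matches the statement that the old flags keep working alongside the new verbs.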
Shell parameter | Shortcut | HTTP parameter | Example values | Semantics |
---|---|---|---|---|
--shell | repl | N/A | yes, no | yes runs the interactive shell. no executes a query specified with --query-path. |
--shell-filter | N/A | N/A | jq . | Post-processes the output of JSONiq queries on the shell with the specified command (reading the RumbleDB output via stdin) |
--query | -q | query | 1+1 | A JSONiq query directly provided as a string. |
--query-path | (any text without -- or - is recognized as a query path) | query-path | file:///folder/file.jq | A JSONiq query file to read from (from any file system, even the Web!). |
--output-path | -o | output-path | file:///folder/output | Where to output to (if the output is large, it will create a sharded directory, otherwise it will create a file) |
--output-format | -f | N/A | json, csv, avro, parquet, or any other format supported by Spark | An output format to use for the output. Formats other than json can only be output if the query outputs a highly structured sequence of objects (you can nest your query in an annotate() call to specify a schema if it does not). |
--output-format-option:foo | N/A | N/A | bar | Options to further specify the output format (example: separator character for CSV, compression format...) |
--overwrite | -O (meaning --overwrite yes) | overwrite | yes, no | Whether to overwrite the contents of --output-path. With no, an error is thrown if the output file or folder already exists. |
--materialization-cap | -c | materialization-cap | 200 | A cap on the maximum number of items to materialize for large sequences within a query or for outputting on screen (used to be called --result-size). |
--number-of-output-partitions | -P | N/A | ad hoc | How many partitions to create in the output, i.e., the number of files that will be created in the output path directory. |
--log-path | N/A | log-path | file:///folder/log.txt | Where to output log information |
--print-iterator-tree | N/A | N/A | yes, no | For debugging purposes, prints out the expression tree and the runtime iterator tree. |
--show-error-info | -v (meaning --show-error-info yes) | show-error-info | yes, no | For debugging purposes. If you want to report a bug, you can use this to get the full exception stack. If no, then only a short message is shown in case of error. |
--static-typing | -t (meaning --static-typing yes) | static-typing | yes, no | Activates static type analysis, which annotates the expression tree with inferred types at compile time and enables more optimizations (experimental). Deactivated by default. |
--server | serve | N/A | yes, no | yes runs RumbleDB as a server on port 8001. Run queries with http://localhost:8001/jsoniq?query-path=/folder/foo.json |
--port | -p | N/A | 8001 (default) | Changes the port of the RumbleDB HTTP server to one of your choosing. |
--host | -h | N/A | localhost (default) | Changes the host of the RumbleDB HTTP server to one of your choosing. |
--variable:foo | N/A | variable:foo | bar | --variable:foo bar initializes the global variable $foo to "bar". The query must contain the corresponding global variable declaration, e.g., "declare variable $foo external;" |
--context-item | -I | context-item | bar | Initializes the global context item $$ to "bar". The query must contain the corresponding context item declaration, e.g., "declare context item external;" |
--context-item-input | -i | context-item-input | - | Reads the context item value from the standard input. |
--context-item-input-format | N/A | context-item-input-format | text or json | Sets the input format to use for parsing the standard input (as text or as a serialized JSON value). |
--dates-with-timezone | N/A | dates-with-timezone | yes or no | Activates timezone support for the type xs:date (deactivated by default). |
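
As an illustration of how the variable and context-item parameters combine with a query, the sketch below assembles two `run` invocations. The `rumble` function and the `DRY_RUN` switch are hypothetical conveniences, not part of RumbleDB, and `rumbledb.org` is copied verbatim from the examples at the top of this page. With `DRY_RUN=1` (the default here) the commands are only printed, with their quoting flattened. Set `DRY_RUN=0` to pass them to spark-submit.

```shell
# Hypothetical wrapper (not part of RumbleDB): prints the command when
# DRY_RUN=1, otherwise hands the arguments to spark-submit unchanged.
DRY_RUN=${DRY_RUN:-1}
rumble() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "spark-submit rumbledb.org $*"
  else
    spark-submit rumbledb.org "$@"
  fi
}

# Bind the external variable $foo to the string "bar".
# The query has to declare $foo as external, as described in the table.
rumble run -q 'declare variable $foo external; $foo' --variable:foo bar

# Read the context item $$ from standard input ("-") and parse it as JSON.
# The query has to declare the context item as external.
rumble run -q 'declare context item external; $$' \
  --context-item-input - --context-item-input-format json <<'EOF'
{"a": 1}
EOF
```

The single quotes around each query keep the shell from expanding `$foo` and `$$` before RumbleDB sees them.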