This feature is deprecated and replaced by Data Imports
Data transports allows you to transport your data from your production databases to your analytics/reporting database, without having to write much code. It copies database tables of your choice from a source database to a destination database.
Currently we support the following transport sources:
- Amazon Redshift
- Google BigQuery
- Microsoft SQL Server
And the following destination databases:
- Amazon Redshift
- Google BigQuery
- SQL Server
This feature requires Holistics CLI, make sure you have set up it.
Then log in to Holistics with the below
Log in with the below command. You can get the token in Holistics (go to Settings, under API Key section)
Run following command to list all data sources, remember its ID so that you can specify it in the configuration file
For help on the syntax, type:
For example, to copy table public.users from Postgres to Redshift, simply run:
The above command automatically deduces the columns and types of the destination Redshift table based on the original Postgres table.
It, however, doesn't allow you to do any special adjustments, or specifying Redshift's sort keys, distribution keys or compression types.
For such things, please refer to the advanced example below.
The following example transports data from MySQL to Amazon Redshift, extracting page views data from table
events into destination table
events' schema structure:
The above example will translate into the following SQL query to be executed against the original database (MySQL):
Once the result is returned, it'll be loaded into the destination database table with structure you've specified above.
By default, the whole table is copied and replaced into the destination database. This simple approach works great with small data size, but as your tables grow, it makes more sense to only transport the difference (the recently updated) data.
For incremental transport to work, an incremental column has to be specified and present on both source and destination tables. We'll base on the max value of this column to decide the needed records to move over.
Specify 2 configs in the JSON template:
And run off the transport command normally. On first run, if the destination table is not there, it'll perform a full transport. On subsequent run, incremental transport will kick in.
You can also overwrite the transport mode using the command-line:
Here is the screenshot of our upsert feature:
Incremental column: only loads data from source that have the corresponding values of this column bigger than all records in the destination source. Sample use case: If you set the updated_at as the incremental column, then only data from source updated later than all records in the destination will be loaded.
Primary key column: Help us delete any out-dated records fast and accurately. Note that this column is not required,. Rendering invalid column (not containing primary keys) may result in data crash.
Note: Right now this feature only supports for Postgres destination.
After moving the table to destination, if you want to add indexes to the table, you can specify the indexes in the
indexes section of the JSON:
The above JSON will create 2 indexes, one on
date_d field, and one composite index on
Holistics Transport support loading data from MongoDB (NoSQL) into your SQL reporting database. Due to the NoSQL nature of MongoDB, here is a few things to take note.
- For nested object, you can use MongoDB dot notation to navigate to the wanted value.
- When specifying
where_clause, you need to specify a JSON object similar to
find()method of MongoDB.
See an example of a transport file based on the MongoDB sample restaurants dataset:
Note: if you're using incremental transport on MongoDB, and your increment column is of type
datetime, please add the following to your transport JSON:
(required) ID/slugs of the source and destination datasources.
(required) specifying fully-qualified names of the source table and destination tables
(required) An array of hashes that represent the columns schema of destination table
(optional) Additional where condition in case you want to perform more filters from original table.
(optional) Takes in either
incremental. Default to
(required when mode is
incremental) Specify column that is used for incremental transport. This column must be present in both source and destination tables.
Extra parameters to be included in the COPY command used to load data into destination table. Note that this option only works for destination data source Amazon Redshift.
(required) name of destination table's column
(optional) string value, columns's data type, default to
varchar(255) if not specified
(optional) string value, contains SQL expression to be executed to retrieve value for the current column, default to column_name (i.e. extracting the column's value if not specified
Distribution Key: Read more about Redshift' distribution key.
dist_style: string value, specifying distribution style. Values:
dist_key: string value, if
KEY, specifying distribution column
Sort Keys: Read more about Redshift's sort keys.
sort_style: string value, denoting sort style. Values:
sort_keys: string or array of string value, denoting the column(s) as sort keys
Compression Types: For each column, you can optionally specify the compression type of the column. Read more about Redshift's compression type.