RDS Manager: Creating a Data Product
Data products are composed of a data source and its associated metadata, which includes the variables, variable groups, and classifications that are used in the data. To create a new data product, a manager starts by connecting to a data source, which provides RDS with basic information about the variables it contains. Once the connection is defined and established, more in-depth metadata can be added for the data product as a whole or for its specific variables and classifications.
Specifying the Data Source
Data lives in various formats. It could be in an open ASCII format like CSV, a proprietary file like SAS, or something more dynamic like a SQL table. When managers navigate to the Data Product creation page, they will be presented with a variety of data sources to choose from. The sources are differentiated as database-based or file-based.
The important thing to understand is that RDS needs access to a data source that is SQL-like, meaning that it can be queried, filtered, and subset on the fly. There are two ways to accomplish this: clients can manage their own databases, or they can import files into RDS, which will create the database tables for them.
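As a rough illustration of what "queried, filtered, and subset on the fly" means, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The survey table, its columns, and the subset_rows helper are hypothetical and are not part of RDS; they only show the kind of operation a SQL-like source makes possible.

```python
import sqlite3


def subset_rows(conn, min_age):
    """Return (id, age) for respondents who are at least min_age years old."""
    # The WHERE clause filters rows and the column list subsets variables,
    # so only the requested slice of the data leaves the database.
    cur = conn.execute(
        "SELECT id, age FROM survey WHERE age >= ? ORDER BY id", (min_age,)
    )
    return cur.fetchall()


# Hypothetical example data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (id INTEGER, age INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [(1, 34, "North"), (2, 17, "South"), (3, 52, "North")],
)

print(subset_rows(conn, 18))  # [(1, 34), (3, 52)]
```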
Client Managed Database
The client organization can manage its database tables itself and then link to them in RDS. This provides several advantages. In particular, it allows the database administrator to index and optimize the tables as they see fit. They could, for example, store the original data in tables, build views on top of them that expose commonly used computed variables, and expose those views through RDS. This is the recommended approach because it gives the client organization more control over the data.
To connect to a client-managed database, the manager selects the appropriate data source. Each data source has its own configuration fields. Below is an example of a SQL data source.
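The table-plus-view pattern described above can be sketched as follows. This is a minimal example using Python's sqlite3 module; the persons table, the persons_v view, and the bmi computed variable are assumptions made for illustration, and a real deployment would use the client's own database engine and schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The base table stores the original data (hypothetical schema).
conn.execute(
    "CREATE TABLE persons (id INTEGER PRIMARY KEY, height_cm REAL, weight_kg REAL)"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [(1, 180.0, 81.0), (2, 160.0, 64.0)],
)

# The administrator is free to index the base table as they see fit.
conn.execute("CREATE INDEX idx_persons_weight ON persons (weight_kg)")

# The view exposes the original columns plus a computed variable (bmi).
conn.execute(
    """
    CREATE VIEW persons_v AS
    SELECT id,
           height_cm,
           weight_kg,
           weight_kg / ((height_cm / 100.0) * (height_cm / 100.0)) AS bmi
    FROM persons
    """
)

# A tool reading the view sees bmi as if it were an ordinary column.
rows = conn.execute("SELECT id, bmi FROM persons_v ORDER BY id").fetchall()
for person_id, bmi in rows:
    print(person_id, round(bmi, 1))
```

In this setup the view name, rather than the base table name, would be the one entered as the table to use.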
SQL Properties
JDBC Connection String - RDS connects to SQL databases through the Java Database Connectivity (JDBC) API. The connection string specifies the host, port, and schema / database to use. Each database engine has a slightly different connection string format. If you do not know the JDBC connection string, contact your DBA, who should be able to provide one for you.
Table Name - The name of the database table to use in this data product.
User - The database user name to use. This should typically be an account that is set up for read-only access to the database.
Password - The database user password.
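To make the differences between connection strings concrete, the sketch below assembles JDBC URLs in the commonly documented formats for a few popular engines. The jdbc_url helper, the host name, and the database name are hypothetical. Whether RDS supports each of these engines is not stated here, so treat the list as illustrative and confirm the exact string with your DBA.

```python
# Commonly documented JDBC URL formats; the engines listed are examples only.
JDBC_URL_TEMPLATES = {
    "postgresql": "jdbc:postgresql://{host}:{port}/{database}",
    "mysql": "jdbc:mysql://{host}:{port}/{database}",
    "sqlserver": "jdbc:sqlserver://{host}:{port};databaseName={database}",
    "oracle": "jdbc:oracle:thin:@//{host}:{port}/{database}",
}


def jdbc_url(engine, host, port, database):
    """Build a JDBC connection string for the given engine (hypothetical helper)."""
    try:
        template = JDBC_URL_TEMPLATES[engine]
    except KeyError:
        raise ValueError(f"unsupported engine: {engine}") from None
    return template.format(host=host, port=port, database=database)


print(jdbc_url("postgresql", "db.example.org", 5432, "census"))
# jdbc:postgresql://db.example.org:5432/census
print(jdbc_url("sqlserver", "db.example.org", 1433, "census"))
# jdbc:sqlserver://db.example.org:1433;databaseName=census
```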
Data Product Properties
Below the source configuration is an area where the Data Product properties can be configured. This includes the Catalog that the Data Product should be added to, the ID of the Data Product, and its name.
Once all these fields are filled out, the “CREATE” button can be used to start the process that creates the Data Product.
Generated Database
If clients are not comfortable managing databases themselves, or would like to use RDS to import everything into database tables as a starting point, they can provide data files to RDS, which will use each file to generate a database table and insert the file's data into it. After selecting the format of their file, managers will be given a choice: generate the scripts to create the database manually, or have RDS create the table and import the data for them.
Note that script generation is not currently supported, but it is on the roadmap.
After choosing to import the file into the database, the manager will be presented with a page where they can provide the file and configure the connection to the database that the file will be imported into.
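Conceptually, generating a table from a file means deriving the columns from the file, creating the table, and inserting the rows. The sketch below shows that idea for a CSV file using Python's csv and sqlite3 modules. It is not how RDS implements the import: the import_csv helper and the households data are hypothetical, every column is stored as text, and column names are assumed not to contain double quotes.

```python
import csv
import io
import sqlite3


def import_csv(conn, table, csv_text):
    """Create `table` from the CSV header row and insert every data row.

    Simplifications: all columns are typed TEXT, and identifiers are
    assumed to be free of double-quote characters.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first row supplies the column names
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')

    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
count = import_csv(conn, "households", "id,region,size\n1,North,4\n2,South,2\n")
region = conn.execute(
    "SELECT region FROM households WHERE id = ?", ("1",)
).fetchone()[0]
print(count, region)  # 2 North
```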
File Upload
Managers can use the file upload field in the top right of the page to select the data file to upload. The file must match the format they specified.
Target Database
Managers have the option to specify a database to import the data file into. If they have a database, they can point to it using the JDBC Connection String, User, and Password fields (the table name is optional; if none is provided, one will be generated). If managers do not have a database to import the file into, or do not know its details, they can select the “Embedded Database” option in the drop-down.
Data Product Properties
Below the source configuration is an area where the Data Product properties can be configured. This includes the Catalog that the Data Product should be added to, the ID of the Data Product, and its name.
Once all these fields are filled out, the “CREATE” button can be used to start the process that creates the Data Product.