Large data sets can be overwhelming to work with. Often, you only need a small portion of the data, but getting the right subset can be a challenge. The RDS Explorer can help with that. Built on top of the RDS API, it allows for easy browsing or in-depth filtering with the capabilities of a query language.
In these examples, we will be using a data set describing COVID-19 case information from the Alaska Department of Health and Social Services.
Building a variable subset
Browse and Select
Toggling over to Dictionary view (located under the blue data product bar) shows all variable metadata at once, which is ideal for browsing what’s available before diving into analysis. In the example below, we are interested in just three variables: the report date, county, and hospitalization status. When we check off the variables of interest and click “Selected” in the variables section, the Explorer creates a condensed view of only the variable metadata we need. Switching back to the data view preserves these preferences, creating a more manageable data set. To exclude the checked variables instead, choose “Unselected”. To go back to viewing all variables, choose “All” in the variables section, or “Reset View” to clear all selections.
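For readers who think in code, the “Selected”, “Unselected”, and “All” views can be pictured as simple column selection. The sketch below is only an illustration in plain Python, not the Explorer's or the RDS API's implementation; the records, the variable names, and the select_variables helper are all made up for the example.

```python
# Made-up records; the variable names are illustrative, not the real data set's.
records = [
    {"report_date": "2020-07-01", "county": "A", "hospitalized": "Y", "age_group": "50-59"},
    {"report_date": "2020-07-02", "county": "B", "hospitalized": "N", "age_group": "20-29"},
]


def select_variables(rows, names, mode="selected"):
    """Mimic the three views: keep `names`, drop `names`, or keep everything."""
    chosen = set(names)
    if mode == "all":
        return [dict(row) for row in rows]
    if mode == "selected":
        return [{k: v for k, v in row.items() if k in chosen} for row in rows]
    if mode == "unselected":
        return [{k: v for k, v in row.items() if k not in chosen} for row in rows]
    raise ValueError(f"unknown mode: {mode!r}")


# Keep only the three variables of interest.
subset = select_variables(records, ["report_date", "county", "hospitalized"])

# Or exclude them and keep whatever is left.
remainder = select_variables(
    records, ["report_date", "county", "hospitalized"], mode="unselected"
)
```

Here subset holds three variables per record and remainder holds only age_group, which mirrors what the two views would show for the same checkboxes.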
When dealing with large data sets, it may be easier to search for the variables you need instead of scrolling through the list. In Dictionary view, search results appear as a list of variables with checkboxes. After selecting one or more of them, you can either show the “Selected” or “Unselected” variables from the variables menu as we did in the previous example, or click “Show in Data” next to the search bar to go straight to the data view of your selected variables.
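Conceptually, a variable search is a match against variable metadata. The sketch below assumes a case-insensitive substring match on each variable's name and label; how the Explorer actually matches search terms is not described here, and the metadata and the search_variables helper are hypothetical.

```python
# Made-up variable metadata; names and labels are illustrative only.
variables = [
    {"name": "report_date", "label": "Date the case was reported"},
    {"name": "county", "label": "County of residence"},
    {"name": "hospitalized", "label": "Hospitalization status"},
    {"name": "age_group", "label": "Age in ten-year groups"},
]


def search_variables(metadata, term):
    """Return names of variables whose name or label contains `term` (case-insensitive)."""
    needle = term.strip().lower()
    return [
        v["name"]
        for v in metadata
        if needle in v["name"].lower() or needle in v["label"].lower()
    ]


matches = search_variables(variables, "date")
```

The names in matches could then be passed to whatever column selection step you use, much like clicking “Show in Data” after a search.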
One helpful feature of RDS is the ability to filter data based on data values. The GIF below shows that the age_group variable has responses coded into ten-year increments. Clicking over to the “stats” pane of the variable view shows a visualization of the frequency of each category, along with the number of missing values.
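The numbers behind a frequency chart like this are per-category counts plus a count of missing values. A minimal sketch in plain Python, using made-up age_group values and a hypothetical frequencies helper, with None standing in for a missing response:

```python
from collections import Counter

# Made-up age_group responses; None marks a missing value.
age_groups = ["20-29", "50-59", None, "50-59", "60-69", "20-29", "50-59"]


def frequencies(values):
    """Return (count per category, number of missing values)."""
    present = [v for v in values if v is not None]
    return Counter(present), len(values) - len(present)


counts, missing = frequencies(age_groups)
```

Running frequencies again on a filtered list would give the counts for a population of interest, which is the kind of comparison a second ring on the chart conveys.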
Filter on Codes
Suppose we are interested in finding cases where the affected individual is 50 years old or older. Checking off the categories that cover ages 50 and up and clicking “Apply Filter” updates the data view to show only records that meet the 50+ age criterion. When we toggle back to the visualization tab and click the “Refresh Comparison” button to apply our filter, an inner ring is added to the frequency chart, showing data frequencies for just our population of interest.
Once a filter is applied, the “Data Filter” section in the left pane is populated, allowing you to toggle the data filter on or off, or to invert it so that only records that do not meet the filter criteria are shown.
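The filter, its on/off toggle, and its inverted form can be sketched as a membership test on category codes. This is an illustration in plain Python with made-up records and a hypothetical apply_code_filter helper, not the Explorer's own logic. In particular, it assumes an inverted filter also returns records with a missing value, which may not match the Explorer's behavior.

```python
# Made-up case records; ids and category codes are illustrative.
cases = [
    {"id": 1, "age_group": "40-49"},
    {"id": 2, "age_group": "50-59"},
    {"id": 3, "age_group": "60-69"},
    {"id": 4, "age_group": "20-29"},
]

# The categories that make up the 50+ population in this sketch.
FIFTY_PLUS = ["50-59", "60-69", "70-79", "80+"]


def apply_code_filter(rows, variable, codes, enabled=True, invert=False):
    """Keep rows whose `variable` is in `codes`; `invert` keeps the others instead."""
    if not enabled:
        return list(rows)  # filter toggled off: every row passes
    wanted = set(codes)
    return [row for row in rows if (row.get(variable) in wanted) != invert]


older = apply_code_filter(cases, "age_group", FIFTY_PLUS)
younger = apply_code_filter(cases, "age_group", FIFTY_PLUS, invert=True)
everyone = apply_code_filter(cases, "age_group", FIFTY_PLUS, enabled=False)
```

Keeping enabled and invert as separate switches mirrors the idea that the filter definition stays in place while you flip between views of the data.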
Date and Numeric Filtering
Easily narrow down your data set by adding a date or numeric filter. Once your date or numeric variable is selected, navigate to the filter tab and specify the value criteria. Add one filter to create an open-ended range as shown below, or add multiple filters to create a closed range. Make sure you press the “+” circle icon to add each filter before applying. You can also use the null filter dropdown to specify whether to include null and empty values.
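An open-ended range is a single bound, and a closed range is two bounds combined. The sketch below shows that idea in plain Python with made-up dates and a hypothetical in_range helper. It assumes inclusive bounds and treats both None and the empty string as null or empty, neither of which is confirmed Explorer behavior.

```python
from datetime import date

# Made-up report dates; None stands in for a null value.
report_dates = [date(2020, 3, 15), date(2020, 6, 1), None, date(2020, 9, 30)]


def in_range(value, lower=None, upper=None, include_null=False):
    """True if lower <= value <= upper; an omitted bound leaves that side open."""
    if value is None or value == "":
        return include_null
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


# One filter: open-ended range, on or after 1 June 2020.
open_ended = [d for d in report_dates if in_range(d, lower=date(2020, 6, 1))]

# Two filters: closed range, 1 June through 31 August 2020.
closed = [
    d for d in report_dates
    if in_range(d, lower=date(2020, 6, 1), upper=date(2020, 8, 31))
]

# Same open-ended range, but keeping null values as well.
with_nulls = [
    d for d in report_dates
    if in_range(d, lower=date(2020, 6, 1), include_null=True)
]
```

The same helper works for numeric variables, since it relies only on ordinary comparison operators.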
Go Forth and Filter
We hope you now have the know-how to build the data set that fits your needs. If there’s something you’re stuck on, we want to help. Submit a ticket and we’ll get you back on track.