Manipulating Data in SAS Studio Flows Part 2: Sorting Data
- Article History
- RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content
Sorting data is perhaps the most common data manipulation task. Your data may need to be ordered from newest to oldest observation or largest to smallest measurement. You might need to list all of a store’s departments in alphabetical order, then list each department’s products by lowest to highest MSRP. You may simply be preparing your data for another procedure that requires your observations to be sorted.
Regardless of your needs, you can fulfill them quickly and easily with SAS Studio Flows. Each method I’ll show today is available with a SAS Studio Basic license, which is a testament to how common this task is.
I’ll demonstrate the sorting capabilities today with the SASHELP.Springs dataset, which stores information about hot spring locations in the United States.
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
I’ve used the scatter map step to briefly visualize the locations represented in this data.
Every location is in the United States, which is north of the Equator and west of the Prime Meridian.
Sort step
The Sort step enables you to sort your observations by any selected variables. You have the freedom to sort by all variables or only one, in addition to selecting the sort order (ascending or descending) for each variable. This step generates PROC SORT code to store sorted results in an output table.
The step options are straightforward. As you select columns, you can select your sort order. Additional options are available for handling behavior like removing duplicates. For more information, visit the step documentation.
In this example, I’d like to sort by descending temperature so that my observations are ordered from hottest to coolest spring. I’ll use Fahrenheit, but Celsius is also available in this data.
The results show the hottest spring at the top (Makushin Volcano Fumaroles in Unalaska).
However, some locations share the same temperature. I’d like to go back and do some additional sorting to see the southernmost location first.
The southernmost location will have the shortest latitude, and latitude will be a positive number since our locations are north of the Equator. Therefore, I can add ascending latitude as my second sort column.
Notice that now, in addition to sorting by temperature, locations with the same temperature are sorted by latitude. As you may have suspected, these locations are very close in latitude. I’ve used the scatter map step once again to show these results.
Query step
The Query step enables you to build a custom SQL query. With this step, you can utilize every optional SQL query clause (WHERE, GROUP BY, HAVING, ORDER BY) and common SQL tasks like joining tables or calculating new columns. Options for different clauses and features are hosted in separate sub-tabs. Running this step will generate PROC SQL code to store results in an output table. For more information, view the step documentation.
Because this step builds a query, I’ll need to choose my output columns in the Select tab first before I can consider sorting. I’ll add all columns for the purpose of this demo. On the Sort tab, I can select my sorting columns. The interface is similar to the Sort step. This time, I’ll sort by descending temperature in Celsius.
I can set additional query options if needed, but that isn’t necessary for my case.
My results are very similar since I am simply using a different measurement of the same temperature. However, there are some edge cases where two different temperatures in Fahrenheit are equivalent to the same temperature in Celsius. In this case, 208 degrees and 209 degrees Fahrenheit both equal 98 degrees Celsius.
I’d like to add another sort variable again. This time, I want to see the easternmost location first for locations with the same temperature. Because our locations are all west of the Prime Meridian, longitude is expressed as a negative value - meaning we’ll need to sort by descending longitude.
This time, longitude varies a little more for locations with Celsius 98.
Considerations & Summary
Unlike other data manipulation tasks, you do not need to consider your SAS Studio license to select your sort methodology. Both the Sort and Query steps are available with every SAS Studio license!
Whichever step you use will depend on your needs. If you simply need to sort your data, the Sort step is perfect. However, if you need to complete a few different tasks like calculating columns, filtering rows, and sorting, the Query step might work better since it performs all of these operations (and more).
In this post, I’ve discussed two short-and-sweet methods for sorting your data in SAS Studio Flows. If you’re interested in other data manipulation tasks, check out my previous post on appending data and stay tuned for upcoming posts! How often do you need to sort your data? Are there other data manipulation tasks you’d like to hear about? Let me know!
Find more articles from SAS Global Enablement and Learning here.