
The SAS parquet libname engine cannot determine the length of string variables, so by default it assigns each one a length of 32767 bytes. Even for medium-sized datasets (~100k rows) this makes read times unusably slow. The recommended solution is to use CHAR_COLUMN_LIMIT to change the default length. This, however, is a substandard solution for a few reasons:
1. Not all columns need the same length. A flag column ("Y" / "N") doesn't need 400 bytes allocated just because another column does need that much space.
2. Users are required to specify the limit in advance, potentially before they know what is in the dataset. If no spec was provided with the data, the user may not know what an appropriate length limit is.
3. If the user selects a length that is too short, SAS silently truncates the data with no warning or error.
The feature request here is that SAS should determine the required length of each column individually and allocate the appropriate amount of space, with no prior knowledge or input required from the user.
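To make the request concrete, here is a rough sketch in plain Python (not SAS, and not how the engine is actually implemented) of what per-column length detection could look like. The sample rows, the column names, and the helper functions `column_byte_lengths` and `truncated_values` are all made up for illustration; the only figure taken from the post is the 32767-byte default.

```python
# Hypothetical sample rows; column names and values are invented for illustration.
rows = [
    {"flag": "Y", "comment": "Dose reduced after visit 3"},
    {"flag": "N", "comment": "No change"},
    {"flag": "Y", "comment": None},
]

DEFAULT_LENGTH = 32767  # default length per string column, as quoted in the post


def column_byte_lengths(rows, encoding="utf-8"):
    """Return the longest encoded value, in bytes, for each column.

    Missing (None) values count as 0 bytes; every column gets a length of at least 1.
    """
    lengths = {}
    for row in rows:
        for name, value in row.items():
            size = len(value.encode(encoding)) if isinstance(value, str) else 0
            lengths[name] = max(lengths.get(name, 1), size)
    return lengths


def truncated_values(rows, column, limit, encoding="utf-8"):
    """List the values in `column` that a single fixed `limit` would cut short."""
    return [
        row[column]
        for row in rows
        if isinstance(row[column], str) and len(row[column].encode(encoding)) > limit
    ]


lengths = column_byte_lengths(rows)
per_row_fitted = sum(lengths.values())           # bytes per row with fitted lengths
per_row_default = DEFAULT_LENGTH * len(lengths)  # bytes per row with the default

print(lengths)
print(per_row_fitted, per_row_default)
print(truncated_values(rows, "comment", limit=10))
```

The second helper shows the problem from point 3: a single limit that is fine for one column can quietly cut values in another, so at a minimum the affected values should be reported rather than dropped silently.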