SAS Enterprise Guide

Desktop productivity for business analysts and programmers
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
alepage
Barite | Level 11

Hello,

 

Here's the last line of my csv file read with notepad++

 

960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49cflf
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|crlf

crlf

 

The last crlf cause to get an empty row that I would like to eliminate during the input statement.  How to do that?

 

 

data dest1.%scan(&csvFile,1,'.');		

   		attrib 
          			POLICY_NUMBER				label="Policy Number" 		length=$25. format=$char25.	informat=$char25.
		  		POLICY_NUMBER_SRC 	 			label="Policy Number Source" 		length=$25. format=$char25. 	informat=$char25.
		  		LINE_OF_BUSINESS				label="Line of Business" 		length=$3.  format=$char3.  	informat=$char3.
		  		PROVINCE       				label="Province" 			length=$10. format=$char10. 	informat=$char10.
		  		LEGACY_UNDERWRITING_COMPANY 			label="Legacy Underwriting Company" 	length=$10. format=$char10. 	informat=$char10.
		  		UNDERWRITING_COMPANY       			label="Underwriting Company" 		length=$10. format=$char10. 	informat=$char10.
		  		DISTRIBUTOR_NUMBER				label="Distributor Number" 		length=$10. format=$char10. 	informat=$char10.
		  		DISTRIBUTOR_NUMBER_SRC	    		label="Distributor Number Source" 	length=$10. format=$char10. 	informat=$char10.
		  		SOURCE_SYSTEM 	 			label="Source System"		length=$15. format=$char15. 	informat=$char15.
		  		DATASOURCE_BACKEND_REFERENCE			label="Data Source Backend Reference"	length=$10. format=$char10. 	informat=$char10.
		  		LAST_TERM_EFFECTIVE_DATE    			label="Last Term Effective Date" 	length=8 	format=YYMMDD10. 	informat=YYMMDD10.
		  		LAST_TERM_EXPIRY_DATE 			label="Last Term Expiry Date" 	length=8	format=YYMMDD10. 	informat=YYMMDD10.
		  		CONTRACT_UUID				label="Contract UUID" 	          length=$36. format=$char36. 	informat=$char36.
   			;


		
infile "&csvinputpath/&csvFile." LRECL=32767 TERMSTR=CRLF DLM='|' MISSOVER FIRSTOBS=2 DSD;
		INPUT
          		POLICY_NUMBER					: $char25.
		  		POLICY_NUMBER_SRC 				: $char25.
		  		LINE_OF_BUSINESS				: $char3.
		  		PROVINCE       				: $char10.
		  		LEGACY_UNDERWRITING_COMPANY 			: $char10.
		  		UNDERWRITING_COMPANY       			: $char10.
		  		DISTRIBUTOR_NUMBER				: $char10.
		  		DISTRIBUTOR_NUMBER_SRC			: $char10.
		  		SOURCE_SYSTEM 	 			: $char15.
		  		DATASOURCE_BACKEND_REFERENCE			: $char10.
		  		LAST_TERM_EFFECTIVE_DATE    			: YYMMDD10.
		  		LAST_TERM_EXPIRY_DATE 			: YYMMDD10.
		  		CONTRACT_UUID				: $char36.
		;
                    run;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Add a DELETE statement to delete the empty line.  If you only want to delete the empty line when it is the LAST line then add the END= option to your INFILE statement so you know which line is the last one.

 

NOTE:

  • There is no need to include a period in a LENGTH, the values are always integers. 
  • And no need to attach formats or informats to normal character variables.
  • Best to avoid the MISSOVER option.  Shouldn't matter with this code since you are using LIST MODE input.  But if you slipped into FORMATTED MODE input the MISSOVER option will throw away values at the end of the line that are too short for the informat being used.  So unless you really want that to happen always use the TRUNCOVER option instead.
  • Avoid including TAB characters in your program code. It makes the code jump around when you try to edit it.

 

Example:

options parmcards=csv;
filename csv temp termstr=crlf;
parmcards;
header row goes here
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|

;

data want;
  attrib
    POLICY_NUMBER label="Policy Number" length=$25
    POLICY_NUMBER_SRC label="Policy Number Source" length=$25
    LINE_OF_BUSINESS label="Line of Business" length=$3
    PROVINCE label="Province" length=$10
    LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10
    UNDERWRITING_COMPANY label="Underwriting Company" length=$10
    DISTRIBUTOR_NUMBER label="Distributor Number" length=$10
    DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10
    SOURCE_SYSTEM label="Source System" length=$15
    DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10
    LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
    LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
    CONTRACT_UUID label="Contract UUID" length=$36
  ;
  infile csv DSD DLM='|' TERMSTR=CRLF TRUNCOVER FIRSTOBS=2 end=eof ;
  INPUT POLICY_NUMBER -- CONTRACT_UUID ;
  if eof and _infile_=' ' then delete;
run;

proc print;
run;

Log:

NOTE: 3 records were read from the infile CSV.
      The minimum record length was 0.
      The maximum record length was 98.
NOTE: The data set WORK.WANT has 2 observations and 13 variables.

Results

                                                           The SAS System                           19:28 Thursday, June 5, 2025   1

                                                               D
                                        L                      A
                                        E                      T
                                        G                      A
                                        A                      S      L
                                        C                      O      A
                                        Y             D        U      S
                                        _             I        R      T           L
                                        U   U         S        C      _           A
                                        N   N         T        E      T           S
                                        D   D    D    R        _      E           T
                       P                E   E    I    I        B      R           _
                       O        L       R   R    S    B        A      M           T
                       L        I       W   W    T    U        C      _           E
                       I        N       R   R    R    T        K      E           R
            P          C        E       I   I    I    O   S    E      F           M                        C
            O          Y        _       T   T    B    R   O    N      F           _                        O
            L          _        O       I   I    U    _   U    D      E           E                        N
            I          N        F       N   N    T    N   R    _      C           X                        T
            C          U        _       G   G    O    U   C    R      T           P                        R
            Y          M        B   P   _   _    R    M   E    E      I           I                        A
            _          B        U   R   C   C    _    B   _    F      V           R                        C
            N          E        S   O   O   O    N    E   S    E      E           Y                        T
            U          R        I   V   M   M    U    R   Y    R      _           _                        _
            M          _        N   I   P   P    M    _   S    E      D           D                        U
      O     B          S        E   N   A   A    B    S   T    N      A           A                        U
      b     E          R        S   C   N   N    E    R   E    C      T           T                        I
      s     R          C        S   E   Y   Y    R    C   M    E      E           E                        D

      1  960-6217  AAA-AAA     RE   QC  B  010  BEL      CLAS     2008-05-01  2010-05-01  011047fd-c0d6-485f-9712-b5e8068b7b49
      2  A12-3358  AAAA-BBBBB  AUP  QC  A  001  3143     CLAS     2005-03-01  2006-03-01


View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

Add a DELETE statement to delete the empty line.  If you only want to delete the empty line when it is the LAST line then add the END= option to your INFILE statement so you know which line is the last one.

 

NOTE:

  • There is no need to include a period in a LENGTH, the values are always integers. 
  • And no need to attach formats or informats to normal character variables.
  • Best to avoid the MISSOVER option.  Shouldn't matter with this code since you are using LIST MODE input.  But if you slipped into FORMATTED MODE input the MISSOVER option will throw away values at the end of the line that are too short for the informat being used.  So unless you really want that to happen always use the TRUNCOVER option instead.
  • Avoid including TAB characters in your program code. It makes the code jump around when you try to edit it.

 

Example:

options parmcards=csv;
filename csv temp termstr=crlf;
parmcards;
header row goes here
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|

;

data want;
  attrib
    POLICY_NUMBER label="Policy Number" length=$25
    POLICY_NUMBER_SRC label="Policy Number Source" length=$25
    LINE_OF_BUSINESS label="Line of Business" length=$3
    PROVINCE label="Province" length=$10
    LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10
    UNDERWRITING_COMPANY label="Underwriting Company" length=$10
    DISTRIBUTOR_NUMBER label="Distributor Number" length=$10
    DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10
    SOURCE_SYSTEM label="Source System" length=$15
    DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10
    LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
    LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
    CONTRACT_UUID label="Contract UUID" length=$36
  ;
  infile csv DSD DLM='|' TERMSTR=CRLF TRUNCOVER FIRSTOBS=2 end=eof ;
  INPUT POLICY_NUMBER -- CONTRACT_UUID ;
  if eof and _infile_=' ' then delete;
run;

proc print;
run;

Log:

NOTE: 3 records were read from the infile CSV.
      The minimum record length was 0.
      The maximum record length was 98.
NOTE: The data set WORK.WANT has 2 observations and 13 variables.

Results

                                                           The SAS System                           19:28 Thursday, June 5, 2025   1

                                                               D
                                        L                      A
                                        E                      T
                                        G                      A
                                        A                      S      L
                                        C                      O      A
                                        Y             D        U      S
                                        _             I        R      T           L
                                        U   U         S        C      _           A
                                        N   N         T        E      T           S
                                        D   D    D    R        _      E           T
                       P                E   E    I    I        B      R           _
                       O        L       R   R    S    B        A      M           T
                       L        I       W   W    T    U        C      _           E
                       I        N       R   R    R    T        K      E           R
            P          C        E       I   I    I    O   S    E      F           M                        C
            O          Y        _       T   T    B    R   O    N      F           _                        O
            L          _        O       I   I    U    _   U    D      E           E                        N
            I          N        F       N   N    T    N   R    _      C           X                        T
            C          U        _       G   G    O    U   C    R      T           P                        R
            Y          M        B   P   _   _    R    M   E    E      I           I                        A
            _          B        U   R   C   C    _    B   _    F      V           R                        C
            N          E        S   O   O   O    N    E   S    E      E           Y                        T
            U          R        I   V   M   M    U    R   Y    R      _           _                        _
            M          _        N   I   P   P    M    _   S    E      D           D                        U
      O     B          S        E   N   A   A    B    S   T    N      A           A                        U
      b     E          R        S   C   N   N    E    R   E    C      T           T                        I
      s     R          C        S   E   Y   Y    R    C   M    E      E           E                        D

      1  960-6217  AAA-AAA     RE   QC  B  010  BEL      CLAS     2008-05-01  2010-05-01  011047fd-c0d6-485f-9712-b5e8068b7b49
      2  A12-3358  AAAA-BBBBB  AUP  QC  A  001  3143     CLAS     2005-03-01  2006-03-01


alepage
Barite | Level 11
I have added end=eof and the if eof and _infile_=' ' then delete; at the end of the input statment as you did and it works fine
Ksharp
Super User

I would like to use Perl Regular Express to find the start of each row . and use SCAN() to get all these variables.

options parmcards=x;
filename x temp;
parmcards;
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|


960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
;

data temp;
infile x length=len;
input x $varying5000. len;
if prxmatch('/^\d{3}-\d{4}/',x) then group+1;
run;
data want;
do until(last.group);
 set temp;
 by group;
 length line $ 20000;
 line=cats(line,x);
end;

length col1-col4 $ 80;
col1=scan(line,1,'|');
col2=scan(line,2,'|');
col3=scan(line,3,'|');
col4=scan(line,4,'|');

drop x;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 674 views
  • 0 likes
  • 3 in conversation