- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Here's the last line of my csv file read with notepad++
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49cflf
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|crlf
crlf
The last crlf cause to get an empty row that I would like to eliminate during the input statement. How to do that?
data dest1.%scan(&csvFile,1,'.');
attrib
POLICY_NUMBER label="Policy Number" length=$25. format=$char25. informat=$char25.
POLICY_NUMBER_SRC label="Policy Number Source" length=$25. format=$char25. informat=$char25.
LINE_OF_BUSINESS label="Line of Business" length=$3. format=$char3. informat=$char3.
PROVINCE label="Province" length=$10. format=$char10. informat=$char10.
LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10. format=$char10. informat=$char10.
UNDERWRITING_COMPANY label="Underwriting Company" length=$10. format=$char10. informat=$char10.
DISTRIBUTOR_NUMBER label="Distributor Number" length=$10. format=$char10. informat=$char10.
DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10. format=$char10. informat=$char10.
SOURCE_SYSTEM label="Source System" length=$15. format=$char15. informat=$char15.
DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10. format=$char10. informat=$char10.
LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
CONTRACT_UUID label="Contract UUID" length=$36. format=$char36. informat=$char36.
;
infile "&csvinputpath/&csvFile." LRECL=32767 TERMSTR=CRLF DLM='|' MISSOVER FIRSTOBS=2 DSD;
INPUT
POLICY_NUMBER : $char25.
POLICY_NUMBER_SRC : $char25.
LINE_OF_BUSINESS : $char3.
PROVINCE : $char10.
LEGACY_UNDERWRITING_COMPANY : $char10.
UNDERWRITING_COMPANY : $char10.
DISTRIBUTOR_NUMBER : $char10.
DISTRIBUTOR_NUMBER_SRC : $char10.
SOURCE_SYSTEM : $char15.
DATASOURCE_BACKEND_REFERENCE : $char10.
LAST_TERM_EFFECTIVE_DATE : YYMMDD10.
LAST_TERM_EXPIRY_DATE : YYMMDD10.
CONTRACT_UUID : $char36.
;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Add a DELETE statement to delete the empty line. If you only want to delete the empty line when it is the LAST line then add the END= option to your INFILE statement so you know which line is the last one.
NOTE:
- There is no need to include a period in a LENGTH, the values are always integers.
- And no need to attach formats or informats to normal character variables.
- Best to avoid the MISSOVER option. Shouldn't matter with this code since you are using LIST MODE input. But if you slipped into FORMATTED MODE input the MISSOVER option will throw away values at the end of the line that are too short for the informat being used. So unless you really want that to happen always use the TRUNCOVER option instead.
- Avoid including TAB characters in your program code. It makes the code jump around when you try to edit it.
Example:
options parmcards=csv;
filename csv temp termstr=crlf;
parmcards;
header row goes here
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
;
data want;
attrib
POLICY_NUMBER label="Policy Number" length=$25
POLICY_NUMBER_SRC label="Policy Number Source" length=$25
LINE_OF_BUSINESS label="Line of Business" length=$3
PROVINCE label="Province" length=$10
LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10
UNDERWRITING_COMPANY label="Underwriting Company" length=$10
DISTRIBUTOR_NUMBER label="Distributor Number" length=$10
DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10
SOURCE_SYSTEM label="Source System" length=$15
DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10
LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
CONTRACT_UUID label="Contract UUID" length=$36
;
infile csv DSD DLM='|' TERMSTR=CRLF TRUNCOVER FIRSTOBS=2 end=eof ;
INPUT POLICY_NUMBER -- CONTRACT_UUID ;
if eof and _infile_=' ' then delete;
run;
proc print;
run;
Log:
NOTE: 3 records were read from the infile CSV. The minimum record length was 0. The maximum record length was 98. NOTE: The data set WORK.WANT has 2 observations and 13 variables.
Results
The SAS System 19:28 Thursday, June 5, 2025 1 D L A E T G A A S L C O A Y D U S _ I R T L U U S C _ A N N T E T S D D D R _ E T P E E I I B R _ O L R R S B A M T L I W W T U C _ E I N R R R T K E R P C E I I I O S E F M C O Y _ T T B R O N F _ O L _ O I I U _ U D E E N I N F N N T N R _ C X T C U _ G G O U C R T P R Y M B P _ _ R M E E I I A _ B U R C C _ B _ F V R C N E S O O O N E S E E Y T U R I V M M U R Y R _ _ _ M _ N I P P M _ S E D D U O B S E N A A B S T N A A U b E R S C N N E R E C T T I s R C S E Y Y R C M E E E D 1 960-6217 AAA-AAA RE QC B 010 BEL CLAS 2008-05-01 2010-05-01 011047fd-c0d6-485f-9712-b5e8068b7b49 2 A12-3358 AAAA-BBBBB AUP QC A 001 3143 CLAS 2005-03-01 2006-03-01
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Add a DELETE statement to delete the empty line. If you only want to delete the empty line when it is the LAST line then add the END= option to your INFILE statement so you know which line is the last one.
NOTE:
- There is no need to include a period in a LENGTH, the values are always integers.
- And no need to attach formats or informats to normal character variables.
- Best to avoid the MISSOVER option. Shouldn't matter with this code since you are using LIST MODE input. But if you slipped into FORMATTED MODE input the MISSOVER option will throw away values at the end of the line that are too short for the informat being used. So unless you really want that to happen always use the TRUNCOVER option instead.
- Avoid including TAB characters in your program code. It makes the code jump around when you try to edit it.
Example:
options parmcards=csv;
filename csv temp termstr=crlf;
parmcards;
header row goes here
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
;
data want;
attrib
POLICY_NUMBER label="Policy Number" length=$25
POLICY_NUMBER_SRC label="Policy Number Source" length=$25
LINE_OF_BUSINESS label="Line of Business" length=$3
PROVINCE label="Province" length=$10
LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10
UNDERWRITING_COMPANY label="Underwriting Company" length=$10
DISTRIBUTOR_NUMBER label="Distributor Number" length=$10
DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10
SOURCE_SYSTEM label="Source System" length=$15
DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10
LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
CONTRACT_UUID label="Contract UUID" length=$36
;
infile csv DSD DLM='|' TERMSTR=CRLF TRUNCOVER FIRSTOBS=2 end=eof ;
INPUT POLICY_NUMBER -- CONTRACT_UUID ;
if eof and _infile_=' ' then delete;
run;
proc print;
run;
Log:
NOTE: 3 records were read from the infile CSV. The minimum record length was 0. The maximum record length was 98. NOTE: The data set WORK.WANT has 2 observations and 13 variables.
Results
The SAS System 19:28 Thursday, June 5, 2025 1 D L A E T G A A S L C O A Y D U S _ I R T L U U S C _ A N N T E T S D D D R _ E T P E E I I B R _ O L R R S B A M T L I W W T U C _ E I N R R R T K E R P C E I I I O S E F M C O Y _ T T B R O N F _ O L _ O I I U _ U D E E N I N F N N T N R _ C X T C U _ G G O U C R T P R Y M B P _ _ R M E E I I A _ B U R C C _ B _ F V R C N E S O O O N E S E E Y T U R I V M M U R Y R _ _ _ M _ N I P P M _ S E D D U O B S E N A A B S T N A A U b E R S C N N E R E C T T I s R C S E Y Y R C M E E E D 1 960-6217 AAA-AAA RE QC B 010 BEL CLAS 2008-05-01 2010-05-01 011047fd-c0d6-485f-9712-b5e8068b7b49 2 A12-3358 AAAA-BBBBB AUP QC A 001 3143 CLAS 2005-03-01 2006-03-01
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would like to use Perl Regular Express to find the start of each row . and use SCAN() to get all these variables.
options parmcards=x;
filename x temp;
parmcards;
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
;
data temp;
infile x length=len;
input x $varying5000. len;
if prxmatch('/^\d{3}-\d{4}/',x) then group+1;
run;
data want;
do until(last.group);
set temp;
by group;
length line $ 20000;
line=cats(line,x);
end;
length col1-col4 $ 80;
col1=scan(line,1,'|');
col2=scan(line,2,'|');
col3=scan(line,3,'|');
col4=scan(line,4,'|');
drop x;
run;