Note that this might not work with all versions of AWK. I have had trouble passing parameters into the AWK code with some versions of the Solaris implementation. The code below does, however, work on OS X.
The following snippet prints the first line of the file (the column names in a CSV) and then a given percentage of the remaining lines, picked at random:
awk -v seed="$RANDOM" 'BEGIN { srand(seed) } { if (NR == 1) { print $0 } else if (rand() < 0.01) { print $0 } }' yourFile.csv
This prints each line after the first with probability 0.01, so roughly 1% of the lines in the file are picked at random. To change the percentage to 50%, just change the 0.01 to 0.5 in the condition inside the AWK command.
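If the percentage changes often, it can be passed in with -v the same way the seed is, instead of editing the condition by hand. The following is only a minimal sketch: the function name sample_csv and the variable name frac are made up for illustration, and it relies on the same -v support mentioned above.

```shell
# sample_csv is a hypothetical helper, not part of the original snippet.
# $1 = input file, $2 = fraction of lines to keep (a number between 0 and 1)
sample_csv() {
    awk -v seed="$RANDOM" -v frac="$2" '
        BEGIN { srand(seed) }        # seed the generator once, before any input is read
        NR == 1 || rand() < frac     # always print the header, other lines with probability frac
    ' "$1"
}
```

For example, sample_csv yourFile.csv 0.5 would print the header and about half of the remaining lines.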
Also, if you don't want to treat the first line of the file as a special case, you can drop that check from the snippet:
awk -v seed="$RANDOM" 'BEGIN { srand(seed) } { if (rand() < 0.01) { print $0 } }' yourFile.csv
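Because the seed comes from $RANDOM, every run picks a different set of lines. When a repeatable sample is wanted, a fixed seed can be passed instead. This is a rough sketch under that assumption: the function name sample_lines_seeded and the 0.5 fraction are illustrative only, and the exact lines picked for a given seed will likely differ between AWK implementations.

```shell
# sample_lines_seeded is a hypothetical helper, not part of the original snippet.
# $1 = input file, $2 = fixed integer seed (the same seed gives the same sample on the same awk)
sample_lines_seeded() {
    awk -v seed="$2" '
        BEGIN { srand(seed) }   # fixed seed instead of $RANDOM
        rand() < 0.5            # keep each line with probability 0.5
    ' "$1"
}
```

Running sample_lines_seeded yourFile.csv 42 twice on the same machine should print the same lines both times.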