A few small AWK filters (use these programs at your own risk)

These filters (like all my gawk code) are written as a single line of code, and that is how I always use them, but some people prefer indentation. The indented versions are generated automatically with pgawk, which sometimes inserts control bytes (e.g. 0x00) into the gawk scripts, so the result does not always work. I recommend the original non-indented scripts: they contain no such control characters and therefore avoid these problems.

These filters output as few line breaks as possible. Use adjlines.awk below if you want the output split over several lines. I avoid line breaks where possible (e.g. when outputting HTML) to sidestep the problem that different systems use different line break characters, for example: • U+000A line feed (LF), • U+000D carriage return (CR), • U+0085 next line (NEL), • U+2028 line separator and • U+2029 paragraph separator.

addligat.awk (or if you like indents and dislike comments maybe addligat.awk.indents works)
purpose: Adds HTML code for ligatures to UTF-8 encoded HTML or text files without changing the content of HTML tags and entities. Also works with ASCII-encoded HTML files, where some extra conversions occur, for example adding circled digits and letters as follows: digits 0x24ea (0), 0x2460-0x2473 (1-20), 0x3251-0x325f (21-35), 0x32b1-0x32bf (36-50); reversed 0x24ff (0), 0x2776-0x277f (1-10), 0x24eb-0x24f4 (11-20); double 0x24f5-0x24fe (1-10); latin 0x24b6-0x24cf (A-Z), 0x24d0-0x24e9 (a-z). Read the code for details.
codeBEGIN{anf="^(([^<]*(<([^\042\047>]|\042[^\042]*\042|\047[^\047]*\047)*>))*[^<&]*((&[a-zA-Z#][a-zA-Z0-9]+[; ])[^<&]*)*)";antlig=split("st;st;ffl;ffi;fl;fi;ff",tkn,";");bn="\134\133&bull;&nbsp;";uta=bn "(([1234]?[[:digit:]])|(50))\134\135";utb=bn "\\.((1?[[:digit:]])|(20))\134\135"} {while((nmr=gensub(anf uta,"\\7|","1"))!=$0){nmr+=0;$0=gensub(anf uta,"\\1\\&#" nmr+((nmr>35)?12941:((nmr>20)?12860:((nmr<1)?9450:9311))) ";","1")};while((nmr=gensub(anf utb,"\\7|","1"))!=$0){nmr+=0;$0=gensub(anf utb,"\\1\\&#" nmr+((nmr>10)?9440:((nmr<1)?9471:10101))";","1")};$0=gensub("\134\133&bull;\134\135","\\&#9673;","g");for(i=97;i<123;i++){$0=gensub(anf bn sprintf("%c",i-32) "\134\135","\\1\\&#" i+9301 ";","g");$0=gensub(anf bn sprintf("%c",i) "\134\135","\\1\\&#" i+9327 ";","g")}; for(i=1;i<=antlig;i++){while($0~(tlf=anf tkn[i])){$0=gensub(tlf,"\\1\\&#" 64263-i ";","1")}};print}
run-example: cat filename.txt | cp2htmlz.awk | addligat.awk
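The circled-digit ranges listed in the purpose line come down to adding a fixed offset to the number, with the offset chosen by range. As a minimal sketch (not the filter itself), the shell function below wraps a small POSIX awk program that prints the decimal HTML entity for a number from 0 to 50. The function name circled_entity is hypothetical; only the offsets 9450, 9311, 12860 and 12941 are taken from the addligat.awk code.

```sh
# circled_entity N: print the decimal HTML entity of the circled number N (0-50).
# Hypothetical helper; only the four offsets come from addligat.awk.
circled_entity() {
  awk -v n="$1" 'BEGIN {
    n += 0
    if (n < 0 || n > 50) exit 1          # outside the ranges the filter handles
    if (n > 35)      offset = 12941      # 36-50 -> 0x32b1-0x32bf
    else if (n > 20) offset = 12860      # 21-35 -> 0x3251-0x325f
    else if (n < 1)  offset = 9450       # 0     -> 0x24ea
    else             offset = 9311       # 1-20  -> 0x2460-0x2473
    printf "&#%d;\n", n + offset
  }'
}
```

For example, circled_entity 1 prints &#9312;, which is U+2460.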
adjlines.awk (or if you like indents and dislike comments maybe adjlines.awk.indents works)
purpose: Adjusts line lengths in a text file to the longest possible line length, but at most 63 bytes. No word is split, so words longer than 63 bytes are preserved and some output lines may exceed 63 bytes. If paragraphs already exist in the text file, you may filter with mkparagr.awk first.
codeBEGIN{favoradl=63} {gsub(/ +/," ");while(length($0)>favoradl){punkt=favoradl+1; if(substr($0,1,punkt)~" ") {while(substr($0,punkt,1)!=" "){punkt-=1}} else {punkt=index($0 " "," ")}; print(substr($0,1,punkt-1));$0=substr($0 " " " ",punkt+1); while(substr($0,length($0))==" "){$0=substr($0,1,length($0)-1)}};if($0!=""){if(RT==""){printf "%s",$0}else{print $0 "\n"}}}
run-example: cat filename.txt | mkparagr.awk | adjlines.awk > filename.htm
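The wrapping rule described for adjlines.awk (fill each line as far as possible, never split a word) can be illustrated with a simplified, field-based sketch in POSIX awk. This is not the one-liner above: it wraps every input line on its own, and the function name wrap_lines and its optional width argument are assumptions made for the illustration. Note also that length() counts bytes or characters depending on the awk implementation and the locale.

```sh
# wrap_lines [WIDTH]: greedy word wrap, default width 63.
# Hypothetical helper; a simplified illustration, not adjlines.awk itself.
wrap_lines() {
  awk -v width="${1:-63}" '{
    line = ""
    for (i = 1; i <= NF; i++) {
      if (line == "")
        line = $i                                  # first word always fits
      else if (length(line) + 1 + length($i) <= width)
        line = line " " $i                         # word still fits on this line
      else {
        print line                                 # line is full, start a new one
        line = $i
      }
    }
    if (line != "") print line
  }'
}
```

A word longer than the width ends up alone on its line, uncut, which matches the behaviour the purpose line describes.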
catnline.awk (or if you like indents and dislike comments maybe catnline.awk.indents works)
purpose: Escapes line ends, apostrophes, exclamation marks and backslashes. You may output the result later with: echo 'TheOutputOfCatnline'
code: {if(NR>1){printf "\\n"};gsub(/\\/,"\\\\");gsub(/\041/,"\\041");gsub(/\047/,"\\047");printf "%s", $0}
run-example: catnline.awk file.txt
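To make the escaping easier to follow, here is a hedged sketch of the same idea in POSIX awk, written character by character so that it does not depend on how a particular awk treats backslashes in gsub() replacement strings. The function name escape_lines is hypothetical, and unlike the filter above the sketch leaves exclamation marks alone.

```sh
# escape_lines: join input lines with a literal \n and escape
# backslashes and apostrophes. Hypothetical helper, simplified sketch.
escape_lines() {
  awk '{
    out = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\\")        out = out "\\\\"    # backslash  -> two backslashes
      else if (c == "\047") out = out "\\047"   # apostrophe -> octal escape
      else                  out = out c
    }
    printf "%s%s", (NR > 1 ? "\\n" : ""), out   # no trailing line break
  }'
}
```

Assuming a printf that supports %b, something like printf '%b\n' "$(escape_lines < file.txt)" should print the original lines back.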
checkmir.awk (or if you like indents and dislike comments maybe checkmir.awk.indents works)
purpose: Checks that parentheses and other mirror characters in an ISO_8859, CP_1252 or ASCII file occur in pairs. Prints the lines where this is not the case, each preceded by its line number placed between the mirror characters.
codeBEGIN{peili="(;);[;];{;};-\240;\240-;<B>;</B>;<EM>;</EM>;<I>;</I>;<S>;</S>;<STRIKE>;</STRIKE>;<SUB>;</SUB>;<SUP>;</SUP>;<TT>;</TT>;<U>;</U>;\227\240;\240\227";if(length(sprintf("%c%c%c%c",0,16,216,128))==3){gsub(/\240/,"\302\240");gsub(/\227/,"\342\200\224")};antalspeglar=split(peili,tkn,";")/2;IGNORECASE=1}{for(peili=1;peili<=antalspeglar;peili++){if((split($0,slask,tkn[peili*2-1])!=split($0,slask,tkn[peili*2]))||(index($0,tkn[peili*2-1] tkn[peili*2]))||($0~/""/)){print tkn[peili*2-1] NR tkn[peili*2] " " $0}} if((split($0,slask,"\"")%2==0)&&(length($0)>0)){print "\"" NR "\"" " " $0}}
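The core of the check is a per-line comparison of how often the opening and the closing character occur. A minimal sketch in POSIX awk, limited to three bracket pairs and leaving out the HTML tags, the dash pairs and the quotation-mark test of the filter above; the names check_pairs and count are hypothetical:

```sh
# check_pairs: report lines where an opening bracket and its closing
# partner occur a different number of times. Hypothetical helper.
check_pairs() {
  awk '
    # count(s, ch): number of occurrences of the string ch in s
    function count(s, ch,    n, p) {
      n = 0
      while ((p = index(s, ch)) > 0) { n++; s = substr(s, p + 1) }
      return n
    }
    BEGIN { npairs = split("( ) [ ] { }", t, " ") / 2 }
    {
      for (k = 1; k <= npairs; k++) {
        o = t[2 * k - 1]; c = t[2 * k]
        if (count($0, o) != count($0, c))
          print o NR c " " $0          # same layout as checkmir.awk: (12) text
      }
    }'
}
```

Like the original, the sketch only compares counts per line, so a pair that opens on one line and closes on the next is reported on both lines.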
checknas.awk (or if you like indents and dislike comments maybe checknas.awk.indents works)
purpose: Checks for non-ASCII characters in a file.
code: {for(tkn=128;tkn<256;tkn++){if($0~sprintf("%c",tkn)){print NR " " tkn}}}
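The filter prints the line number and the byte value for every byte from 128 to 255 found on a line. Whether sprintf("%c", n) yields a single byte depends on the locale, so the sketch below pins the locale to C, which is an assumption about how the filter is meant to be run. It also uses index() instead of a regular expression match; the function name find_non_ascii is hypothetical.

```sh
# find_non_ascii: print "LINE BYTE" for each byte value 128-255 that
# occurs on a line. Hypothetical helper; forces byte semantics with LC_ALL=C.
find_non_ascii() {
  LC_ALL=C awk '{
    for (b = 128; b < 256; b++)
      if (index($0, sprintf("%c", b)))
        print NR " " b
  }'
}
```

A byte that occurs several times on the same line is reported once for that line.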
chsguess.awk (or if you like indents and dislike comments maybe chsguess.awk.indents works)
purpose: Outputs the names of the charsets that may encode a given character as a given byte (or byte group). Note that different versions of "echo" may behave differently in different environments when asked to send particular bytes. For this reason, it may be better to use printf, as in the following example. This filter works in UTF-8, ISO_8859-1 and CP-1252 environments.
codeBEGIN{RS="";FS=ORS=" ";gc="iconv -l";gc | getline;close(gc);gsub(/[ ,\/\n]+/," ");split($0,ct);for(i in ct){cs[ct[i]]=""};ocs=substr("cp1252utf8",13-6*length("\303\244"),6)}{for(i in cs){ch="";gc="echo \047" gensub(/\047/,"\\\\" "047","g",$1) "\047 | iconv -c -f \"" i "\" -t " ocs " - 2> /dev/null";gc | getline ch;close(gc);if(ch==$2){print i}};printf "\n"}
run-example: printf "OneByteOrByteGroup OneCharacter" | chsguess.awk
cp2htmla.awk (or if you like indents and dislike comments maybe cp2htmla.awk.indents works)
purpose: Converts the four ASCII characters &, <, > and " to their HTML encodings in a text file. See cp2htmlc.awk for UTF-8, ISO_8859-1 and CP-1252 to HTML conversions.
code: {gsub(/&/,"&amp;");gsub(/</,"\\&lt;");gsub(/>/,"\\&gt;");gsub(/\"/,"\\&quot;");print}
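In a gsub() replacement string an unescaped & stands for the matched text, which is why the one-liner writes "\\&lt;" for a literal ampersand. Its first call works without the escape only because the matched text is itself an ampersand. The sketch below escapes all four replacements the same way; the function name html_escape is hypothetical.

```sh
# html_escape: replace &, <, > and " with their HTML entities.
# Hypothetical helper; the ampersand is handled first so that the
# entities produced by the later calls are not escaped a second time.
html_escape() {
  awk '{
    gsub(/&/, "\\&amp;")
    gsub(/</, "\\&lt;")
    gsub(/>/, "\\&gt;")
    gsub(/"/, "\\&quot;")
    print
  }'
}
```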
cp2htmlc.awk (or if you like indents and dislike comments maybe cp2htmlc.awk.indents works)
purpose: A text2xhtml / text2html filter converting text to (X)HTML. Uses 0xfffd for unknown and control characters. Does not change the four characters <>&", so the text file may already include some ASCII-encoded HTML code. If you need to convert those four characters, filter with 'cp2htmla.awk' first, then with 'cp2htmlc.awk' (see example). Assumes that paragraphs in the input text file are separated by an empty line. If this is not the case, filter with 'xtraline.awk' first (see example). For special HTML formatting, see 'cp2htmld.awk', 'cp2htmll.awk', 'cp2htmlp.awk', 'cp2htmls.awk', 'cp2htmlu.awk', 'cp2htmly.awk' and 'cp2htmlz.awk' (or use switches -D, -L, -P, -S, -U, -Y and -Z respectively). In a UTF-8 environment, assumes the input file is UTF-8 encoded text and outputs a UTF-8 encoded XHTML+RDFa 1.0 file; for the output file to be readable with old (or badly updated) browsers, give the value 1 to the age variable at the beginning of the filter. In an eight-bit character environment, assumes that the input text file is ISO_8859-1 or CP-1252 encoded and outputs an ASCII-encoded HTML 4.01 file.
codeBEGIN{age=1;argoval="C";if((ARGV[1]~/^-/)&&(length(ARGV[1])==2)){chs=toupper(substr(ARGV[1],2,1));delete ARGV[1];if("CDLPSUYZ"~chs){argoval=chs}};FS=":";RS="";if(((chs=length(sprintf("%c%c%c%c",0,16,216,128)))==4)&&(argoval=="P")){argoval="C"};split("euro:#xfffd:sbquo:fnof:bdquo:hellip:dagger:Dagger:circ:permil:Scaron:lsaquo:OElig:#xfffd:#x017d:#xfffd:#xfffd:lsquo:rsquo:ldquo:rdquo:bull:ndash:mdash:tilde:trade:scaron:rsaquo:oelig:#xfffd:#x017e:Yuml:nbsp:iexcl:cent:pound:curren:yen:brvbar:sect:uml:copy:ordf:laquo:not:shy:reg:macr:deg:plusmn:sup2:sup3:acute:micro:para:middot:cedil:sup1:ordm:raquo:frac14:frac12:frac34:iquest:Agrave:Aacute:Acirc:Atilde:Auml:Aring:AElig:Ccedil:Egrave:Eacute:Ecirc:Euml:Igrave:Iacute:Icirc:Iuml:ETH:Ntilde:Ograve:Oacute:Ocirc:Otilde:Ouml:times:Oslash:Ugrave:Uacute:Ucirc:Uuml:Yacute:THORN:szlig:agrave:aacute:acirc:atilde:auml:aring:aelig:ccedil:egrave:eacute:ecirc:euml:igrave:iacute:icirc:iuml:eth:ntilde:ograve:oacute:ocirc:otilde:ouml:divide:oslash:ugrave:uacute:ucirc:uuml:yacute:thorn:yuml",tkn);for(i=1;i<split("US-ASCII:UTF-8:&nbsp;:\302\240:&mdash;:\342\200\224:&shy;:\302\255:&#xfffd;:\357\277\275:&#x200b;:\342\200\213:</BODY></HTML>:</body></html>",mt);i+=2){nt[gensub(/[\\\&;]/,"","g",mt[i])]=mt[i+(chs % 2)]};delete mt;shyv="\\1\\" nt["shy"] "\\2";if("AZ"!~argoval){if((argoval=="C")||(argoval=="D")){stt="P.d {text-align: justify;}"};if(argoval=="D"){stt="H2.b {color: #f00;} " stt};if(argoval=="L"){stt="P.l {line-height: 87%; margin-bottom: 0.01cm; margin-left: 0.6cm; text-align: justify; text-indent: -0.6cm;} P.t {font-weight: bold;}"};if(argoval=="S"){stt="P.s {line-height: 87%; margin-bottom: 0.03cm; margin-left: 0.4cm; text-align: justify; text-indent: -0.4cm;}"};if(argoval=="U"){stt="line-height: 150%; margin-left: 1cm; page-break-inside: avoid; text-align: left; text-indent: -1cm;";stt="P.c {line-height: 120%; margin-bottom: 0.7cm; margin-left: 2cm; margin-right: 1cm; text-align: justify; } P.f {" stt "} P.p {font-size: 
smaller; "stt "} P.r {font-weight: bold; line-height: 150%; margin-top: 1cm; page-break-after: avoid; text-align: left; } P.u {line-height: 150%; margin-bottom: 0.5cm; text-align: justify;}"};if(stt!=""){stt="<STYLE TYPE=\"text/css\"><!-- " stt " --></STYLE>"};stt=stt "<LINK REL=\"stylesheet\" TYPE=\"text/css\" HREF=\"http://www.acc.umu.se/~saasha/gemensam.css\" /></HEAD><BODY>";if(chs==3){stt=tolower(stt)};if(argoval=="P"){stt="<link rel=\"stylesheet\" type=\"text/css\" media=\"screen, projection, print\" href=\"http://www.w3.org/Talks/Tools/Slidy/slidy.css\" /><script src=\"http://www.w3.org/Talks/Tools/Slidy/slidy.js\" charset=\"utf-8\" type=\"text/javascript\"> </script></head><body>"};if(chs==3){if(age==0){printf "%s","<?xml version=\"1.0\" encoding=\"" nt["US-ASCII"] "\"?>"}else{stt="<meta http-equiv=\"Content-type\" content=\"text/html; charset=" nt["US-ASCII"] "\" />" stt};printf "%s","<\041DOCTYPE html PUBLIC \"-//W3C//DTD XHTML+RDFa 1.0//EN\" \"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" version=\"XHTML+RDFa 1.0\" xml:lang=\"sv\"><head><title></title>" stt};if(chs!=3){sub(/ \/>/,">",stt);printf "%s","<\041DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"><HTML><HEAD><META HTTP-EQUIV=\"content-type\" CONTENT=\"text/html; charset=" nt["US-ASCII"] "\"><TITLE></TITLE>" stt}}}{cla="";if(match($0,/^<[hH][1-6]>/)){stt="H" substr($0,3,1);gsub(/^<[hH][1-6]>|<\/[hH][1-6]>$/,"")}else{stt="P";if(argoval=="C"){cla="d"}};gsub(/[\a\b\v\n\r\t ]+/," ");gsub(/^ | $/,"");if((argoval=="U")&&(($0~/^\\/)||($0~/^:; /))){cla="p";flt="cp2htmla.awk";print |& flt;close(flt,"to");flt |& getline;close(flt);sub(/^\\?: *; */,"")};for(i=0;i<33;i++){gsub(sprintf("%c",(i+127)%128),nt["#xfffd"])};if($0=="[extrarad]"){$0=nt["nbsp"]};if(chs!=3){for(i=1;i<129;i++){gsub(sprintf("%c",i+127),"\\&" tkn[i] 
";")}};if($0=="[sidbrytning]"){$0="\f"};if(argoval=="D"){if((length($0)<70)&&(substr($0,1,2)!="s.")&&((substr($0,1,11)!~"^\\[sida" nt["nbsp"])||(substr($0,length($0)-2,3)!="ar]"))){stt="H2";if(index($0,nt["nbsp"] nt["mdash"] nt["nbsp"])){cla="b"}};if(NR==1){stt="H1"};if(stt=="P"){cla="d"};};if(argoval=="L"){if(substr($0,1,8)~"^s." nt["nbsp"]){cla="l"}else{cla="t"}};if(argoval=="S"){cla="s"};if((argoval=="U")&&(cla=="")){cla="u";versalrad=$0;gsub(/&#?[[:alnum:]]+;/,"",versalrad);if((versalrad==toupper(versalrad))&&(versalrad~/[[:alpha:]][[:alpha:]]/)){cla="r"};if($0~/^(&[lr][ads]quo;|\302\253|\342\200\234|\342\200\230|\302\273|\342\200\235|\342\200\231)/){cla="c"};if((($0~/^[^ ]+[\(\[]/)||($0~/\014/))&&(cla!="c")){$0=gensub("(^[^ ]+)" nt["nbsp"] "(and|och)" nt["nbsp"],"\\1 \\2 ","1");$0=gensub("^([^([]+\\.,)" nt["nbsp"],"\\1 ","g");cla="f"}else{$0=gensub(/([[:alnum:]]\/)([[:alpha:]])/,"\\1\\" nt["#x200b"] "\\2","g");$0=gensub(/(ations|iblioteks|lldhets)([[:alpha:]])/,shyv,"g");$0=gensub(/(nings)(s[[:alpha:]])/,shyv,"g");$0=gensub(/([b-df-hj-np-tv-xz]s)(system)/,shyv,"g")}};if("YZ"~argoval){printf "%s",$0}else{if(cla!=""){cla=" CLASS=\"" cla "\""};if(chs==3){cla=tolower(cla);stt=tolower(stt)};printf "%s","<" stt cla ">" $0 "</" stt ">"}} END{if("AZ"!~argoval){printf nt["</BODY></HTML>"]}}
run-example: cat filename.txt | xtraline.awk | cp2htmla.awk | cp2htmlc.awk | addligat.awk > filename.htm
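The filters on this page tell a UTF-8 environment from an eight-bit one by measuring the length of a short string that holds a two-byte UTF-8 sequence; chsguess.awk above, for instance, uses length("\303\244"). A minimal sketch of that test in POSIX awk follows. The function name detect_env and the labels it prints are hypothetical.

```sh
# detect_env: print "utf8" when awk counts the two bytes below as one
# character, "8bit" when it counts them as two. Hypothetical helper.
detect_env() {
  awk 'BEGIN {
    n = length("\303\244")     # the two bytes of U+00E4 in UTF-8
    if (n == 1) print "utf8"
    else        print "8bit"
  }'
}
```

The answer depends on the awk implementation as well as on the locale: an awk that always counts bytes reports 8bit even in a UTF-8 locale.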
cp2htmld.awk (or if you like indents and dislike comments maybe cp2htmld.awk.indents works)
purpose: Mostly like 'cp2htmlc.awk -D', but cp2htmld.awk tries to guess some features of the input text to produce a 'default' formatting.
codeBEGIN{age=1;argoval="D";if((ARGV[1]~/^-/)&&(length(ARGV[1])==2)){chs=toupper(substr(ARGV[1],2,1));delete ARGV[1];if("CDLPSUYZ"~chs){argoval=chs}};FS=":";RS="";if(((chs=length("\000\020\303\200"))==4)&&(argoval=="P")){argoval="C"};split("euro:#xfffd:sbquo:fnof:bdquo:hellip:dagger:Dagger:circ:permil:Scaron:lsaquo:OElig:#xfffd:#x017d:#xfffd:#xfffd:lsquo:rsquo:ldquo:rdquo:bull:ndash:mdash:tilde:trade:scaron:rsaquo:oelig:#xfffd:#x017e:Yuml:nbsp:iexcl:cent:pound:curren:yen:brvbar:sect:uml:copy:ordf:laquo:not:shy:reg:macr:deg:plusmn:sup2:sup3:acute:micro:para:middot:cedil:sup1:ordm:raquo:frac14:frac12:frac34:iquest:Agrave:Aacute:Acirc:Atilde:Auml:Aring:AElig:Ccedil:Egrave:Eacute:Ecirc:Euml:Igrave:Iacute:Icirc:Iuml:ETH:Ntilde:Ograve:Oacute:Ocirc:Otilde:Ouml:times:Oslash:Ugrave:Uacute:Ucirc:Uuml:Yacute:THORN:szlig:agrave:aacute:acirc:atilde:auml:aring:aelig:ccedil:egrave:eacute:ecirc:euml:igrave:iacute:icirc:iuml:eth:ntilde:ograve:oacute:ocirc:otilde:ouml:divide:oslash:ugrave:uacute:ucirc:uuml:yacute:thorn:yuml",tkn);for(i=1;i<split("US-ASCII:UTF-8:&nbsp;:\302\240:&mdash;:\342\200\224:&shy;:\302\255:&#xfffd;:\357\277\275:&#x200b;:\342\200\213:</BODY></HTML>:</body></html>",mt);i+=2){nt[gensub(/[\\\&;]/,"","g",mt[i])]=mt[i+(chs % 2)]};delete mt;shyv="\\1\\" nt["shy"] "\\2";if("AZ"!~argoval){if((argoval=="C")||(argoval=="D")){stt="P.d {text-align: justify;}"};if(argoval=="D"){stt="H2.b {color: #f00;} " stt};if(argoval=="L"){stt="P.l {line-height: 87%; margin-bottom: 0.01cm; margin-left: 0.6cm; text-align: justify; text-indent: -0.6cm;} P.t {font-weight: bold;}"};if(argoval=="S"){stt="P.s {line-height: 87%; margin-bottom: 0.03cm; margin-left: 0.4cm; text-align: justify; text-indent: -0.4cm;}"};if(argoval=="U"){stt="line-height: 150%; margin-left: 1cm; page-break-inside: avoid; text-align: left; text-indent: -1cm;";stt="P.c {line-height: 120%; margin-bottom: 0.7cm; margin-left: 2cm; margin-right: 1cm; text-align: justify; } P.f {" stt "} P.p {font-size: smaller; "stt "} P.r 
{font-weight: bold; line-height: 150%; margin-top: 1cm; page-break-after: avoid; text-align: left; } P.u {line-height: 150%; margin-bottom: 0.5cm; text-align: justify;}"};if(stt!=""){stt="<STYLE TYPE=\"text/css\"><!-- " stt " --></STYLE>"};stt=stt "<LINK REL=\"stylesheet\" TYPE=\"text/css\" HREF=\"http://www.acc.umu.se/~saasha/gemensam.css\" /></HEAD><BODY>";if(chs==3){stt=tolower(stt)};if(argoval=="P"){stt="<link rel=\"stylesheet\" type=\"text/css\" media=\"screen, projection, print\" href=\"http://www.w3.org/Talks/Tools/Slidy/slidy.css\" /><script src=\"http://www.w3.org/Talks/Tools/Slidy/slidy.js\" charset=\"utf-8\" type=\"text/javascript\"> </script></head><body>"};if(chs==3){if(age==0){printf "%s","<?xml version=\"1.0\" encoding=\"" nt["US-ASCII"] "\"?>"}else{stt="<meta http-equiv=\"Content-type\" content=\"text/html; charset=" nt["US-ASCII"] "\" />" stt};printf "%s","<\041DOCTYPE html PUBLIC \"-//W3C//DTD XHTML+RDFa 1.0//EN\" \"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" version=\"XHTML+RDFa 1.0\" xml:lang=\"sv\"><head><title></title>" stt};if(chs!=3){sub(/ \/>/,">",stt);printf "%s","<\041DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"><HTML><HEAD><META HTTP-EQUIV=\"content-type\" CONTENT=\"text/html; charset=" nt["US-ASCII"] "\"><TITLE></TITLE>" stt}}}{cla="";if(match($0,/^<[hH][1-6]>/)){stt="H" substr($0,3,1);gsub(/^<[hH][1-6]>|<\/[hH][1-6]>$/,"")}else{stt="P";if(argoval=="C"){cla="d"}};gsub(/[\a\b\v\n\r\t ]+/," ");gsub(/^ | $/,"");if((argoval=="U")&&(($0~/^\\/)||($0~/^:; /))){cla="p";flt="cp2htmla.awk";print |& flt;close(flt,"to");flt |& getline;close(flt);sub(/^\\?: *; */,"")};for(i=0;i<33;i++){gsub(sprintf("%c",(i+127)%128),nt["#xfffd"])};if($0=="[extrarad]"){$0=nt["nbsp"]};if(chs!=3){for(i=1;i<129;i++){gsub(sprintf("%c",i+127),"\\&" tkn[i] 
";")}};if($0=="[sidbrytning]"){$0="\f"};if(argoval=="D"){if((length($0)<70)&&(substr($0,1,2)!="s.")&&((substr($0,1,11)!~"^\\[sida" nt["nbsp"])||(substr($0,length($0)-2,3)!="ar]"))){stt="H2";if(index($0,nt["nbsp"] nt["mdash"] nt["nbsp"])){cla="b"}};if(NR==1){stt="H1"};if(stt=="P"){cla="d"};};if(argoval=="L"){if(substr($0,1,8)~"^s." nt["nbsp"]){cla="l"}else{cla="t"}};if(argoval=="S"){cla="s"};if((argoval=="U")&&(cla=="")){cla="u";versalrad=$0;gsub(/&#?[[:alnum:]]+;/,"",versalrad);if((versalrad==toupper(versalrad))&&(versalrad~/[[:alpha:]][[:alpha:]]/)){cla="r"};if($0~/^(&[lr][ads]quo;|\302\253|\342\200\234|\342\200\230|\302\273|\342\200\235|\342\200\231)/){cla="c"};if(($0~/^[^ ]+[\(\[]/)||($0~/\014/)){$0=gensub("(^[^ ]+)" nt["nbsp"] "(and|och)" nt["nbsp"],"\\1 \\2 ","1");$0=gensub("^([^([]+\\.,)" nt["nbsp"],"\\1 ","g");cla="f"}else{$0=gensub(/([[:alnum:]]\/)([[:alpha:]])/,"\\1\\" nt["#x200b"] "\\2","g");$0=gensub(/(ations|iblioteks|lldhets)([[:alpha:]])/,shyv,"g");$0=gensub(/(nings)(s[[:alpha:]])/,shyv,"g");$0=gensub(/([b-df-hj-np-tv-xz]s)(system)/,shyv,"g")}};if("YZ"~argoval){printf "%s",$0}else{if(cla!=""){cla=" CLASS=\"" cla "\""};if(chs==3){cla=tolower(cla);stt=tolower(stt)};printf "%s","<" stt cla ">" $0 "</" stt ">"}} END{if("AZ"!~argoval){printf nt["</BODY></HTML>"]}}
cp2htmll.awk (or if you like indents and dislike comments maybe cp2htmll.awk.indents works)
purpose: Mostly like 'cp2htmlc.awk -L', but cp2htmll.awk is a paper-saving alternative for text you wish to print out.
codeBEGIN{argoval="L";if((ARGV[1]~/^-/)&&(length(ARGV[1])==2)){chs=toupper(substr(ARGV[1],2,1));delete ARGV[1];if("CDLPSUYZ"~chs){argoval=chs}};FS=":";RS="";if(((chs=length("\000\020\303\200"))==4)&&(argoval=="P")){argoval="C"};split("euro:#xfffd:sbquo:fnof:bdquo:hellip:dagger:Dagger:circ:permil:Scaron:lsaquo:OElig:#xfffd:#x017d:#xfffd:#xfffd:lsquo:rsquo:ldquo:rdquo:bull:ndash:mdash:tilde:trade:scaron:rsaquo:oelig:#xfffd:#x017e:Yuml:nbsp:iexcl:cent:pound:curren:yen:brvbar:sect:uml:copy:ordf:laquo:not:shy:reg:macr:deg:plusmn:sup2:sup3:acute:micro:para:middot:cedil:sup1:ordm:raquo:frac14:frac12:frac34:iquest:Agrave:Aacute:Acirc:Atilde:Auml:Aring:AElig:Ccedil:Egrave:Eacute:Ecirc:Euml:Igrave:Iacute:Icirc:Iuml:ETH:Ntilde:Ograve:Oacute:Ocirc:Otilde:Ouml:times:Oslash:Ugrave:Uacute:Ucirc:Uuml:Yacute:THORN:szlig:agrave:aacute:acirc:atilde:auml:aring:aelig:ccedil:egrave:eacute:ecirc:euml:igrave:iacute:icirc:iuml:eth:ntilde:ograve:oacute:ocirc:otilde:ouml:divide:oslash:ugrave:uacute:ucirc:uuml:yacute:thorn:yuml",tkn);for(i=1;i<split("US-ASCII:UTF-8:&nbsp;:\302\240:&mdash;:\342\200\224:&shy;:\302\255:&#xfffd;:\357\277\275:&#x200b;:\342\200\213:</BODY></HTML>:</body></html>",mt);i+=2){nt[gensub(/[\\\&;]/,"","g",mt[i])]=mt[i+(chs % 2)]};delete mt;shyv="\\1\\" nt["shy"] "\\2";if("AZ"!~argoval){if((argoval=="C")||(argoval=="D")){stt="P.d {text-align: justify;}"};if(argoval=="D"){stt="H2.b {color: #f00;} " stt};if(argoval=="L"){stt="P.l {line-height: 87%; margin-bottom: 0.01cm; margin-left: 0.6cm; text-align: justify; text-indent: -0.6cm;} P.t {font-weight: bold;}"};if(argoval=="S"){stt="P.s {line-height: 87%; margin-bottom: 0.03cm; margin-left: 0.4cm; text-align: justify; text-indent: -0.4cm;}"};if(argoval=="U"){stt="line-height: 150%; margin-left: 1cm; page-break-inside: avoid; text-align: left; text-indent: -1cm;";stt="P.c {line-height: 120%; margin-bottom: 0.7cm; margin-left: 2cm; margin-right: 1cm; text-align: justify; } P.f {" stt "} P.p {font-size: smaller; "stt "} P.r 
{font-weight: bold; line-height: 150%; margin-top: 1cm; page-break-after: avoid; text-align: left; } P.u {line-height: 150%; margin-bottom: 0.5cm; text-align: justify;}"};stt="<STYLE TYPE=\"text/css\"><!-- " stt " --></STYLE></HEAD><BODY>";if(chs==3){stt=tolower(stt)};if(argoval=="P"){stt="<link rel=\"stylesheet\" type=\"text/css\" media=\"screen, projection, print\" href=\"http://www.w3.org/Talks/Tools/Slidy/slidy.css\" /><script src=\"http://www.w3.org/Talks/Tools/Slidy/slidy.js\" charset=\"utf-8\" type=\"text/javascript\"></script></head><body>"};if(chs==3){printf "%s","<?xml version=\"1.0\" encoding=\"" nt["US-ASCII"] "\"?><\041DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"sv\"><head><title></title>" stt};if(chs!=3){printf "%s","<\041DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"><HTML><HEAD><META HTTP-EQUIV=\"content-type\" CONTENT=\"text/html; charset=" nt["US-ASCII"] "\"><TITLE></TITLE>" stt}}}{cla="";if(match($0,/^<[hH][1-6]>/)){stt="H" substr($0,3,1);gsub(/^<[hH][1-6]>|<\/[hH][1-6]>$/,"")}else{stt="P";if(argoval=="C"){cla="d"}};gsub(/[\a\b\v\n\r\t ]+/," ");gsub(/^ | $/,"");if((argoval=="U")&&(($0~/^\\/)||($0~/^:; /))){cla="p";flt="cp2htmla.awk";print |& flt;close(flt,"to");flt |& getline;close(flt);sub(/^\\?: *; */,"")};for(i=0;i<33;i++){gsub(sprintf("%c",(i+127)%128),nt["#xfffd"])};if($0=="[extrarad]"){$0=nt["nbsp"]};if(chs!=3){for(i=1;i<129;i++){gsub(sprintf("%c",i+127),"\\&" tkn[i] ";")}};if($0=="[sidbrytning]"){$0="\f"};if(argoval=="D"){if((length($0)<70)&&(substr($0,1,2)!="s.")&&((substr($0,1,11)!~"^\\[sida" nt["nbsp"])||(substr($0,length($0)-2,3)!="ar]"))){stt="H2";if(index($0,nt["nbsp"] nt["mdash"] nt["nbsp"])){cla="l"}};if(NR==1){stt="H1"};if(stt=="P"){cla="d"};};if(argoval=="L"){if(substr($0,1,8)~"^s." 
nt["nbsp"]){cla="l"}else{cla="t"}};if(argoval=="S"){cla="s"};if((argoval=="U")&&(cla=="")){cla="u";versalrad=$0;gsub(/&#?[[:alnum:]]+;/,"",versalrad);if((versalrad==toupper(versalrad))&&(versalrad~/[[:alpha:]][[:alpha:]]/)){cla="r"};if($0~/^(&[lr][ads]quo;|\302\253|\342\200\234|\342\200\230|\302\273|\342\200\235|\342\200\231)/){cla="c"};if(($0~/^[^ ]+[\(\[]/)||($0~/\014/)){$0=gensub("(^[^ ]+)" nt["nbsp"] "(and|och)" nt["nbsp"],"\\1 \\2 ","1");$0=gensub("^([^([]+\\.,)" nt["nbsp"],"\\1 ","g");cla="f"}else{$0=gensub(/([[:alnum:]]\/)([[:alpha:]])/,"\\1\\" nt["#x200b"] "\\2","g");$0=gensub(/(ations|iblioteks|lldhets)([[:alpha:]])/,shyv,"g");$0=gensub(/([b-df-hj-np-tv-xz]s)(system)/,shyv,"g")}};if("YZ"~argoval){printf "%s",$0}else{if(cla!=""){cla=" CLASS=\"" cla "\""};if(chs==3){cla=tolower(cla);stt=tolower(stt)};printf "%s","<" stt cla ">" $0 "</" stt ">"}} END{if("AZ"!~argoval){printf nt["</BODY></HTML>"]}}
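In an eight-bit environment the cp2html* filters replace each byte from 128 to 255 with a named or numeric entity, taking entry i of the colon-separated list for byte i+127. The sketch below shows the same substitution for three bytes only. The function name latin1_to_entities and the choice of bytes are assumptions for the illustration, and the locale is pinned to C so that sprintf("%c", n) yields a single byte.

```sh
# latin1_to_entities: replace three ISO_8859-1 bytes with named entities.
# Hypothetical helper; the real filters cover all bytes 128-255.
latin1_to_entities() {
  LC_ALL=C awk '
    BEGIN { name[228] = "auml"; name[229] = "aring"; name[246] = "ouml" }
    {
      for (b in name)
        # b + 0 turns the array index back into a number for %c
        gsub(sprintf("%c", b + 0), "\\&" name[b] ";")
      print
    }'
}
```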
cp2htmlp.awk (or if you like indents and dislike comments maybe cp2htmlp.awk.indents works)
purpose: WORK-IN-PROGRESS. Mostly like 'cp2htmlc.awk -P', but in UTF-8 environments 'cp2htmlp.awk' is a special-formatting alternative that outputs Slidy presentations. In an eight-bit character environment (non-UTF-8), 'cp2htmlp.awk' does the same work as 'cp2htmlc.awk'.
codeBEGIN{argoval="P";if((ARGV[1]~/^-/)&&(length(ARGV[1])==2)){chs=toupper(substr(ARGV[1],2,1));delete ARGV[1];if("CDLPSUYZ"~chs){argoval=chs}};FS=":";RS="";if(((chs=length("\000\020\303\200"))==4)&&(argoval=="P")){argoval="C"};split("euro:#xfffd:sbquo:fnof:bdquo:hellip:dagger:Dagger:circ:permil:Scaron:lsaquo:OElig:#xfffd:#x017d:#xfffd:#xfffd:lsquo:rsquo:ldquo:rdquo:bull:ndash:mdash:tilde:trade:scaron:rsaquo:oelig:#xfffd:#x017e:Yuml:nbsp:iexcl:cent:pound:curren:yen:brvbar:sect:uml:copy:ordf:laquo:not:shy:reg:macr:deg:plusmn:sup2:sup3:acute:micro:para:middot:cedil:sup1:ordm:raquo:frac14:frac12:frac34:iquest:Agrave:Aacute:Acirc:Atilde:Auml:Aring:AElig:Ccedil:Egrave:Eacute:Ecirc:Euml:Igrave:Iacute:Icirc:Iuml:ETH:Ntilde:Ograve:Oacute:Ocirc:Otilde:Ouml:times:Oslash:Ugrave:Uacute:Ucirc:Uuml:Yacute:THORN:szlig:agrave:aacute:acirc:atilde:auml:aring:aelig:ccedil:egrave:eacute:ecirc:euml:igrave:iacute:icirc:iuml:eth:ntilde:ograve:oacute:ocirc:otilde:ouml:divide:oslash:ugrave:uacute:ucirc:uuml:yacute:thorn:yuml",tkn);for(i=1;i<split("US-ASCII:UTF-8:&nbsp;:\302\240:&mdash;:\342\200\224:&shy;:\302\255:&#xfffd;:\357\277\275:&#x200b;:\342\200\213:</BODY></HTML>:</body></html>",mt);i+=2){nt[gensub(/[\\\&;]/,"","g",mt[i])]=mt[i+(chs % 2)]};delete mt;shyv="\\1\\" nt["shy"] "\\2";if("AZ"!~argoval){if((argoval=="C")||(argoval=="D")){stt="P.d {text-align: justify;}"};if(argoval=="D"){stt="H2.b {color: #f00;} " stt};if(argoval=="L"){stt="P.l {line-height: 87%; margin-bottom: 0cm; margin-left: 0.6cm; text-align: justify; text-indent: -0.6cm;} P.t {font-weight: bold;}"};if(argoval=="S"){stt="P.s {line-height: 87%; margin-bottom: 0.03cm; margin-left: 0.4cm; text-align: justify; text-indent: -0.4cm;}"};if(argoval=="U"){stt="line-height: 150%; margin-left: 1cm; page-break-inside: avoid; text-align: left; text-indent: -1cm;";stt="P.c {line-height: 120%; margin-bottom: 0.7cm; margin-left: 2cm; margin-right: 1cm; text-align: justify; } P.f {" stt "} P.p {font-size: smaller; "stt "} P.r 
{font-weight: bold; line-height: 150%; margin-top: 1cm; page-break-after: avoid; text-align: left; } P.u {line-height: 150%; margin-bottom: 0.5cm; text-align: justify;}"};stt="<STYLE TYPE=\"text/css\"><!-- " stt " --></STYLE></HEAD><BODY>";if(chs==3){stt=tolower(stt)};if(argoval=="P"){stt="<link rel=\"stylesheet\" type=\"text/css\" media=\"screen, projection, print\" href=\"http://www.w3.org/Talks/Tools/Slidy/slidy.css\" /><script src=\"http://www.w3.org/Talks/Tools/Slidy/slidy.js\" charset=\"utf-8\" type=\"text/javascript\"></script></head><body>"};if(chs==3){printf "%s","<?xml version=\"1.0\" encoding=\"" nt["US-ASCII"] "\"?><\041DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"sv\"><head><title></title>" stt};if(chs!=3){printf "%s","<\041DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"><HTML><HEAD><META HTTP-EQUIV=\"content-type\" CONTENT=\"text/html; charset=" nt["US-ASCII"] "\"><TITLE></TITLE>" stt}}}{cla="";if(match($0,/^<[hH][1-6]>/)){stt="H" substr($0,3,1);gsub(/^<[hH][1-6]>|<\/[hH][1-6]>$/,"")}else{stt="P";if(argoval=="C"){cla="d"}};gsub(/[\a\b\v\n\r\t ]+/," ");gsub(/^ | $/,"");if((argoval=="U")&&(($0~/^\\/)||($0~/^:; /))){cla="p";flt="cp2htmla.awk";print |& flt;close(flt,"to");flt |& getline;close(flt);sub(/^\\?: *; */,"")};for(i=0;i<33;i++){gsub(sprintf("%c",(i+127)%128),nt["#xfffd"])};if($0=="[extrarad]"){$0=nt["nbsp"]};if(chs!=3){for(i=1;i<129;i++){gsub(sprintf("%c",i+127),"\\&" tkn[i] ";")}};if($0=="[sidbrytning]"){$0="\f"};if(argoval=="D"){if((length($0)<70)&&(substr($0,1,2)!="s.")&&((substr($0,1,11)!~"^\\[sida" nt["nbsp"])||(substr($0,length($0)-2,3)!="ar]"))){stt="H2";if(index($0,nt["nbsp"] nt["mdash"] nt["nbsp"])){cla="l"}};if(NR==1){stt="H1"};if(stt=="P"){cla="d"};};if(argoval=="L"){if(substr($0,1,8)~"^s." 
nt["nbsp"]){cla="l"}else{cla="b"}};if(argoval=="S"){cla="s"};if((argoval=="U")&&(cla=="")){cla="u";versalrad=$0;gsub(/&#?[[:alnum:]]+;/,"",versalrad);if((versalrad==toupper(versalrad))&&(versalrad~/[[:alpha:]][[:alpha:]]/)){cla="r"};if($0~/^(&[lr][ads]quo;|\302\253|\342\200\234|\342\200\230|\302\273|\342\200\235|\342\200\231)/){cla="c"};if(($0~/^[^ ]+[\(\[]/)||($0~/\014/)){$0=gensub("(^[^ ]+)" nt["nbsp"] "(and|och)" nt["nbsp"],"\\1 \\2 ","1");$0=gensub("^([^([]+\\.,)" nt["nbsp"],"\\1 ","g");cla="f"}else{$0=gensub(/([[:alnum:]]\/)([[:alpha:]])/,"\\1\\" nt["#x200b"] "\\2","g");$0=gensub(/(ations|iblioteks|lldhets)([[:alpha:]])/,shyv,"g");$0=gensub(/([b-df-hj-np-tv-xz]s)(system)/,shyv,"g")}};if("YZ"~argoval){printf "%s",$0}else{if(cla!=""){cla=" CLASS=\"" cla "\""};if(chs==3){cla=tolower(cla);stt=tolower(stt)};printf "%s","<" stt cla ">" $0 "</" stt ">"}} END{if("AZ"!~argoval){printf nt["</BODY></HTML>"]}}
run-example: grep "something" filename.txt | cp2htmlp.awk
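All cp2html* variants choose their formatting mode the same way: each starts from its own default letter and lets a one-letter switch in ARGV[1] override it. A minimal sketch of that step in POSIX awk, with C as the default; the function name pick_mode is hypothetical, and the sketch uses index() where the filters use a regular expression match.

```sh
# pick_mode [-LETTER]: print the formatting mode selected by an optional
# one-letter switch, defaulting to C. Hypothetical helper.
pick_mode() {
  awk 'BEGIN {
    mode = "C"
    if (ARGV[1] ~ /^-/ && length(ARGV[1]) == 2) {
      letter = toupper(substr(ARGV[1], 2, 1))
      delete ARGV[1]                     # keep awk from opening it as a file
      if (index("CDLPSUYZ", letter)) mode = letter
    }
    print mode
  }' "$@"
}
```

An unknown letter such as -x is removed from ARGV but leaves the default mode in place.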
cp2htmls.awk (or if you like indents and dislike comments maybe cp2htmls.awk.indents works)
purpose: Mostly like 'cp2htmlc.awk -S', but cp2htmls.awk is a special-formatting alternative that outputs in columns.
codeBEGIN{argoval="S";if((ARGV[1]~/^-/)&&(length(ARGV[1])==2)){chs=toupper(substr(ARGV[1],2,1));delete ARGV[1];if("CDLPSUYZ"~chs){argoval=chs}};FS=":";RS="";if(((chs=length("\000\020\303\200"))==4)&&(argoval=="P")){argoval="C"};split("euro:#xfffd:sbquo:fnof:bdquo:hellip:dagger:Dagger:circ:permil:Scaron:lsaquo:OElig:#xfffd:#x017d:#xfffd:#xfffd:lsquo:rsquo:ldquo:rdquo:bull:ndash:mdash:tilde:trade:scaron:rsaquo:oelig:#xfffd:#x017e:Yuml:nbsp:iexcl:cent:pound:curren:yen:brvbar:sect:uml:copy:ordf:laquo:not:shy:reg:macr:deg:plusmn:sup2:sup3:acute:micro:para:middot:cedil:sup1:ordm:raquo:frac14:frac12:frac34:iquest:Agrave:Aacute:Acirc:Atilde:Auml:Aring:AElig:Ccedil:Egrave:Eacute:Ecirc:Euml:Igrave:Iacute:Icirc:Iuml:ETH:Ntilde:Ograve:Oacute:Ocirc:Otilde:Ouml:times:Oslash:Ugrave:Uacute:Ucirc:Uuml:Yacute:THORN:szlig:agrave:aacute:acirc:atilde:auml:aring:aelig:ccedil:egrave:eacute:ecirc:euml:igrave:iacute:icirc:iuml:eth:ntilde:ograve:oacute:ocirc:otilde:ouml:divide:oslash:ugrave:uacute:ucirc:uuml:yacute:thorn:yuml",tkn);for(i=1;i<split("US-ASCII:UTF-8:&nbsp;:\302\240:&mdash;:\342\200\224:&shy;:\302\255:&#xfffd;:\357\277\275:&#x200b;:\342\200\213:</BODY></HTML>:</body></html>",mt);i+=2){nt[gensub(/[\\\&;]/,"","g",mt[i])]=mt[i+(chs % 2)]};delete mt;shyv="\\1\\" nt["shy"] "\\2";if("AZ"!~argoval){if((argoval=="C")||(argoval=="D")){stt="P.d {text-align: justify;}"};if(argoval=="D"){stt="H2.b {color: #f00;} " stt};if(argoval=="L"){stt="P.l {line-height: 87%; margin-bottom: 0cm; margin-left: 0.6cm; text-align: justify; text-indent: -0.6cm;} P.t {font-weight: bold;}"};if(argoval=="S"){stt="P.s {line-height: 87%; margin-bottom: 0.03cm; margin-left: 0.4cm; text-align: justify; text-indent: -0.4cm;}"};if(argoval=="U"){stt="line-height: 150%; margin-left: 1cm; page-break-inside: avoid; text-align: left; text-indent: -1cm;";stt="P.c {line-height: 120%; margin-bottom: 0.7cm; margin-left: 2cm; margin-right: 1cm; text-align: justify; } P.f {" stt "} P.p {font-size: smaller; "stt "} P.r 
{font-weight: bold; line-height: 150%; margin-top: 1cm; page-break-after: avoid; text-align: left; } P.u {line-height: 150%; margin-bottom: 0.5cm; text-align: justify;}"};stt="<STYLE TYPE=\"text/css\"><!-- " stt " --></STYLE></HEAD><BODY>";if(chs==3){stt=tolower(stt)};if(argoval=="P"){stt="<link rel=\"stylesheet\" type=\"text/css\" media=\"screen, projection, print\" href=\"http://www.w3.org/Talks/Tools/Slidy/slidy.css\" /><script src=\"http://www.w3.org/Talks/Tools/Slidy/slidy.js\" charset=\"utf-8\" type=\"text/javascript\"></script></head><body>"};if(chs==3){printf "%s","<?xml version=\"1.0\" encoding=\"" nt["US-ASCII"] "\"?><\041DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"sv\"><head><title></title>" stt};if(chs!=3){printf "%s","<\041DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"><HTML><HEAD><META HTTP-EQUIV=\"content-type\" CONTENT=\"text/html; charset=" nt["US-ASCII"] "\"><TITLE></TITLE>" stt}}}{cla="";if(match($0,/^<[hH][1-6]>/)){stt="H" substr($0,3,1);gsub(/^<[hH][1-6]>|<\/[hH][1-6]>$/,"")}else{stt="P";if(argoval=="C"){cla="d"}};gsub(/[\a\b\v\n\r\t ]+/," ");gsub(/^ | $/,"");if((argoval=="U")&&(($0~/^\\/)||($0~/^:; /))){cla="p";flt="cp2htmla.awk";print |& flt;close(flt,"to");flt |& getline;close(flt);sub(/^\\?: *; */,"")};for(i=0;i<33;i++){gsub(sprintf("%c",(i+127)%128),nt["#xfffd"])};if($0=="[extrarad]"){$0=nt["nbsp"]};if(chs!=3){for(i=1;i<129;i++){gsub(sprintf("%c",i+127),"\\&" tkn[i] ";")}};if($0=="[sidbrytning]"){$0="\f"};if(argoval=="D"){if((length($0)<70)&&(substr($0,1,2)!="s.")&&((substr($0,1,11)!~"^\\[sida" nt["nbsp"])||(substr($0,length($0)-2,3)!="ar]"))){stt="H2";if(index($0,nt["nbsp"] nt["mdash"] nt["nbsp"])){cla="l"}};if(NR==1){stt="H1"};if(stt=="P"){cla="d"};};if(argoval=="L"){if(substr($0,1,8)~"^s." 
nt["nbsp"]){cla="l"}else{cla="b"}};if(argoval=="S"){cla="s"};if((argoval=="U")&&(cla=="")){cla="u";versalrad=$0;gsub(/&#?[[:alnum:]]+;/,"",versalrad);if((versalrad==toupper(versalrad))&&(versalrad~/[[:alpha:]][[:alpha:]]/)){cla="r"};if($0~/^(&[lr][ads]quo;|\302\253|\342\200\234|\342\200\230|\302\273|\342\200\235|\342\200\231)/){cla="c"};if(($0~/^[^ ]+[\(\[]/)||($0~/\014/)){$0=gensub("(^[^ ]+)" nt["nbsp"] "(and|och)" nt["nbsp"],"\\1 \\2 ","1");$0=gensub("^([^([]+\\.,)" nt["nbsp"],"\\1 ","g");cla="f"}else{$0=gensub(/([[:alnum:]]\/)([[:alpha:]])/,"\\1\\" nt["#x200b"] "\\2","g");$0=gensub(/(ations|iblioteks|lldhets)([[:alpha:]])/,shyv,"g");$0=gensub(/([b-df-hj-np-tv-xz]s)(system)/,shyv,"g")}};if("YZ"~argoval){printf "%s",$0}else{if(cla!=""){cla=" CLASS=\"" cla "\""};if(chs==3){cla=tolower(cla);stt=tolower(stt)};printf "%s","<" stt cla ">" $0 "</" stt ">"}} END{if("AZ"!~argoval){printf nt["</BODY></HTML>"]}}
run-examplegrep "something" filename.txt | cp2htmls.awk
cp2htmlu.awk (or if you like indents and dislike comments maybe cp2htmlu.awk.indents works)
purposemostly like 'cp2htmlc.awk -U', but cp2htmlu.awk is a special-formatting alternative
codeBEGIN{age=1;argoval="U";if((ARGV[1]~/^-/)&&(length(ARGV[1])==2)){chs=toupper(substr(ARGV[1],2,1));delete ARGV[1];if("CDLPSUYZ"~chs){argoval=chs}};FS=":";RS="";if(((chs=length(sprintf("%c%c%c%c",0,16,216,128)))==4)&&(argoval=="P")){argoval="C"};split("euro:#xfffd:sbquo:fnof:bdquo:hellip:dagger:Dagger:circ:permil:Scaron:lsaquo:OElig:#xfffd:#x017d:#xfffd:#xfffd:lsquo:rsquo:ldquo:rdquo:bull:ndash:mdash:tilde:trade:scaron:rsaquo:oelig:#xfffd:#x017e:Yuml:nbsp:iexcl:cent:pound:curren:yen:brvbar:sect:uml:copy:ordf:laquo:not:shy:reg:macr:deg:plusmn:sup2:sup3:acute:micro:para:middot:cedil:sup1:ordm:raquo:frac14:frac12:frac34:iquest:Agrave:Aacute:Acirc:Atilde:Auml:Aring:AElig:Ccedil:Egrave:Eacute:Ecirc:Euml:Igrave:Iacute:Icirc:Iuml:ETH:Ntilde:Ograve:Oacute:Ocirc:Otilde:Ouml:times:Oslash:Ugrave:Uacute:Ucirc:Uuml:Yacute:THORN:szlig:agrave:aacute:acirc:atilde:auml:aring:aelig:ccedil:egrave:eacute:ecirc:euml:igrave:iacute:icirc:iuml:eth:ntilde:ograve:oacute:ocirc:otilde:ouml:divide:oslash:ugrave:uacute:ucirc:uuml:yacute:thorn:yuml",tkn);for(i=1;i<split("US-ASCII:UTF-8:&nbsp;:\302\240:&mdash;:\342\200\224:&shy;:\302\255:&#xfffd;:\357\277\275:&#x200b;:\342\200\213:</BODY></HTML>:</body></html>",mt);i+=2){nt[gensub(/[\\\&;]/,"","g",mt[i])]=mt[i+(chs % 2)]};delete mt;shyv="\\1\\" nt["shy"] "\\2";if("AZ"!~argoval){if((argoval=="C")||(argoval=="D")){stt="P.d {text-align: justify;}"};if(argoval=="D"){stt="H2.b {color: #f00;} " stt};if(argoval=="L"){stt="P.l {line-height: 87%; margin-bottom: 0.01cm; margin-left: 0.6cm; text-align: justify; text-indent: -0.6cm;} P.t {font-weight: bold;}"};if(argoval=="S"){stt="P.s {line-height: 87%; margin-bottom: 0.03cm; margin-left: 0.4cm; text-align: justify; text-indent: -0.4cm;}"};if(argoval=="U"){stt="line-height: 150%; margin-left: 1cm; page-break-inside: avoid; text-align: left; text-indent: -1cm;";stt="P.c {line-height: 120%; margin-bottom: 0.7cm; margin-left: 2cm; margin-right: 1cm; text-align: justify; } P.f {" stt "} P.p {font-size: 
smaller; "stt "} P.r {font-weight: bold; line-height: 150%; margin-top: 1cm; page-break-after: avoid; text-align: left; } P.u {line-height: 150%; margin-bottom: 0.5cm; text-align: justify;}"};if(stt!=""){stt="<STYLE TYPE=\"text/css\"><!-- " stt " --></STYLE>"};stt=stt "<LINK REL=\"stylesheet\" TYPE=\"text/css\" HREF=\"http://www.acc.umu.se/~saasha/gemensam.css\" /></HEAD><BODY>";if(chs==3){stt=tolower(stt)};if(argoval=="P"){stt="<link rel=\"stylesheet\" type=\"text/css\" media=\"screen, projection, print\" href=\"http://www.w3.org/Talks/Tools/Slidy/slidy.css\" /><script src=\"http://www.w3.org/Talks/Tools/Slidy/slidy.js\" charset=\"utf-8\" type=\"text/javascript\"> </script></head><body>"};if(chs==3){if(age==0){printf "%s","<?xml version=\"1.0\" encoding=\"" nt["US-ASCII"] "\"?>"}else{stt="<meta http-equiv=\"Content-type\" content=\"text/html; charset=" nt["US-ASCII"] "\" />" stt};printf "%s","<\041DOCTYPE html PUBLIC \"-//W3C//DTD XHTML+RDFa 1.0//EN\" \"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" version=\"XHTML+RDFa 1.0\" xml:lang=\"sv\"><head><title></title>" stt};if(chs!=3){sub(/ \/>/,">",stt);printf "%s","<\041DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"><HTML><HEAD><META HTTP-EQUIV=\"content-type\" CONTENT=\"text/html; charset=" nt["US-ASCII"] "\"><TITLE></TITLE>" stt}}}{cla="";if(match($0,/^<[hH][1-6]>/)){stt="H" substr($0,3,1);gsub(/^<[hH][1-6]>|<\/[hH][1-6]>$/,"")}else{stt="P";if(argoval=="C"){cla="d"}};gsub(/[\a\b\v\n\r\t ]+/," ");gsub(/^ | $/,"");if((argoval=="U")&&(($0~/^\\/)||($0~/^:; /))){cla="p";flt="cp2htmla.awk";print |& flt;close(flt,"to");flt |& getline;close(flt);sub(/^\\?: *; */,"")};for(i=0;i<33;i++){gsub(sprintf("%c",(i+127)%128),nt["#xfffd"])};if($0=="[extrarad]"){$0=nt["nbsp"]};if(chs!=3){for(i=1;i<129;i++){gsub(sprintf("%c",i+127),"\\&" tkn[i] 
";")}};if($0=="[sidbrytning]"){$0="\f"};if(argoval=="D"){if((length($0)<70)&&(substr($0,1,2)!="s.")&&((substr($0,1,11)!~"^\\[sida" nt["nbsp"])||(substr($0,length($0)-2,3)!="ar]"))){stt="H2";if(index($0,nt["nbsp"] nt["mdash"] nt["nbsp"])){cla="b"}};if(NR==1){stt="H1"};if(stt=="P"){cla="d"};};if(argoval=="L"){if(substr($0,1,8)~"^s." nt["nbsp"]){cla="l"}else{cla="t"}};if(argoval=="S"){cla="s"};if((argoval=="U")&&(cla=="")){cla="u";versalrad=$0;gsub(/&#?[[:alnum:]]+;/,"",versalrad);if((versalrad==toupper(versalrad))&&(versalrad~/[[:alpha:]][[:alpha:]]/)){cla="r"};if($0~/^(&[lr][ads]quo;|\302\253|\342\200\234|\342\200\230|\302\273|\342\200\235|\342\200\231)/){cla="c"};if((($0~/^[^ ]+[\(\[]/)||($0~/\014/))&&(cla!="c")){$0=gensub("(^[^ ]+)" nt["nbsp"] "(and|och)" nt["nbsp"],"\\1 \\2 ","1");$0=gensub("^([^([]+\\.,)" nt["nbsp"],"\\1 ","g");cla="f"}else{$0=gensub(/([[:alnum:]]\/)([[:alpha:]])/,"\\1\\" nt["#x200b"] "\\2","g");$0=gensub(/(ations|iblioteks|lldhets)([[:alpha:]])/,shyv,"g");$0=gensub(/(nings)(s[[:alpha:]])/,shyv,"g");$0=gensub(/([b-df-hj-np-tv-xz]s)(system)/,shyv,"g")}};if("YZ"~argoval){printf "%s",$0}else{if(cla!=""){cla=" CLASS=\"" cla "\""};if(chs==3){cla=tolower(cla);stt=tolower(stt)};printf "%s","<" stt cla ">" $0 "</" stt ">"}} END{if("AZ"!~argoval){printf nt["</BODY></HTML>"]}}
run-examplegrep "something" filename.txt | mkutkast.awk | cp2htmlu.awk
cp2htmly.awk (or if you like indents and dislike comments maybe cp2htmly.awk.indents works)
purposemostly like 'cp2htmlc.awk -Y', but cp2htmly.awk assumes that the input file already has P, UL, LI and/or Hx tags; no such tag is added
codeBEGIN{argoval="Y";if((ARGV[1]~/^-/)&&(length(ARGV[1])==2)){chs=toupper(substr(ARGV[1],2,1));delete ARGV[1];if("CDLPSUYZ"~chs){argoval=chs}};FS=":";RS="";if(((chs=length("\000\020\303\200"))==4)&&(argoval=="P")){argoval="C"};split("euro:#xfffd:sbquo:fnof:bdquo:hellip:dagger:Dagger:circ:permil:Scaron:lsaquo:OElig:#xfffd:#x017d:#xfffd:#xfffd:lsquo:rsquo:ldquo:rdquo:bull:ndash:mdash:tilde:trade:scaron:rsaquo:oelig:#xfffd:#x017e:Yuml:nbsp:iexcl:cent:pound:curren:yen:brvbar:sect:uml:copy:ordf:laquo:not:shy:reg:macr:deg:plusmn:sup2:sup3:acute:micro:para:middot:cedil:sup1:ordm:raquo:frac14:frac12:frac34:iquest:Agrave:Aacute:Acirc:Atilde:Auml:Aring:AElig:Ccedil:Egrave:Eacute:Ecirc:Euml:Igrave:Iacute:Icirc:Iuml:ETH:Ntilde:Ograve:Oacute:Ocirc:Otilde:Ouml:times:Oslash:Ugrave:Uacute:Ucirc:Uuml:Yacute:THORN:szlig:agrave:aacute:acirc:atilde:auml:aring:aelig:ccedil:egrave:eacute:ecirc:euml:igrave:iacute:icirc:iuml:eth:ntilde:ograve:oacute:ocirc:otilde:ouml:divide:oslash:ugrave:uacute:ucirc:uuml:yacute:thorn:yuml",tkn);for(i=1;i<split("US-ASCII:UTF-8:&nbsp;:\302\240:&mdash;:\342\200\224:&shy;:\302\255:&#xfffd;:\357\277\275:&#x200b;:\342\200\213:</BODY></HTML>:</body></html>",mt);i+=2){nt[gensub(/[\\\&;]/,"","g",mt[i])]=mt[i+(chs % 2)]};delete mt;shyv="\\1\\" nt["shy"] "\\2";if("AZ"!~argoval){if((argoval=="C")||(argoval=="D")){stt="P.d {text-align: justify;}"};if(argoval=="D"){stt="H2.b {color: #f00;} " stt};if(argoval=="L"){stt="P.l {line-height: 87%; margin-bottom: 0cm; margin-left: 0.6cm; text-align: justify; text-indent: -0.6cm;} P.t {font-weight: bold;}"};if(argoval=="S"){stt="P.s {line-height: 87%; margin-bottom: 0.03cm; margin-left: 0.4cm; text-align: justify; text-indent: -0.4cm;}"};if(argoval=="U"){stt="line-height: 150%; margin-left: 1cm; page-break-inside: avoid; text-align: left; text-indent: -1cm;";stt="P.c {line-height: 120%; margin-bottom: 0.7cm; margin-left: 2cm; margin-right: 1cm; text-align: justify; } P.f {" stt "} P.p {font-size: smaller; "stt "} P.r 
{font-weight: bold; line-height: 150%; margin-top: 1cm; page-break-after: avoid; text-align: left; } P.u {line-height: 150%; margin-bottom: 0.5cm; text-align: justify;}"};stt="<STYLE TYPE=\"text/css\"><!-- " stt " --></STYLE></HEAD><BODY>";if(chs==3){stt=tolower(stt)};if(argoval=="P"){stt="<link rel=\"stylesheet\" type=\"text/css\" media=\"screen, projection, print\" href=\"http://www.w3.org/Talks/Tools/Slidy/slidy.css\" /><script src=\"http://www.w3.org/Talks/Tools/Slidy/slidy.js\" charset=\"utf-8\" type=\"text/javascript\"></script></head><body>"};if(chs==3){printf "%s","<?xml version=\"1.0\" encoding=\"" nt["US-ASCII"] "\"?><\041DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"sv\"><head><title></title>" stt};if(chs!=3){printf "%s","<\041DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"><HTML><HEAD><META HTTP-EQUIV=\"content-type\" CONTENT=\"text/html; charset=" nt["US-ASCII"] "\"><TITLE></TITLE>" stt}}}{cla="";if(match($0,/^<[hH][1-6]>/)){stt="H" substr($0,3,1);gsub(/^<[hH][1-6]>|<\/[hH][1-6]>$/,"")}else{stt="P";if(argoval=="C"){cla="d"}};gsub(/[\a\b\v\n\r\t ]+/," ");gsub(/^ | $/,"");if((argoval=="U")&&(($0~/^\\/)||($0~/^:; /))){cla="p";flt="cp2htmla.awk";print |& flt;close(flt,"to");flt |& getline;close(flt);sub(/^\\?: *; */,"")};for(i=0;i<33;i++){gsub(sprintf("%c",(i+127)%128),nt["#xfffd"])};if($0=="[extrarad]"){$0=nt["nbsp"]};if(chs!=3){for(i=1;i<129;i++){gsub(sprintf("%c",i+127),"\\&" tkn[i] ";")}};if($0=="[sidbrytning]"){$0="\f"};if(argoval=="D"){if((length($0)<70)&&(substr($0,1,2)!="s.")&&((substr($0,1,11)!~"^\\[sida" nt["nbsp"])||(substr($0,length($0)-2,3)!="ar]"))){stt="H2";if(index($0,nt["nbsp"] nt["mdash"] nt["nbsp"])){cla="l"}};if(NR==1){stt="H1"};if(stt=="P"){cla="d"};};if(argoval=="L"){if(substr($0,1,8)~"^s." 
nt["nbsp"]){cla="l"}else{cla="b"}};if(argoval=="S"){cla="s"};if((argoval=="U")&&(cla=="")){cla="u";versalrad=$0;gsub(/&#?[[:alnum:]]+;/,"",versalrad);if((versalrad==toupper(versalrad))&&(versalrad~/[[:alpha:]][[:alpha:]]/)){cla="r"};if($0~/^(&[lr][ads]quo;|\302\253|\342\200\234|\342\200\230|\302\273|\342\200\235|\342\200\231)/){cla="c"};if(($0~/^[^ ]+[\(\[]/)||($0~/\014/)){$0=gensub("(^[^ ]+)" nt["nbsp"] "(and|och)" nt["nbsp"],"\\1 \\2 ","1");$0=gensub("^([^([]+\\.,)" nt["nbsp"],"\\1 ","g");cla="f"}else{$0=gensub(/([[:alnum:]]\/)([[:alpha:]])/,"\\1\\" nt["#x200b"] "\\2","g");$0=gensub(/(ations|iblioteks|lldhets)([[:alpha:]])/,shyv,"g");$0=gensub(/([b-df-hj-np-tv-xz]s)(system)/,shyv,"g")}};if("YZ"~argoval){printf "%s",$0}else{if(cla!=""){cla=" CLASS=\"" cla "\""};if(chs==3){cla=tolower(cla);stt=tolower(stt)};printf "%s","<" stt cla ">" $0 "</" stt ">"}} END{if("AZ"!~argoval){printf nt["</BODY></HTML>"]}}
run-examplegrep "something" filename.txt | cp2htmly.awk
cp2htmlz.awk (or if you like indents and dislike comments maybe cp2htmlz.awk.indents works)
purposemostly like 'cp2htmly.awk -Z', but cp2htmlz.awk doesn't add html, head or body tags. cp2htmlz.awk is quite useless in a UTF-8 environment
codeBEGIN{argoval="Z";if((ARGV[1]~/^-/)&&(length(ARGV[1])==2)){chs=toupper(substr(ARGV[1],2,1));delete ARGV[1];if("CDLPSUYZ"~chs){argoval=chs}};FS=":";RS="";if(((chs=length("\000\020\303\200"))==4)&&(argoval=="P")){argoval="C"};split("euro:#xfffd:sbquo:fnof:bdquo:hellip:dagger:Dagger:circ:permil:Scaron:lsaquo:OElig:#xfffd:#x017d:#xfffd:#xfffd:lsquo:rsquo:ldquo:rdquo:bull:ndash:mdash:tilde:trade:scaron:rsaquo:oelig:#xfffd:#x017e:Yuml:nbsp:iexcl:cent:pound:curren:yen:brvbar:sect:uml:copy:ordf:laquo:not:shy:reg:macr:deg:plusmn:sup2:sup3:acute:micro:para:middot:cedil:sup1:ordm:raquo:frac14:frac12:frac34:iquest:Agrave:Aacute:Acirc:Atilde:Auml:Aring:AElig:Ccedil:Egrave:Eacute:Ecirc:Euml:Igrave:Iacute:Icirc:Iuml:ETH:Ntilde:Ograve:Oacute:Ocirc:Otilde:Ouml:times:Oslash:Ugrave:Uacute:Ucirc:Uuml:Yacute:THORN:szlig:agrave:aacute:acirc:atilde:auml:aring:aelig:ccedil:egrave:eacute:ecirc:euml:igrave:iacute:icirc:iuml:eth:ntilde:ograve:oacute:ocirc:otilde:ouml:divide:oslash:ugrave:uacute:ucirc:uuml:yacute:thorn:yuml",tkn);for(i=1;i<split("US-ASCII:UTF-8:&nbsp;:\302\240:&mdash;:\342\200\224:&shy;:\302\255:&#xfffd;:\357\277\275:&#x200b;:\342\200\213:</BODY></HTML>:</body></html>",mt);i+=2){nt[gensub(/[\\\&;]/,"","g",mt[i])]=mt[i+(chs % 2)]};delete mt;shyv="\\1\\" nt["shy"] "\\2";if("AZ"!~argoval){if((argoval=="C")||(argoval=="D")){stt="P.d {text-align: justify;}"};if(argoval=="D"){stt="H2.b {color: #f00;} " stt};if(argoval=="L"){stt="P.l {line-height: 87%; margin-bottom: 0cm; margin-left: 0.6cm; text-align: justify; text-indent: -0.6cm;} P.t {font-weight: bold;}"};if(argoval=="S"){stt="P.s {line-height: 87%; margin-bottom: 0.03cm; margin-left: 0.4cm; text-align: justify; text-indent: -0.4cm;}"};if(argoval=="U"){stt="line-height: 150%; margin-left: 1cm; page-break-inside: avoid; text-align: left; text-indent: -1cm;";stt="P.c {line-height: 120%; margin-bottom: 0.7cm; margin-left: 2cm; margin-right: 1cm; text-align: justify; } P.f {" stt "} P.p {font-size: smaller; "stt "} P.r 
{font-weight: bold; line-height: 150%; margin-top: 1cm; page-break-after: avoid; text-align: left; } P.u {line-height: 150%; margin-bottom: 0.5cm; text-align: justify;}"};stt="<STYLE TYPE=\"text/css\"><!-- " stt " --></STYLE></HEAD><BODY>";if(chs==3){stt=tolower(stt)};if(argoval=="P"){stt="<link rel=\"stylesheet\" type=\"text/css\" media=\"screen, projection, print\" href=\"http://www.w3.org/Talks/Tools/Slidy/slidy.css\" /><script src=\"http://www.w3.org/Talks/Tools/Slidy/slidy.js\" charset=\"utf-8\" type=\"text/javascript\"></script></head><body>"};if(chs==3){printf "%s","<?xml version=\"1.0\" encoding=\"" nt["US-ASCII"] "\"?><\041DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"sv\"><head><title></title>" stt};if(chs!=3){printf "%s","<\041DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"><HTML><HEAD><META HTTP-EQUIV=\"content-type\" CONTENT=\"text/html; charset=" nt["US-ASCII"] "\"><TITLE></TITLE>" stt}}}{cla="";if(match($0,/^<[hH][1-6]>/)){stt="H" substr($0,3,1);gsub(/^<[hH][1-6]>|<\/[hH][1-6]>$/,"")}else{stt="P";if(argoval=="C"){cla="d"}};gsub(/[\a\b\v\n\r\t ]+/," ");gsub(/^ | $/,"");if((argoval=="U")&&(($0~/^\\/)||($0~/^:; /))){cla="p";flt="cp2htmla.awk";print |& flt;close(flt,"to");flt |& getline;close(flt);sub(/^\\?: *; */,"")};for(i=0;i<33;i++){gsub(sprintf("%c",(i+127)%128),nt["#xfffd"])};if($0=="[extrarad]"){$0=nt["nbsp"]};if(chs!=3){for(i=1;i<129;i++){gsub(sprintf("%c",i+127),"\\&" tkn[i] ";")}};if($0=="[sidbrytning]"){$0="\f"};if(argoval=="D"){if((length($0)<70)&&(substr($0,1,2)!="s.")&&((substr($0,1,11)!~"^\\[sida" nt["nbsp"])||(substr($0,length($0)-2,3)!="ar]"))){stt="H2";if(index($0,nt["nbsp"] nt["mdash"] nt["nbsp"])){cla="l"}};if(NR==1){stt="H1"};if(stt=="P"){cla="d"};};if(argoval=="L"){if(substr($0,1,8)~"^s." 
nt["nbsp"]){cla="l"}else{cla="b"}};if(argoval=="S"){cla="s"};if((argoval=="U")&&(cla=="")){cla="u";versalrad=$0;gsub(/&#?[[:alnum:]]+;/,"",versalrad);if((versalrad==toupper(versalrad))&&(versalrad~/[[:alpha:]][[:alpha:]]/)){cla="r"};if($0~/^(&[lr][ads]quo;|\302\253|\342\200\234|\342\200\230|\302\273|\342\200\235|\342\200\231)/){cla="c"};if(($0~/^[^ ]+[\(\[]/)||($0~/\014/)){$0=gensub("(^[^ ]+)" nt["nbsp"] "(and|och)" nt["nbsp"],"\\1 \\2 ","1");$0=gensub("^([^([]+\\.,)" nt["nbsp"],"\\1 ","g");cla="f"}else{$0=gensub(/([[:alnum:]]\/)([[:alpha:]])/,"\\1\\" nt["#x200b"] "\\2","g");$0=gensub(/(ations|iblioteks|lldhets)([[:alpha:]])/,shyv,"g");$0=gensub(/([b-df-hj-np-tv-xz]s)(system)/,shyv,"g")}};if("YZ"~argoval){printf "%s",$0}else{if(cla!=""){cla=" CLASS=\"" cla "\""};if(chs==3){cla=tolower(cla);stt=tolower(stt)};printf "%s","<" stt cla ">" $0 "</" stt ">"}} END{if("AZ"!~argoval){printf nt["</BODY></HTML>"]}}
run-examplegrep "something" filename.txt | cp2htmlz.awk
deci2hex.awk (or if you like indents and dislike comments maybe deci2hex.awk.indents works)
purposeconverts decimal numbers prefixed with # to hexadecimal numbers preceded by #x
code{soek="#[0-9]+";while(match($0,soek)){$0=substr($0,1,RSTART) "x" sprintf ("%x",substr($0,RSTART+1,RLENGTH-1)) substr($0,RSTART+RLENGTH)};print}
run-exampleecho "&#65;" | deci2hex.awk
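For readers who prefer an indented view, here is a minimal sketch of the same idea as a shell function that uses only POSIX awk features. The name deci2hex_sketch is made up for this illustration; it is not one of the author's files.

```shell
# Hypothetical helper: the same loop as deci2hex.awk, spread over several lines.
deci2hex_sketch() {
  awk '{
    # Repeat while the line still holds "#" followed by decimal digits.
    while (match($0, "#[0-9]+")) {
      num = substr($0, RSTART + 1, RLENGTH - 1)   # the digits after "#"
      # Keep everything up to and including "#", then add "x" and the hex value.
      $0 = substr($0, 1, RSTART) "x" sprintf("%x", num) substr($0, RSTART + RLENGTH)
    }
    print
  }'
}

echo "&#65; and &#8212;" | deci2hex_sketch
# prints: &#x41; and &#x2014;
```

The loop ends by itself because a converted reference starts with "#x", which the pattern "#[0-9]+" no longer matches.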
keep4aht.awk (or if you like indents and dislike comments maybe keep4aht.awk.indents works)
purposeescapes the four HTML-encoded ASCII characters (&amp;, &gt;, &lt; and &quot;) in an HTML file once more, so that files with these characters may be processed by keepital.awk later
code{gsub(/&amp;/,"&amp;");print gensub(/&(gt|lt|quot);/,"\\&amp;\\1;","g")}
run-examplecat filename.htm | mkparagr.awk | keep4aht.awk | keepital.awk > newfilename.htm
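The effect of keep4aht.awk can be tried with the sketch below, which does the same double-escaping with plain gsub calls instead of gawk's gensub. The function name keep4aht_sketch is an assumption for this illustration only.

```shell
# Hypothetical helper: escape the ampersand of the four entities once more.
keep4aht_sketch() {
  awk '{
    # "\\&" in the replacement stands for a literal ampersand.
    gsub(/&amp;/,  "\\&amp;amp;")    # must run first, before new "&amp;" appear
    gsub(/&gt;/,   "\\&amp;gt;")
    gsub(/&lt;/,   "\\&amp;lt;")
    gsub(/&quot;/, "\\&amp;quot;")
    print
  }'
}

echo 'a &lt; b &amp; c' | keep4aht_sketch
# prints: a &amp;lt; b &amp;amp; c
```

Handling &amp; before the other three matters: done last, it would escape the ampersands that the other substitutions have just produced.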
keepital.awk (or if you like indents and dislike comments maybe keepital.awk.indents works)
purposeadds italic and other marks so that an HTML file may be saved as text without losing italics, etc. Notice that the biggest part of this filter fixes the position of some tags to avoid inconsistent output caused by flawed input data. If you are only interested in keeping italics etc., only keep the last print-statement of the script and the definition of the variable it uses. See also keep4aht.awk.
codeBEGIN{taggar="b i U strike";split(taggar,tag);taggar="";for(i in tag){tagg[tolower(tag[i])]=ntag=toupper(tag[i]);if(ntag~/^[bB]$/){tagg["strong"]=tagg["STRONG"]=ntag};if(ntag~/^[iI]$/){tagg["em"]=tagg["EM"]=ntag};taggar=taggar "|" ntag};taggar=substr(taggar,2)} {gsub(/[\f\t ]+/," ");for(i in tagg){$0=gensub("(</?)" i ">","\\1" tagg[i] ">","g")};mtag="<SPAN LANG=\"[a-z][a-z]-[A-Z][A-Z]\">| |&nbsp;";split(taggar,tag,"|");for(i=1;i<=asort(tag);i++){if(i==1){gsub(/(&nbsp;)* +(&nbsp;)*/," ")};if(($0!=($0=gensub("<" tag[i] ">((" mtag ")?)</" tag[i] ">","\\1","g")))||($0!=($0=gensub("</" tag[i] ">((" mtag ")?)<" tag[i] ">","\\1","g")))||($0!=($0=gensub("<(" taggar ")>(" mtag ")","\\2<\\1>","g")))||($0!=($0=gensub("(" mtag ")</(" taggar ")>","</\\2>\\1","g")))||gsub(" </" tag[i] ">","</" tag[i] "> ")||gsub("<" tag[i] "> "," <" tag[i] ">")){i=0}};print gensub("<(/?(" taggar "))\\y","\\\046lt;\\1\\\046gt;&","g",$0)}
run-examplecat filename.htm | mkparagr.awk | keep4aht.awk | keepital.awk > newfilename.htm
lineshtm.awk (or if you like indents and dislike comments maybe lineshtm.awk.indents works)
purposetransforms a one-line HTML file to several lines according to the HTML formatting in the file
codeBEGIN{IGNORECASE=1} {gsub(/</,"\n<");gsub(/\n<\//,"</");sub(/\n</,"<");gsub(/\n<B>/,"<B>");gsub(/\n<EM>/,"<EM>");gsub(/\n<I>/,"<I>");gsub(/\n<STRIKE>/,"<STRIKE>");gsub(/\n<TT>/,"<TT>");gsub(/\n<U>/,"<U>");gsub(/\n<HREF/,"<HREF");print}
run-examplecat filename.htm | lineshtm.awk > newfilename.htm
mimeprep.awk (or if you like indents and dislike comments maybe mimeprep.awk.indents works)
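purposeprepares quoted-printable text for mimeqpde.awk: joins lines that end with a soft line break (=) and wraps the text in =?iso-8859-1?q? ... ?= delimiters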
codeBEGIN{printf "%s", "=?iso-8859-1?q?"}{if($0~/=$/){sub(/=$/,"")}else{$0=$0 "?=\n=?iso-8859-1?q?"};printf "%s",$0}END{print"?="}
run-examplegrep "ngt" filename.txt -A 62 | mimeprep.awk | mimeqpde.awk > newfile.txt
mimeqpde.awk (or if you like indents and dislike comments maybe mimeqpde.awk.indents works)
purposereplaces quoted-printable code with 8-bit text
codeBEGIN{IGNORECASE=1; hexasiffra="0123456789abcdef";inledning="=?iso-8859-1?q?";inledlaengd=length(inledning)} {gsub(/=\?windows-1252\?q\?/,inledning);while(boerjan=index($0,inledning)){if(laengd=inledlaengd+index(substr($0,boerjan+inledlaengd),"?=")){snutt=substr($0,boerjan+inledlaengd,laengd-1-inledlaengd);gsub("_"," ",snutt);plats=index(snutt "=","=");while(plats<length(snutt)-1){if(fsiff=index(hexasiffra,substr(snutt,plats+1,1))){asiff=index(hexasiffra,substr(snutt,plats+2,1));if(asiff>0){snutt=substr(snutt,1,plats-1) sprintf("%c",fsiff*16+asiff-17) substr(snutt,plats+3)}};plats+=index(substr(snutt "=",plats+1),"=")}}$0=substr($0,1,boerjan-1) snutt substr($0,laengd+boerjan+1)};print}
run-examplecat filename.txt | mimeqpde.awk > newfile.txt
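The core decoding step of mimeqpde.awk, turning "=" plus two hexadecimal digits into one character and "_" into a space, can be sketched as below. This is a simplified illustration under stated assumptions: the function name qp_decode_sketch is invented, and the sketch decodes a bare payload without looking for the =?iso-8859-1?q? ... ?= delimiters that the real filter handles.

```shell
# Hypothetical helper: decode "=XX" escapes and "_" in a bare payload.
qp_decode_sketch() {
  awk '{
    out = ""; s = $0
    gsub(/_/, " ", s)                      # "_" stands for a space
    while ((p = index(s, "=")) > 0) {
      # Position of each hex digit in the list, 0 if it is not a hex digit.
      hi = index("0123456789abcdef", tolower(substr(s, p + 1, 1)))
      lo = index("0123456789abcdef", tolower(substr(s, p + 2, 1)))
      if (p + 2 <= length(s) && hi > 0 && lo > 0) {
        out = out substr(s, 1, p - 1) sprintf("%c", (hi - 1) * 16 + (lo - 1))
        s = substr(s, p + 3)
      } else {
        out = out substr(s, 1, p)          # keep a stray "=" unchanged
        s = substr(s, p + 1)
      }
    }
    print out s
  }'
}

echo "Hello=2C_world=21" | qp_decode_sketch
# prints: Hello, world!
```

The example sticks to ASCII on purpose: how sprintf("%c", n) treats values above 127 depends on the awk implementation and the locale.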
mkampxml.awk (or if you like indents and dislike comments maybe mkampxml.awk.indents works)
purposeReplaces some occurrences of the greater-than, less-than and ampersand signs with their respective XML entity names. See code for details.
code{$0=gensub(/< /,"\\&lt; ","g");$0=gensub(/ & /," \\&amp; ","g");print gensub(/(([^-]-)| )>/,"\\1\\&gt;","g");}
run-exampleecho '_ -> # > # < # & # --> _' | mkampxml.awk
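To see which occurrences are replaced, the sketch below restates the three rules with POSIX awk only (a match loop stands in for gawk's gensub). The function name mkampxml_sketch is an assumption for this illustration.

```shell
# Hypothetical helper: the three replacement rules of mkampxml.awk, one per step.
mkampxml_sketch() {
  awk '{
    gsub(/< /, "\\&lt; ")           # "<" followed by a space
    gsub(/ & /, " \\&amp; ")        # "&" with a space on both sides
    out = ""; s = $0
    # ">" preceded by a space, or by "-" that itself follows a non-"-" character
    while (match(s, /([^-]-| )>/)) {
      out = out substr(s, 1, RSTART + RLENGTH - 2) "&gt;"
      s = substr(s, RSTART + RLENGTH)
    }
    print out s
  }'
}

echo '_ -> # > # < # & # --> _' | mkampxml_sketch
# prints: _ -&gt; # &gt; # &lt; # &amp; # --> _
```

Note that "-->" is left alone, so the end of an HTML or XML comment survives the filter.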
mkdrafts.awk (or if you like indents and dislike comments maybe mkdrafts.awk.indents works)
purposefilter to use before cp2htmlu.awk to create drafts. Compare with mkutkast.awk
code{if(($0!="[extrarad]")&&($0!~"^#pgg# ")&&($0!="[sidbrytning]")){print}}
run-examplecat filename.txt | mkdrafts.awk | cp2htmlu.awk
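What mkdrafts.awk lets through can be checked with a small sketch: it drops the [extrarad] and [sidbrytning] marker lines and every line that starts with "#pgg# ". The function name mkdrafts_sketch and the sample lines are assumptions for this illustration.

```shell
# Hypothetical helper: same three conditions as mkdrafts.awk, as an awk pattern.
mkdrafts_sketch() {
  awk '$0 != "[extrarad]" && $0 !~ "^#pgg# " && $0 != "[sidbrytning]"'
}

printf '%s\n' 'Title' '[extrarad]' '#pgg# note' 'Body' '[sidbrytning]' | mkdrafts_sketch
# prints "Title" and "Body", each on its own line
```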
mkfldtsf.awk (or if you like indents and dislike comments maybe mkfldtsf.awk.indents works)
purposeprepares fld to tso
code{steg=300;omlott=100;sub(/^##kro /,"");if((length($0)>21)&&(index($0,"-")==16)){for(i=substr($0,10,6)+omlott;i<substr($0,17,6)-1;i+=steg){if(i % 10000 >5959){i+=4000-steg}else{printf "##kro " substr($0,1,9) "%06i fld \n",i}}}}
run-examplegrep "20070130i131500-150000" filename.txt | mkfldtsf.awk
mkfldutf.awk (or if you like indents and dislike comments maybe mkfldutf.awk.indents works)
purposeprepares to print fld
codeBEGIN{kroanf="##kro";ejfrad=0} {if((ejfrad==0)&&($0!~/(inst(\344|\303\244)ll[td])|( uppdelat )|( ejgj )/)&&($1==kroanf)&&(length($2)==22)){ejfrad=1;$1="";$3="";tidigast=substr($2,1,15);senast=substr($2,1,9) substr($2,17);$0=$0;sub(/^ /,"");print "<B>" $0 "</B>"};if((ejfrad==1)&&($1==kroanf)&&(length($2)==15)&&($2>=tidigast)&&($2<=senast)&&($3=="fld")){$1="";$2="";$3="";$0=$0;print}}
run-examplegrep "20070130i131500-150000" filename.txt -A 99 | mkfldutf.awk | xtraline.awk | cp2htmlc.awk
mkfnotef.awk (or if you like indents and dislike comments maybe mkfnotef.awk.indents works)
purposemoves footnotes from the bottom of a document to their place inside the text body in a document saved as HTML by OpenOffice. If the file has endnotes instead of footnotes, you may try filtering it first with the following command (though it doesn't always help): awk '{gsub(/endnote/,"footnote");gsub(/<\/?FONT[^>]*>/,"");print}'
codeBEGIN{fotnot="sann";ejfaerdig="sann";notblockboerjan="<DIV ID=\"sdfootnote"} {if((fotnot!="")&&(substr($0,1,length(notblockboerjan))==notblockboerjan)){while((notnrplats=index($0,"anc\">"))==0){getline};notnrplats=substr($0,notnrplats+5);notnummer=substr(notnrplats,1,index(notnrplats,"<")-1);noten[notnummer]=substr(notnrplats,index(notnrplats ">",">")+1);getline;while($0!="</DIV>"){gsub(/\t/," ");noten[notnummer]=noten[notnummer] substr($0,(noten[notnummer]=="")?2:1);getline};noten[notnummer]=substr(noten[notnummer],1,length(noten[notnummer])-((noten[notnummer]~"</SPAN></P>$")?11:4));gsub("\r"," ",noten[notnummer]);gsub("\002"," ",noten[notnummer]);gsub(/<\/?[pP]( [^>]*)?>/," ",noten[notnummer]);gsub(/ +$/,"",noten[notnummer]);gsub(/^ +/,"",noten[notnummer]);gsub(/^&nbsp;/,"",noten[notnummer]);gsub(/<BR>/,"",noten[notnummer])}else{if(fotnot){while(substr($0,1,21)!="<!DOCTYPE HTML PUBLIC"){getline}};fotnot="";while(plats=match($0,/(<SUP>)?(<SPAN[^>]*>)?(<FONT[^>]*>)?<A CLASS=\"sdfootnoteanc\" NAME=\"sdfootnote/)) {notnummer=0+substr($0,plats+RLENGTH);match(substr($0 " ",plats) " ",/<\/SUP><\/A>(<\/FONT>)?(<\/SPAN>)?(<\/SUP>)?/);$0=substr($0,1,plats-1) " [not&nbsp;" notnummer ": " noten[notnummer] "]" substr($0 " ",plats+RSTART+RLENGTH-1)};gsub(/ +/," ");if((ejfaerdig)&&(substr($0,1,length(notblockboerjan))==notblockboerjan)){print "</BODY></HTML>";ejfaerdig=""}else{if(ejfaerdig){print}}}}
run-example(grep '<DIV ID=\"sdfootnote' documentConvertedToHtmlWithOpenOffice.htm -A 99999; cat documentConvertedToHtmlWithOpenOffice.htm) | mkfnotef.awk > documentWithFootNotesInsideTheText.htm
mkgpsild.awk (or if you like indents and dislike comments maybe mkgpsild.awk.indents works)
purposeThe first input line names what should be focused on; the remaining lines have coordinates in their sixth and seventh fields. The output is the command that makes a map from the GPS coordinates, which lets you modify the command before it is executed.
codeBEGIN{marginal=99;skala=1;vinkel=0} {if(NR==1){fokus=$1}else{if(($3=="gps")&&($8!="")){if(koo==""){koo=$4;xmax=xmin=$6;ymax=ymin=$7};if($4==koo){i++;punktx[i]=$6;punkty[i]=$7;punktf[i]=($0~fokus);punktt[i]=gensub(/ .*/,"","1",substr($0,56));if($6>xmax){xmax=$6};if($7>ymax){ymax=$7};if($6<xmin){xmin=$6};if($7<ymin){ymin=$7}}}}} END{printf "convert -size %ix%i xc:transparent ",skala*(xmax-xmin+1+2*marginal),skala*(ymax-ymin+1+2*marginal);for(i in punktx){platsx=skala*(punktx[i]-xmin+marginal);platsy=skala*(ymax-punkty[i]+marginal);printf " -fill " ((punktf[i])?"red":"black") " -draw \047point %i,%i\047",platsx,platsy;if(fokus=="text"){printf " -annotate %ix%i+%i+%i \047%s\047",vinkel,vinkel,platsx,platsy,punktt[i]}};print " -trim +repage gps" koo fokus strftime("%s") ".png"}
run-examplegrep "something" | mkgpsild.awk
mkkroutf.awk (or if you like indents and dislike comments maybe mkkroutf.awk.indents works)
purposeprepares to print kro
codeBEGIN{kroanf="##kro"} {if(($1==kroanf)&&($0!~/inst(\344|\303\244)ll[td]/)){sub(/00-/,"-",$2);sub(/00-/,"-",$2);sub(/00$/,"",$2);sub(/00$/,"",$2);sub(/\.\.-/,"-",$2);sub(/\.\.-/,"-",$2);sub(/\.\.-/,"-",$2);sub(/\.\.$/,"",$2);sub(/\.\.$/,"",$2);sub(/\.\.$/,"",$2);sub(/\?\?-/,"-",$2);sub(/\?\?-/,"-",$2);sub(/\?\?-/,"-",$2);sub(/\?\?$/,"",$2);sub(/\?\?$/,"",$2);sub(/\?\?$/,"",$2);if($3=="fld"){$0=substr($0,25)}else{if(($4!="")&&(length($3)==3)){$3=""};sub(/sal:/,"",$4);$0=substr($0,(substr($0,11,1)=="0")?12:11)};gsub(" +"," ");if(length($0)>2){print}}}
run-examplegrep "something" filename.txt | mkkroutf.awk
mkliacif.awk (or if you like indents and dislike comments maybe mkliacif.awk.indents works)
purposeconverts lia to line-quotations
codeBEGIN{si="-9xvil";aat="[1-2][0-9][0-9][0-9][ab]?";nbs="\240";raq="\273";if(length(sprintf("%c%c%c%c",0,16,216,128))==3){nbs="\302" nbs;raq="\302" raq}}{if($1=="##lia"){if($3==",,,"){gsub(/ (((\227|\342\200\224) )|\(|\[).*/,"");foer=$NF nbs};if(($4==",,,")&&(foer!="")){et=($0~/\(tillsammans med /)?"et" nbs "al." nbs:"";aar=gensub(".* \\[(" aat ")\\].*","\\1","1");if(aar==$0){aar=gensub(".* \\((SOU" nbs ")?(" aat ")(:[1-9][0-9]*)?\\).*","\\2","1")};if((aar==$0)&&(foer=="Prop." nbs)){aar=gensub(".*( |" nbs ")(" aat "/(" aat "|[0-9][0-9]):[1-9][0-9]*) .*","\\2","1")};aar=et ((aar==$0)?":":aar ", ")}};if($4=="0"){$4=$5="";$6=$6 " 0";$0=$0};if($0!~/,,,/){sid=gensub("0*([1" si "][0" si "]*-?)(0*([1" si "][0" si "]*))?","\\1\\3","g",$4);if(sid==9999){sid="utan nummer"};$1=$2=$3=$4=$5="";sub(/^ +/,"");print raq $0 raq " (" foer aar "s." nbs sid ")"}}
run-exampleleta 'butju' lia ngt | mkliacif.awk | xtraline.awk > outfile.txt
mkliatso.awk (or if you like indents and dislike comments maybe mkliatso.awk.indents works)
purposebuilds a lia template from stdin
codeBEGIN{anf="##lia ";sta=3} function finx(r){ler=((r==9999)||(r==0))?2:1;for(j=1;j<10^ler;j++){printf "%s%s %s %0" len "i %0" ler "i \n",anf,$1,$2,r,j}}{len=length($3);for(i=4;i<=NF;i++){if((length($i)>len)&&($i!=9999)){len=length($i)}};if((NF>2)&&($1~/^[[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]$/)&&($2~/^[[:alnum:]]+/)&&(($3=="0")||($3!=0))){print anf $1 " ,,, \n" anf $1 " " $2 " ,,, ";if((($4>0)&&($3<=$4))){sta=5;for(i=$3;i<=$4;i++){if(i!=$3){printf "%s%s %s %0" len "i-%0" len "i 1 \n",anf,$1,$2,i-1,i};finx(i)}};for(i=sta;i<=NF;i++){finx($i)};print "ksm ; checklia " $1 " " $2 " ; sparatid tsorall.awk"}}
run-exampleecho "foerf tacc 52 55 150 102 9999 0" | mkliatso.awk | env LC_COLLATE=POSIX sort | uniq
mkliautf.awk (or if you like indents and dislike comments maybe mkliautf.awk.indents works)
purposeprepares to print lia
codeBEGIN{liaanf="##lia";nbs="\240";if(length(sprintf("%c%c%c%c",0,16,216,128))==3){nbs="\302" nbs}} {if($1==liaanf){gsub(/\r/,"");if(($4~"^[[:digit:]]+-?[[:digit:]]")&&($3!=",,,")){$5="";while($4~/-0/){gsub(/-0/,"-",$4)}};if($3==",,,"){et=($0~/\(tillsammans med /)?"et" nbs "al.:":":";$0=substr($0,17);foerf=gensub(/( (((\227|\342\200\224) )|\(|\[).*)?$/,":","1");sub(/\.:/,".",foerf);$0=""};if($4==",,,"){$1="";$2="";$3="";$4=""};if($1==liaanf){$1=(($2=="intme")?"r":"s") "." nbs;$2=$3=$5=""};$0=gensub("^([rsv]\\." nbs ")[ 0]+","\\1","1",$0);gsub(/ +/," ");radboerjan=(substr($0,1,1)==" ")?((NR>2)?"\n\n":"") foerf:"\n";$0=gensub("^([rsv]\\." nbs ")9999 ","\\1? ","1",$0);if($0~"^[rsv]\\." nbs){sub(/ /,nbs)};if(($2!="")&&(($2!="[ej")||($3!="antecknat]"))){printf(radboerjan "%s",$0)}}}
run-examplegrep "something" filename.txt | mkliautf.awk
mkparagr.awk (or if you like indents and dislike comments maybe mkparagr.awk.indents works)
purposesuppresses line-breaks except for empty lines in a text file. Compare with xtraline.awk
codeBEGIN{FS="\n";RS="";ejfrad=0}{gsub (/\n/," ");if(ejfrad){print ""};printf "%s", $0;ejfrad=1}
run-examplecat filename.txt | mkparagr.awk > newfilename.txt
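mkparagr.awk relies on awk's paragraph mode: with RS set to the empty string, each block of text between empty lines is read as one record. A minimal sketch of that technique follows; the function name mkparagr_sketch and the variable seen are assumptions for this illustration.

```shell
# Hypothetical helper: join the lines of each paragraph into one line.
mkparagr_sketch() {
  awk 'BEGIN { RS = "" }            # paragraph mode: empty lines separate records
       { gsub(/\n/, " ")            # join the lines of one paragraph
         if (seen) print ""         # newline between paragraphs, none after the last
         printf "%s", $0
         seen = 1 }'
}

printf 'one\ntwo\n\nthree\nfour\n' | mkparagr_sketch
# prints "one two" and "three four" on two lines, without a final newline
```

Like the original, the sketch writes no newline after the last paragraph, in keeping with the aim of outputting as few line breaks as possible.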
mkprglis.awk (or if you like indents and dislike comments maybe mkprglis.awk.indents works)
purposeConverts perl, C, java or python code to dbk:article to be easily included in docbook documents with xi:include
codeBEGIN{prgm=1;pl="perl";comm="#[^\0413].*";calo=" *#([-a-zA-Z]+)[^<]*"}{sub(/[\t\r\f ]+$/,"");if(NR==1){if($0~/^(\/\*|\043include )/){pl="C";comm="^(/\\*.*| \\*.*)$|//.*";calo="TODO not yet defined"}else{if($0~/^(\/\/|import )/){pl="java";comm="^(/\\*.*| \\*.*)$|//.*";calo="TODO not yet defined"}else{if($0~/python/){pl="python";calo = "TODO not yet defined"}}};printf "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<dbk:article xmlns:dbk=\"http://docbook.org/ns/docbook\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://docbook.org/ns/docbook docbook5_docbook.xsd\"><dbk:programlisting language=\"" pl "\" xml:id=\"" pl length($0) "\" xml:space=\"preserve\">"};if(prgm){if($0~/^__(END|DATA)__$/){prgm=0}else{gsub(/&/,"&amp;");gsub(/</,"\\&lt;");gsub(/>/,"\\&gt;");gsub(/\"/,"\\&quot;");gsub(/\047/,"\\&\04339;");$0=gensub(comm,"<dbk:lineannotation>&</dbk:lineannotation>","1");$0=gensub(calo,"& <dbk:co xml:id=\"\\1-co\" linkends=\"\\1\" />","1");printf "%s%3i %s",((NR>1)?"\n":""),NR,$0}}}END{print "</dbk:programlisting></dbk:article>"}
run-examplemkprglis.awk myProg.pl > myProg_pl.xml
mkrauhtm.awk (or if you like indents and dislike comments maybe mkrauhtm.awk.indents works)
purposebuilds htm-files for rara
codeBEGIN{svp="/home/s/saasha/public_html/htmsnutt/";RS="";bredd=400;alte="rarasida";htmsnutt=svp "htmlanfa.txt";getline webbfilanf < htmsnutt;close (htmsnutt);htmsnutt=svp "htmltype.txt";getline webbfil < htmsnutt;close (htmsnutt);webbfilanf=webbfilanf webbfil "<TITLE>En alternativ RARA-sida</TITLE></HEAD><BODY><H1 ALIGN=\"center\">En alternativ RARA-sida</H1><TABLE ALIGN=\"center\" BORDER=\"0\" CELLPADDING=\"0\" CELLSPACING=\"0\"><TR><TD COLSPAN=\"2\" ALIGN=\"center\">Boken visas uppslagsvis</TD></TR><TR><TD ALIGN=\"right\">";htmsnutt=svp "htmlfoot.txt";getline webbfilsl < htmsnutt;close (htmsnutt);webbfilsl="</TD></TR></TABLE>" webbfilsl "</BODY></HTML>"}{for(sn=1;sn<NF+1;sn+=2){if(sn!=1){si[sn]="<A HREF=\"" gensub(/\.(png|jpg)/,".htm","1",$(sn-((sn==2)?1:2))) "\">f&ouml;reg&aring;ende uppslag</A> &mdash;"};si[sn]=si[sn] "</TD><TD ALIGN=\"left\">";if((sn==1)&&(NF>1)){si[sn]=si[sn] "&mdash; <A HREF=\"" gensub(/\.(png|jpg)/,".htm","1",$2) "\">n&auml;sta uppslag</A>"}else{if($(sn+2)!=""){si[sn]=si[sn] "&mdash; <A HREF=\"" gensub(/\.(png|jpg)/,".htm","1",$(sn+2)) "\">n&auml;sta uppslag</A>"}};si[sn]=si[sn] "</TD></TR><TR><TD WIDTH=\"" bredd "\">";if(sn>1){si[sn]=si[sn] "<IMG ALIGN=\"right\" SRC=\"" $sn "\" ALT=\"" alte "\">"};si[sn]=si[sn] "</TD><TD WIDTH=\"" bredd "\">";if(sn==1){si[sn]=si[sn] "<IMG ALIGN=\"left\" SRC=\"" $1 "\" ALT=\"" alte "\">"}else{if(sn<NF){si[sn]=si[sn] "<IMG ALIGN=\"left\" SRC=\"" $(sn+1) "\" ALT=\"" alte "\">"}};hf=gensub(/\.(png|jpg)/,".htm","1",$sn);filut=webbfilanf si[sn] webbfilsl;print "echo \"" gensub(/\"/,"\\\\\"","g",filut) "\" > " hf;if(sn==1){print "\\ln -s " hf " index.htm"};if(sn==1){sn=0}}}
run-examplels *.jpg | mkrauhtm.awk | sh
mkravhtm.awk (or if you like indents and dislike comments maybe mkravhtm.awk.indents works)
purposean improvement of mkrauhtm.awk
codeBEGIN{svp="/home/s/saasha/public_html/htmsnutt/";RS="";bredd=400;alte="rarasida";lsv="/~saasha/cgi-bin/acc_download.cgi/rara/";htmsnutt=svp "htmlanfa.txt";getline webbfilanf < htmsnutt;close (htmsnutt);htmsnutt=svp "htmltype.txt";getline webbfil < htmsnutt;close (htmsnutt);webbfilanf=webbfilanf webbfil "<TITLE>En alternativ RARA-sida</TITLE></HEAD><BODY><H1 ALIGN=\"CENTER\">En alternativ RARA-sida</H1><TABLE ALIGN=\"CENTER\" BORDER=\"0\" CELLPADDING=\"0\" CELLSPACING=\"0\"><TR><TD COLSPAN=\"2\" ALIGN=\"CENTER\">Boken visas uppslagsvis</TD></TR><TR><TD ALIGN=\"RIGHT\">";htmsnutt=svp "htmlfoot.txt";getline webbfilsl < htmsnutt;close (htmsnutt);webbfilsl="</TD></TR></TABLE>" webbfilsl "</BODY></HTML>"} function finx(nro){gensub(/.htm/,nro ".htm","1",hf)} function umf(filutt,finr){print "echo \"" gensub(/\"/,"\\\\\"","g",filutt) "\" > " gensub(/.htm/,finr ".htm","1",hf)} {for(sn=1;sn<NF+1;sn+=2){if(sn!=1){si[sn]="<A HREF=\"" lsv gensub(/\.(png|jpg)/,".htm","1",$(sn-((sn==2)?1:2))) "\">f&ouml;reg&aring;ende uppslag</A> &mdash;"};si[sn]=si[sn] "</TD><TD ALIGN=\"LEFT\">";if((sn==1)&&(NF>1)){si[sn]=si[sn] "&mdash; <A HREF=\"" lsv gensub(/\.(png|jpg)/,".htm","1",$2) "\">n&auml;sta uppslag</A>"}else{if($(sn+2)!=""){si[sn]=si[sn] "&mdash; <A HREF=\"" lsv gensub(/\.(png|jpg)/,".htm","1",$(sn+2)) "\">n&auml;sta uppslag</A>"}};hf=gensub(/\.(png|jpg)/,".htm","1",$sn);haeh="identify "$sn;haeh | getline hoejd;close(haeh);hoejd=substr(hoejd,index(hoejd," ")+1);hoejd=substr(hoejd,index(hoejd,"x")+1);hoejd=substr(hoejd,1,index(hoejd," ")-1);si[sn]=si[sn] "</TD></TR><TR HEIGHT=\"" hoejd "\"><TD WIDTH=\"" bredd "\">";if(sn>1){si[sn]=si[sn] "<IMG ALIGN=\"RIGHT\" SRC=\"" $sn "\" ALT=\"" alte "\">"};si[sn]=si[sn] "</TD><TD WIDTH=\"" bredd "\">";if(sn==1){si[sn]=si[sn] "<IMG ALIGN=\"LEFT\" SRC=\"" $1 "\" ALT=\"" alte "\">"}else{if(sn<NF){si[sn]=si[sn] "<IMG ALIGN=\"LEFT\" SRC=\"" $(sn+1) "\" ALT=\"" alte "\">"}};filut=webbfilanf si[sn] ((sn==NF)?"":("</TD></TR><TR><TD COLSPAN=\"2\"><P 
ALIGN=\"CENTER\"><A HREF=\"" lsv gensub(/.htm/,"h9.htm","1",hf) "\">V&auml;nd sidan f&ouml;r att bl&auml;ddra i boken</A></P>")) webbfilsl;umf(filut,"");if(sn==1){print "\\ln -s " hf " index.htm"};if(sn!=111){bkg=((sn>1)?$(sn-2):"");sub(/TD WIDTH=/,"TD BACKGROUND=\"" bkg "\" WIDTH=",filut);sub(/TD WIDTH=/,"TD BACKGROUND=\"" $(sn+((sn==1)?2:3)) "\" WIDTH=",filut);sub("TD BACKGROUND=\042" $0 "\042 WIDTH=","TD WIDTH=",filut);sub(/TD BACKGROUND=\"\" WIDTH=/,"TD WIDTH=",filut);gsub(/HEAD><META/,"HEAD><META http-equiv=\"refresh\" content=\"1; url=\"><META",filut);steg=3;for(vink=9;vink>-1;vink-=steg){if(sn!=44){if(vink>0){umf(gensub("IMG ALIGN=\042LEFT","IMG WIDTH=\"" vink "0%\" HEIGHT=\"" hoejd "\" ALIGN=\042LEFT","1",gensub(/1; url=/,"1; url=" gensub(/.htm/,"h" (vink-3) ".htm","1",hf) finx("h" vink-3),"1",filut)),"h" vink)}else{umf(gensub("IMG ALIGN=\042LEFT","IMG WIDTH=\"" vink "0%\" HEIGHT=\"" hoejd "\" ALIGN=\042LEFT","1",gensub(/1; url=/,"1; url=" gensub(/.htm/,"v" steg ".htm","1",gensub(/\.(png|jpg)/,".htm","1",$(sn+((sn==1)?1:2)) )),"1",filut)),"h" vink)}}};for(vink=9;vink>0;vink-=steg){if(sn!=66){if(vink<9){umf(gensub("IMG ALIGN=\042RIGHT","IMG WIDTH=\"" vink "0%\" HEIGHT=\"" hoejd "\" ALIGN=\042RIGHT","1",gensub(/1; url=/,"1; url=" gensub(/.htm/,"v" (vink+3) ".htm","1",hf) finx("v" vink+3),"1",filut)),"v" vink)}else{umf(gensub("IMG ALIGN=\042RIGHT","IMG WIDTH=\"" vink "0%\" HEIGHT=\"" hoejd "\" ALIGN=\042RIGHT","1",gensub(/1; url=/,"1; url=" gensub(/\.(png|jpg)/,".htm","1",$sn),"1",filut)),"v" vink)}}};};if(sn==1){sn=0}}}
run-examplels *.png | mkravhtm.awk | sh
mkutkast.awk (or if you like indents and dislike comments maybe mkutkast.awk.indents works)
purposefilter to use before cp2htmlu.awk to create anti-drafts. Compare with mkdrafts.awk
code{if(($0!="[extrarad]")&&($0!="[sidbrytning]")){print}}
run-examplegrep "^#pgg# " -C 3 filename.txt | mkutkast.awk | cp2htmlu.awk
ppmshort.awk (or if you like indents and dislike comments maybe ppmshort.awk.indents works)
purposeshortens .ppm-files to one line without comments
code{if($0!~/^#/){rad=rad $0 " "}}END{$0=rad;gsub(/ +/," ");gsub(/^ | $/,"");print}
run-exampleconvert picturename -compress none ppm:- | ppmshort.awk
renpajek.awk (or if you like indents and dislike comments maybe renpajek.awk.indents works)
purposecleans Pajek SVG
code{gsub(/\r/,"");gsub(/<g transform="scale\(1\)">/,"<g>");if(NR==1){$0=$0 "<\041DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\" >"};if((NR>1)&&(NR<5)&&($1=="<\041--")){$0=""};if(NR==5){$0=gensub(/^(<svg )xml:space="preserve(" width="([0-9]+)" height="([0-9]+)" )(xmlns="http:\/\/www.w3.org\/2000\/svg" )(xmlns:xlink="http:\/\/www.w3.org\/1999\/xlink">)$/,"\\1\\5viewBox=\"0 0 \\3 \\4\" \\6<rect x=\"1\" y=\"1\\2fill=\"white\"/>","1")};if((NR>6)&&(NR<54)){$0=""};if($0~/^<ellipse cx="/){sub(/fill:rgb\(255,255,255/,"fill:rgb(255,0,0")};if($0!=""){print}}
rensaocr.awk (or if you like indents and dislike comments maybe rensaocr.awk.indents works)
purposecleans OCR output
codeBEGIN{ant=split("otke~othe~tke~the",tkn2,"~")/2;for(i=1;i<=ant;i++){tkna[tkn2[2*i-1]]=tkn2[2*i]};ant=split("fFs~ffs",tkn2,"~")/2;for(i=1;i<=ant;i++){tknm[tkn2[2*i-1]]=tkn2[2*i]};ant=split("contray~contrary~galley~gallery~geomety~geometry~libray~library~Libray~Library~Mew~New~narure~nature~novely~novelty~nrom~from~ofa~of a~theoy~theory~situng~sitting~",tkn2,"~")/2;for(i=1;i<=ant;i++){tkno[tkn2[2*i-1]]=tkn2[2*i]};ant=split("ajve~ative~aliy~ality~cUon~ction~guiy~guity~iliy~ility~iviy~ivity~nuiy~nuity",tkn2,"~")/2;for(i=1;i<=ant;i++){tkns[tkn2[2*i-1]]=tkn2[2*i]};utrad=""}{for(i in tkna){$0=gensub("\\y" i,tkna[i],"g")};for(i in tknm){$0=gensub(i,tknm[i],"g")};for(i in tkno){$0=gensub("\\y" i "\\y",tkno[i],"g")};for(i in tkns){$0=gensub(i "\\y",tkns[i],"g")};utrad=utrad $0 "\n"}END{printf "%s", utrad}
run-examplegocr -f ISO8859_1 -a 80 -i filename.png | rensaocr.awk
rensasvg.awk (or if you like indents and dislike comments maybe rensasvg.awk.indents works)
purposedrastically cleans SVG
code{if(NR==1){print};if(f){gsub(/^.*--.*$|scale\(1 1\) rotate\(0\) | (id|class)="[^"]+"|<title>[^<]+<\/title>|\r/,"");printf "%s",gensub(/\.[[:digit:]]+([" ,A-Za-z()])/,"\\1","g",$0)};if($0~/^<svg /){sub(/ width="[^"]*" height="[^"]*"/,"");printf "%s",$0;f=1}}
rensaxon.awk (or if you like indents and dislike comments maybe rensaxon.awk.indents works)
purposecleans input XHTML files so that they can be transformed with Saxon using XSL transformations (XSLT)
code{gsub(/\r|<\041DOCTYPE [^>]*>| xmlns="http:\/\/www.w3.org\/1999\/xhtml"/,"");if($0!=""){print}}
scal2sur.awk (or if you like indents and dislike comments maybe scal2sur.awk.indents works)
purposeReplaces hexadecimal XML character references for code points above U+FFFF with decimal XML-encoded surrogate pairs, which is useful for building input to UTF-16-based software
code{while(match($0,/&#[xX]0?[^0]0?[[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]];/)){sca=strtonum("0" substr($0,RSTART+2,RLENGTH-3));$0=substr($0,1,RSTART+1) (int(sca/1024)+55232) ";&#" (56320+(sca%1024)) ";" substr($0,RSTART+RLENGTH)};print}
run-exampleecho "m_n&#x10ffff;n_m" | scal2sur.awk
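The arithmetic behind scal2sur.awk can be checked separately. The sketch below is a simplified illustration in POSIX awk, with a hypothetical function name (surrogates). It assumes the code point is already given in decimal and lies above 65535, so it skips the hexadecimal parsing that the filter does with gawk's strtonum.

```sh
# Hypothetical helper: print the decimal UTF-16 surrogate pair, as XML
# character references, for each decimal code point read from stdin.
surrogates() {
  awk '{
    sca = $1 + 0
    hi = int(sca / 1024) + 55232   # 55232 = 0xD800 - 0x10000/1024
    lo = 56320 + sca % 1024        # 56320 = 0xDC00
    printf "&#%d;&#%d;\n", hi, lo
  }'
}
```

For example, echo 1114111 | surrogates (U+10FFFF) prints &#56319;&#57343; and echo 128512 | surrogates (U+1F600) prints &#55357;&#56832;.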
su2swlu8.awk (or if you like indents and dislike comments maybe su2swlu8.awk.indents works)
purposeConverts a field-separated file containing fields with Swedish Sign Language (ISO 639-3 language code swl) text saved using the old (ASCII-based) SU fonts to a plain field-less text file. The default field separator is the at sign (@), but this can easily be changed at the beginning of the filter (variable fsep). In each input line, the fields are assumed to be grouped by three. The first field of each group is assumed to be non-swl text and is output as is (except for sequences of three field separator characters, which are output as one plain occurrence of this character), the second field is assumed to be SU-encoded with SU's first font and the third field with SU's second font. In a UTF-8 environment, the input is assumed to be UTF-8 and the output is plain UTF-8, using the PUA for swl text. In an 8-bit environment, the input is assumed to be ISO_8859-1 or CP-1252 encoded and the output has the same encoding as the input, with swl characters encoded as XML character references in the PUA. A sans-serif PUA font for swl, swl_sans.ttf, is available at http://www.acc.umu.se/~saasha/typsnitt/#swl_sans
codeBEGIN{fsep="@";if((chs=length(sprintf("%c%c%c%c",0,16,216,128)))==3){c[1,sprintf("%c%c",194,184)]=c[2,sprintf("%c%c",195,165)]=sprintf("%c%c%c%c",244,140,165,159);c[1,sprintf("%c%c",194,163)]=sprintf("%c%c%c%c",244,140,165,158);c[2,sprintf("%c%c",194,163)]=sprintf("%c%c%c%c",244,140,166,145);}else{c[1,sprintf("%c",184)]=c[2,sprintf("%c",229)]="&#x10c95f;";c[1,sprintf("%c",163)]="&#x10c95e;";c[2,sprintf("%c",163)]="&#x10c991;"};b=split(sprintf ("%c",33) "@10c900@10c960@&@10c905@10c964@#@10c902@10c961@$@10c903@10c962@%@10c904@10c963@)@10c906@10c91f@*@10c907@10c965@+@10c908@10c966@,@10c909@10c967@-@10c90a@10c968@.@10c90b@10c969@/@10c90c@10c96a@0@10c911@10c96b@1@10c912@10c96c@2@10c913@10c96d@3@10c914@10c96e@4@10c915@10c96f@5@10c916@10c970@6@10c917@10c971@7@10c918@10c972@8@10c919@10c973@9@10c91a@10c974@:@10c91b@10c975@;@10c91c@10c976@=@10c91f@@?@10c921@10c979@A@10c922@10c97a@B@10c923@10c97b@C@10c924@10c97c@D@10c925@10c97d@E@10c926@10c97e@F@10c927@10c97f@G@10c928@10c980@H@10c929@10c982@I@10c92a@10c942@J@10c92b@10c923@K@10c92c@10c924@L@10c92d@10c946@>@10c920@10c978@M@10c92e@10c947@N@10c92f@10c927@O@10c930@10c949@P@10c931@10c94a@Q@10c932@10c94b@R@10c933@10c92b@S@10c934@10c94c@T@10c935@10c92e@U@10c936@10c93d@V@10c937@10c92f@W@10c938@10c931@X@10c939@10c952@Y@10c93a@10c95f@<@10c91e@10c977@Z@10c93b@10c933@[@10c93c@10c983@]@10c93e@10c984@_@10c940@10c985@a@10c942@10c986@b@10c943@10c987@c@10c944@10c988@d@10c945@10c989@e@10c946@10c98a@f@10c947@10c98b@g@10c948@10c98c@h@10c949@10c922@i@10c94a@10c943@j@10c94b@10c944@k@10c94c@10c945@l@10c90e@10c925@m@10c94e@10c926@n@10c93d@10c948@o@10c93f@10c928@p@10c951@10c929@q@10c952@10c92a@r@10c953@10c92c@s@10c954@10c90e@t@10c955@10c92d@u@10c956@10c94e@v@10c957@10c93f@w@10c958@10c930@x@10c959@10c951@y@10c95a@10c932@z@10c95b@@~@10c95c@10c98e",a,"@");for(i=1;i<b;i+=3){for(j=1;j<3;j++){c[j,a[i]]=((chs==3)?(sprintf("%c%c%c%c",244,140,128+int((strtonum("0x" a[i+j]) % 4096)/64),(strtonum("0x" a[i+j]) % 64)+128)):("&#x" a[i+j] 
";"))}}}{gsub(/\r/,"");for(i=1;i<=length($0);i++){m=substr($0,i,1);if(m==fsep){j++;j%=3;k++;if((k%=3)==0){n=n fsep}}else{k=0;n=n ((c[j,m]=="")?(m):(c[j,m]))}}}END{print n}
run-exampleecho "bla@b¸£@l@@@Yå£@bla@@@bla" | su2swlu8.awk
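The way su2swlu8.awk walks through the groups of three fields can be illustrated without the glyph tables. The sketch below is a simplified stand-in in POSIX awk, not the filter itself. The function name field_roles is hypothetical, the state is reset on every line (the filter keeps it across lines), and each character of a font field is tagged with its field number instead of being mapped to a PUA character.

```sh
# Hypothetical helper: show which of the three field roles each character
# falls in (0 = plain text, 1 = first SU font, 2 = second SU font).
field_roles() {
  awk -v fsep="@" '{
    out = ""; j = 0; k = 0
    for (i = 1; i <= length($0); i++) {
      m = substr($0, i, 1)
      if (m == fsep) {
        j = (j + 1) % 3             # move on to the next field of the group
        k = (k + 1) % 3             # count consecutive separators
        if (k == 0) out = out fsep  # three in a row: one literal separator
      } else {
        k = 0
        out = out (j == 0 ? m : "[" j ":" m "]")
      }
    }
    print out
  }'
}
```

For example, echo "bla@b@l@@@Y@x" | field_roles prints bla[1:b][2:l]@[2:Y]x: the triple separator yields one literal @ and leaves the field position unchanged.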
text2bra.awk (or if you like indents and dislike comments maybe text2bra.awk.indents works)
purposea text2braille filter converting a text file to a UTF-8 encoded braille Unicode character file according to the Swedish braille standard. If the environment is not UTF-8, the input file is assumed to be a CP-1252 file, so the filter works with ISO_8859-1 files too.
codeBEGIN{ant=split("\n~\n~ ~\342\240\200~!~\342\240\226~%~\342\240\217\342\240\211~(~\342\240\246~)~\342\240\264~*~\342\240\224~,~\342\240\202~-~\342\240\244~.~\342\240\204~/~\342\240\214~0~\342\240\274~1~\342\240\241~2~\342\240\243~3~\342\240\251~4~\342\240\271~5~\342\240\261~6~\342\240\253~7~\342\240\273~8~\342\240\263~9~\342\240\252~:~\342\240\222~;~\342\240\206~?~\342\240\242~A~\342\241\201~B~\342\241\203~C~\342\241\211~D~\342\241\231~E~\342\241\221~F~\342\241\213~G~\342\241\233~H~\342\241\223~I~\342\241\212~J~\342\241\232~K~\342\241\205~L~\342\241\207~M~\342\241\215~N~\342\241\235~O~\342\241\225~P~\342\241\217~Q~\342\241\237~R~\342\241\227~S~\342\241\216~T~\342\241\236~U~\342\241\245~V~\342\241\247~W~\342\241\272~X~\342\241\255~Y~\342\241\275~Z~\342\241\265~a~\342\240\201~b~\342\240\203~c~\342\240\211~d~\342\240\231~e~\342\240\221~f~\342\240\213~g~\342\240\233~h~\342\240\223~i~\342\240\212~j~\342\240\232~k~\342\240\250~l~\342\240\207~m~\342\240\215~n~\342\240\235~o~\342\240\225~p~\342\240\217~q~\342\240\237~r~\342\240\227~s~\342\240\216~t~\342\240\236~u~\342\240\245~v~\342\240\247~w~\342\240\272~x~\342\240\255~y~\342\240\275~z~\342\240\265~\265~\342\240\215\342\240\211~\304~\342\241\234~\305~\342\241\241~\310~\342\241\256~\311~\342\241\277~\326~\342\241\252~\334~\342\241\263~\340~\342\240\267~\344~\342\240\234~\345~\342\240\241~\350~\342\240\256~\351~\342\240\277~\366~\342\240\252~\374~\342\240\263~\211~\342\240\217\342\240\215~\302\265~\342\240\215\342\240\211~\303\204~\342\241\234~\303\205~\342\241\241~\303\210~\342\241\256~\303\211~\342\241\277~\303\226~\342\241\252~\303\234~\342\241\263~\303\240~\342\240\267~\303\244~\342\240\234~\303\245~\342\240\241~\303\250~\342\240\256~\303\251~\342\240\277~\303\266~\342\240\252~\303\274~\342\240\263~\342\200\260~\342\240\217\342\240\215",tkn2,"~")/2;for(i=1;i<=ant;i++){tkn[tkn2[2*i-1]]=tkn2[2*i]};utrad=""}{for(i=1;i<=length($0);i++){utrad=utrad tkn[substr($0,i,1)]}}END{printf "%s", utrad}
run-exampleecho " !()*,-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | text2bra.awk
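The octal escapes in the table of text2bra.awk are the UTF-8 bytes of Unicode braille patterns, which start at U+2800 and set one bit per raised dot. As a minimal sketch in POSIX awk (the function name braille_octal is hypothetical, and each dot is assumed to be listed at most once), an entry can be derived from its dot numbers like this:

```sh
# Hypothetical helper: given braille dot numbers (1-8) as fields, print the
# code point and its three UTF-8 bytes written as octal escapes.
braille_octal() {
  awk '{
    cp = 10240                            # U+2800, the blank braille pattern
    for (i = 1; i <= NF; i++) {
      bit = 1
      for (k = 1; k < $i; k++) bit *= 2   # dot n sets bit n-1
      cp += bit
    }
    b1 = 224 + int(cp / 4096)             # lead byte of a three-byte sequence
    b2 = 128 + int(cp / 64) % 64
    b3 = 128 + cp % 64
    printf "U+%04X \\%o\\%o\\%o\n", cp, b1, b2, b3
  }'
}
```

For example, echo "1" | braille_octal prints U+2801 \342\240\201 (the table entry for a) and echo "1 2 3" | braille_octal prints U+2807 \342\240\207 (the entry for l).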
winwidth.awk (or if you like indents and dislike comments maybe winwidth.awk.indents works)
purposetruncates lines longer than the window's width
codeBEGIN{haemtabredd="echo $COLUMNS";haemtabredd | getline bredden;close(haemtabredd)} {gsub(/\205/,".");print substr($0,1,bredden)}
run-examplecat filename.txt | winwidth.awk
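winwidth.awk reads the width by running echo $COLUMNS in a subshell, which only works if COLUMNS is exported; many shells set it without exporting it. A possible alternative, given here only as a sketch with a hypothetical function name (cut_to_width) and an assumed fallback of 80 columns, is to pass the width to awk explicitly:

```sh
# Hypothetical helper: truncate every input line to the width given as the
# first argument (80 if the argument is missing or empty).
cut_to_width() {
  awk -v w="${1:-80}" '{ print substr($0, 1, w) }'
}
```

In an interactive shell this could be called as cut_to_width "$COLUMNS" < filename.txt. Whether substr counts bytes or characters depends on the awk implementation and the locale.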
ws2pajek.awk (or if you like indents and dislike comments maybe ws2pajek.awk.indents works)
purposefilter based on journals and cited authors which builds Pajek files from WoS files
codeBEGIN{r="\\r\\n"}{gsub(/\r/,"");if($1=="CR"){$1=" ";cr=1};if((cr==1)&&($0~/^ /)){f=gensub(/^ *([^,]+),.*/,"\\1","1",$0);cf[f]++;li=li f ",";li}else{cr=0};if($1=="J9"){h=substr($0,4);ci[h]=ci[h] li;li="";t[h]++}}END{for(i in cf){j++;v=v j " \"" i "\"" r;fc[i]=j;cl=cl "1" r;e=e cf[i] r};for(i in ci){j++;e=e t[i] r;v=v j " \"" i "\"" r;sub(/,$/,"",ci[i]);for(k=split(ci[i],il,",");k>0;k--){a[j " " fc[il[k]]]++};cl=cl "2" r};for(i in a){ar=ar i " " a[i] r};j="echo -n \047*Vertices " j r;k="\047 > fileToPajek.";print j v "*Arcs" r ar k "net";print j cl k "clu";print j e k "vec"}
run-examplecat filename.txt | ws2pajek.awk | sh
xmlliner.awk (or if you like indents and dislike comments maybe xmlliner.awk.indents works)
purposeremoves line breaks and indentations in an XML file. Strings are assumed to be of type xs:token. See also xmlsvans.awk
code{gsub(/[\f\r\t ]+/," ");sub(/ $/,"");printf "%s",gensub(/^ </,"<","1")}
run-examplecat filename.xml | xmlliner.awk > newfilename.xml
xmlsvans.awk (or if you like indents and dislike comments maybe xmlsvans.awk.indents works)
purposeReminiscent of the Lisp indent style, removes some useless line breaks and indentation in a traditionally indented XML file, thus (slightly) reducing the number of lines and the size of the file while keeping the file easy for humans to read. Also reduces multiple occurrences of the asterisk character within comments. Note that xmlsvans.awk assumes that every xs:string is an xs:token according to the (not so intuitive) definition W3C has for the concept of xs:token. See also xmlliner.awk
code{gsub(/\r/,"");gsub(/ +$/,"");$0=gensub(/^[\t ]+(((<[\/\041]|[^ <]).*)[^ ]|)[\t ]*$/,"\\1","g");$0=gensub(/([^ \t])[\t ]+/,"\\1 ","g");do{a=$0;$0=gensub(/(<\041-- (\*?[^*>])*)(\*)\*+/,"\\1*","g",gensub(/(\*+)(\*(-?[^-])* -->)/,"\\2","g",a))}while($0!=a);printf "%s%s",(($0~/^(<\/.*|)$/||NR==1)?"":(($0~/^[^ <\]]/)?fdr:("\n"))),$0;fdr=(($0!~/^[^>]*>$/)?" ":"")}
run-exampleenv XMLLINT_INDENT=" "" "" "" " xmllint --format --nsclean --postvalid filename.xml | xmlsvans.awk > newfilename.xml
xtraline.awk (or if you like indents and dislike comments maybe xtraline.awk.indents works)
purposeadds an empty line between consecutive lines of a text file where one is lacking. Compare with mkparagr.awk
codeBEGIN{fdrad=0} {sub(/^[ \t\r]+$/,"");if($0!=""){if(fdrad){print "\n"}else{fdrad=1};printf "%s",$0}}
run-examplecat filename.txt | xtraline.awk > newfilename.txt
