"readme-TXT" for "40H-TXT-2024" Tools & Utilities

"40H-TXT" is a collection of 14 text file command-line utility tools by Norman Pollock (USA) for use in "Windows" or in a "Java Runtime Environment".

"40H-TXT" tools and instructions are copyright (c) 2011-24 by Norman Pollock. All rights reserved.

"40H-TXT" can be freely distributed and used for non-commercial purposes as long as "40H-TXT" is kept intact and no changes are made to any files. Any commercial distribution or use of "40H-TXT", or any of its files, is strictly forbidden.

Disclaimer: The "40H-TXT" package is distributed "as is". No warranty or guarantee of any kind is expressed or implied. The user assumes all risks of usage. The author is not responsible for any damage or losses of any kind caused by the use or misuse of the tools or this "readme-TXT" file.

Current download sites for "40H Tools":
   40hchess.epizy.com
   nk-qy.info/40h (Thanks to Frank Quisinsky)
Please bookmark.

Contact: Norman Pollock
         rc1242@yahoo.com

===================================================================

FAQ:

1. How did the "40H-TXT" tools get to be on a chess page?
   The "40H" tools were originally written for use with chess files, specifically "pgn" and "epd" files. I then observed that several "epd" tools could be adapted for use with general text files.

2. Do the "40H" tools require Internet access?
   Only to download the tools.

3. Are the "40H" tools portable?
   Yes. You can even save them on a flash drive and use them on different PCs. They are single files and no "dlls" are required. They do not require a "setup" tool and they do not affect the registry. They can be removed by simple deletion.

4. Are the "40H" tools 64-bit and do they use multiprocessing?
   Each tool is 32-bit, and none of them use multiprocessing.

5. Are the "40H" downloads checked for viruses/malware?
   Yes. "40H" downloads are checked at www.virustotal.com

6. Is the "readme" file included in the download?
   Yes, and it is also on the website.
   The version on the website has the latest updates and corrections.

7. Where are the full usage instructions for each "40H-TXT" tool?
   They are in the last section of this "readme". Please be sure to read the relevant instructions before use.

====================================================================
====================================================================

OVERVIEW OF TOOLS (FULL INSTRUCTIONS AT THE END OF THIS FILE)

Each "40H-TXT" tool consists of a single file. Each tool executes from a command-line in a "Command Prompt" window. The "40H-TXT" tools do not change any input file. Output appears in a new file(s).

1. "txtBlank" removes all blank lines from the input file.
2. "txtChar" outputs the ascii value and location of most "control" characters (ascii < 32) and "extended" characters (ascii > 127).
3. "txtColumn" outputs a range of columns of a text file based upon user-specified starting and ending column numbers, inclusive.
4. "txtCombine" joins 2, 3, 4 or 5 "txt" files by concatenation.
5. "txtJoin" combines two text files by horizontally joining their lines. The two input text files must have the same number of lines.
6. "txtMerge" joins 2, 3, 4 or 5 "txt" files by adding one line from each successive input file, then repeating.
7. "txtOccur" lists the number of occurrences of each distinct non-blank line. It also lists the line numbers where the lines occurred.
8. "txtRandom" randomly rearranges the lines of the input file. The user can optionally output a user-specified number of lines.
9. "txtSingle" removes any line that is a duplicate of a prior line. The remaining lines are in their original order.
10. "txtSort" sorts the lines alphanumerically in either ascending or descending order.
11. "txtSplit" splits a large text file into as many as five separate text files. Splitting can be repeated.
12. "txtToken" uses 1 or 2 user-specified numerical parameters to output either a single token or a range of tokens from each line.
13. "txtTokenSearch" uses a user-specified "search_token" to output all lines containing the "search_token".
14. "txtWrap" word-wraps long lines of text to make them easier to read.

===================================================================

INTRODUCTION:

Download the "40H-TXT" file. It is packed in "7-zip" format. You can unpack it using "7-zip", available at: http://www.7-zip.org

Unpacking the "40H-TXT" download file results in 14 "Windows" executable files.

Each "40H-TXT" tool is a "command-line tool", which means that it executes on a command-line in a "Command Prompt" window.

The "40H-TXT" tools were written in "Java" and then compiled using "gcj-34". All coding is original.

Each "40H-TXT" tool consists of a single self-contained file. No external "dlls" are required.

Each "40H-TXT" tool is portable and only has to be copied to be installed. It does not need a setup program and it does not write any data to the registry.

Unless otherwise indicated, each "40H-TXT" tool takes a "txt" file as input. The "txt" extension is not required.

"40H-TXT" tools DO NOT MAKE ANY CHANGES to input files. Output appears in a new file(s). Output files are pre-named, so users should rename the output file(s) before they are overwritten by the next execution of the tool.

The "40H-TXT" Utility Suite is NOT chess specific. The tools can be applied to all text files, including both chess-related ("pgn", "epd") and non-chess-related text files.

Characteristics of text files that can be processed by "40H-TXT":
1. The file uses the ASCII character set without any formatting (bold, italics, etc).
2. Each line of text is separated by the two-character combination CR and LF (ASCII codes 13 and 10).
3. The file can be read by a "Text Editor".

The 14 "40H-TXT" tools are "txtBlank", "txtChar", "txtColumn", "txtCombine", "txtJoin", "txtMerge", "txtOccur", "txtRandom", "txtSingle", "txtSort", "txtSplit", "txtToken", "txtTokenSearch" and "txtWrap".
Thanks to Jim Ablett for helping me get started on this project and for showing me how to create "Windows" executables for the tools.

====================================================================

LIMITATIONS:

"40H-TXT" tools ONLY execute from a command-line within a Command Prompt. See http://dosprompt.info/ for Command Prompt support.

"40H-TXT" tools are limited by the maximum array sizes specified in their code. These maximum array sizes are adequate for most "txt" input files. However, if you use a very large "txt" input file, an overflow error may occur or execution may take an extended amount of time.

"40H-TXT" tools do not check that the input file is a text file.

===================================================================

INSTALLATION, FILES, FOLDERS AND EXECUTION:

Create a folder named "40H" if it doesn't already exist. Extract the download into the "40H" folder. The extraction will unpack the 14 tools into a subfolder named "40H-TXT-2021A".

For users who will only use the "40H-TXT" tools occasionally, the simplest arrangement is to copy the desired tool to the folder where the input file(s) are located and run it in that folder. Alternatively, you could copy the input file(s) to the folder where the "40H-TXT" tools are located and run the tool in that folder.

For users who will use the "40H-TXT" tools often, copy or move the tools to a folder that is already on the System Path. (Type "path" in a Command Prompt window to see the System Path.) This way you can run any of the "40H-TXT" tools from any folder without having to specify its path.

If you do not have such a folder on your System Path, you first have to create the folder and then add it to the Path variable. Use "search" in "Settings" to find "System Environment Variables". Then click "Environment Variables", select "Path" under "System Variables", and click "Edit". These steps may vary depending on the Windows version.
The user should set the Working Folder to the folder that contains the input "txt" file and any other data files. This avoids having to specify pathnames and keeps the input and output file(s) in the same folder.

The input file has to be a text file, but its name does NOT need to have the extension "txt".

Input files are not changed. Even so, the user should keep copies of all input files on another storage medium. Output appears in a new file(s).

Running a "40H-TXT" tool without all of its required files and parameters lists the version number of the tool, the syntax, an example of usage, and the names of the output files.

Output files are TEMPORARY files. They are created in the Working Folder. Be sure to rename/copy/move any output files that you want to keep. The next execution of the tool in that folder will overwrite the previous output file(s).

Do not set an output file to "read-only", as that will prevent the tool that creates it from executing in that folder.

A "40H-TXT" tool cannot use its own output "txt" file as an input file unless the file is renamed.

===================================================================

EXECUTION:

Each tool executes from a command-line in a "Command Prompt" window.

The general format for running a tool is:
   tool_name txt_filename [parameter(s)]

Example:
   txtSort alpha.txt down

Follow the specific instructions for each tool.

After entering the proper command-line, and making sure all files are accessible (either in the Working Folder or through a pathname), press <Enter> to start execution. Output files are created in the Working Folder.

===================================================================

SIMILARITIES/DIFFERENCES BETWEEN "40H-TXT" AND "40H-EPD" TOOLS

"epd" files are text files, so they can be processed by either "40H-TXT" tools or "40H-EPD" tools.

The following "40H-TXT" tools and "40H-EPD" tools are different and may produce different results.
The main difference is that the "40H-EPD" tools ignore the opcode section of "epd" records by only processing the first 4 tokens of each line, while the "40H-TXT" tools process entire lines.
1. "txtOccur" and "epdOccur"
2. "txtSingle" and "epdSingle"
3. "txtSort" and "epdSort"

===================================================================
===================================================================

FULL USAGE INSTRUCTIONS:

=========================(1) txtBlank =============================

"txtBlank" removes all blank lines from the input file.

If a "blank" line is not removed from "outB.txt", it may contain an invisible character outside the ascii range (32-127). Use "txtChar" (see below) to locate the character and "XVI32" to remove it.

Syntax: txtBlank filename.txt
Example: txtBlank alpha.txt
Output: outB.txt

Comment:
1. The input file is NOT required to have a "txt" extension.

=========================(2) txtChar ==============================

"txtChar" outputs the ascii value and location of most "control" characters (ascii < 32) and "extended" characters (ascii > 127).

"txtChar" does not output the ascii value and location of "carriage return" (13), "line feed" (10) and "end of file" (26). These "control" characters are very common in text files.

The "tab" character (9) often causes problems in text files. Many text editors insert a series of "spaces" (32) instead of a "tab".

The hex editor "XVI32" can be used to edit "control" characters and other characters that cannot be edited by a text editor.

The location of each reported character is given as its row and column, and its ascii value is given in "Decimal" and "Hexadecimal" format. A frequency summary is located at the bottom of the output file "charList".

Syntax: txtChar filename.txt
Usage: txtChar alpha.txt
Output: charList

Comments:
1. The input file is NOT required to have a "txt" extension.
2. "txtChar" is a useful tool when a text file is suspected of causing a problem.
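The scanning rule described for "txtChar" can be illustrated with a short Java sketch. Java is used here because it is the language the tools were written in. This is not the author's code. The class name CharScanSketch, the report format and the 1-based row/column counting are assumptions. The file is read as raw bytes, and each byte is treated as an unsigned value from 0 to 255.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CharScanSketch {

    // Reports every "control" (< 32) or "extended" (> 127) byte with its row/column,
    // skipping CR (13), LF (10) and EOF (26). Counts per value are added to freq.
    static List<String> scan(byte[] data, Map<Integer, Integer> freq) {
        List<String> report = new ArrayList<>();
        int row = 1, col = 0;
        for (byte b : data) {
            int v = b & 0xFF;               // unsigned value 0..255
            if (v == 10) {                  // LF ends the current row
                row++;
                col = 0;
                continue;
            }
            col++;
            if (v == 13 || v == 26) continue;
            if (v < 32 || v > 127) {
                report.add(String.format("row %d col %d: dec %d hex %02X", row, col, v, v));
                freq.merge(v, 1, Integer::sum);
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo input when no file name is given: a tab on row 2.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "ok\r\na\tb\r\n".getBytes(StandardCharsets.ISO_8859_1);
        Map<Integer, Integer> freq = new TreeMap<>();
        scan(data, freq).forEach(System.out::println);
        // Frequency summary at the end, in the spirit of "charList".
        freq.forEach((v, n) -> System.out.println("value " + v + ": " + n + " time(s)"));
    }
}
```

Running "java CharScanSketch alpha.txt" prints one line per reported byte, followed by the frequency summary.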
=========================(3) txtColumn ============================

"txtColumn" outputs a range of columns of a text file based upon user-specified starting and ending column numbers, inclusive.

There are 2 output files. "outT.txt" contains the extracted columns. "excludeT.txt" contains the columns that were NOT extracted. The output in "excludeT.txt" is condensed and realigned.

The user-specified column numbers must be from 1 to 1000. If the starting column number is greater than the ending column number, no data is output.

"txtColumn" is very useful for extracting a contiguous range of columns from a text file for use as input to another program.

For example:
   txtColumn alpha.txt 1 20
outputs columns 1 to 20, inclusive, to outT.txt. The remaining data, column 21 to the end, is output to excludeT.txt.

   txtColumn alpha.txt 15 30
outputs columns 15 to 30, inclusive, to outT.txt. The remaining data, columns 1 to 14, inclusive, and column 31 to the end, is output to excludeT.txt.

If you want to exclude columns at the start of each line but are not sure what to use for the ending column, use the maximum ending column (1000).
   txtColumn alpha.txt 6 1000
outputs column 6 to the end to outT.txt. The remaining data, columns 1 to 5, inclusive, is output to excludeT.txt.

"txtColumn" can be used to truncate the lines of a text file.
   txtColumn alpha.txt 1 40
truncates each line after column 40.

If there is no data from the starting column to the ending column, blank lines are output to outT.txt. The remaining data is output to excludeT.txt.

If all the data is within the starting column to the ending column, inclusive, then all of it is output to outT.txt, and blank lines are output to excludeT.txt.

Syntax: txtColumn filename.txt starting_column ending_column
Example: txtColumn alpha.txt 21 60
Output: outT.txt, excludeT.txt

Comments:
1. The input file is NOT required to have a "txt" extension.
2. The input file cannot contain a "tab" character.
   Only "tabs" made of multiple spaces are acceptable.

=========================(4) txtCombine ===========================

"txtCombine" joins 2, 3, 4 or 5 "txt" files by concatenation. Concatenation combines successive files into a larger file. Files do NOT have to have the same number of lines.

Syntax: txtCombine file1.txt file2.txt [file3.txt file4.txt file5.txt]
Examples:
   txtCombine alpha.txt beta.txt
   txtCombine alpha.txt beta.txt gamma.txt
   txtCombine alpha.txt beta.txt gamma.txt delta.txt
   txtCombine alpha.txt beta.txt gamma.txt delta.txt epsilon.txt
Output: outC.txt

Comments:
1. The input files are NOT required to have a "txt" extension.
2. Users who prefer to join files by inserting one line from each successive input file, then repeating, should use "txtMerge".

=========================(5) txtJoin ==============================

"txtJoin" combines two text files by horizontally joining their lines. The two input text files must have the same number of lines.

By default, the output file has a single space as a separator between the joined lines. Optionally, the user can insert a different "separator" between the joined lines by using a parameter on the command line. The parameter consists of the separator surrounded by two quotation marks. Blank spaces can be included in the separator.

For example:
   txtJoin alpha.txt beta.txt "- -"
The separator "- -" is WITHIN the quotes.

   txtJoin alpha.txt beta.txt ""
This separator has no characters in it. The lines are placed adjacent to each other without any character between them.

Syntax: txtJoin filename1.txt filename2.txt [separator]
Examples:
   txtJoin alpha.txt beta.txt
   txtJoin alpha.txt beta.txt " & % "
Output: outJ.txt

Comments:
1. The input files are NOT required to have a "txt" extension.

=========================(6) txtMerge =============================

"txtMerge" joins 2, 3, 4 or 5 "txt" files by adding one line from each successive input file, then repeating. Files do NOT have to have the same number of lines.
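The interleaving rule for "txtMerge" can be sketched in Java as a round-robin loop. This is an illustrative sketch only, not the tool's source. The class name MergeSketch is hypothetical, and reading the files as ISO-8859-1 is an assumption. A file that runs out of lines is skipped on later rounds, which matches the fileA/fileB/fileC example in this section.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeSketch {

    // Round-robin interleave: line 1 of each file, then line 2 of each file, and so on.
    // A file with no line at the current position is skipped for that round.
    static List<String> merge(List<List<String>> files) {
        List<String> out = new ArrayList<>();
        int longest = 0;
        for (List<String> f : files) longest = Math.max(longest, f.size());
        for (int i = 0; i < longest; i++) {
            for (List<String> f : files) {
                if (i < f.size()) out.add(f.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> files = new ArrayList<>();
        for (String name : args) {
            files.add(Files.readAllLines(Paths.get(name), StandardCharsets.ISO_8859_1));
        }
        if (files.isEmpty()) {  // hypothetical demo data shaped like a 4/2/3-line example
            files.add(Arrays.asList("A1", "A2", "A3", "A4"));
            files.add(Arrays.asList("B1", "B2"));
            files.add(Arrays.asList("C1", "C2", "C3"));
        }
        merge(files).forEach(System.out::println);
    }
}
```

By contrast, concatenation as done by "txtCombine" would simply append each whole list in turn.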
Example: Suppose fileA.txt has 4 lines, fileB.txt has 2 lines and fileC.txt has 3 lines. Using the command:
   txtMerge fileA.txt fileB.txt fileC.txt
the output file "outM.txt" will be:
   line1 from fileA.txt
   line1 from fileB.txt
   line1 from fileC.txt
   line2 from fileA.txt
   line2 from fileB.txt
   line2 from fileC.txt
   line3 from fileA.txt
   line3 from fileC.txt
   line4 from fileA.txt

Syntax: txtMerge file1.txt file2.txt [file3.txt file4.txt file5.txt]
Examples:
   txtMerge alpha.txt beta.txt
   txtMerge alpha.txt beta.txt gamma.txt
   txtMerge alpha.txt beta.txt gamma.txt delta.txt
   txtMerge alpha.txt beta.txt gamma.txt delta.txt epsilon.txt
Output: outM.txt

Comments:
1. The input files are NOT required to have a "txt" extension.
2. Users who prefer to join files by concatenating successive files should use "txtCombine".

=========================(7) txtOccur =============================

"txtOccur" lists the number of occurrences of each distinct non-blank line. It also lists the line numbers where the lines occurred.

Each output line in "outL.txt" contains the line from the "txt" file, followed by the comment indicator "c0", the number of occurrences, and ";".

Sample output line from "outL.txt":
   Have a nice day! c0 10;

Each output line in "outL2.txt" contains what "outL.txt" contains, followed by the comment indicator "c1", the line numbers of the occurrences, and ";".

Sample output line from "outL2.txt":
   Have a nice day! c0 10; c1 8 9 10 34 35 38 44 45 46 47;

Syntax: txtOccur filename.txt
Example: txtOccur alpha.txt
Output: outL.txt, outL2.txt

Comments:
1. "txtOccur" ignores blank lines and does not output blank lines.

=========================(8) txtRandom ============================

"txtRandom" randomly rearranges the lines of the input file. The user can optionally output a user-specified number of lines.

The default is to output all the lines.
A smaller number of lines can be output by specifying the number of desired lines on the command line. For example:
   txtRandom alpha.txt 10
outputs only the first 10 lines of the random rearrangement.

The requested number of output lines cannot exceed the number of lines in the input file.

If the same line is output more than once, it is because that line was repeated in the input file. Blank lines are randomly output like other lines.

Syntax: txtRandom textfile_name [num_lines]
Usage:
   txtRandom alpha.txt
   txtRandom alpha.txt 15
Output: outC.txt

Comments:
1. The input file is NOT required to have a "txt" extension.

=========================(9) txtSingle ============================

"txtSingle" removes any line that is a duplicate of a prior line. The remaining lines are in their original order.

The prior line does NOT have to be immediately prior. It can be any line before the current line.

The output file is outA.txt. The removed lines are saved in excludeA.txt.

Syntax: txtSingle filename.txt
Example: txtSingle alpha.txt
Output: outA.txt, excludeA.txt

Comments:
1. The input file is NOT required to have a "txt" extension.
2. "epdSingle" in the "40H-EPD" tools removes duplicate lines in an "epd" text file. "epdSingle" only matches the first 4 tokens of the line, whereas "txtSingle" matches the whole line.

=========================(10) txtSort =============================

"txtSort" sorts the lines alphanumerically in either ascending or descending order. The default is ascending order. Use the optional parameter "down" for descending order. Blank lines are deleted in the output file.

Syntax: txtSort filename.txt [down]
Examples:
   txtSort alpha.txt
   txtSort alpha.txt down
Output: outS.txt

Comments:
1. The input file is NOT required to have a "txt" extension.
2. "down" is case-sensitive.
3. "txtSort" sorts "text" files by comparing entire lines. "epdSort" in "40H-EPD" sorts "epd" files by comparing the first 4 tokens of the lines.
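A minimal Java sketch of the behavior described for "txtSort" is shown below. It is not the tool's actual code. The class name SortSketch is hypothetical. Two further assumptions apply. First, "alphanumerically" is taken to mean String.compareTo order, which compares character codes, so upper-case letters sort before lower-case ones. Second, a "blank" line is taken to be an empty or all-space line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortSketch {

    // Assumption: a blank line is empty or consists only of spaces.
    static boolean isBlank(String line) {
        return line.replace(" ", "").isEmpty();
    }

    // Compares whole lines, drops blank lines, and reverses the order when down is true.
    static List<String> sortLines(List<String> lines, boolean down) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (!isBlank(line)) out.add(line);
        }
        Collections.sort(out);              // String.compareTo: character-code order
        if (down) Collections.reverse(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1)
                : Arrays.asList("pear", "", "Apple", "banana");   // hypothetical demo data
        boolean down = args.length > 1 && args[1].equals("down");  // case-sensitive check
        sortLines(lines, down).forEach(System.out::println);
    }
}
```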
=========================(11) txtSplit ============================

"txtSplit" splits a very large text file into as many as five separate text files. Splitting can be repeated.

The user specifies a "split_num" from 2 to 5 for the number of output files. The number of lines in each output file will be approximately equal.

"txtSplit" can be reused on the output files if they are still too big. However, you have to rename the output files before using them as input files.

Syntax: txtSplit filename.txt split_num
Example: txtSplit alpha.txt 3
Output: pt1.txt, pt2.txt, ... (up to pt5.txt)

Comments:
1. The input file is NOT required to have a "txt" extension.
2. Use "copy /b" to concatenate the files back into the original. The "/b" option prevents the "eof" character from being appended to the end of the file. For example, after "txtSplit alpha.txt 3":
   copy /b pt1.txt+pt2.txt+pt3.txt alpha2.txt

=========================(12) txtToken ============================

"txtToken" uses 1 or 2 user-specified numerical parameters to output either a single token or a range of tokens from each line.

If only one parameter is specified, then the token in that position is output from each line. If two parameters are specified, then the tokens in the range specified by the parameters are output from each line.

Examples:
   txtToken alpha.txt 5
outputs the 5th token of each line, if it exists. Otherwise a blank line is output.

   txtToken alpha.txt 3 7
outputs the range of tokens from the 3rd token to the 7th token, inclusive. If none exist, a blank line is output.

The output tokens start at the beginning of their output line and are separated by a single space. The output file is outK.txt.

Syntax: txtToken filename.txt min_token [max_token]
Usage:
   txtToken alpha.txt 3
   txtToken alpha.txt 2 6
Output: outK.txt

Comments:
1. The input file is NOT required to have a "txt" extension.
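The token-selection rule for "txtToken" can be sketched in Java. The sketch is illustrative and is not the author's implementation. The class name TokenSketch is hypothetical. Tokens are assumed to be runs of non-whitespace characters. When only some of the tokens in the requested range exist, the sketch outputs just those.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TokenSketch {

    // Returns tokens min..max (1-based, inclusive) of one line, joined by single spaces.
    // Positions past the end of the line are ignored; if none exist, "" (a blank line) results.
    static String tokens(String line, int min, int max) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) return "";
        String[] tok = trimmed.split("\\s+");
        StringBuilder sb = new StringBuilder();
        for (int i = Math.max(min, 1); i <= max && i <= tok.length; i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(tok[i - 1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {  // hypothetical demo when no file/parameters are given
            System.out.println(tokens("one two three four five six", 3, 7));
            return;
        }
        int min = Integer.parseInt(args[1]);
        int max = args.length > 2 ? Integer.parseInt(args[2]) : min;  // one parameter = one token
        for (String line : Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            System.out.println(tokens(line, min, max));
        }
    }
}
```

A single parameter is treated as a range whose start and end are the same position, so one code path serves both forms of the command.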
=========================(13) txtTokenSearch ======================

"txtTokenSearch" uses a user-specified "search_token" to output all lines containing the "search_token". For example:
   txtTokenSearch alpha.txt Solar
outputs all lines in alpha.txt containing "Solar".

The "search_token" is case-sensitive.

The output file is "outS.txt". "excludeS.txt" contains the lines not output to "outS.txt".

Syntax: txtTokenSearch filename.txt search_token
Usage: txtTokenSearch alpha.txt Happy_Birthday
Output: outS.txt, excludeS.txt

Comments:
1. The input file is NOT required to have a "txt" extension.
2. "search_token" is case-sensitive.
3. "search_token" can only consist of one token.
4. "search_token" can be in quotation marks, but cannot have an embedded space.

=========================(14) txtWrap =============================

"txtWrap" word-wraps long lines of text to make them easier to read.

Lines are word-wrapped if they have a word starting past position 70. "txtWrap" puts a caret ("^") character at the start of each continuation line to inform the reader that the original line has been split. No words are split. Very long lines can have more than one continuation line. Blank lines are retained.

Syntax: txtWrap filename.txt
Example: txtWrap alpha.txt
Output: outW.txt

Comment:
1. The input file is NOT required to have a "txt" extension.

===================================================================
===================================================================
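The wrapping rule described for "txtWrap" can be sketched in Java as follows. This is only an illustrative sketch under several assumptions and is not the tool's code. The class name WrapSketch is hypothetical. The caret is counted as position 1 of a continuation line. Wrapped lines have their runs of spaces collapsed to single spaces. Lines without a word starting past position 70, including blank lines, are returned unchanged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class WrapSketch {

    static final int LIMIT = 70;  // a word starting past this 1-based position forces a wrap

    // True if some word starts at 0-based index >= LIMIT (i.e. past position 70).
    static boolean needsWrap(String line) {
        for (int i = LIMIT; i < line.length(); i++) {
            if (line.charAt(i) != ' ' && line.charAt(i - 1) == ' ') return true;
        }
        return false;
    }

    static List<String> wrap(String line) {
        if (!needsWrap(line)) return Collections.singletonList(line);  // short or blank: unchanged
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (String w : line.trim().split(" +")) {
            if (cur.length() == 0) {
                cur.append(w);
            } else if (cur.length() + 2 > LIMIT) {      // w would start past position LIMIT
                out.add(cur.toString());
                cur = new StringBuilder("^").append(w); // caret marks a continuation line
            } else {
                cur.append(' ').append(w);              // words are never split
            }
        }
        out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1)
                : Collections.singletonList(String.join(" ", Collections.nCopies(20, "word")));
        for (String line : lines) {
            wrap(line).forEach(System.out::println);
        }
    }
}
```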