Opensort

A fast, general-purpose sorting program


OPENSORT


NAME
opensort - sort lines of text files or records of binary files

SYNOPSIS
opensort  [OPTIONS]

DESCRIPTION
Sort from standard input to standard output. Alternatively, disk files may be used as input and/or output.
Inputs and outputs may be binary fixed length record (FLR) or delimited text (TEXT), the default. For TEXT
inputs, the record delimiter is always the standard EOL sequence (LF or CRLF) of the operating system,
and the column/field delimiter is defined by the user.
 
OPTIONS
-h, -help
Displays the help screen

-i path
Input file path. If there is more than one input file, this option may be used multiple times.
Default is standard input. If this option is present, standard input is ignored.

-o path
Output file path. Default is standard output. If this option is present, standard output is ignored.

-t path
Temporary file path. If this option is absent, opensort will create a temporary file in the standard
temporary directory. This file will be removed at the end of the program's execution.

-m megabytes
Amount of RAM, in megabytes, opensort is allowed to use. If this option is absent, opensort will use 16
megabytes.

-b kilobytes
I/O buffer size in kilobytes. If this option is absent, opensort will use 512 kilobytes for each FLR I/O buffer,
or 64 kilobytes for each TEXT I/O buffer. Values above 65536 will be reduced to this limit. Under Linux, if
opensort has been built with the -DSPLICE flag, this option is ignored for FLR files.

-z bytes
Record size in bytes. This option implies FLR I/O and is mutually exclusive with -delim (see below).

-delim (space, tab, column, semicolumn, pipe or comma)
Specify the delimiter character that separates fields/columns in a TEXT line. Valid delimiter options are:
space, tab, column, semicolumn, pipe or comma. The default behaviour is to use no delimiter at all,
so the whole line is treated as a single field. This option implies TEXT I/O and is mutually exclusive
with -z.

-v bytes
Volume size in bytes. If the size, in bytes, of the output file exceeds this number, multiple volumes will be
created. The minimum is 256 kilobytes (256*1024 bytes). If this option is absent, opensort will use the
maximum size allowed by the file system. No record will be split across two volumes. If standard
output is used, this option is ignored.
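
The rule that no record is split across volumes can be illustrated with a short Python sketch. This is not
opensort's implementation: the function name split_into_volumes and the greedy packing strategy are
assumptions made for illustration, and the sketch does not enforce the 256 kilobyte minimum.

```python
def split_into_volumes(records, volume_size):
    """Greedily pack whole records into volumes of at most volume_size bytes.

    Hypothetical helper for illustration; opensort's real logic may differ.
    """
    volumes, current, used = [], [], 0
    for rec in records:
        if len(rec) > volume_size:
            raise ValueError("record is larger than the volume size")
        # Start a new volume instead of splitting a record across two.
        if used + len(rec) > volume_size:
            volumes.append(current)
            current, used = [], 0
        current.append(rec)
        used += len(rec)
    if current:
        volumes.append(current)
    return volumes
```

With ten 78-byte records and a 256-byte limit, each full volume in this sketch holds three whole records
(234 bytes) and the remaining 22 bytes are left unused.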

-sorters n
Number of sorting threads. Default is 1, maximum is 8. Opensort may ignore this option and fall back to the default.

-directio
Use direct (unbuffered) I/O. This is a hint: opensort may ignore it and fall back to buffered I/O. Under Linux, if opensort
has been built with the -DSPLICE flag, this option is ignored.

-single
Disable parallel I/O. This option is useful when the temporary file resides on the same disk/array as the input or
output files. Requires -z. If standard input or standard output is in use, this option is ignored.

-statistics
Display statistics.

-k [+,-]start,end,(t,tic,n,i8,u8,i16,u16,i32,u32,i64,u64,float,double)
Specify a sort key in the FLR. Start is the offset of the first byte of the key, counted from 0. End is the offset of the last
byte of the key, counted from 0. The optional plus or minus sign in front of the start offset indicates the order of the key:
plus means ascending order and minus means descending order. If the sign is absent, plus is implied. The final, mandatory
field indicates the data type of the key and is interpreted as follows:

t:  ASCII character or string.
tic: Same as t but ignores case.
n:  Numeric string (String to sort according to its numerical value).
i8, i16, i32, i64: 8/16/32/64 bit native signed integer.
u8, u16, u32, u64:  8/16/32/64 bit native unsigned integer.
float: Native float (usually 32 bit)
double: Native double (usually 64 bit)

When -z is used, this option is mandatory and must appear at least once on the command line.
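
As an illustration of how the start/end offsets and key types map onto a record, here is a minimal Python sketch.
It is not opensort's code: extract_key and STRUCT_CODES are hypothetical names, and parsing the n type with
float() is an assumption about what "numerical value" means.

```python
import struct

# Native-order struct codes for the binary key types listed above.
# "=" selects native byte order without alignment padding.
STRUCT_CODES = {
    "i8": "=b", "u8": "=B", "i16": "=h", "u16": "=H",
    "i32": "=i", "u32": "=I", "i64": "=q", "u64": "=Q",
    "float": "=f", "double": "=d",
}

def extract_key(record, start, end, ktype):
    """Return the key stored at bytes start..end (inclusive, counted from 0)."""
    raw = record[start:end + 1]
    if ktype == "t":
        return raw                      # compare the raw bytes
    if ktype == "tic":
        return raw.lower()              # same as t, ignoring ASCII case
    if ktype == "n":
        return float(raw.decode("ascii").strip())  # assumed numeric parsing
    return struct.unpack(STRUCT_CODES[ktype], raw)[0]
```

A list of records could then be ordered with, for example,
sorted(records, key=lambda r: extract_key(r, 31, 38, "i64")), adding reverse=True for a descending key.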

-k [+,-]pos,(t,tic,n)
Specify a sort key in the TEXT line. Pos is the position of the key field, as defined by the delimiter character, counted from 1.
The optional plus or minus sign in front of the position indicates the order of the key: plus means ascending order
and minus means descending order. If the sign is absent, plus is implied. The final, mandatory field indicates the data type
of the key and is interpreted as follows:

t:  ASCII character or string.
tic: Same as t but ignores case.
n:  Numeric string (String to sort according to its numerical value).

Default is -k 1,t.
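
The difference between the t and n key types on delimited text can be sketched in Python as follows. The helper
text_key is hypothetical, the pipe character is used as the delimiter, and float() parsing for n is an assumption;
opensort's own comparison rules may differ in detail.

```python
def text_key(line, pos, ktype, delim="|"):
    """Return the sort key for field number pos (counted from 1)."""
    field = line.rstrip("\r\n").split(delim)[pos - 1]
    if ktype == "n":
        return float(field)     # compare by numerical value (assumed parsing)
    if ktype == "tic":
        return field.lower()    # compare as text, ignoring case
    return field                # t: compare as plain text

lines = ["b|10", "a|9", "c|100"]
# Field 1 as ascending text, similar in spirit to -k 1,t
by_text = sorted(lines, key=lambda l: text_key(l, 1, "t"))
# Field 2 as a descending numeric string, similar in spirit to -k -2,n
by_number = sorted(lines, key=lambda l: text_key(l, 2, "n"), reverse=True)
# Field 2 as text: "10" < "100" < "9", which is why n exists
as_text = sorted(lines, key=lambda l: text_key(l, 2, "t"))
```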

-detached
Use the detached sorter. This is the default sorter option, able to cover all possible key type and order combinations,
performing parallel I/O. Requires -z.

-pipeline
Use the pipeline sorter. This option is a hint: opensort may ignore it and fall back to the -detached option. It implements
a three-stage pipeline: read, sort and write. This sorter performs parallel I/O and is optimized for multi disk/array systems,
where the temporary file resides on different physical media than the input and output files. Requires -z.

-earlyflush
Use the earlyflush sorter. This option is a hint: opensort may ignore it and fall back to the -detached option. This sorter is
optimized for RAM or CPU bound hardware and performs limited parallel I/O. It performs better in single disk/array systems
or when the temporary file resides on the same physical media as the input and output files. It combines well with the -single
option. Requires -z.

-monoblock
Use the monoblock sorter. This option is a hint: opensort may ignore it and fall back to the -detached option. With this sorter
no parallel I/O is performed unless more than one sorter is used. It targets systems with deep multithreading capabilities,
vast amounts of RAM and high storage throughput. Requires -z.

-conservative
Use the conservative prefetch policy during merge. 20% of the RAM available to opensort will be used as a microblock
prefetch heap. Requires -z.

-standard
Use the standard prefetch policy during merge. 33% of the RAM available to opensort will be used as a microblock
prefetch heap. Requires -z.

-aggressive
Use the aggressive prefetch policy during merge. Almost 50% of the RAM available to opensort will be used as a microblock
prefetch heap. Requires -z.

-mt
Use multithreaded implementations of quicksort or radix sort. This option is ignored when combined with -earlyflush.

-singletmp
Use one temporary file for all blocks. This is the default.

-multitmp
Use one temporary file for each block.

-debug
On a runtime error, print additional information for debugging. This option does not affect the program's speed.

COMMENTS
There is no default prefetch option. If none of the options -conservative, -standard, -aggressive is present, opensort will disable
prefetch completely.
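
As a rough guide to how much memory each policy sets aside, the following Python sketch applies the documented
percentages to the -m value. The names PREFETCH_FRACTION and prefetch_heap_mb are hypothetical, and the
aggressive policy is documented only as "almost 50%", so 0.50 is used here as an upper bound.

```python
# Fractions taken from the option descriptions; "aggressive" is documented
# only as "almost 50%", so 0.50 is an upper bound rather than an exact figure.
PREFETCH_FRACTION = {
    None: 0.0,             # no policy option given: prefetch is disabled
    "conservative": 0.20,
    "standard": 0.33,
    "aggressive": 0.50,
}

def prefetch_heap_mb(ram_mb=16, policy=None):
    """Approximate prefetch heap size in megabytes for a given -m value."""
    return ram_mb * PREFETCH_FRACTION[policy]
```

For example, with -m 100 the conservative policy would reserve about 20 megabytes for the prefetch heap, and
the standard policy about 33 megabytes.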

NOTES
I/O statistics for TEXT are broken.

EXAMPLES
The following example creates the sorted, concatenated output file three.dat, using the FLR files one.dat and two.dat as inputs.
There are two sort keys: a descending text key at offsets 0-9, and an ascending 64-bit signed integer key at offsets 31-38.
The record size is 78 bytes:

opensort -i one.dat -i two.dat -o three.dat -k -0,9,t -k 31,38,i64 -z 78
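
To experiment with the command above, a test input with the same layout can be generated with a short Python
script. This is only a sketch: make_record and write_sample are hypothetical helpers, the bytes outside the two
key ranges are filled with spaces, and the text key is space-padded, none of which is prescribed by opensort.

```python
import struct

RECORD_SIZE = 78  # matches -z 78

def make_record(text_key, int_key):
    """Build one 78-byte record with keys at offsets 0-9 and 31-38."""
    if len(text_key) > 10:
        raise ValueError("text key must fit in 10 bytes")
    rec = bytearray(b" " * RECORD_SIZE)
    rec[0:10] = text_key.ljust(10)           # offsets 0-9, key type t
    rec[31:39] = struct.pack("=q", int_key)  # offsets 31-38, key type i64 (native)
    return bytes(rec)

def write_sample(path, rows):
    """Write (text_key, int_key) pairs to path as fixed-length records."""
    with open(path, "wb") as out:
        for text_key, int_key in rows:
            out.write(make_record(text_key, int_key))
```

Calling write_sample("one.dat", [(b"alpha", 42), (b"beta", -7)]) would produce a 156-byte file containing two records.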

The following example creates the sorted output yourtext.txt, split into multiple files of at most 1000000 bytes each, using
the input file mytext.txt. There are two sort keys: an ascending text key in position 7, and a descending numeric
string key in position 2. Fields are delimited by the pipe character:

opensort -i mytext.txt -o yourtext.txt -v 1000000 -k 7,t -k -2,n -delim pipe

AUTHOR
Written by Lucas Tsatiris.

REPORTING BUGS
You may report bugs or request features using the project's bug tracker at sourceforge:
http://sourceforge.net/tracker/?group_id=295617

Or by email to: opensort.project@gmail.com.

COPYRIGHT
Copyright © 2009, 2010, 2011 Lucas Tsatiris. License GPLv2+: GNU GPL version 2 or later:
http://www.gnu.org/licenses/gpl-2.0.html

This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

WARNING
This software is under development and may contain serious bugs. We do not recommend it for production use.



Opensort 0.4.2 - August 2011
