Opensort

A fast, general-purpose sorting program


OPENSORT


NAME
opensort - sort text or binary files

SYNOPSIS
opensort  [OPTIONS]

DESCRIPTION
Opensort writes a sorted, concatenated output to disk, in one or more files, using all the input files.
It recognises two kinds of input files: binary fixed length record (FLR) files and delimited text (TEXT) files.
For TEXT files, the record delimiter is always the standard EOL sequence (LF or CRLF) of the operating system,
and the column/field delimiter is defined by the user.

OPTIONS
-h
Display the help screen.

-i path
The input file path. If there is more than one input file, this option may be used multiple times.
This option is mandatory and must appear at least once on the command line.

-o path
The output file path. This option is mandatory.

-t path
The temporary file path. If this option is absent, opensort will create a temporary file in the standard
temporary directory. The file is removed when the program finishes.

-m megabytes
The amount of RAM, in megabytes, opensort is allowed to use. If this option is absent, opensort will use 16
megabytes.

-b kilobytes
The I/O buffer size in kilobytes. If this option is absent, opensort will use 512 kilobytes for each FLR I/O buffer,
or 4 kilobytes for each TEXT I/O buffer. Values above 65536 are reduced to this limit. Under Linux, if
opensort has been built with the -DSPLICE flag, this option is ignored for the FLR files.

-z bytes
The record size in bytes. This option is mandatory for FLR input, implies FLR I/O and is mutually exclusive with
-delim (see below).

-delim (space, tab, column, semicolumn, pipe or comma)
Specify the delimiter character that separates the fields/columns in a TEXT line. Valid delimiter options are:
space, tab, column, semicolumn, pipe or comma. This option is mandatory for TEXT input, implies TEXT I/O and is
mutually exclusive with -z.

-v bytes
The volume size in bytes. If the size of the output file exceeds this number, multiple volumes will be
created. The minimum is 256 kilobytes (256*1024 bytes). If this option is absent, opensort will use the
maximum size allowed by the file system. No record is split across two volumes.
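
The rule that no record is split across volumes can be illustrated for FLR output with a short Python sketch. It
is not opensort's code: the function name plan_volumes is hypothetical, and it assumes that each volume holds as
many whole records as fit under the -v limit. How opensort names the volume files is not covered here.

```python
MIN_VOLUME_BYTES = 256 * 1024  # minimum -v value stated above


def plan_volumes(record_count: int, record_size: int, volume_bytes: int):
    """Return the number of records placed in each volume (hypothetical helper)."""
    if volume_bytes < MIN_VOLUME_BYTES:
        raise ValueError("volume size is below the 256 kilobyte minimum")
    # Only whole records are counted, so a record never straddles two volumes.
    per_volume = volume_bytes // record_size
    volumes = []
    remaining = record_count
    while remaining > 0:
        taken = min(per_volume, remaining)
        volumes.append(taken)
        remaining -= taken
    return volumes
```

Under these assumptions, 78-byte records and -v 1000000 give at most 12820 records (999960 bytes) per volume.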

-sorters n
The number of sorting threads. Default is 1, maximum is 8.

-directio
Use direct (unbuffered) I/O. This is a hint: opensort may ignore it and fall back to buffered I/O. Under Linux, if
opensort has been built with the -DSPLICE flag, this option is ignored.

-single
Disable parallel I/O. This option is useful when the temporary file resides on the same disk/array as the input or
the output file. Requires -z.

-statistics
Display statistics.

-k [+,-]start,end,(t,n,i8,u8,i16,u16,i32,u32,i64,u64,float,double)
Specify a sort key in the FLR. Start is the offset of the first byte of the key, counting from 0. End is the offset of the
last byte of the key, counting from 0. The optional plus or minus sign in front of the start offset sets the order of the
key: plus means ascending order and minus means descending order. If the sign is absent, plus is implied. The last,
mandatory, field gives the data type of the key and is interpreted as follows:

t:  ASCII character or string.
n:  Numeric string (String to sort according to its numerical value).
i8, i16, i32, i64: 8/16/32/64 bit native signed integer.
u8, u16, u32, u64:  8/16/32/64 bit native unsigned integer.
float: Native float (usually 32 bit)
double: Native double (usually 64 bit)

This option is mandatory and must appear at least once in the command line, if -z has been used.
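
Because both offsets are inclusive, a key given as 31,38 covers 8 bytes. The Python sketch below shows one way to
read such a key out of a record. It is an illustration, not opensort's implementation: extract_key and
NATIVE_CODES are hypothetical names, the mapping of type names to struct codes is an assumption, and the parsing
of numeric strings (type n) as floating point values is also an assumption.

```python
import struct

# Assumed mapping from the -k type names to Python struct codes.
NATIVE_CODES = {
    "i8": "b", "u8": "B", "i16": "h", "u16": "H",
    "i32": "i", "u32": "I", "i64": "q", "u64": "Q",
    "float": "f", "double": "d",
}


def extract_key(record: bytes, start: int, end: int, key_type: str):
    """Return a comparable value for the key at bytes start..end (inclusive)."""
    raw = record[start:end + 1]  # end is the offset of the LAST key byte
    if key_type == "t":
        return raw  # text keys compare as raw bytes
    if key_type == "n":
        # Assumption: a numeric string is compared by its numeric value.
        return float(raw.decode("ascii").strip())
    code = "=" + NATIVE_CODES[key_type]  # "=": native byte order, no padding
    if struct.calcsize(code) != len(raw):
        raise ValueError("key width does not match type " + key_type)
    return struct.unpack(code, raw)[0]
```

For example, extract_key(record, 31, 38, "i64") reads the 8 bytes at offsets 31 to 38 as one signed integer.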

-k [+,-]pos,(t,n)
Specify a sort key in the TEXT line. Pos is the position of the key field, as defined by the delimiter character, counting
from 1. The optional plus or minus sign in front of the position sets the order of the key: plus means ascending order
and minus means descending order. If the sign is absent, plus is implied. The last, mandatory, field gives the data type
of the key and is interpreted as follows:

t:  ASCII character or string.
n:  Numeric string (String to sort according to its numerical value).

This option is mandatory and must appear at least once in the command line, if -delim has been used.
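
The ordering that a list of TEXT keys describes can be sketched in Python as a reference for checking small outputs.
This is not opensort's algorithm: text_key and sort_text_lines are hypothetical names, numeric strings are assumed
to parse as floating point values, and the first -k on the command line is assumed to be the most significant key.

```python
def text_key(line: str, delim: str, pos: int, key_type: str):
    """Return the comparable value of field pos (counting from 1)."""
    field = line.split(delim)[pos - 1]
    # Assumption: type n compares by numeric value, type t as plain text.
    return float(field) if key_type == "n" else field


def sort_text_lines(lines, delim, keys):
    """Sort lines by keys, a list of (descending, pos, key_type) tuples.

    The first tuple is treated as the most significant key.
    """
    result = list(lines)
    # Python's sort is stable, so sorting by the least significant key
    # first and the most significant key last yields a multi-key order.
    for descending, pos, key_type in reversed(keys):
        result.sort(key=lambda ln: text_key(ln, delim, pos, key_type),
                    reverse=descending)
    return result
```

With this sketch, the keys -k 1,t -k -2,n would be passed as [(False, 1, "t"), (True, 2, "n")].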

-detached
Use the detached sorter. This is the default sorter option, able to cover all the possible key type and order combinations,
performing parallel I/O. Requires -z.

-pipeline
Use the pipeline sorter. This option is a hint. Opensort may ignore it and fall back to the -detached option. It implements
a three-stage pipeline: read, sort and write. This sorter performs parallel I/O, optimized for multi disk/array systems
where the temporary file resides on different physical media than the input and the output file. Requires -z.

-earlyflush
Use the earlyflush sorter. This option is a hint. Opensort may ignore it and fall back to the -detached option. This sorter
is optimized for RAM or CPU bound hardware and performs limited parallel I/O. It performs better on single disk/array
systems, or when the temporary file resides on the same physical media as the input and the output file. Combines well
with the -single option. Requires -z.

-monoblock
Use the monoblock sorter. This option is a hint. Opensort may ignore it and fall back to the -detached option. With this
sorter no parallel I/O is performed, unless more than one sorter is used. It targets systems with deep multithreading
capabilities, vast amounts of RAM and high storage throughput. Requires -z.

-conservative
Use the conservative prefetch policy during merge. 20% of the RAM available to opensort will be used as a microblock
prefetch heap. Requires -z.

-standard
Use the standard prefetch policy during merge. 33% of the RAM available to opensort will be used as a microblock
prefetch heap. Requires -z.

-aggressive
Use the aggressive prefetch policy during merge. Almost 50% of the RAM available to opensort will be used as a microblock
prefetch heap. Requires -z.

-mt
Use the multithreaded implementations of quicksort or radix sort. This option is ignored when combined with -earlyflush.

-debug
On a runtime error, print additional information for debugging. This option does not affect the program's speed.

COMMENTS
There is no default prefetch option. If none of the options -conservative, -standard, -aggressive is present, opensort will disable
prefetch completely.

EXAMPLES
The following example creates the sorted concatenated output file three.dat, using the FLR files one.dat and two.dat as inputs.
There are two sort keys: descending text at offsets 0-9, and an ascending 64-bit signed integer at offsets 31-38.
The record size is 78 bytes:

opensort -i one.dat -i two.dat -o three.dat -k -0,9,t -k 31,38,i64 -z 78
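
To try a command like the one above, sample FLR input is needed. The Python sketch below writes 78-byte records
with a text key at offsets 0-9 and a native 64-bit signed integer at offsets 31-38. The helper names, the space
padding of the unused bytes and the random key values are all assumptions made for illustration; they are not
part of opensort.

```python
import random
import struct

RECORD_SIZE = 78  # matches -z 78 in the example above


def make_record(text_key: str, number: int) -> bytes:
    """Build one record: text at bytes 0-9, native int64 at bytes 31-38."""
    record = bytearray(b" " * RECORD_SIZE)  # unused bytes padded with spaces
    record[0:10] = text_key.encode("ascii").ljust(10)[:10]
    record[31:39] = struct.pack("=q", number)  # "=": native byte order
    return bytes(record)


def write_flr(path: str, count: int, seed: int = 0) -> None:
    """Write count records with pseudo-random keys to path."""
    rng = random.Random(seed)  # fixed seed keeps the file reproducible
    with open(path, "wb") as out:
        for _ in range(count):
            key = "".join(rng.choice("ABCDEFGHIJ") for _ in range(10))
            out.write(make_record(key, rng.randint(-1000, 1000)))
```

Calling write_flr("one.dat", 1000) and write_flr("two.dat", 1000, seed=1) would produce two input files of
78000 bytes each.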

The following example creates the sorted output yourtext.txt, split into multiple files of at most 1000000 bytes each, using
the input file mytext.txt. Fields are delimited by the pipe character. There are two sort keys: ascending text in position 7,
and a descending numeric string in position 2:

opensort -i mytext.txt -o yourtext.txt -v 1000000 -k 7,t -k -2,n -delim pipe

AUTHOR
Written by Lucas Tsatiris.

REPORTING BUGS
You may report bugs or request features using the project's bug tracker at SourceForge:
http://sourceforge.net/tracker/?group_id=295617

Or by email to: opensort.project@gmail.com.

COPYRIGHT
Copyright © 2009, 2010, 2011 Lucas Tsatiris. License GPLv2+: GNU GPL version 2 or later:
http://www.gnu.org/licenses/gpl-2.0.html

This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.


Opensort 0.2.0  - January 2011
