186 lines
		
	
	
		
			6.8 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			186 lines
		
	
	
		
			6.8 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
# seek-bzip
 | 
						|
 | 
						|
[![Build Status][1]][2] [![dependency status][3]][4] [![dev dependency status][5]][6]
 | 
						|
 | 
						|
`seek-bzip` is a pure-javascript Node.JS module adapted from [node-bzip](https://github.com/skeggse/node-bzip) and before that [antimatter15's pure-javascript bzip2 decoder](https://github.com/antimatter15/bzip2.js).  Like these projects, `seek-bzip` only does decompression (see [compressjs](https://github.com/cscott/compressjs) if you need compression code).  Unlike those other projects, `seek-bzip` can seek to and decode single blocks from the bzip2 file.
 | 
						|
 | 
						|
`seek-bzip` primarily decodes buffers into other buffers, synchronously.
 | 
						|
With the help of the [fibers](https://github.com/laverdet/node-fibers)
 | 
						|
package, it can operate on node streams; see `test/stream.js` for an
 | 
						|
example.
 | 
						|
 | 
						|
## How to Install
 | 
						|
 | 
						|
```
 | 
						|
npm install seek-bzip
 | 
						|
```
 | 
						|
 | 
						|
This package uses
 | 
						|
[Typed Arrays](https://developer.mozilla.org/en-US/docs/JavaScript/Typed_arrays), which are present in node.js >= 0.5.5.
 | 
						|
 | 
						|
## Usage
 | 
						|
 | 
						|
After compressing some example data into `example.bz2`, the following will recreate that original data and save it to `example`:
 | 
						|
 | 
						|
```
 | 
						|
var Bunzip = require('seek-bzip');
 | 
						|
var fs = require('fs');
 | 
						|
 | 
						|
var compressedData = fs.readFileSync('example.bz2');
 | 
						|
var data = Bunzip.decode(compressedData);
 | 
						|
 | 
						|
fs.writeFileSync('example', data);
 | 
						|
```
 | 
						|
 | 
						|
See the tests in the `tests/` directory for further usage examples.
 | 
						|
 | 
						|
For uncompressing single blocks of bzip2-compressed data, you will need
 | 
						|
an out-of-band index listing the start of each bzip2 block.  (Presumably
 | 
						|
you generate this at the same time as you index the start of the information
 | 
						|
you wish to seek to inside the compressed file.)  The `seek-bzip` module
 | 
						|
has been designed to be compatible with the C implementation `seek-bzip2`
 | 
						|
available from https://bitbucket.org/james_taylor/seek-bzip2.  That codebase
 | 
						|
contains a `bzip-table` tool which will generate bzip2 block start indices.
 | 
						|
There is also a pure-JavaScript `seek-bzip-table` tool in this package's
 | 
						|
`bin` directory.
 | 
						|
 | 
						|
## Documentation
 | 
						|
 | 
						|
`require('seek-bzip')` returns a `Bunzip` object.  It contains three static
 | 
						|
methods.  The first is a function accepting one or two parameters:
 | 
						|
 | 
						|
`Bunzip.decode = function(input, [Number expectedSize] or [output], [boolean multistream])`
 | 
						|
 | 
						|
The `input` argument can be a "stream" object (which must implement the
 | 
						|
`readByte` method), or a `Buffer`.
 | 
						|
 | 
						|
If `expectedSize` is not present, `decodeBzip` simply decodes `input` and
 | 
						|
returns the resulting `Buffer`.
 | 
						|
 | 
						|
If `expectedSize` is present (and numeric), `decodeBzip` will store
 | 
						|
the results in a `Buffer` of length `expectedSize`, and throw an error
 | 
						|
in the case that the size of the decoded data does not match
 | 
						|
`expectedSize`.
 | 
						|
 | 
						|
If you pass a non-numeric second parameter, it can either be a `Buffer`
 | 
						|
object (which must be of the correct length; an error will be thrown if
 | 
						|
the size of the decoded data does not match the buffer length) or
 | 
						|
a "stream" object (which must implement a `writeByte` method).
 | 
						|
 | 
						|
The optional third `multistream` parameter, if true, attempts to continue
 | 
						|
reading past the end of the bzip2 file.  This supports "multistream"
 | 
						|
bzip2 files, which are simply multiple bzip2 files concatenated together.
 | 
						|
If this argument is true, the input stream must have an `eof` method
 | 
						|
which returns true when the end of the input has been reached.
 | 
						|
 | 
						|
The second exported method is a function accepting two or three parameters:
 | 
						|
 | 
						|
`Bunzip.decodeBlock = function(input, Number blockStartBits, [Number expectedSize] or [output])`
 | 
						|
 | 
						|
The `input` and `expectedSize`/`output` parameters are as above.
 | 
						|
The `blockStartBits` parameter gives the start of the desired block, in bits.
 | 
						|
 | 
						|
If passing a stream as the `input` parameter, it must implement the
 | 
						|
`seek` method.
 | 
						|
 | 
						|
The final exported method is a function accepting two or three parameters:
 | 
						|
 | 
						|
`Bunzip.table = function(input, Function callback, [boolean multistream])`
 | 
						|
 | 
						|
The `input` and `multistream` parameters are identical to those for the
 | 
						|
`decode` method.
 | 
						|
 | 
						|
This function will invoke `callback(position, size)` once per bzip2 block,
 | 
						|
where `position` gives the starting position of the block (in *bits*), and
 | 
						|
`size` gives the uncompressed size of the block (in bytes).
 | 
						|
 | 
						|
This can be used to construct an index allowing direct access to a particular
 | 
						|
block inside a bzip2 file, using the `decodeBlock` method.
 | 
						|
 | 
						|
## Command-line
 | 
						|
There are binaries available in bin.  The first generates an index of all
 | 
						|
the blocks in a bzip2-compressed file:
 | 
						|
```
 | 
						|
$ bin/seek-bzip-table test/sample4.bz2
 | 
						|
32	99981
 | 
						|
320555	99981
 | 
						|
606348	99981
 | 
						|
847568	99981
 | 
						|
1089094	99981
 | 
						|
1343625	99981
 | 
						|
1596228	99981
 | 
						|
1843336	99981
 | 
						|
2090919	99981
 | 
						|
2342106	39019
 | 
						|
$
 | 
						|
```
 | 
						|
The first field is the starting position of the block, in bits, and the
 | 
						|
second field is the length of the block, in bytes.
 | 
						|
 | 
						|
The second binary decodes an arbitrary block of a bzip2 file:
 | 
						|
```
 | 
						|
$ bin/seek-bunzip -d -b 2342106 test/sample4.bz2 | tail
 | 
						|
élan's
 | 
						|
émigré
 | 
						|
émigré's
 | 
						|
émigrés
 | 
						|
épée
 | 
						|
épée's
 | 
						|
épées
 | 
						|
étude
 | 
						|
étude's
 | 
						|
études
 | 
						|
$
 | 
						|
```
 | 
						|
 | 
						|
Use `--help` to see other options.
 | 
						|
 | 
						|
## Help wanted
 | 
						|
 | 
						|
Improvements to this module would be generally useful.
 | 
						|
Feel free to fork on github and submit pull requests!
 | 
						|
 | 
						|
## Related projects
 | 
						|
 | 
						|
* https://github.com/skeggse/node-bzip node-bzip (original upstream source)
 | 
						|
* https://github.com/cscott/compressjs
 | 
						|
  Lots of compression/decompression algorithms from the same author as this
 | 
						|
  module, including bzip2 compression code.
 | 
						|
* https://github.com/cscott/lzjb fast LZJB compression/decompression
 | 
						|
 | 
						|
## License
 | 
						|
 | 
						|
#### MIT License
 | 
						|
 | 
						|
> Copyright © 2013-2015 C. Scott Ananian
 | 
						|
>
 | 
						|
> Copyright © 2012-2015 Eli Skeggs
 | 
						|
>
 | 
						|
> Copyright © 2011 Kevin Kwok
 | 
						|
>
 | 
						|
> Permission is hereby granted, free of charge, to any person obtaining
 | 
						|
> a copy of this software and associated documentation files (the
 | 
						|
> "Software"), to deal in the Software without restriction, including
 | 
						|
> without limitation the rights to use, copy, modify, merge, publish,
 | 
						|
> distribute, sublicense, and/or sell copies of the Software, and to
 | 
						|
> permit persons to whom the Software is furnished to do so, subject to
 | 
						|
> the following conditions:
 | 
						|
>
 | 
						|
> The above copyright notice and this permission notice shall be
 | 
						|
> included in all copies or substantial portions of the Software.
 | 
						|
>
 | 
						|
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 | 
						|
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 | 
						|
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 | 
						|
> NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
 | 
						|
> LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
 | 
						|
> OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
 | 
						|
> WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 | 
						|
 | 
						|
[1]: https://travis-ci.org/cscott/seek-bzip.png
 | 
						|
[2]: https://travis-ci.org/cscott/seek-bzip
 | 
						|
[3]: https://david-dm.org/cscott/seek-bzip.png
 | 
						|
[4]: https://david-dm.org/cscott/seek-bzip
 | 
						|
[5]: https://david-dm.org/cscott/seek-bzip/dev-status.png
 | 
						|
[6]: https://david-dm.org/cscott/seek-bzip#info=devDependencies
 |