Crate kitab


kitab is a CLI tool for managing backups of media metadata, primarily intended for bibliographical sources.

The tool can recursively apply metadata as extended attributes to all files under a filesystem location whose digests match the keys of the stored metadata records.

Metadata can also be imported from those same extended attributes, from files containing bibtex entries, and from files in kitab’s native store format.
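The matching behind the apply step can be pictured as a recursive walk that digests each regular file and looks the resulting hex up in the store. The sketch below is a minimal illustration under stated assumptions, not kitab’s code: the store is modelled as an in-memory HashMap from lowercase digest hex to a record string, and the digest function is supplied by the caller (a real build would compute SHA512 or SHA256). The name `collect_matches` and its parameters are hypothetical.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively walk `dir`, digest every regular file with `digest`, and
/// collect (path, record) pairs for files whose digest hex is in `store`.
/// Hypothetical helper: `store` maps lowercase digest hex to a record.
fn collect_matches<F>(
    dir: &Path,
    store: &HashMap<String, String>,
    digest: &F,
    out: &mut Vec<(PathBuf, String)>,
) -> io::Result<()>
where
    F: Fn(&Path) -> io::Result<String>,
{
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // file_type() does not follow symlinks, so symlinked directories
        // are never descended into (this avoids cycles).
        let file_type = entry.file_type()?;
        let path = entry.path();
        if file_type.is_dir() {
            collect_matches(&path, store, digest, out)?;
        } else if file_type.is_file() {
            // Store keys are lowercase hex, so normalise before lookup.
            let hex = digest(&path)?.to_lowercase();
            if let Some(record) = store.get(&hex) {
                out.push((path, record.clone()));
            }
        }
    }
    Ok(())
}
```

Passing the digest function in keeps the walk independent of which hash is used, which fits a tool that accepts more than one digest type.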

Usage examples

# import rdf-turtle entries from a file into the store
$ kitab import source.ttl

# import bibtex entries from a file into the store
$ kitab import source.bib

# import entries from any valid source under the given path
$ kitab import /path/to/metadata_and_or_media_files

# apply metadata to files whose digests match entries in the store
$ kitab apply /path/to/media_files
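The import examples take either a single file or a directory. How kitab tells the sources apart is not documented on this page; the sketch below assumes, purely for illustration, that a directory is walked, that the extensions .ttl and .bib select the turtle and bibtex parsers, and that any other file is read for extended attributes. The `ImportSource` enum and `classify` function are hypothetical.

```rust
use std::path::Path;

/// Hypothetical classification of an `import` argument; the extension
/// rules are an assumption, not kitab's documented behaviour.
#[derive(Debug, PartialEq)]
enum ImportSource {
    Turtle,    // native rdf-turtle store format (.ttl)
    Bibtex,    // bibtex entries (.bib)
    Directory, // walk recursively and import every valid source found
    Xattr,     // any other file: try its extended attributes
}

fn classify(path: &Path) -> ImportSource {
    if path.is_dir() {
        return ImportSource::Directory;
    }
    match path.extension().and_then(|e| e.to_str()) {
        Some("ttl") => ImportSource::Turtle,
        Some("bib") => ImportSource::Bibtex,
        _ => ImportSource::Xattr,
    }
}
```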

Native store format

The native data format is rdf-turtle, currently limited to a subset of the Dublin Core vocabulary.

The subject of every entry is a URN specifying the digest of the matching file, in the following format (the digest hex is for illustration only):

<URN:sha256:2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae> predicate object
[...]
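A subject URN therefore carries two things: the digest algorithm and the digest hex. The sketch below splits such a URN and checks that the hex length fits the algorithm (64 hex characters for SHA256, 128 for SHA512, the two digests listed as supported). It is an illustrative parser, not kitab’s own; the function name `parse_urn` is hypothetical.

```rust
/// Split a subject URN such as `<URN:sha256:HEX>` into (algorithm, hex).
/// Sketch only: accepts sha256 and sha512 and validates the hex length.
fn parse_urn(urn: &str) -> Result<(&str, String), String> {
    // Tolerate the angle brackets used around IRIs in turtle.
    let urn = urn.trim_start_matches('<').trim_end_matches('>');
    let mut parts = urn.splitn(3, ':');
    let (scheme, algo, hex) = match (parts.next(), parts.next(), parts.next()) {
        (Some(s), Some(a), Some(h)) => (s, a, h),
        _ => return Err(format!("not a digest URN: {urn}")),
    };
    if !scheme.eq_ignore_ascii_case("urn") {
        return Err(format!("unexpected scheme: {scheme}"));
    }
    let expected_len = match algo {
        "sha256" => 64,  // 32-byte digest
        "sha512" => 128, // 64-byte digest
        other => return Err(format!("unsupported digest: {other}")),
    };
    if hex.len() != expected_len || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("bad {algo} hex: {hex}"));
    }
    // Store keys use lowercase hex.
    Ok((algo, hex.to_lowercase()))
}
```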

Please forgive the lack of a schema describing the data; one will follow.

Store location

Metadata files are stored under ~/.local/share/kitab/idx/<hex>, where <hex> is the lowercase digest hex from the record’s subject URN.
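Resolving a record’s file from its digest is then a matter of path joining. In the sketch below the home directory is passed in explicitly so the function stays pure; a real tool would read it from the environment. The function name `store_path` is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Build `<home>/.local/share/kitab/idx/<hex>` for a digest hex,
/// lowercasing the hex as the store layout requires.
fn store_path(home: &Path, digest_hex: &str) -> PathBuf {
    home.join(".local/share/kitab/idx")
        .join(digest_hex.to_ascii_lowercase())
}
```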

Supported digests

  • SHA512 (native)
  • SHA256

Metadata imported from extended attributes uses the file’s SHA512 digest as the storage key.

Example

The Rust crate author’s PDF copy of the Bitcoin whitepaper has the SHA256 digest b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553.

The rdf-turtle record for this document could be:

@prefix dcterms: <https://purl.org/dc/terms/> .
@prefix dcmi: <https://purl.org/dc/dcmi/> .

<URN:sha256:b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553>
    dcterms:title "Bitcoin: A Peer-to-Peer Electronic Cash System" ;
    dcterms:subject "bitcoin,cryptocurrency,cryptography" ;
    dcterms:creator "Satoshi Nakamoto" ;
    dcterms:type "article" ;
    dcterms:MediaType "application/pdf" ;
    dcterms:language "en" .
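Records in this shape are simple to emit: a URN subject, then one `predicate "literal"` pair per line, separated by ` ;` and closed with ` .`. The sketch below renders such a record from a list of predicate/value pairs. It is an assumption-laden illustration: it omits the @prefix lines, expects at least one predicate, and escapes only backslash, double quote and newline in literals. The function name `to_turtle` is hypothetical.

```rust
/// Render one record in the turtle layout shown above.
/// Sketch only: no @prefix lines; expects at least one predicate.
fn to_turtle(algo: &str, hex: &str, preds: &[(&str, &str)]) -> String {
    let mut out = format!("<URN:{algo}:{hex}>");
    for (i, (pred, value)) in preds.iter().enumerate() {
        // Escape backslash first, then quote and newline, for a short string literal.
        let escaped = value
            .replace('\\', "\\\\")
            .replace('"', "\\\"")
            .replace('\n', "\\n");
        // The last predicate ends the statement; the others continue it.
        let sep = if i + 1 == preds.len() { " ." } else { " ;" };
        out.push_str(&format!("\n    {pred} \"{escaped}\"{sep}"));
    }
    out
}
```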

After applying the metadata to the document itself, the extended attributes could look like this:

$ getfattr -d pub/papers/bitcoin.pdf
user.dcterms:creator="Satoshi Nakamoto"
user.dcterms:language="en"
user.dcterms:subject="bitcoin,cryptocurrency,cryptography"
user.dcterms:title="Bitcoin: A Peer-to-Peer Electronic Cash System"
user.dcterms:type="article"
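The listing shows the naming scheme: each predicate becomes an attribute in the `user.` namespace, the namespace Linux lets unprivileged processes write. The sketch below only computes the attribute names and byte values; actually setting them needs the setxattr(2) system call (for example through a binding crate), which is deliberately left out. The function name `to_xattrs` is hypothetical.

```rust
/// Map record predicates to `user.`-namespaced extended attribute names,
/// e.g. `dcterms:title` becomes `user.dcterms:title`. Values are returned
/// as bytes, ready for a setxattr-style call (not performed here).
fn to_xattrs(preds: &[(&str, &str)]) -> Vec<(String, Vec<u8>)> {
    preds
        .iter()
        .map(|(pred, value)| (format!("user.{pred}"), value.as_bytes().to_vec()))
        .collect()
}
```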

Optional: File magic

If built with the magic feature, kitab attempts to determine the media type of each file and includes the dcterms:MediaType predicate accordingly.

Without the magic feature, dcterms:MediaType is not included in the metadata record.
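An optional Cargo feature like this is typically handled with conditional compilation. The sketch below shows that shape under assumptions: the feature name magic comes from the text, but the detection is a stand-in that checks two well-known leading byte signatures (`%PDF-` for PDF, `PK\x03\x04` for ZIP-based files) rather than calling whatever library the real feature uses (presumably libmagic). All function names are hypothetical.

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Stand-in for a libmagic lookup: recognise a few formats by leading bytes.
fn sniff_media_type(header: &[u8]) -> Option<&'static str> {
    if header.starts_with(b"%PDF-") {
        Some("application/pdf")
    } else if header.starts_with(b"PK\x03\x04") {
        Some("application/zip")
    } else {
        None
    }
}

/// With the `magic` feature enabled, inspect the file header; without it,
/// report nothing so that no media type ends up in the record.
fn media_type(path: &Path) -> Option<&'static str> {
    if cfg!(feature = "magic") {
        let mut header = [0u8; 8];
        let n = File::open(path).ok()?.read(&mut header).ok()?;
        sniff_media_type(&header[..n])
    } else {
        None
    }
}

/// Push dcterms:MediaType only when a type was actually detected.
fn add_media_type(preds: &mut Vec<(String, String)>, path: &Path) {
    if let Some(mt) = media_type(path) {
        preds.push(("dcterms:MediaType".to_string(), mt.to_string()));
    }
}
```

Using `cfg!` rather than `#[cfg]` keeps both branches type-checked in every build; a real implementation that links an optional dependency would gate that code with `#[cfg(feature = "magic")]` instead.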

Debugging

kitab uses env_logger. Set the log level with the RUST_LOG environment variable (for example RUST_LOG=debug) to see what the tool is doing while it runs.

Caveats

For now, only Linux is supported.

Modules