LightMotif

A lightweight platform-accelerated library for biological motif scanning using position weight matrices.

Overview

Motif scanning with position weight matrices (also known as position-specific scoring matrices) is a robust method for identifying motifs of fixed length inside a biological sequence. It can be used to identify transcription factor binding sites in DNA, or protease cleavage sites in polypeptides. Position weight matrices are often viewed as sequence logos:

https://raw.githubusercontent.com/althonos/lightmotif/main/docs/_static/prodoric_logo_mx000274.svg
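To make the scoring scheme concrete, here is a minimal pure-Python sketch of scanning a sequence with a position weight matrix. It does not use the lightmotif API: the helper names (`log_odds`, `scan`), the toy motif, the pseudocount of 0.1 and the uniform background of 0.25 are all assumptions chosen for illustration.

```python
import math

def log_odds(counts, pseudocount=0.1, background=0.25):
    """Convert per-position symbol counts into base-2 log-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        matrix.append({
            symbol: math.log2(((n + pseudocount) / total) / background)
            for symbol, n in column.items()
        })
    return matrix

def scan(sequence, matrix):
    """Score every window of motif length along the sequence."""
    width = len(matrix)
    return [
        sum(matrix[j][sequence[i + j]] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]

# Hypothetical toy motif of length 3, built from four aligned sites.
sites = ["TAC", "TAC", "TAG", "AAC"]
counts = [
    {symbol: sum(site[j] == symbol for site in sites) for symbol in "ACGT"}
    for j in range(3)
]

pssm = log_odds(counts)
scores = scan("GGTACGT", pssm)

# Index of the highest-scoring window in the sequence.
best = max(range(len(scores)), key=scores.__getitem__)
print(best, round(scores[best], 3))
```

Each window costs one table look-up per motif position, which is the work that the techniques listed further down aim to speed up.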

The lightmotif library provides a Python module to run very efficient searches for a motif encoded in a position weight matrix. The scanning implementation combines several techniques to allow high-throughput processing of sequences:

  • Compile-time definition of alphabets and matrix dimensions.

  • Sequence symbol encoding for fast table look-ups, as implemented in HMMER or MEME.

  • Striped sequence matrices to process several positions in parallel, inspired by Michael Farrar's striped Smith-Waterman implementation.

  • Vectorized matrix row look-up using permute instructions of AVX2.
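The encoding and striping ideas above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's actual layout or API: the five-symbol alphabet with N used as padding, the lane count of 4, the integer toy weights and all function names are hypothetical. The innermost loop stands in for what a SIMD instruction would do across all lanes at once.

```python
import math

ALPHABET = "ACGTN"  # assumed alphabet; N doubles as the padding symbol
CODE = {symbol: i for i, symbol in enumerate(ALPHABET)}

def encode(sequence):
    """Map each symbol to a small integer usable as a table index."""
    return [CODE[symbol] for symbol in sequence]

def stripe(encoded, lanes, wrap):
    """Lay the sequence out column by column, so that each row holds
    positions r, r + rows, r + 2 * rows, ... The `wrap` extra rows repeat
    the start of the next column so windows can cross column boundaries."""
    rows = math.ceil(len(encoded) / lanes)

    def at(pos):
        return encoded[pos] if pos < len(encoded) else CODE["N"]

    matrix = [[at(c * rows + r) for c in range(lanes)]
              for r in range(rows + wrap)]
    return matrix, rows

def score_striped(matrix, rows, weights):
    """Accumulate scores one row at a time; each row covers `lanes` positions."""
    lanes = len(matrix[0])
    scores = [[0] * lanes for _ in range(rows)]
    for r in range(rows):
        for j, column in enumerate(weights):
            source = matrix[r + j]
            # One look-up per lane: the part a permute instruction vectorizes.
            for c in range(lanes):
                scores[r][c] += column[source[c]]
    return scores

# Toy weights: one row per motif position, one entry per symbol code.
weights = [
    [2, -1, -1, -1, 0],
    [-1, 2, -1, -1, 0],
    [-1, -1, 2, -1, 0],
]

encoded = encode("ATGTCCCAACAACGATACCCGA")
matrix, rows = stripe(encoded, lanes=4, wrap=len(weights) - 1)
striped = score_striped(matrix, rows, weights)

# Position p lives at row p % rows, column p // rows.
valid = len(encoded) - len(weights) + 1
striped_scores = [striped[p % rows][p // rows] for p in range(valid)]

# Reference: plain left-to-right scan over the encoded sequence.
naive_scores = [
    sum(weights[j][encoded[p + j]] for j in range(len(weights)))
    for p in range(valid)
]
print(striped_scores == naive_scores)
```

Because every row groups positions that are a full column apart, a row of scores depends only on a handful of consecutive rows of the striped matrix, which is what makes the layout a good fit for vector registers.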

This is the Python version; a Rust crate is available as well.

Setup

Run pip install lightmotif in a shell to download the latest release and all its dependencies from PyPI, or have a look at the Installation page to find other ways to install the lightmotif Python package.

Library

License

This library is provided under the open-source MIT license.

This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.