New SpectrogramDataset package

Released a package that converts folders of audio files into PyTorch datasets of spectrograms, labels, and masks, ready for training.

GitHub Repo

JaVAD: Just Another Voice Activity Detector

Released JaVAD (Just Another Voice Activity Detector), a state-of-the-art open-source VAD package that is faster than every open-source package I tested it against.

GitHub Repo

JADIA: Dialogue diarization package

JANET (Directional Alignment)-based diarization package, v0.1.1. The new lite model (lite_v2) improves Diarization Error Rate by 20%.

GitHub Repo

VIDEO

Triplet Loss

VIDEO

Contrastive loss(es)

JADIA-Plot: visualization module for JADIA

A simple way to visualize JADIA results.

pip install jadia-plot

JADIA: Dialogue diarization package

JANET (Directional Alignment)-based diarization package, v0.1. Uses K-means; works best on dialogues between two speakers.

GitHub Repo
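
To illustrate the K-means step mentioned above, here is a minimal pure-Python sketch that splits per-segment embeddings into two speaker clusters. It is only an illustration under stated assumptions: the function name `kmeans_two_speakers`, the input layout (one float vector per audio segment), and the initialization are hypothetical, not JADIA's actual code.

```python
import math


def kmeans_two_speakers(embeddings, iterations=20):
    """Cluster per-segment embeddings into two groups with plain k-means.

    Assumption: `embeddings` is a list of equal-length float vectors, one
    per audio segment; JADIA's real features and settings may differ.
    """
    # Deterministic init: the first vector, plus the vector farthest from it.
    first = embeddings[0]
    second = max(embeddings, key=lambda e: math.dist(e, first))
    centroids = [list(first), list(second)]
    labels = [0] * len(embeddings)

    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min((0, 1), key=lambda k: math.dist(e, centroids[k]))
            for e in embeddings
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in (0, 1):
            members = [e for e, lab in zip(embeddings, new_labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

Each returned label (0 or 1) stands for one of the two speakers. Fixing the cluster count at two is what makes this approach a natural fit for two-person dialogues.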

VIDEO

Cosine distance (and Cosine similarity)

TQDM wrapper & custom progress bar

Tired of dealing with TQDM logging issues, I created a TQDM wrapper for Comet ML and W&B. It detects whether the W&B or Comet ML module is active and replaces the TQDM logger with a custom logger that leaves long intervals between updates.

GitHub repo
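
The idea described above can be sketched in a few lines of standard-library Python. This is a rough illustration, not the package's code: `experiment_tracker_present`, `ThrottledProgress`, and `progress` are hypothetical names, and "module is up" is approximated here by "module has been imported".

```python
import sys
import time


def experiment_tracker_present(names=("wandb", "comet_ml")):
    """Return True if any of the named modules has already been imported.

    Assumption: an imported tracker module counts as "up"; the real
    package may instead check for an active run.
    """
    return any(name in sys.modules for name in names)


class ThrottledProgress:
    """Hypothetical progress logger that reports only every `min_interval` seconds."""

    def __init__(self, iterable, min_interval=30.0, log=print, clock=time.monotonic):
        self.iterable = iterable
        self.min_interval = min_interval
        self.log = log
        self.clock = clock

    def __iter__(self):
        last = self.clock()
        count = 0
        for count, item in enumerate(self.iterable, start=1):
            yield item
            now = self.clock()
            # Emit a line only when enough time has passed since the last one.
            if now - last >= self.min_interval:
                self.log(f"processed {count} items")
                last = now
        self.log(f"done: {count} items")


def progress(iterable, **kwargs):
    """Use the throttled logger when a tracker is loaded, else plain tqdm."""
    if experiment_tracker_present():
        return ThrottledProgress(iterable, **kwargs)
    try:
        from tqdm import tqdm  # optional dependency
    except ImportError:
        return iterable
    return tqdm(iterable)
```

Writing `for batch in progress(loader):` then keeps the training loop unchanged, while the tracker's captured console output receives a few plain lines instead of a stream of bar redraws.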

VIDEO

Hinton's Dynamic Routing between capsules

VIDEO

What are queries, keys and values?

Whisper Fine-tuning script

A simple Whisper trainer, directly compatible with the OpenAI repo (no Hugging Face libraries needed).

GitHub repo

VIDEO

How does positional encoding work in transformers?

VIDEO

Why do we need biases

VIDEO

Why do we need activation functions

Directional Alignment layer

Repo for a custom Directional Alignment layer, created to extract features from speech spectrograms better than Conv1d/Conv2d ResNet-like stacks.

GitHub repo

Forward-Forward

Custom implementation of Geoffrey Hinton's Forward-Forward algorithm with a rather stupid but working loss.

GitHub repo

Tiny custom PDF chat

Custom FastAPI/uvicorn microservice that processes PDFs and extracts information using ChatGPT.

GitHub repo

PAPER

Improving Speaker Verification by introducing Alignment layer

A custom Alignment layer that improves feature extraction for speaker verification.

Link

Fixing Wide ResNet

Correct implementation of Wide ResNets (compared to others found on GitHub).

Top-1 accuracy on CIFAR-10 for WRN 28x10, dropout=0.3

Augmentation strategy                         Top-1 accuracy
Padding 4 & crop, random horizontal flip      95.77%
+ Random Erase (RE)                           96.84%
+ RE + RandAugment (RA)                       97.27%
+ RE + RA + CutMix                            97.28%