vessenes an hour ago

Nice! Can any signals/AI folks comment on whether using this would improve vocoder outputs? The visuals look much higher res, which makes me think a vocoder using them would have more nuance. But, I'm a hobbyist.

colanderman 3 hours ago

Nice! I've used a homegrown CQT-based visualizer for a while for audio analysis. It's far superior to the STFT-based view you get from e.g. Audacity, since it is multiresolution, which is a better match to how we actually experience sound. I have for a while wanted to switch my tool to a gammatone-filter-based method [1] but I didn't know how to make it efficient.

Actually I wonder if this technique can be adapted to use gammatone filters specifically, rather than simple bandpass filters.

[1] https://en.wikipedia.org/wiki/Gammatone_filter

  • mofeien 29 minutes ago

    If you already have the implementation for the CQT, wouldn't you just be able to replace the Morlet wavelet used in the CQT with the gammatone wavelet without much of an efficiency hit? I'm just learning about the gammatone filter, and it sounds interesting since it apparently better models human hearing.

Mn7cB_3kL 3 hours ago

This project shows how visualizing sound in 3D can aid musical understanding. The combination of spectrograms, fundamental tracking, and 3D representations creates a beautiful window into sound physics. Would love to see this extended to compare multiple instruments or complex orchestrations.

  • zipy124 15 minutes ago

    Is this an AI comment? Where in this work does it talk about 3D anything?

james_a_craig 7 hours ago

For some reason the value of Pi given in the C++ code is wrong!

It's given in the source as 3.14159274101257324219 when the right value to the same number of digits is 3.14159265358979323846. Very weird. I noticed when I went to look at the C++ to see how this algorithm was actually implemented.

https://github.com/alexandrefrancois/noFFT/blob/main/src/Res... line 31.

  • 2YwaZHXV 7 hours ago

    It seems that since it's a float, it's only 32 bits, and the representation of both 3.14159274101257324219 and 3.14159265358979323846 is the same in IEEE-754: 0x40490fdb

    though I agree that it is odd to see, and not sure I see a reason why they wouldn't use 3.14159265358979323846
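
A quick way to check this is to round both literals to single precision and compare the bit patterns. This is a small sketch using Python's `struct` module, not the project's C++ code; the helper name `float32_bits` is made up for illustration.

```python
import struct

def float32_bits(value):
    """Round `value` to IEEE-754 single precision and return its 32-bit pattern."""
    # pack as a big-endian float32, then reinterpret the same 4 bytes as an unsigned int
    return struct.unpack(">I", struct.pack(">f", value))[0]

source_pi = 3.14159274101257324219    # literal quoted from the C++ source
textbook_pi = 3.14159265358979323846  # pi to the same number of digits

print(hex(float32_bits(source_pi)))    # 0x40490fdb
print(hex(float32_bits(textbook_pi)))  # 0x40490fdb
```

Both literals land on the same float, so the compiled value would be identical either way; the difference only shows up if the constant is ever widened to a double.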

    • james_a_craig 7 hours ago

      Yeah, it’s as if they wrote a program to calculate pi in a float and saved the output. Very strange choice given how many places the value of pi can be found.

      • arjf 6 hours ago

        Indeed... I honestly don't remember where or how I sourced the value, and why I did not use the "correct" one - I will correct in the next release of the package. Thanks for pointing it out!

  • pvg 7 hours ago

    That is a very 'childhood exposure to 8 digit calculators' thing to notice.

    • james_a_craig 7 hours ago

      Childhood exposure to pi generation algorithms; the correct version above was from memory.

      • pvg 6 hours ago

        Close enough! The wrong 7 jumped out at me instantly although I didn't remember more than a few after.

zevv 5 hours ago

I might be mistaken, but I don't see how this is novel. As far as I know, this has been a proven DSP technique for ages, although it is usually only applied when a small number of distinct frequencies need to be detected - for example DTMF.

When the number of frequencies/bins grows, it is computationally much cheaper to use the well known FFT algorithm instead, at the price of needing to handle input data by blocks instead of "streaming".
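
One well-known detector of this kind is the Goertzel algorithm. It is named here only as an illustration, since the comment above does not say which method it has in mind. A minimal sketch in Python, assuming an 8 kHz sample rate and 205-sample blocks (both illustrative choices, as are the function names):

```python
import math

DTMF_LOW = (697, 770, 852, 941)       # row frequencies in Hz
DTMF_HIGH = (1209, 1336, 1477, 1633)  # column frequencies in Hz

def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of `samples` at `target_hz`, via the Goertzel recurrence."""
    w = 2.0 * math.pi * target_hz / sample_rate
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # second-order recurrence: one multiply and two adds per sample
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def detect_dtmf_pair(samples, sample_rate):
    """Return the strongest (low, high) DTMF frequency pair in the block."""
    low = max(DTMF_LOW, key=lambda f: goertzel_power(samples, sample_rate, f))
    high = max(DTMF_HIGH, key=lambda f: goertzel_power(samples, sample_rate, f))
    return low, high

# Synthetic test tone for the key "5" (770 Hz + 1336 Hz)
fs, n = 8000, 205
tone = [math.sin(2 * math.pi * 770 * i / fs) + math.sin(2 * math.pi * 1336 * i / fs)
        for i in range(n)]
print(detect_dtmf_pair(tone, fs))  # (770, 1336)
```

Each detector costs O(N) per block, so K detectors cost O(K·N), while an FFT gives all N bins for O(N log N). That is the trade-off described above: cheap for eight DTMF tones, expensive for hundreds of bins.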

  • colanderman 4 hours ago

    The difference from FFT is this is a multiresolution technique, like the constant-Q transform. And, unlike CQT (which is noncausal), this provides a better match to the actual behavior of our ears (by being causal). It's also "fast" in the sense of FFT (which CQT is not).

    • zipy124 15 minutes ago

      There exists the multiresolution FFT, and other forms of FFT which are based around sliding windows/SFFT techniques. CQT can also be implemented extremely quickly, utilising FFTs and kernels or other methods, like in the librosa library (dubbed pseudo-CQT).

      I'm also not sure how this is causal? It has a weighted-time window (biasing the more recent sound), which is fairly novel, but I wouldn't call that causal.

      This is not to say I don't think this is cool, it certainly looks better than existing techniques like synchrosqueezing for pushing the limit of the Heisenberg uncertainty principle (technically given ideal conditions synchrosqueezing can outperform the principle, but only for a specific subset of signals).

waffletower 5 hours ago

Curious if there is available math to show the gain scale properties of this technique across the spectrum -- in other words its frequency response. The system doesn't appear to be LTI so I don't believe we can utilize the Z-transform to do this. Phase response would also be important.

phkahler 6 hours ago

This is very much like doing a Fourier Transform without using recursion and the butterflies to reduce the computation. It would be even closer to that if a "moving average" of the right length was used instead of an IIR low-pass filter. This is something I've considered superior for decades but it does take a lot more computation. I guess we're there now ;-)
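
To make the comparison concrete, here is a rough Python sketch of the two smoothing choices for a single frequency. It assumes the technique amounts to multiplying the input by a complex exponential and low-pass filtering the product. That is one reading of the description above, not the project's actual code, and the function names and the `alpha` parameter are made up for illustration.

```python
import cmath
import math

def ema_resonator(samples, sample_rate, freq_hz, alpha):
    """Heterodyne to `freq_hz`, then smooth with a one-pole IIR (exponential moving average)."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    z = 0j
    for n, x in enumerate(samples):
        # recent samples weigh more; older ones decay by (1 - alpha) each step
        z = (1.0 - alpha) * z + alpha * x * cmath.exp(-1j * w * n)
    return z

def moving_average_bin(samples, sample_rate, freq_hz, window):
    """Heterodyne to `freq_hz`, then average the last `window` samples with equal weight."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    start = len(samples) - window
    acc = 0j
    for n in range(start, len(samples)):
        acc += samples[n] * cmath.exp(-1j * w * n)
    return acc / window

# Unit-amplitude 1 kHz cosine sampled at 8 kHz
fs = 8000
signal = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(2000)]

print(abs(ema_resonator(signal, fs, 1000, alpha=0.01)))
print(abs(moving_average_bin(signal, fs, 1000, window=80)))
```

For a matched tone both magnitudes settle near 0.5, half the cosine's amplitude. The moving-average version is a single DFT bin over a rectangular window (scaled by 1/window), while the IIR version keeps only one complex state per frequency and has no window buffer.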