Fast filtering in Octave / Matlab by vectorizing

I can see your mind is made up. Good day.
Well, you wanted to know the (lack of) features that make JavaScript bad. In any case, JavaScript's meagre support for multithreading - locking, wait states, mutexes, semaphores, barriers, etc. - IMHO disqualifies it from being a proper, modern language.

Heck, Prof. Dijkstra was talking about semaphores and related concepts back in the 1950s and '60s. All the usual programming languages support these features, and JavaScript missing out on them rather makes it not a proper programming language.

--
Dj Joofa
http://www.djjoofa.com
 
Panning JavaScript as terrible isn't exactly a balanced perspective. Whether you like it or not, it is, by an order of magnitude, the most popular programming language.
Wouldn't that be Excel? :-)
Excel is a data analysis tool, not a programming language :)
It's got syntax. You can write very complex programs.

I once wrote a complete Monte Carlo financial model for a school, with 20-year projections and lots of what-ifs. I did use a plug-in from these folks to make the job easier.

It's very easy to write Excel code that runs but produces wrong answers, and it's often difficult to find the bugs. Another problem is that most Excel users don't think of what they are doing as programming, and so they test their code inadequately.

http://www.forbes.com/sites/timwors...angerous-software-on-the-planet/#2ced7e2372ae

Is Scratch a programming language? I think it is.

Jim
 
I wrote a JPEG decoder for MATLAB that took 70-ish seconds for a 1-megapixel image. Some tasks are inherently difficult to implement efficiently in a matrix-algebra/JIT/... oriented framework. Mathworks have done many such things in C/Fortran and exposed them as native MATLAB functions (often themselves depending on 3rd-party libraries). You can do the same using the mex wrapping.
Could not agree more. My 2 cents on this discussion of interpreted languages centered around easily vectorized data structures (MATLAB/Octave, but also Python with NumPy and such):

1. Sometimes the matrix-style algorithm is inherently more difficult to express, requiring contorted matrix-style code to express something which is trivial to express as a simple for loop. We have many such examples in my line of work: Satellite images often have "nodata" or fill values, e.g., clouds are often masked out when you are interested in the land surface. We have had many examples where a Python+NumPy implementation required the explicit creation of a mask array to allow matrix operations to correctly ignore the nodata values. Once the masking operations become more involved, the Python code rapidly becomes less readable, thereby losing its main perceived advantage. Dealing with large images (often 20k x 20k pixels) requires massive mask arrays in NumPy, which places an unnecessary load on the OS to allocate and manage, and chews up usable memory bandwidth needlessly. Many of these examples were subsequently converted to C++, which not only produced much easier to read code (the mask operation reduced to a simple "if" statement inside a simple "for" loop), but yielded a worthwhile speed-up over the NumPy implementation, probably because of the decreased load on memory bandwidth.
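
As a minimal C++ sketch of the plain-loop approach described above (the NODATA sentinel value and the mean reduction are made-up placeholders, not code from the projects mentioned):

#include <cstddef>
#include <vector>

// Nodata pixels are skipped with a simple "if" inside a simple "for";
// no separate mask array has to be allocated or streamed through memory.
constexpr float NODATA = -9999.0f;   // hypothetical fill value

float mean_ignoring_nodata(const std::vector<float>& img) {
    double sum = 0.0;
    std::size_t count = 0;
    for (float v : img) {
        if (v != NODATA) {
            sum += v;
            ++count;
        }
    }
    return count ? static_cast<float>(sum / count) : NODATA;
}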

2. Conditional operations in a matrix-style language are inherently harder to express, similar to using SIMD operations. You typically have to calculate both branches of the conditional, and then use a mask to select the desired result. In many cases, this is not a problem, and many conditional operations can be accelerated nicely with SIMD. But not all of them. Looking at the assembler generated by g++, I have seen that simple conditionals often reduce to a conditional move (CMOV) instruction, which takes only about 5 clock cycles on modern processors (excluding the cost of the conditional test, which is only about 2 cycles, iirc). Since the conditional move instruction does not count as a jump, there do not appear to be any significant pipeline stalls and such. The implication of this is that even a very tight C++ loop can happily contain a conditional, which is contrary to what we had to do 20 years ago.
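
A tiny C++ illustration of the kind of per-pixel conditional meant here (the function and names are arbitrary): with optimization enabled, g++/clang will usually turn the ternary select into a conditional move or blend rather than a taken branch, so the loop stays cheap.

#include <cstddef>

// Clamp values above a threshold; the select typically compiles to
// cmov (scalar) or a blend (SIMD), not a conditional jump.
void clamp_above(const float* in, float* out, std::size_t n, float t) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = (in[i] > t) ? t : in[i];
    }
}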

3. I obviously like C++, and I would advocate the use of C++ even for prototyping when dealing with image processing problems. I use OpenCV extensively, which means that most typical image processing operations are reduced to a single method applied to an OpenCV matrix object, i.e., the code is in my opinion just as simple and readable as the corresponding MATLAB/Octave or Python/NumPy code. But you have the advantage of being able to use a simple nested loop to perform some operations over the entire image without incurring the massive penalty that such operations incur in the interpreted languages.

I also like the "eigen" matrix library, which stores matrices in column-major form by default, and as a result plays very nicely with SIMD operations.
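
For illustration, a short OpenCV-flavoured sketch in the spirit of the above; it assumes a single-channel CV_32F image, and the particular operations are arbitrary examples rather than a recipe:

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// A typical library operation is a single call on a cv::Mat, while a
// per-pixel fix-up can still be a plain nested loop.
cv::Mat smooth_and_clip(const cv::Mat& src_32f) {
    cv::Mat out;
    cv::GaussianBlur(src_32f, out, cv::Size(5, 5), 1.0);   // one method call

    for (int r = 0; r < out.rows; ++r) {
        float* row = out.ptr<float>(r);
        for (int c = 0; c < out.cols; ++c) {
            if (row[c] < 0.0f) row[c] = 0.0f;               // cheap per-pixel branch
        }
    }
    return out;
}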

4. Modern C/C++ compilers are pretty good at spotting opportunities to automatically vectorize loops, meaning that what appears to be a simple for loop in the original code is actually compiled to a sequence of SIMD instructions. I am not an expert at writing inline SIMD code using the compiler intrinsics, but I have seen enough examples of where the compiler generated code is just as fast (or in some embarrassing cases, faster) than my explicit SIMD-ification of an algorithm, so I am slowly learning to trust the compiler to do the right thing.
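
As a made-up example of the kind of loop that compilers handle well, a plain saxpy-style kernel; with something like g++ -O3 (and -fopt-info-vec to report what was vectorized) this usually compiles down to SIMD instructions without any intrinsics:

#include <cstddef>

// __restrict (a common compiler extension) promises the arrays don't
// alias, which makes the loop easier for the auto-vectorizer.
void saxpy(float* __restrict y, const float* __restrict x,
           float a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}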

5. With the advent of C++11, we now have many, many options to simplify the use of multi-threading. Once you have created a pool of threads (optionally with thread-local memory for storing results of operations that cannot be performed in-place), you can easily (well, easily if you copy a previously created example...) pass operations through a lambda function to be processed in parallel by the thread pool.
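
Not a real thread pool, but a minimal C++11 sketch of the idea: slices of a buffer are handed to worker threads through a lambda (the worker count and the per-element operation are arbitrary):

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Split the buffer into chunks and process each chunk in its own
// std::thread; the lambda captures the range it is responsible for.
void parallel_scale(std::vector<float>& img, float gain, unsigned workers) {
    std::vector<std::thread> threads;
    const std::size_t chunk = (img.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(img.size(), w * chunk);
        const std::size_t end   = std::min(img.size(), begin + chunk);
        threads.emplace_back([&img, gain, begin, end] {
            for (std::size_t i = begin; i < end; ++i) img[i] *= gain;
        });
    }
    for (auto& t : threads) t.join();
}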

Alternatively, if the code is simple, and embarrassingly parallel, then there is always OpenMP, which will parallelize a for loop with the addition of a single line of code.
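
And the OpenMP route, sketched on a made-up per-pixel loop (compile with e.g. g++ -fopenmp); the single pragma is the only change to the serial code:

#include <cstddef>
#include <vector>

void scale_image(std::vector<float>& img, float gain) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(img.size());
    // One added line parallelizes this embarrassingly parallel loop.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i)
        img[i] *= gain;
}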

Anyhow, these are just my opinions on this matter, and are not to be taken as prescriptions :)

-Frans

 
  1. Sometimes the matrix-style algorithm is inherently more difficult to express, requiring contorted matrix-style code to express something which is trivial to express as a simple for loop. We have many such examples in my line of work: Satellite images often have "nodata" or fill values, e.g., clouds are often masked out when you are interested in the land surface. We have had many examples where a Python+NumPy implementation required the explicit creation of a mask array to allow matrix operations to correctly ignore the nodata values. Once the masking operations become more involved, the Python code rapidly becomes less readable, thereby losing its main perceived advantage. Dealing with large images (often 20k x 20k pixels) requires massive mask arrays in NumPy, which places an unnecessary load on the OS to allocate and manage, and chews up usable memory bandwidth needlessly. Many of these examples were subsequently converted to C++, which not only produced much easier to read code (the mask operation reduced to a simple "if" statement inside a simple "for" loop), but yielded a worthwhile speed-up over the NumPy implementation, probably because of the decreased load on memory bandwidth.
I think that what you are saying is a general thing: if you break your algorithm into a "pipeline" of small memory-to-memory operations, even if those operations are individually optimized to run as fast as physically possible on a piece of hardware, the performance will suffer due to memory constraints.

A "cure" for this would be to offer sufficiently diverse and complex library functions that most algorithms can just call the library directly. Unfortunately, this leads to an explosion in library complexity, and the programmer having to master a (usually) "unstructured" syntax.

This is in some ways the way that MATLAB has moved, but the same would be true if you called e.g. FFTW and BLAS and what not from C or FORTRAN code.

Another "cure" would be to express your stuff in a high-level form and then merge operations without going via memory. I guess that is what compilation and JIT does? I think that MATLAB does these things, as for-loops have gotten significant speedups over the ~15 years that I have been using MATLAB.
  2. Conditional operations in a matrix-style language are inherently harder to express, similar to using SIMD operations. You typically have to calculate both branches of the conditional, and then use a mask to select the desired result. In many cases, this is not a problem, and many conditional operations can be accelerated nicely with SIMD. But not all of them. Looking at the assembler generated by g++, I have seen that simple conditionals often reduce to a conditional move (CMOV) instruction, which takes only about 5 clock cycles on modern processors (excluding the cost of the conditional test, which is only about 2 cycles, iirc). Since the conditional move instruction does not count as a jump, there do not appear to be any significant pipeline stalls and such. The implication of this is that even a very tight C++ loop can happily contain a conditional, which is contrary to what we had to do 20 years ago.
I am using MATLAB mainly to prototype things and to develop an understanding. Thus, any way of expressing my ideas compactly and readably using a small-ish set of instructions that I can re-use for decades is a good thing, even if it means increased execution time (up to a point).

One thing that annoys me is that certain problems (most of what I do) map well to MATLAB, and also map well to assembly instructions, but they do not map well to C. So I end up writing things neatly and relatively quickly in MATLAB, re-implementing in C and feeling that neither the readability nor the execution speed is in line with expectations for C, then re-implementing once more in assembler (and/or interpreting compiler output to coax the compiler into doing what I know the hardware is capable of). I.e., C is a cumbersome bottleneck.
  3. I obviously like C++, and I would advocate the use of C++ even for prototyping when dealing with image processing problems.
I am more of a C guy. But I have a feeling that if I had been born 20/40 years earlier, I would have been very happy with FORTRAN.
I use OpenCV extensively, which means that most typical image processing operations are reduced to a single method applied to an OpenCV matrix object, i.e., the code is in my opinion just as simple and readable as the corresponding MATLAB/Octave or Python/NumPy code. But you have the advantage of being able to use a simple nested loop to perform some operations over the entire image without incurring the massive penalty that such operations incur in the interpreted languages.
Note that the mex interface is quite user-friendly, allowing you to call C/Fortran code directly. I find that when porting my stuff from MATLAB to C, doing one module at a time, calling it through the mex interface, and keeping my MATLAB-implemented tests and plotting luxury can be a real time-saver.
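
For readers who have not seen it, a minimal mex gateway can be as small as the sketch below; the file name times2.cpp and the doubling operation are made up (build with "mex times2.cpp" in MATLAB, or mkoctfile --mex in Octave, and call as y = times2(x)):

#include "mex.h"

// Gateway: MATLAB/Octave passes the right-hand-side arguments in prhs
// and expects the outputs in plhs.
void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
    if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]))
        mexErrMsgTxt("expected one real double array");

    const mwSize m = mxGetM(prhs[0]);
    const mwSize n = mxGetN(prhs[0]);
    plhs[0] = mxCreateDoubleMatrix(m, n, mxREAL);

    const double* in = mxGetPr(prhs[0]);
    double* out = mxGetPr(plhs[0]);
    for (mwSize i = 0; i < m * n; ++i)
        out[i] = 2.0 * in[i];   // stand-in for the ported C/C++ module
}
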
  4. Modern C/C++ compilers are pretty good at spotting opportunities to automatically vectorize loops, meaning that what appears to be a simple for loop in the original code is actually compiled to a sequence of SIMD instructions. I am not an expert at writing inline SIMD code using the compiler intrinsics, but I have seen enough examples of where the compiler generated code is just as fast (or in some embarrassing cases, faster) than my explicit SIMD-ification of an algorithm, so I am slowly learning to trust the compiler to do the right thing.
I am guessing that your context is that of a "high-performance computing" engineer/scientist, i.e., one using Intel's compiler, running on Intel hardware? If that is the case, I think that you are correct; Intel themselves advocate against doing low-level coding. There are, however, other platforms and other compilers where auto-vectorization has not reached the same level of finesse.
  5. With the advent of C++11, we now have many, many options to simplify the use of multi-threading. Once you have created a pool of threads (optionally with thread-local memory for storing results of operations that cannot be performed in-place), you can easily (well, easily if you copy a previously created example...) pass operations through a lambda function to be processed in parallel by the thread pool.
Alternatively, if the code is simple, and embarrassingly parallel, then there is always OpenMP, which will parallelize a for loop with the addition of a single line of code.
I have managed to avoid having to care about threading algorithms. As long as the number of cores is small, and there is more than one thing to do, it seems a lot easier to put one job on each core.
Anyhow, these are just my opinions on this matter, and are not to be taken as prescriptions :)

-Frans
I think that your opinions are well advised. Doubly so as you clearly have an understanding of what image processing should do, not only of how to do it in software.

-h
 
My view is that you should choose the most accessible tool first, then the fastest of what is available. For non-programmers, Matlab/Octave is often the most accessible. If Matlab is faster, bite the bullet and pay the $100 or whatever for the personal license. If you are a diehard for FOSS, use Octave even though it is slower. Last I saw, Octave's GUI was bordering on unusable (imo). Matlab's is better, and the code runs faster.
I think that the excellent IDE and plotting features are the main reason why I use MATLAB over Octave or Python, not the language/syntax/runtime itself.

If I can work 10% more efficiently by having a proper IDE, then the $2000 for a MATLAB licence pays for itself quite fast.

-h
 
This particular anecdote serves my argument well: once you know what your solution should look like, it is time to move it to an efficient platform if you intend to use it in a production set-up ( <cough> Imatest still running on Matlab ... )
I guess that depends on:
1. How sensitive are your users to efficiency
2. What is the added cost for re-implementing things more efficiently

I believe that the Nokia Symbian thing was really hardware-efficient, while certain modern cellphones run Java and similar high-level stuff. Without getting into a language war, my point is that if you want to attract a large number of hobbyist developers to make nice little apps for your platform, convenience can trump efficiency, even on a battery-driven platform.

A niche application for a small crowd of image specialists developed by 2 or 10 developers could sacrifice some efficiency and user experience in order to maximize the functional excellence possible given few resources.

-h
 
This particular anecdote serves my argument well: once you know what your solution should look like, it is time to move it to an efficient platform if you intend to use it in a production set-up ( <cough> Imatest still running on Matlab ... )
I guess that depends on:
1. How sensitive are your users to efficiency
2. What is the added cost for re-implementing things more efficiently

I believe that the Nokia Symbian thing was really hardware-efficient, while certain modern cellphones run Java and similar high-level stuff. Without getting into a language war,
Yeah, I knew I have been stirring up trouble with the Imatest/Matlab comment, but I'll add that I meant that tongue-in-cheek.

Some days I just feel like I have earned the right to take the occasional pot-shot at Imatest :)

-F
 
-F and -h,

As you know I am a hobbyist and fairly new to code (meaning that, other than a little Fortran, Basic and Assembler almost, ahem, forty years ago, I haven't done any).

I am really impressed at how easily I was able to pick up Matlab after Jim and Joofa's suggestion a couple of years ago*. Even more impressed at how quickly and easily I can find my numerous errors to make the code work. And even more impressed at the massive knowledge base available out there at the touch of a couple of keys. The IDE makes all the difference; Octave was too finicky for my non-expert fingers. Plus $85 for home use made Matlab irresistible for a prototyping amateur like me. Speed is good enough for my imaging purposes on my aging computers.

I can definitely see why many use Matlab for prototyping and, if required, move to the appropriate application-specific language for state-of-the-art speed.

Jack

*PS I tried Python but I did not find it as immediate, nor the knowledge base as extensive. With Matlab I was able to get useful stuff done the hour I turned it on.
 
I think that what you are saying is a general thing: if you break your algorithm into a "pipeline" of small memory-to-memory operations, even if those operations are individually optimized to run as fast as physically possible on a piece of hardware, the performance will suffer due to memory constraints.

A "cure" for this would be to offer sufficiently diverse and complex library functions that most algorithms can just call the library directly. Unfortunately, this leads to an explosion in library complexity, and the programmer having to master a (usually) "unstructured" syntax.so as you clearly have an understanding about what image processing should do, not only how to do it in software.
When I was doing color science, my language of choice was Smalltalk. At IBM, that was fairly unusual. People would ask me, "Is it hard to learn Smalltalk?" I'd say, "No, I can teach you the entire language in an hour or two... but it will take you months to learn the class library."

Of course, Smalltalk is not a language to use if performance is at all important.

Jim
 
-F and -h,

As you know I am a hobbyist and fairly new to code (meaning that, other than a little Fortran, Basic and Assembler almost, ahem, forty years ago, I haven't done any).

I am really impressed at how easily I was able to pick up Matlab after Jim and Joofa's suggestion a couple of years ago*. Even more impressed at how quickly and easily I can find my numerous errors to make the code work. And even more impressed at the massive knowledge base available out there at the touch of a couple of keys. The IDE makes all the difference; Octave was too finicky for my non-expert fingers. Plus $85 for home use made Matlab irresistible for a prototyping amateur like me. Speed is good enough for my imaging purposes on my aging computers.

I can definitely see why many use Matlab for prototyping and, if required, move to the appropriate application-specific language for state-of-the-art speed.

Jack

*PS I tried Python but I did not find it as immediate, nor the knowledge base as extensive. With Matlab I was able to get useful stuff done the hour I turned it on.
One of the reasons I started this thread was to show that a very frequent operation in image processing - running a window across the image and doing some operation on that windowed data for each pixel - can easily be vectorized by gathering the windowed neighborhood of each pixel into a matrix array and then performing the corresponding operations on that array. That is where the huge speedups in both Matlab and Octave - by factors in the hundreds - came from.

Matlab vs. Octave is a personal choice, as many people are used to Matlab from a free university setting. However, as this thread shows, the larger portion of the speedup comes from vectorizing - something that has been observed many times. I have seen poorly written Matlab and/or Octave code all along, and I would encourage hobbyists to spend time gaining experience in writing better code. I don't know the actual execution-speed difference between Matlab and Octave at the bytecode (machine code) level, but for the type of simple operations done in this forum - FFT, eigenanalysis, spectral densities, etc. - I don't think it will be large, if any, considering that Octave uses powerful and speedy packages such as FFTW3, LAPACK, etc.

I can still see some advantages of Matlab for some people; for me, personally, not many. In fact, I get all the extra toolkits for free with Octave, whereas you have to pay for the corresponding ones with Matlab.

--
Dj Joofa
http://www.djjoofa.com
 
It's very easy to write Excel code that runs but produces wrong answers, and it's often difficult to find the bugs. Another problem is that most Excel users don't think of what they are doing as programming, and so they test their code inadequately.
An amazing Excel-site:

 
I can still see some advantages of Matlab for some people; for me, personally, not many. In fact, I get all the extra toolkits for free with Octave, whereas you have to pay for the corresponding ones with Matlab.
Don't forget R; it's fairly efficient for vector ops. The Microsoft R Open variant directly uses the Intel MKL, so it is somewhat better even. Also, RStudio Server gives you a multi-user, access-from-anywhere platform for computing.

-- Bob
http://bob-o-rama.smugmug.com -- Photos
http://www.vimeo.com/boborama/videos -- Videos
http://blog.trafficshaper.com -- Blog
 
I can still see some advantages of Matlab for some people; for me, personally, not many. In fact, I get all the extra toolkits for free with Octave, whereas you have to pay for the corresponding ones with Matlab.
Don't forget R; it's fairly efficient for vector ops. The Microsoft R Open variant directly uses the Intel MKL, so it is somewhat better even. Also, RStudio Server gives you a multi-user, access-from-anywhere platform for computing.
I have used R for machine learning work. It has a lot of statistical packages. That makes it worthwhile. RStudio is better now that they added debugging. And, I like that there is a remote debugging server for RStudio also. I found that very helpful in running an RStudio Server in AWS while connecting to that machine remotely via a laptop.

However, the new cool thing in town is Spark, and the SparkR framework within the Spark ecosystem is limited. Due to the historical nature of R, things were not designed for a distributed environment, so typical R algorithms can't be made 'embarrassingly parallel' via execution in SparkR; I think they get executed on a single machine even if you have a Spark cluster. Python on Spark, on the other hand, is great!

Furthermore, I found R poor for work involving data structures (of certain types), again perhaps due to the historical nature of R. People have their solutions for passing pointers or references around, but it is messy. Unless you are doing pure statistical number crunching, this can become an issue.

And, as a language I find it weaker than Python (and other languages). Not talking about 3rd party libraries which are massive in number, but the pure language itself.

There is also the intellectual-property issue in a commercial environment, as it is cumbersome and messy to hide R's text-based scripts.

I'm looking forward to the day when Python has all those statistical packages that R has. Python is catching up pretty fast. A lot of 'standard' stuff in scientific computing including machine learning is already available.

--
Dj Joofa
http://www.djjoofa.com
 
When discussing technical topics on a website (as many of us frequently do), it is tempting to mention:
https://jupyter.readthedocs.io/en/latest/
https://uk.mathworks.com/help/matlab/matlab_prog/what-is-a-live-script.html

The idea of writing a single "object" that is source code, a paper, and a webpage all in one, in a self-reproducing manner, is extremely appealing to me.

Imagine posting an interactive "object" that describes your investigation of, e.g., camera noise, where the reader can simply modify the noise variance and press "play" in their web-browser pane to get an instant exploration of behaviour outside the examples the author chose to present. Or where figures are not static JPEGs of unknown origin, but are properly auto-generated from code that executes (and can be read).

I can't stand signal-processing papers with seemingly impressive results or some figure that leads to the conclusion, where reproducing those results from the theory section is two weeks of hard work because the math is underdetermined, key "details" are left out, etc. Often you find that the results are less relevant to your problem, or that the authors took some shortcuts, meaning that the two weeks were wasted. How about documenting results and graphs with executable code/scripts, at least in the (many) cases where this is legally possible?

-h
 