Website security

I’ve installed several security tools on my website, and what’s interesting is that I get notified whenever somebody attempts to log in:

A user with IP address XXX.XX.XXX.XX has been locked out from the signing in or using the password recovery form for the following reason: Used an invalid user-name to try to sign in.
User IP: XXX.XX.XXX.XX
User host name: XXXXXXX.XX

I get about 50 of these warnings a day from different IP addresses/different host names, and my question to everybody trying to hack into the website is why?

Please feel free to comment below.

Robert

 

Visual Studio 2012 – Hiccups

I just switched over to Visual Studio 2012, and was very dismayed to find several problems!

First I updated to the latest-and-greatest version of PostSharp, because it had several fixes for VS2012 and .NET 4.5; that got things compiling but not working very well yet.

  • Visual Studio 2012 RTM wouldn’t open any of my source code!!

    Every time I opened the solution, for every single project it would ask to target the .NET 4.5 framework, but it did a really bad job of switching the projects.  I noticed this because it KEPT asking every time I opened the solution, AND Mercurial didn’t see any file changes.

    One of the major problems was that I couldn’t open any source code file!! Double clicking on a file in the Solution Explorer did absolutely nothing! The file would not open.

    I finally opened the properties for each project in the solution, and found that for many of them the target framework was blank! I manually selected the .NET 4.5 framework, saved the project and then I could open source code files again and VS2012 stopped asking me when I opened the solution (and I could check-in the changed project files).

  • Then, I couldn’t debug anything! The debugger would step over tens of lines at a time, and every local variable was “null”. The code was acting VERY bizarrely in the debugger, but ran just fine without it. I decompiled my code and boy was it a mess! It turns out that, for some reason, when I upgraded, VS2012 switched to my code-coverage test settings and was filling the class with lots of yucky instrumentation code that apparently craps out the debugger. I switched to my other test settings and I can debug my code again.

Hopefully that’s all the problems; I need to be productive writing code – not solving stupid issues like these.

On the flip side, the new test runner is great! I really like that I can focus on fixing one test, and I don’t lose all the other failed tests. I also like being able to immediately see the problem with a failed test right in the Test Explorer, rather than having to open a document window with the test results (and then do all sorts of useless window maintenance).

The only frustrating thing is that the test run keyboard shortcuts seem to have been broken. Ctrl-R + A was running all tests, but just stopped and I had to redefine it. And there doesn’t seem to be any mechanism any more for running all the tests in the current class.

Robert

WPF and Multi-threaded programming design

Background

Today a friend of mine wrote me,

We are finally writing our first production WPF application which is fun. Where do you put your async stuff? Is it in your ViewModels? If so, how do you test them?

I’m currently writing a very complex piece of software for scientific researchers. It lets researchers who are not programmers write their own programs in a highly specialized domain-language that I have created specifically for this purpose. There is a lot of threading going on…

Common threading uses

I use multi-threading when the application is starting, to initialize suitable sub-components simultaneously (and report progress to the user).


Similarly, during long-running data exports from the database I use threading to report progress to the user via a progress-bar and respond quickly to cancel requests.


The language itself

I also use threading to decouple the very slow data persistence layer from the very performance hungry requirements of the domain-language execution. A running program written in the domain language saves a lot of data, and I don’t want to slow-down the running program waiting for the hard drive to persist it all. So the database layer has its own threads for persisting data.

Even the domain-language itself allows multi-threading, although it is hopefully completely transparent. For example, here is a small snippet of the domain-language written by a scientific researcher (note that I didn’t write this for this post) that is inherently multi-threaded:

define phase FixationPhase
    // Take away the food hopper – they need to earn the food
    move FoodHopper to down position
    // Show the fixation circle on the screen
    fixation.colour = #55AA2B
    fixation.IsVisible = true

    // If the subject clicks the fixation circle then
    // we need to move to the next phase
    when fixation.ClickCount changes then
        goto phase Test1Phase
    end when
    wait(60min * 24)
end phase

This is multi-threaded in a few ways:

  • First, the WPF GUI and the “codes” (as programs written in this domain-language have come to be called) run on their own threads. This of course helps keep the IDE responsive and also allows the experimenter to use the IDE and review data while the subject is running through the experiment on a second monitor.
  • But even within the experiment this is multi-threaded. The second-to-last line contains a wait(60min * 24), which obviously needs to wait for 24 hours. But if the subject clicks on the fixation shape, then the when clause needs to run, which in this case means going to a completely different phase of the experiment and cancelling the wait request. As shown in the sample above, the domain-language makes this easy (hopefully!), but under the covers there is a fair bit of non-trivial threading and cross-thread communication happening to make all of this work and work very fast.
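To give a feel for the kind of plumbing involved, here is a minimal C# sketch of a timed wait that a when clause could cut short. It is not the actual implementation behind the domain-language: the PhaseWait class and its member names are hypothetical, and it relies only on ManualResetEventSlim from the .NET base class library.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not from the real application): a timed wait that
// another thread can interrupt, e.g. when a watched value changes.
public sealed class PhaseWait : IDisposable
{
    private readonly ManualResetEventSlim _interrupted = new ManualResetEventSlim(false);

    // Called from a different thread to cancel the pending wait.
    public void Interrupt()
    {
        _interrupted.Set();
    }

    // Blocks for up to 'duration'. Returns true if the full duration elapsed,
    // false if Interrupt() was called before the time ran out.
    public bool Wait(TimeSpan duration)
    {
        bool signalled = _interrupted.Wait(duration);
        return !signalled;
    }

    public void Dispose()
    {
        _interrupted.Dispose();
    }
}
```

In this sketch the thread running the phase would call Wait(TimeSpan.FromHours(24)), while the thread that notices the click would call Interrupt() and then start the next phase.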

Logging

I also use the wonderful Gibraltar Hub so that I can analyse application performance; specifically I’m looking for memory leaks and degrading performance over time, but of course if there are errors in the log I want to see those as well. By default Gibraltar sends the log files when the application exits, but I’ve discovered that scientific researchers tend to leave the program running all the time, and they never shut off the computers. This means that the logs can become very large and are very rarely sent. I asked the wonderful folks at Gibraltar (they really are amazing) if there was a better way and there was:

Gibraltar 3.0 will make incremental updates of logs much more efficient, but you can address the long running session issue now. You should call Log.SendSessions or Log.SendSessionsAsync once in a while — maybe on a timer.

Easy enough. My original solution was much more complex than it needed to be, using threads and wait handles. But when I re-read their comment just now I realized that the much more elegant approach is to use a timer, just like they suggested.

There are actually three Timer classes in .NET:

System.Timers.Timer – Seems easier to use, but less documentation on which thread will make the call
System.Threading.Timer – More powerful but a little more complex
System.Windows.Forms.Timer – Obviously not really useful for the WPF programmer

In this case I don’t need the extra power, so I’m using the System.Timers.Timer timer.

In my static constructor:

And then in the class itself declared at the bottom of the file along with all my other data:
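A minimal sketch of what such an arrangement could look like follows. It is not the original listing: the LogShipper class, the injected sendSessions delegate (standing in for a call such as Log.SendSessionsAsync, whose exact signature is not assumed here) and the choice of interval are all hypothetical, and it uses an instance rather than a static constructor so that it is easy to exercise in a test.

```csharp
using System;
using System.Timers;

// Hypothetical wrapper: periodically invokes a "send the logs" action.
public sealed class LogShipper : IDisposable
{
    // Fully qualified to avoid any clash with System.Threading.Timer.
    private readonly System.Timers.Timer _timer;
    private readonly Action _sendSessions;

    public LogShipper(Action sendSessions, TimeSpan interval)
    {
        if (sendSessions == null) throw new ArgumentNullException("sendSessions");
        _sendSessions = sendSessions;

        _timer = new System.Timers.Timer(interval.TotalMilliseconds);
        _timer.AutoReset = true;      // keep firing until the timer is stopped
        _timer.Elapsed += OnElapsed;  // raised on a thread-pool thread when no SynchronizingObject is set
        _timer.Start();
    }

    private void OnElapsed(object sender, ElapsedEventArgs e)
    {
        // The delegate is an assumption: it stands in for whatever call
        // actually submits the accumulated log sessions.
        _sendSessions();
    }

    public void Dispose()
    {
        _timer.Stop();
        _timer.Dispose();
    }
}
```

Application start-up code would create one LogShipper, for example with an interval of an hour, and dispose it on shutdown.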

The view-model and threading

For all of these examples, NONE of them are in the view-model! Typically if you’re talking about threading in the context of WPF you’re talking about using the Dispatcher to “move” a background thread request to the GUI thread. There’s a very complete MSDN article Build More Responsive Apps With The Dispatcher that discusses how this is typically done (it also discusses a new timer class for WPF, the DispatcherTimer).

In fact, I work hard to avoid dealing with the Dispatcher in my view-model, because if I were to switch to the web, WinForms, or Windows 8 stuff the dispatcher logic could be completely different, reducing the overall usefulness of the view-model. My goal with the view-model is to imagine a different delivery mechanism for the application that nevertheless looks and works very similarly (perhaps so that users can switch back and forth between the two types of applications more easily). So perhaps a Windows 8 version for tablets; of course I’d like to use the same view-model layer if at all possible. Maybe that’s just dreaming and I’m making a (very small) amount of work for myself, but I think it’s still worthwhile to strive for the goal: the worst that has happened to me as a result of this is a very clean view-model layer. 🙂

Domain logic layer collections

One place where I do deal with the dispatcher in the view-model layer is when I’m wrapping observable collections. With .NET 4 the ObservableCollection(Of T) class has been nicely placed in the System.Collections namespace (specifically System.Collections.ObjectModel) and the always present System.dll making it very acceptable to use in the domain logic layer. This is a great help and simplification from .NET 3 and 3.5 where we had to jump through hoops to have the view-model layer notified when a collection changed.

However, the WPF GUI really doesn’t like receiving CollectionChanged events from a background thread, and it usually complains bitterly by throwing an InvalidOperationException exception.

To simplify, or rather completely eliminate, the need to deal with the dispatcher on a regular basis, I use aspect-oriented programming. Specifically, I use the amazing and very well designed PostSharp from SharpCrafters. They actually document two very nice dispatcher aspects that can cause a method to automatically run on the GUI thread or a background worker thread.

So after searching my entire code-base for this attribute I found only two occurrences, one when the domain-logic-layer adds a shape to the screen, and the other when it is removed:

Every other occurrence of the ExecuteOnGuiThread aspect in my solution is in the presentation layer, where the presentation layer subscribes to an event from something that is documented as possibly raising the event on a background thread (such as the splash-screen progress callback or the data-export progress callback).

Testing

So how do I test this stuff? Unfortunately, I usually test it manually, which is why I try to keep it to a minimum! Another option is to modify your dispatcher code, as I have done, so that you can control the dispatcher that is used, or tell it not to use a dispatcher at all. This means that in your unit tests you can configure no dispatcher and your unit test is essentially single-threaded. This is useful for GUI tests, but not necessary when testing methods attributed with the execute-on-background-thread aspect, because although it’s tricky, it’s not really that hard to test multi-threaded code in the domain logic and database layers. For this reason (and improved reuse, of course) I much prefer multi-threaded code to be in a non-GUI related layer.

My customized ExecuteOnGuiThread aspect

Extending the PostSharp aspect is remarkably simple; you just have to get over the increasingly common yet unjustified paranoia of all things static. Most unit test zealots will tell you that using static member variables, especially public ones, makes your code less testable. However, in this case we’re actually using a public static property to make our code much easier to test.
So, in my ExecuteOnGuiThreadAttribute aspect I declare the following public static property:

In my unit tests I simply don’t bother to set this and then the aspect will run it on the same thread as the rest of the unit test, simplifying the unit test greatly.

However, in my actual application start-up code I simply set this as follows:
ExecuteOnGuiThreadAttribute.ApplicationUiDispatcher = System.Windows.Threading.Dispatcher.CurrentDispatcher;
And then any code with that aspect applied will use the application dispatcher.
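As a rough sketch of the idea (not the actual aspect), the same “optional dispatcher” logic can be written as a plain static helper. The GuiThread class and its Run method are hypothetical names; only the ApplicationUiDispatcher property mirrors the one described above, and the sketch needs a reference to the WPF WindowsBase assembly.

```csharp
using System;
using System.Windows.Threading;

// Hypothetical stand-in for the aspect: a plain helper with the same
// "use the dispatcher only if one was configured" behaviour.
public static class GuiThread
{
    // Left null in unit tests; set once in application start-up code.
    public static Dispatcher ApplicationUiDispatcher { get; set; }

    public static void Run(Action action)
    {
        if (action == null) throw new ArgumentNullException("action");

        Dispatcher dispatcher = ApplicationUiDispatcher;
        if (dispatcher == null || dispatcher.CheckAccess())
        {
            // No dispatcher configured (unit test), or already on the GUI
            // thread: just run the code on the calling thread.
            action();
        }
        else
        {
            // Called from a background thread: marshal to the GUI thread
            // and block until the action has completed.
            dispatcher.Invoke(action);
        }
    }
}
```

With no dispatcher configured, GuiThread.Run(...) executes synchronously on the calling thread, which is what keeps the unit tests effectively single-threaded.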

Here’s the actual code for the aspect’s OnInvoke() method to make this happen pretty seamlessly:

From then on, most of the time you really don’t have to think about it at all, things just work the way they should. That means that I can be a more productive programmer because I’m thinking about the domain problem and not complex threading issues. Once again, a carefully designed aspect makes us much more productive. 🙂 It also means that my view-model layer has no threading code in it whatsoever.

The Presentation layer and threading

For the data-export progress I cheated a little and by-passed the view-model layer. In my data-export component I defined a generic interface:

My dialog box WPF window actually has two constructors, which is something that a lot of WPF developers seem to forget is possible. The first is your basic default constructor created by Visual Studio when I created the Window, and is used by Visual Studio and Expression Blend:

The second constructor is the one my command uses (in this case, the command is defined in the GUI, which I don’t mind because this particular command would need to be re-written for a dramatically different GUI version of the application anyway). This second constructor expects to be given an instance of an ISessionExporter interface, and the first thing it does is subscribe to the OnExportProgress event:

The OnExportProgress method of course has my ExecuteOnGuiThread aspect applied to it, making it then easy to update the progress bar control on the dialog box:

So in this case I’m not using MVVM for my progress bar, but instead I cheated slightly and was done programming
the feature slightly faster.
In this case I believe the choice was the right one, because the data-export code is still highly abstracted by the interface,
and the dialog box is not likely to change so much in the future that I would wish I had a view-model layer.
Additionally, in this case there is very little data-translation that needs to be done,
which also reduced the usefulness of trying to build a view-model to encapsulate the data-export progress.

I hope this helps,

Robert

Lies, damn lies, and “I’ll document it later”

One of the most frequent lies programmers tell themselves is that “I’ll document it later”. Programmers are incredibly busy people (often on multiple projects) and rarely have an opportunity to go back and document code. Additionally, the more code you write, the larger your backlog becomes and the likelihood of going back and documenting your code – and especially documenting it well – continuously decreases, eventually to zero.

The major problem with this is that I have never believed in, or even seen, self-documenting code (that was not completely trivial). I’m a big proponent of descriptive function names and long variable names but I do not believe they tell the whole story. Recently, a fellow developer and I got into a discussion about self-documenting code; he argued that all code should be self-documenting and I argued that only trivial code could be self-documenting. So I took a nontrivial piece of code that had passed code review with flying colors and removed all of the documentation from it; then I showed it to him. He could tell it was doing something with percentages but beyond that was at a loss. When I added the comments back in he groaned and said, “Well of course!” which I thought proved my point rather nicely.

The problem: what is supposedly self-documenting to the original programmer when they wrote it may not be self-documenting to a completely different programmer. The original programmer has no way of knowing the background and experience level of the programmer that will be maintaining the code in the future. So while the following might be completely decent documentation for the original author:

The above code and documentation is probably completely useless to you or me. What is a “dingle-fart”? Why does this class have one? What are the valid values for the integer and what effect does it have on the class? Where can I read about the concepts? For the really lazy, is there at least a Wikipedia article URL that could be included in the code that contains relevant domain information? (Don’t copy the Wikipedia article’s text – that takes the information out of context, bloats the code, and doesn’t get updated by the Wikipedia authors).

Of course, this is only one example of completely poor documentation. Another is:

The documentation on this example provides absolutely no further insight into the code than the code does itself. I could’ve written this documentation without having the faintest clue about what was going on. Much better would be at least the name of the equation being used, again, perhaps a link to the relevant Wikipedia page or internal company documentation or use case (if the use cases are stored in some well indexed persistent location).

Moreover, even the original author may not understand the code two or three years down the road (or less if the programmer is incredibly busy and writing lots and lots of code while not getting very much sleep, like if you have a brand new baby girl in the house :-)). Personally, I have written lots of code that I have no memory of writing years later (I have 4 kids, all ≤5-years old – that’s my excuse anyway) so I view good documentation as a message to myself in the future on what I was trying to do at the time and why I was doing it. Then, when I have some sleep, if I find code where the code doesn’t seem to match the documentation, I know I have found something that needs to be investigated and may be the bug I’m currently looking for.

I certainly do NOT view it as a liability when the code and documentation don’t seem to match!  It just means that something needs to be changed (either the code or the documentation) and I need to figure out which. Without the documentation, I would have absolutely no way of knowing that the programmer’s original intent differed from what was actually coded. Therefore having the documentation is still very valuable contrary to what the “self-documenting crowd” would have you believe.  Additionally, the documentation is in English, and should be conversationally written (not stilted and formal) so it’s much more likely to represent what the developer was actually trying to accomplish with the code.

Thus, code reviews should also examine the documentation to make sure it properly describes the intent of the code.  Of course if it doesn’t, the developer should update it!

The goal

Good documentation should communicate why the programmer coded the algorithm the way they did and what they hoped to accomplish. The how is of course right there in the code, however even then it is often useful to provide higher-level descriptions. With the code example above I would like to know what the property actually is in layman’s terms, or in terms that you could reasonably expect somebody in the (appropriate) industry to understand.

In “The Pragmatic Programmer”, Andrew Hunt and David Thomas suggest, based on the Don’t Repeat Yourself (DRY) principle:

“The DRY principle tells us to keep the low-level knowledge in the code where it belongs, and reserve the comments for other, high-level explanations.”

So one of the major problems with “documenting later” is that the “why?” and the “what?” are not as clear in the programmer’s mind as they were at the time they wrote the code. Of course the problem gets worse the longer we wait to document the code. Eventually the “why?” and “what?” may be lost completely, and even if we do eventually go back to the code and try to add documentation, we will be attempting to reconstruct it from the “how?”

Finally

Note that I want to see documentation on classes, on methods and properties, and even on private member variables, and especially inside methods! Here’s an example snippet from some real production code I wrote a few years ago inside a method:

I grabbed this code basically at random from my application, and even without seeing all of the code in the method or the documentation on the method I believe that this code is reasonably easy to understand.  By happy coincidence this is actually some of the most complex code I’ve written.  The green comments become visual separators within the code, helping the reader to “chunk” the related pieces of code together (which is also why white space is very important!). You might be able to understand the code without the comments, but with them the code is much easier to read.

Leaky Data Access Layer Encapsulation

I’m doing a code review right now of some domain logic layer code. Unfortunately, sprinkled throughout the domain logic layer are properties such as:

This is definitely the database implementation leaking into the domain logic layer. I’m not going to comment (yet?) on whether a string column in the database with “Y” and “N” values is a good idea, but it’s definitely a bad idea in the domain logic layer. Furthermore, notice that the setter doesn’t even check that the value is actually ‘Y’ or ‘N’ (and are ‘y’ and ‘n’ okay too, or are they “right out”?). The convenient use of automatic properties is handy, but in this case completely inappropriate because I could set “Negatory!” – and I probably won’t notice until the database blows up (hopefully). Additionally, because this is a string property I’m forcing all the code that wants to use this property to manually check the value:

Sadly, this further propagates how the database is storing its values, and provides lots and lots of code duplication anywhere this property needs to be used. (Notice also that this would fail if somebody had set the property to “Yes” or “1”).

Therefore, MUCH better is the following, normal .NET code:

By changing the domain logic layer property to be a Boolean property we don’t need to document what the “appropriate” string values are, and we also don’t need to check that the appropriate value is being set. This also makes all use of the property obvious and easy, and unaware of the underlying database implementation:

Notice that this change does not impose any change on the database – the Database Access Layer (DAL) should completely encapsulate how the database actually stores this particular property, and there is no ambiguity in the domain logic layer code and all the higher level layers that use it. This is also the first step towards refactoring the database to store our Boolean property as something other than a VARCHAR…
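To illustrate where the translation could live, here is a minimal sketch of a DAL-side helper. The YesNoColumn class and its method names are hypothetical, and rejecting anything other than an exact “Y” or “N” (including lower case) is an assumption about how strict the mapping should be.

```csharp
using System;

// Hypothetical DAL helper: the only code that knows the column holds "Y"/"N".
public static class YesNoColumn
{
    // Converts the stored column value into the Boolean the domain layer sees.
    public static bool ToBoolean(string columnValue)
    {
        switch (columnValue)
        {
            case "Y": return true;
            case "N": return false;
            default:
                // Fail loudly at the boundary instead of letting bad data spread.
                throw new FormatException(
                    "Expected 'Y' or 'N' but the column contained '" + columnValue + "'.");
        }
    }

    // Converts the domain layer's Boolean back into the stored representation.
    public static string FromBoolean(bool value)
    {
        return value ? "Y" : "N";
    }
}
```

The DAL would call ToBoolean when loading a row and FromBoolean when saving it, so nothing above the DAL ever sees the string form.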

Part 2: Another leaky encapsulation

Another leaky encapsulation of the database implementation is properties such as:

This of course shares all the same flaws as the previous leaky encapsulation, but also, what the hell are “Inir” and “Inna”?!

MUCH better is to provide a public enumeration in the domain logic layer:

And then change the property to use the enumeration:

Again, the database access layer should be completely and strongly encapsulating the strange codes that the database designers chose to use for the various values.

Conclusion:

Be very aware of how database implementation details can leak into the domain logic layer and beyond, and put a stop to it if you can help it! Your code will be much better off, easier to understand and much easier to maintain and improve over time.

Catching exceptions

Last night I had a bit of a problem: I had deployed db4otool.exe to a Windows Server 2008 build server, and it was failing during the build with error code -2. Nowhere could I find out what return code -2 actually means. Now the same program worked just fine on my local Windows 7 server, operating on exactly the same collection of DLLs.

Sigh.

Now db4otool.exe is open source, and I could have downloaded the source code, got it compiling, installed the debug version on my build server, and then attempted to connect to it remotely with a debugger to identify the problem. Or I could have searched through the source for where the code returned -2. Neither idea seemed very quick or guaranteed.

So, after some additional thinking about the problem, I downloaded Process Monitor from TechNet, and ran it. It gathers information about what every process on the computer is doing, and then you can filter the gathered data to just what you’re interested in.

Usability aside:

Unfortunately, the program suffers from lots of usability issues, but it’s powerful nonetheless. For example, to start capturing data you click on a button that looks like this:

Clicking it converts it to a button that looks like this:

Alan Cooper talks about the usability problems of this approach in About Face 3, Chapter 21 on page 445: “Flip-flop buttons: A selection idiom to avoid”. Briefly, he says:

“Flip-flop buttons are an all-too-common control variant used to save interface real-estate. Unfortunately, this savings comes at the cost of considerable user confusion.
…
The problem with flip-flop controls is that they fail to fulfill the second duty of every control – to inform users of their current state.”

In this case, when you are not gathering data the icon shows as a magnifying glass with an X, so it is showing you the current state – but when I first saw it I assumed that it was already off, but I needed to find a button that would turn data gathering on. So Cooper is still right: the button can’t show you an action and be displaying the current state at the same time; he goes on to say, “Don’t use them”.

Anyway, back to diagnosing db4otool.exe:

I gathered data on the failing machine, and on my Windows 7 machine where db4otool.exe was working, and then I filtered it and compared the two (manually, I couldn’t find any way within the tool to do the compare). This was actually pretty easy.

Anyway, here’s the problem:

After doing quite a bit of work, db4otool.exe was trying to open Mono.Cecil.Pdb.dll, and failing (notice the “NAME NOT FOUND” in the result column). SO! I deployed the missing DLL to the build server and db4otool.exe immediately worked just fine.

I should have received a big, huge, ugly, and incredibly helpful error message: a System.IO.FileNotFoundException, which contains details on exactly which DLL cannot be found! Instead, the db4otool.exe source code must have a generic:

catch(Exception e) {
…
}

block in it somewhere, and they simply return -2.

Point being, this is a great example of how catching System.Exception can make solving problems much harder! If the db4otool.exe authors had been more careful and caught only the exceptions they were expecting, I would have been able to solve this error in 30 seconds rather than 20 minutes.

Don’t catch System.Exception, unless you’re re-throwing it or displaying some really useful error message (and “-2” doesn’t count as useful…).
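As a minimal sketch of the alternative, a command-line tool can catch only the failure it expects and report something useful, while letting everything else propagate. This is not db4otool’s code: the ToolRunner class, its exit-code values and the work delegate are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical command-line tool skeleton (not taken from db4otool).
public static class ToolRunner
{
    public const int Success = 0;
    public const int MissingFile = 2;

    // 'work' stands in for whatever the tool really does; errors are
    // written to 'errorOutput' (for example Console.Error).
    public static int Run(Action work, TextWriter errorOutput)
    {
        try
        {
            work();
            return Success;
        }
        catch (FileNotFoundException ex)
        {
            // An expected failure: say exactly which file is missing.
            errorOutput.WriteLine("Required file not found: " + ex.FileName);
            errorOutput.WriteLine(ex.Message);
            return MissingFile;
        }
        // Anything else is unexpected and is deliberately not caught here,
        // so the exception type, message and stack trace are not lost.
    }
}
```

With something along these lines, the missing Mono.Cecil.Pdb.dll would have been named in the build log instead of hiding behind a bare return code.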