frequency shrinks towards early elements even if they have 0 likelihood


This will always give the smallest failing case as [[:a]]. The initial tests are correct in that :a never appears in the vector because its likelihood is 0. However, due to the way frequency shrinks, the smallest failing test case is going to be [[:a]] in basically all cases. In other words, during shrinking the generator is creating test cases that would never legitimately appear during regular testing.
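The original reproduction snippet isn't shown above, but the mechanism is easy to model. The following Python sketch (my own names and structure, not test.check's actual implementation) mimics a frequency-style generator that picks values by weight yet shrinks by trying every earlier index:

```python
import random

# Hypothetical model of a frequency-style generator (an illustrative
# sketch, not test.check's real implementation).
pairs = [(0, "a"), (5, "b"), (10, "c")]  # (weight, value) pairs

def generate(rng):
    """Pick a pair index by weight; a weight-0 entry can never win."""
    total = sum(w for w, _ in pairs)
    r = rng.uniform(0, total)
    acc = 0
    for i, (w, _) in enumerate(pairs):
        acc += w
        if r < acc:
            return i
    return len(pairs) - 1  # guard against r landing exactly on total

def shrink_candidates(i):
    """Shrinking tries every earlier index, regardless of its weight."""
    return list(range(i))

rng = random.Random(42)
samples = [generate(rng) for _ in range(1000)]
assert 0 not in samples           # "a" (weight 0) is never generated...
assert 0 in shrink_candidates(2)  # ...yet shrinking still proposes it
```

Generation can never select the weight-0 entry, but the shrink candidates always include it, which is exactly how [[:a]] ends up as the reported minimal case.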

I have a situation where I am generating templated generators and then using a separate configuration to set the likelihoods, and there are many cases where I would like to configure certain elements of the template to never appear, even (or especially) during shrinking.

I would actually suggest that the shrinking behavior of frequency is probably not the most obvious or useful behavior. For elements whose likelihoods are the same, it makes sense to shrink towards the earlier elements. However, when likelihoods differ, I think frequency should shrink towards the highest-likelihood elements. That seems like a more natural shrinking strategy (at least to me) and would have the nice benefit of fixing the above behavior.




May 10, 2017, 1:09 PM

I think the opt-in might be necessary. I think a common pattern with frequency (e.g., here, though arguably it'd be better for the 0.0 to go first) is to have some simple cases with small weights and complex cases with larger weights. I'm not sure how to get the "shrinking toward simpler things" behavior with the new algorithm we were discussing.

May 10, 2017, 1:13 PM

Now that I think about it, do you have a natural example of a use where shrinking toward the larger weight makes sense?

May 10, 2017, 1:14 PM

I'm thinking that A) maybe it's not that natural after all, and B) users can always get this behavior manually by sorting their collection however they want before they call gen/frequency.
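That manual workaround can be sketched as follows; this is a hedged Python illustration with made-up pairs, not the library's API:

```python
# Workaround sketch: sort (weight, value) pairs by descending weight
# before handing them to a frequency-style generator, so shrinking
# toward earlier indices means shrinking toward the likeliest values.
# The pair list here is illustrative, not from the ticket.
pairs = [(1, "rare"), (100, "common"), (10, "uncommon")]
sorted_pairs = sorted(pairs, key=lambda p: p[0], reverse=True)
assert sorted_pairs == [(100, "common"), (10, "uncommon"), (1, "rare")]
```

Passing the sorted pairs to frequency makes "shrink toward earlier indices" coincide with "shrink toward the most likely values."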

May 11, 2017, 12:23 AM

You're right that with the solution you suggested it's fairly easy to get the desired behavior with a new generator that just sorts the pairs first and then calls frequency. So thanks for that!

On the other hand, I still think that the proposed fix makes sense. I think it's probably a pretty common expectation (it certainly is/was mine) that shrinking should produce cases that resemble the samples you would get with small sizes in the normal (non-shrinking) mode. For example:
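The example referred to here does not survive in this copy of the thread; the Python sketch below models the scenario described next, with weights that are my assumption (rare :a and :b, dominant :c, fixed-length vectors):

```python
import random

# Assumed weights chosen to match the discussion: "a" and "b" are
# rare, "c" dominates, and we sample fixed-length vectors.
weights = {"a": 1, "b": 1, "c": 100}

def sample_vector(rng, length=5):
    """Draw a vector of elements with probability proportional to weight."""
    vals = list(weights)
    wts = [weights[v] for v in vals]
    return rng.choices(vals, weights=wts, k=length)

rng = random.Random(0)
draws = [x for _ in range(200) for x in sample_vector(rng)]
# With these weights, roughly 98% of all drawn elements are "c".
assert draws.count("c") / len(draws) > 0.9
```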

The vast majority of samples are going to be sequences of all :c. If the failure case is when a :b appears in the sequence, the current behavior will return something like [:b :a :a :a :a], but this is not really a shrunk case IMO. I think part of the implicit meaning of "shrunk" is "simpler", in which case [:b :c :c :c :c] is really the simpler case. It's a failure case that is MUCH more likely to occur according to the frequency specification, and for that reason it is probably easier to debug than the more "complex" shrunk case that was returned.

Or another way of looking at it is that the current shrink method tends to shrink to cases that are of a different nature than the first failure. This behavior is what led me to discover the bug. For a more concrete example related to what I'm doing:

I know that the audio tag is just plain broken, but I suspect that there is something wrong with the div tag, so I update the weights (from a config file) to {:audio 0, :div 1000}. Now I run my test and it does in fact fail due to a div tag. With the current frequency behavior I will invariably end up with a shrunk case containing only an audio tag. Obviously not what I wanted.

Now consider the behavior if we filter out zero-likelihood pairs. The test will no longer shrink to just an audio tag, but it may be that (unbeknownst to me) the applet tag is also broken in some cases. So I run the test again and it fails again on the div case. The shrink process begins and initially searches for simpler div failure cases, but eventually it tries an applet case, which fails, and so it continues shrinking that. The final result is a shrunk case that does not exhibit the original bug it found. While it's of course interesting to know about the new applet failure case, that's not really what I was asking for. I could eliminate all the tags except div, but then my failure goes away, because the real failure is caused by an "a" tag within a div tag. So now my tests all pass.

One mode that I will be testing is pair-wise testing, i.e. using the weights configuration to set two tags to relatively high likelihoods and the rest of the values to very small or zero likelihoods. For something like pair-wise testing it is important that the shrinking process tries to shrink to the higher-probability pairs; otherwise it will focus on the wrong things.

Anyways, I can get by with the current workaround, but I do think that the current behavior is a bit unexpected when you really start to use frequency in anger.

Also, regardless of this change, would it be more efficient if the shrink process binary searched the pairs rather than just iterating through them from 0 up to the current idx?

October 13, 2017, 2:45 PM

I just added a commit on master to filter out zero-weighted entries before doing anything.
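The fix as described, filtering out zero-weighted entries before constructing the generator, amounts to something like this sketch (the actual commit is in test.check's Clojure source; the pair list here is illustrative):

```python
# Sketch of the fix: drop zero-weight pairs up front so that shrinking
# can never reach a value the generator could not have produced.
pairs = [(0, "audio"), (1000, "div")]
nonzero = [(w, v) for (w, v) in pairs if w > 0]
assert nonzero == [(1000, "div")]
```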

W.r.t. the binary search idea, I don't think it would make shrinking more efficient if the requirement is that we find the earliest generator in the list that returns a failure.
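The underlying issue is that "index i produces a failing case" is not a monotonic predicate over the pair list, and binary search only finds the earliest true entry when the predicate is monotonic. A contrived Python illustration (the failure pattern is made up):

```python
# Whether a given index yields a failing case is not monotonic in the
# index, so bisection can land past the earliest failure.
# Contrived pattern: only indices 1 and 6 fail.
fails = [False, True, False, False, False, False, True, False]

def earliest_linear(fails):
    """Linear scan from 0: guaranteed to find the earliest failure."""
    return next(i for i, f in enumerate(fails) if f)

def binary_search_first(fails):
    """Standard bisection, which assumes fails looks like False* True*."""
    lo, hi = 0, len(fails)
    while lo < hi:
        mid = (lo + hi) // 2
        if fails[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo

assert earliest_linear(fails) == 1
assert binary_search_first(fails) == 6  # misses the earlier failure at 1
```

So under the "earliest failing generator" requirement, the linear scan is not an inefficiency to optimize away; it is what guarantees the result.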

If you think that's not a good requirement, I'm interested to know what alternative you'd prefer.

Thanks for the ticket!