Executive summary:

In January 2021, the cabinet of Dutch Prime Minister Mark Rutte was forced to resign early following the revelation that the Dutch tax administration, and in particular its toeslagen unit (social security unit), had unlawfully demanded the full retroactive reimbursement of all childcare allowances from an estimated 35,000 parents. Because childcare costs in the Netherlands are among the highest in the OECD, these demands pushed tens of thousands of parents into serious financial hardship, forced some to sell their homes, and in some cases even led to suicide. The report of the parliamentary investigative committee, titled ‘Unprecedented Injustice’, showed how machine-learning algorithms and unfettered automation played a part in the seven-year-long ordeal known as the toeslagenaffaire.

Facts of the case:

The toeslagenaffaire resulted from the conjunction of a number of legal, political and socio-economic factors, in addition to the use of machine-learning algorithms. In this short piece, we focus on the use of machine learning and the discrimination that resulted from it. Readers interested in a complete account of these events may refer to: D. Hadwick & S. Lan, ‘Lessons to be learned from the Dutch childcare allowance scandal: A comparative review of algorithmic governance by tax administrations in the Netherlands, France and Germany’ (2021) World Tax Journal, Vol. 13, Issue 4.

The events of the toeslagenaffaire began around 2011-2012, when the press revealed a number of fraud cases involving social security payments made to individuals who were in fact residing in other EU Member States and were therefore not legally eligible for such payments. These cases were dubbed the Bulgarenfraude (Bulgarian fraud), as some of them involved families of Bulgarian nationality, sometimes linked to organized childcare institutions that received part of the proceeds of the fraud. The Bulgarenfraude, coupled with serious backlogs in Dutch social security institutions, gave the Dutch government the incentive to revamp the social security system and to make the recovery process more efficient in cases of error or fraud by welfare recipients.

The recovery process was accelerated through several means. One was to be less lenient towards welfare recipients who made mistakes and to treat any mistake above a certain threshold as potential fraud, thus discontinuing allowance payments even for bona fide parents. Another was the integration of machine-learning algorithms to automate the identification of errors and fraud. In practice, the toeslagen unit integrated a risk-detection algorithm to process the social security documents that parents submitted to the Dutch tax administration. If the algorithm found the slightest mistake in these documents, e.g. a wrongly filled box or an omitted signature, childcare allowances were automatically discontinued and all allowances previously received were reclaimed by the tax administration. This meant that the slightest mistake in social security documentation could cost parents tens of thousands of euros in reimbursement, as the law on income-related schemes (Art. 26 AWIR) barred the administration from applying the principle of proportionality, a doctrine that the Dutch Supreme Administrative Court ruled lawful on multiple occasions.

The use of this risk-detection algorithm, extremely accurate at discovering mistakes, within a legal regime of zero tolerance for any mistake created the first group of victims of the toeslagenaffaire. After the scandal unfolded in the press, it was revealed that the allowances reclaimed from parents amounted to 27,500 to 30,000 euros on average, which they had to pay in full with no possibility of repaying in instalments, and often with late fees accumulating on top. One does not need a PhD in economics to understand that for most families this led to serious financial hardship.
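To make the effect of the zero-tolerance rule concrete, the following minimal Python sketch contrasts full retroactive recovery with a proportional alternative. It is an illustration only: the function names and the proportional rule are hypothetical and do not reproduce the actual logic of the Belastingdienst. The only figure taken from the text is the 27,500 euro lower bound of the average reclaimed amount.

```python
def reclaim_zero_tolerance(total_received: float, mistake_found: bool) -> float:
    """Zero tolerance as described above: any mistake triggers recovery
    of everything the parent has ever received."""
    return total_received if mistake_found else 0.0


def reclaim_proportional(total_received: float, overpaid_amount: float) -> float:
    """Hypothetical proportional rule: recover only what was actually
    overpaid because of the mistake, capped at the total received."""
    return min(max(overpaid_amount, 0.0), total_received)


# Lower bound of the average reclaimed amount cited in the text.
total_received = 27_500.0

# Assumption for illustration: an omitted signature causes no overpayment.
overpaid_amount = 0.0

print(reclaim_zero_tolerance(total_received, mistake_found=True))  # 27500.0
print(reclaim_proportional(total_received, overpaid_amount))       # 0.0
```

Under these assumptions, a purely formal mistake costs the parent the full amount under zero tolerance and nothing under a proportional rule, which is the gap that the principle of proportionality would normally close.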

Moreover, the toeslagen unit integrated a second machine-learning tool, a risk-scoring algorithm that automated the selection of childcare allowance recipients for further audits by tax officials of the Belastingdienst. As is characteristic of machine-learning algorithms, the system implemented by the toeslagen unit derived risk factors from the analysis of historical data, i.e. known positive and negative cases of fraud, in order to automatically process documentation and select welfare recipients for audits. As mentioned in the piece on the SyRI case, such a method creates a serious and well-documented risk of machine bias and discrimination. To mitigate that risk, the data processed both during and after the learning process must accurately represent the target population, i.e. welfare recipients, and the risk factors (so-called weights in machine-learning terminology) must be as free as possible from any undesired prejudicial effect on minorities or individuals of lower socio-economic status. Eliminating such bias requires vigilance, constant monitoring, random audits and a certain skepticism vis-à-vis the conclusions of the algorithm.

In the toeslagenaffaire, the Dutch Data Protection Authority (AP) and the National Audit Service (ADR) showed that the exact opposite attitude prevailed. The toeslagen unit did not monitor whether the algorithm was free of bias; in fact, tax officials indirectly induced the algorithm to be biased. When the AP ran a so-called ‘twin test’, it found that the algorithm assigned a higher risk of fraud to individuals who were not Dutch than to Dutch individuals in analogous situations, all other things being equal. A ‘twin test’ is a simple procedure in which one creates two fictitious profiles that are identical but for one characteristic, in this case ‘Dutch/non-Dutch’. If an individual was not Dutch, such as an EU national permanently residing in the Netherlands, the algorithm automatically predicted a higher risk of fraud. In other words, the algorithm discriminated against residents of foreign origin.

In its report, the AP describes one particularly disturbing case in which, following fraud signals concerning one specific childcare institution where approximately one hundred parents of Ghanaian origin were suspected of fraud, the toeslagen unit arbitrarily decided to investigate all 6,047 parents of Ghanaian origin in the Netherlands. Where supposedly ‘objective’, ‘data-driven’ decisions are in fact arbitrary, biased and discriminatory, it is hard to see how a machine would not adopt the same discriminatory stance as the human tax officials. The case of the Ghanaian parents is not the only instance in which foreigners were targeted as a result of an arbitrary top-down decision by the upper management of the Belastingdienst. By feeding these arbitrary manual targeting decisions, little by little, into the machine-learning system, which constantly learns and updates its risk factors based on historical data, the toeslagen unit induced a bias in the algorithm. As a result, the algorithm concluded that non-Dutch welfare recipients were more prone to fraud, a claim that so far has never been corroborated in peer-reviewed literature.

Even more disturbing is the fact that welfare recipients whom the algorithm flagged as potential fraudsters had their allowances discontinued without any ex-post assessment by human tax officials. As noted above, the use of machine learning requires skepticism and vigilance, not the blind faith in its predictions that the Belastingdienst exhibited in the toeslagenaffaire. These parents constitute the second group of victims of the toeslagenaffaire: welfare recipients who were discriminated against by a machine-learning system that had learned from human tax officials that foreigners were more prone to social security fraud. Ultimately, it seems that the toeslagen unit remained so strongly marked by the events of the Bulgarenfraude that it concluded that its focus should be on non-Dutch residents. Yet, while the Bulgarenfraude is estimated to have resulted in losses of 10 million euros for the Belastingdienst, the Dutch government had to create a fund of 500 million euros to compensate the victims of the toeslagenaffaire. Hence, this bias resulted in a much heftier bill for the Belastingdienst, the Dutch government and Dutch society as a whole.
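The mechanism described in this section, i.e. targeted audits that skew the historical data, followed by a twin test that exposes the resulting bias, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the algorithm used by the toeslagen unit: the population sizes, audit rates, fraud rate, the naive ‘learning’ rule and all names (Profile, learn_group_risk, risk_score, twin_test) are hypothetical assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Profile:
    """Fictitious welfare recipient used for the twin test."""
    form_errors: int
    is_dutch: bool


def learn_group_risk(population: int, audit_rate: float, true_fraud_rate: float) -> float:
    """Naive 'learning' rule (assumption): detected fraud cases per group member.

    Fraud can only be detected among audited recipients, so a group that is
    audited more often looks riskier even if true fraud rates are identical.
    """
    detected_cases = population * audit_rate * true_fraud_rate
    return detected_cases / population


# Hypothetical numbers: both groups have the SAME true fraud rate, but one
# group is audited wholesale following a manual, top-down decision.
TRUE_FRAUD_RATE = 0.01
RISK_DUTCH = learn_group_risk(100_000, audit_rate=0.02, true_fraud_rate=TRUE_FRAUD_RATE)
RISK_NON_DUTCH = learn_group_risk(6_000, audit_rate=1.0, true_fraud_rate=TRUE_FRAUD_RATE)


def risk_score(profile: Profile) -> float:
    """Hypothetical score: a small penalty per form error plus the learned group risk."""
    group_risk = RISK_DUTCH if profile.is_dutch else RISK_NON_DUTCH
    return 0.001 * profile.form_errors + group_risk


def twin_test(score_fn, profile: Profile, attribute: str, alt_value) -> float:
    """Score two profiles that differ in exactly one attribute; return the gap."""
    twin = replace(profile, **{attribute: alt_value})
    return score_fn(twin) - score_fn(profile)


base = Profile(form_errors=1, is_dutch=True)
gap = twin_test(risk_score, base, "is_dutch", False)
print(f"Score gap attributable to nationality alone: {gap:.4f}")
```

In this sketch any non-zero gap is attributable to the ‘Dutch/non-Dutch’ characteristic alone, since the twins are otherwise identical. The gap arises even though both groups were given the same true fraud rate, which illustrates why the historical data must represent the target population and why random audits are needed as a control.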

Key takeaways:

The first takeaway from the toeslagenaffaire is that one should not automate a process that is flawed to begin with, as automation only magnifies the scale of its flaws. Machine-learning algorithms were used in the toeslagenaffaire to reduce the serious backlogs in the childcare allowance recovery process. Yet, as the recovery process was seriously flawed in the first place, primarily due to the lack of proportionality in cases of error by welfare recipients, the algorithms had the opposite effect to what was intended: they further increased the backlog at the Belastingdienst and the number of complaints by welfare recipients against the toeslagen unit.

The second takeaway is that States should not use machine learning and automation without first creating the legal and technological conditions for these algorithms to be used in a lawful and ethical manner. Well before the toeslagenaffaire, it had been argued numerous times that machine learning creates serious risks of bias and discrimination, and also threatens the rights to a fair trial, to good administration, to privacy, to data protection, etc. On that basis, integrating machine-learning algorithms without bolstering the safeguards that protect these rights is nothing short of a recipe for disaster.

This leads to the third takeaway: the adoption of machine-learning tools roughly coincided with the global financial crisis of 2008-2011, when austerity measures reduced the workforce of tax administrations in the EU. This should be cause for concern, as a reduction in staff is likely to reduce the number of tax officials who deal with ex-post complaints by taxpayers, and thus the degree of review that human tax officials can materially carry out. In other words, we are not creating the conditions for human-centric AI. Rather, we are creating a system of AI-centric decision-making, which is antithetical to the objectives we should embrace in regimes of algorithmic governance. The lack of regulation and safeguards against the risks of machine learning, as well as the reduction of the workforce of tax administrations, can be observed across the entire EU. Hence it is crucial not to naively assume that the toeslagenaffaire is an isolated Dutch phenomenon that cannot repeat itself in other EU Member States.

References:

D. Hadwick & S. Lan, ‘Lessons to Be Learned from the Dutch Childcare Allowance Scandal: A Comparative Review of Algorithmic Governance by Tax Administrations in the Netherlands, France and Germany’ (2021) World Tax Journal, Vol. 13, Issue 4.