(Ab)using Rust traits to write silly things
Recently I found some cursed Rust code and decided to make a little joke/question on Twitter. In the tweet I presented some unusual code and asked “how could it compile?”. The solution was found quite quickly by @Veykril (kudos to them!), and in this post I want to explain it in detail.
The joke/question
So, in the tweet I presented the following code:
fn main() {
    for x in lib::iter::<u32> {
        let _: u32 = x; // assert that `x` has type u32
    }
    for y in lib::iter::<String>() {
        let _: String = y;
    }
    for z in lib::iter { // infer type
        let _: &'static str = z;
    }
    for k in lib::iter() {
        let _: f64 = k;
    }
}
That compiles with a stable compiler, if you have the right `lib`!
The interesting bit here is, of course, that the function `lib::iter` can be called with or without parentheses.
This is normally not possible, so what is going on?
First of all, some low-hanging fruit: `for` loops in Rust desugar to something like this:
// Original
for x in i { f(x); }
// Desugaring
{
    let mut iter = IntoIterator::into_iter(i);
    while let Some(x) = iter.next() { f(x); }
    // while let itself desugars to loop+match,
    // but that's not the point
}
So before iteration starts, `into_iter` is called, which allows you to pass any type implementing `IntoIterator`, not just `Iterator`s.
That’s just to say that we need `iter` and `iter()` to somehow evaluate to type(s) that implement `IntoIterator`.
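To make that concrete, here is a minimal sketch of a type that is not an iterator itself but still works in a `for` loop. `Countdown` and `CountdownIter` are made-up names for illustration, they are not part of the `lib` from the tweet.

```rust
// Hypothetical example type: not an iterator, only convertible into one.
struct Countdown(u32);

// The actual iterator that `Countdown` turns into.
struct CountdownIter {
    remaining: u32,
}

impl Iterator for CountdownIter {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            self.remaining -= 1;
            Some(self.remaining + 1)
        }
    }
}

impl IntoIterator for Countdown {
    type Item = u32;
    type IntoIter = CountdownIter;

    fn into_iter(self) -> CountdownIter {
        CountdownIter { remaining: self.0 }
    }
}

// The `for` loop calls `IntoIterator::into_iter` for us.
fn collect_with_for(c: Countdown) -> Vec<u32> {
    let mut out = Vec::new();
    for x in c {
        out.push(x);
    }
    out
}

// Roughly what the loop above desugars to.
fn collect_desugared(c: Countdown) -> Vec<u32> {
    let mut out = Vec::new();
    let mut iter = IntoIterator::into_iter(c);
    while let Some(x) = iter.next() {
        out.push(x);
    }
    out
}

fn main() {
    assert_eq!(collect_with_for(Countdown(3)), vec![3, 2, 1]);
    assert_eq!(collect_desugared(Countdown(3)), vec![3, 2, 1]);
}
```

Both functions see the same items, because the `for` version is just sugar for the second one.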
As a side note: on nightly there is a trait `IntoFuture` that is just like `IntoIterator`, but for futures, and it is used for `.await` desugaring.
You can do all the same stuff with it, so async functions with optional `()` are possible too:
#![feature(into_future)]
async fn _f() {
    let _: String = lib2::fut::<String>.await;
    let _: [u8; 2] = lib2::fut::<[u8; 2]>().await;
    let _: &str = lib2::fut.await;
    let _: u128 = lib2::fut().await;
}
BTW I think it should be stabilized soon, so keep an eye on the tracking issue (or don’t (I’m just very excited for this feature)).
But this all still leaves us with a question: how can functions be called without parentheses?
Tempting idea that doesn’t work
The title of this section is concerning, but why can’t we just make a `const` + `impl IntoIterator for fn()`?
So the idea is to write something like this:
const iter: fn() -> Iter = || Iter;
impl IntoIterator for fn() -> Iter { /* not important */ }
struct Iter;
impl Iterator for Iter {}
Then `for _ in iter {}` would work because of the `IntoIterator` impl, and `for _ in iter() {}` would work because `iter`’s type is a function pointer that returns a type that implements `Iterator`.
But… this doesn’t work for the following two reasons:
- You can’t implement a foreign trait (`IntoIterator`) for a foreign type (`fn() -> _`), and the standard library doesn’t (yet?) implement `IntoIterator` for function pointers. (You could patch `std`, but then this won’t work with the stable compiler.)
- Constants can’t have generic parameters! So `iter::<T>` won’t work.
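For the record, here is a sketch of how far the tempting idea gets. The `Iter` below is a stand-in I made up (it counts 1, 2, 3); the parts that fail are left as comments, since they would not compile.

```rust
// Hypothetical stand-in iterator: yields 1, 2, 3.
struct Iter(u32);

impl Iterator for Iter {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.0 < 3 {
            self.0 += 1;
            Some(self.0)
        } else {
            None
        }
    }
}

// A constant holding a function pointer: this part is fine.
#[allow(non_upper_case_globals)]
const iter: fn() -> Iter = || Iter(0);

// This is the part that gets rejected: both the trait and the
// function pointer type are foreign to our crate.
//
// impl IntoIterator for fn() -> Iter { /* ... */ }

// Calling the constant works, because it is a function pointer.
fn collect_called() -> Vec<u32> {
    iter().collect()
}

fn main() {
    assert_eq!(collect_called(), vec![1, 2, 3]);

    // Without the impl above, this does not compile:
    //
    // for _ in iter {}
}
```

So `iter()` works, but `iter` on its own does not, and there is nowhere to hang a `::<T>`.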
So, we need to find something else to (ab)use.
Hack #1
Uh? There are multiple hacks at play here already?
Yes! There is a hack for `iter::<T>` and one for `iter::<T>()`; we’ll start with the former.
`iter::<T>` looks a lot like a unit structure with a generic parameter.
Maybe it is a unit structure with a generic parameter?
If only things were that simple… In Rust you need to use all generic parameters in the type, or else your code won’t compile:
struct S<T>(u8);
//~^ error: parameter `T` is never used
//~| help: consider removing `T`, referring to it in a field, or using a marker such as `PhantomData`
//~| help: if you intended `T` to be a const parameter, use `const T: usize` instead
This is because the compiler wants to infer the variance of all parameters. Since a unit structure, by definition, doesn’t have any fields, you can’t use generic parameters in it!
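For comparison, the conventional fix hinted at by the compiler’s help looks roughly like this. `Tagged` is a hypothetical name, and the sketch only shows that a marker field makes `T` count as used without taking up any space.

```rust
use std::marker::PhantomData;

// Hypothetical struct: `T` is only mentioned through the marker field.
struct Tagged<T> {
    raw: u8,
    _marker: PhantomData<T>,
}

impl<T> Tagged<T> {
    fn new(raw: u8) -> Self {
        Tagged { raw, _marker: PhantomData }
    }

    fn raw(&self) -> u8 {
        self.raw
    }
}

fn main() {
    let tagged = Tagged::<String>::new(7);
    assert_eq!(tagged.raw(), 7);
    // The marker is zero-sized, so only the `u8` contributes to the size.
    assert_eq!(std::mem::size_of::<Tagged<String>>(), 1);
}
```

That works for structs with fields, but it does not help with a struct that is supposed to have no fields at all.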
Wait, the compiler mentioned `PhantomData`, isn’t that a unit structure with a generic parameter?…
It is!
So I assume we can’t use it because we can’t impl `IntoIterator` for it either.
But why can’t we copy its definition into our code?
Well… just look at its definition:
#[lang = "phantom_data"] // <-- *compiler magic*
pub struct PhantomData<T: ?Sized>;
Oh.
Yeah…
So, there is no way around this: to define a `PhantomData`-like type, we need to do something hacky…
I first saw a hack that does this implemented by dtolnay (not surprising, is it?) in their crate `ghost`.
The hack basically looks like this:
mod imp {
    pub enum Void {}
    pub enum Type<T> {
        Type,
        __Phantom(T, Void),
    }
    pub mod reexport_hack {
        pub use super::Type::Type;
    }
}
#[doc(hidden)]
pub use imp::reexport_hack::*;
pub type Type<T> = imp::Type<T>;
Wha-
It may seem convoluted at first, but it’s actually quite simple!
So, let’s unpack this item-by-item:
- `pub enum Void {}` defines a type with no values, also known as an uninhabited type. The nice property for us is that values of this type can’t be created. That is basically a stable replacement for the `!` type.
- `Type<T>` has two variants: a unit variant `Type` and a tuple variant `__Phantom(T, Void)`. The latter uses `T`, solving the “parameter `T` is never used” error while simultaneously being impossible to construct because of the `Void` field. Since the `__Phantom` variant is impossible to create / uninhabited, `Type<_>` effectively has only a single usable variant.
- `reexport_hack` reexports `Type::Type` (variant `Type` of the type `Type`).
- `pub use imp::reexport_hack::*;` is a glob reexport that reexports `Type::Type` that was reexported by `reexport_hack`. I’m not entirely sure why, but using a glob is important.
- `pub type Type<T> = imp::Type<T>;` basically reexports the `Type` itself. It’s just rendered in docs in a nicer way than if reexported by `pub use`.
And now the magic: `Type` now refers both to the type and to the variant.
This works because Rust has different namespaces for types and values.
Glob reexport somehow suppresses an error about clashing names that arises when importing directly.
Idk why it’s this way :shrug:
Ah, and `let _ = Type::<u8>` works because you can apply generic parameters to variants.
It’s the same way as `None::<Fish>` is an expression of type `Option<Fish>` or `Ok::<_, E>(())` is an expression of type `Result<(), E>`.
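Both ingredients can be tried with plain std types, no hack required. In this sketch `Celsius` is a made-up name that is used twice, once as a type alias and once as a function, and `Fish` is just a placeholder type.

```rust
// A type alias lives in the type namespace...
type Celsius = f64;

// ...so a function with the same name can live in the value namespace.
#[allow(non_snake_case)]
fn Celsius() -> Celsius {
    21.5
}

// Placeholder type, only here to have something to put in an `Option`.
struct Fish;

// Generic parameters applied directly to enum variants.
fn no_fish() -> Option<Fish> {
    None::<Fish>
}

fn unit_ok() -> Result<(), String> {
    Ok::<_, String>(())
}

fn main() {
    // Same name, once in type position and once in expression position.
    let room: Celsius = Celsius();
    assert_eq!(room, 21.5);

    assert!(no_fish().is_none());
    assert_eq!(unit_ok(), Ok(()));
}
```

The `ghost`-style hack is the same idea, just with the value being an enum variant instead of a function.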
That’s a lot… But I think I’ve grasped the concept
With this hack, you can define types that are indistinguishable from `PhantomData`!
And this time we can use it to define an `iter<_>` “unit struct”:
pub type iter<T> = imp::iter<T>;
pub use imp::reexport_hack::*;
impl<T> IntoIterator for iter<T> {
    type Item = T;
    type IntoIter = Iter<T>;
    // capitalized -^
}
struct Iter<T>(...);
impl<T> Iterator for Iter<T> { /* not that important */ }
mod imp { /* basically the same as before */}
This already allows us to do cool stuff like this:
for x in lib::iter::<u32> {
    let _: u32 = x; // assert that `x` has type u32
}
for z in lib::iter { // infer type
    let _: &'static str = z;
}
let iter: lib::iter<()> = lib::iter::<()>;
//     type ---^^^^^^^^        ^^^^^^^^^^--- constant
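If you want to poke at this without the playground, here is a single-file sketch of hack #1. The `imp` module follows the pattern shown earlier; the `Iter` type is my own stand-in (an assumption, not the original one) that simply yields two `T::default()` values so that the loops have something to iterate over.

```rust
#[allow(non_camel_case_types)]
mod lib {
    mod imp {
        pub enum Void {}

        pub enum iter<T> {
            iter,
            #[allow(dead_code)]
            __Phantom(T, Void),
        }

        pub mod reexport_hack {
            pub use super::iter::iter;
        }
    }

    #[doc(hidden)]
    pub use imp::reexport_hack::*;
    pub type iter<T> = imp::iter<T>;

    // Hypothetical iterator: wraps a Vec iterator over default values.
    pub struct Iter<T>(std::vec::IntoIter<T>);

    impl<T> Iterator for Iter<T> {
        type Item = T;

        fn next(&mut self) -> Option<T> {
            self.0.next()
        }
    }

    // The `Default` bound exists only because of the stand-in iterator.
    impl<T: Default> IntoIterator for iter<T> {
        type Item = T;
        type IntoIter = Iter<T>;

        fn into_iter(self) -> Iter<T> {
            Iter(vec![T::default(), T::default()].into_iter())
        }
    }
}

// Helper for checking what the "unit struct" yields for a given type.
fn collect_defaults<T: Default>() -> Vec<T> {
    lib::iter::<T>.into_iter().collect()
}

fn main() {
    for x in lib::iter::<u32> {
        let _: u32 = x;
    }
    for z in lib::iter {
        let _: &'static str = z; // `T` is inferred from this annotation
    }
    assert_eq!(collect_defaults::<u32>(), vec![0, 0]);
}
```

Note that `lib::iter()` does not compile yet with this sketch, it only covers the parentheses-free half.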
Now on to the next hack, which will allow us to call `iter` too, instead of only using it as a constant!
Hack #2
I want to make a guess of what we’ll do!!
Uh ok, go ahead!
We could implement the `Fn*` traits for `iter<T>`, right?
We could! But we can’t. These traits are unstable and I’m in the stable-compiler jail today.
Oh… Okay then… Do you have another “impossible to guess if you haven’t seen it before” kind of thing?
Kind of!
We can (ab)use the `Deref` trait:
impl<T> Deref for iter<T> {
    type Target = fn() -> Iter<T>;
    fn deref(&self) -> &Self::Target {
        &((|| Iter([])) as _)
    }
}
Normally `Deref` is used for smart pointers like `Box` or `Arc` so that
- You can use the dereference operator on them (`*my_beloved_arc`)
- You can call methods of the inner type (`my_beloved_arc.nice()`)
This makes a lot of sense because smart pointers still just point to values and it’s nice to be able to just call methods.
But!
There is nothing stopping you from implementing `Deref` for non-smart-pointer types (besides, what is a smart pointer?).
And so, abnormally, `Deref` is used to forward methods.
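A small sketch of that “abnormal” use, with a made-up `Username` wrapper that is not a pointer in any sense but still forwards methods to the `String` inside:

```rust
use std::ops::Deref;

// Hypothetical newtype: just a wrapper around a String.
struct Username(String);

impl Deref for Username {
    type Target = String;

    fn deref(&self) -> &String {
        &self.0
    }
}

fn shout(name: &Username) -> String {
    // `to_uppercase` is a `str` method, reached via Username -> String -> str.
    name.to_uppercase()
}

fn main() {
    let name = Username(String::from("ferris"));
    // `len` is a `String` method, found through `Deref`.
    assert_eq!(name.len(), 6);
    assert_eq!(shout(&name), "FERRIS");
}
```

None of these methods are defined on `Username` itself, method lookup just keeps dereferencing until it finds them.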
Is this considered a bad practice?
Uh well mmm yemmm aaa phhhh mmm… mh.. yes? But everyone uses it anyway. It’s even used this way in the compiler itself, so who cares?
Ok, so what was I- Ah, right, and here is what came as a surprise to me: when you write `f()`, the compiler can automatically deref `f` too!
So `f()` can become more like `(&*f)()` or, in other words, `f.deref()()`.
This means that by implementing deref to a function pointer for our `iter<T>`, we can make it callable!
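The same trick in its smallest form, as a sketch with made-up names (`Greeter`, `GREET`, `make_greeting`). It is not generic, so I can park the function pointer in a `static` and borrow from there:

```rust
use std::ops::Deref;

fn make_greeting() -> &'static str {
    "hello"
}

// A function pointer with a fixed address that `deref` can hand out.
static GREET: fn() -> &'static str = make_greeting;

// Hypothetical unit struct standing in for `iter<T>`.
struct Greeter;

impl Deref for Greeter {
    type Target = fn() -> &'static str;

    fn deref(&self) -> &Self::Target {
        &GREET
    }
}

fn call_greeter() -> &'static str {
    let g = Greeter;
    // `Greeter` is not a function, so the callee gets dereferenced
    // to the function pointer, which is then called.
    g()
}

fn main() {
    assert_eq!(call_greeter(), "hello");
}
```

Statics can’t be generic, which is presumably why the generic impl above borrows a freshly cast closure instead of pointing at a named item.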
Full code is on the playground if you want to play with it.
That’s all I have for today: two hacks that I saw used “in the wild” (right, this one is also by dtolnay) and thought were quite surprising and fun.
bye.