**UPDATE:** Offer of mentorship is closed, since I have received enough candidates for now. Offer of collaboration remains open for experienced researchers (i.e. researchers who (i) have some track record of original math / theoretical compsci research, and (ii) are able to take on concrete open problems without much guidance).

I have two motivations for making this offer. First, there have been discussions regarding the lack of mentorship in the AI alignment community, and the observation that beginners find it difficult to enter the field because experienced researchers are too busy with their own research to provide guidance. Second, I have my own research programme, which has a significant number of shovel-ready open problems and only one person working on it (me). The way I see it, my research programme is a very promising approach that attacks the very core of the AI alignment problem.

Therefore, I am looking for people who would like to either receive mentorship in AI alignment relevant topics from me, or collaborate with me on my research programme, or both.

# Mentorship

I am planning to allocate about 4 hours / week to mentorship, which can be done over Skype, Discord, email or any other means of remote communication. For people who happen to be located in Israel, we can do in-person sessions. The mathematical topics in which I feel qualified to provide guidance include: linear algebra, calculus, functional analysis, probability theory, game theory, computability theory, computational complexity theory, statistical/computational learning theory. I am also more or less familiar with the state of the art in the various approaches other people pursue to AI alignment.

Naturally, people who are interested in working on my own research programme are those who would benefit the most from my guidance. People who want to work on empirical ML approaches (which seem to be dominant at OpenAI, DeepMind and CHAI) would benefit somewhat from my guidance, since many theoretical insights from computational learning theory in general, and my own research in particular, are to some extent applicable even to deep learning algorithms whose theoretical understanding is far from complete. People who want to work on MIRI's core research agenda would also benefit somewhat from my guidance, but I am less knowledgeable about, and less interested in, formal logic and approaches based on it.

# Collaboration

People who want to collaborate on problems within the learning-theoretic research programme might receive a significantly larger fraction of my time, depending on details. The communication would still be mostly remote (unless the collaborator is in Israel), but physical meetings involving flights are also an option.

The original essay about the learning-theoretic programme does mention a number of more or less concrete research directions, but since then more shovel-ready problems have joined the list (and there are also a couple of new results). Interested people are advised to contact me to hear about those problems and discuss the details.

# Contact

Anyone who wants to contact me regarding the above should email me at vanessa.kosoy@intelligence.org, and give me a brief intro about emself, including knowledge in math / theoretical compsci and previous research if relevant. Conversely, you are welcome to browse my writing on this forum to form an impression of my abilities. If we find each other mutually compatible, we will discuss further details.

In two weeks or so, could you please post an update about how many people reached out? This might partially demonstrate how much actual demand there is for something like this. Also, thank you!

Sure, I will do it :)

Update: In total, 16 people contacted me. Offer of mentorship is closed, since I have sufficiently many candidates for now. Offer of collaboration remains open for experienced researchers (i.e. researchers who (i) have some track record of original math / theoretical compsci research, and (ii) are able to take on concrete open problems without much guidance).

Thanks for the update! I’m glad to hear there was some traction. Certainly more than I expected.

I'm curious how this has turned out. Could you give an update (or point me to an existing one, in case I missed it)?

I accepted 3 candidates. Unfortunately, all of them dropped out some time into the programme (each of them lasted a few months, give or take). I'm not sure whether it's because (i) I'm a poor mentor, (ii) I chose the wrong candidates, (iii) there were no suitable candidates, or (iv) just bad luck. Currently I am working with a collaborator, but ey arrived in a different way. Maybe I will write a detailed post-mortem sometime, but I'm not sure.

I'm really excited that you are doing this. I recognize that it's time-consuming and not often immediately profitable (in various senses of that word, depending on your conditions) to do this sort of thing, especially when you might be working with someone more junior whom you have to spend time and effort training or otherwise bringing up to speed on skills and knowledge. But I expect the long-term benefits to the AI safety project may be significant in expectation, and I hope more people find ways to do this in settings like this one, outside traditional mentoring and collaboration channels.

I hope it works out, and look forward to seeing what results it produces!

Thank you, I appreciate the positive feedback :)

Hi, I also have a reasonable understanding of various relevant math and AI theory. I expect to have plenty of free time after 11 June (after my finals), so if you want to work with me on something, I'm interested. I've got some interesting ideas relating to self-validating proof systems and logical counterfactuals, but they are not complete yet.

Can you tell me more about your ideas related to logical counterfactuals? They're an area I've been working on as well.

Here is a description of how it could work for Peano arithmetic; other proof systems are similar.

First, I define an expression to be a number, a variable, or a function applied to several other expressions.

Fixed expressions are ones in which every variable is bound by some function (i.e. expressions with no free variables).

For example, $3 \times \inf_x\big((x \times (x+5)) + 2\big)$ is a valid fixed expression, but $(y+4) \times 3$ isn't fixed.
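A minimal sketch of these definitions, assuming a hypothetical tuple encoding that is not from the comment: `("num", n)`, `("var", name)`, or `(function_name, *args)`, with `inf` as the only variable-binding function. A fixed expression is then simply one whose set of free variables is empty.

```python
# Hypothetical encoding: ("num", n), ("var", name), or (function_name, *args).
# A binding function such as inf_x is written ("inf", "x", body) and binds "x".
BINDERS = {"inf"}


def free_vars(e):
    """Variables in e that are not bound by any enclosing binding function."""
    tag = e[0]
    if tag == "num":
        return set()
    if tag == "var":
        return {e[1]}
    if tag in BINDERS:
        _, var, body = e
        return free_vars(body) - {var}
    return set().union(*(free_vars(arg) for arg in e[1:]))


def is_fixed(e):
    """A fixed expression has no free variables."""
    return not free_vars(e)


x = ("var", "x")
# 3 * inf_x((x * (x + 5)) + 2)
fixed_example = ("*", ("num", 3),
                 ("inf", "x", ("+", ("*", x, ("+", x, ("num", 5))), ("num", 2))))
# (y + 4) * 3
open_example = ("*", ("+", ("var", "y"), ("num", 4)), ("num", 3))

print(is_fixed(fixed_example), is_fixed(open_example))  # True False
```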

Semantically, every fixed expression has a meaning. Syntactically, local manipulations on the parse tree can turn one expression into another, e.g. (a+b)×c going to a×c+b×c for arbitrary expressions a, b, c.

I think that with some set of basic functions and manipulations, this system can be as powerful as PA.

This gives an infinite graph with all fixed expressions as nodes and basic transformations as edges; e.g. the associativity transformation links the nodes (3+4)+5 and 3+(4+5).

The graph has one connected component for each number, as well as components that cannot be evaluated using the rules (there is a path from 3+4 to 7, but there is no path from 3+4 to 9).
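A minimal sketch of exploring this graph, under assumptions that are not from the comment: a toy fragment containing only numerals and addition; edges from associativity, commutativity, and a rule that evaluates numeral + numeral (which is what links 3+4 to 7); and, because the graph is infinite, a breadth-first search restricted to expressions below a size bound. All three rules preserve an expression's value, so in this fragment no path from 3+4 to 9 can exist and the bounded search correctly reports `False`; in general, failing to find a path within a bound proves nothing.

```python
from collections import deque

# Hypothetical toy fragment: numerals are ints, sums are ("+", left, right).
# Assumed rules: associativity, commutativity, and evaluating numeral + numeral.


def size(e):
    """Number of nodes in the parse tree."""
    return 1 if isinstance(e, int) else 1 + size(e[1]) + size(e[2])


def neighbours(e):
    """All expressions one basic transformation away from e (edges are undirected)."""
    if isinstance(e, int):
        # Reverse of evaluation: n is linked to every a + b with a + b == n.
        return {("+", a, e - a) for a in range(e + 1)}
    _, a, b = e
    out = {("+", b, a)}                          # commutativity
    if isinstance(a, int) and isinstance(b, int):
        out.add(a + b)                           # evaluation
    if not isinstance(a, int):
        out.add(("+", a[1], ("+", a[2], b)))     # (x + y) + b -> x + (y + b)
    if not isinstance(b, int):
        out.add(("+", ("+", a, b[1]), b[2]))     # a + (x + y) -> (a + x) + y
    # The same transformations applied inside either subexpression.
    out |= {("+", n, b) for n in neighbours(a)}
    out |= {("+", a, n) for n in neighbours(b)}
    return out


def connected(start, goal, max_size=7):
    """Breadth-first search over expressions with at most max_size parse-tree nodes."""
    seen, queue = {start}, deque([start])
    while queue:
        e = queue.popleft()
        if e == goal:
            return True
        for n in neighbours(e):
            if size(n) <= max_size and n not in seen:
                seen.add(n)
                queue.append(n)
    return False


print(connected(("+", 3, 4), 7))   # True: one evaluation step
print(connected(("+", 3, 4), 9))   # False
```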

Now define a spread as an infinite non-negative sequence that sums to 1 (this is kind of like a probability distribution over numbers). If you were doing counterfactual ZFC, it would instead be a function from sets to reals.
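Two illustrative spreads (my own examples, not from the comment): a geometric spread that gives every number some weight, and the point mass at 3, whose zero entries mean the sequence is non-negative rather than strictly positive.

$$
s(n) = 2^{-(n+1)}, \qquad \sum_{n=0}^{\infty} s(n) = \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \cdots = 1
$$

$$
\delta_3(n) = \begin{cases} 1 & n = 3 \\ 0 & n \neq 3 \end{cases}, \qquad \sum_{n=0}^{\infty} \delta_3(n) = 1
$$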

Each node is assigned a spread, which represents how strongly the expression is considered to take each value in a counterfactual.

Assign the node 3 a spread that assigns 1.0 to 3 and 0.0 to everything else (even in a logical counterfactual, 3 is definitely 3). Assign every other fixed expression a spread that is the weighted average of its neighbours' spreads (the neighbours being the nodes it shares an edge with), with smaller expressions weighted more heavily. To take the counterfactual "A is B", for expressions A and B with the same free variables, merge every node that has A as a subexpression with the corresponding node that has B in its place, and solve for the spreads.
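A minimal sketch of solving for spreads on a small, hand-built finite piece of the graph. Several choices here are assumptions, not part of the proposal: spreads are truncated to the numerals present in the graph; every numeral node (not just 3) is pinned to a point mass; the "smaller is heavier" weight is taken to be `2 ** -size(neighbour)`; the spreads are found by repeated averaging (a Gauss-Seidel style iteration) rather than solved exactly; and the counterfactual merge fuses only the two nodes A and B themselves, since the toy graph contains no larger expressions with A as a subexpression. Node names such as `"3+4"` are just labels.

```python
# Hypothetical toy graph: node -> set of neighbours (edges are symmetric).
# Numerals are ints; other expressions are plain string labels.
GRAPH = {
    7: {"3+4", "4+3"},
    "3+4": {7, "4+3"},
    "4+3": {7, "3+4"},
    9: {"4+5"},
    "4+5": {9},
}
SIZE = {7: 1, 9: 1, "3+4": 3, "4+3": 3, "4+5": 3}  # parse-tree node counts


def solve_spreads(graph, size, iters=500):
    """Pin numerals to point masses; repeatedly set every other node's spread to the
    weighted average of its neighbours' spreads, with weight 2**-size(neighbour)."""
    numerals = sorted(n for n in graph if isinstance(n, int))
    spread = {}
    for n in graph:
        if isinstance(n, int):
            spread[n] = {m: float(m == n) for m in numerals}  # 3 is definitely 3
        else:
            spread[n] = {m: 1.0 / len(numerals) for m in numerals}  # uniform start
    for _ in range(iters):
        for n in graph:
            if isinstance(n, int) or not graph[n]:
                continue  # numerals stay pinned; isolated nodes keep their start value
            weights = {nb: 2.0 ** -size[nb] for nb in graph[n]}
            total = sum(weights.values())
            spread[n] = {
                m: sum(w * spread[nb][m] for nb, w in weights.items()) / total
                for m in numerals
            }
    return spread


def merge(graph, size, a, b):
    """Counterfactual 'a is b' (simplified): fuse the two nodes a and b into one."""
    fused = f"[{a}={b}]"
    new_graph = {
        n: {fused if x in (a, b) else x for x in nbs}
        for n, nbs in graph.items()
        if n not in (a, b)
    }
    new_graph[fused] = (graph[a] | graph[b]) - {a, b}
    return new_graph, {**size, fused: min(size[a], size[b])}


plain = solve_spreads(GRAPH, SIZE)
print(plain["3+4"])            # essentially all weight on 7
cf = solve_spreads(*merge(GRAPH, SIZE, "3+4", "4+5"))
print(cf["[3+4=4+5]"])         # weight split between 7 and 9
```

Solving the two averaging equations of the merged graph by hand gives the fused node weight 6/11 on 7 and 5/11 on 9, which is what the iteration converges to.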

I know this is rough; I'm still working on it.

You've explained the system. But what's the motivation behind this?

Even though I only have a high-level understanding of what you're doing, I generally disagree with this kind of approach on a philosophical level. It seems like you're reifying logical counterfactuals, whereas I see them more as an analogy, i.e. positing a logical counterfactual is an operation that takes place on the level of the map, not the territory.

The general philosophy is deconfusion. Logical counterfactuals show up in several relevant-looking places, like functional decision theory. It seems that a formal model of logical counterfactuals would allow more properties of these algorithms to be proved. Going from an intuitive feeling of uncertainty to a formalized theory of probability was an important step. A formal model might also suggest other techniques based on it. I am not sure what you mean by logical counterfactuals being part of the map. Are you saying that, like probabilities, they are something an algorithm might use to understand the world rather than features of the world itself?

Using this, I think that self-understanding, two-boxing embedded FDT agents can be fully formally understood, in a universe that contains the right type of hypercomputation.

I mean that logical counterfactuals are not a property of the universe itself. However, once we've created a model (/map) of the universe, we can define a logical counterfactual as asking a particular question about this model. We just need to figure out what that question is.