Statistics
| Confidence | 50% | 60% | 70% | 80% | 90% | 100% | Total |
|---|---|---|---|---|---|---|---|
| Accuracy | 100% | 64% | 52% | 73% | 56% | 0% | |
| Sample Size | 3 | 11 | 21 | 15 | 9 | 0 | 59 |
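
As a rough illustration of how a calibration table like the one above can be derived, here is a minimal Python sketch. It rests on assumptions that are not confirmed by this page: that only predictions judged right or wrong are counted, that a confidence below 50% is treated as the complementary confidence in the opposite outcome, and that confidences are floored to the nearest 10-point bucket. The `predictions` records and the `calibration_table` helper are hypothetical and are not the site's actual data or code.

```python
from collections import defaultdict

# Hypothetical records: (stated confidence in percent, outcome), where outcome is
# True (judged right), False (judged wrong), or None (unjudged or judged unknown).
predictions = [
    (70, False),
    (75, True),
    (40, True),
    (20, False),
    (85, None),
]

def calibration_table(records):
    """Group judged predictions into 10-point buckets; return accuracy and count per bucket."""
    buckets = defaultdict(lambda: [0, 0])  # bucket -> [number right, number judged]
    for confidence, outcome in records:
        if outcome is None:
            continue  # assumption: unjudged predictions are excluded
        if confidence < 50:
            # Assumption: "20% that X" is scored as "80% that not-X".
            confidence, outcome = 100 - confidence, not outcome
        bucket = min(confidence // 10 * 10, 100)  # assumption: floor to the bucket
        buckets[bucket][0] += int(outcome)
        buckets[bucket][1] += 1
    return {b: (right / total, total) for b, (right, total) in sorted(buckets.items())}

table = calibration_table(predictions)
for bucket, (accuracy, size) in table.items():
    print(f"{bucket}%: accuracy {accuracy:.0%}, sample size {size}")
```

A well-calibrated forecaster's accuracy in each bucket should sit near the bucket's confidence level, although with sample sizes this small large deviations are expected by chance.
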
Displaying all predictions made by TurnTrout.
-
Subtracting “Anger” embedding from layer 0, position 1 of GPT-2-XL will decrease anger ( 70% confidence; 1 comment )
Created by TurnTrout on 2023-06-07; known on 2023-06-07; judged wrong by TurnTrout on 2023-06-07.
-
The large coefficient results are real ( 75% confidence )
Created by TurnTrout on 2023-05-10; known on 2023-06-06.
-
X-vector in GPT-2 isn't just driven by sequence position of x-vector tokens. ( 70% confidence )
Created by TurnTrout on 2023-04-04; known on 2023-06-06; judged right by TurnTrout on 2023-04-04.
-
GPT-J-6B doesn't allow “Love”-“Hate” edit for “I don't wanna hang out with you anymore because”. ( 75% confidence; 1 comment )
Created by TurnTrout on 2023-03-29; known on 2023-06-06; judged right by TurnTrout on 2023-03-29.
-
GPT-J-6B's wedding vector works (even though Garrett said most didn't) ( 40% confidence; 1 comment )
Created by TurnTrout on 2023-03-28; known on 2023-06-06; judged right by TurnTrout on 2023-03-28.
-
IGPT converges faster than DT in offline RL paper ( 60% confidence )
Created by TurnTrout on 2023-06-06; known on 2023-06-07; judged wrong by TurnTrout on 2023-06-06.
-
Garrett's activation additions work for a residual network trained on MNIST. ( 85% confidence )
Created by TurnTrout on 2023-05-22; known on 2023-05-22; judged unknown by TurnTrout on 2023-05-23.
-
MNIST 1->3 steering vector makes non-1 digits come out as “3” ( 20% confidence; 1 comment )
Created by TurnTrout on 2023-05-17; known on 2023-05-17; judged wrong by TurnTrout on 2023-05-17.
-
MNIST 1->3 steering vector transfers to other 1s ( 35% confidence )
Created by TurnTrout on 2023-05-17; known on 2023-05-17; judged right by TurnTrout on 2023-05-17.
-
Optimized wedding vector works really well ( 75% confidence )
Created by TurnTrout on 2023-05-16; known on 2023-05-16; judged right by TurnTrout on 2023-05-16.
-
Within 8 months of serial research, a competent team can find a “secret spilling” steering vector which gets a smart model to give up in-context secrets across a variety of situations. ( 54% confidence; 3 wagers; 2 comments )
Created by TurnTrout on 2023-05-13; known on 2025-05-13.
-
David Udell can get both anger and wedding vectors to superimpose in GPT-2-XL within 20 minutes, showing qualitative effects from both ActivationAdditions ( 45% confidence )
Created by TurnTrout on 2023-05-04; known on 2023-05-05; judged wrong by TurnTrout on 2023-05-05.
-
Adding " wedding" to the middle of ~20-token prompts will, on average, increase wedding-related wordcount relative to adding it to the beginning or end. ( 60% confidence )
Created by TurnTrout on 2023-05-01; known on 2023-05-02.
-
Weighted prompt superposition works in Vicuna on first few tries. ( 60% confidence )
Created by TurnTrout on 2023-04-21; known on 2023-04-22; judged right by TurnTrout on 2023-04-21.
-
Uli got AVE to work on Vicuna-13b ( 40% confidence; 2 comments )
Created by TurnTrout on 2023-04-21; known on 2023-04-21; judged right by TurnTrout on 2023-04-21.
-
https://pastebin.com/F9jwvTPB will only subtly modify completions relative to normal. ( 80% confidence; 1 comment )
Created by TurnTrout on 2023-04-21; known on 2023-04-21; judged wrong by TurnTrout on 2023-04-21.
-
[For Monte's model sweep (https://pastebin.com/zymtNS3v),] The +64 coefficient additions generally produce less wedding-focused completions than +4 ( 65% confidence; 1 comment )
Created by TurnTrout on 2023-04-14; known on 2023-04-14.
-
[For Monte's model sweep (https://pastebin.com/zymtNS3v),] The +/- 64 coefficient additions produce less coherent behavior ( 65% confidence; 1 comment )
Created by TurnTrout on 2023-04-14; known on 2023-04-14.
-
[For Monte's model sweep (https://pastebin.com/zymtNS3v),] Subtracting this “wedding vector” often increases the incidence of wedding-discussion ( 85% confidence; 1 comment )
Created by TurnTrout on 2023-04-14; known on 2023-04-14; judged right by TurnTrout on 2023-04-14.
-
[For Monte's model sweep (https://pastebin.com/zymtNS3v),] The second half of the model is harder to intervene on and get wedding outputs ( 80% confidence )
Created by TurnTrout on 2023-04-14; known on 2023-04-14; judged right by TurnTrout on 2023-04-14.
-
[For Monte's model sweep (https://pastebin.com/zymtNS3v),] the first prompt tends to elicit more wedding words than the second (about Batman) ( 75% confidence; 1 comment )
Created by TurnTrout on 2023-04-14; known on 2023-04-14; judged wrong by TurnTrout on 2023-04-14.
-
AI x-risk is supported by right-wing US political actors, conditional on it becoming a sufficiently mainstream political issue and ignoring roughly bipartisan consensus ( 35% confidence; 2 wagers; 1 comment )
Created by TurnTrout on 2023-04-10; known on 2025-04-10.
-
Uli is not subscribed to Discord Nitro? ( 75% confidence )
Created by TurnTrout on 2023-04-09; known on 2023-04-09; judged wrong by TurnTrout on 2023-04-09.
-
The “Bush did 9/11” – " " vector (block 23) works in order to generate conspiratorial completions for the prompt “Barack Obama was born in”. ( 10% confidence; 1 comment )
Created by TurnTrout on 2023-04-04; known on 2023-04-04; judged right by TurnTrout on 2023-04-04.
-
Subtracting the mouse-translation vector will make lower mouse go into lower wall, or vice-versa for addition (make the upper mouse stop at imagined wall). ( 14% confidence; 2 wagers; 1 comment )
Created by TurnTrout on 2023-03-14; known on 2023-03-14; judged wrong by TurnTrout on 2023-03-30.
-
Cheese vector and top-right vector will compose in Procgen maze environment (i.e. you can non-destructively apply both at once, and get at least some of their effects). ( 75% confidence )
Created by TurnTrout on 2023-03-13; known on 2023-03-15; judged right by TurnTrout on 2023-03-30.
-
Specific attempt at cheese vector will work ( 10% confidence )
Created by TurnTrout on 2023-03-13; known on 2023-03-13; judged right by TurnTrout on 2023-06-06.
-
X vector works in environments besides Procgen ( 97% confidence )
Created by TurnTrout on 2023-03-11; known on 2023-06-11; judged right by TurnTrout on 2023-03-30.
-
Modifying a maze env to have an extra-high up top right pathway, compared to the normal version, and then taking a top-right vector over this — will work comparably well to the cheese vector. ( 30% confidence; 1 comment )
Created by TurnTrout on 2023-03-10; known on 2023-03-10; judged right by TurnTrout on 2023-03-10.
-
Cheese vector works on at least 50% of alternate size pretrained models, as qualitatively determined by me. ( 80% confidence; 2 comments )
Created by TurnTrout on 2023-03-08; known on 2023-03-08; judged right by TurnTrout on 2023-03-09.
-
In small mazes (<6×6), c55 doesn't positively activate on cheese very reliably (>1.5x false negative rate compared to larger mazes). ( 40% confidence; 2 comments )
Created by TurnTrout on 2023-03-01; known on 2023-03-01; judged wrong by TurnTrout on 2023-03-01.
-
[At decision squares, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 99.5% of the time ( 2% confidence; 4 wagers; 2 comments )
Created by TurnTrout on 2023-03-01; known on 2023-02-16.
-
[At decision squares, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 95% of the time ( 13% confidence; 4 wagers; 1 comment )
Created by TurnTrout on 2023-02-09; known on 2023-02-16.
-
[At decision squares, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 75% of the time ( 49% confidence; 7 wagers; 4 comments )
Created by TurnTrout on 2023-02-09; known on 2023-02-16.
-
[At decision squares, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 50% of the time ( 73% confidence; 4 wagers; 1 comment )
Created by TurnTrout on 2023-02-09; known on 2023-02-16.
-
[At decision squares, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 25% of the time ( 95% confidence; 4 wagers; 3 comments )
Created by TurnTrout on 2023-02-09; known on 2023-02-16.
-
Created by TurnTrout on 2023-03-01; known on 2023-02-16.
-
[Randomly generating train-distribution mazes and considering squares in the top-right 5×5 corner, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 95% of the time ( 26% confidence; 2 wagers )
Created by TurnTrout on 2023-02-09; known on 2023-02-16.
-
[Randomly generating train-distribution mazes and considering squares in the top-right 5×5 corner, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 75% of the time ( 65% confidence; 2 wagers )
Created by TurnTrout on 2023-02-09; known on 2023-02-16.
-
[Randomly generating train-distribution mazes and considering squares in the top-right 5×5 corner, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 50% of the time ( 80% confidence; 2 wagers )
Created by TurnTrout on 2023-02-09; known on 2023-02-16.
-
[Randomly generating train-distribution mazes and considering squares in the top-right 5×5 corner, the 5×5 rand-region cheese maze network will put max cumulative probability on the maximal-advantage action at least] 25% of the time ( 92% confidence; 2 wagers )
Created by TurnTrout on 2023-02-09; known on 2023-02-16.
-
At 10:30pm on Sunday Feb 26, there will be a person functionally checking whether or not you are registered for the party via fb/partiful. ( 42% confidence; 2 wagers )
Created by TurnTrout on 2023-02-25; known on 2023-02-27.
-
When scale=1, the policy network will still have at least one translation-equivariant cheese-detecting conv channel in block2.res1.resadd_out. ( 90% confidence )
Created by TurnTrout on 2023-02-21; known on 2023-02-28; judged right by TurnTrout on 2023-02-22.
-
Doubling channel 55 will work to increase cheese-seeking by at least 10% without destroying performance, in >25% of mazes with decision-squares. ( 20% confidence )
Created by TurnTrout on 2023-02-17; known on 2023-02-17.
-
Statistically significant propensity to go to nearby-cheese, controlling for shortest-path-distance and position in maze relative to top-right corner (goal misgeneralization networks, behavioral tests) ( 72% confidence; 2 wagers )
Created by TurnTrout on 2023-01-14; known on 2023-02-14; judged right by TurnTrout on 2023-04-22.
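
The two families of "at least N% of the time" predictions in this list are nested: if the network picks the maximal-advantage action at least 95% of the time, it also does so at least 75% of the time. Coherent confidences should therefore never rise as the threshold gets stricter. The sketch below checks this for the confidences displayed above and derives the probability mass implied for each band between thresholds. The helper names `is_coherent` and `implied_bands` are hypothetical. The displayed confidences may aggregate several wagers, so the bands are only indicative.

```python
# Displayed confidences copied from the predictions in this list:
# threshold (percent of the time) -> confidence (percent).
decision_square = {25: 95, 50: 73, 75: 49, 95: 13, 99.5: 2}
top_right_corner = {25: 92, 50: 80, 75: 65, 95: 26}

def is_coherent(confidence_by_threshold):
    """Return True if confidence never rises as the threshold gets stricter."""
    ordered = [confidence_by_threshold[t] for t in sorted(confidence_by_threshold)]
    return all(a >= b for a, b in zip(ordered, ordered[1:]))

def implied_bands(confidence_by_threshold):
    """Probability mass (percent) that the true rate lies between adjacent thresholds."""
    thresholds = sorted(confidence_by_threshold)
    upper = [100] + [confidence_by_threshold[t] for t in thresholds]
    lower = [confidence_by_threshold[t] for t in thresholds] + [0]
    edges = [0] + thresholds + [100]
    return {(edges[i], edges[i + 1]): upper[i] - lower[i] for i in range(len(upper))}

for name, family in [("decision squares", decision_square),
                     ("top-right corner", top_right_corner)]:
    print(name, "coherent:", is_coherent(family))
    for (low, high), mass in implied_bands(family).items():
        print(f"  between {low}% and {high}% of the time: {mass}%")
```

Under these numbers, the largest share of the decision-square probability mass falls in the band between 75% and 95% of the time.
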