The "Scheduling Parallelism in Plans" problem (in English, from the source)
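The exact task list and constraints of the source problem are not reproduced in these notes. As an illustration of the problem class only, the sketch below uses hypothetical tasks and durations and computes the minimum total time of a plan when independent tasks are allowed to run in parallel, i.e. the critical-path length of the dependency graph.

```python
# Minimal sketch of the "scheduling parallelism in plans" problem class.
# The tasks, durations, and dependencies below are hypothetical, not the
# ones from the source problem.
plan = {
    "pack":             (30, set()),
    "book_hotel":       (15, set()),                      # independent of packing
    "drive_to_airport": (45, {"pack"}),
    "check_in":         (20, {"drive_to_airport", "book_hotel"}),
}

def earliest_finish_times(plan):
    """Earliest finish time of each task when independent tasks overlap.

    A task can start only after all of its prerequisites have finished;
    the plan's minimum makespan is the largest finish time (critical path).
    """
    finish = {}

    def finish_of(task):
        if task not in finish:
            duration, deps = plan[task]
            finish[task] = duration + max((finish_of(d) for d in deps), default=0)
        return finish[task]

    return {task: finish_of(task) for task in plan}

times = earliest_finish_times(plan)
print(times)                             # drive_to_airport finishes at 75, check_in at 95
print("makespan:", max(times.values()))  # 95 minutes, vs. 110 if everything ran sequentially
```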
Why a non-reasoning LLM fails (source)
Claude Sonnet 3.5's illustration of a solution (which happens to be the optimum)
ChatGPT o1 reasons to a feasible outcome, which is then optimized by a human
3.7 Extended, 41 s (misinterpreted)
3.7 Extended, optimum in 81 s
4 small experiments in a row (edits)
does not always work (can misinterpret), 85 s
3.7 Extended, 35 s (misinterpreted)
3.7 Extended, 7 s (misinterpretation), with the prompt "Use A*"
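"Use A*" here only tells the model which search algorithm to apply; the source does not show the resulting code. As a rough sketch of what such a search could look like, assuming schedule states can be enumerated as (next state, step cost) successors and scored with an admissible heuristic (both are placeholder callbacks here, not the source problem's encoding):

```python
import heapq
import itertools

def a_star(start, is_goal, successors, heuristic):
    """Generic A* search sketch.

    successors(state) yields (next_state, step_cost) pairs; heuristic(state)
    must not overestimate the remaining cost for the result to be optimal.
    How schedule states are encoded for the source problem is not specified,
    so both callbacks are left to the caller.
    """
    counter = itertools.count()            # tie-breaker so states are never compared
    frontier = [(heuristic(start), next(counter), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, _, cost, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return cost, path
        if cost > best_cost.get(state, float("inf")):
            continue                        # stale queue entry
        for nxt, step in successors(state):
            new_cost = cost + step
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier,
                               (new_cost + heuristic(nxt), next(counter),
                                new_cost, nxt, path + [nxt]))
    return None                             # no feasible schedule found
```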
What if Emily arrived at the airport at 4:30?
Sonnet 3.5 illustration (attention bias occurs; some constraints are forgotten)
ChatGPT o1 finds a feasible (and also optimum) solution on the first try; the solution space is tremendously limited.
What if Emily arrived at the airport at 2:30?
Sonnet 3.5 illustration (attention bias occurs; some constraints are forgotten)
ChatGPT o1 reasons to a feasible outcome, which is then optimized by a human