Sunday, March 02, 2025

Dinner and Airport Pickup Challenge (Semiconductor Fab Process Scheduling): Part III Standard LLM vs. Reasoning

The "Scheduling Parallelism in Plans" problem (English) (from source)


Thanksgiving Family Dinner Challenge
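
To make "parallelism in plans" concrete, here is a minimal sketch of how the earliest possible dinner time follows from task durations and prerequisites. The task names, durations, and dependencies below are hypothetical placeholders, not the data from the original problem, and the sketch assumes an unlimited number of helpers. The real puzzle also limits who can drive or cook, which this ignores.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks (minutes) and prerequisites; NOT the data from the original problem.
DURATION = {
    "drive_to_airport": 45,
    "drive_home": 45,
    "cook_turkey": 240,
    "set_table": 20,
    "serve_dinner": 10,
}
PREREQS = {
    "drive_home": {"drive_to_airport"},
    "set_table": {"drive_home"},
    "serve_dinner": {"cook_turkey", "set_table"},
}


def earliest_finish(duration, prereqs):
    """Earliest finish time of each task, assuming unlimited parallel workers."""
    graph = {task: prereqs.get(task, set()) for task in duration}
    finish = {}
    # static_order() yields every task after all of its prerequisites.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[task]), default=0)
        finish[task] = start + duration[task]
    return finish


finish = earliest_finish(DURATION, PREREQS)
makespan = max(finish.values())       # length of the longest dependency chain
serial_total = sum(DURATION.values())  # time if everything ran one after another
print(f"parallel makespan: {makespan} min, serial total: {serial_total} min")
```

With these placeholder numbers the cooking chain dominates: the parallel plan takes 250 minutes against 360 minutes if done serially. A feasible plan only needs to respect the prerequisites; an optimal one also has to hit that shortest makespan.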





Why a non-reasoning LLM fails (source)







Claude Sonnet 3.5 illustration of the solution (happens to be optimal)





ChatGPT o1 reasoning: feasible outcome, then optimized by a human


3.7 Extended, 41 s (misinterpreted)







Four small experiments in a row (edits)
Does not always work (can misinterpret), 85 s

3.7 Extended, 35 s (misinterpreted)

3.7 Extended, 7 s (misinterpreted), prompt: "Use A*"




    Grok 3 did it in 79 s



    What if Emily arrived at the airport at 4:30?

    Sonnet 3.5 illustration (attention bias occurs; some constraints are forgotten)

    ChatGPT o1: feasible (and also optimal) on the first try. The solution space is very limited.



    What if Emily arrived at the airport at 2:30?


    Sonnet 3.5 illustration (attention bias occurs; some constraints are forgotten)

    ChatGPT o1 reasoning: feasible outcome, then optimized by a human





