summaryrefslogtreecommitdiff
path: root/kernel/trace/ftrace.c
AgeCommit message (Collapse)Author
2021-10-27tracing: Have all levels of checks prevent recursionSteven Rostedt (VMware)
commit ed65df63a39a3f6ed04f7258de8b6789e5021c18 upstream. While writing an email explaining the "bit = 0" logic for a discussion on making ftrace_test_recursion_trylock() disable preemption, I discovered a path that makes the "not do the logic if bit is zero" unsafe. The recursion logic is done in hot paths like the function tracer. Thus, any code executed causes noticeable overhead. Thus, tricks are done to try to limit the amount of code executed. This included the recursion testing logic. Having recursion testing is important, as there are many paths that can end up in an infinite recursion cycle when tracing every function in the kernel. Thus protection is needed to prevent that from happening. Because it is OK to recurse due to different running context levels (e.g. an interrupt preempts a trace, and then a trace occurs in the interrupt handler), a set of bits are used to know which context one is in (normal, softirq, irq and NMI). If a recursion occurs in the same level, it is prevented*. Then there are infrastructure levels of recursion as well. When more than one callback is attached to the same function to trace, it calls a loop function to iterate over all the callbacks. Both the callbacks and the loop function have recursion protection. The callbacks use the "ftrace_test_recursion_trylock()" which has a "function" set of context bits to test, and the loop function calls the internal trace_test_and_set_recursion() directly, with an "internal" set of bits. If an architecture does not implement all the features supported by ftrace then the callbacks are never called directly, and the loop function is called instead, which will implement the features of ftrace. Since both the loop function and the callbacks do recursion protection, it was seemed unnecessary to do it in both locations. Thus, a trick was made to have the internal set of recursion bits at a more significant bit location than the function bits. Then, if any of the higher bits were set, the logic of the function bits could be skipped, as any new recursion would first have to go through the loop function. This is true for architectures that do not support all the ftrace features, because all functions being traced must first go through the loop function before going to the callbacks. But this is not true for architectures that support all the ftrace features. That's because the loop function could be called due to two callbacks attached to the same function, but then a recursion function inside the callback could be called that does not share any other callback, and it will be called directly. i.e. traced_function_1: [ more than one callback tracing it ] call loop_func loop_func: trace_recursion set internal bit call callback callback: trace_recursion [ skipped because internal bit is set, return 0 ] call traced_function_2 traced_function_2: [ only traced by above callback ] call callback callback: trace_recursion [ skipped because internal bit is set, return 0 ] call traced_function_2 [ wash, rinse, repeat, BOOM! out of shampoo! ] Thus, the "bit == 0 skip" trick is not safe, unless the loop function is call for all functions. Since we want to encourage architectures to implement all ftrace features, having them slow down due to this extra logic may encourage the maintainers to update to the latest ftrace features. And because this logic is only safe for them, remove it completely. [*] There is on layer of recursion that is allowed, and that is to allow for the transition between interrupt context (normal -> softirq -> irq -> NMI), because a trace may occur before the context update is visible to the trace recursion logic. Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/ Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> Cc: =?utf-8?b?546L6LSH?= <yun.wang@linux.alibaba.com> Cc: Guo Ren <guoren@kernel.org> Cc: stable@vger.kernel.org Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16ftrace: Do not blindly read the ip address in ftrace_bug()Steven Rostedt (VMware)
commit 6c14133d2d3f768e0a35128faac8aa6ed4815051 upstream. It was reported that a bug on arm64 caused a bad ip address to be used for updating into a nop in ftrace_init(), but the error path (rightfully) returned -EINVAL and not -EFAULT, as the bug caused more than one error to occur. But because -EINVAL was returned, the ftrace_bug() tried to report what was at the location of the ip address, and read it directly. This caused the machine to panic, as the ip was not pointing to a valid memory address. Instead, read the ip address with copy_from_kernel_nofault() to safely access the memory, and if it faults, report that the address faulted, otherwise report what was in that location. Link: https://lore.kernel.org/lkml/20210607032329.28671-1-mark-pk.tsai@mediatek.com/ Cc: stable@vger.kernel.org Fixes: 05736a427f7e1 ("ftrace: warn on failure to disable mcount callers") Reported-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-22ftrace: Handle commands when closing set_ftrace_filter fileSteven Rostedt (VMware)
commit 8c9af478c06bb1ab1422f90d8ecbc53defd44bc3 upstream. # echo switch_mm:traceoff > /sys/kernel/tracing/set_ftrace_filter will cause switch_mm to stop tracing by the traceoff command. # echo -n switch_mm:traceoff > /sys/kernel/tracing/set_ftrace_filter does nothing. The reason is that the parsing in the write function only processes commands if it finished parsing (there is white space written after the command). That's to handle: write(fd, "switch_mm:", 10); write(fd, "traceoff", 8); cases, where the command is broken over multiple writes. The problem is if the file descriptor is closed, then the write call is not processed, and the command needs to be processed in the release code. The release code can handle matching of functions, but does not handle commands. Cc: stable@vger.kernel.org Fixes: eda1e32855656 ("tracing: handle broken names in ftrace filter") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23fgraph: Initialize tracing_graph_pause at task creationSteven Rostedt (VMware)
commit 7e0a9220467dbcfdc5bc62825724f3e52e50ab31 upstream. On some archs, the idle task can call into cpu_suspend(). The cpu_suspend() will disable or pause function graph tracing, as there's some paths in bringing down the CPU that can have issues with its return address being modified. The task_struct structure has a "tracing_graph_pause" atomic counter, that when set to something other than zero, the function graph tracer will not modify the return address. The problem is that the tracing_graph_pause counter is initialized when the function graph tracer is enabled. This can corrupt the counter for the idle task if it is suspended in these architectures. CPU 1 CPU 2 ----- ----- do_idle() cpu_suspend() pause_graph_tracing() task_struct->tracing_graph_pause++ (0 -> 1) start_graph_tracing() for_each_online_cpu(cpu) { ftrace_graph_init_idle_task(cpu) task-struct->tracing_graph_pause = 0 (1 -> 0) unpause_graph_tracing() task_struct->tracing_graph_pause-- (0 -> -1) The above should have gone from 1 to zero, and enabled function graph tracing again. But instead, it is set to -1, which keeps it disabled. There's no reason that the field tracing_graph_pause on the task_struct can not be initialized at boot up. Cc: stable@vger.kernel.org Fixes: 380c4b1411ccd ("tracing/function-graph-tracer: append the tracing_graph_flag") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339 Reported-by: pierre.gondois@arm.com Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-01ftrace: Setup correct FTRACE_FL_REGS flags for moduleChengming Zhou
[ Upstream commit 8a224ffb3f52b0027f6b7279854c71a31c48fc97 ] When module loaded and enabled, we will use __ftrace_replace_code for module if any ftrace_ops referenced it found. But we will get wrong ftrace_addr for module rec in ftrace_get_addr_new, because rec->flags has not been setup correctly. It can cause the callback function of a ftrace_ops has FTRACE_OPS_FL_SAVE_REGS to be called with pt_regs set to NULL. So setup correct FTRACE_FL_REGS flags for rec when we call referenced_filters to find ftrace_ops references it. Link: https://lkml.kernel.org/r/20200728180554.65203-1-zhouchengming@bytedance.com Cc: stable@vger.kernel.org Fixes: 8c4f3c3fa9681 ("ftrace: Check module functions being traced on reload") Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12ftrace: Avoid potential division by zero in function profilerWen Yang
commit e31f7939c1c27faa5d0e3f14519eaf7c89e8a69d upstream. The ftrace_profile->counter is unsigned long and do_div truncates it to 32 bits, which means it can test non-zero and be truncated to zero for division. Fix this issue by using div64_ul() instead. Link: http://lkml.kernel.org/r/20200103030248.14516-1-wenyang@linux.alibaba.com Cc: stable@vger.kernel.org Fixes: e330b3bcd8319 ("tracing: Show sample std dev in function profiling") Fixes: 34886c8bc590f ("tracing: add average time in function to function profiler") Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27kprobes: Mark ftrace mcount handler functions nokprobeMasami Hiramatsu
commit fabe38ab6b2bd9418350284c63825f13b8a6abba upstream. Mark ftrace mcount handler functions nokprobe since probing on these functions with kretprobe pushes return address incorrectly on kretprobe shadow stack. Reported-by: Francis Deslauriers <francis.deslauriers@efficios.com> Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/155094062044.6137.6419622920568680640.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-21tracing: Fix memory leak of instance function hash filtersSteven Rostedt (VMware)
commit 2840f84f74035e5a535959d5f17269c69fa6edc5 upstream. The following commands will cause a memory leak: # cd /sys/kernel/tracing # mkdir instances/foo # echo schedule > instance/foo/set_ftrace_filter # rmdir instances/foo The reason is that the hashes that hold the filters to set_ftrace_filter and set_ftrace_notrace are not freed if they contain any data on the instance and the instance is removed. Found by kmemleak detector. Cc: stable@vger.kernel.org Fixes: 591dffdade9f ("ftrace: Allow for function tracing instance to filter functions") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16ftrace: Remove incorrect setting of glob search fieldSteven Rostedt (VMware)
commit 7b6586562708d2b3a04fe49f217ddbadbbbb0546 upstream. __unregister_ftrace_function_probe() will incorrectly parse the glob filter because it resets the search variable that was setup by filter_parse_regex(). Al Viro reported this: After that call of filter_parse_regex() we could have func_g.search not equal to glob only if glob started with '!' or '*'. In the former case we would've buggered off with -EINVAL (not = 1). In the latter we would've set func_g.search equal to glob + 1, calculated the length of that thing in func_g.len and proceeded to reset func_g.search back to glob. Suppose the glob is e.g. *foo*. We end up with func_g.type = MATCH_MIDDLE_ONLY; func_g.len = 3; func_g.search = "*foo"; Feeding that to ftrace_match_record() will not do anything sane - we will be looking for names containing "*foo" (->len is ignored for that one). Link: http://lkml.kernel.org/r/20180127031706.GE13338@ZenIV.linux.org.uk Fixes: 3ba009297149f ("ftrace: Introduce ftrace_glob structure") Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12ftrace: Fix kmemleak in unregister_ftrace_graphShu Wang
commit 2b0b8499ae75df91455bbeb7491d45affc384fb0 upstream. The trampoline allocated by function tracer was overwriten by function_graph tracer, and caused a memory leak. The save_global_trampoline should have saved the previous trampoline in register_ftrace_graph() and restored it in unregister_ftrace_graph(). But as it is implemented, save_global_trampoline was only used in unregister_ftrace_graph as default value 0, and it overwrote the previous trampoline's value. Causing the previous allocated trampoline to be lost. kmmeleak backtrace: kmemleak_vmalloc+0x77/0xc0 __vmalloc_node_range+0x1b5/0x2c0 module_alloc+0x7c/0xd0 arch_ftrace_update_trampoline+0xb5/0x290 ftrace_startup+0x78/0x210 register_ftrace_function+0x8b/0xd0 function_trace_init+0x4f/0x80 tracing_set_tracer+0xe6/0x170 tracing_set_trace_write+0x90/0xd0 __vfs_write+0x37/0x170 vfs_write+0xb2/0x1b0 SyS_write+0x55/0xc0 do_syscall_64+0x67/0x180 return_from_SYSCALL_64+0x0/0x6a [ Looking further into this, I found that this was left over from when the function and function graph tracers shared the same ftrace_ops. But in commit 5f151b2401 ("ftrace: Fix function_profiler and function tracer together"), the two were separated, and the save_global_trampoline no longer was necessary (and it may have been broken back then too). -- Steven Rostedt ] Link: http://lkml.kernel.org/r/20170912021454.5976-1-shuwang@redhat.com Fixes: 5f151b2401 ("ftrace: Fix function_profiler and function tracer together") Signed-off-by: Shu Wang <shuwang@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27ftrace: Fix memleak when unregistering dynamic ops when tracing disabledSteven Rostedt (VMware)
commit edb096e00724f02db5f6ec7900f3bbd465c6c76f upstream. If function tracing is disabled by the user via the function-trace option or the proc sysctl file, and a ftrace_ops that was allocated on the heap is unregistered, then the shutdown code exits out without doing the proper clean up. This was found via kmemleak and running the ftrace selftests, as one of the tests unregisters with function tracing disabled. # cat kmemleak unreferenced object 0xffffffffa0020000 (size 4096): comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s) hex dump (first 32 bytes): 55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89 U.t$.UH...t$.UH. e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c .H......H.D$PH.L backtrace: [<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0 [<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0 [<ffffffff8109697f>] module_alloc+0x4f/0x90 [<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420 [<ffffffff81249947>] ftrace_startup+0xe7/0x300 [<ffffffff81249bd2>] register_ftrace_function+0x72/0x90 [<ffffffff81263786>] trace_selftest_ops+0x204/0x397 [<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624 [<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7 [<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192 [<ffffffff81002230>] do_one_initcall+0x90/0x1e2 [<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe [<ffffffff81d61ec3>] kernel_init+0x13/0x122 [<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40 [<ffffffffffffffff>] 0xffffffffffffffff Fixes: 12cce594fa ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27ftrace: Fix uninitialized variable in match_records()Dan Carpenter
commit 2e028c4fe12907f226b8221815f16c2486ad3aa7 upstream. My static checker complains that if "func" is NULL then "clear_filter" is uninitialized. This seems like it could be true, although it's possible something subtle is happening that I haven't seen. kernel/trace/ftrace.c:3844 match_records() error: uninitialized symbol 'clear_filter'. Link: http://lkml.kernel.org/r/20170712073556.h6tkpjcdzjaozozs@mwanda Fixes: f0a3b154bd7 ("ftrace: Clarify code for mod command") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-21ftrace: Fix removing of second function probeSteven Rostedt (VMware)
commit 82cc4fc2e70ec5baeff8f776f2773abc8b2cc0ae upstream. When two function probes are added to set_ftrace_filter, and then one of them is removed, the update to the function locations is not performed, and the record keeping of the function states are corrupted, and causes an ftrace_bug() to occur. This is easily reproducable by adding two probes, removing one, and then adding it back again. # cd /sys/kernel/debug/tracing # echo schedule:traceoff > set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter Causes: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220 Modules linked in: [...] CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 Call Trace: dump_stack+0x68/0x9f __warn+0x111/0x130 ? trace_irq_work_interrupt+0xa0/0xa0 warn_slowpath_null+0x1d/0x20 ftrace_get_addr_curr+0x143/0x220 ? __fentry__+0x10/0x10 ftrace_replace_code+0xe3/0x4f0 ? ftrace_int3_handler+0x90/0x90 ? printk+0x99/0xb5 ? 0xffffffff81000000 ftrace_modify_all_code+0x97/0x110 arch_ftrace_update_code+0x10/0x20 ftrace_run_update_code+0x1c/0x60 ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0 register_ftrace_function_probe+0x4b6/0x590 ? ftrace_startup+0x310/0x310 ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30 ? update_stack_state+0x88/0x110 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? preempt_count_sub+0x18/0xd0 ? mutex_lock_nested+0x104/0x800 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? __unwind_start+0x1c0/0x1c0 ? _mutex_lock_nest_lock+0x800/0x800 ftrace_trace_probe_callback.isra.3+0xc0/0x130 ? func_set_flag+0xe0/0xe0 ? __lock_acquire+0x642/0x1790 ? __might_fault+0x1e/0x20 ? trace_get_user+0x398/0x470 ? strcmp+0x35/0x60 ftrace_trace_onoff_callback+0x48/0x70 ftrace_regex_write.isra.43.part.44+0x251/0x320 ? match_records+0x420/0x420 ftrace_filter_write+0x2b/0x30 __vfs_write+0xd7/0x330 ? do_loop_readv_writev+0x120/0x120 ? locks_remove_posix+0x90/0x2f0 ? do_lock_file_wait+0x160/0x160 ? __lock_is_held+0x93/0x100 ? rcu_read_lock_sched_held+0x5c/0xb0 ? preempt_count_sub+0x18/0xd0 ? __sb_start_write+0x10a/0x230 ? vfs_write+0x222/0x240 vfs_write+0xef/0x240 SyS_write+0xab/0x130 ? SyS_read+0x130/0x130 ? trace_hardirqs_on_caller+0x182/0x280 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0xad RIP: 0033:0x7fe61c157c30 RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30 RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001 RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700 R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400 R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c ? trace_hardirqs_off_caller+0xc0/0x110 ---[ end trace 99fa09b3d9869c2c ]--- Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150) Fixes: 59df055f1991 ("ftrace: trace different functions with a different tracer") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-06Merge tag 'trace-v4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracking updates from Steven Rostedt: "Most of the changes are clean ups and small fixes. Some of them have stable tags to them. I searched through my INBOX just as the merge window opened and found lots of patches to pull. I ran them through all my tests and they were in linux-next for a few days. Features added this release: ---------------------------- - Module globbing. You can now filter function tracing to several modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov) - Tracer specific options are now visible even when the tracer is not active. It was rather annoying that you can only see and modify tracer options after enabling the tracer. Now they are in the options/ directory even when the tracer is not active. Although they are still only visible when the tracer is active in the trace_options file. - Trace options are now per instance (although some of the tracer specific options are global) - New tracefs file: set_event_pid. If any pid is added to this file, then all events in the instance will filter out events that are not part of this pid. sched_switch and sched_wakeup events handle next and the wakee pids" * tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits) tracefs: Fix refcount imbalance in start_creating() tracing: Put back comma for empty fields in boot string parsing tracing: Apply tracer specific options from kernel command line. tracing: Add some documentation about set_event_pid ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark tracing: Allow dumping traces without tracking trace started cpus ring_buffer: Fix more races when terminating the producer in the benchmark ring_buffer: Do no not complete benchmark reader too early tracing: Remove redundant TP_ARGS redefining tracing: Rename max_stack_lock to stack_trace_max_lock tracing: Allow arch-specific stack tracer recordmcount: arm64: Replace the ignored mcount call into nop recordmcount: Fix endianness handling bug for nop_mcount tracepoints: Fix documentation of RCU lockdep checks tracing: ftrace_event_is_function() can return boolean tracing: is_legal_op() can return boolean ring-buffer: rb_event_is_commit() can return boolean ring-buffer: rb_per_cpu_empty() can return boolean ring_buffer: ring_buffer_empty{cpu}() can return boolean ring-buffer: rb_is_reader_page() can return boolean ...
2015-10-20ftrace: add module globbingDmitry Safonov
Extend module command for function filter selection with globbing. It uses the same globbing as function filter. sh# echo '*alloc*:mod:*' > set_ftrace_filter Will trace any function with the letters 'alloc' in the name in any module but not in kernel. sh# echo '!*alloc*:mod:ipv6' >> set_ftrace_filter Will prevent from tracing functions with 'alloc' in the name from module ipv6 (do not forget to append to set_ftrace_filter file). sh# echo '*alloc*:mod:!ipv6' > set_ftrace_filter Will trace functions with 'alloc' in the name from kernel and any module except ipv6. sh# echo '*alloc*:mod:!*' > set_ftrace_filter Will trace any function with the letters 'alloc' in the name only from kernel, but not from any module. sh# echo '*:mod:!*' > set_ftrace_filter or sh# echo ':mod:!' > set_ftrace_filter Will trace every function in the kernel, but will not trace functions from any module. sh# echo '*:mod:*' > set_ftrace_filter or sh# echo ':mod:' > set_ftrace_filter As the opposite will trace all functions from all modules, but not from kernel. sh# echo '*:mod:*snd*' > set_ftrace_filter Will trace your sound drivers only (if any). Link: http://lkml.kernel.org/r/1443545176-3215-4-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> [ Made format changes ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-20ftrace: Introduce ftrace_glob structureDmitry Safonov
ftrace_match parameters are very related and I reduce the number of local variables & parameters with it. This is also preparation for module globbing as it would introduce more realated variables & parameters. Link: http://lkml.kernel.org/r/1443545176-3215-3-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> [ Made some formatting changes ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-16ftrace: Clarify code for mod commandDmitry Safonov
"Not" is too abstract variable name - changed to clear_filter. Removed ftrace_match_module_records function: comparison with !* or * not does the general code in filter_parse_regex() as it works without mod command for sh# echo '!*' > /sys/kernel/debug/tracing/set_ftrace_filter Link: http://lkml.kernel.org/r/1443545176-3215-2-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-13ftrace: Remove redundant strsep in mod_callbackDmitry Safonov
By now there isn't any subcommand for mod. Before: sh$ echo '*:mod:ipv6:a' > set_ftrace_filter sh$ echo '*:mod:ipv6' > set_ftrace_filter had the same results, but now first will result in: sh$ echo '*:mod:ipv6:a' > set_ftrace_filter -bash: echo: write error: Invalid argument Also, I clarified ftrace_mod_callback code a little. Link: http://lkml.kernel.org/r/1443545176-3215-1-git-send-email-0x7f454c46@gmail.com Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> [ converted 'if (ret == 0)' to 'if (!ret)' ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-06sched/core: Fix trace_sched_switch()Peter Zijlstra
__trace_sched_switch_state() is the last remaining PREEMPT_ACTIVE user, move trace_sched_switch() from prepare_task_switch() to __schedule() and propagate the @preempt argument. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-01ftrace: Remove redundant swap functionRasmus Villemoes
To cover the common case of sorting an array of pointers, Daniel Wagner recently modified the library sort() to use a specific swap function for size==8, in addition to the size==4 case which was already handled. Since sizeof(long) is either 4 or 8, ftrace_swap_ips() is redundant and we can just let sort() pick an appropriate and fast swap callback. Link: http://lkml.kernel.org/r/1441834023-13130-1-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-30tracing: Move sleep-time and graph-time options out of the core trace_flagsSteven Rostedt (Red Hat)
The sleep-time and graph-time options are only for the function graph tracer and are not used by anything else. As tracer options are now visible when the tracer is not activated, its better to move the function graph specific tracer options into the function graph tracer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-08Merge tag 'trace-v4.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing update from Steven Rostedt: "Mostly this is just clean ups and micro optimizations. The changes with more meat are: - Allowing the trace event filters to filter on CPU number and process ids - Two new markers for trace output latency were added (10 and 100 msec latencies) - Have tracing_thresh filter function profiling time I also worked on modifying the ring buffer code for some future work, and moved the adding of the timestamp around. One of my changes caused a regression, and since other changes were built on top of it and already tested, I had to operate a revert of that change. Instead of rebasing, this change set has the code that caused a regression as well as the code to revert that change without touching the other changes that were made on top of it" * tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated" tracing: Don't make assumptions about length of string on task rename tracing: Allow triggers to filter for CPU ids and process names ftrace: Format MCOUNT_ADDR address as type unsigned long tracing: Introduce two additional marks for delay ftrace: Fix function_graph duration spacing with 7-digits ftrace: add tracing_thresh to function profile tracing: Clean up stack tracing and fix fentry updates ring-buffer: Reorganize function locations ring-buffer: Make sure event has enough room for extend and padding ring-buffer: Get timestamp after event is allocated ring-buffer: Move the adding of the extended timestamp out of line ring-buffer: Add event descriptor to simplify passing data ftrace: correct the counter increment for trace_buffer data tracing: Fix for non-continuous cpu ids tracing: Prefer kcalloc over kzalloc with multiply
2015-07-24ftrace: Fix breakage of set_ftrace_pidSteven Rostedt (Red Hat)
Commit 4104d326b670 ("ftrace: Remove global function list and call function directly") simplified the ftrace code by removing the global_ops list with a new design. But this cleanup also broke the filtering of PIDs that are added to the set_ftrace_pid file. Add back the proper hooks to have pid filtering working once again. Cc: stable@vger.kernel.org # 3.16+ Reported-by: Matt Fleming <matt@console-pimps.org> Reported-by: Richard Weinberger <richard.weinberger@gmail.com> Tested-by: Matt Fleming <matt@console-pimps.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20ftrace: add tracing_thresh to function profileUmesh Tiwari
This patch extends tracing_thresh functionality to function profile tracer. If tracing_thresh is set, print those entries only, whose average is > tracing thresh. Link: http://lkml.kernel.org/r/1434972488-8571-1-git-send-email-umesh.t@samsung.com Signed-off-by: Umesh Tiwari <umesh.t@samsung.com> [ Removed unnecessary 'moved' comment ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-14Merge tag 'trace-v4.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Some clean ups and small fixes, but the biggest change is the addition of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints. Tracepoints have helper functions for the TP_printk() called __print_symbolic() and __print_flags() that lets a numeric number be displayed as a a human comprehensible text. What is placed in the TP_printk() is also shown in the tracepoint format file such that user space tools like perf and trace-cmd can parse the binary data and express the values too. Unfortunately, the way the TRACE_EVENT() macro works, anything placed in the TP_printk() will be shown pretty much exactly as is. The problem arises when enums are used. That's because unlike macros, enums will not be changed into their values by the C pre-processor. Thus, the enum string is exported to the format file, and this makes it useless for user space tools. The TRACE_DEFINE_ENUM() solves this by converting the enum strings in the TP_printk() format into their number, and that is what is shown to user space. For example, the tracepoint tlb_flush currently has this in its format file: __print_symbolic(REC->reason, { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" }, { TLB_REMOTE_SHOOTDOWN, "remote shootdown" }, { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }) After adding: TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH); TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN); Its format file will contain this: __print_symbolic(REC->reason, { 0, "flush on task switch" }, { 1, "remote shootdown" }, { 2, "local shootdown" }, { 3, "local mm shootdown" })" * tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits) tracing: Add enum_map file to show enums that have been mapped writeback: Export enums used by tracepoint to user space v4l: Export enums used by tracepoints to user space SUNRPC: Export enums in tracepoints to user space mm: tracing: Export enums in tracepoints to user space irq/tracing: Export enums in tracepoints to user space f2fs: Export the enums in the tracepoints to userspace net/9p/tracing: Export enums in tracepoints to userspace x86/tlb/trace: Export enums in used by tlb_flush tracepoint tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM() tracing: Allow for modules to convert their enums to values tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation tracing: Give system name a pointer brcmsmac: Move each system tracepoints to their own header iwlwifi: Move each system tracepoints to their own header mac80211: Move message tracepoints to their own header tracing: Add TRACE_SYSTEM_VAR to xhci-hcd tracing: Add TRACE_SYSTEM_VAR to kvm-s390 tracing: Add TRACE_SYSTEM_VAR to intel-sst ...
2015-04-14Merge tag 'trace-4.1-tracefs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracefs from Steven Rostedt: "This adds the new tracefs file system. This has been in linux-next for more than one release, as I had it ready for the 4.0 merge window, but a last minute thing that needed to go into Linux first had to be done. That was that perf hard coded the file system number when reading /sys/kernel/debugfs/tracing directory making sure that the path had the debugfs mount # before it would parse the tracing file. This broke other use cases of perf, and the check is removed. Now when mounting /sys/kernel/debug, tracefs is automatically mounted in /sys/kernel/debug/tracing such that old tools will still see that path as expected. But now system admins can mount tracefs directly and not need to mount debugfs, which can expose security issues. A new directory is created when tracefs is configured such that system admins can now mount it separately (/sys/kernel/tracing)" * tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have mkdir and rmdir be part of tracefs tracefs: Add directory /sys/kernel/tracing tracing: Automatically mount tracefs on debugfs/tracing tracing: Convert the tracing facility over to use tracefs tracefs: Add new tracefs file system tracing: Create cmdline tracer options on tracing fs init tracing: Only create tracer options files if directory exists debugfs: Provide a file creation function that also takes an initial size
2015-04-02ftrace/x86: Let dynamic trampolines call ops->func even for dynamic fopsSteven Rostedt (Red Hat)
Dynamically allocated trampolines call ftrace_ops_get_func to get the function which they should call. For dynamic fops (FTRACE_OPS_FL_DYNAMIC flag is set) ftrace_ops_list_func is always returned. This is reasonable for static trampolines but goes against the main advantage of dynamic ones, that is avoidance of going through the list of all registered callbacks for functions that are only being traced by a single callback. We can fix it by returning ops->func (or recursion safe version) from ftrace_ops_get_func whenever it is possible for dynamic trampolines. Note that dynamic trampolines are not allowed for dynamic fops if CONFIG_PREEMPT=y. Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1501291023000.25445@pobox.suse.cz Link: http://lkml.kernel.org/r/1424357773-13536-1-git-send-email-mbenes@suse.cz Reported-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-03-09ftrace: Fix ftrace enable ordering of sysctl ftrace_enabledSteven Rostedt (Red Hat)
Some archs (specifically PowerPC), are sensitive with the ordering of the enabling of the calls to function tracing and setting of the function to use to be traced. That is, update_ftrace_function() sets what function the ftrace_caller trampoline should call. Some archs require this to be set before calling ftrace_run_update_code(). Another bug was discovered, that ftrace_startup_sysctl() called ftrace_run_update_code() directly. If the function the ftrace_caller trampoline changes, then it will not be updated. Instead a call to ftrace_startup_enable() should be called because it tests to see if the callback changed since the code was disabled, and will tell the arch to update appropriately. Most archs do not need this notification, but PowerPC does. The problem could be seen by the following commands: # echo 0 > /proc/sys/kernel/ftrace_enabled # echo function > /sys/kernel/debug/tracing/current_tracer # echo 1 > /proc/sys/kernel/ftrace_enabled # cat /sys/kernel/debug/tracing/trace The trace will show that function tracing was not active. Cc: stable@vger.kernel.org # 2.6.27+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-03-09ftrace: Fix en(dis)able graph caller when en(dis)abling record via sysctlPratyush Anand
When ftrace is enabled globally through the proc interface, we must check if ftrace_graph_active is set. If it is set, then we should also pass the FTRACE_START_FUNC_RET command to ftrace_run_update_code(). Similarly, when ftrace is disabled globally through the proc interface, we must check if ftrace_graph_active is set. If it is set, then we should also pass the FTRACE_STOP_FUNC_RET command to ftrace_run_update_code(). Consider the following situation. # echo 0 > /proc/sys/kernel/ftrace_enabled After this ftrace_enabled = 0. # echo function_graph > /sys/kernel/debug/tracing/current_tracer Since ftrace_enabled = 0, ftrace_enable_ftrace_graph_caller() is never called. # echo 1 > /proc/sys/kernel/ftrace_enabled Now ftrace_enabled will be set to true, but still ftrace_enable_ftrace_graph_caller() will not be called, which is not desired. Further if we execute the following after this: # echo nop > /sys/kernel/debug/tracing/current_tracer Now since ftrace_enabled is set it will call ftrace_disable_ftrace_graph_caller(), which causes a kernel warning on the ARM platform. On the ARM platform, when ftrace_enable_ftrace_graph_caller() is called, it checks whether the old instruction is a nop or not. If it's not a nop, then it returns an error. If it is a nop then it replaces instruction at that address with a branch to ftrace_graph_caller. ftrace_disable_ftrace_graph_caller() behaves just the opposite. Therefore, if generic ftrace code ever calls either ftrace_enable_ftrace_graph_caller() or ftrace_disable_ftrace_graph_caller() consecutively two times in a row, then it will return an error, which will cause the generic ftrace code to raise a warning. Note, x86 does not have an issue with this because the architecture specific code for ftrace_enable_ftrace_graph_caller() and ftrace_disable_ftrace_graph_caller() does not check the previous state, and calling either of these functions twice in a row has no ill effect. Link: http://lkml.kernel.org/r/e4fbe64cdac0dd0e86a3bf914b0f83c0b419f146.1425666454.git.panand@redhat.com Cc: stable@vger.kernel.org # 2.6.31+ Signed-off-by: Pratyush Anand <panand@redhat.com> [ removed extra if (ftrace_start_up) and defined ftrace_graph_active as 0 if CONFIG_FUNCTION_GRAPH_TRACER is not set. ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-03-09ftrace: Clear REGS_EN and TRAMP_EN flags on disabling record via sysctlSteven Rostedt (Red Hat)
When /proc/sys/kernel/ftrace_enabled is set to zero, all function tracing is disabled. But the records that represent the functions still hold information about the ftrace_ops that are hooked to them. ftrace_ops may request "REGS" (have a full set of pt_regs passed to the callback), or "TRAMP" (the ops has its own trampoline to use). When the record is updated to represent the state of the ops hooked to it, it sets "REGS_EN" and/or "TRAMP_EN" to state that the callback points to the correct trampoline (REGS has its own trampoline). When ftrace_enabled is set to zero, all ftrace locations are a nop, so they do not point to any trampoline. But the _EN flags are still set. This can cause the accounting to go wrong when ftrace_enabled is cleared and an ops that has a trampoline is registered or unregistered. For example, the following will cause ftrace to crash: # echo function_graph > /sys/kernel/debug/tracing/current_tracer # echo 0 > /proc/sys/kernel/ftrace_enabled # echo nop > /sys/kernel/debug/tracing/current_tracer # echo 1 > /proc/sys/kernel/ftrace_enabled # echo function_graph > /sys/kernel/debug/tracing/current_tracer As function_graph uses a trampoline, when ftrace_enabled is set to zero the updates to the record are not done. When enabling function_graph again, the record will still have the TRAMP_EN flag set, and it will look for an op that has a trampoline other than the function_graph ops, and fail to find one. Cc: stable@vger.kernel.org # 3.17+ Reported-by: Pratyush Anand <panand@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-12Merge tag 'trace-v3.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The updates included in this pull request for ftrace are: o Several clean ups to the code One such clean up was to convert to 64 bit time keeping, in the ring buffer benchmark code. o Adding of __print_array() helper macro for TRACE_EVENT() o Updating the sample/trace_events/ to add samples of different ways to make trace events. Lots of features have been added since the sample code was made, and these features are mostly unknown. Developers have been making their own hacks to do things that are already available. o Performance improvements. Most notably, I found a performance bug where a waiter that is waiting for a full page from the ring buffer will see that a full page is not available, and go to sleep. The sched event caused by it going to sleep would cause it to wake up again. It would see that there was still not a full page, and go back to sleep again, and that would wake it up again, until finally it would see a full page. This change has been marked for stable. Other improvements include removing global locks from fast paths" * tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Do not wake up a splice waiter when page is not full tracing: Fix unmapping loop in tracing_mark_write tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT() tracing: Add TRACE_EVENT_FN example tracing: Add TRACE_EVENT_CONDITION sample tracing: Update the TRACE_EVENT fields available in the sample code tracing: Separate out initializing top level dir from instances tracing: Make tracing_init_dentry_tr() static trace: Use 64-bit timekeeping tracing: Add array printing helper tracing: Remove newline from trace_printk warning banner tracing: Use IS_ERR() check for return value of tracing_init_dentry() tracing: Remove unneeded includes of debugfs.h and fs.h tracing: Remove taking of trace_types_lock in pipe files tracing: Add ref count to tracer for when they are being read by pipe
2015-02-03tracing: Convert the tracing facility over to use tracefsSteven Rostedt (Red Hat)
debugfs was fine for the tracing facility as a quick way to get an interface. Now that tracing has matured, it should separate itself from debugfs such that it can be mounted separately without needing to mount all of debugfs with it. That is, users resist using tracing because it requires mounting debugfs. Having tracing have its own file system lets users get the features of tracing without needing to bring in the rest of the kernel's debug infrastructure. Another reason for tracefs is that debubfs does not support mkdir. Currently, to create instances, one does a mkdir in the tracing/instance directory. This is implemented via a hack that forces debugfs to do something it is not intended on doing. By converting over to tracefs, this hack can be removed and mkdir can be properly implemented. This patch does not address this yet, but it lays the ground work for that to be done. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-02Merge branch 'debugfs_automount' of ↵Steven Rostedt (Red Hat)
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into trace/ftrace/tracefs Pull in Al Viro's changes to debugfs that implement the new primitive: debugfs_create_automount(), that creates a directory in debugfs that will safely mount another file system automatically when debugfs is mounted. This will let tracefs automount itself on top of debugfs/tracing directory.
2015-01-22tracing: Use IS_ERR() check for return value of tracing_init_dentry()Steven Rostedt (Red Hat)
tracing_init_dentry() will soon return NULL as a valid pointer for the top level tracing directroy. NULL can not be used as an error value. Instead, switch to ERR_PTR() and check the return status with IS_ERR(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-01-15ftrace: Check both notrace and filter for old hashSteven Rostedt (Red Hat)
Using just the filter for checking for trampolines or regs is not enough when updating the code against the records that represent all functions. Both the filter hash and the notrace hash need to be checked. To trigger this bug (using trace-cmd and perf): # perf probe -a do_fork # trace-cmd start -B foo -e probe # trace-cmd record -p function_graph -n do_fork sleep 1 The trace-cmd record at the end clears the filter before it disables function_graph tracing and then that causes the accounting of the ftrace function records to become incorrect and causes ftrace to bug. Link: http://lkml.kernel.org/r/20150114154329.358378039@goodmis.org Cc: stable@vger.kernel.org [ still need to switch old_hash_ops to old_ops_hash ] Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-01-15ftrace: Fix updating of filters for shared global_ops filtersSteven Rostedt (Red Hat)
As the set_ftrace_filter affects both the function tracer as well as the function graph tracer, the ops that represent each have a shared ftrace_ops_hash structure. This allows both to be updated when the filter files are updated. But if function graph is enabled and the global_ops (function tracing) ops is not, then it is possible that the filter could be changed without the update happening for the function graph ops. This will cause the changes to not take place and may even cause a ftrace_bug to occur as it could mess with the trampoline accounting. The solution is to check if the ops uses the shared global_ops filter and if the ops itself is not enabled, to check if there's another ops that is enabled and also shares the global_ops filter. In that case, the modification still needs to be executed. Link: http://lkml.kernel.org/r/20150114154329.055980438@goodmis.org Cc: stable@vger.kernel.org # 3.17+ Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-21ftrace, kprobes: Support IPMODIFY flag to find IP modify conflictMasami Hiramatsu
Introduce FTRACE_OPS_FL_IPMODIFY to avoid conflict among ftrace users who may modify regs->ip to change the execution path. If two or more users modify the regs->ip on the same function entry, one of them will be broken. So they must add IPMODIFY flag and make sure that ftrace_set_filter_ip() succeeds. Note that ftrace doesn't allow ftrace_ops which has IPMODIFY flag to have notrace hash, and the ftrace_ops must have a filter hash (so that the ftrace_ops can hook only specific entries), because it strongly depends on the address and must be allowed for only few selected functions. Link: http://lkml.kernel.org/r/20141121102516.11844.27829.stgit@localhost.localdomain Cc: Jiri Kosina <jkosina@suse.cz> Cc: Seth Jennings <sjenning@redhat.com> Cc: Petr Mladek <pmladek@suse.cz> Cc: Vojtech Pavlik <vojtech@suse.cz> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> [ fixed up some of the comments ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-20tracing/trivial: Fix typos and make an int into a boolSteven Rostedt (Red Hat)
Fix up a few typos in comments and convert an int into a bool in update_traceon_count(). Link: http://lkml.kernel.org/r/546DD445.5080108@hitachi.com Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19ftrace/x86/extable: Add is_ftrace_trampoline() functionSteven Rostedt (Red Hat)
Stack traces that happen from function tracing check if the address on the stack is a __kernel_text_address(). That is, is the address kernel code. This calls core_kernel_text() which returns true if the address is part of the builtin kernel code. It also calls is_module_text_address() which returns true if the address belongs to module code. But what is missing is ftrace dynamically allocated trampolines. These trampolines are allocated for individual ftrace_ops that call the ftrace_ops callback functions directly. But if they do a stack trace, the code checking the stack wont detect them as they are neither core kernel code nor module address space. Adding another field to ftrace_ops that also stores the size of the trampoline assigned to it we can create a new function called is_ftrace_trampoline() that returns true if the address is a dynamically allocate ftrace trampoline. Note, it ignores trampolines that are not dynamically allocated as they will return true with the core_kernel_text() function. Link: http://lkml.kernel.org/r/20141119034829.497125839@goodmis.org Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-13tracing: Replace seq_printf by simpler equivalentsRasmus Villemoes
Using seq_printf to print a simple string or a single character is a lot more expensive than it needs to be, since seq_puts and seq_putc exist. These patches do seq_printf(m, s) -> seq_puts(m, s) seq_printf(m, "%s", s) -> seq_puts(m, s) seq_printf(m, "%c", c) -> seq_putc(m, c) Subsequent patches will simplify further. Link: http://lkml.kernel.org/r/1415479332-25944-2-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-13ftrace: Have the control_ops get a trampolineSteven Rostedt (Red Hat)
With the new logic, if only a single user of ftrace function hooks is used, it will get its own trampoline assigned to it. The problem is that the control_ops is an indirect ops that perf ops uses. What that means is that when perf registers its ops with register_ftrace_function(), it has the CONTROL flag set and gets added to the control list instead of the global ftrace list. The control_ops gets added to that instead and the mcount trampoline calls the control_ops function. The control_ops function will iterate the control list and call the ops functions that are attached to it. But currently the trampoline is added to the perf ops and not the control ops, and when ftrace tries to find a trampoline hook for it, it fails to find one and gives the following splat: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 10133 at kernel/trace/ftrace.c:2033 ftrace_get_addr_new+0x6f/0xc0() Modules linked in: [...] CPU: 0 PID: 10133 Comm: perf Tainted: P 3.18.0-rc1-test+ #388 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 00000000000007f1 ffff8800c2643bc8 ffffffff814fca6e ffff88011ea0ed01 0000000000000000 ffff8800c2643c08 ffffffff81041ffd 0000000000000000 ffffffff810c388c ffffffff81a5a350 ffff880119b00000 ffffffff810001c8 Call Trace: [<ffffffff814fca6e>] dump_stack+0x46/0x58 [<ffffffff81041ffd>] warn_slowpath_common+0x81/0x9b [<ffffffff810c388c>] ? ftrace_get_addr_new+0x6f/0xc0 [<ffffffff810001c8>] ? 0xffffffff810001c8 [<ffffffff81042031>] warn_slowpath_null+0x1a/0x1c [<ffffffff810c388c>] ftrace_get_addr_new+0x6f/0xc0 [<ffffffff8102e938>] ftrace_replace_code+0xd6/0x334 [<ffffffff810c4116>] ftrace_modify_all_code+0x41/0xc5 [<ffffffff8102eba6>] arch_ftrace_update_code+0x10/0x19 [<ffffffff810c293c>] ftrace_run_update_code+0x21/0x42 [<ffffffff810c298f>] ftrace_startup_enable+0x32/0x34 [<ffffffff810c3049>] ftrace_startup+0x14e/0x15a [<ffffffff810c307c>] register_ftrace_function+0x27/0x40 [<ffffffff810dc118>] perf_ftrace_event_register+0x3e/0xee [<ffffffff810dbfbe>] perf_trace_init+0x29d/0x2a9 [<ffffffff810eb422>] perf_tp_event_init+0x27/0x3a [<ffffffff810f18bc>] perf_init_event+0x9e/0xed [<ffffffff810f1ba4>] perf_event_alloc+0x299/0x330 [<ffffffff810f236b>] SYSC_perf_event_open+0x3ee/0x816 [<ffffffff8115a066>] ? mntput+0x2d/0x2f [<ffffffff81142b00>] ? __fput+0xa7/0x1b2 [<ffffffff81091300>] ? do_gettimeofday+0x22/0x3a [<ffffffff810f279c>] SyS_perf_event_open+0x9/0xb [<ffffffff81502a92>] system_call_fastpath+0x12/0x17 ---[ end trace 81a53565150e4982 ]--- Bad trampoline accounting at: ffffffff810001c8 (run_init_process+0x0/0x2d) (10000001) Update the control_ops trampoline instead of the perf ops one. Reported-by: lkp@01.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11ftrace: Add more information to ftrace_bug() outputSteven Rostedt (Red Hat)
With the introduction of the dynamic trampolines, it is useful that if things go wrong that ftrace_bug() produces more information about what the current state is. This can help debug issues that may arise. Ftrace has lots of checks to make sure that the state of the system it touchs is exactly what it expects it to be. When it detects an abnormality it calls ftrace_bug() and disables itself to prevent any further damage. It is crucial that ftrace_bug() produces sufficient information that can be used to debug the situation. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolinesSteven Rostedt (Red Hat)
When the static ftrace_ops (like function tracer) enables tracing, and it is the only callback that is referencing a function, a trampoline is dynamically allocated to the function that calls the callback directly instead of calling a loop function that iterates over all the registered ftrace ops (if more than one ops is registered). But when it comes to dynamically allocated ftrace_ops, where they may be freed, on a CONFIG_PREEMPT kernel there's no way to know when it is safe to free the trampoline. If a task was preempted while executing on the trampoline, there's currently no way to know when it will be off that trampoline. But this is not true when it comes to !CONFIG_PREEMPT. The current method of calling schedule_on_each_cpu() will force tasks off the trampoline, becaues they can not schedule while on it (kernel preemption is not configured). That means it is safe to free a dynamically allocated ftrace ops trampoline when CONFIG_PREEMPT is not configured. Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-10-31ftrace/x86: Show trampoline call function in enabled_functionsSteven Rostedt (Red Hat)
The file /sys/kernel/debug/tracing/eneabled_functions is used to debug ftrace function hooks. Add to the output what function is being called by the trampoline if the arch supports it. Add support for this feature in x86_64. Cc: H. Peter Anvin <hpa@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-10-31ftrace/x86: Add dynamic allocated trampoline for ftrace_opsSteven Rostedt (Red Hat)
The current method of handling multiple function callbacks is to register a list function callback that calls all the other callbacks based on their hash tables and compare it to the function that the callback was called on. But this is very inefficient. For example, if you are tracing all functions in the kernel and then add a kprobe to a function such that the kprobe uses ftrace, the mcount trampoline will switch from calling the function trace callback to calling the list callback that will iterate over all registered ftrace_ops (in this case, the function tracer and the kprobes callback). That means for every function being traced it checks the hash of the ftrace_ops for function tracing and kprobes, even though the kprobes is only set at a single function. The kprobes ftrace_ops is checked for every function being traced! Instead of calling the list function for functions that are only being traced by a single callback, we can call a dynamically allocated trampoline that calls the callback directly. The function graph tracer already uses a direct call trampoline when it is being traced by itself but it is not dynamically allocated. It's trampoline is static in the kernel core. The infrastructure that called the function graph trampoline can also be used to call a dynamically allocated one. For now, only ftrace_ops that are not dynamically allocated can have a trampoline. That is, users such as function tracer or stack tracer. kprobes and perf allocate their ftrace_ops, and until there's a safe way to free the trampoline, it can not be used. The dynamically allocated ftrace_ops may, although, use the trampoline if the kernel is not compiled with CONFIG_PREEMPT. But that will come later. Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-10-24ftrace: Fix checking of trampoline ftrace_ops in finding trampolineSteven Rostedt (Red Hat)
When modifying code, ftrace has several checks to make sure things are being done correctly. One of them is to make sure any code it modifies is exactly what it expects it to be before it modifies it. In order to do so with the new trampoline logic, it must be able to find out what trampoline a function is hooked to in order to see if the code that hooks to it is what's expected. The logic to find the trampoline from a record (accounting descriptor for a function that is hooked) needs to only look at the "old_hash" of an ops that is being modified. The old_hash is the list of function an ops is hooked to before its update. Since a record would only be pointing to an ops that is being modified if it was already hooked before. Currently, it can pick a modified ops based on its new functions it will be hooked to, and this picks the wrong trampoline and causes the check to fail, disabling ftrace. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> ftrace: squash into ordering of ops for modification
2014-10-24ftrace: Set ops->old_hash on modifying what an ops hooks toSteven Rostedt (Red Hat)
The code that checks for trampolines when modifying function hooks tests against a modified ops "old_hash". But the ops old_hash pointer is not being updated before the changes are made, making it possible to not find the right hash to the callback and possibly causing ftrace to break in accounting and disable itself. Have the ops set its old_hash before the modifying takes place. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-09-12ftrace: Add sanity check when unregistering last ftrace_opsSteven Rostedt (Red Hat)
When the last ftrace_ops is unregistered, all the function records should have a zeroed flags value. Make sure that is the case when the last ftrace_ops is unregistered. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-09-10ftrace: Replace tramp_hash with old_*_hash to save spaceSteven Rostedt (Red Hat)
Allowing function callbacks to declare their own trampolines requires that each ftrace_ops that has a trampoline must have some sort of accounting that keeps track of which ops has a trampoline attached to a record. The easy way to solve this was to add a "tramp_hash" that created a hash entry for every function that a ops uses with a trampoline. But since we can have literally tens of thousands of functions being traced, that means we need tens of thousands of descriptors to map the ops to the function in the hash. This is quite expensive and can cause enabling and disabling the function graph tracer to take some time to start and stop. It can take up to several seconds to disable or enable all functions in the function graph tracer for this reason. The better approach albeit more complex, is to keep track of how ops are being enabled and disabled, and use that along with the counting of the number of ops attached to records, to determive what ops has a trampoline attached to a record at enabling and disabling of tracing. To do this, the tramp_hash has been replaced with an old_filter_hash and old_notrace_hash, which get the copy of the ops filter_hash and notrace_hash respectively. The old hashes is kept until the ops has been modified or removed and the old hashes are used with the logic of the accounting to determine the ops that have the trampoline of a record. The reason this has less of a footprint is due to the trick that an "empty" hash in the filter_hash means "all functions" and an empty hash in the notrace hash means "no functions" in the hash. This is much more efficienct, doesn't have the delay, and takes up much less memory, as we do not need to map all the functions but just figure out which functions are mapped at the time it is enabled or disabled. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-09-10ftrace: Annotate the ops operation on updateSteven Rostedt (Red Hat)
Add three new flags for ftrace_ops: FTRACE_OPS_FL_ADDING FTRACE_OPS_FL_REMOVING FTRACE_OPS_FL_MODIFYING These will be set for the ftrace_ops when they are first added to the function tracing, being removed from function tracing or just having their functions changed from function tracing, respectively. This will be needed to remove the tramp_hash, which can grow quite big. The tramp_hash is used to note what functions a ftrace_ops is using a trampoline for. Denoting which ftrace_ops is being modified, will allow us to use the ftrace_ops hashes themselves, which are much smaller as they have a global flag to denote if a ftrace_ops is tracing all functions, as well as a notrace hash if the ftrace_ops is tracing all but a few. The tramp_hash just creates a hash item for every function, which can go into the 10s of thousands if all functions are using the ftrace_ops trampoline. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>